480127 – memory leak in rpmlib

Summary: memory leak in rpmlib

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	rpm
Sub Component:
Version:	5.2
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Panu Matilainen
QA Contact:	Petr Sklenar
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	479640

Reported:	2009-01-15 09:36 UTC by Miroslav Suchý
Modified:	2009-09-23 12:01 UTC
CC List:	8 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-09-23 12:01:35 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Description Miroslav Suchý 2009-01-15 09:36:47 UTC

Description of problem:
In /lib/misc.c, within the routine rpmHeaderGetEntry (which commonly shows up in the core files from signal 6 crashes):

        /* XXX FIXME: memory leak. */
        msgstr = headerSprintf(h, fmt, rpmTagTable, rpmHeaderFormats,
                               &errstr);
        if (msgstr) {
            *p = (void *) msgstr;
            if (type)   *type = RPM_STRING_TYPE;
            if (c)      *c = 1;
            return 1;
        } else {
            if (c)      *c = 0;
            return 0;
        }

I checked the latest rpm (rpm-4.4.2-48) and this code still seems to be there. According to our findings (see BZ 173424), this leak causes problems in RHN Satellite in the long run.

Version-Release number of selected component (if applicable):
rpm-4.4.2-48

How reproducible:
With difficulty; see bug 173424 for details.

Steps to Reproduce:
1. Install RHN Satellite with the specspo package installed
2. Put it under high load
3. rhnpush some package several times
4. rpmlib start to leaks and it will result in seg faults of httpd.
  
Actual results:
seg faults of httpd

Expected results:
rpmlib does not leak

Comment 1 Panu Matilainen 2009-01-16 14:05:16 UTC

AFAICT the memleak comment in rpmHeaderGetEntry() refers to the fact that for summary, description and group it returns a malloced string of RPM_STRING_TYPE which headerFreeData() doesn't free. The python bindings "know" this funky little detail and take care of it, and at least I'm not able to reproduce leakage from that.
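
The ownership rule described above can be pictured with a small Python model. This is only a sketch under stated assumptions: `ModelHeap`, `model_get_entry`, `model_free_data` and `read_summary` are hypothetical names, the numeric type constants are illustrative, and none of this is rpm's actual code. It only shows why a generic free helper that skips plain strings leaves one allocation outstanding unless the caller releases it.

```python
RPM_STRING_TYPE = 6         # illustrative constants for this model only
RPM_STRING_ARRAY_TYPE = 8


class ModelHeap:
    """Hypothetical stand-in for malloc/free bookkeeping."""

    def __init__(self):
        self.live = set()

    def malloc(self, label):
        self.live.add(label)
        return label

    def free(self, label):
        self.live.discard(label)


def model_get_entry(heap, tag):
    """Model of the quoted branch: a fresh string reported as RPM_STRING_TYPE."""
    return heap.malloc("translated:" + tag), RPM_STRING_TYPE


def model_free_data(heap, data, tag_type):
    """Model of a generic free helper that skips plain strings."""
    if tag_type == RPM_STRING_ARRAY_TYPE:
        heap.free(data)


def read_summary(heap, caller_knows_detail):
    """Read one translated tag and report how many allocations remain."""
    data, tag_type = model_get_entry(heap, "summary")
    model_free_data(heap, data, tag_type)   # no-op for RPM_STRING_TYPE
    if caller_knows_detail:
        heap.free(data)                     # the extra step an informed caller takes
    return len(heap.live)
```

Calling `read_summary(ModelHeap(), caller_knows_detail=False)` leaves the string outstanding in this model, while passing `True` releases it, which is the behaviour attributed to the python bindings.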
 
Looking at the dumps in bug 173424, it seems to me more like setenv() related memory corruption, not leak. The rpm tag translation fiddles LANGUAGE environment variable back and forth for each translated item, and increments _nl_msg_cat_cntr on each change. On a very busy box, I could imagine _nl_msg_cat_cntr possibly wrapping around and maybe something can't handle that - I dunno, that's just a wild guess but there's all sorts of things piled up in here, for example perl doing something in this area:

==4465== Invalid free() / delete / delete[]
==4465==    at 0x1B8FF382: free (vg_replace_malloc.c:235)
==4465==    by 0x1BFFA0DE: Perl_safesysfree (in /usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE/libperl.so)
==4465==    by 0x1BFFDC07: Perl_my_setenv (in /usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE/libperl.so)
==4465==    by 0x1BDF9C0A: mod_perl_pass_env (perl_config.c:207)
To put it another way, obviously the rpm translation code is causing problems (the way it works is pretty wicked), but whether that's the bug itself or just something triggering problems elsewhere is not that clear.
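
To give a feel for how much environment churn the back-and-forth LANGUAGE switching implies, here is a rough Python model. It is a sketch, not rpm's implementation: `LanguageSwitcher` and `read_translated_tags` are hypothetical names, the `switches` counter merely stands in for the kind of per-change counter mentioned in this comment, and the forced value `en_US` is an assumption chosen for illustration.

```python
import os
from contextlib import contextmanager


class LanguageSwitcher:
    """Hypothetical model: override LANGUAGE per item and count each change."""

    def __init__(self):
        self.switches = 0

    @contextmanager
    def forced(self, value):
        previous = os.environ.get("LANGUAGE")
        os.environ["LANGUAGE"] = value          # first environment mutation
        self.switches += 1
        try:
            yield
        finally:
            # Restore the caller's setting: second environment mutation.
            if previous is None:
                os.environ.pop("LANGUAGE", None)
            else:
                os.environ["LANGUAGE"] = previous
            self.switches += 1


def read_translated_tags(switcher, tags):
    """Simulate looking up each translated tag under a forced LANGUAGE."""
    seen = []
    for tag in tags:
        with switcher.forced("en_US"):
            seen.append((tag, os.environ["LANGUAGE"]))
    return seen
```

In this model every translated item costs two changes to the process environment, so reading summary, description and group from one header already means six; any other code in the same process that manages the environment on its own sees all of them.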

In the meantime, there's a much less intrusive way to disable the translations than having spacewalk conflict with specspo:

rpm.delMacro("_i18ndomains")

I'm not familiar with the spacewalk codebase so I can't suggest where exactly to put it, but anywhere after the rpm module has been loaded will do.
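
A minimal sketch of that placement, assuming a hypothetical helper `disable_rpm_i18n()` that the application calls once during startup. Only `rpm.delMacro("_i18ndomains")` comes from the suggestion above; the helper name, the guarded import and the idea of a single startup hook are assumptions for illustration.

```python
def disable_rpm_i18n(rpm_module):
    """Drop the _i18ndomains macro so rpm stops translating header tags."""
    rpm_module.delMacro("_i18ndomains")


try:
    import rpm  # the rpm Python bindings, if they are installed
except ImportError:
    rpm = None  # allows this sketch to load on systems without the bindings

if rpm is not None:
    # Run once, after the rpm module is loaded and before any headers are read.
    disable_rpm_i18n(rpm)
```

Passing the module in as a parameter keeps the helper easy to exercise without the bindings installed.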

Comment 2 Panu Matilainen 2009-01-23 17:30:18 UTC

FWIW, perl's environment handling seems to be somewhat controversial. Possibly related:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=142523, for which the upstream report is here: http://rt.perl.org/rt3/Public/Bug/Display.html?id=1170

Also mod_perl has some interesting commentary:
     /* Force the environment to be copied out of its original location
        above argv[].  This fixes a crash caused when a module called putenv()
        before any Perl modified the environment - environ would change to a
        new value, and the check in my_setenv() to duplicate the environment
        would fail, and then setting some environment value which had a previous
        value would cause perl to try to free() something from the original env.
        This crashed free(). */
     my_setenv("MODPERL_ENV_FIXUP", "0");
     my_setenv("MODPERL_ENV_FIXUP", NULL);

CC'ing perl maintainer for possible comments.

Comment 3 Marcela Mašláňová 2009-01-26 14:03:26 UTC

I suppose rhn is using mod_perl for httpd? CC'ing mod_perl maintainer for his thoughts.

Comment 4 Miroslav Suchý 2009-01-26 14:55:00 UTC

Yes, we use mod_perl for httpd.
For RHEL5 we use mod_perl from RHEL.
For RHEL4 we package our own mod_perl (src.rpm taken from Red Hat Web Application Stack) since we need 2.0 and plain RHEL4 has 1.99.

Comment 5 Joe Orton 2009-01-28 14:48:54 UTC

The env var handling in mod_perl 1.x looks like serious voodoo to me.  Do you have some PerlPassEnv configured here?

There's lots I don't understand here.

1) this is reported against RHEL5 but bug 173424 seems to be talking about RHEL4/3 only.  Is this problem reproducible on RHEL5 at all?  With the RHEL5 httpd/mod_perl stack?  mod_perl 1.x is *way* different from 2.x.

2) The report says:

"rpmlib start to leaks and it will result in seg faults of httpd."

Is this two separate problems?  A memory leak, and an unrelated crash?  A memory leak which is leading to OOM and hence httpd crashing?  Or do you not mean "start to leaks" but "starts to corrupt memory", or what?  Or is it just conjecture that rpmlib is involved?

Comment 6 Miroslav Suchý 2009-01-28 15:43:22 UTC

> Is this problem reproducible on RHEL5 at all?
I will try to reproduce it for RHEL5. I will try to find time for this next week.

> A memory leak, and an unrelated crash?  A
> memory leak which is leading to OOM and hence httpd crashing?  Or do you not
> mean "start to leaks" but "starts to corrupt memory", or what?

I think it was an OOM crash. But I am pinging cperry, who has been working on that issue, to clarify. Cliff?

Comment 7 Jan Pazdziora (Red Hat) 2009-02-09 13:12:42 UTC

(In reply to comment #6)
> > Is this problem reproducible on RHEL5 at all?
> I will try to reproduce it for RHEL5. I will try to find time for this next
> week.

Mirek, what's the status of getting a reproducer for this?

If we do not have a reproducer, I intend to just ask QA to start testing Satellite with specspo installed, and hopefully their automation tests will be able to get some reproducer for us. Or not, in which case the problem simply does not materialize with the latest Apache / rpm / whatever.

But I do not like the fact that we are removing specspo in install.pl, never giving QA a chance to test the thing.

Comment 8 Denise Dumas 2009-03-13 18:16:22 UTC

Since there is no agreement on the problem or the fix, this is moving out for consideration in 5.5.

Comment 9 Clifford Perry 2009-03-16 18:09:24 UTC

I will be honest in saying that I have not looked at this issue for 2+ years, since we just removed specspo from the OS of Satellite systems to stop the Apache 1.3 segmentation faults from occurring. The Sig 11's were happening when mod_python called rpmlib to read the rpm headers of rpms being uploaded into a Satellite via apache. When specspo was installed, we did some sort of in-memory string translation which *somewhere* messed things up, eventually leading to corruption and a sig11 of Apache. At the time, the current apache and rpm maintainers (over email communications) were unable to provide a solution other than the one I ultimately chose for Satellite.

I think the specspo would try to re-allocate a region of memory that had not been freed up yet, or buffer overflow (do not remember exactly). Somewhere/somehow things got confused and just crashed :)

While it would be great if a new OS (RHEL 3 vs 4/5) or a new Apache (1.3 vs 2.x) helped, I doubt it.

If Mirek is unable to replicate the issue any more during his testing, maybe glibc or Apache is doing something better, rpm is better, or it just disappeared. I would agree with comment #7 in allowing Satellite QE time to test with specspo installed and see whether the issue is still reproducible.

Cliff

Comment 10 Jan Pazdziora (Red Hat) 2009-03-23 10:39:00 UTC

I'd like to point out that for QE to test with specspo, we should make it easier for them by not silently removing specspo in install.pl.

In fact, I wonder if any of those

   php|piranha|squirrelmail|specspo

packages listed there pose a problem.

Comment 12 RHEL Program Management 2009-09-23 12:01:35 UTC

Development Management has reviewed and declined this request.  You may appeal
this decision by reopening this request.