Bug 33110

Summary: showstopper: endless(?) malloc frenzy in rpm-4.0.2
Product: [Retired] Red Hat Linux Reporter: j. alan eldridge <alane>
Component: rpm Assignee: Jeff Johnson <jbj>
Status: CLOSED WORKSFORME QA Contact: David Lawrence <dkl>
Severity: high Docs Contact:
Priority: high    
Version: 7.0   
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Last Closed: 2001-03-25 16:56:37 UTC Type: ---

Description j. alan eldridge 2001-03-25 06:37:29 UTC
this is a showstopper.

[you may wonder where i came up with this pathological monster...
rpmfile.py does this to get all the array variables out of an rpm file.]

here's the command:
/bin/rpm -qp --qf
[%{HEADERIMAGE}+++]@@@[%{HEADERSIGNATURES}+++]@@@[%{HEADERIMMUTABLE}+++]@@@[%{HEADERREGIONS}+++]@@@[%{HEADERI18NTABLE}+++]@@@[%{SIGSIZE}+++]@@@[%{SIGPGP}+++]@@@[%{SIGMD5}+++]@@@[%{SIGGPG}+++]@@@[%{NAME}+++]@@@[%{VERSION}+++]@@@[%{RELEASE}+++]@@@[%{EPOCH}+++]@@@[%{SERIAL}+++]@@@[%{SUMMARY}+++]@@@[%{DESCRIPTION}+++]@@@[%{BUILDTIME}+++]@@@[%{BUILDHOST}+++]@@@[%{INSTALLTIME}+++]@@@[%{SIZE}+++]@@@[%{DISTRIBUTION}+++]@@@[%{VENDOR}+++]@@@[%{GIF}+++]@@@[%{XPM}+++]@@@[%{LICENSE}+++]@@@[%{COPYRIGHT}+++]@@@[%{PACKAGER}+++]@@@[%{GROUP}+++]@@@[%{SOURCE}+++]@@@[%{PATCH}+++]@@@[%{URL}+++]@@@[%{OS}+++]@@@[%{ARCH}+++]@@@[%{PREIN}+++]@@@[%{POSTIN}+++]@@@[%{PREUN}+++]@@@[%{POSTUN}+++]@@@[%{OLDFILENAMES}+++]@@@[%{FILESIZES}+++]@@@[%{FILESTATES}+++]@@@[%{FILEMODES}+++]@@@[%{FILERDEVS}+++]@@@[%{FILEMTIMES}+++]@@@[%{FILEMD5S}+++]@@@[%{FILELINKTOS}+++]@@@[%{FILEFLAGS}+++]@@@[%{ROOT}+++]@@@[%{FILEUSERNAME}+++]@@@[%{FILEGROUPNAME}+++]@@@[%{ICON}+++]@@@[%{SOURCERPM}+++]@@@[%{FILEVERIFYFLAGS}+++]@@@[%{ARCHIVESIZE}+++]@@@[%{PROVIDENAME}+++]@@@[%{PROVIDES}+++]@@@[%{REQUIREFLAGS}+++]@@@[%{REQUIRENAME}+++]@@@[%{REQUIREVERSION}+++]@@@[%{CONFLICTFLAGS}+++]@@@[%{CONFLICTNAME}+++]@@@[%{CONFLICTVERSION}+++]@@@[%{BUILDROOT}+++]@@@[%{EXCLUDEARCH}+++]@@@[%{EXCLUDEOS}+++]@@@[%{EXCLUSIVEARCH}+++]@@@[%{EXCLUSIVEOS}+++]@@@[%{RPMVERSION}+++]@@@[%{TRIGGERSCRIPTS}+++]@@@[%{TRIGGERNAME}+++]@@@[%{TRIGGERVERSION}+++]@@@[%{TRIGGERFLAGS}+++]@@@[%{TRIGGERINDEX}+++]@@@[%{VERIFYSCRIPT}+++]@@@[%{CHANGELOGTIME}+++]@@@[%{CHANGELOGNAME}+++]@@@[%{CHANGELOGTEXT}+++]@@@[%{PREINPROG}+++]@@@[%{POSTINPROG}+++]@@@[%{PREUNPROG}+++]@@@[%{POSTUNPROG}+++]@@@[%{BUILDARCHS}+++]@@@[%{OBSOLETENAME}+++]@@@[%{OBSOLETES}+++]@@@[%{VERIFYSCRIPTPROG}+++]@@@[%{TRIGGERSCRIPTPROG}+++]@@@[%{COOKIE}+++]@@@[%{FILEDEVICES}+++]@@@[%{FILEINODES}+++]@@@[%{FILELANGS}+++]@@@[%{PREFIXES}+++]@@@[%{INSTPREFIXES}+++]@@@[%{OLDORIGFILENAMES}+++]@@@[%{BUILDMACROS}+++]@@@[%{PROVIDEFLAGS}+++]@@@[%{PROVIDEVERSION}+++]@@@[%{OBSOLETEFLAGS}+++]@@@[%{OBSOLETEVERSION}+++
]@@@[%{DIRINDEXES}+++]@@@[%{BASENAMES}+++]@@@[%{DIRNAMES}+++]@@@[%{OPTFLAGS}+++]@@@[%{DISTURL}+++]@@@[%{PAYLOADFORMAT}+++]@@@[%{PAYLOADCOMPRESSOR}+++]@@@[%{PAYLOADFLAGS}+++]@@@[%{MULTILIBS}+++]@@@[%{INSTALLTID}+++]@@@[%{REMOVETID}+++]@@@[%{FILENAMES}+++]@@@[%{FSSIZES}+++]@@@[%{FSNAMES}+++]@@@[%{INSTALLPREFIX}+++]@@@[%{TRIGGERCONDS}+++]@@@[%{TRIGGERTYPE}+++]@@@
-vv amaya-4.3-1.i386.rpm


watching this little gem with strace shows that it just starts calling
brk(), over and over and over, and never stops... umm, well, it'll stop
*eventually* when I run out of paging space.

let's look at the output from ps [cue music: "The Torture Never Stops" by
FZ]:

alane     6921  0.0  3.8 12784 9888 pts/6    R    01:27   0:00 /usr/lib/rpm/rp
alane     6921 99.9 33.1 88112 85272 pts/6   R    01:27   0:02 /usr/lib/rpm/rp
alane     6921 85.2 53.6 140944 138132 pts/6 R    01:27   0:04 /usr/lib/rpm/rp
alane     6921 87.4 69.4 181648 178864 pts/6 R    01:27   0:06 /usr/lib/rpm/rp
alane     6921 74.6 78.6 205328 202560 pts/6 R    01:27   0:07 /usr/lib/rpm/rp
alane     6921 66.0 81.1 235952 208920 pts/6 R    01:27   0:09 /usr/lib/rpm/rp
alane     6921 59.7 81.0 259600 208740 pts/6 R    01:27   0:10 /usr/lib/rpm/rp
alane     6921 56.9 81.1 276688 208912 pts/6 R    01:27   0:11 /usr/lib/rpm/rp
alane     6921 58.0 80.9 296208 208456 pts/6 R    01:27   0:13 /usr/lib/rpm/rp
alane     6921 57.8 79.8 317776 205508 pts/6 R    01:27   0:15 /usr/lib/rpm/rp
alane     6921 54.9 78.9 335504 203284 pts/6 R    01:27   0:16 /usr/lib/rpm/rp
alane     6921 54.7 79.4 353552 204660 pts/6 R    01:27   0:18 /usr/lib/rpm/rp
alane     6921 55.9 79.8 370480 205708 pts/6 R    01:27   0:19 /usr/lib/rpm/rp
alane     6921 55.1 81.4 385072 209864 pts/6 R    01:27   0:20 /usr/lib/rpm/rp
alane     6921 57.1 83.9 403728 216116 pts/6 R    01:27   0:22 /usr/lib/rpm/rp
alane     6921 57.6 84.0 422416 216508 pts/6 R    01:27   0:24 /usr/lib/rpm/rp
alane     6921 55.0 83.3 436528 214680 pts/6 R    01:27   0:26 /usr/lib/rpm/rp
alane     6921 53.2 82.8 453104 213420 pts/6 R    01:27   0:28 /usr/lib/rpm/rp
alane     6921 53.8 83.2 469392 214412 pts/6 R    01:27   0:30 /usr/lib/rpm/rp
alane     6921 54.1 82.7 484528 213000 pts/6 R    01:27   0:31 /usr/lib/rpm/rp
alane     6921 55.1 82.8 498000 213324 pts/6 R    01:27   0:33 /usr/lib/rpm/rp
alane     6921 55.4 82.8 512112 213416 pts/6 R    01:27   0:35 /usr/lib/rpm/rp
alane     6921 56.5 82.6 525680 212912 pts/6 R    01:27   0:37 /usr/lib/rpm/rp
alane     6921 57.5 81.8 538928 210692 pts/6 R    01:27   0:39 /usr/lib/rpm/rp
alane     6921 57.6 82.3 552240 211976 pts/6 R    01:27   0:40 /usr/lib/rpm/rp
alane     6921 57.7 82.6 564720 212868 pts/6 R    01:27   0:42 /usr/lib/rpm/rp
alane     6921 58.5 81.6 576272 210268 pts/6 R    01:27   0:44 /usr/lib/rpm/rp
alane     6921 58.6 82.8 588912 213272 pts/6 R    01:27   0:46 /usr/lib/rpm/rp

WOW! 588M! 213M RSS! My god, it's ... it's ... bigger ... than ... than ...
Motif!

Now, I expect that, given enough memory, this might have finished. I ran
the same test on an rpm that was 80K in size. here's the ps from that:

alane     6886 99.9  9.0 25208 23216 pts/6   R    01:27   0:01 /usr/lib/rpm/rp
alane     6886 99.9 17.9 48128 46184 pts/6   R    01:27   0:03 /usr/lib/rpm/rp
alane     6886 99.9 23.9 63704 61788 pts/6   R    01:27   0:05 /usr/lib/rpm/rp
alane     6886 88.3 28.9 76472 74592 pts/6   R    01:27   0:07 /usr/lib/rpm/rp

so we needed 76M to extract the variables from an 80K rpm file.

finally, here's a little shell script you can use to beat the shit out of it
yourself:

---8<-snip---8<-snip---8<-snip---8<-snip---8<-snip---8<-snip---8<---
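#!/bin/sh
# Build a query format that requests every tag rpm knows, each in array context.
rpm=/bin/rpm
tags=$($rpm --querytags)
fmt=
for t in $tags ; do
  fmt="${fmt}[%{$t}+++]@@@"
done
# Show the command, then run it; "$@" keeps package paths with spaces intact.
echo "$rpm" -qp --qf "$fmt" "$@"
exec "$rpm" -qp --qf "$fmt" "$@"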
---8<-snip---8<-snip---8<-snip---8<-snip---8<-snip---8<-snip---8<---
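
For readers wondering what the "+++" and "@@@" markers are for: each `[%{TAG}+++]` group emits one array element followed by "+++", and "@@@" closes the group for that tag. A minimal sketch of how a tool like rpmfile.py might split such output back into per-tag lists follows. The function name `parse_query_output` and the sample string are hypothetical (the sample is made up, not real rpm output), and the sketch assumes no tag value contains either separator.

```python
def parse_query_output(output, tags, item_sep="+++", tag_sep="@@@"):
    """Split output produced by '[%{TAG}+++]@@@' groups into {tag: [items]}."""
    groups = output.split(tag_sep)
    # The format ends with a trailing '@@@', so the final piece is empty.
    if groups and groups[-1].strip() == "":
        groups.pop()
    if len(groups) != len(tags):
        raise ValueError(f"expected {len(tags)} groups, got {len(groups)}")
    result = {}
    for tag, group in zip(tags, groups):
        items = group.split(item_sep)
        # Every element is followed by '+++', which leaves one empty tail item.
        if items and items[-1] == "":
            items.pop()
        result[tag] = items
    return result


# Made-up sample: one scalar-like tag, one two-element array, one empty tag.
sample = "amaya+++@@@/usr/bin/+++/usr/share/+++@@@@@@"
parsed = parse_query_output(sample, ["NAME", "DIRNAMES", "PATCH"])
print(parsed)
```

Binary-valued tags are the obvious place where the "no separators in the value" assumption is likely to break.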

Comment 1 j. alan eldridge 2001-03-25 11:24:44 UTC
whoa... found the condition that's causing it: it's the unescaped square
brackets that make it blow its cookies. now that's *weird*. if the brackets are
escaped with \, then it works correctly.

Comment 2 j. alan eldridge 2001-03-25 11:52:01 UTC
of course, escaping the brackets makes it incorrect. but i'll add another data
point: with the brackets escaped (or missing), rather than just spitting out
"(array)", rpm gives you the zeroth element of the array, with no indication
that you're missing things. this is bad, too. "MaxRPM" indicates that there
should be an error output ("(array)") but that seems to have been lost over
time.

i strongly suspect that this is a bug in not freeing allocated memory at the end
of the loop which iterates over the array items to spit them out. if this is the
case, then we would see spectacular degradation as the number of files in the
archive increases, and in fact, that is what we see.

looks like an API thing come to bite again, jeff; the question of "who does own
this iterator, and do i have to free it - or decrement its ref count - or not?"
seems to be pervasive (from what i've seen on the rpm list), and i'm guessing it
has struck again.


Comment 3 Jeff Johnson 2001-03-25 15:58:29 UTC
Blindly grabbing HEADER_* tags is not advised, as the value for those tags is
an entire copy of the header as originally read.

Try changing your loop to start grabbing tags at 256 (if you want signatures),
or 1000 (traditional rpm tag start).

Comment 4 j. alan eldridge 2001-03-25 16:21:24 UTC
jeff, i don't understand your comment re 256/1000. please explain. note: purpose
of python class is to make the rpm's tags available from a hash.  so i don't
understand how numbers fit into it.

I think I see your point  re HEADER*.... grabbing HEADERIMAGE in the array
context seems to be what's killing it. But *why* ???  I can't see how the
exhibited behavior could be considered either reasonable or correct.

Also, in creating a general purpose (to some extent) tool, how is one to
determine which headers are safe to grab, and which not? And, while we're at it,
how to know which are arrays and which are scalars? 

Are there any known examples of using the python interface to rpmlib? Or does it
even work?

Comment 5 Jeff Johnson 2001-03-25 16:56:32 UTC
Tags have numeric values, roughly split into groups
	60-99	(new) HEADER_*
	100	I18N table
	256	(new) signature info duplicated from signature header
	1000	RPMTAG_NAME
Complete list of name/value pairs is in rpmlib.h enum.
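
A minimal sketch of the "start at 256 or 1000" suggestion, applied to a name-keyed tool: look up each tag's number and only build query groups for tags at or above a threshold. The `TAG_NUMBERS` table and the `build_query_format` helper are hypothetical. The numbers shown are an illustrative subset chosen to fall in the groups listed above and should be checked against the rpmlib.h enum for the rpm version in use.

```python
# Illustrative subset only; the authoritative name/value pairs are in rpmlib.h.
TAG_NUMBERS = {
    "HEADERIMAGE": 61,       # in the 60-99 HEADER_* group
    "HEADERI18NTABLE": 100,  # I18N table
    "SIGSIZE": 257,          # in the signature group starting at 256
    "NAME": 1000,            # RPMTAG_NAME, traditional tag start
    "VERSION": 1001,
}


def build_query_format(tag_numbers, first_tag=1000):
    """Return (names, format) for tags whose number is >= first_tag."""
    ordered = sorted(tag_numbers.items(), key=lambda kv: kv[1])
    names = [name for name, number in ordered if number >= first_tag]
    fmt = "".join(f"[%{{{name}}}+++]@@@" for name in names)
    return names, fmt


names, fmt = build_query_format(TAG_NUMBERS)             # traditional tags only
sig_names, _ = build_query_format(TAG_NUMBERS, 256)      # include signature tags
print(fmt)
```

With the default threshold the HEADER_* and I18N entries never reach the query format, which sidesteps the whole-header copies described in Comment 3.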

New tags have new semantics, HEADER_IMAGE is literally that, a copy
of the header that can/will be used to re-generate MD5/SHA1 digests
for verification of header contents after install has been completed.

So the question is what tags can you access? The answer is all of them,
but you need to be prepared for how the contents are defined. All
tags are "safe to grab". Arrays can be determined by looking at
the type and count fields returned from headerGetEntry.

So how do you tell what the values mean, and what their datatypes are?
Meaning is determined by context of access, and a tag returns the datatype
(at least if using headerGetEntry).
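
As a rough sketch of the type-and-count check described above, the following models the decision in Python on a (type, count) pair such as headerGetEntry reports. The type codes are written from memory of rpm's header definitions and should be verified against the installed headers. `classify_entry` is a hypothetical helper, and treating a binary entry's count as a byte length rather than an element count is an assumption.

```python
# Type codes as recalled from rpm's header definitions; verify before relying on them.
RPM_INT32_TYPE = 4
RPM_STRING_TYPE = 6
RPM_BIN_TYPE = 7
RPM_STRING_ARRAY_TYPE = 8


def classify_entry(type_code, count):
    """Classify a header entry as 'blob', 'array' or 'scalar'."""
    if type_code == RPM_BIN_TYPE:
        # Assumption: count is the blob's size in bytes, so it is not
        # something to iterate over element by element.
        return "blob"
    if type_code == RPM_STRING_ARRAY_TYPE or count > 1:
        return "array"
    return "scalar"


print(classify_entry(RPM_STRING_TYPE, 1))        # a single string such as NAME
print(classify_entry(RPM_STRING_ARRAY_TYPE, 1))  # string array with one element
print(classify_entry(RPM_BIN_TYPE, 5000))        # binary data such as a header image
```

A tool built this way would only wrap "array" tags in `[...]` and would handle "blob" tags separately or skip them.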

Best examples of usage are either anaconda (which was the reason for
creating python bindings) or up2date sources.