89734 – rpm -qa simultaneous with rpm -i or -e causes rpmdbNextIterator error

Summary: rpm -qa simultaneous with rpm -i or -e causes rpmdbNextIterator error

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	rpm
Sub Component:
Version:	9
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Paul Nasrat
QA Contact:	Mike McLean
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2003-04-27 05:27 UTC by Barry K. Nathan
Modified:	2007-04-18 16:53 UTC
CC List:	1 user
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2005-09-29 21:39:50 UTC
Embargoed:

Description Barry K. Nathan 2003-04-27 05:27:28 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4a) Gecko/20030313

Description of problem:
With the test-4.1.1 or test-4.2 RPMs under RH 8.0 or 9, running rpm -qa at the
same time as an RPM install or erase can lead to errors like the following:

error: rpmdbNextIterator: skipping h#    5729 blob size(8288): BAD, 8 + 16 *
il(-393797231) + dl(-2133992638)

I did not try testing RPM 4.1 or 4.2 as shipped with RH 8.0 or 9. This bug does
not seem to happen with RPM 4.0.4 or test-4.0.5, nor does it happen with NPTL on
RH 9. (For more detail on RH 9, see the Steps to Reproduce.)

Version-Release number of selected component (if applicable):
rpm-4.2-1

How reproducible:
Depends on the setting of the magic constant in step 2 (see "Steps to
Reproduce"), and perhaps the performance characteristics of the computer you're
running it on. I suppose it's not 100% reproducible but it's not far off.

Steps to Reproduce:
For RH 8.0:

1. As root, run the following shell command/script:
while : ; do rpm -e --justdb kernel-utils ; rpm -i --justdb kernel-utils-2.4-8.28.i386.rpm ; sleep 1 ; done

(Change the version number etc. of the kernel-utils RPM file as needed to match
whatever you actually use in testing.)

2. Leave the command from step 1 running. In a separate console, run the
following command/script as a normal user:
for z in `seq 1 16` ; do rpm -qa > /dev/null & done
(Increasing "16" to something bigger will slow things down but increase
reproducibility. The opposite applies to reducing that number.)

3. In a third console, run top so you can see when all the rpmq processes stop
running.

4. Look at the second console, and wait a minute or two. If all the rpmq
processes disappear from top for more than one or two top snapshots, step 2
probably finished; if the bug did not reproduce, repeat step 2 with a larger
constant.

For Red Hat 9: Run the command "LD_ASSUME_KERNEL=2.2.5" in the first and/or
second console, before steps 1 or 2 respectively. (If that command is run in
neither console, I am unable to reproduce the bug.)
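
Steps 2 through 4 above could be scripted roughly as follows. This is only a
sketch: the function names and the log path are made up for illustration and
are not part of rpm. It uses the shell's "wait" instead of watching top, and
it counts the rpmdbNextIterator lines rather than reading them off the
console.

```shell
#!/bin/sh
# Hypothetical helpers for reproducing the report; not part of rpm.

# Count lines that report a header skipped by rpmdbNextIterator.
# Reads the named files, or stdin when no file is given.
count_iterator_errors() {
    # grep -c prints 0 but exits 1 when nothing matches; ignore that status.
    grep -c 'rpmdbNextIterator: skipping' "$@" || true
}

# Start N concurrent "rpm -qa" jobs (default 16), wait for all of them,
# then print how many rpmdbNextIterator errors they logged.
run_parallel_queries() {
    n=${1:-16}
    log=${2:-/tmp/rpm-qa-errors.log}   # assumed log location
    : > "$log"
    i=0
    while [ "$i" -lt "$n" ]; do
        rpm -qa > /dev/null 2>> "$log" &
        i=$((i + 1))
    done
    wait
    count_iterator_errors "$log"
}
```

With the loop from step 1 still running in another console, something like
"run_parallel_queries 16" would then print the number of errors seen. A
result of 0 suggests repeating with a larger count, as in step 4.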

Actual Results:  Output from one session on Red Hat 8 (test-4.1.1):
[barryn@localhost barryn]$ for z in `seq 1 7` ; do rpm -qa > /dev/null & done
[1] 1604
[2] 1605
[3] 1606
[4] 1607
[5] 1608
[6] 1609
[7] 1610
[barryn@localhost barryn]$ error: rpmdbNextIterator: skipping h#    5729 blob
size(8288): BAD, 8 + 16 * il(-393797231) + dl(-2133992638)

(Note that most of my test runs were actually more noisy than this.)

Expected Results:  AFAIK those rpmdbNextIterator errors shouldn't be happening.

Additional info:

I don't know if there are any more serious consequences to this, but I thought
it might be worth reporting.

Comment 1 Jeff Johnson 2003-04-29 17:25:31 UTC

Yes. A non-root process cannot create the shared locks that would
prevent a concurrent exclusive lock, because of the permissions
on /var/lib/rpm and /var/lib/rpm/__db*.

So there is a window where a non-root "rpm -qa" can see the
inconsistent results of an operation in progress.

This needs a setgid helper to open __db files, with
associated transfer of opened fd's, to fix correctly,
aka privilege separation.
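
A quick way to see whether the permissions described above apply to the
current user might look like the sketch below. It assumes that write access
to the __db* files is a reasonable proxy for being able to take the locks,
which is an assumption and not something rpm documents. The function name is
hypothetical.

```shell
#!/bin/sh
# Hypothetical check; write access is only assumed to be a proxy for
# "this user can take locks on the rpm database".

# Succeeds when every __db* file under DIR is writable by the current user.
db_files_writable() {
    dir=${1:-/var/lib/rpm}
    for f in "$dir"/__db*; do
        [ -e "$f" ] || continue        # glob matched nothing
        if [ ! -w "$f" ]; then
            echo "not writable: $f"
            return 1
        fi
    done
    echo "writable: $dir"
}
```

Running "db_files_writable" as the user who issues "rpm -qa" shows which
case that user falls into. Comment 2 reports that the error also appears as
root, so this check does not rule the bug out.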

Comment 2 Barry K. Nathan 2003-04-29 19:26:59 UTC

I can reproduce this even if the rpm -qa's are being run as root. Example:

[root@localhost root]# for z in `seq 1 16`;do rpm -qa > /dev/null & done
[1] 17679
[2] 17680
[3] 17681
[4] 17682
[5] 17683
[6] 17684
[7] 17685
[8] 17686
[9] 17687
[10] 17688
[11] 17689
[12] 17690
[13] 17691
[14] 17692
[15] 17693
[16] 17694
[root@localhost root]# error: rpmdbNextIterator: skipping h#     927 blob
size(9404): BAD, 8 + 16 * il(0) + dl(0)
error: rpmdbNextIterator: skipping h#     927 blob size(9404): BAD, 8 + 16 *
il(0) + dl(0)
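
Taking the message literally, the check that fails is that the blob size
should equal 8 + 16 * il + dl (il and dl are presumably the index entry count
and data length read from the header). The arithmetic can be replayed with a
small sketch; check_blob is a made-up name, not an rpm command.

```shell
#!/bin/sh
# Hypothetical helper that replays the size check quoted in the error text.

# check_blob SIZE IL DL
# Prints "ok" when SIZE == 8 + 16*IL + DL, otherwise the expected size.
check_blob() {
    expected=$((8 + 16 * $2 + $3))
    if [ "$1" -eq "$expected" ]; then
        echo ok
    else
        echo "BAD (expected $expected)"
    fi
}

# Values from the error above: size 9404 with il and dl both read as 0.
check_blob 9404 0 0     # prints "BAD (expected 8)"
```

An il and dl of 0 against a 9404-byte blob fails the check, in line with a
reader seeing a record that is only partly written.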

Comment 3 Jeff Johnson 2003-04-29 19:34:41 UTC

Hmmm, not for me, will try a couple times more. rpm-4.1.1?

Comment 4 Jeff Johnson 2003-04-29 19:45:21 UTC

Ah, root. Nope, 80 simultaneous rpm -qa jobs run fine
with NPTL locking on SMP. rpm-4.1.1?

Comment 5 Barry K. Nathan 2003-04-29 21:09:34 UTC

It happens with either RPM 4.1.1 (on RH 8.0) or with non-NPTL locking
(LD_ASSUME_KERNEL=2.2.5) on RPM 4.2 (on RH 9).

Comment 6 Paul Nasrat 2005-09-29 21:39:50 UTC

Red Hat Linux 9 is no longer supported and Fedora rawhide/devel is now purely
NPTL.  This should no longer be hit.
