Red Hat Bugzilla – Bug 89734
rpm -qa simultaneous with rpm -i or -e causes rpmdbNextIterator error
Last modified: 2007-04-18 12:53:20 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4a) Gecko/20030313
Description of problem:
With the test-4.1.1 or test-4.2 RPMs under RH 8.0 or 9, running rpm -qa at the
same time as an RPM install or erase can lead to errors like the following:
error: rpmdbNextIterator: skipping h# 5729 blob size(8288): BAD, 8 + 16 * il(-393797231) + dl(-2133992638)
I have not tested RPM 4.1 or 4.2 as shipped with RH 8.0 or 9. The bug does not
seem to occur with RPM 4.0.4 or test-4.0.5, nor with NPTL on RH 9. (For more
detail on RH 9, see Steps to Reproduce.)
Version-Release number of selected component (if applicable):
How reproducible:
It depends on the value of the magic constant ("16") in step 2 (see "Steps to
Reproduce") and perhaps on the performance of the computer you run it on. It is
not 100% reproducible, but it comes close.
Steps to Reproduce:
For RH 8.0:
1. As root, run the following shell command/script:
while : ; do rpm -e --justdb kernel-utils ; rpm -i --justdb kernel-utils-2.4-8.28.i386.rpm ; sleep 1 ; done
(Change the version number etc. of the kernel-utils RPM file as needed to match
whatever you actually use in testing.)
2. Leave the command from step 1 running. In a separate console, run the
following command/script as a normal user:
for z in `seq 1 16` ; do rpm -qa > /dev/null & done
(Increasing "16" to something bigger will slow things down but increase
reproducibility. The opposite applies to reducing that number.)
3. In a third console, run top so you can see when all the rpmq processes stop.
4. Watch the second console and wait a minute or two. If the rpmq processes have
been absent from top for more than one or two top refreshes, step 2 has probably
finished; if the bug did not reproduce, repeat step 2 with a larger number in
place of 16.
For Red Hat 9: Set LD_ASSUME_KERNEL=2.2.5 in the first and/or second console
(for example with "export LD_ASSUME_KERNEL=2.2.5") before step 1 or step 2,
respectively. (If the variable is set in neither console, I am unable to
reproduce the bug.)
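Steps 2 through 4 can be scripted so that the errors are counted rather than spotted by eye. This is a sketch, not something from the report: count_iterator_errors is a hypothetical name, and it assumes bash plus the coreutils seq, mktemp, and wc commands. Run it in the second console while the loop from step 1 is running.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original report): start N copies of a
# command in the background, as step 2 does with "rpm -qa", wait for all of
# them to finish, and print how many rpmdbNextIterator errors they reported.
count_iterator_errors() {
    local n=$1
    shift
    local errlog i
    errlog=$(mktemp) || return 1
    for i in $(seq 1 "$n"); do
        # Discard stdout as in step 2; append stderr to a shared log file.
        "$@" > /dev/null 2>> "$errlog" &
    done
    # Waiting on the background jobs replaces watching top in steps 3 and 4.
    wait
    # Count occurrences rather than lines, in case output from two
    # processes ends up on the same line of the log.
    grep -o 'rpmdbNextIterator' "$errlog" | wc -l
    rm -f "$errlog"
}
```

For example, "count_iterator_errors 16 rpm -qa" starts 16 readers like step 2 and prints the number of rpmdbNextIterator errors they reported. A result of 0 means the bug did not reproduce on that run, in which case raising the job count, as step 4 suggests, is the next thing to try.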
Actual Results: Output from one session on Red Hat 8 (test-4.1.1):
[barryn@localhost barryn]$ for z in `seq 1 7` ; do rpm -qa > /dev/null & done
[barryn@localhost barryn]$ error: rpmdbNextIterator: skipping h# 5729 blob size(8288): BAD, 8 + 16 * il(-393797231) + dl(-2133992638)
(Note that most of my test runs were actually noisier than this.)
Expected Results: AFAIK those rpmdbNextIterator errors shouldn't be happening.
I don't know if there are any more serious consequences to this, but I thought
it might be worth reporting.
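The numbers in these error messages come from a size check: the stored blob size should equal 8 + 16 * il + dl. The sketch below redoes that check in shell, assuming the usual RPM header layout (two 4-byte counts il and dl, then il index entries of 16 bytes each, then dl bytes of data). expected_blob_size and check_blob are hypothetical helpers for illustration, not rpm functions.

```shell
# Hypothetical helpers (not part of rpm): redo the size check quoted in the
# error message. Assumed header layout: two 4-byte counts (il, dl), then
# il index entries of 16 bytes each, then dl bytes of data.
expected_blob_size() {
    local il=$1 dl=$2
    echo $(( 8 + 16 * il + dl ))
}

# Print OK if the blob size matches its counts; otherwise print BAD and fail.
check_blob() {
    local size=$1 il=$2 dl=$3 want
    want=$(expected_blob_size "$il" "$dl")
    # Negative counts can never describe a real header.
    if [ "$il" -lt 0 ] || [ "$dl" -lt 0 ] || [ "$want" -ne "$size" ]; then
        echo "BAD: size($size) != $want"
        return 1
    fi
    echo "OK"
}
```

With the values from this report, "check_blob 8288 -393797231 -2133992638" prints "BAD: size(8288) != -8434748326", and "check_blob 9404 0 0" prints "BAD: size(9404) != 8". In both cases the counts cannot describe the stored blob, which suggests the reader picked up bytes that were not a complete header.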
Yes. A non-root "rpm -qa" cannot create the shared locks that would block a
concurrent exclusive lock, because of the permissions on /var/lib/rpm and
/var/lib/rpm/__db*. So there is a window in which a non-root "rpm -qa" can see
the inconsistent results of an operation in progress.
Fixing this correctly needs privilege separation: a setgid helper that opens
the __db files and hands the opened file descriptors back to rpm.
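The locking argument above can be illustrated with flock(1) from util-linux. This is an analogy only: rpm does not use flock(1), and the temporary file here merely stands in for the __db* files. A reader that asks for a shared lock cannot get it while a writer holds an exclusive lock, so it never reads a half-finished update; the non-root "rpm -qa" described above cannot take that lock and so has no such protection. demo_shared_vs_exclusive is a hypothetical name.

```shell
# Analogy only (hypothetical demo): flock(1) on a temporary file, not the
# locking rpm actually performs on /var/lib/rpm/__db*.
demo_shared_vs_exclusive() {
    local lockfile writer
    lockfile=$(mktemp) || return 1
    # "Writer": hold an exclusive lock for two seconds in the background.
    flock -x "$lockfile" sleep 2 &
    writer=$!
    sleep 0.5   # give the writer time to acquire its lock
    # "Reader": try a shared lock without blocking (-n). It is refused while
    # the exclusive lock is held, so a locking reader waits instead of
    # reading a partial update.
    if flock -n -s "$lockfile" true; then
        echo "reader got shared lock"
    else
        echo "reader blocked by writer"
    fi
    wait "$writer"
    # Once the writer has finished, the shared lock is granted.
    if flock -n -s "$lockfile" true; then
        echo "reader got shared lock"
    fi
    rm -f "$lockfile"
}
```

Running demo_shared_vs_exclusive should print "reader blocked by writer" followed by "reader got shared lock". A reader that skips the lock altogether would proceed during the writer's two seconds, which is the window the comment describes.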
I can reproduce this even when the rpm -qa processes run as root. Example:
[root@localhost root]# for z in `seq 1 16`;do rpm -qa > /dev/null & done
[root@localhost root]# error: rpmdbNextIterator: skipping h# 927 blob size(9404): BAD, 8 + 16 * il(0) + dl(0)
error: rpmdbNextIterator: skipping h# 927 blob size(9404): BAD, 8 + 16 * il(0) + dl(0)
Hmmm, not for me; I'll try a few more times. rpm-4.1.1?
Ah, as root. Nope: 80 simultaneous rpm -qa jobs run fine
with NPTL locking on SMP. Is this rpm-4.1.1?
It happens either with RPM 4.1.1 (on RH 8.0) or with RPM 4.2 (on RH 9) using
non-NPTL locking (LD_ASSUME_KERNEL=2.2.5).
Red Hat Linux 9 is no longer supported, and Fedora rawhide/devel now uses NPTL
exclusively, so this bug should no longer be hit.