From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4a) Gecko/20030313

Description of problem:
With the test-4.1.1 or test-4.2 RPMs under RH 8.0 or 9, running rpm -qa at the same time as an RPM install or erase can lead to errors like the following:

error: rpmdbNextIterator: skipping h# 5729 blob size(8288): BAD, 8 + 16 * il(-393797231) + dl(-2133992638)

I did not try testing RPM 4.1 or 4.2 as shipped with RH 8.0 or 9. This bug does not seem to happen with RPM 4.0.4 or test-4.0.5, nor does it happen with NPTL on RH 9. (For more detail on RH 9, see the Steps to Reproduce.)

Version-Release number of selected component (if applicable):
rpm-4.2-1

How reproducible:
Depends on the setting of the magic constant in step 2 (see "Steps to Reproduce"), and perhaps on the performance characteristics of the computer you're running it on. I suppose it's not 100% reproducible, but it's not far off.

Steps to Reproduce:
For RH 8.0:

1. As root, run the following shell command/script:

   while : ; do rpm -e --justdb kernel-utils ; rpm -i --justdb kernel-utils-2.4-8.28.i386.rpm ; sleep 1 ; done

   (Change the version number etc. of the kernel-utils RPM file as needed to match whatever you actually use in testing.)

2. Leave the command from step 1 running. In a separate console, run the following command/script as a normal user:

   for z in `seq 1 16` ; do rpm -qa > /dev/null & done

   (Increasing "16" to something bigger will slow things down but increase reproducibility. Reducing that number has the opposite effect.)

3. In a third console, run top so you can see when all the rpmq processes stop running.

4. Look at the second console, and wait a minute or two. If all the rpmq processes disappear from top for more than one or two top snapshots, step 2 has probably finished; if the bug did not reproduce, repeat step 2 with a larger constant.

For Red Hat 9:
Run the command "LD_ASSUME_KERNEL=2.2.5" in the first and/or second console, before steps 1 or 2 respectively.
(If that command is run in neither console, I am unable to reproduce the bug.)

Actual Results:
Output from one session on Red Hat 8 (test-4.1.1):

[barryn@localhost barryn]$ for z in `seq 1 7` ; do rpm -qa > /dev/null & done
[1] 1604
[2] 1605
[3] 1606
[4] 1607
[5] 1608
[6] 1609
[7] 1610
[barryn@localhost barryn]$ error: rpmdbNextIterator: skipping h# 5729 blob size(8288): BAD, 8 + 16 * il(-393797231) + dl(-2133992638)

(Note that most of my test runs were actually more noisy than this.)

Expected Results:
AFAIK those rpmdbNextIterator errors shouldn't be happening.

Additional info:
I don't know if there are any more serious consequences to this, but I thought it might be worth reporting.
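Steps 2 through 4 could probably be collapsed into one helper that waits for the queries itself, rather than watching top. The following is only a sketch: run_parallel is a hypothetical name, not part of rpm. It leaves stderr alone so that any rpmdbNextIterator errors still show up on the console. Whether rpm -qa exits non-zero after skipping a header is not something I have checked, so the printed failure count should not be taken as the test result.

```shell
# run_parallel N CMD [ARGS...]   (hypothetical helper, not part of rpm)
# Start N copies of CMD in the background with stdout discarded,
# wait for every one of them, then print how many exited non-zero.
run_parallel() {
    n=$1
    shift
    pids=""
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" > /dev/null &      # stderr is untouched, so errors stay visible
        pids="$pids $!"
        i=$((i + 1))
    done
    failed=0
    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))
    done
    echo "$failed"
}
```

Usage in the second console, while the loop from step 1 is running: run_parallel 16 rpm -qa. When the helper returns, all the queries have finished, which is what steps 3 and 4 use top to detect.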
Yes. A non-root process cannot create the shared locks that would block a concurrent exclusive lock, because of the permissions on /var/lib/rpm and /var/lib/rpm/__db*. So there is a window in which a non-root "rpm -qa" can see inconsistent results of an operation in progress. Fixing this correctly needs a setgid helper to open the __db files and transfer the opened fd's, aka privilege separation.
I can reproduce this even if the rpm -qa's are being run as root. Example:

[root@localhost root]# for z in `seq 1 16`;do rpm -qa > /dev/null & done
[1] 17679
[2] 17680
[3] 17681
[4] 17682
[5] 17683
[6] 17684
[7] 17685
[8] 17686
[9] 17687
[10] 17688
[11] 17689
[12] 17690
[13] 17691
[14] 17692
[15] 17693
[16] 17694
[root@localhost root]# error: rpmdbNextIterator: skipping h# 927 blob size(9404): BAD, 8 + 16 * il(0) + dl(0)
error: rpmdbNextIterator: skipping h# 927 blob size(9404): BAD, 8 + 16 * il(0) + dl(0)
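For anyone reading the error text: my understanding is that the message compares the stored blob size with the size implied by the header's own counts, namely 8 bytes for the two counts, 16 bytes per index entry (il), plus dl bytes of data. The arithmetic can be checked by hand with a small sketch like the one below. blob_size_ok is a hypothetical helper written only to mirror the printed formula. rpm may well apply further range checks on il and dl that this does not model.

```shell
# blob_size_ok SIZE IL DL   (hypothetical helper mirroring the error text)
# Succeeds only when SIZE equals 8 + 16 * IL + DL.
blob_size_ok() {
    [ "$1" -eq "$((8 + 16 * $2 + $3))" ]
}

# Values taken from the two error messages quoted in this report:
blob_size_ok 9404 0 0 || echo "h# 927: size mismatch"
blob_size_ok 8288 -393797231 -2133992638 || echo "h# 5729: size mismatch"
```

With il(0) and dl(0) the implied size is 8, not 9404, so the reader appears to have seen zeroed counts at the start of the header. In the first report the counts are negative, which looks like garbage data.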
Hmmm, not for me, will try a couple times more. rpm-4.1.1?
Ah, root. Nope, 80 simultaneous rpm -qa jobs run fine with NPTL locking on SMP. rpm-4.1.1?
It happens with either RPM 4.1.1 (on RH 8.0) or with non-NPTL locking (LD_ASSUME_KERNEL=2.2.5) on RPM 4.2 (on RH 9).
Red Hat Linux 9 is no longer supported and Fedora rawhide/devel is now purely NPTL. This should no longer be hit.