From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003

Description of problem:
Asking rpm which packages require libc.so.6 produces an error message from db4: db->close fails with "Permission denied". The error comes after the long list of packages.

Version-Release number of selected component (if applicable):
4.2-0.25.1

How reproducible:
Always

Steps to Reproduce:
1. env LANG=C rpm -q --whatrequires libc.so.6

Actual Results:
Long list of packages (attached) followed by:
error: db4 error(13) from db->close: Permission denied

Additional info:
The list of packages appears to be correct, although I haven't checked it. I haven't found this problem with any other library.
Created attachment 88968 [details] Output from rpm command on the test system
Probably true, although I can't reproduce, as root or non-root, using rpm-4.2-0.45, glibc-2.3.1-21, and kernel-smp-2.4.20-2.2, all of which are pertinent while rpm is switching to using posix mutexes. The underlying issue is that db-4.1.24, for mysterious reasons having to do with cache coherency in a db environment, reports close failures explicitly. The error message is dutifully reported by rpm, but afaict is always harmless (i.e. the fix will be to not print the error message.) Closing WORKSFORME because I can't reproduce with rpm-4.2-0.45.
I still get this with glibc-2.3.1-36, rpm-4.2-0.56, and a kernel built from kernel-source-2.4.20-2.21. Are there even more packages involved?
Could I have another chance and reopen this? I still see this on some machines, for example one with this configuration: kernel-source-2.4.20-8, db4-4.1.25-9, glibc-2.3.1-36, rpm-4.2.1-0.11.

It is quite repeatable, but it happens only for some packages. In the hope it might give a clue, I enclose two straces, one where this happens and one where it does not.

I notice a strange difference at the end. After having written the package name and version, rpmq tries to open /var/lib/rpm/Name in READ/WRITE mode. Naturally, that fails when I run this as non-root. It gives a "permission denied" error, which is exactly what the error message says, so it looks suspiciously like the source of this problem. Also, the error message does not appear if I run the command as root.
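For anyone reading along, the link between "db4 error(13)" and a failed read/write open can be checked with a small sketch. It is not rpm or Berkeley DB code: the helper names errno_name and try_open_rdwr are made up for illustration, and the only assumption is that the number in the db4 message is an ordinary OS errno value, which the "Permission denied" text suggests.

```python
import errno
import os


def errno_name(code):
    """Return (symbolic name, message) for a numeric errno value."""
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def try_open_rdwr(path):
    """Try to open path read/write; return 0 on success, else the errno."""
    try:
        fd = os.open(path, os.O_RDWR)
    except OSError as exc:
        return exc.errno
    os.close(fd)
    return 0


if __name__ == "__main__":
    # 13 is the number printed in the db4 error message.
    print(errno_name(13))
    # Path taken from the strace observation in this comment; the result
    # depends on who runs the script and on the file's permissions.
    print(try_open_rdwr("/var/lib/rpm/Name"))
```

Run as a non-root user on a machine where /var/lib/rpm/Name is not writable by that user, the second call would be expected to return 13; as root it should return 0.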
Created attachment 95706 [details]
Trace of rpmq where the problem appears

The command was: /usr/lib/rpm/rpmq -q kernel-source
Calls to gettimeofday were filtered out to reduce differences.
Created attachment 95707 [details]
Trace of a case where the problem does not appear

The command this time was: /usr/lib/rpm/rpmq -q libjpeg-devel
Again, calls to gettimeofday were filtered out.
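The comparison of the two traces can be repeated roughly as follows. This is only a sketch of one way to do it, not the procedure actually used for the attachments: the helper names strip_calls and diff_traces are hypothetical, and the two sample trace lines in the usage part are invented for illustration rather than copied from the attached files.

```python
import difflib


def strip_calls(lines, names=("gettimeofday",)):
    """Drop strace lines for the named syscalls (timestamps differ per run)."""
    return [
        ln for ln in lines
        if not any(ln.lstrip().startswith(name + "(") for name in names)
    ]


def diff_traces(bad_lines, good_lines):
    """Unified diff of two traces after removing the noisy syscalls."""
    return list(
        difflib.unified_diff(
            strip_calls(bad_lines),
            strip_calls(good_lines),
            "bad",
            "good",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Invented sample lines, only to show the shape of the output.
    bad = [
        "gettimeofday({1, 2}, NULL) = 0",
        'open("/var/lib/rpm/Name", O_RDWR) = -1 EACCES (Permission denied)',
    ]
    good = [
        "gettimeofday({3, 4}, NULL) = 0",
        'open("/var/lib/rpm/Name", O_RDONLY) = 3',
    ]
    for line in diff_traces(bad, good):
        print(line)
```

With real traces, the lists would come from reading the two strace output files line by line.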
Are other processes accessing /var/lib/rpm concurrently? There is a late reopen triggered by db->close that might be expecting write access (non-root cannot create shared locks, so the __db* cache can change while running, possibly triggering the late reopen. Just a wild guess ...).

There is (or was) a dangling ptr if the header bit array was realloc'ed inopportunely. That's fixed in rpm-4.2.2-0.6.

Without a reproducer, this is gonna be a bear to track down. Could you tar up and save /var/lib/rpm and attach a ptr here if you encounter a reproducible case?
Thanks ;-)
At http://www.carmen.se/pub/var.lib.rpm.tar.bz2 are the contents of /var/lib/rpm from the machine where I most often see this happen. (As you can see from the contents, the machine has quite a mix of packages from different releases. As mentioned previously, I believe all relevant parts are up to date.)

The bug can be reproducibly triggered, for example, with:
rpm -q --whatrequires libc.so.6 > /dev/null

An "lsof" doesn't reveal any other processes having anything in /var/lib/rpm open.

I haven't had the time to check whether rpm-4.2.2-0.6 behaves differently. I'll do that in a few days.

Oh, and: You're welcome! :-)
I cannot reproduce the error as root or non-root using rpm-4.2.2 with your database. Of course that doesn't mean rpm-4.2.2 has fixed anything, but db-4.2.52 is used instead of db-4.1.25. I have your database and can/will try anything, but I can't fix what I can't see :-(
I upgraded to 4.2.2-0.8. In my first tests I can't reproduce this problem any more after that! :-) Let me do a few more tests the next few days before closing this. But with some luck, it is fixed. (Maybe my db kept triggering that pointer you fixed in 4.2.2-0.6?)
Sorry, didn't mean to close just yet.
I've now exercised rpm in a number of ways, and this problem does indeed seem to be gone! :-)