Bug 145021
Summary: | kernel 2.6.10-1.1076_FC4smp - yum stuck on futex | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | Michal Jaegermann <michal>
Component: | rpm | Assignee: | Dave Jones <davej>
Status: | CLOSED WORKSFORME | QA Contact: | Brian Brock <bbrock>
Severity: | medium | Docs Contact: |
Priority: | medium | |
Version: | rawhide | CC: | bojan, jonabbey, nobody+pnasrat, pfrields, trevor, wtogami
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | i686 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2007-08-10 09:44:53 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |

Description
Michal Jaegermann
2005-01-13 20:06:29 UTC
I can confirm this too. I got yum stuck on futex, this time in FC3. On the screen:

    .....
    Transaction Test Succeeded
    Running Transaction
    Updating: xpdf 100 % done 1/2

and nothing moves from here. Attached strace shows

    Process 25232 attached - interrupt to quit
    futex(0x2a9bf1b0dc, FUTEX_WAIT, 1, NULL

without moving further. 25232 is a process with the command line

    /usr/bin/python /usr/bin/yum update

This also makes 'rpm -q ....' impossible for any package, and probably other rpm operations as well. This is with kernel-2.6.10-1.760_FC3 and yum-2.1.13-0.fc3.

After I killed the stuck process (with -9; otherwise there was no reaction) and removed the stale /var/lib/rpm/__db.00* files, which were around a week old, and a number of updates had happened in the meantime, the next yum run succeeded.

No good explanation. I tried various 'rpm' operations (query, test erase) before dumping __db.00*, and no dice. If not for the detail that they were dated February 2nd, I would think that they were related to the problem.

This is the same bug as I reported in bug 144589. I just hit this bug doing a normal (non OS level) update of about 50 packages (hadn't updated in a while). It hung on updating perl-suidperl, but I'm pretty sure it really doesn't matter which package it is updating when it hangs, as I've seen it hang on all sorts of packages that appear completely random. On this latest hang:

    #/tmp/strace -p 7685
    Process 7685 attached - interrupt to quit
    futex(0xa2e3e28, FUTEX_WAIT, 1, NULL

The box is FC3 2.6.10-1.766_FC3 and if the updates had succeeded it would have been fully up to date. Cleaning up after a mess like this is very difficult if you want to be careful not to screw up your rpmdb and require a reinstall.

PS: it (almost?) always seems to freeze in the "completing updates" phase and never in the installing updates phase -- can anyone else confirm?

Oh, and unlike bug 144589 comments 14 and 15, there were no strange /tmp mounts at all when it hung this time.

I really don't think the hang is package-dependent.

Okay, but I really don't think this hang is yum related either. It looks like rpm is wedged.

*** Bug 144589 has been marked as a duplicate of this bug. ***

Created attachment 114981 [details]
yum hangs yet again
The bug just hit yet again, this time on a relatively simple (small) yum update
set. I checked /v/l/messages and there was nothing strange going on at the
time.
I went into /var/cache/yum and manually rpm -U --force the packages yum was
supposed to install and it all went in perfectly. If the problem was an rpm
problem, how come rpm -U works where yum update fails, on the exact same file
set?
Created attachment 115154 [details]
another futex hang on yet another machine
Another hang, on a different machine. This is becoming *really* common now; so
common that it is starting to hang more than it succeeds and I have to spend an
hour cleaning up the mess it leaves. Again, with this hang there were no weird
/tmp mounted fs's.
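
An earlier comment notes that the leftover /var/lib/rpm/__db.00* files were about a week old when the hang was cleaned up. As a rough aid for that kind of cleanup, here is a minimal Python sketch that lists such region files together with their age. The function name `stale_region_files` and the one-day threshold are made up for illustration and come from nobody in this thread, and a file's age by itself proves nothing about whether it is involved in a hang.

```python
import glob
import os
import time


def stale_region_files(db_dir="/var/lib/rpm", max_age_days=1.0, now=None):
    """Return (path, age_in_days) for __db.00* files older than max_age_days.

    The threshold is an arbitrary, hypothetical default; this only reads
    modification times and never touches the files.
    """
    now = time.time() if now is None else now
    stale = []
    for path in sorted(glob.glob(os.path.join(db_dir, "__db.00*"))):
        age_days = (now - os.path.getmtime(path)) / 86400.0
        if age_days > max_age_days:
            stale.append((path, age_days))
    return stale


if __name__ == "__main__":
    # Report only; removing anything is left to the person doing the cleanup.
    for path, age in stale_region_files():
        print(f"{path}: {age:.1f} days old")
```

Run as root on the affected box, it only prints what it finds, so it should be safe to use while a yum process is still hung.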
Again, this is definitely not a yum bug. You're looking at rpm. I'm going to change the component to rpm.

Please attach the output of:

    cd /var/lib/rpm
    /usr/lib/rpm/rpmdb_stat -CA

Created attachment 115182 [details]
rpmdb_stat -CA output

Since I have since kill -9'd yum and rpm -U --force'd all the packages in, I'm not sure how much use this is to you. This output is from the most recent hung machine (attachment 115154 [details]), I think (or at worst the 114981 one). If you need this run while a yum is stuck on futex, I'll need to wait until the hang happens again. At the rate I'm seeing them lately, that shouldn't be too long.

Created attachment 117470 [details]
Output of rpmdb_stat -CA while yum update is hanging on Futex
I'm seeing this hang as well. I'm attaching the output of rpmdb_stat -CA while
my yum update is hanging on a Futex.
Ah, sorry. Further information on the hang. I had installed Fedora Core 3 base onto a new box. I went to do a yum update to bring everything up to date about half an hour ago. It started working on 469 update steps, and finished doing the first phase of all of the updates/installs. It started doing the 'Completing update' phase, and got this far:

    Completing update for mgetty - 237/469
    Completing update for fonts-xorg-75dpi - 238/469
    Completing update for fonts-xorg-100dpi - 239/469
    Completing update for gstreamer - 240/469
    Completing update for python - 241/469
    Completing update for emacs - 242/469
    Completing update for libstdc++-devel - 243/469
    Completing update for libf2c - 244/469
    Completing update for shadow-utils - 245/469
    Completing update for gtk2-devel - 246/469
    Completing update for libgal2 - 247/469
    Completing update for dbus-x11 - 248/469
    Completing update for libtiff - 249/469
    Completing update for krb5-libs - 250/469
    Completing update for libtool-libs - 251/469
    Completing update for at - 252/469
    Completing update for libgcc - 253/469
    Completing update for texinfo - 254/469
    Completing update for glibc - 255/469
    Completing update for udev - 256/469
    Completing update for gcc-c++ - 257/469
    Completing update for tetex-latex - 258/469
    Completing update for apr - 259/469
    Completing update for pam - 260/469
    Completing update for cpp - 261/469
    Completing update for vim-common - 262/469
    Completing update for vim-minimal - 263/469
    Completing update for HelixPlayer - 264/469
    Completing update for emacs-leim - 265/469
    Completing update for libselinux-devel - 266/469
    Completing update for logwatch - 267/469
    Completing update for tcpdump - 268/469
    Completing update for qt - 269/469
    Completing update for prelink - 270/469
    Completing update for binutils - 271/469

before hanging thusly:

    [root@csdpc21 ~]# strace -p 4764
    Process 4764 attached - interrupt to quit
    futex(0xaf7e050, FUTEX_WAIT, 1, NULL

And process 4764 is the yum update process, of course..

    [root@csdpc21 ~]# ps -ef|grep 4764
    root 4764 4689 9 12:35 pts/1 00:03:36 /usr/bin/python /usr/bin/yum update
    root 8730 8605 0 13:13 pts/2 00:00:00 grep 4764
    [root@csdpc21 ~]#

As root, can you run /sbin/fuser /var/lib/rpm/__db.00* ?

    [root@csdpc21 rpm]# /sbin/fuser /var/lib/rpm/__db.00*
    /var/lib/rpm/__db.001: 4764m
    /var/lib/rpm/__db.002: 4764m
    /var/lib/rpm/__db.003: 4764m

Looks like only the yum update process itself.. Let me know if there are any further investigations I can productively carry out for y'all while the yum update continues to hang, otherwise I'll probably try to start recovery on that box in another hour or so.

Could you create a backup of /var/lib/rpm before you perform any recovery?

Sure. I assume you want to look at it? It's 16 megs, I don't know whether that's too big for bugzilla..

ftp://ftp.arlut.utexas.edu/pub/redhat_debug/varlibrpm.tar.bz2

This bug hits me on a regular basis. If you guys provide a set of things to do/capture before killing/cleaning, I can make sure I do it next time. Otherwise I'll just do the things mentioned so far in the above comments. (And thanks for helping out, Jon.)

Judging from the number of locks that are reported open, this looks as much like yum violating the rules for using rpmdb iterators as anything else. Verification would require mapping the locks back to open db iterators and looking to see if two iterators on the same index are open simultaneously -- that's a programming error.

Just got this on FC6 with yum 3.0.1.
Strace gives:

    futex(0xb677ec10, FUTEX_WAIT, 2, NULL

While yum did:

    Loading "installonlyn" plugin
    Loading "fastestmirror" plugin
    Setting up Update Process
    Setting up repositories
    core 100% |=========================| 951 B 00:00
    updates 100% |=========================| 1.2 kB 00:00
    extras 100% |=========================| 1.1 kB 00:00
    Determining fastest mirrors
    Reading repository metadata in from local files
    primary.xml.gz 100% |=========================| 807 kB 00:00
    ################################################## 2242/2242
    primary.xml.gz 100% |=========================| 363 kB 00:00
    ################################################## 1126/1126
    primary.xml.gz 100% |=========================| 1.6 MB 00:10
    ################################################## 5133/5133
    Excluding Packages in global exclude list
    Finished

rpm --rebuilddb hangs too:

    futex(0xb7da5c10, FUTEX_WAIT, 2, NULL

After removing the /var/lib/rpm/__db* files, running rpm --rebuilddb again and then yum clean all (to be sure, to be sure), the problem has gone away. So, this looks like Berkeley DB stuck in one of its infamous hung states.

I confirm comment 23, but yum clean all is not needed. kernel 2.6.18-1.2257.fc5smp

This is a mixture of so many different things over such a long range of time and versions of software, including things induced by the FC5/6 kernel mmap bug (covered in bug 213963), that it's impossible to say anything at this point. Closing as WORKSFORME; please open new bugs if still encountered with current versions.
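
For anyone landing here with the same symptom, the recovery steps that worked for commenters in this thread (back up /var/lib/rpm, remove the __db* region files, run rpm --rebuilddb) can be sketched roughly as below. This is only a sketch under assumptions: the function name `recover_rpmdb`, its parameters and the default backup location are hypothetical, it assumes the hung yum/rpm process has already been killed and nothing else is using the database, and it passes rpm's `--dbpath` option (not mentioned in this thread) so that the rebuild acts on the same directory.

```python
import glob
import os
import subprocess
import tarfile


def recover_rpmdb(db_dir="/var/lib/rpm",
                  backup_path="/root/varlibrpm.tar.bz2",  # hypothetical default
                  rebuild=False):
    """Back up db_dir, remove its __db* region files, optionally rebuild.

    Returns the list of removed paths. Assumes no process still has the
    rpm database open.
    """
    # 1. Save a copy of the whole directory first, as requested in the thread.
    with tarfile.open(backup_path, "w:bz2") as tar:
        tar.add(db_dir, arcname=os.path.basename(os.path.normpath(db_dir)))

    # 2. Remove only the __db* region files; Packages and the indexes stay.
    removed = []
    for path in sorted(glob.glob(os.path.join(db_dir, "__db*"))):
        os.remove(path)
        removed.append(path)

    # 3. Rebuild the database only when explicitly asked to.
    if rebuild:
        subprocess.run(["rpm", "--dbpath", db_dir, "--rebuilddb"], check=True)
    return removed
```

With `rebuild=False` (the default here) nothing is executed beyond the backup and the file removal, which makes it possible to try the steps on a copy of the directory before touching the real one.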