Description of problem:
After an attempt at 'yum -c local.conf update', with a repository on a local disk and a configuration file to reflect that, a yum process got stuck. Nothing happened for at least ten minutes. 'ps' showed something like this:
root 5410 2.8 9.4 55668 48792 tty1 S+ 11:13 0:09 \_ /usr/bin/python /usr/bin/yum -y -c local.conf update
A check with 'strace' gave this:
# strace -p 5410
Process 5410 attached - interrupt to quit
futex(0xb61df058, FUTEX_WAIT, 2, NULL
and nothing moved any further. With 'sysrq-T' this showed up:
yum S 00001B4B 1920 5410 4978
d9c4deb4 00000086 00000000 00001b4b 00001b52 00000010 00000020 00000000
00000000 c49abd78 c1406060 00000000 00000000 659c3d00 000f4553 c0315ba0
df4d3560 df4d36b8 c0138483 00000000 fffffff5 7fffffff d9c4d000 b61df058
Call Trace:
[<c0138483>] file_read_actor+0x0/0xf1
[<c02c6562>] schedule_timeout+0x13/0xae
[<c01485f6>] find_extend_vma+0x12/0x4f
[<c012e28a>] add_wait_queue+0x12/0x30
[<c012ed80>] futex_wait+0xd9/0x13a
[<c011a2c4>] default_wake_function+0x0/0xc
[<c011a2c4>] default_wake_function+0x0/0xc
[<c012f02a>] do_futex+0x29/0x5a
[<c012f118>] sys_futex+0xbd/0xcc
[<c0152fd1>] sys_pread64+0x43/0x59
[<c0103cdb>] syscall_call+0x7/0xb
If it is of any interest, I now have the full output from 'sysrq-T' in the logs.
Version-Release number of selected component (if applicable): kernel 2.6.10-1.1076_FC4smp
How reproducible: No idea. I rebooted with another kernel and the update went through.
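For anyone who wants to script the check described above instead of attaching strace by hand, here is a minimal sketch. It assumes a Linux procfs that exposes /proc/<pid>/status and /proc/<pid>/wchan; the function names and the proc_root parameter are made up for illustration and are not part of yum or rpm. On some kernels wchan only reads "0", in which case this tells you nothing.

```python
import os


def wait_state(pid, proc_root="/proc"):
    """Return (state, wchan) for a process, as reported by procfs.

    proc_root is a hypothetical parameter so the function can be pointed
    at a fake directory tree for testing.
    """
    base = os.path.join(proc_root, str(pid))
    state = None
    with open(os.path.join(base, "status")) as fh:
        for line in fh:
            if line.startswith("State:"):
                # e.g. "State:\tS (sleeping)" -> "S (sleeping)"
                state = line.split(":", 1)[1].strip()
                break
    with open(os.path.join(base, "wchan")) as fh:
        wchan = fh.read().strip()
    return state, wchan


def looks_stuck_on_futex(pid, proc_root="/proc"):
    """True if the process is sleeping in a futex-related kernel function.

    This is a single sample: a healthy process also waits on futexes
    briefly, so call it repeatedly over a few minutes before concluding
    that the process is hung.
    """
    state, wchan = wait_state(pid, proc_root)
    return bool(state) and state.startswith("S") and "futex" in wchan
```

Calling looks_stuck_on_futex(5410) a few times, a minute apart, would be the scripted equivalent of watching the strace output sit at FUTEX_WAIT.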
I can confirm this too.
I got yum stuck on a futex, this time in FC3. On the screen:
.....
Transaction Test Succeeded
Running Transaction
Updating: xpdf 100 % done 1/2
and nothing moves anywhere from here. An attached strace shows
Process 25232 attached - interrupt to quit
futex(0x2a9bf1b0dc, FUTEX_WAIT, 1, NULL
without moving further. 25232 is a process with the command line /usr/bin/python /usr/bin/yum update
This also makes 'rpm -q ....' impossible for any package, and probably other rpm operations as well. kernel-2.6.10-1.760_FC3 and yum-2.1.13-0.fc3.
After I killed the stuck process (with -9; it did not react to anything else) and removed the stale /var/lib/rpm/__db.00* files, which were around a week old even though a number of updates had happened in the meantime, the next yum run succeeded. I have no good explanation. I tried various 'rpm' operations (query, test erase) before removing __db.00*, and none of them worked. Were it not for the detail that they were dated February 2nd, I would think that they were related to the problem.
This is the same bug as I reported in bug 144589. I just hit it doing a normal (non OS level) update of about 50 packages (I hadn't updated in a while). It hung on updating perl-suidperl, but I'm pretty sure it really doesn't matter what package it is updating when it hangs, as I've seen it hang on all sorts of packages that appear completely random. On this latest hang:
#/tmp/strace -p 7685
Process 7685 attached - interrupt to quit
futex(0xa2e3e28, FUTEX_WAIT, 1, NULL
The box is FC3 2.6.10-1.766_FC3 and if the updates had succeeded it would have been fully up to date. Cleaning up after a mess like this is very difficult if you want to be careful not to screw up your rpmdb and require a reinstall. PS: it (almost?) always seems to freeze in the "completing updates" phase and never in the installing updates phase -- can anyone else confirm?
Oh, and unlike bug 144589 comments 14 and 15, there were no strange /tmp mounts at all when it hung this time. I really don't think the hang is package-dependent.
Okay, but I really don't think this hang is yum related either. It looks like rpm is wedged.
*** Bug 144589 has been marked as a duplicate of this bug. ***
Created attachment 114981
yum hangs yet again
The bug just hit yet again, this time on a relatively simple (small) yum update set. I checked /v/l/messages and there was nothing strange going on at the time. I went into /var/cache/yum and manually ran rpm -U --force on the packages yum was supposed to install, and they all went in perfectly. If the problem is an rpm problem, how come rpm -U works where yum update fails, on the exact same file set?
Created attachment 115154
another futex hang on yet another machine
Another hang, on a different machine. This is becoming *really* common now; so common that it hangs more often than it succeeds, and I have to spend an hour cleaning up the mess it leaves. Again, with this hang there were no weird filesystems mounted on /tmp.
Again, this is definitely not a yum bug. You're looking at rpm. I'm going to change the component to rpm.
Please attach the output of:
cd /var/lib/rpm
/usr/lib/rpm/rpmdb_stat -CA
Created attachment 115182
rpmdb_stat -CA output
Since I have already kill -9'd yum and rpm -U --force'd all the packages in, I'm not sure how much use this is to you. This output is from the most recent hung machine (attachment 115154), I think (or at worst the 114981 one). If you need this run while a yum is stuck on a futex, I'll need to wait until the hang happens again. At the rate I'm seeing them lately, that shouldn't take too long.
Created attachment 117470
Output of rpmdb_stat -CA while yum update is hanging on a futex
I'm seeing this hang as well. I'm attaching the output of rpmdb_stat -CA taken while my yum update was hanging on a futex.
Ah, sorry. Further information on the hang. I had installed Fedora Core 3 base onto a new box. I went to do a yum update to bring everything up to date about half an hour ago. It started working on 469 update steps, and finished doing the first phase of all of the updates/installs. It started doing the 'Completing update' phase, and got this far:
Completing update for mgetty - 237/469
Completing update for fonts-xorg-75dpi - 238/469
Completing update for fonts-xorg-100dpi - 239/469
Completing update for gstreamer - 240/469
Completing update for python - 241/469
Completing update for emacs - 242/469
Completing update for libstdc++-devel - 243/469
Completing update for libf2c - 244/469
Completing update for shadow-utils - 245/469
Completing update for gtk2-devel - 246/469
Completing update for libgal2 - 247/469
Completing update for dbus-x11 - 248/469
Completing update for libtiff - 249/469
Completing update for krb5-libs - 250/469
Completing update for libtool-libs - 251/469
Completing update for at - 252/469
Completing update for libgcc - 253/469
Completing update for texinfo - 254/469
Completing update for glibc - 255/469
Completing update for udev - 256/469
Completing update for gcc-c++ - 257/469
Completing update for tetex-latex - 258/469
Completing update for apr - 259/469
Completing update for pam - 260/469
Completing update for cpp - 261/469
Completing update for vim-common - 262/469
Completing update for vim-minimal - 263/469
Completing update for HelixPlayer - 264/469
Completing update for emacs-leim - 265/469
Completing update for libselinux-devel - 266/469
Completing update for logwatch - 267/469
Completing update for tcpdump - 268/469
Completing update for qt - 269/469
Completing update for prelink - 270/469
Completing update for binutils - 271/469
before hanging thusly:
[root@csdpc21 ~]# strace -p 4764
Process 4764 attached - interrupt to quit
futex(0xaf7e050, FUTEX_WAIT, 1, NULL
And process 4764 is the yum update process, of course.
[root@csdpc21 ~]# ps -ef|grep 4764
root 4764 4689 9 12:35 pts/1 00:03:36 /usr/bin/python /usr/bin/yum update
root 8730 8605 0 13:13 pts/2 00:00:00 grep 4764
[root@csdpc21 ~]#
As root, can you run /sbin/fuser /var/lib/rpm/__db.00*?
[root@csdpc21 rpm]# /sbin/fuser /var/lib/rpm/__db.00*
/var/lib/rpm/__db.001: 4764m
/var/lib/rpm/__db.002: 4764m
/var/lib/rpm/__db.003: 4764m
Looks like only the yum update process itself.
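The "m" suffix in the fuser output means the file is memory-mapped by that process. If fuser is not at hand, roughly the same information can be read from /proc/<pid>/maps. This is a minimal sketch of that idea, not a replacement for fuser: the function name and both parameters are hypothetical, and it only sees processes whose maps files the caller is allowed to read (so run it as root).

```python
import os


def pids_mapping(prefix="/var/lib/rpm/__db.", proc_root="/proc"):
    """Return {path: sorted list of PIDs} for every mapped file whose
    path starts with prefix, based on <proc_root>/<pid>/maps."""
    holders = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "maps")) as fh:
                lines = fh.readlines()
        except OSError:
            continue  # process exited, or we may not read its maps
        for line in lines:
            # Fields: address perms offset dev inode pathname
            fields = line.split(None, 5)
            if len(fields) < 6:
                continue  # anonymous mapping, no pathname
            path = fields[5].strip()
            if path.startswith(prefix):
                holders.setdefault(path, set()).add(int(entry))
    return {path: sorted(pids) for path, pids in holders.items()}
```

On the box above one would expect something like a dictionary with __db.001 through __db.003, each listing only PID 4764, which matches the conclusion that no second process is holding the environment.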
Let me know if there are any further investigations I can productively carry out for y'all while the yum update continues to hang, otherwise I'll probably try to start recovery on that box in another hour or so.
Could you create a backup of /var/lib/rpm before you perform any recovery?
Sure. I assume you want to look at it? It's 16 megs, I don't know whether that's too big for bugzilla.. ftp://ftp.arlut.utexas.edu/pub/redhat_debug/varlibrpm.tar.bz2
This bug hits me on a regular basis. If you guys provide a set of things to do/capture before killing/cleaning I can make sure I do it next time. Otherwise I'll just do the things mentioned so far in the above comments. (And thanks for helping out, Jon.)
Judging from the number of locks that are reported open, this looks like yum violating the rules for using rpmdb iterators as much as anything else. Verification would require mapping the locks back to open db iterators and checking whether two iterators on the same index are open simultaneously; that is a programming error.
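To make the check in the previous comment concrete, here is a toy sketch of the overlap test. It assumes you somehow already have a trace of iterator open/close events, for example from added debug logging; neither rpm nor yum produces such a trace in this form, and the event tuple layout and function name are invented for illustration.

```python
def overlapping_iterators(events):
    """Find iterators opened while another one on the same index is live.

    events: iterable of (action, iterator_id, index_name) tuples, where
    action is "open" or "close". This format is hypothetical.
    Returns a list of (index_name, already_open_id, new_id) tuples.
    """
    open_by_index = {}
    conflicts = []
    for action, it_id, index in events:
        live = open_by_index.setdefault(index, [])
        if action == "open":
            # Every iterator still live on this index conflicts with
            # the one being opened now.
            conflicts.extend((index, other, it_id) for other in live)
            live.append(it_id)
        elif action == "close" and it_id in live:
            live.remove(it_id)
    return conflicts
```

An empty result would mean the trace never had two iterators open on the same index at once; any entry points at the pair of iterators worth looking at in the caller.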
Just got this on FC6 with yum 3.0.1. Strace gives:
---------------------------------
futex(0xb677ec10, FUTEX_WAIT, 2, NULL
---------------------------------
While yum did:
---------------------------------
Loading "installonlyn" plugin
Loading "fastestmirror" plugin
Setting up Update Process
Setting up repositories
core 100% |=========================| 951 B 00:00
updates 100% |=========================| 1.2 kB 00:00
extras 100% |=========================| 1.1 kB 00:00
Determining fastest mirrors
Reading repository metadata in from local files
primary.xml.gz 100% |=========================| 807 kB 00:00
################################################## 2242/2242
primary.xml.gz 100% |=========================| 363 kB 00:00
################################################## 1126/1126
primary.xml.gz 100% |=========================| 1.6 MB 00:10
################################################## 5133/5133
Excluding Packages in global exclude list
Finished
---------------------------------
rpm --rebuilddb hangs too:
---------------------------------
futex(0xb7da5c10, FUTEX_WAIT, 2, NULL
---------------------------------
After removing the /var/lib/rpm/__db* files, running rpm --rebuilddb again and then yum clean all (to be sure, to be sure), the problem has gone away. So, this looks like Berkeley DB stuck in one of its infamous hung states.
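The recovery steps reported in this thread (back up /var/lib/rpm first, remove the __db.* region files, then run rpm --rebuilddb) can be sketched as below. Treat it as an illustration under stated assumptions rather than a vetted procedure: the function name, its parameters and the default backup path are made up, and it assumes the stuck yum or rpm process has already been killed and nothing else is using the database.

```python
import glob
import os
import subprocess
import tarfile


def reset_rpmdb_env(rpm_dir="/var/lib/rpm",
                    backup="/root/varlibrpm.tar.gz",  # hypothetical path
                    rebuild=True):
    """Back up rpm_dir, delete its __db.* files, optionally rebuild.

    Returns the list of region files that were removed.
    """
    # 1. Keep a copy of the whole directory before touching anything.
    with tarfile.open(backup, "w:gz") as tar:
        tar.add(rpm_dir, arcname=os.path.basename(rpm_dir.rstrip("/")))

    # 2. Remove only the Berkeley DB region files, never Packages etc.
    removed = []
    for path in sorted(glob.glob(os.path.join(rpm_dir, "__db.*"))):
        os.remove(path)
        removed.append(path)

    # 3. Rebuild the indexes; raises CalledProcessError if rpm fails.
    if rebuild:
        subprocess.run(["rpm", "--rebuilddb", "--dbpath", rpm_dir],
                       check=True)
    return removed
```

Passing rebuild=False makes it possible to try the backup and cleanup steps on a scratch copy of the directory before pointing it at the real one.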
I confirm comment 23, but 'yum clean all' is not needed. Kernel 2.6.18-1.2257.fc5smp.
This is a mixture of so many different things over such a long range of time and versions of software, including problems induced by the FC5/6 kernel mmap bug (covered in bug 213963), that it's impossible to say anything at this point. Closing as WORKSFORME; please open new bugs if this is still encountered with current versions.