Red Hat Bugzilla – Bug 145021
kernel 2.6.10-1.1076_FC4smp - yum stuck on futex
Last modified: 2015-01-04 17:15:29 EST
Description of problem:
After running 'yum -c local.conf update', with a repository
on a local disk and a configuration file to reflect that, the yum
process got stuck. Nothing happened for at least ten minutes.
'ps' showed something like this:
root 5410 2.8 9.4 55668 48792 tty1 S+ 11:13 0:09
\_ /usr/bin/python /usr/bin/yum -y -c local.conf update
A check with 'strace' got this:
# strace -p 5410
Process 5410 attached - interrupt to quit
futex(0xb61df058, FUTEX_WAIT, 2, NULL
and nothing moved any further. 'sysrq-T' showed this:
yum S 00001B4B 1920 5410 4978
d9c4deb4 00000086 00000000 00001b4b 00001b52 00000010 00000020 00000000
00000000 c49abd78 c1406060 00000000 00000000 659c3d00 000f4553
df4d3560 df4d36b8 c0138483 00000000 fffffff5 7fffffff d9c4d000
If it is of any interest, I now have the full 'sysrq-T' output in my logs.
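For anyone collecting the same evidence, the strace symptom above can be checked mechanically. This is only a rough sketch: it assumes strace prints a blocked call without its closing parenthesis, as in the output quoted here, and the helper names `parse_blocked_futex` and `looks_hung` are made up for illustration.

```python
import re

# An unfinished futex call as strace prints it while the call is still
# blocked, e.g. "futex(0xb61df058, FUTEX_WAIT, 2, NULL" (no ")" yet).
FUTEX_WAIT_RE = re.compile(
    r"^futex\((0x[0-9a-fA-F]+), (FUTEX_WAIT\w*), (\d+), (\w+)\s*$"
)


def parse_blocked_futex(line):
    """Return (address, op, expected_value, timeout) or None if no match."""
    m = FUTEX_WAIT_RE.match(line.strip())
    if m is None:
        return None
    addr, op, val, timeout = m.groups()
    return int(addr, 16), op, int(val), timeout


def looks_hung(strace_lines):
    """True if the last non-empty line is a futex wait with a NULL timeout."""
    lines = [l for l in strace_lines if l.strip()]
    if not lines:
        return False
    parsed = parse_blocked_futex(lines[-1])
    return parsed is not None and parsed[3] == "NULL"
```

A NULL timeout means the wait has no deadline, so a process sitting on that line for minutes is waiting for a wake-up that may never come.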
Version-Release number of selected component (if applicable):
No idea. I rebooted with another kernel and the update went through.
I can confirm this too.
This time I got yum stuck on a futex on FC3. On the screen:
Transaction Test Succeeded
Updating: xpdf 100 % done 1/2
and nothing moves from there. An attached strace shows
Process 25232 attached - interrupt to quit
futex(0x2a9bf1b0dc, FUTEX_WAIT, 1, NULL
without moving further. 25232 is the process with the command line
/usr/bin/python /usr/bin/yum update
This also makes 'rpm -q ....' impossible for any package, and probably
other rpm operations as well. kernel-2.6.10-1.760_FC3 and
After I killed the stuck process (with -9; it did not react to anything
else) and removed the stale /var/lib/rpm/__db.00* files, which were around
a week old even though a number of updates had happened in the meantime,
the next yum run succeeded. No good explanation.
I tried various 'rpm' operations (query, test erase) before dumping
__db.00* and had no luck. Were it not for the detail that they were dated
February 2nd, I would think that they were related to the problem.
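To see at a glance how old those region files are before touching them, something like the following could be used. It is a minimal sketch: the function name `region_file_ages` is hypothetical, and the default path is simply the /var/lib/rpm location mentioned in this comment.

```python
import glob
import os
import time


def region_file_ages(rpm_dir="/var/lib/rpm", now=None):
    """Return [(path, age_in_days)] for __db.00* files, oldest first."""
    now = time.time() if now is None else now
    pattern = os.path.join(rpm_dir, "__db.00*")
    ages = [
        (path, (now - os.path.getmtime(path)) / 86400.0)
        for path in glob.glob(pattern)
    ]
    # Largest age first, so the stalest file is at the top of the list.
    return sorted(ages, key=lambda item: item[1], reverse=True)
```

Calling `region_file_ages()` as root and printing the result gives the same information as 'ls -l /var/lib/rpm/__db.00*', just with the age already computed.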
This is the same bug as I reported in bug 144589. I just hit this bug doing a
normal (non OS level) update of about 50 packages (hadn't updated in a while).
It hung on updating perl-suidperl, but I'm pretty sure it really doesn't matter
what package it is updating when it hangs, as I've seen it hang on all sorts of
packages that appear completely random.
On this latest hang:
#/tmp/strace -p 7685
Process 7685 attached - interrupt to quit
futex(0xa2e3e28, FUTEX_WAIT, 1, NULL
The box is FC3 2.6.10-1.766_FC3 and if the updates had succeeded it would have
been fully up to date.
Cleaning up after a mess like this is very difficult if you want to be careful
not to screw up your rpmdb and require a reinstall.
PS: it (almost?) always seems to freeze on the "completing updates" phase and
never on the installing updates phase -- can anyone else confirm?
Oh, and unlike bug 144589 comments 14 and 15, there were no strange /tmp mounts
at all when it hung this time. I really don't think the hang is package-dependent.
Okay, but I really don't think this hang is yum related either. It looks like
rpm is wedged.
*** Bug 144589 has been marked as a duplicate of this bug. ***
Created attachment 114981
yum hangs yet again
The bug just hit yet again, this time on a relatively simple (small) yum update
set. I checked /var/log/messages and there was nothing strange going on at the
I went into /var/cache/yum and manually ran rpm -U --force on the packages yum
was supposed to install, and it all went in perfectly. If the problem was an rpm
problem, how come rpm -U works where yum update fails, on the exact same file
Created attachment 115154
another futex hang on yet another machine
Another hang, on a different machine. This is becoming *really* common now; so
common that it is starting to hang more often than it succeeds, and I have to
spend an hour cleaning up the mess it leaves. Again, with this hang there were
no weird /tmp mounted fs's.
Again, this is definitely not a yum bug. You're looking at rpm. I'm going to
change the component to rpm.
Please attach the output of:
Created attachment 115182
rpmdb_stat -CA output
Since I have already kill -9'd yum and rpm -U --force'd all the packages in, I'm
not sure how much use this is to you. This output is from the most recent hung
machine (attachment 115154), I think (or at worst the 114981 one). If you need
this run while a yum is stuck on futex, I'll have to wait until the
hang happens again. At the rate I'm seeing them lately, that shouldn't be
Created attachment 117470
Output of rpmdb_stat -CA while yum update is hanging on Futex
I'm seeing this hang as well. I'm attaching the output of rpmdb_stat -CA while
my yum update is hanging on a Futex.
Ah, sorry. Further information on the hang. I had installed Fedora Core 3 base
onto a new box. I went to do a yum update to bring everything up to date about
half an hour ago. It started working on 469 update steps, and finished doing
the first phase of all of the updates/installs. It started doing the
'Completing update' phase, and got this far:
Completing update for mgetty - 237/469
Completing update for fonts-xorg-75dpi - 238/469
Completing update for fonts-xorg-100dpi - 239/469
Completing update for gstreamer - 240/469
Completing update for python - 241/469
Completing update for emacs - 242/469
Completing update for libstdc++-devel - 243/469
Completing update for libf2c - 244/469
Completing update for shadow-utils - 245/469
Completing update for gtk2-devel - 246/469
Completing update for libgal2 - 247/469
Completing update for dbus-x11 - 248/469
Completing update for libtiff - 249/469
Completing update for krb5-libs - 250/469
Completing update for libtool-libs - 251/469
Completing update for at - 252/469
Completing update for libgcc - 253/469
Completing update for texinfo - 254/469
Completing update for glibc - 255/469
Completing update for udev - 256/469
Completing update for gcc-c++ - 257/469
Completing update for tetex-latex - 258/469
Completing update for apr - 259/469
Completing update for pam - 260/469
Completing update for cpp - 261/469
Completing update for vim-common - 262/469
Completing update for vim-minimal - 263/469
Completing update for HelixPlayer - 264/469
Completing update for emacs-leim - 265/469
Completing update for libselinux-devel - 266/469
Completing update for logwatch - 267/469
Completing update for tcpdump - 268/469
Completing update for qt - 269/469
Completing update for prelink - 270/469
Completing update for binutils - 271/469
before hanging thusly:
[root@csdpc21 ~]# strace -p 4764
Process 4764 attached - interrupt to quit
futex(0xaf7e050, FUTEX_WAIT, 1, NULL
And process 4764 is the yum update process, of course.
[root@csdpc21 ~]# ps -ef|grep 4764
root 4764 4689 9 12:35 pts/1 00:03:36 /usr/bin/python /usr/bin/yum update
root 8730 8605 0 13:13 pts/2 00:00:00 grep 4764
As root, can you run
[root@csdpc21 rpm]# /sbin/fuser /var/lib/rpm/__db.00*
Looks like only the yum update process itself.
Let me know if there are any further investigations I can productively carry out
for y'all while the yum update continues to hang, otherwise I'll probably try to
start recovery on that box in another hour or so.
Could you create a backup of /var/lib/rpm before you perform any recovery.
Sure. I assume you want to look at it? It's 16 megs, I don't know whether that's
too big for bugzilla..
This bug hits me on a regular basis. If you guys provide a set of things to
do/capture before killing/cleaning I can make sure I do it next time. Otherwise
I'll just do the things mentioned so far in the above comments.
(And thanks for helping out, Jon.)
Judging from the number of locks that are reported open, this looks as much like
yum violating the rules for using rpmdb iterators as anything else. Verification
would require mapping the locks back to open db iterators and checking whether two
iterators on the same index are open simultaneously -- that's a programming error.
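As a loose analogy only, the sketch below shows why that kind of error ends in a wait with no wake-up. It is plain Python, not rpm or Berkeley DB code, and it assumes (without confirmation from this report) that the two iterators end up contending for a lock the current holder cannot re-enter. The helper `second_acquire_succeeds` is hypothetical.

```python
import threading


def second_acquire_succeeds(lock, timeout=0.1):
    """Acquire `lock` twice from one thread; report whether the 2nd try works."""
    lock.acquire()
    try:
        # With a non-reentrant lock the holder is now waiting on itself.
        # The timeout is only here so the demonstration returns; without
        # it this call would block forever.
        got_it = lock.acquire(timeout=timeout)
        if got_it:
            lock.release()
        return got_it
    finally:
        lock.release()
```

With `threading.Lock()` the second acquire times out, while `threading.RLock()` lets the same thread back in. A holder blocking on its own lock with no timeout would look, from the outside, much like the FUTEX_WAIT with a NULL timeout seen in the strace output.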
Just got this on FC6 with yum 3.0.1. Strace gives:
futex(0xb677ec10, FUTEX_WAIT, 2, NULL
While yum did:
Loading "installonlyn" plugin
Loading "fastestmirror" plugin
Setting up Update Process
Setting up repositories
core 100% |=========================| 951 B 00:00
updates 100% |=========================| 1.2 kB 00:00
extras 100% |=========================| 1.1 kB 00:00
Determining fastest mirrors
Reading repository metadata in from local files
primary.xml.gz 100% |=========================| 807 kB 00:00
primary.xml.gz 100% |=========================| 363 kB 00:00
primary.xml.gz 100% |=========================| 1.6 MB 00:10
Excluding Packages in global exclude list
rpm --rebuilddb hangs too:
futex(0xb7da5c10, FUTEX_WAIT, 2, NULL
After removing /var/lib/rpm/__db* files, running rpm --rebuilddb again and then
yum clean all (to be sure, to be sure), the problem has gone away. So, this
looks like Berkeley DB stuck in one of its infamous hung states.
I confirm comment 23, but yum clean all is not needed.
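The recovery steps reported in this thread (back up /var/lib/rpm first, as requested earlier, then remove the __db* files) could be scripted roughly as follows. This is a sketch under assumptions, not a supported procedure: the function name `backup_and_clear_regions` and the choice of backup location are hypothetical, and the stuck process is assumed to have been killed already.

```python
import glob
import os
import shutil


def backup_and_clear_regions(rpm_dir, backup_dir):
    """Copy rpm_dir to backup_dir, then delete the __db* files in rpm_dir.

    Returns the sorted base names of the files that were removed.
    """
    # copytree refuses to overwrite, so an existing backup is never clobbered.
    shutil.copytree(rpm_dir, backup_dir)
    removed = []
    for path in glob.glob(os.path.join(rpm_dir, "__db*")):
        if os.path.isfile(path):
            os.remove(path)
            removed.append(os.path.basename(path))
    return sorted(removed)
```

For example, `backup_and_clear_regions("/var/lib/rpm", "/root/rpm-backup")` run as root, followed by 'rpm --rebuilddb' as described in comment 23. The backup path here is just an example.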
This is a mixture of so many different things over such a long range of time and
versions of software, including things induced by the FC5/6 kernel mmap bug
(covered in bug 213963), that it's impossible to say anything at this point.
Closing as WORKSFORME, please open new bugs if still encountered with current