Bug 145021

Summary: kernel 2.6.10-1.1076_FC4smp - yum stuck on futex
Product: [Fedora] Fedora
Reporter: Michal Jaegermann <michal>
Component: rpm
Assignee: Dave Jones <davej>
Status: CLOSED WORKSFORME
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: rawhide
CC: bojan, jonabbey, nobody+pnasrat, pfrields, trevor, wtogami
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2007-08-10 09:44:53 UTC
Attachments:
Description                                                      Flags
yum hangs yet again                                              none
another futex hang on yet another machine                        none
rpmdb_stat -CA output                                            none
Output of rpmdb_stat -CA while yum update is hanging on Futex    none

Description Michal Jaegermann 2005-01-13 20:06:29 UTC
Description of problem:

After an attempt to run 'yum -c local.conf update', with a repository
on a local disk and a configuration file to reflect that, the yum
process got stuck.  Nothing happened for at least ten minutes.
'ps' showed something like this:

root      5410  2.8  9.4  55668 48792 tty1     S+   11:13   0:09
   \_ /usr/bin/python /usr/bin/yum -y -c local.conf update

A check with 'strace' got this:

# strace -p 5410
Process 5410 attached - interrupt to quit
futex(0xb61df058, FUTEX_WAIT, 2, NULL

and nothing moved any further.  With 'sysrq-T', this showed
up:

yum           S 00001B4B  1920  5410   4978
d9c4deb4 00000086 00000000 00001b4b 00001b52 00000010 00000020 00000000
       00000000 c49abd78 c1406060 00000000 00000000 659c3d00 000f4553 c0315ba0
       df4d3560 df4d36b8 c0138483 00000000 fffffff5 7fffffff d9c4d000 b61df058
Call Trace:
 [<c0138483>] file_read_actor+0x0/0xf1
 [<c02c6562>] schedule_timeout+0x13/0xae
 [<c01485f6>] find_extend_vma+0x12/0x4f
 [<c012e28a>] add_wait_queue+0x12/0x30
 [<c012ed80>] futex_wait+0xd9/0x13a
 [<c011a2c4>] default_wake_function+0x0/0xc
 [<c011a2c4>] default_wake_function+0x0/0xc
 [<c012f02a>] do_futex+0x29/0x5a
 [<c012f118>] sys_futex+0xbd/0xcc
 [<c0152fd1>] sys_pread64+0x43/0x59
 [<c0103cdb>] syscall_call+0x7/0xb

If it is of any interest, I now have the full 'sysrq-T' output in the logs.

Version-Release number of selected component (if applicable):
kernel 2.6.10-1.1076_FC4smp 

How reproducible:
No idea.  I rebooted with another kernel and the update went through.

Comment 1 Sammy 2005-01-13 21:00:25 UTC
I can confirm this too. 

Comment 2 Michal Jaegermann 2005-02-09 21:51:29 UTC
This time I got yum stuck on a futex in FC3.  On screen:

.....
Transaction Test Succeeded
Running Transaction
Updating: xpdf 100 % done 1/2 

and nothing moves on from here.  Attaching strace shows

Process 25232 attached - interrupt to quit
futex(0x2a9bf1b0dc, FUTEX_WAIT, 1, NULL

without moving further. 25232 is a process with a command line

  /usr/bin/python /usr/bin/yum update

This also makes 'rpm -q ....' impossible for any package, and probably
other rpm operations as well.  kernel-2.6.10-1.760_FC3 and
yum-2.1.13-0.fc3.

Comment 3 Michal Jaegermann 2005-02-09 22:08:44 UTC
After I killed the stuck process (with -9; it did not react otherwise)
and removed the stale /var/lib/rpm/__db.00* files, which were about
a week old even though a number of updates had happened in the meantime,
the next yum run succeeded.  No good explanation.

I tried various 'rpm' operations (query, test erase) before removing
__db.00*, with no luck.  If it were not for the fact that they were dated
February 2nd, I would think that they were related to the problem.
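Comment 3 turns on how old the __db.00* files were compared with the last successful update. Below is a minimal Python sketch for checking that before deleting anything. It assumes the default /var/lib/rpm location, and db_env_file_ages is a made-up helper name, not part of rpm or yum:

```python
import glob
import os
import time


def db_env_file_ages(rpm_dir="/var/lib/rpm"):
    """Return (path, age_in_days) pairs for the Berkeley DB environment
    files (__db.00*) in rpm_dir, oldest first. Hypothetical helper."""
    now = time.time()
    ages = []
    for path in glob.glob(os.path.join(rpm_dir, "__db.00*")):
        # mtime is the date that 'ls -l' shows for the file.
        age_days = (now - os.path.getmtime(path)) / 86400.0
        ages.append((path, age_days))
    # Largest age (oldest file) first.
    ages.sort(key=lambda item: item[1], reverse=True)
    return ages


if __name__ == "__main__":
    for path, age in db_env_file_ages():
        print(f"{path}: last modified {age:.1f} days ago")
```

Any region file that is days older than the most recent completed transaction is a candidate for being stale, as in this comment.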

Comment 4 Trevor Cordes 2005-05-12 10:32:52 UTC
This is the same bug as I reported in bug 144589.  I just hit this bug doing a
normal (non OS level) update of about 50 packages (hadn't updated in a while).
It hung on updating perl-suidperl, but I'm pretty sure it doesn't matter
which package it is updating when it hangs, as I've seen it hang on all sorts of
packages that appear completely random.

On this latest hang:
#/tmp/strace -p 7685
Process 7685 attached - interrupt to quit
futex(0xa2e3e28, FUTEX_WAIT, 1, NULL

The box is FC3 2.6.10-1.766_FC3 and if the updates had succeeded it would have
been fully up to date.

Cleaning up after a mess like this is very difficult if you want to be careful
not to screw up your rpmdb and end up needing a reinstall.

PS: it (almost?) always seems to freeze on the "completing updates" phase and
never on the installing updates phase -- can anyone else confirm?


Comment 5 Trevor Cordes 2005-05-12 10:41:12 UTC
Oh, and unlike bug 144589 comments 14 and 15, there were no strange /tmp mounts
at all when it hung this time.  I really don't think the hang is package-dependent.


Comment 6 Seth Vidal 2005-05-12 12:29:32 UTC
Okay, but I really don't think this hang is yum related either. It looks like
rpm is wedged.

Comment 7 Matthew Miller 2005-05-18 15:08:11 UTC
*** Bug 144589 has been marked as a duplicate of this bug. ***

Comment 8 Trevor Cordes 2005-05-31 03:33:40 UTC
Created attachment 114981
yum hangs yet again

The bug just hit yet again, this time on a relatively simple (small) yum update
set.  I checked /v/l/messages and there was nothing strange going on at the
time.

I went into /var/cache/yum and manually ran rpm -U --force on the packages yum was
supposed to install, and it all went in perfectly.  If the problem was an rpm
problem, how come rpm -U works where yum update fails, on the exact same file
set?

Comment 9 Trevor Cordes 2005-06-05 17:06:37 UTC
Created attachment 115154
another futex hang on yet another machine

Another hang, on a different machine.  This is becoming *really* common now; so
common that it is starting to hang more than it succeeds and I have to spend an
hour cleaning up the mess it leaves.  Again, with this hang there were no weird
/tmp mounted fs's.

Comment 10 Seth Vidal 2005-06-05 17:10:35 UTC
Again, this is definitely not a yum bug. You're looking at rpm. I'm going to
change the component to rpm.

Comment 11 Paul Nasrat 2005-06-07 11:19:20 UTC
Please attach the output of:

cd /var/lib/rpm
/usr/lib/rpm/rpmdb_stat -CA

Comment 12 Trevor Cordes 2005-06-07 12:09:37 UTC
Created attachment 115182
rpmdb_stat -CA output

Since I have already kill -9'd yum and rpm -U --force'd all the packages in, I'm
not sure how much use this is to you.  This output is from the most recently hung
machine (attachment 115154), I think (or at worst the 114981 one).  If you need
this run while a yum is stuck on a futex, I'll have to wait until the
hang happens again.  At the rate I'm seeing them lately, that shouldn't be
too long.

Comment 13 Jonathan Abbey 2005-08-04 18:09:43 UTC
Created attachment 117470
Output of rpmdb_stat -CA while yum update is hanging on Futex

I'm seeing this hang as well.  I'm attaching the output of rpmdb_stat -CA taken while
my yum update is hanging on a futex.

Comment 14 Jonathan Abbey 2005-08-04 18:13:22 UTC
Ah, sorry.  Further information on the hang.  I had installed Fedora Core 3 base
onto a new box.  About half an hour ago I started a yum update to bring
everything up to date.  It started working on 469 update steps, finished the
first phase of all of the updates/installs, then started the
'Completing update' phase and got this far:

Completing update for mgetty  - 237/469
Completing update for fonts-xorg-75dpi  - 238/469
Completing update for fonts-xorg-100dpi  - 239/469
Completing update for gstreamer  - 240/469
Completing update for python  - 241/469
Completing update for emacs  - 242/469
Completing update for libstdc++-devel  - 243/469
Completing update for libf2c  - 244/469
Completing update for shadow-utils  - 245/469
Completing update for gtk2-devel  - 246/469
Completing update for libgal2  - 247/469
Completing update for dbus-x11  - 248/469
Completing update for libtiff  - 249/469
Completing update for krb5-libs  - 250/469
Completing update for libtool-libs  - 251/469
Completing update for at  - 252/469
Completing update for libgcc  - 253/469
Completing update for texinfo  - 254/469
Completing update for glibc  - 255/469
Completing update for udev  - 256/469
Completing update for gcc-c++  - 257/469
Completing update for tetex-latex  - 258/469
Completing update for apr  - 259/469
Completing update for pam  - 260/469
Completing update for cpp  - 261/469
Completing update for vim-common  - 262/469
Completing update for vim-minimal  - 263/469
Completing update for HelixPlayer  - 264/469
Completing update for emacs-leim  - 265/469
Completing update for libselinux-devel  - 266/469
Completing update for logwatch  - 267/469
Completing update for tcpdump  - 268/469
Completing update for qt  - 269/469
Completing update for prelink  - 270/469
Completing update for binutils  - 271/469

before hanging thusly:

[root@csdpc21 ~]# strace -p 4764
Process 4764 attached - interrupt to quit
futex(0xaf7e050, FUTEX_WAIT, 1, NULL


Comment 15 Jonathan Abbey 2005-08-04 18:14:10 UTC
And process 4764 is the yum update process, of course:

[root@csdpc21 ~]# ps -ef|grep 4764
root      4764  4689  9 12:35 pts/1    00:03:36 /usr/bin/python /usr/bin/yum update
root      8730  8605  0 13:13 pts/2    00:00:00 grep 4764
[root@csdpc21 ~]#


Comment 16 Paul Nasrat 2005-08-04 18:20:34 UTC
As root, can you run:

/sbin/fuser /var/lib/rpm/__db.00*


Comment 17 Jonathan Abbey 2005-08-04 18:30:26 UTC
[root@csdpc21 rpm]# /sbin/fuser /var/lib/rpm/__db.00*
/var/lib/rpm/__db.001:  4764m
/var/lib/rpm/__db.002:  4764m
/var/lib/rpm/__db.003:  4764m

Looks like only the yum update process itself.
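The "m" after each PID in the fuser output means that the process has the file memory-mapped. The following is a rough Python equivalent, assuming a Linux /proc filesystem, that scans every process's /proc/<pid>/maps. pids_mapping is a hypothetical helper, not part of any rpm or yum tool:

```python
import glob
import os


def pids_mapping(path):
    """Return sorted PIDs whose /proc/<pid>/maps lists `path`, roughly
    what 'fuser' marks with 'm' (memory-mapped). Linux only."""
    target = os.path.realpath(path)
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip /proc/self, /proc/meminfo, etc.
        try:
            with open(f"/proc/{entry}/maps", errors="replace") as maps:
                # The mapped file's pathname is the last field of a line.
                if any(line.rstrip("\n").endswith(" " + target) for line in maps):
                    pids.append(int(entry))
        except OSError:
            continue  # process exited, or its maps are not readable to us
    return sorted(pids)


if __name__ == "__main__":
    for db_file in sorted(glob.glob("/var/lib/rpm/__db.00*")):
        print(f"{db_file}: {pids_mapping(db_file)}")
```

Run as root, this would list only 4764 for each region file in the situation above, which means that no second process is holding the database.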

Comment 18 Jonathan Abbey 2005-08-04 19:03:23 UTC
Let me know if there are any further investigations I can productively carry out
for y'all while the yum update continues to hang, otherwise I'll probably try to
start recovery on that box in another hour or so.

Comment 19 Paul Nasrat 2005-08-04 19:27:47 UTC
Could you create a backup of /var/lib/rpm before you perform any recovery?

Comment 20 Jonathan Abbey 2005-08-04 19:39:56 UTC
Sure.  I assume you want to look at it?  It's 16 MB, and I don't know whether that's
too big for Bugzilla.

ftp://ftp.arlut.utexas.edu/pub/redhat_debug/varlibrpm.tar.bz2



Comment 21 Trevor Cordes 2005-08-07 03:45:51 UTC
This bug hits me on a regular basis.  If you guys provide a set of things to
do/capture before killing/cleaning I can make sure I do it next time.  Otherwise
I'll just do the things mentioned so far in the above comments.

(And thanks for helping out, Jon.)

Comment 22 Jeff Johnson 2005-08-24 18:33:57 UTC
Judging from the number of locks that are reported open, this looks as much like yum
violating the rules for using rpmdb iterators as anything else. Verification would
require mapping the locks back to open db iterators and checking whether two
iterators on the same index are open simultaneously; that's a programming error.
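The rule in this comment says that at most one iterator may be open per index at a time. Below is a toy Python model of that rule, written only for illustration. IteratorTracker and OverlappingIteratorError are invented names, not rpm or yum APIs. In the real code, a violation would presumably block on a database lock, as in the hangs above. The sketch raises an error instead so that the violation is visible:

```python
import contextlib


class OverlappingIteratorError(RuntimeError):
    """Raised when a second iterator is opened on a busy index."""


class IteratorTracker:
    """Hypothetical bookkeeping: one open iterator per index at a time."""

    def __init__(self):
        self._open = set()  # names of indexes that have an open iterator

    @contextlib.contextmanager
    def iterate(self, index, rows):
        if index in self._open:
            raise OverlappingIteratorError(
                f"second iterator opened on index {index!r}")
        self._open.add(index)
        try:
            yield iter(rows)
        finally:
            # Closing the iterator frees the index for the next user.
            self._open.discard(index)
```

Opening iterators on the same index one after another is fine, and iterators on different indexes may overlap. Only nesting two iterators on the same index trips the check.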

Comment 23 Bojan Smojver 2006-12-22 21:47:29 UTC
Just got this on FC6 with yum 3.0.1. Strace gives:

---------------------------------
futex(0xb677ec10, FUTEX_WAIT, 2, NULL
---------------------------------

While yum did:

---------------------------------
Loading "installonlyn" plugin
Loading "fastestmirror" plugin
Setting up Update Process
Setting up repositories
core                      100% |=========================|  951 B    00:00     
updates                   100% |=========================| 1.2 kB    00:00     
extras                    100% |=========================| 1.1 kB    00:00     
Determining fastest mirrors
Reading repository metadata in from local files
primary.xml.gz            100% |=========================| 807 kB    00:00     
################################################## 2242/2242
primary.xml.gz            100% |=========================| 363 kB    00:00     
################################################## 1126/1126
primary.xml.gz            100% |=========================| 1.6 MB    00:10     
################################################## 5133/5133
Excluding Packages in global exclude list
Finished
---------------------------------

rpm --rebuilddb hangs too:

---------------------------------
futex(0xb7da5c10, FUTEX_WAIT, 2, NULL
---------------------------------

After removing /var/lib/rpm/__db* files, running rpm --rebuilddb again and then
yum clean all (to be sure, to be sure), the problem has gone away. So, this
looks like Berkeley DB stuck in one of its infamous hung states.
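The recovery in this comment, preceded by the backup of /var/lib/rpm requested in comment 19, can be sketched in Python as follows. This is a sketch under several assumptions: reset_db_environment is a hypothetical helper, it must only run after the stuck yum/rpm process has been killed, and 'rpm --rebuilddb' is still run by hand afterwards:

```python
import glob
import os
import shutil


def reset_db_environment(rpm_dir, backup_dir):
    """Copy rpm_dir to backup_dir, then delete the Berkeley DB environment
    (region) files __db* so that a wedged lock state is not reused.
    Returns the removed file names. Hypothetical helper; run it only when
    no rpm/yum process is alive."""
    shutil.copytree(rpm_dir, backup_dir)  # raises if backup_dir exists
    removed = []
    for path in sorted(glob.glob(os.path.join(rpm_dir, "__db*"))):
        os.remove(path)
        removed.append(os.path.basename(path))
    return removed


if __name__ == "__main__":
    print(reset_db_environment("/var/lib/rpm", "/var/lib/rpm.bak"))
    print("now run: rpm --rebuilddb")
```

The package data files (Packages and the index files) are left in place. Only the environment files go, and the backup keeps a copy of everything in case the rebuild goes wrong.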

Comment 24 Timon 2007-04-03 12:45:57 UTC
I can confirm comment 23, but 'yum clean all' is not needed.
kernel 2.6.18-1.2257.fc5smp


Comment 25 Panu Matilainen 2007-08-10 09:44:53 UTC
This is a mixture of so many different things over such a long range of time and
versions of software, including problems induced by the FC5/6 kernel mmap bug
(covered in bug 213963), that it's impossible to say anything at this point.

Closing as WORKSFORME, please open new bugs if still encountered with current
versions.