Bug 825945

Summary: rm -rf /some/directory hangs
Product: Red Hat Enterprise Linux 6
Reporter: miroslav.kubiczek
Component: sudo
Assignee: Daniel Kopeček <dkopecek>
Status: CLOSED DUPLICATE
QA Contact: BaseOS QE Security Team <qe-baseos-security>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 6.2
CC: dkopecek, meyering, prc
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-05-30 15:22:30 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description miroslav.kubiczek 2012-05-29 07:32:29 UTC
Description of problem:

We run many tests simultaneously which, among other things, do rm -rf /some/directory. Some of these rm calls hang forever for some reason:

root     25466 31620  0 08:13 ?        00:00:00 sudo rm -fr /data/cecil/RS-5.0.0.2N/archiveRoot/data/retention_540
root     25467 18990  0 08:13 ?        00:00:00 sudo rm -fr /data/rainstor/archiveRoot/data/retention_540
root     25468 25178  0 08:13 ?        00:00:00 sudo rm -fr /data/rainstor/archiveRoot/data/retention_540
root     25469 19833  0 08:13 ?        00:00:00 sudo rm -fr /data/rainstor/archiveRoot/data/retention_540
root     25470 28281  0 08:13 ?        00:00:00 sudo rm -fr /data/rainstor/archiveRoot/data/retention_540
root     25471 13059  0 08:13 ?        00:00:00 sudo rm -fr /data/cecil/RS-5.0.0.2N/archiveRoot/data/retention_540
root     25478 27652  0 08:13 ?        00:00:00 sudo rm -fr /data/cecil/RS-5.0.0.2N/archiveRoot/data/retention_540
root     25479 27209  0 08:13 ?        00:00:00 sudo rm -fr /data/cecil/RS-5.0.0.2N/archiveRoot/data/retention_540

Some of the directories don't even exist:

ls /data/cecil
ls: cannot access /data/cecil: No such file or directory


sudo pstack 25479
#0  0x00007fdfacd992d3 in __select_nocancel () from /lib64/libc.so.6
#1  0x00007fdfadf25e83 in ?? ()
#2  0x00007fdfadf2ae46 in ?? ()
#3  0x00007fdfadf2c7f6 in main ()


sudo /usr/bin/strace -p 25479
Process 25479 attached - interrupt to quit
select(9, [8], [], NULL, NULL


Version-Release number of selected component (if applicable):
coreutils-8.4-16.el6.x86_64

How reproducible:
Don't know, it appears only in our intensive/complex automated testing.


Steps to Reproduce:
1.
2.
3.
  
Actual results:
The call to rm hangs.


Expected results:
Calling rm should finish as soon as the dir is removed.

Additional info:

Comment 2 Ondrej Vasik 2012-05-29 07:59:51 UTC
Is there some special filesystem/storage configuration? How often does the rm hang? Always or just once? How many simultaneous tests were running on the machine?

Comment 3 Jim Meyering 2012-05-29 08:03:26 UTC
Thanks for the report.
Can you tell us more about the directories you are removing?
Are they on two different NFS-mounted partitions?
cecil and rainstor?

If NFS, what are the servers running?
Can you reproduce it without using NFS?

What do the directory structures look like?  E.g.,
how many files/directories were there before the rm commands?
Do any files or directories remain when rm has hung?
How many rm processes are hung?  All 8 listed above?

If you still have access, please attach with gdb --pid=PID to determine
which part of the sources is resolving to that use of select.  Neither
rm.c nor any code it uses (mostly gnulib) calls select directly.

Comment 4 miroslav.kubiczek 2012-05-29 08:12:30 UTC
/data is ext4:

mount
/dev/mapper/VolGroup-lv_root on / type ext4 (rw)
...

Tests were run from 6 (at most) parallel sessions.

"How often does the rm hang? Always or just once?"
I'll wait until tomorrow morning and report the status (number of hung rm's). All of them have now been killed with kill -9.

Comment 5 miroslav.kubiczek 2012-05-29 09:34:46 UTC
Just after my last report. New status:


root     25466 31620  0 08:13 ?        00:00:00 sudo rm -fr /data/cecil/RainStor-5.0.0.2N/archiveRoot/data/retention_540
root     25467 18990  0 08:13 ?        00:00:00 sudo rm -fr /data/rainstor/archiveRoot/data/retention_540
root     25468 25178  0 08:13 ?        00:00:00 sudo rm -fr /data/rainstor/archiveRoot/data/retention_540
root     25469 19833  0 08:13 ?        00:00:00 sudo rm -fr /data/rainstor/archiveRoot/data/retention_540
root     25470 28281  0 08:13 ?        00:00:00 sudo rm -fr /data/rainstor/archiveRoot/data/retention_540
root     25471 13059  0 08:13 ?        00:00:00 sudo rm -fr /data/cecil/RainStor-5.0.0.2N/archiveRoot/data/retention_540
root     25472 25466  0 08:13 ?        00:00:00 [rm] <defunct>
root     25473 25467  0 08:13 ?        00:00:00 [rm] <defunct>
root     25474 25469  0 08:13 ?        00:00:00 [rm] <defunct>
root     25475 25468  0 08:13 ?        00:00:00 [rm] <defunct>
root     25476 25471  0 08:13 ?        00:00:00 [rm] <defunct>
root     25477 25470  0 08:13 ?        00:00:00 [rm] <defunct>
root     25478 27652  0 08:13 ?        00:00:00 sudo rm -fr /data/cecil/RainStor-5.0.0.2N/archiveRoot/data/retention_540
root     25479 27209  0 08:13 ?        00:00:00 sudo rm -fr /data/cecil/RainStor-5.0.0.2N/archiveRoot/data/retention_540
root     25480 25479  0 08:13 ?        00:00:00 [rm] <defunct>
root     25481 25478  0 08:13 ?        00:00:00 [rm] <defunct>






$ sudo gdb --pid=25478
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-50.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Attaching to process 25478
Reading symbols from /usr/bin/sudo...(no debugging symbols found)...done.
Reading symbols from /lib64/libaudit.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libaudit.so.1
Reading symbols from /lib64/libselinux.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libselinux.so.1
Reading symbols from /lib64/libutil.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libutil.so.1
Reading symbols from /lib64/libpam.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib64/libpam.so.0
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /usr/lib64/libldap-2.4.so.2...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libldap-2.4.so.2
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /usr/lib64/liblber-2.4.so.2...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/liblber-2.4.so.2
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libcrypt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /lib64/libresolv.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libresolv.so.2
Reading symbols from /usr/lib64/libssl3.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libssl3.so
Reading symbols from /usr/lib64/libsmime3.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libsmime3.so
Reading symbols from /usr/lib64/libnss3.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libnss3.so
Reading symbols from /usr/lib64/libnssutil3.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libnssutil3.so
Reading symbols from /usr/lib64/libplds4.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libplds4.so
Reading symbols from /usr/lib64/libplc4.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libplc4.so
Reading symbols from /usr/lib64/libnspr4.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libnspr4.so
Reading symbols from /usr/lib64/libsasl2.so.2...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libsasl2.so.2
Reading symbols from /usr/lib64/libfreebl3.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libfreebl3.so
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libz.so.1
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_files.so.2
0x00007f7d5b9b72d3 in __select_nocancel () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install sudo-1.7.4p5-7.el6.x86_64
(gdb) where
#0  0x00007f7d5b9b72d3 in __select_nocancel () from /lib64/libc.so.6
#1  0x00007f7d5cb43e83 in ?? ()
#2  0x00007f7d5cb48e46 in ?? ()
#3  0x00007f7d5cb4a7f6 in main ()

Comment 6 Jim Meyering 2012-05-29 09:40:10 UTC
Thanks.  That shows that all rm processes have terminated.
It's the sudo ones that are hung.
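
The parent/child relationship can be read directly from the process listing in comment 5. As a small illustration (not part of the original report), the Python sketch below parses that listing and pairs every `[rm] <defunct>` entry with the sudo process whose PID matches its PPID. The helper names `parse_ps` and `unreaped_children` are made up for this example, and the column layout (UID PID PPID C STIME TTY TIME CMD) is assumed from the pasted output.

```python
from collections import namedtuple

Proc = namedtuple("Proc", "pid ppid cmd")

# Process listing copied from comment 5 of this report.
PS_OUTPUT = """\
root     25466 31620  0 08:13 ?        00:00:00 sudo rm -fr /data/cecil/RainStor-5.0.0.2N/archiveRoot/data/retention_540
root     25467 18990  0 08:13 ?        00:00:00 sudo rm -fr /data/rainstor/archiveRoot/data/retention_540
root     25468 25178  0 08:13 ?        00:00:00 sudo rm -fr /data/rainstor/archiveRoot/data/retention_540
root     25469 19833  0 08:13 ?        00:00:00 sudo rm -fr /data/rainstor/archiveRoot/data/retention_540
root     25470 28281  0 08:13 ?        00:00:00 sudo rm -fr /data/rainstor/archiveRoot/data/retention_540
root     25471 13059  0 08:13 ?        00:00:00 sudo rm -fr /data/cecil/RainStor-5.0.0.2N/archiveRoot/data/retention_540
root     25472 25466  0 08:13 ?        00:00:00 [rm] <defunct>
root     25473 25467  0 08:13 ?        00:00:00 [rm] <defunct>
root     25474 25469  0 08:13 ?        00:00:00 [rm] <defunct>
root     25475 25468  0 08:13 ?        00:00:00 [rm] <defunct>
root     25476 25471  0 08:13 ?        00:00:00 [rm] <defunct>
root     25477 25470  0 08:13 ?        00:00:00 [rm] <defunct>
root     25478 27652  0 08:13 ?        00:00:00 sudo rm -fr /data/cecil/RainStor-5.0.0.2N/archiveRoot/data/retention_540
root     25479 27209  0 08:13 ?        00:00:00 sudo rm -fr /data/cecil/RainStor-5.0.0.2N/archiveRoot/data/retention_540
root     25480 25479  0 08:13 ?        00:00:00 [rm] <defunct>
root     25481 25478  0 08:13 ?        00:00:00 [rm] <defunct>
"""


def parse_ps(text):
    """Parse lines laid out as: UID PID PPID C STIME TTY TIME CMD."""
    procs = []
    for line in text.splitlines():
        fields = line.split(None, 7)  # keep the whole command as one field
        if len(fields) < 8:
            continue  # skip blank or truncated lines
        procs.append(Proc(int(fields[1]), int(fields[2]), fields[7]))
    return procs


def unreaped_children(procs):
    """Return (parent, child) pairs where the child is defunct and its parent is listed."""
    by_pid = {p.pid: p for p in procs}
    return [
        (by_pid[p.ppid], p)
        for p in procs
        if p.cmd.endswith("<defunct>") and p.ppid in by_pid
    ]


pairs = unreaped_children(parse_ps(PS_OUTPUT))
for parent, child in pairs:
    print(f"pid {parent.pid} ({parent.cmd.split()[0]}) has not reaped rm pid {child.pid}")
```

Every defunct rm in the listing has one of the sudo processes as its parent, which matches the reading above: rm has already exited, and sudo has not yet collected its exit status.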

Comment 7 Ondrej Vasik 2012-05-29 09:58:00 UTC
Adding the sudo maintainer to CC, as it looks like something might be wrong in the sudo command.

Comment 8 Daniel Kopeček 2012-05-29 10:02:01 UTC
Looks like rhbz#769701

Comment 9 Ondrej Vasik 2012-05-29 10:27:02 UTC
It looks like it ... reporter, could you please check your version of sudo? If it is not sudo-1.7.4p5-6.el6_1 from http://rhn.redhat.com/errata/RHBA-2012-0513.html, then please update to it and try to reproduce again. This really looks like a duplicate of the bug mentioned in comment #8.

Comment 10 miroslav.kubiczek 2012-05-29 10:29:29 UTC
I have this version:
sudo-1.7.4p5-7.el6.x86_64

will try update, thanks...

Comment 11 Ondrej Vasik 2012-05-29 10:53:19 UTC
It will complain that you are updating to an older version. If you want to avoid that, the SIGCHLD issue is reported to be fixed in 1.7.4p5-8.el6 and newer as well.
So you can either use the package from the z-stream and downgrade to it (you will lose the fix for rhbz#709235), or you can use a newer version from the y-stream, which is still from the beta branches until 6.3 GA.
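
For readers unfamiliar with this class of problem: the strace output in the description (a select() on a single descriptor with no timeout) fits a parent that waits to be notified of child exit through a file descriptor. The Python sketch below shows the generic "self-pipe" pattern for that. It is only an illustration under that assumption and is not sudo's code; the function name `wait_for_child_via_pipe` is hypothetical. If the SIGCHLD notification never reaches the pipe, a select() without a timeout would block forever while the exited child stays defunct, which is why the sketch uses a timeout.

```python
import os
import select
import signal


def wait_for_child_via_pipe(timeout=5.0):
    """Fork a child and learn of its exit via SIGCHLD plus a pipe watched by select().

    Returns the child's exit status, or None if no notification arrived in time.
    """
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # the handler must never block on a full pipe

    def on_sigchld(signum, frame):
        # Turn the asynchronous signal into a readable byte on the pipe.
        os.write(write_fd, b"x")

    # Install the handler BEFORE forking, so that a child which exits
    # immediately cannot slip past the notification mechanism.
    old_handler = signal.signal(signal.SIGCHLD, on_sigchld)
    try:
        child = os.fork()
        if child == 0:
            os._exit(0)  # child: exit at once, like a very short rm

        # Parent: wait until the pipe becomes readable. With timeout=None this
        # call would block indefinitely if the notification were lost.
        ready, _, _ = select.select([read_fd], [], [], timeout)
        if not ready:
            return None  # notification never arrived; child remains unreaped

        os.read(read_fd, 1)                # drain the notification byte
        _, status = os.waitpid(child, 0)   # reap the child so it is not left defunct
        return os.WEXITSTATUS(status)
    finally:
        signal.signal(signal.SIGCHLD, old_handler)
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print("child exit status:", wait_for_child_via_pipe())
```

The ordering is the important design choice here: the handler and the pipe exist before the child is created, and the parent only calls waitpid() after select() reports the pipe as readable.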

Comment 12 Ondrej Vasik 2012-05-30 15:22:30 UTC
Marking as a duplicate. Feel free to reopen if you experience the issue with one of the sudo versions that contain the fix for #769701, and/or without sudo (just a plain rm command).

*** This bug has been marked as a duplicate of bug 769701 ***