Bug 180029 - deadlocks on ext2,sync mounted fs
Summary: deadlocks on ext2,sync mounted fs
Keywords:
Status: CLOSED DUPLICATE of bug 180028
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Kernel Maintainer List
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2006-02-04 21:52 UTC by Jure Pečar
Modified: 2007-11-30 22:07 UTC
0 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-02-07 16:59:04 UTC
Target Upstream Version:
Embargoed:


Attachments

Description Jure Pečar 2006-02-04 21:52:22 UTC
From Bugzilla Helper:
User-Agent: Opera/8.5 (X11; Linux i686; U; en)

Description of problem:
I'm used to having my /boot on md RAID1, formatted as ext2 and mounted
sync. This works well on many different servers, but I'm noticing some problems
with the latest RHEL4.

This problem has hit me twice so far, both times during rpm kernel installation and
removal (of course, nothing else ever does I/O to /boot). The first time rpm was
running the post-install grub update script; the second time it was removing the initrd
of the previous kernel. The process doing the I/O to /boot gets stuck in D state
(uninterruptible sleep) and never recovers. After a hard reset, the file it was stuck on is
*gone*. At least fsck could pick it up and drop it into lost+found ...

Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-11.EL

How reproducible:
Sometimes

Steps to Reproduce:
1. Put /boot on md RAID1, formatted as ext2 and mounted sync.
2. Do some rpm kernel installs and removals.
3. Eventually it will deadlock.
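The reproduction steps can be approximated without rpm. The sketch below is an assumption, not the reporter's procedure: it repeatedly writes, fsyncs and deletes a file in a directory given on the command line (for example /boot on the affected ext2,sync mount), loosely imitating how a kernel install writes a new initrd and a removal deletes the old one. The file name `churn-test.img` and the default iteration count and size are hypothetical. Run it only on a test machine, since a hit leaves the process stuck in D state.

```python
import os
import sys


def churn(directory, iterations=100, size=1 << 20):
    """Create, fsync and unlink a file in `directory` `iterations` times.

    Loosely imitates the write-new-initrd / remove-old-initrd pattern of a
    kernel package install and removal; it is NOT the exact I/O rpm does.
    Returns the number of completed create/remove cycles.
    """
    payload = os.urandom(size)
    path = os.path.join(directory, "churn-test.img")  # hypothetical name
    done = 0
    for _ in range(iterations):
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # redundant on a sync mount, harmless elsewhere
        os.unlink(path)  # the reported hang was in `rm -f` of the old initrd
        done += 1
    return done


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(churn(target), "cycles completed")
```

If the bug triggers, the script never prints its summary and shows up in `ps ax` with STAT `D`, just like the `rm` below.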
  

Actual Results:  I believe the ps ax output shows the problem best:

3829 pts/0    S+     0:00 rpm -e kernel-smp-2.6.9-11.EL
3832 pts/0    S+     0:00 /bin/sh /var/tmp/rpm-tmp.43481 4
3907 pts/0    S+     0:00 /bin/bash /sbin/new-kernel-pkg --rminitrd --rmmoddep --remove 2.6.9-11.ELsmp
3931 pts/0    D+     0:00 rm -f /boot/initrd-2.6.9-11.ELsmp.img
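The `ps ax` listing above picks out the stuck task by its STAT column. A minimal sketch of the same check done directly against /proc, assuming the proc(5) layout of /proc/PID/stat where the state letter is the first field after the parenthesised command name; the function names are hypothetical:

```python
import os


def parse_stat(text):
    """Split a /proc/PID/stat line into (pid, comm, fields_after_comm).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the LAST ')' rather than on plain whitespace.
    """
    head, _, tail = text.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    return int(pid_str), comm, tail.split()


def d_state_processes(proc="/proc"):
    """Yield (pid, ppid, comm) for every task in uninterruptible sleep (D)."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip /proc/self, /proc/meminfo, ...
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, rest = parse_stat(f.read())
        except OSError:
            continue  # process exited while we were scanning
        if rest and rest[0] == "D":
            yield pid, int(rest[1]), comm


if __name__ == "__main__":
    for pid, ppid, comm in d_state_processes():
        print(pid, ppid, comm)
```

On the machine in this report, the output would have included pid 3931 (`rm`) with parent 3907 (`new-kernel-pkg`).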


Expected Results:  An ext2 filesystem mounted sync should work the way it is meant to :)
Deadlocks are not acceptable on any filesystem.

Additional info:

I did some digging in /proc/3931. I believe these are the relevant data:

# cat /proc/3931/maps 
00400000-00409000 r-xp 00000000 09:02 7536693                            /bin/rm
00508000-00509000 rw-p 00008000 09:02 7536693                            /bin/rm
00509000-0052a000 rwxp 00509000 00:00 0 
2a95556000-2a95557000 rw-p 2a95556000 00:00 0 
2a95563000-2a95565000 rw-p 2a95563000 00:00 0 
2a95565000-2a97b20000 r--p 00000000 09:02 3701737                        /usr/lib/locale/locale-archive
369f700000-369f715000 r-xp 00000000 09:02 7290882                        /lib64/ld-2.3.4.so
369f814000-369f816000 rw-p 00014000 09:02 7290882                        /lib64/ld-2.3.4.so
369f900000-369fa2a000 r-xp 00000000 09:02 7291090                        /lib64/tls/libc-2.3.4.so
369fa2a000-369fb29000 ---p 0012a000 09:02 7291090                        /lib64/tls/libc-2.3.4.so
369fb29000-369fb2c000 r--p 00129000 09:02 7291090                        /lib64/tls/libc-2.3.4.so
369fb2c000-369fb2f000 rw-p 0012c000 09:02 7291090                        /lib64/tls/libc-2.3.4.so
369fb2f000-369fb33000 rw-p 369fb2f000 00:00 0 
7fbfffe000-7fc0000000 rw-p 7fbfffe000 00:00 0 
ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0 
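Each line of /proc/PID/maps has the columns address range, permissions, offset, device (major:minor, in hex), inode and an optional path. A small parser for that layout, as a sketch; the function name is hypothetical and the sample line is copied from the dump above:

```python
def parse_maps_line(line):
    """Split one /proc/PID/maps line into a dict of its columns.

    The pathname is optional (anonymous mappings have none) and may contain
    spaces, so split on whitespace at most five times.
    """
    parts = line.split(None, 5)
    start, end = (int(x, 16) for x in parts[0].split("-"))
    major, minor = (int(x, 16) for x in parts[3].split(":"))
    return {
        "start": start,
        "end": end,
        "perms": parts[1],
        "offset": int(parts[2], 16),
        "dev": (major, minor),
        "inode": int(parts[4]),
        "path": parts[5].strip() if len(parts) > 5 else "",
    }


if __name__ == "__main__":
    sample = "00400000-00409000 r-xp 00000000 09:02 7536693                            /bin/rm"
    print(parse_maps_line(sample))
```

Applied to this dump, every file-backed mapping reports device (9, 2). Block major 9 is the md driver, so `/bin/rm` and its libraries were read from md minor 2.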

# cat /proc/3931/stat
3931 (rm) D 3907 3829 3398 34816 3829 4194560 134 0 0 0 0 0 0 0 18 0 1 0 25921 42160128 108 18446744073709551615 4194304 4227748 548682070624 18446744073709551615 234606008713 0 0 0 0 18446744071563606489 0 0 17 0 0 0
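The stat line is easier to read once the fields are named. A sketch that picks out a few of them, assuming the field numbering in proc(5) (state is field 3, ppid 4, pgrp 5, vsize 23 in bytes, rss 24 in pages, wchan 35) and 4 KiB pages; the helper names are hypothetical:

```python
STAT_LINE = (
    "3931 (rm) D 3907 3829 3398 34816 3829 4194560 134 0 0 0 0 0 0 0 "
    "18 0 1 0 25921 42160128 108 18446744073709551615 4194304 4227748 "
    "548682070624 18446744073709551615 234606008713 0 0 0 0 "
    "18446744071563606489 0 0 17 0 0 0"
)


def stat_field(tail, n):
    """Return proc(5) field n (1-based) given the fields after comm.

    Fields 1 (pid) and 2 (comm) are not in `tail`, so field n is tail[n - 3].
    """
    return tail[n - 3]


def decode_stat(line, page_kb=4):
    """Name a few fields of a /proc/PID/stat line (page_kb: assumed page size)."""
    head, _, rest = line.rpartition(")")  # comm may contain spaces or ')'
    pid, _, comm = head.partition(" (")
    tail = rest.split()
    return {
        "pid": int(pid),
        "comm": comm,
        "state": stat_field(tail, 3),
        "ppid": int(stat_field(tail, 4)),
        "pgrp": int(stat_field(tail, 5)),
        "vsize_kb": int(stat_field(tail, 23)) // 1024,
        "rss_kb": int(stat_field(tail, 24)) * page_kb,
        "wchan": hex(int(stat_field(tail, 35))),
    }


if __name__ == "__main__":
    print(decode_stat(STAT_LINE))
```

For this line it gives state D, ppid 3907, pgrp 3829 (the rpm process), vsize 41172 kB and rss 432 kB, matching VmSize and VmRSS in the status output, and wchan 0xffffffff801779d9, a kernel text address.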

# cat /proc/3931/status 
Name:	rm
State:	D (disk sleep)
SleepAVG:	78%
Tgid:	3931
Pid:	3931
PPid:	3907
TracerPid:	0
Uid:	0	0	0	0
Gid:	0	0	0	0
FDSize:	256
Groups:	0 1 2 3 4 6 10 
VmSize:	   41172 kB
VmLck:	       0 kB
VmRSS:	     432 kB
VmData:	     160 kB
VmStk:	       8 kB
VmExe:	      32 kB
VmLib:	    1280 kB
StaBrk:	00509000 kB
Brk:	0052a000 kB
StaStk:	7fbffffa60 kB
Threads:	1
SigPnd:	0000000000000000
ShdPnd:	0000000000000000
SigBlk:	0000000000000000
SigIgn:	0000000000000000
SigCgt:	0000000000000000
CapInh:	0000000000000000
CapPrm:	00000000fffffeff
CapEff:	00000000fffffeff

# cat /proc/3931/wchan 
__lock_buffer
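/proc/3931/wchan already reports the symbol (`__lock_buffer`). The raw wchan address in field 35 of the stat line can be turned into the same kind of name by finding the nearest code symbol at or below it in /proc/kallsyms. A sketch of that lookup; the function names and the script name `resolve_wchan.py` are hypothetical. On kernels that restrict pointer exposure (kptr_restrict), kallsyms addresses may read as zero for unprivileged users.

```python
import bisect
import sys


def load_text_symbols(kallsyms_text):
    """Parse "ADDR TYPE NAME [module]" lines into a sorted list of
    (address, name), keeping only code symbols (type t or T)."""
    syms = []
    for line in kallsyms_text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[1] in ("t", "T"):
            syms.append((int(parts[0], 16), parts[2]))
    syms.sort()
    return syms


def resolve(addr, syms):
    """Return "name+0xoffset" for the last symbol at or below addr."""
    addrs = [a for a, _ in syms]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None  # address lies below every known symbol
    base, name = syms[i]
    return "%s+%#x" % (name, addr - base)


if __name__ == "__main__":
    # Usage: resolve_wchan.py ADDRESS   (decimal, as printed in /proc/PID/stat)
    with open("/proc/kallsyms") as f:
        symbols = load_text_symbols(f.read())
    print(resolve(int(sys.argv[1]), symbols))
```

Sorting once and using `bisect` keeps each lookup logarithmic, which matters because kallsyms typically holds tens of thousands of entries.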


As I have to put this machine into production early next week, I'm afraid I won't
be able to do any more tests on it. But as the situation is easy to recreate,
I don't believe this is much of a problem.

By the way, it's a dual Opteron, in case SMP is a factor here at all.

Comment 1 Jason Baron 2006-02-07 16:59:04 UTC

*** This bug has been marked as a duplicate of 180028 ***

