Bug 61546 - Oops: NFS access of ext3 partition generates Oops while doing umount and hangs the subsequent mount command.
Status: CLOSED CURRENTRELEASE
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.2
Hardware: i386 Linux
Priority: medium
Severity: high
Assigned To: Stephen Tweedie
QA Contact: Brian Brock
Reported: 2002-03-21 06:13 EST by abhijit karmarkar
Modified: 2007-04-18 12:41 EDT (History)
Doc Type: Bug Fix
Last Closed: 2003-12-16 20:12:24 EST

Attachments
oops generated by umount (4.94 KB, text/plain)
2002-03-21 06:18 EST, abhijit karmarkar
output of ps -Aelf showing subsequent mount as hung (5.26 KB, text/plain)
2002-03-21 06:23 EST, abhijit karmarkar
output of magic-SysRq-'M' (memory info) (1.27 KB, text/plain)
2002-03-21 06:24 EST, abhijit karmarkar
output of magic-SysRq-'T' (task list), after hang (112.01 KB, text/plain)
2002-03-21 06:25 EST, abhijit karmarkar
output of /proc/scsi/scsi (scsi devices connected) (1.10 KB, text/plain)
2002-03-21 06:26 EST, abhijit karmarkar
output of /proc/scsi/aic7xxx/1 (scsi driver details) (2.95 KB, text/plain)
2002-03-21 06:28 EST, abhijit karmarkar
output of /proc/meminfo (592 bytes, text/plain)
2002-03-21 06:28 EST, abhijit karmarkar
loopy script to cycle nfs which generates Oops (782 bytes, text/plain)
2002-03-21 06:29 EST, abhijit karmarkar
oops with latest errata kernel (2.4.9-31) (5.07 KB, text/plain)
2002-03-21 07:55 EST, abhijit karmarkar

Description abhijit karmarkar 2002-03-21 06:13:28 EST
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)

Description of problem:
On the NFS server, a script (attachment: cycle_nfs.sh) performs step 5 below.
Steps 1-4 are not part of the script and must be done before running
cycle_nfs.sh:
----------------
 1. assign <ip-address> to eth0:1
 2. start nfs services (i.e. /etc/init.d/nfs start)
 3. mount /dev/sdc1..6 onto /extdisk1..6
 4. nfs export disk partitions
      exportfs -o rw,no_root_squash <client-ip>:/extdisk1..6

 5. for i = 1 to 50 do
    5.1  bring down eth0:1
    5.2  unexport nfs partitions (i.e. exportfs -u -a)
    5.3  umount /extdisk1..6
    5.4  sleep 5 (seconds)
    5.5  mount /extdisk1..6
    5.6  nfs export all partitions <client-ip>:/extdisk1..6
    5.7  bring up eth0:1 with <ip-address>
    5.8  sleep 25 (seconds)
----------------
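
The step-5 loop can be sketched as a shell script. This is not the attached
cycle_nfs.sh, only a minimal reconstruction from the steps listed above. The
variable names, the placeholder addresses, and the DRY_RUN switch are
assumptions added for illustration. It also assumes /etc/fstab has entries
for the mount points, since step 5.5 mounts by mount point alone. By default
it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the step-5 loop. Not the attached cycle_nfs.sh.
# All values below are placeholders (assumptions); adjust for the test machine.
IP_ADDR="${IP_ADDR:-192.0.2.10}"        # address for eth0:1 (placeholder)
CLIENT_IP="${CLIENT_IP:-192.0.2.20}"    # NFS client (placeholder)
DISKS="${DISKS:-1 2 3 5 6}"             # suffixes of /extdiskN
CYCLES="${CYCLES:-50}"
DRY_RUN="${DRY_RUN:-1}"                 # 1 = only print the commands

# Print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# One pass through steps 5.1 to 5.8.
cycle_once() {
    run ifconfig eth0:1 down                              # 5.1
    run exportfs -u -a                                    # 5.2
    for n in $DISKS; do run umount "/extdisk$n"; done     # 5.3
    run sleep 5                                           # 5.4
    for n in $DISKS; do run mount "/extdisk$n"; done      # 5.5 (needs fstab entries)
    for n in $DISKS; do                                   # 5.6
        run exportfs -o rw,no_root_squash "$CLIENT_IP:/extdisk$n"
    done
    run ifconfig eth0:1 "$IP_ADDR" up                     # 5.7
    run sleep 25                                          # 5.8
}

i=1
while [ "$i" -le "$CYCLES" ]; do
    echo "cycle $i"
    cycle_once
    i=$((i + 1))
done
```

Running it with DRY_RUN=0 as root executes the commands for real, including
the umount that triggers the Oops on an affected kernel.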

The client runs a simple program which does random reads and writes on the
NFS-exported partitions (cp/mv/ls/rm of 1-MB files).
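
The client program itself is not attached to this report. As a rough
stand-in, the function below runs a fixed cp/ls/mv/rm sequence on 1-MB files
instead of a random mix. The function name client_load and its arguments are
hypothetical.

```shell
#!/bin/sh
# Stand-in for the client load program, which is not attached to this report.
# Runs a fixed cp/ls/mv/rm sequence on 1 MB files; the original mixed these
# operations randomly.

# client_load DIR COUNT: exercise DIR (an NFS mount on the client) COUNT times.
client_load() {
    dir=$1
    count=$2
    src="${TMPDIR:-/tmp}/client_load.$$"
    # 1024 blocks of 1024 bytes = 1 MB of zeros as the source file.
    dd if=/dev/zero of="$src" bs=1024 count=1024 2>/dev/null || return 1
    i=0
    while [ "$i" -lt "$count" ]; do
        cp "$src" "$dir/file.$i"               || break
        ls -l "$dir" >/dev/null                || break
        mv "$dir/file.$i" "$dir/file.$i.moved" || break
        rm "$dir/file.$i.moved"                || break
        i=$((i + 1))
    done
    rm -f "$src"
    # Succeed only if every iteration completed.
    [ "$i" -eq "$count" ]
}
```

On the client it could be called as `client_load /mnt/extdisk1 1000`, where
/mnt/extdisk1 is an assumed mount point for one of the exported partitions.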

The BUG:
--------
After a couple of iterations (usually 1-2), the following happens:

The script hangs in the loop because:
   1. An Oops message is generated (attachment: oops.txt) by the umount
      of one of the devices.
   2. The subsequent mount command for this device hangs (attachment:
      ps.out, the output of 'ps -Aelf').


Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Perform steps 1-4 described in the 'Description' section.
2. Have a client access the exported partitions (on the client, we wrote a
simple program to cp, rm and mv 1-MB files to and from the server).
3. Run the cycle_nfs.sh script on the server (it performs step 5 from the
'Description' section).
4. Wait for the Oops.

Note: you need to change the device names and mount points to match the
test machine. At our end we had 1 external SCSI disk with 5 partitions
(/dev/sdc1,2,3,5,6), each with an EXT3 filesystem and mounted at
root dirs /extdisk1 .. /extdisk6.

Actual Results:  The umount command generates an Oops, and the subsequent
mount of the same device (the one for which umount oopsed) hangs.

Expected Results:  No Oops and no hang. The problem does NOT occur if the
filesystem is EXT2, and it should not occur on EXT3 either.

Looks to be a bug in [ext3 + kernel].

Additional info:

System setup details
===============================
 - RedHat 7.2 + errata
 - Kernel: 2.4.9-13 (UP machine)
 - uname -a ==>
   Linux vcslinux20 2.4.9-13 #1 Tue Oct 30 20:11:04 EST 2001 i686 unknown

Hardware setup details
=================================
 - Dell PowerEdge 1300 server and client.
 - Adapter cards with HP and Quantum SCSI disks
 - using aic7xxx scsi driver (ver 5.2.4), the default installed by RH

Additional Outputs
=================================
The following text files (attached) contain additional information which
might be useful for debugging. The output was taken after the Oops was
generated and the script hung.

1. oops.txt     - the above oops message
2. ps.out       - full output of 'ps -Aelf' command.
3. sysrq-M.out  - memory stat (output of SysRq-M, taken from /var/log/messages)
4. sysrq-T.out  - task list (output of SysRq-T, taken from /var/log/messages)
5. scsi.out     - output of /proc/scsi/scsi
6. aic_0/1.out  - output of /proc/scsi/aic7xxx/0,1
7. meminfo.out  - output of /proc/meminfo
8. cycle_nfs.sh - sample script. [change device and mount points]
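
Most of the files listed above can be gathered in one step after a hang. The
sketch below is an illustration, not what was actually run. The function name
collect_diag and the aic_0.out/aic_1.out file names are assumptions. The
SysRq-M and SysRq-T dumps are left out because they were copied from the
system log by hand.

```shell
#!/bin/sh
# Sketch: gather the state listed above into one directory after a hang.
# The SysRq-M/T dumps are not collected here; they were copied out of the
# system log by hand.

# collect_diag OUTDIR: write ps.out, meminfo.out, scsi.out, and aic_N.out.
collect_diag() {
    out=$1
    mkdir -p "$out" || return 1
    # Full process list, including the hung mount.
    ps -Aelf > "$out/ps.out" 2>&1 || true
    # Copy each /proc file that exists on this machine; skip the rest.
    for pair in meminfo:meminfo.out scsi/scsi:scsi.out \
                scsi/aic7xxx/0:aic_0.out scsi/aic7xxx/1:aic_1.out; do
        src="/proc/${pair%%:*}"
        dst="$out/${pair#*:}"
        if [ -r "$src" ]; then
            cat "$src" > "$dst" || true
        fi
    done
    return 0
}
```

For example, `collect_diag /tmp/bug-diag` writes whatever is available into
/tmp/bug-diag (a hypothetical path).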
Comment 1 abhijit karmarkar 2002-03-21 06:18:12 EST
Created attachment 49344
oops generated by umount
Comment 2 abhijit karmarkar 2002-03-21 06:23:13 EST
Created attachment 49345
output of ps -Aelf showing subsequent mount as hung
Comment 3 abhijit karmarkar 2002-03-21 06:24:19 EST
Created attachment 49346
output of magic-SysRq-'M' (memory info)
Comment 4 abhijit karmarkar 2002-03-21 06:25:14 EST
Created attachment 49347
output of magic-SysRq-'T' (task list), after hang
Comment 5 abhijit karmarkar 2002-03-21 06:26:06 EST
Created attachment 49348
output of /proc/scsi/scsi (scsi devices connected)
Comment 6 abhijit karmarkar 2002-03-21 06:28:06 EST
Created attachment 49349
output of /proc/scsi/aic7xxx/1 (scsi driver details)
Comment 7 abhijit karmarkar 2002-03-21 06:28:51 EST
Created attachment 49350
output of /proc/meminfo
Comment 8 abhijit karmarkar 2002-03-21 06:29:43 EST
Created attachment 49351
loopy script to cycle nfs which generates Oops
Comment 9 Arjan van de Ven 2002-03-21 06:34:36 EST
Can you try the errata kernel (2.4.9-31)? At least one NFS<->EXT3 interaction is
fixed there. Also, since this bug looks like memory corruption, can you give the
output of lsmod?
Comment 10 abhijit karmarkar 2002-03-21 07:18:12 EST
output of lsmod (taken after a power-cycle of the server, before starting):
----------------
lsmod
Module                  Size  Used by
binfmt_misc             6448   1 
iscsi                  33200   0  (unused)
autofs                 11584   0  (autoclean) (unused)
eepro100               17600   4 
appletalk              20912   0  (autoclean)
ipx                    16416   0  (autoclean)
usb-uhci               21696   0  (unused)
usbcore                51808   1  [usb-uhci]
ext3                   62480   3 
jbd                    41056   3  [ext3]
aic7xxx               114672   4 
sd_mod                 11680   4 
scsi_mod               98432   3  [iscsi aic7xxx sd_mod]

-------------


and lsmod output after things hung and oopsed:
-------------
Module                  Size  Used by
nfsd                   71232   8  (autoclean)
lockd                  53168   1  (autoclean) [nfsd]
sunrpc                 64816   1  (autoclean) [nfsd lockd]
binfmt_misc             6448   1 
iscsi                  33200   0  (unused)
autofs                 11584   0  (autoclean) (unused)
eepro100               17600   4 
appletalk              20912   0  (autoclean)
ipx                    16416   0  (autoclean)
usb-uhci               21696   0  (unused)
usbcore                51808   1  [usb-uhci]
ext3                   62480   4 
jbd                    41056   3  [ext3]
aic7xxx               114672   6 
sd_mod                 11680   6 
scsi_mod               98432   3  [iscsi aic7xxx sd_mod]
-------------
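
A quick way to compare two lsmod listings like the ones above is to print
the module names that appear only in the second one. This is a sketch: the
function name new_modules is hypothetical, and each listing is assumed to be
saved to its own file.

```shell
#!/bin/sh
# Sketch: list modules that appear in the second lsmod snapshot but not the first.

# new_modules BEFORE AFTER: print module names present only in AFTER.
new_modules() {
    # Skip blank lines. First file: remember every module name.
    # Second file: print names not seen in the first. The "Module Size Used by"
    # header appears in both files, so it is never printed.
    awk 'NF == 0 { next }
         NR == FNR { seen[$1] = 1; next }
         !($1 in seen) { print $1 }' "$1" "$2"
}
```

For the two listings above, the modules present only after the hang are nfsd,
lockd and sunrpc. The function compares names only, so it does not show that
the use counts of ext3, aic7xxx and sd_mod also went up.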
Comment 11 abhijit karmarkar 2002-03-21 07:20:07 EST
Just to correct what I said in the BUG section (at the top): the hang and Oops
happen after 11-12 cycles (not 1-2 as stated earlier).
Comment 12 abhijit karmarkar 2002-03-21 07:54:16 EST
Tried with the new 2.4.9-31 kernel (the latest RH errata kernel).
Same result: umount oopsed and mount hung.

Attached the oops (oops-2.4.9-31.txt).
Comment 13 abhijit karmarkar 2002-03-21 07:55:44 EST
Created attachment 49366
oops with latest errata kernel (2.4.9-31)
Comment 14 Stephen Tweedie 2002-03-22 07:09:04 EST
It's a core VFS problem in unmount: it ends up calling a per-inode method after
telling the filesystem to drop the superblock itself.  Specifically, kill_super
calls invalidate_inodes after calling the fs's put_super() method, but
invalidate_inodes can end up calling the fs's flushpage() method.

This problem is already fixed in 2.4.17 and later kernels.  You could try
reproducing with such a kernel (for example, our rawhide kernel) to see if that
cures the problem, and I've started a test build here with a back-port of the
fix to 2.4.9.
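
To check whether a given kernel release string names 2.4.17 or later, a
version comparison along these lines may help. It is only a sketch: the
function name has_unmount_fix is hypothetical, and it looks at the version
number alone, so a vendor kernel carrying a back-port of the fix (such as the
2.4.9 test build mentioned above) would still be reported as lacking it.

```shell
#!/bin/sh
# Sketch: does a kernel release string name upstream 2.4.17 or later?
# Version number only; a back-ported fix in an older vendor kernel is not detected.

# has_unmount_fix RELEASE: succeed if RELEASE >= 2.4.17.
has_unmount_fix() {
    # Drop any "-13"-style suffix, then split the rest on dots.
    base=${1%%-*}
    old_ifs=$IFS
    IFS=.
    set -- $base
    IFS=$old_ifs
    major=${1:-0}; minor=${2:-0}; patch=${3:-0}
    [ "$major" -gt 2 ] && return 0
    [ "$major" -lt 2 ] && return 1
    [ "$minor" -gt 4 ] && return 0
    [ "$minor" -lt 4 ] && return 1
    [ "$patch" -ge 17 ]
}
```

On a running system it could be used as
`has_unmount_fix "$(uname -r)" && echo "2.4.17 or later"`.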
Comment 15 Stephen Tweedie 2002-03-22 11:28:34 EST
Should be fixed in the final release, but I haven't been able to reproduce the
original problem here.
