Bug 61570 - On a NFS server with ext2 partition exported, umount hangs
Status: CLOSED DEFERRED
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.2
Hardware: i386 Linux
Priority: medium
Severity: medium
Assigned To: Arjan van de Ven
QA Contact: Brian Brock
Reported: 2002-03-21 12:11 EST by abhijit karmarkar
Modified: 2007-04-18 12:41 EDT
Doc Type: Bug Fix
Last Closed: 2002-03-22 04:29:49 EST


Attachments
output of ps -Aelf command. shows umount and loopopen hung. (5.70 KB, text/plain)
2002-03-21 12:15 EST, abhijit karmarkar
output of magic-SysRq-'T' key (captured from dmesg) (14.79 KB, text/plain)
2002-03-21 12:19 EST, abhijit karmarkar
output of magic-SysRq-'M' key (1.28 KB, text/plain)
2002-03-21 12:20 EST, abhijit karmarkar
cat /proc/scsi/scsi (1.10 KB, text/plain)
2002-03-21 12:21 EST, abhijit karmarkar
cat /proc/scsi/aic7xxx/1 (aic7xxx driver info) (2.95 KB, text/plain)
2002-03-21 12:21 EST, abhijit karmarkar
script to cycle NFS server (step 5 in description section) (782 bytes, text/plain)
2002-03-21 12:23 EST, abhijit karmarkar
'loopopen' code which does open()/close() in a loop, on set of device files. (668 bytes, text/plain)
2002-03-21 12:25 EST, abhijit karmarkar

Description abhijit karmarkar 2002-03-21 12:11:10 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.2.1) Gecko/20010901

Description of problem:
On the NFS server, there is a script (cycle_nfs.sh) which does the following
(the script performs only step 5; steps 1-4 must be done before running it):
----------------
 1. assign ip-address to eth0:1
 2. start nfs services (i.e. /etc/init.d/nfs start)
 3. mount /dev/sdc1..6 onto /extdisk1..6
 4. nfs export disk partitions
      exportfs -o rw,no_root_squash client-ip:/extdisk1..6

 5. for i = 1 to 50 do
    5.1  bring down eth0:1
    5.2  unexport nfs partitions (i.e. exportfs -u -a)
    5.3  umount /extdisk1..6
    5.4  sleep 5 (seconds)
    5.5  mount /extdisk1..6
    5.6  nfs export all partitions /extdisk1..6
    5.7  bringup eth0:1
    5.8  sleep 25 (seconds)
----------------
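
For readers without access to the attachment, the loop in step 5 could look roughly like the sketch below. This is not the attached cycle_nfs.sh: the addresses, the `PARTS` list, the explicit device names in the mount step, and the helper names `run`, `cycle_once` and `cycle_nfs` are all assumptions made for illustration. By default it only prints the commands (`DRY_RUN=1`).

```shell
#!/bin/bash
# Sketch of step 5 only; NOT the attached cycle_nfs.sh.
# All values below are placeholders to adjust for the local setup.
IFACE="${IFACE:-eth0:1}"            # aliased interface from step 1
ALIAS_IP="${ALIAS_IP:-192.0.2.10}"  # hypothetical address for the alias
CLIENT="${CLIENT:-192.0.2.20}"      # hypothetical NFS client address
PARTS="${PARTS:-1 2 3 4 5 6}"       # /dev/sdcN is mounted on /extdiskN
DRY_RUN="${DRY_RUN:-1}"             # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# One pass through steps 5.1 to 5.8.
cycle_once() {
    local n
    run ifconfig "$IFACE" down                                     # 5.1
    run exportfs -u -a                                             # 5.2
    for n in $PARTS; do run umount "/extdisk$n"; done              # 5.3
    run sleep 5                                                    # 5.4
    for n in $PARTS; do run mount "/dev/sdc$n" "/extdisk$n"; done  # 5.5
    for n in $PARTS; do                                            # 5.6
        run exportfs -o rw,no_root_squash "$CLIENT:/extdisk$n"
    done
    run ifconfig "$IFACE" "$ALIAS_IP" up                           # 5.7
    run sleep 25                                                   # 5.8
}

# Repeat the cycle; the report uses 50 iterations.
cycle_nfs() {
    local count="${1:-50}" i
    for ((i = 1; i <= count; i++)); do
        echo "# iteration $i" >&2
        cycle_once
    done
}
```

After sourcing the file as root, `DRY_RUN=0 cycle_nfs 50` would run the real commands; with the default dry-run setting, `cycle_nfs 1` just lists what one iteration would do.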

The client runs a simple program which does random reads and writes on the
NFS-exported partitions.

Along with this, on the server (where the above script is cycling the NFS server),
another program (called loopopen) opens and closes (open(), close()) all device
files (i.e. /dev/sdcX) repeatedly in a loop. (attached file: loopopen.c).
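
The attached loopopen.c is C code and is not reproduced in this report. As a rough shell approximation of the same open()/close() loop, something like the following could be used. The function names, the read-only open mode, and the iteration argument are assumptions, not details taken from the attachment.

```shell
#!/bin/bash
# Shell approximation of the open()/close() loop; NOT the attached loopopen.c.

# Open one path read-only and close it again. The redirection makes the
# shell open() the path; the descriptor is closed as soon as the no-op
# builtin returns.
open_close_once() {
    true <"$1"
}

# loopopen ITERATIONS PATH...
# Walk over all paths ITERATIONS times (0 = loop forever).
loopopen() {
    local iterations="$1" i dev
    shift
    for ((i = 0; iterations == 0 || i < iterations; i++)); do
        for dev in "$@"; do
            open_close_once "$dev" || echo "open failed: $dev" >&2
        done
    done
}
```

For the setup described here, the call would be along the lines of `loopopen 0 /dev/sdc1 /dev/sdc2 /dev/sdc3 /dev/sdc4 /dev/sdc5 /dev/sdc6`, run as root while the cycling script is active.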

The BUG:
--------
After a couple of iterations, the following happens:

1. the script hangs inside the loop, because
   1. the umount program in the script hangs
   2. the loopopen program on the server hangs while doing open() 
      on a device.
   [the output of 'ps -Aelf' is attached (ps.out)]

The server itself is still reachable over telnet and otherwise keeps running.

Note: the script above (cycle_nfs.sh) is very similar to the one in bug report
61546, although here the filesystem is ext2.


Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. perform steps 1-4 described in 'Description' section
2. have a client access the exported partitions (we wrote a simple
program to cp,rm,mv 1-MB of files to/from the server, on the client)
3. run the cycle_nfs.sh script on the server (which does step 5 mentioned in 
the 'Description' section).
4. run the loopopen program on the server
5. wait for umount and loopopen program to hang!!


	

Actual Results:  The umount program hangs while unmounting one of the devices,
and a concurrent open() (issued by the loopopen program) hangs on the same
device.

The script hangs, and the loopopen program hangs :(

Expected Results:  Neither umount nor the loopopen program should hang. Neither
does in normal operation, when the device is not exported as an NFS share.
Starting NFS and having a client access the export triggers this behaviour.

Looks to be a [ext2 + kernel + nfsd] bug.


Additional info:

System setup details
=================================
 - RedHat 7.2 + errata
 - Kernel: 2.4.9-13 (UP machine)
 - uname -a ==>
   Linux vcslinux20 2.4.9-13 #1 Tue Oct 30 20:11:04 EST 2001 i686 unknown

Hardware setup details
=================================
 - Dell PowerEdge 1300 server and client.
 - Adapter cards with HP and Quantum SCSI disks
 - using aic7xxx scsi driver (ver 5.2.4), the default installed by RH


Additional Outputs
=================================
The following text files contain additional information that might be useful for
debugging. The output was taken after the script and the program (loopopen) hung.

1. ps.out       - full output of 'ps -Aelf' command.
2. sysrq-M.out  - memory stat (output of SysRq-M, taken from /var/log/messages)
3. sysrq-T.out  - task list (output of SysRq-T, taken from /var/log/messages)
4. scsi.out     - output of /proc/scsi/scsi
5. aic_0/1.out  - output of /proc/scsi/aic7xxx/0,1
6. cycle_nfs.sh - sample script. [change device and mount points]
7. loopopen.c   - the loopy open/close c-code [change device and mount points]
Comment 1 abhijit karmarkar 2002-03-21 12:15:16 EST
Created attachment 49444
output of ps -Aelf command. shows umount and loopopen hung.
Comment 2 abhijit karmarkar 2002-03-21 12:19:24 EST
Created attachment 49445
output of magic-SysRq-'T' key (captured from dmesg)
Comment 3 abhijit karmarkar 2002-03-21 12:20:13 EST
Created attachment 49446
output of magic-SysRq-'M' key
Comment 4 abhijit karmarkar 2002-03-21 12:21:04 EST
Created attachment 49447
cat /proc/scsi/scsi
Comment 5 abhijit karmarkar 2002-03-21 12:21:47 EST
Created attachment 49448
cat /proc/scsi/aic7xxx/1 (aic7xxx driver info)
Comment 6 abhijit karmarkar 2002-03-21 12:23:45 EST
Created attachment 49449
script to cycle NFS server (step 5 in description section)
Comment 7 abhijit karmarkar 2002-03-21 12:25:51 EST
Created attachment 49450
'loopopen' code which does open()/close() in a loop, on set of device files.
Comment 8 Arjan van de Ven 2002-03-22 07:54:29 EST
There does indeed seem to be a deadlock in the VFS layer of 2.4.9-31, but only when
opening device nodes of a mounted fs directly. We'll consider modifying the VFS
to fix this. However, since it only triggers with actions that aren't done in any
normal use (direct device access while the same device is mounted in Linux
gives undefined results), we may not fix this in the 2.4.9 kernel
series (it's root-only, so not a security issue). Later kernels (2.4.18+) have
this fixed.
