Bug 131235 - clusvcadm -d <svcname> hangs forever when lsof for NFS filesystem hangs
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: clumanager
Version: 3
Hardware: i686
OS: Linux
Target Milestone: ---
Assignee: Lon Hohberger
QA Contact: Cluster QE
Depends On:
Blocks: 131576
Reported: 2004-08-30 11:14 UTC by Chandrashekhar Marathe
Modified: 2009-04-16 20:15 UTC (History)
1 user

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2004-11-09 17:58:00 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)
Patch to use lsof -b and _not_ use fuser if lsof exists (1.51 KB, patch)
2004-08-31 16:31 UTC, Lon Hohberger
Only use lsof when lsof exists (don't use fuser). Use -b + mount point instead of device (2.40 KB, patch)
2004-09-01 16:07 UTC, Lon Hohberger

External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2004:491 high SHIPPED_LIVE Updated clumanager and redhat-config-cluster packages 2004-12-20 05:00:00 UTC

Description Chandrashekhar Marathe 2004-08-30 11:14:41 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1)
Gecko/20020823 Netscape/7.0

Description of problem:
1. RHAS 2.1-based system with clumanager 1.2.16 compiled for this kernel.
2. Two-member cluster with an IP-based tiebreaker.
3. Two services configured, each exporting one or more devices with one or more IP addresses.
4. The default service configuration runs svc1 on host1 and svc2 on host2.
5. Because load balancing requires access to the exported disks on both hosts, host1 cross-mounts the partitions exported by svc2 via an NFS hard mount, and host2 cross-mounts the partitions exported by svc1 the same way.
6. If either host fails, both svc1 and svc2 run on the remaining host, and all partitions are NFS-mounted from that same host.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

1. Disabling svc2 on host2 works fine.
2. Now the IP address for svc2 is unavailable, and disabling svc1 runs lsof, which hangs forever in /proc/<pid_accessing_svc_partition>/cwd (the NFS server is not responding on a hard mount). Disabling svc1 therefore never completes.
3. However, the partitions belonging to svc1 are left in an unmountable state.
4. Even soft-mounting the partitions did not help.

Actual Results:  Generally, on any cluster host with at least one NFS mount (not necessarily related to the cluster configuration in any way) in the "server not responding" state, service stop hangs forever.

Expected Results:  service stop should time out or abort.

Additional info:

(1) Could lsof -b be used to skip NFS-mounted partitions?

This works when only the NFS service is unavailable but the NFS IP is still reachable; it does not work when the NFS IP itself is taken down.

(2) Can any kernel change be made to kill processes hanging 
on NFS partitions without using lsof/fuser?

Sample fstab entry for a cross-mounted partition:

svc2:/cluster/disk2 /mnt/disk2 nfs rw,bg,intr,timeo=1,retrans=3 0 0

Comment 1 Lon Hohberger 2004-08-30 14:59:22 UTC
Response to question (1): Yes, but you may have to add more grepping for the cluster-managed device.

Note that the lsof code dates from a time when fuser wasn't available on many distributions.  The RPM spec for clumanager apparently doesn't have an install dependency on psmisc; it probably should, since it uses fuser.

(Why are you trying to use lsof instead of fuser?)

Response to question (2): There's always a way, but I doubt there's a clean way.  If a process is actually trying to touch an inaccessible mount, you can't normally kill it.  It goes into disk-wait state while waiting for the I/O to complete, where it can't be interrupted.

You can pass the -f flag to umount; this should work for NFS volumes, but not for other data sources (e.g., block devices).
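A minimal sketch of that fallback, assuming the mount point from the fstab example earlier in this report (the helper name is made up; this is not code from clumanager):

```shell
# Hedged sketch: try a clean unmount first, then fall back to umount -f,
# which the NFS client honors even when the server is unreachable.
# force_umount_nfs and /mnt/disk2 are illustrative names.
force_umount_nfs() {
    mnt="$1"
    umount "$mnt" 2>/dev/null && return 0
    # -f is an NFS-only escape hatch; it does not help for hung
    # local block devices.
    umount -f "$mnt"
}

# Example: force_umount_nfs /mnt/disk2
```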

Comment 2 Chandrashekhar Marathe 2004-08-31 13:37:28 UTC
lsof is used by the svclib_filesystem script, which runs lsof followed by fuser. In this case, however, both hang while accessing the unresponsive NFS mount.

So using only fuser will not solve the problem either. Also, since the hanging process is not directly related to the service being stopped, it could be any process on the machine. Perhaps service stop should fork a child for fuser/lsof and kill it if it hangs for more than a certain interval. (We also tested lsof -S to make lsof time out, but even that hangs.)
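A watchdog of the kind suggested here might look like the following sketch (the timeout value is illustrative, and this is not the fix that shipped):

```shell
# Hedged sketch: run a probe command in a child and kill it if it has
# not finished within a deadline, so a hang on a dead NFS mount cannot
# stall service stop. run_with_timeout is a made-up helper name.
run_with_timeout() {
    secs="$1"; shift
    "$@" &                                   # probe runs as a child
    cmd_pid=$!
    ( sleep "$secs"; kill -9 "$cmd_pid" 2>/dev/null ) &
    watchdog_pid=$!
    wait "$cmd_pid"                          # nonzero if the watchdog fired
    rc=$?
    kill "$watchdog_pid" 2>/dev/null
    return "$rc"
}

# Example: give lsof five seconds before declaring the mount hung.
# run_with_timeout 5 lsof -b
```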

Comment 3 Lon Hohberger 2004-08-31 15:35:52 UTC
You're right, it does do lsof first.

Ok, we'll do lsof -b.  It looks like fuser doesn't have any similar
method of operation.

Comment 4 Lon Hohberger 2004-08-31 16:31:32 UTC
Created attachment 103302 [details]
Patch to use lsof -b and _not_ use fuser if lsof exists

Patch is only against svclib_filesystem.

Comment 5 Chandrashekhar Marathe 2004-09-01 07:53:32 UTC
Two problems
1. lsof -b | grep $dev doesn't get the device. It works on mounted     
directory name grep.

2. Even lsof -b hangs in this case. Not sure there is another way
to timeout on lsof / forcefully umount.

Comment 6 Lon Hohberger 2004-09-01 15:08:52 UTC
If lsof -b hangs/blocks anyway in this case, then there's probably a
bug in lsof (the point of -b is to _not_ block...).

Comment 7 Lon Hohberger 2004-09-01 15:16:28 UTC
Digging deeper...

Comment 8 Lon Hohberger 2004-09-01 16:07:56 UTC
Created attachment 103342 [details]
Only use lsof when lsof exists (don't use fuser).  Use -b + mount point instead of device

This change worked for me.
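The attachment itself isn't reproduced here, but a minimal sketch of the "-b + mount point" idea might look like this (the helper name is made up, and the field positions assume stock lsof output with the PID in column 2 and the path in the last column):

```shell
# Hedged sketch of the "-b + mount point" approach: list PIDs holding
# files open under a mount point, using lsof -b so the scan itself
# avoids kernel calls that can block. pids_on_mount is illustrative.
pids_on_mount() {
    mnt="$1"
    # NAME is the last column; PID is column 2 in stock lsof output.
    lsof -b 2>/dev/null | awk -v m="$mnt" '$NF ~ m { print $2 }' | sort -u
}

# Example: pids_on_mount /mnt/disk2
```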

Comment 9 Lon Hohberger 2004-09-01 16:13:27 UTC
How I tested:

(1) Mount NFS mount on clustered server.
(2) Do a 'find' on NFS mount point.  While 'find' is running, kill the
NFS server.
(3) Start up a shell in clustered service's mount point.
(4) Disable service.

The shell was properly killed, and nothing was hanging except the
'find' command.

Sep  1 10:47:25 magenta clusvcmgrd: [10191]: <notice> service notice:
Stopping service IP_disk_check_test ...
Sep  1 10:47:26 magenta clusvcmgrd: [10191]: <warning> service
warning: killing process 1952 (root bash /dev/sdd3)
Sep  1 10:47:31 magenta clusvcmgrd: [10191]: <notice> service notice:
Stopped service IP_disk_check_test ...

I also retried with no processes accessing the hung NFS mount point,
which also worked.

Comment 10 Satya Prakash Tripathi 2004-09-02 11:33:54 UTC
Does step (2) "kill the NFS server" mean everything that 'clusvcadm -d <svc>' does on the NFS server, i.e., nfs stop & start, ifconfig intf down, etc.?

'lsof -b' itself hangs for us in the case described in this bug.

Is there a version of lsof that doesn't hang with the -b option? We are using lsof-4.51-2.

Comment 11 Lon Hohberger 2004-09-02 13:08:30 UTC
Unexport/remove the interface the client was accessing.

The version that shipped with RHEL3 doesn't block; you should be able
to get the RPMs from RHN.  That's probably the problem you're seeing.

Comment 12 Lon Hohberger 2004-09-02 15:56:32 UTC
1.2.18pre1 patch (unsupported; test only, etc.)

This includes the fix for this bug and a few others.

Comment 13 Satya Prakash Tripathi 2004-09-03 10:23:11 UTC
RHEL3 uses lsof-4.63-4. I tried it, and that lsof -b hangs too!

In your test plan, can you try performing step (3) before step (2):
(3) Start a shell script in the clustered service's mount point.
(2) ..... kill the NFS server.

This may cause lsof to hang. When an NFS service goes down (ifconfig intf down, nfs stop, exportfs -u, etc.) while some script was already accessing the NFS-mounted partition exported by that service, lsof (-b, ...) hangs for us. We are not using RHEL3; we are on RHAS 2.1. Could it be a kernel issue too?

We are using the new patch anyway; it doesn't fix this bug for us.

Comment 14 Lon Hohberger 2004-09-03 13:59:01 UTC
You're correct; it actually just took me a few more tries to reproduce it.

The bug may be in lsof.  Here's what I did (outside of the cluster
software entirely):

(1) Mount NFS export from another machine (hard mount, not soft)
(2) cd /new_nfs_mount; while [ 0 ]; do find . ; done
(3) Disable the NFS export + ifdown the interface, and/or reboot the NFS server (so that the client goes into retry mode)
(4) lsof -b

Step (4) hangs.

I tried the above steps with the following combinations; all hung after a few tries (newer versions of lsof seemed to take more tries):

RHEL 2.1 + lsof 4.52
RHEL 2.1 + lsof 4.63
RHEL 3 + lsof 4.63
RHEL 3 + lsof 4.72
Fedora Core 2 + lsof 4.72

They hang while doing stat64, and go into disk-wait:

read(4, "30030 (bash) S 30025 30030 30030"..., 4096) = 224
close(4)                                = 0
munmap(0xb7298000, 4096)                = 0
readlink("/proc/30030/cwd", "/mnt/tmp", 4096) = 8

I doubt fuser will do any better in this case.  

In the worst case, the patch provided here still fixes the fact that the script was using _both_ tools instead of one or the other when killing processes on the mount.

Comment 15 Lon Hohberger 2004-09-03 14:15:39 UTC
Sorry, 4.51, not 4.52 in above comment.

Comment 16 Lon Hohberger 2004-09-14 14:11:02 UTC
Bug #131712 opened against lsof.

Comment 19 John Flanagan 2004-12-21 03:40:14 UTC
An advisory has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

