From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020823 Netscape/7.0

Description of problem:

Configuration:
1. RHAS 2.1 based system with clumanager 1.2.16 compiled for this kernel.
2. Two-member cluster with an IP-based tiebreaker.
3. Two services configured, each exporting one or more devices with one or more IP addresses.
4. Default service configuration is to run svc1 on host1 and svc2 on host2.
5. For load balancing, access to the exported disks is needed on both hosts: host1 cross-mounts the partitions exported by svc2 using an NFS hard mount, and host2 cross-mounts the partitions exported by svc1, again using an NFS hard mount.
6. If either host fails, both svc1 and svc2 run on the remaining host and all partitions are NFS-mounted from that same host.

Version-Release number of selected component (if applicable):
clumanager-1.2.16

How reproducible:
Always

Steps to Reproduce:
1. "service disable" for svc2 on host2 works fine.
2. Now the IP address for svc2 is unavailable, and "service disable svc1" uses lsof, which hangs forever in /proc/<pid_accessing_svc_partition>/cwd (because the NFS server is not responding on a hard mount). Hence "service disable svc1" does not work.
3. The partitions belonging to svc1 are therefore left in an unmountable state.
4. Even soft-mounting the partitions did not help.

Actual Results:
Generically, on any cluster host that has at least one NFS mount (not necessarily related to the cluster configuration in any way) in the "server not responding" state, service stop hangs forever.

Expected Results:
service stop should time out or abort.

Additional info:
(1) Can lsof -b be tried to skip NFS-mounted partitions? This works when only the NFS service is unavailable but the NFS IP is still reachable. It does not work when the NFS IP itself is taken out.
(2) Can any kernel change be made to kill processes hanging on NFS partitions without using lsof/fuser?

Sample fstab entry for a cross-mounted partition:
svc2:/cluster/disk2 /mnt/disk2 nfs rw,bg,intr,timeo=1,retrans=3 0 0
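For concreteness, a minimal sketch of the cross-mount setup on both hosts. Only the svc2/disk2 entry above comes from the actual configuration; the svc1/disk1 names in the second entry are made up for illustration:

# /etc/fstab on host1 (mounts what svc2 exports; same entry as the sample above)
svc2:/cluster/disk2  /mnt/disk2  nfs  rw,bg,intr,timeo=1,retrans=3  0 0

# /etc/fstab on host2 (mounts what svc1 exports; export path and mount point assumed)
svc1:/cluster/disk1  /mnt/disk1  nfs  rw,bg,intr,timeo=1,retrans=3  0 0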
Response to question (1): Yes, but you may have to add more grepping for the cluster-managed device. Note that the lsof code is from a time when fuser wasn't on many distributions. The RPM spec for clumanager apparently doesn't have an install dependency on psmisc; it probably should (it uses fuser and killall). (Why are you trying to use lsof instead of fuser?)

Response to question (2): There's always a way, but I doubt there's a clean way. If a process is actually trying to touch an inaccessible mount, you can't normally kill it: it goes into disk-wait state while waiting for the I/O to complete, where it can't be interrupted. You can specify the -f flag on the umount command line; this should work for NFS volumes, but not for other data sources (e.g. block devices).
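For illustration, with the cross-mount from the report above, the forced unmount would look roughly like this (a sketch only; whether it succeeds depends on the kernel's NFS client):

# Force-unmount a hung NFS mount without contacting the dead server
umount -f /mnt/disk2

# On kernels that support it, a lazy unmount is another option: it detaches
# the mount point immediately, though processes already blocked in it stay blocked.
umount -l /mnt/disk2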
lsof is used by the svclib_filesystem script. It uses lsof followed by fuser. In this case, however, both hang accessing /proc/<pid_process_waiting_on_nfs>/cwd, so using only fuser will not solve the problem either. Also, since the hanging process is not directly related to the service being stopped, it could be any process on the machine. Perhaps service stop should fork off a child for fuser/lsof and kill it if it hangs for more than a certain interval (see the sketch below). (We also tested lsof -S to make lsof time out, but even that hangs.)
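A minimal sketch of that idea: run lsof in a disposable child and give up after a fixed interval so a hang cannot block the whole service-stop path. The function name, scratch paths, and the 10-second limit are all assumptions, not anything clumanager actually ships:

LIMIT=10

check_mount_users() {
    mp=$1
    out=/tmp/lsof.$$             # scratch file for lsof output
    done_flag=/tmp/lsof.$$.done  # sentinel created when lsof finishes

    rm -f "$out" "$done_flag"
    ( lsof -b "$mp" > "$out" 2>/dev/null; touch "$done_flag" ) &
    child=$!

    i=0
    while [ ! -f "$done_flag" ] && [ "$i" -lt "$LIMIT" ]; do
        sleep 1
        i=`expr $i + 1`
    done

    if [ ! -f "$done_flag" ]; then
        # The hung lsof may linger in disk-wait (it cannot be killed while
        # blocked on the dead mount), but the caller at least stops waiting.
        kill -9 $child 2>/dev/null
        rm -f "$out"
        return 1
    fi

    cat "$out"
    rm -f "$out" "$done_flag"
    return 0
}

# Example: check_mount_users /mnt/disk2 || echo "lsof timed out, skipping"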
You're right, it does do lsof first. Ok, we'll do lsof -b. It looks like fuser doesn't have any similar method of operation.
Created attachment 103302 [details]
Patch to use lsof -b and _not_ use fuser if lsof exists

Patch is only against svclib_filesystem.
Two problems:
1. "lsof -b | grep $dev" doesn't find the device; grepping on the mounted directory name works instead.
2. Even lsof -b hangs in this case.

Not sure there is another way to time out on lsof or to forcefully umount.
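To illustrate the first point: lsof reports the path of each open file, not the underlying block device, so the grep has to target the mount point. A sketch; the device name is taken from the log further down, but its mount point here is made up:

dev=/dev/sdd3
mnt=/mnt/svc1_disk    # hypothetical mount point for $dev

# Finds nothing: lsof's NAME column holds file paths, not the device node
lsof -b 2>/dev/null | grep "$dev"

# Works: every open file on that filesystem has the mount point in its path
lsof -b 2>/dev/null | grep "$mnt"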
If lsof -b hangs/blocks anyway in this case, then there's probably a bug in lsof (the point of -b is to _not_ block...).
Digging deeper...
Created attachment 103342 [details]
Only use lsof when lsof exists (don't use fuser). Use -b + mount point instead of device

This change worked for me.
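Roughly what the patched check amounts to. This is a sketch of the approach described in the attachment, not the attachment itself; the function and variable names are made up:

# Kill anything still using the service's mount point before umount.
# Uses "lsof -b" (non-blocking mode) and greps on the mount point, since
# grepping on the device name finds nothing (see the previous comment).
kill_mount_users() {
    mp=$1

    # Only use lsof when it exists; fuser is no longer used at all
    which lsof > /dev/null 2>&1 || return 0

    pids=`lsof -b 2>/dev/null | grep "$mp" | awk '{print $2}' | sort -u`
    for pid in $pids; do
        echo "killing process $pid (still using $mp)"
        kill -TERM $pid
    done
    sleep 2
    for pid in $pids; do
        kill -0 $pid 2>/dev/null && kill -KILL $pid
    done
}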
How I tested:
(1) Mount NFS mount on clustered server.
(2) Do a 'find' on the NFS mount point. While 'find' is running, kill the NFS server.
(3) Start up a shell in the clustered service's mount point.
(4) Disable the service.

The shell was properly killed, and nothing was hanging except the 'find' command.

Sep 1 10:47:25 magenta clusvcmgrd: [10191]: <notice> service notice: Stopping service IP_disk_check_test ...
Sep 1 10:47:26 magenta clusvcmgrd: [10191]: <warning> service warning: killing process 1952 (root bash /dev/sdd3)
Sep 1 10:47:31 magenta clusvcmgrd: [10191]: <notice> service notice: Stopped service IP_disk_check_test ...

I also retried with no processes accessing the hung NFS mount point, which also worked.
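For reference, step (4) corresponds to something like the following on the node running the service (the service name is taken from the log above; the syslog path is an assumption):

# Disable the clustered service while the cross-mounted NFS server is dead
clusvcadm -d IP_disk_check_test

# Confirm the stop completed instead of hanging in lsof
grep clusvcmgrd /var/log/messages | tail -5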
Does "kill the NFS server" in step (2) mean everything that 'clusvcadm -d <svc>' does on the NFS server, i.e. nfs stop & start, ifconfig <intf> down, etc.? The 'lsof -b' itself hangs for us in the case described in this bug. Is there a version of lsof that doesn't hang with the -b option? We are using lsof-4.51-2.
Unexport the filesystem / remove the interface the client was accessing. The lsof version that shipped with RHEL3 doesn't block; you should be able to get the RPMs from RHN. That's probably the problem you're seeing.
1.2.18pre1 patch (unsupported; test only, etc.):

http://people.redhat.com/lhh/clumanager-1.2.16-1.2.18pre1.patch

This includes the fix for this bug and a few others.
RHEL3 uses lsof-4.63-4. I tried that one, and its lsof -b hangs too!

In your test plan, can you try doing step (3) before step (2), i.e. first start up a shell script in the clustered service's mount point, and only then kill the NFS server? This may cause lsof to hang. When an NFS service goes down (ifconfig <intf> down, nfs stop, exportfs -u, etc.) while some script was already accessing the NFS-mounted partition exported by that service, lsof (-b or otherwise) hangs for us. We are not using RHEL3; we are using RHAS 2.1. Could it be a kernel issue too? We are using the new patch anyway, but it doesn't fix this bug for us.
You're correct; it actually just took me a few more tries to reproduce it. The bug may be in lsof.

Here's what I did (outside of the cluster software entirely):
(1) Mount an NFS export from another machine (hard mount, not soft).
(2) cd /new_nfs_mount; while [ 0 ]; do find . ; done
(3) Disable the NFS export + ifdown the interface, and/or reboot the NFS server (so that the client goes into retry mode).
(4) lsof -b

Step (4) hangs. I tried the above steps with the following combinations; all hung after a few tries, though newer versions of lsof seemed to take more tries:

RHEL 2.1 + lsof 4.52
RHEL 2.1 + lsof 4.63
RHEL 3 + lsof 4.63
RHEL 3 + lsof 4.72
Fedora Core 2 + lsof 4.72

They hang while doing stat64 and go into disk-wait:

read(4, "30030 (bash) S 30025 30030 30030"..., 4096) = 224
close(4) = 0
munmap(0xb7298000, 4096) = 0
readlink("/proc/30030/cwd", "/mnt/tmp", 4096) = 8
stat64("/proc/30030/cwd",

I doubt fuser will do any better in this case. In the worst case, the patch provided here fixes the fact that the script was using _both_ tools instead of one or the other when killing processes on the mount.
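The reproduction above, collected into a script for convenience. This is only a sketch of the steps listed; the server name, export path, and mount point are placeholders, and the "kill the server" step is left as a comment since it has to be done on the other machine:

#!/bin/sh
# Reproduce the lsof -b hang outside the cluster software.
SERVER=nfsserver            # placeholder hostname
EXPORT=/export/test         # placeholder export path
MNT=/new_nfs_mount

mkdir -p $MNT
mount -t nfs -o rw,hard,intr $SERVER:$EXPORT $MNT    # hard mount, not soft

# Keep the mount busy so some process has open files / cwd on it
( cd $MNT && while [ 0 ]; do find . > /dev/null 2>&1; done ) &

# Now, ON THE SERVER: unexport the filesystem and ifdown the interface
# (or reboot it) so the client goes into "server not responding" retry mode.

# With the server gone, -b is supposed to keep this from blocking,
# but in practice it hangs in stat64("/proc/<pid>/cwd", ...).
lsof -b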
Sorry, 4.51, not 4.52 in above comment.
Bug #131712 opened against lsof.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2004-491.html