Description of problem:
The "umount -f" command hangs on a mount point on RHEL 4, kernel version 2.6.9-55.ELsmp.

Version-Release number of selected component (if applicable):
RHEL4.5

How reproducible:
1) Create a volume /vol/vol0 on a filer and manipulate the export list so that it can be mounted from an NFS client running the aforementioned OS.
2) The mount options being used are (rw,bg,hard,intr,rsize=32768,wsize=32768,proto=tcp,nfsvers=3,addr=192.168.108.101).
3) Disconnect the cable between the Linux host and the filer through which the mount is occurring.
4) The NFS client on the Linux host will time out.
5) Execute the force unmount "umount -f" command on the Linux host.

We should be able to forcibly unmount the volume on the client. It does not happen. The umount -f command hangs with the following trace.

[root@dae-pc18 ~]# strace umount -f /mnt/rhel01
execve("/bin/umount", ["umount", "-f", "/mnt/rhel01"], [/* 23 vars */]) = 0
uname({sys="Linux", node="dae-pc18.lab.netapp.com", ...}) = 0
brk(0) = 0x8ccc000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=121023, ...}) = 0
old_mmap(NULL, 121023, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f6c000
close(3) = 0
open("/lib/tls/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\20_w\000"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1454462, ...}) = 0
old_mmap(0x761000, 1219772, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x761000
old_mmap(0x885000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x124000) = 0x885000
old_mmap(0x889000, 7356, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x889000
close(3) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f6b000
mprotect(0x885000, 4096, PROT_READ) = 0
mprotect(0x759000, 4096, PROT_READ) = 0
set_thread_area({entry_number:-1 -> 6, base_addr:0xb7f6baa0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
munmap(0xb7f6c000, 121023) = 0
brk(0) = 0x8ccc000
brk(0x8ced000) = 0x8ced000
open("/usr/lib/locale/locale-archive", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=48516784, ...}) = 0
mmap2(NULL, 2097152, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7d6b000
close(3) = 0
umask(022) = 022
getuid32() = 0
geteuid32() = 0
readlink("/mnt", 0xbff22760, 4096) = -1 EINVAL (Invalid argument)
readlink("/mnt/rhel01", 0xbff22760, 4096) = -1 EINVAL (Invalid argument)
umask(077) = 022
open("/etc/mtab", O_RDONLY|O_LARGEFILE) = 3
umask(022) = 077
fstat64(3, {st_mode=S_IFREG|0644, st_size=446, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7d6a000
read(3, "/dev/mapper/VolGroup00-LogVol00 "..., 4096) = 446
read(3, "", 4096) = 0
close(3) = 0
munmap(0xb7d6a000, 4096) = 0
uname({sys="Linux", node="dae-pc18.lab.netapp.com", ...}) = 0
open("/proc/mounts", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7d6a000
read(3, "rootfs / rootfs rw 0 0\n/proc /pr"..., 1024) = 548
close(3) = 0
munmap(0xb7d6a000, 4096) = 0
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
bind(3, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(111), sin_addr=inet_addr("192.168.108.101")}, 16

I have tried this on Solaris 10 and the force unmount command "umount -f" works. In other words, I am able to kill the mount on Solaris.
(In reply to comment #0)
> Version-Release number of selected component (if applicable):RHEL4.5

Component = util-linux ...
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Hey Steve - is this a global Linux issue? Does this work at all on either RHEL 4 or RHEL 5?
Is DNS or NIS involved? From the strace it appears umount is trying to talk to the portmapper on 192.168.108.101. It could be umount trying to use NIS to do a name resolution. So is 192.168.108.101 the NFS server?
No, DNS and NIS are not involved. This is a private network.
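The strace in comment #0 stops inside a blocking connect() to TCP port 111 (the portmapper) on 192.168.108.101, with no timeout in sight. As a minimal sketch, the same probe can be done from the client with an explicit timeout. The function name and the default timeout here are illustrative assumptions; this is not what umount itself does.

```python
import socket


def portmapper_reachable(host, port=111, timeout=3.0):
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


# Example call, using the address seen in the strace output:
#   portmapper_reachable("192.168.108.101")
```

If this returns False with the cable pulled, an unmount path that insists on contacting the portmapper first would be expected to stall in the same place.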
I plan to look over this today, but this strikes me as odd:

> However, this very critical for our HA environment where we testing with VCS.

...in a HA environment, shouldn't the failover be transparent? The address migrates to the new server. Why would they need to umount -f anything? Or am I not understanding what they're trying to do?
I've attempted to reproduce this here and cannot. Here's what I'm doing:

Mount a F7 host with these options from a RHEL4 client:

    rw,rsize=32768,wsize=32768,hard,intr

On the F7 box, set an iptables rule to drop all traffic from the RHEL4 host (this should simulate a cable pull):

    # iptables -A INPUT -s dhcp59-129 -j DROP

...then on the RHEL4 box, I run:

    # umount -f <mountpoint>

...it returns immediately and the NFS mount is no longer in the mount table. When I strace this "umount -f", I see no socket operations at all.

I think that there is something else going on here -- maybe something nsswitch related...

Do we have a sysreport from the client machine here? In particular, I'm interested in their versions of:

util-linux
glibc

...and the contents of nsswitch.conf.
Bikash - I'm wondering if I can open this bugzilla up to Symantec for their input. Is umount -f needed without running VCS?
Bikash, can you submit a sysreport from the client?
Here's the following:

[root@dae-pc18 ~]# rpm -qa | grep util-linux
util-linux-2.12a-16.EL4.11
[root@dae-pc18 ~]# rpm -qa | grep glibc
glibc-utils-2.3.4-2.13
glibc-kernheaders-2.4-9.1.98.EL
glibc-2.3.4-2.13
glibc-devel-2.3.4-2.13
compat-glibc-headers-2.3.2-95.30
glibc-common-2.3.4-2.13
glibc-headers-2.3.4-2.13
compat-glibc-2.3.2-95.30
glibc-profile-2.3.4-2.13
[root@dae-pc18 ~]# cat /etc/nsswitch.conf
#
# /etc/nsswitch.conf
#
# An example Name Service Switch config file. This file should be
# sorted with the most-used services at the beginning.
#
# The entry '[NOTFOUND=return]' means that the search for an
# entry should stop if the search in the previous entry turned
# up nothing. Note that if the search failed due to some other reason
# (like no NIS server responding) then the search continues with the
# next entry.
#
# Legal entries are:
#
#   nis or yp          Use NIS (NIS version 2), also called YP
#   dns                Use DNS (Domain Name Service)
#   files              Use the local files
#   db                 Use the local database (.db) files
#   compat             Use NIS on compat mode
#   hesiod             Use Hesiod for user lookups
#   ldap               Use LDAP (only if nss_ldap is installed)
#   nisplus or nis+    Use NIS+ (NIS version 3), unsupported
#   [NOTFOUND=return]  Stop searching if not found so far
#
# To use db, put the "db" in front of "files" for entries you want to be
# looked up first in the databases
#
# Example:
#passwd: db files ldap nis
#shadow: db files ldap nis
#group: db files ldap nis

passwd: files
shadow: files
group: files

#hosts: db files ldap nis dns
hosts: files dns

# Example - obey only what ldap tells us...
#services: ldap [NOTFOUND=return] files
#networks: ldap [NOTFOUND=return] files
#protocols: ldap [NOTFOUND=return] files
#rpc: ldap [NOTFOUND=return] files
#ethers: ldap [NOTFOUND=return] files

bootparams: files
ethers: files
netmasks: files
networks: files
protocols: files
rpc: files
services: files
netgroup: files
publickey: files
automount: files
aliases: files
[root@dae-pc18 ~]#
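To check quickly whether any lookup in a posted nsswitch.conf can leave the machine, something like the following sketch may help. The function names and the set of sources treated as "network" are assumptions made for illustration, taken from the legal entries listed in the file's own comments.

```python
def parse_nsswitch(text):
    """Map each database name to its list of sources, ignoring comments."""
    databases = {}
    for raw_line in text.splitlines():
        # Drop everything after '#', which starts a comment.
        line = raw_line.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        name, sources = line.split(":", 1)
        databases[name.strip()] = sources.split()
    return databases


def remote_lookup_sources(databases, database="hosts"):
    """Return the sources of a database that need a network service."""
    # Assumed classification, based on the comment block in nsswitch.conf.
    network_sources = {"dns", "nis", "yp", "nisplus", "nis+", "ldap", "hesiod"}
    return [s for s in databases.get(database, []) if s in network_sources]
```

Run against the file as posted, this would report dns for the hosts database and nothing for rpc or services, which all use files only.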
That util-linux is pretty old. From the changelog:

* Wed Jul 20 2005 Karel Zak <kzak> 2.12a-16.EL4.11

...can you test on a more recent version? Maybe something from this year? I think the one that shipped with 4.5 was:

2.12a-16.EL4.25

...also, that glibc version is pretty old too. I think current ones are:

2.3.4-2.36

...my guess is that the difference in versions is likely why you're seeing a portmap connection attempt on unmount and I am not.
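For comparing the package strings quoted in this thread, a rough sketch follows. It is a simplified approximation and not RPM's actual comparison algorithm (no epochs, no tilde handling); the function names are made up for illustration, and rpm's own tooling remains the authority.

```python
import re


def split_version(version):
    """Split a version or release string into digit and letter segments."""
    return re.findall(r"\d+|[A-Za-z]+", version)


def compare_versions(a, b):
    """Return -1, 0, or 1; a simplified approximation of RPM ordering."""
    seg_a, seg_b = split_version(a), split_version(b)
    for x, y in zip(seg_a, seg_b):
        if x.isdigit() and y.isdigit():
            x_key, y_key = int(x), int(y)
        elif x.isdigit() != y.isdigit():
            # A numeric segment is treated as newer than an alphabetic one.
            return 1 if x.isdigit() else -1
        else:
            x_key, y_key = x, y
        if x_key != y_key:
            return 1 if x_key > y_key else -1
    # All shared segments are equal: the string with more segments wins.
    return (len(seg_a) > len(seg_b)) - (len(seg_a) < len(seg_b))
```

With this sketch, compare_versions("2.12a-16.EL4.11", "2.12a-16.EL4.25") returns -1, which matches the reading that the installed util-linux predates the one believed to ship with 4.5.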
This has got nothing to do with VCS. I have tried this scenario where I pull the cable between the NFS client and the filer and then run the command "umount -f" to do a forced unmount. The command hangs forever. I have tried this on a Solaris NFS client accessing the same filer and it works. However, we are going to upgrade the client to RHEL4.5 later today and run the tests.
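When re-running the test after the upgrade, it may be convenient to wrap the unmount in a timeout so that a hang shows up as a result rather than a stuck terminal. The helper below is a sketch under that assumption; its name and the timeout value are illustrative. One caveat: the child is killed on timeout, which works for a process blocked in connect() as in the strace above, but a process in uninterruptible sleep cannot be killed and the call would then block anyway.

```python
import subprocess


def run_with_timeout(command, timeout_seconds):
    """Run command; return its exit code, or None if it exceeded the timeout."""
    try:
        completed = subprocess.run(command, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return None
    return completed.returncode


# Example call, using the mount point from comment #0:
#   run_with_timeout(["umount", "-f", "/mnt/rhel01"], 60)
```

A return value of None would correspond to the hang reported here, and 0 to the immediate success seen in the attempted reproduction.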
Setting to NEEDINFO reporter pending testing with 4.5
Our internal tests with RHEL4.5 have been successful so far.
Successful in that umount -f now works? If so, should we just close this as NOTABUG or CURRENTRELEASE?
Bikash, I'm going to close this as fixed in 4.5.