Bug 241306 - [NetApp-N 4.6 bug] umount -f hangs on a mount point
Summary: [NetApp-N 4.6 bug] umount -f hangs on a mount point
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: util-linux
Version: 4.4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
: ---
Assignee: Jeff Layton
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 217206
 
Reported: 2007-05-24 22:41 UTC by Bikash
Modified: 2007-11-17 01:14 UTC

Fixed In Version: 4.5
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-08-20 15:28:31 UTC
Target Upstream Version:
Embargoed:



Description Bikash 2007-05-24 22:41:01 UTC
Description of problem: The "umount -f" command hangs on a mount point on RHEL 4,
kernel version 2.6.9-55.ELsmp.

Version-Release number of selected component (if applicable):RHEL4.5


How reproducible:
1) Create a volume /vol/vol0 on a filer and edit the export list so that it
can be mounted from an NFS client running the aforementioned OS.
2) The mount options being used are
(rw,bg,hard,intr,rsize=32768,wsize=32768,proto=tcp,nfsvers=3,addr=192.168.108.101).
3) Disconnect the cable between the Linux host and the filer that carries the
mount traffic.
4) The NFS client on the Linux host times out.
5) Run the forced unmount command "umount -f" on the Linux host. It should
forcibly unmount the volume on the client, but it does not. The "umount -f"
command hangs with the following trace.
 

[root@dae-pc18 ~]# strace umount -f /mnt/rhel01

execve("/bin/umount", ["umount", "-f", "/mnt/rhel01"], [/* 23 vars */]) = 0

uname({sys="Linux", node="dae-pc18.lab.netapp.com", ...}) = 0

brk(0)                                  = 0x8ccc000

access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)

open("/etc/ld.so.cache", O_RDONLY)      = 3

fstat64(3, {st_mode=S_IFREG|0644, st_size=121023, ...}) = 0

old_mmap(NULL, 121023, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f6c000

close(3)                                = 0

open("/lib/tls/libc.so.6", O_RDONLY)    = 3

read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\20_w\000"..., 512) = 512

fstat64(3, {st_mode=S_IFREG|0755, st_size=1454462, ...}) = 0

old_mmap(0x761000, 1219772, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x761000

old_mmap(0x885000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x124000) = 0x885000

old_mmap(0x889000, 7356, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x889000

close(3)                                = 0

old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f6b000

mprotect(0x885000, 4096, PROT_READ)     = 0

mprotect(0x759000, 4096, PROT_READ)     = 0

set_thread_area({entry_number:-1 -> 6, base_addr:0xb7f6baa0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0

munmap(0xb7f6c000, 121023)              = 0

brk(0)                                  = 0x8ccc000

brk(0x8ced000)                          = 0x8ced000

open("/usr/lib/locale/locale-archive", O_RDONLY|O_LARGEFILE) = 3

fstat64(3, {st_mode=S_IFREG|0644, st_size=48516784, ...}) = 0

mmap2(NULL, 2097152, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7d6b000

close(3)                                = 0

umask(022)                              = 022

getuid32()                              = 0

geteuid32()                             = 0

readlink("/mnt", 0xbff22760, 4096)      = -1 EINVAL (Invalid argument)

readlink("/mnt/rhel01", 0xbff22760, 4096) = -1 EINVAL (Invalid argument)

umask(077)                              = 022

open("/etc/mtab", O_RDONLY|O_LARGEFILE) = 3

umask(022)                              = 077

fstat64(3, {st_mode=S_IFREG|0644, st_size=446, ...}) = 0

mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7d6a000

read(3, "/dev/mapper/VolGroup00-LogVol00 "..., 4096) = 446

read(3, "", 4096)                       = 0

close(3)                                = 0

munmap(0xb7d6a000, 4096)                = 0

uname({sys="Linux", node="dae-pc18.lab.netapp.com", ...}) = 0

open("/proc/mounts", O_RDONLY)          = 3

fstat64(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0

mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7d6a000

read(3, "rootfs / rootfs rw 0 0\n/proc /pr"..., 1024) = 548

close(3)                                = 0

munmap(0xb7d6a000, 4096)                = 0

socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 3

bind(3, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0

connect(3, {sa_family=AF_INET, sin_port=htons(111), sin_addr=inet_addr("192.168.108.101")}, 16

I have tried this on Solaris 10, and there the forced unmount command
"umount -f" works. In other words, I am able to kill the mount on Solaris.
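
The trace above ends inside a connect() call that never prints a return value. In a longer saved trace, one way to find such a blocking call is to list the lines that have no result. This is a minimal sketch, assuming the trace was saved to a file with `strace -o`; the function name `unfinished_syscalls` and the file name `umount.trace` are illustrative and not part of strace.

```sh
#!/bin/sh
# Hypothetical helper: list syscalls in a saved strace log that never returned.
# A completed call ends with ") = <result>"; a call the process is still
# blocked in has no " = " part after its closing parenthesis.
unfinished_syscalls() {
    # $1: log file written by e.g. `strace -o umount.trace umount -f /mnt/rhel01`
    # Drop completed calls, blank lines, and strace's own "+++"/"---" notes.
    grep -Ev '\) += |^$|^(\+\+\+|---) ' "$1"
}

# Usage (not run here):
#   unfinished_syscalls umount.trace
```

Applied to the trace in this report, only the connect() to port 111 on 192.168.108.101 should remain.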






Comment 2 Karel Zak 2007-05-29 07:35:57 UTC
(In reply to comment #0)
> Version-Release number of selected component (if applicable):RHEL4.5

 Component = util-linux ...

Comment 6 RHEL Program Management 2007-05-29 20:44:20 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 7 Andrius Benokraitis 2007-06-11 15:36:16 UTC
Hey Steve - is this a global Linux issue? Does this work at all on either RHEL
4 or RHEL 5?

Comment 8 Steve Dickson 2007-06-11 15:59:59 UTC
Is DNS or NIS involved? From the strace it appears umount is trying
to talk to the portmapper on 192.168.108.101. It could be umount
trying to use NIS to do a name resolution. So is 192.168.108.101
the NFS server?
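
Whether the portmapper port answers at all can be checked separately from umount, with an upper bound on the wait. The following is a minimal sketch, not something taken from this report: it assumes a system with GNU coreutils `timeout` and a bash built with `/dev/tcp` support (neither is guaranteed on the RHEL 4 client), and `probe_tcp` is a hypothetical helper name.

```sh
#!/bin/sh
# Hypothetical helper: attempt a TCP connect, but give up after a fixed
# number of seconds so an unreachable server cannot block the caller.
# Assumes GNU coreutils `timeout` and a bash with /dev/tcp redirections.
probe_tcp() {
    # $1: host, $2: port, $3: maximum seconds to wait (default 5)
    # Inside `bash -c`, $0 is the host and $1 is the port.
    timeout "${3:-5}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Usage (address and port are the ones seen in the strace; not run here):
#   probe_tcp 192.168.108.101 111 5 && echo "portmapper reachable" || echo "no answer"
```

A non-zero exit status means the connection was refused, timed out, or could not be attempted.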

Comment 9 Bikash 2007-06-14 01:02:21 UTC
No, DNS and NIS are not involved. This is a private network.

Comment 10 Jeff Layton 2007-06-18 11:40:23 UTC
I plan to look over this today, but this strikes me as odd:

> However, this very critical for our HA environment where we testing with VCS.

...in a HA environment, shouldn't the failover be transparent? The address
migrates to the new server. Why would they need to umount -f anything? Or am I
not understanding what they're trying to do?


Comment 11 Jeff Layton 2007-06-18 12:35:23 UTC
I've attempted to reproduce this here and cannot. Here's what I'm doing:

Mount a F7 host with these options from a RHEL4 client:

rw,rsize=32768,wsize=32768,hard,intr

on the F7 box, set an iptables rule to drop all traffic from the RHEL4 host
(this should simulate a cable pull):

# iptables -A INPUT -s dhcp59-129 -j DROP

...then on the RHEL4 box, I run:

# umount -f <mountpoint>

...it returns immediately and the NFS mount is no longer in the mount table.
When I strace this "umount -f", I see no socket operations at all. I think that
there is something else going on here -- maybe something nsswitch related...
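
The "no longer in the mount table" check can be scripted rather than read by eye. This is a minimal sketch under the assumption that the table is in the usual /proc/mounts layout, with the mount point in the second field; `is_mounted` is a hypothetical helper name.

```sh
#!/bin/sh
# Hypothetical helper: succeed if a mount point is listed in a mount table.
is_mounted() {
    # $1: mount point, $2: mount table file (defaults to /proc/mounts)
    # Field 2 of each line is the mount point. /proc/mounts escapes spaces
    # as \040, so paths containing spaces are not handled by this sketch.
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Usage (not run here):
#   umount -f /mnt/rhel01
#   is_mounted /mnt/rhel01 && echo "still mounted" || echo "unmounted"
```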

Do we have a sysreport from the client machine here? In particular, I'm
interested in their versions of:

util-linux
glibc

...and the contents of nsswitch.conf.





Comment 13 Andrius Benokraitis 2007-06-18 15:41:36 UTC
Bikash - I'm wondering if I can open this bugzilla up to Symantec for their
input. Is umount -f needed without running VCS?

Comment 15 Andrius Benokraitis 2007-06-18 15:58:18 UTC
Bikash, can you submit a sysreport from the client?

Comment 16 Bikash 2007-06-20 02:16:41 UTC
Here is the requested information:

[root@dae-pc18 ~]# rpm -qa | grep util-linux
util-linux-2.12a-16.EL4.11


[root@dae-pc18 ~]# rpm -qa | grep glibc
glibc-utils-2.3.4-2.13
glibc-kernheaders-2.4-9.1.98.EL
glibc-2.3.4-2.13
glibc-devel-2.3.4-2.13
compat-glibc-headers-2.3.2-95.30
glibc-common-2.3.4-2.13
glibc-headers-2.3.4-2.13
compat-glibc-2.3.2-95.30
glibc-profile-2.3.4-2.13 


[root@dae-pc18 ~]# cat /etc/nsswitch.conf
#
# /etc/nsswitch.conf
#
# An example Name Service Switch config file. This file should be
# sorted with the most-used services at the beginning.
#
# The entry '[NOTFOUND=return]' means that the search for an
# entry should stop if the search in the previous entry turned
# up nothing. Note that if the search failed due to some other reason
# (like no NIS server responding) then the search continues with the
# next entry.
#
# Legal entries are:
#
#       nis or yp               Use NIS (NIS version 2), also called YP
#       dns                     Use DNS (Domain Name Service)
#       files                   Use the local files
#       db                      Use the local database (.db) files
#       compat                  Use NIS on compat mode
#       hesiod                  Use Hesiod for user lookups
#       ldap                    Use LDAP (only if nss_ldap is installed)
#       nisplus or nis+         Use NIS+ (NIS version 3), unsupported
#       [NOTFOUND=return]       Stop searching if not found so far
#

# To use db, put the "db" in front of "files" for entries you want to be
# looked up first in the databases
#
# Example:
#passwd:    db files ldap nis
#shadow:    db files ldap nis
#group:     db files ldap nis

passwd:     files
shadow:     files
group:      files

#hosts:     db files ldap nis dns
hosts:      files dns

# Example - obey only what ldap tells us...
#services:  ldap [NOTFOUND=return] files
#networks:  ldap [NOTFOUND=return] files
#protocols: ldap [NOTFOUND=return] files
#rpc:       ldap [NOTFOUND=return] files
#ethers:    ldap [NOTFOUND=return] files

bootparams: files
ethers:     files
netmasks:   files
networks:   files
protocols:  files
rpc:        files
services:   files
netgroup:   files
publickey:  files
automount:  files
aliases:    files
[root@dae-pc18 ~]#
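
For the DNS/NIS question raised earlier, the relevant part of this file is the active `hosts:` line. A minimal sketch for pulling out the sources configured for one database, skipping commented-out entries; `nss_sources` is a hypothetical helper name, not a standard tool.

```sh
#!/bin/sh
# Hypothetical helper: print the lookup sources configured for one
# nsswitch database, ignoring commented-out lines such as "#hosts: ...".
nss_sources() {
    # $1: database name (e.g. hosts), $2: config file (default /etc/nsswitch.conf)
    # Match on the first field, blank it, trim the leading space, print the rest.
    awk -v db="$1:" '$1 == db { $1 = ""; sub(/^ +/, ""); print; exit }' \
        "${2:-/etc/nsswitch.conf}"
}

# Usage (not run here):
#   nss_sources hosts
```

With the file shown above, the hosts database uses local files and DNS, and NIS does not appear in any active entry.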


Comment 17 Jeff Layton 2007-06-20 12:47:06 UTC
That util-linux is pretty old. From the changelog:

* Wed Jul 20 2005 Karel Zak <kzak> 2.12a-16.EL4.11

...can you test on a more recent version? Maybe something from this year? I think
the one that shipped with 4.5 was:

2.12a-16.EL4.25

...also, that glibc version is pretty old too. I think current ones are:

2.3.4-2.36

...my guess is that the difference in versions is likely why you're seeing a
portmap connection attempt on unmount and I am not.
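
A quick way to confirm that an installed version-release string is older than a reference one is a version-aware sort. This is a rough sketch only: it assumes GNU `sort -V`, which approximates but is not identical to RPM's own version comparison, and `is_older` is a hypothetical helper name.

```sh
#!/bin/sh
# Hypothetical helper: succeed if version string $1 sorts strictly before $2.
# Uses GNU `sort -V`; RPM's comparison rules differ in some corner cases.
is_older() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Usage with the version strings quoted in this report (not run here):
#   is_older 2.12a-16.EL4.11 2.12a-16.EL4.25 && echo "util-linux is older than the 4.5 build"
#   is_older 2.3.4-2.13 2.3.4-2.36 && echo "glibc is older than the current build"
```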


Comment 18 Bikash 2007-06-20 13:05:53 UTC
This has nothing to do with VCS. I have tried this scenario: I pull the cable
between the NFS client and the filer, then run the command "umount -f" to do
a forced unmount. The command hangs forever. I have tried this on a Solaris
NFS client accessing the same filer and it works.

However, we are going to upgrade the client to RHEL4.5 later today and run the 
tests.

Comment 19 Jeff Layton 2007-06-29 14:41:10 UTC
Setting to NEEDINFO reporter pending testing with 4.5

Comment 20 Bikash 2007-06-29 16:23:12 UTC
Our internal tests with RHEL 4.5 have been successful so far.

Comment 21 Jeff Layton 2007-06-29 16:26:53 UTC
Successful in that umount -f now works? If so, should we just close this as
NOTABUG or CURRENTRELEASE?

Comment 22 Andrius Benokraitis 2007-08-20 15:28:31 UTC
Bikash, I'm going to close this as fixed in 4.5.

