Bug 713261

Summary: F15: nfs client can die if network connection looses link
Product: [Fedora] Fedora Reporter: Nick Allen <jnallen>
Component: nfs-utilsAssignee: Steve Dickson <steved>
Status: CLOSED WONTFIX QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: low Docs Contact:
Priority: unspecified    
Version: 15CC: ajschult784, bfields, jlayton, steved
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-08-07 18:39:35 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Nick Allen 2011-06-14 20:10:57 UTC
Description of problem:

GNOME3 session crashed for users over NFS, NFS client processes stopped when the network switch lost power (link dropped.)  Discovered by a foot on the power strip supplying the switch, then purposely reproduced a second time.  The power strip has also been moved :-)

Version-Release number of selected component (if applicable):

Fedora 15, just upgraded Sunday.  

[root@jedi log]# uname -a
Linux jedi.localdomain 2.6.38.7-30.fc15.i686 #1 SMP Fri May 27 06:02:17 UTC 2011 i686 i686 i386 GNU/Linux
[root@jedi log]# rpm -qa | grep -i nfs
nfs-utils-1.2.3-11.fc15.i686
libnfsidmap-0.24-5.fc15.i686
system-config-nfs-1.3.51-2.fc15.noarch
[root@jedi log]# rpm -qa | grep -i 2.6.38
kernel-devel-2.6.38.7-30.fc15.i686
kmod-nvidia-2.6.38.7-30.fc15.i686-270.41.06-2.fc15.1.i686
kernel-headers-2.6.38.7-30.fc15.i686
kernel-2.6.38.7-30.fc15.i686
iproute-2.6.38.1-4.fc15.i686
perf-2.6.38.7-30.fc15.i686
[root@jedi log]# 




How reproducible:

Problem discovered when power was lost to the network switch, UPS on NFS server and F15 workstation worked as expected, keeping both systems online.

/home over nfs during failure causes users to be dropped out to login prompt.

Recover - <ctrl><alt>f2, login as root.  Kill off stale user processes, NFS client processes are not online, restart NFS client and mount NFS file systems.  Return to graphical login, log back into system.

NFS client should not die following link loss, the system should recover and not require administrative intervention to get mounts back online.  The user's login session should not crash during this type of event.

Steps to Reproduce:
1.  Mount /home from an NFS server, using a single switch between them (perhaps a direct link will cause the failure)
2.  power cycle the switch between server and client
3.  sometimes, a very brief interruption will cause GNOME3 to drop the user's session
  
Actual results:

GNOME3 crashed, NFS Client crashed

Expected results:

User session hang, then resume...  NFS client should have cached IO requests

Additional info:


Some log data:

Jun 14 13:07:44 jedi kernel: [221284.016937] r8169 0000:02:09.0: eth0: link down
Jun 14 13:07:44 jedi NetworkManager[762]: <info> (eth0): carrier now OFF (device state 100, deferring action for 4 seconds)
Jun 14 13:07:48 jedi NetworkManager[762]: <info> (eth0): device state change: activated -> unavailable (reason 'carrier-changed') [100 20 40]
Jun 14 13:07:48 jedi NetworkManager[762]: <info> (eth0): deactivating device (reason: 40).
Jun 14 13:07:49 jedi NetworkManager[762]: <info> (eth0): canceled DHCP transaction, DHCP client pid 6542
Jun 14 13:07:49 jedi avahi-daemon[763]: Withdrawing address record for 192.168.1.7 on eth0.
Jun 14 13:07:49 jedi avahi-daemon[763]: Leaving mDNS multicast group on interface eth0.IPv4 with address 192.168.1.7.
Jun 14 13:07:49 jedi avahi-daemon[763]: Interface eth0.IPv4 no longer relevant for mDNS.
Jun 14 13:07:49 jedi dbus: [system] Activating service name='org.freedesktop.nm_dispatcher' (using servicehelper)
Jun 14 13:07:49 jedi dbus: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Jun 14 13:07:50 jedi ntpd[1346]: Deleting interface #7 eth0, 192.168.1.7#123, interface stats: received=51, sent=51, dropped=0, active_time=6124 secs
Jun 14 13:07:50 jedi ntpd[1346]: 64.73.32.134 interface 192.168.1.7 -> (none)
Jun 14 13:07:50 jedi ntpd[1346]: 72.14.177.132 interface 192.168.1.7 -> (none)
Jun 14 13:07:50 jedi ntpd[1346]: 149.20.68.17 interface 192.168.1.7 -> (none)
Jun 14 13:07:50 jedi ntpd[1346]: 169.229.70.201 interface 192.168.1.7 -> (none)
Jun 14 13:07:50 jedi ntpd[1346]: peers refreshed
Jun 14 13:07:59 jedi kernel: [221299.142832] r8169 0000:02:09.0: eth0: link up
Jun 14 13:07:59 jedi NetworkManager[762]: <info> (eth0): carrier now ON (device state 20)
Jun 14 13:07:59 jedi NetworkManager[762]: <info> (eth0): device state change: unavailable -> disconnected (reason 'carrier-changed') [20 30 40]
Jun 14 13:07:59 jedi NetworkManager[762]: <info> Activation (eth0) starting connection 'System eth0'
Jun 14 13:07:59 jedi NetworkManager[762]: <info> (eth0): device state change: disconnected -> prepare (reason 'none') [30 40 0]
Jun 14 13:07:59 jedi NetworkManager[762]: <info> Activation (eth0) Stage 1 of 5 (Device Prepare) scheduled...
Jun 14 13:07:59 jedi NetworkManager[762]: <info> Activation (eth0) Stage 1 of 5 (Device Prepare) started...
Jun 14 13:07:59 jedi NetworkManager[762]: <info> Activation (eth0) Stage 2 of 5 (Device Configure) scheduled...
Jun 14 13:07:59 jedi NetworkManager[762]: <info> Activation (eth0) Stage 1 of 5 (Device Prepare) complete.
Jun 14 13:07:59 jedi NetworkManager[762]: <info> Activation (eth0) Stage 2 of 5 (Device Configure) starting...
Jun 14 13:07:59 jedi NetworkManager[762]: <info> (eth0): device state change: prepare -> config (reason 'none') [40 50 0]
Jun 14 13:07:59 jedi NetworkManager[762]: <info> Activation (eth0) Stage 2 of 5 (Device Configure) successful.
Jun 14 13:07:59 jedi NetworkManager[762]: <info> Activation (eth0) Stage 3 of 5 (IP Configure Start) scheduled.
Jun 14 13:07:59 jedi NetworkManager[762]: <info> Activation (eth0) Stage 2 of 5 (Device Configure) complete.
Jun 14 13:07:59 jedi NetworkManager[762]: <info> Activation (eth0) Stage 3 of 5 (IP Configure Start) started...
Jun 14 13:07:59 jedi NetworkManager[762]: <info> (eth0): device state change: config -> ip-config (reason 'none') [50 70 0]
Jun 14 13:07:59 jedi NetworkManager[762]: <info> Activation (eth0) Beginning DHCPv4 transaction (timeout in 45 seconds)
Jun 14 13:07:59 jedi NetworkManager[762]: <info> dhclient started with pid 8131
Jun 14 13:07:59 jedi NetworkManager[762]: <info> Activation (eth0) Stage 3 of 5 (IP Configure Start) complete.
Jun 14 13:08:00 jedi dhclient[8131]: Internet Systems Consortium DHCP Client 4.2.1-P1
Jun 14 13:08:00 jedi dhclient[8131]: Copyright 2004-2011 Internet Systems Consortium.
Jun 14 13:08:00 jedi dhclient[8131]: All rights reserved.
Jun 14 13:08:00 jedi dhclient[8131]: For info, please visit https://www.isc.org/software/dhcp/
Jun 14 13:08:00 jedi dhclient[8131]:
Jun 14 13:08:00 jedi NetworkManager[762]: <info> (eth0): DHCPv4 state changed nbi -> preinit
Jun 14 13:08:00 jedi dhclient[8131]: Listening on LPF/eth0/00:14:85:ce:20:d5
Jun 14 13:08:00 jedi dhclient[8131]: Sending on   LPF/eth0/00:14:85:ce:20:d5
Jun 14 13:08:00 jedi dhclient[8131]: Sending on   Socket/fallback
Jun 14 13:08:00 jedi dhclient[8131]: DHCPREQUEST on eth0 to 255.255.255.255 port 67
Jun 14 13:08:00 jedi dhclient[8131]: DHCPACK from 192.168.1.1
Jun 14 13:08:00 jedi dhclient[8131]: bound to 192.168.1.7 -- renewal in 35603 seconds.
Jun 14 13:08:00 jedi NetworkManager[762]: <info> (eth0): DHCPv4 state changed preinit -> reboot
Jun 14 13:08:00 jedi NetworkManager[762]: <info> Activation (eth0) Stage 4 of 5 (IP4 Configure Get) scheduled...
Jun 14 13:08:00 jedi NetworkManager[762]: <info> Activation (eth0) Stage 4 of 5 (IP4 Configure Get) started...
Jun 14 13:08:00 jedi NetworkManager[762]: <info>   address 192.168.1.7
Jun 14 13:08:00 jedi NetworkManager[762]: <info>   prefix 24 (255.255.255.0)
Jun 14 13:08:00 jedi NetworkManager[762]: <info>   gateway 192.168.1.1
Jun 14 13:08:00 jedi NetworkManager[762]: <info>   nameserver '192.168.1.1'
Jun 14 13:08:00 jedi NetworkManager[762]: <info> Activation (eth0) Stage 5 of 5 (IP Configure Commit) scheduled...
Jun 14 13:08:00 jedi NetworkManager[762]: <info> Activation (eth0) Stage 4 of 5 (IP4 Configure Get) complete.
Jun 14 13:08:00 jedi NetworkManager[762]: <info> Activation (eth0) Stage 5 of 5 (IP Configure Commit) started...
Jun 14 13:08:00 jedi avahi-daemon[763]: Joining mDNS multicast group on interface eth0.IPv4 with address 192.168.1.7.
Jun 14 13:08:00 jedi avahi-daemon[763]: New relevant interface eth0.IPv4 for mDNS.
Jun 14 13:08:00 jedi avahi-daemon[763]: Registering new address record for 192.168.1.7 on eth0.IPv4.
Jun 14 13:08:01 jedi NetworkManager[762]: <info> (eth0): device state change: ip-config -> activated (reason 'none') [70 100 0]
Jun 14 13:08:01 jedi NetworkManager[762]: <info> Policy set 'System eth0' (eth0) as default for IPv4 routing and DNS.
Jun 14 13:08:01 jedi NetworkManager[762]: <info> Activation (eth0) successful, device activated.
Jun 14 13:08:01 jedi NetworkManager[762]: <info> Activation (eth0) Stage 5 of 5 (IP Configure Commit) complete.
Jun 14 13:08:01 jedi systemd[1]: Unit shared.mount entered failed state.
Jun 14 13:08:01 jedi systemd[1]: Unit install.mount entered failed state.
Jun 14 13:08:02 jedi ntpd[1346]: Listen normally on 8 eth0 192.168.1.7 UDP 123
Jun 14 13:08:02 jedi ntpd[1346]: peers refreshed
Jun 14 13:08:03 jedi ntpd[1346]: 0.0.0.0 0638 08 no_sys_peer
Jun 14 13:08:07 jedi kernel: [221306.711419] agpgart-intel 0000:00:00.0: AGP 3.0 bridge
Jun 14 13:08:07 jedi kernel: [221306.711450] agpgart-intel 0000:00:00.0: putting AGP V3 device into 8x mode
Jun 14 13:08:07 jedi kernel: [221306.711505] nvidia 0000:01:00.0: putting AGP V3 device into 8x mode
Jun 14 13:08:08 jedi systemd[1]: Unit home.mount entered failed state.

Comment 1 Fedora End Of Life 2012-08-07 18:39:37 UTC
This message is a notice that Fedora 15 is now at end of life. Fedora
has stopped maintaining and issuing updates for Fedora 15. It is
Fedora's policy to close all bug reports from releases that are no
longer maintained. At this time, all open bugs with a Fedora 'version'
of '15' have been closed as WONTFIX.

(Please note: Our normal process is to give advanced warning of this
occurring, but we forgot to do that. A thousand apologies.)

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, feel free to reopen
this bug and simply change the 'version' to a later Fedora version.

Bug Reporter: Thank you for reporting this issue and we are sorry that
we were unable to fix it before Fedora 15 reached end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora, you are encouraged to click on
"Clone This Bug" (top right of this page) and open it against that
version of Fedora.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

The process we are following is described here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping