Bug 1260824

Summary: nfs mount does not mount at boot time
Product: Red Hat Enterprise Linux 7
Reporter: brendan.beveridge <brendan.beveridge>
Component: NetworkManager
Assignee: Thomas Haller <thaller>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Desktop QE <desktop-qa-list>
Severity: high
Docs Contact:
Priority: medium
Version: 7.1
CC: atragler, brendan.beveridge, btotty, dcbw, kzhang, lnykryn, rkhan, systemd-maint-list, thaller
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-05-22 11:30:04 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1203710, 1301628, 1313485
Attachments:
Description                     Flags
journalctl                      none
systemctl dependencies          none
systemd analyze blame           none
systemctl status                none
journalctl with debug enabled   none

Description brendan.beveridge@optus.com.au 2015-09-08 01:13:06 UTC
Created attachment 1071163 [details]
journalctl

Description of problem:


Version-Release number of selected component (if applicable):
systemd-208-20.el7_1.5.x86_64

How reproducible:
Varies depending on hardware.
Tested on IBM HS22 physical blade which exhibits the problem
Tested on KVM Virtual machine which does not exhibit the problem

Steps to Reproduce:
1. install os
2. install nfs-utils
3. add nfs mount to fstab
4. enable NetworkManager-wait-online
5. reboot and check systemctl status $mount.mount

Actual results:

mount.nfs: Failed to resolve server nfs.server.tld: Name or service not known

Expected results:

mount successfully at boot

Additional info:

As the journalctl log shows, the system reaches the network-online state before attempting the mount. Once I get a login on the box, I can mount the mount point manually with no issues.
Running the same config on a VM works as expected. I'm guessing NM reports the network as done when it's not.

Please let me know if there's anything else I can add.
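
One way to narrow down the gap between "network online" and the point where a manual mount starts working is to time how long the NFS server name stays unresolvable after boot. The following is a minimal sketch, not something taken from this report: the function name wait_until_resolvable and its parameters are hypothetical, and it only checks name resolution through Python's standard socket.getaddrinfo, not the mount itself.

```python
import socket
import time


def wait_until_resolvable(hostname, timeout=60.0, interval=1.0,
                          resolve=socket.getaddrinfo,
                          clock=time.monotonic, sleep=time.sleep):
    """Poll name resolution until it succeeds or `timeout` seconds pass.

    Returns the elapsed seconds at the first successful lookup,
    or None if the name never resolved within the timeout.
    """
    start = clock()
    while True:
        try:
            resolve(hostname, None)   # raises socket.gaierror on failure
            return clock() - start
        except socket.gaierror:
            pass                      # same class of failure mount.nfs reports
        if clock() - start >= timeout:
            return None
        sleep(interval)


if __name__ == "__main__":
    # Host name taken from the "Actual results" line above.
    print(wait_until_resolvable("nfs.server.tld", timeout=30.0))
```

Run early in boot (for example from a oneshot unit ordered after network-online.target), a non-zero result would support the suspicion that the network is reported ready before lookups actually work.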

Comment 1 brendan.beveridge@optus.com.au 2015-09-08 01:13:44 UTC
Created attachment 1071164 [details]
systemctl dependencies

Comment 2 brendan.beveridge@optus.com.au 2015-09-08 01:14:12 UTC
Created attachment 1071165 [details]
systemd analyze blame

Comment 3 brendan.beveridge@optus.com.au 2015-09-08 01:14:36 UTC
Created attachment 1071166 [details]
systemctl status

Comment 5 Lukáš Nykrýn 2015-09-08 07:14:28 UTC
Can you retest it with the 7.2 beta? There were some minor changes in ordering, but from our point of view the order is fine. Let's reassign this to NM.

Comment 6 brendan.beveridge@optus.com.au 2015-09-09 05:50:35 UTC
Tested with the 7.2 beta; still the same result.

The key issue is that NetworkManager claims eth0 is connected, but it doesn't seem to be:

NetworkManager[645]: <info>  (eth0): Activation: successful, device activated.
...
NetworkManager[645]: <info>  (eth0): link disconnected (deferring action for 4 seconds)
NetworkManager[645]: <info>  (eth0): link connected
NetworkManager[645]: <info>  startup complete
...
systemd[1]: Started Network Manager Wait Online.
systemd[1]: Reached target Network is Online.
systemd[1]: Starting Network is Online.
...
systemd[1]: Mounting /srv/packages...
...
systemd[1]: srv-packages.mount mount process exited, code=exited status=32
systemd[1]: Failed to mount /srv/packages.
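
The pattern in the excerpt above (a carrier drop logged after the device was reported as activated) can be picked out of a longer journal mechanically. This is a minimal sketch that assumes the NetworkManager message wording shown in this comment; the function name find_link_flaps is hypothetical, and other NetworkManager versions may word these messages differently.

```python
import re

# Message shapes copied from the excerpt above, e.g.
#   "(eth0): Activation: successful, device activated."
#   "(eth0): link disconnected (deferring action for 4 seconds)"
ACTIVATED = re.compile(r"\((?P<dev>[^)]+)\): Activation: successful, device activated")
LINK_EVENT = re.compile(r"\((?P<dev>[^)]+)\): link (?P<state>connected|disconnected)")


def find_link_flaps(lines):
    """Return (device, line) pairs for carrier drops logged after activation."""
    activated = set()
    flaps = []
    for line in lines:
        match = ACTIVATED.search(line)
        if match:
            activated.add(match.group("dev"))
            continue
        match = LINK_EVENT.search(line)
        if (match and match.group("state") == "disconnected"
                and match.group("dev") in activated):
            flaps.append((match.group("dev"), line.strip()))
    return flaps


if __name__ == "__main__":
    import sys
    # Usage: journalctl -b | python3 this_script.py
    for dev, line in find_link_flaps(sys.stdin):
        print(dev, line, sep="\t")
```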

Comment 7 brendan.beveridge@optus.com.au 2015-09-09 06:08:37 UTC
One thing we noticed is that eth0 seems to come up and then get reinitialized later, i.e.:

Sep 09 15:56:30 repo03.server.tld NetworkManager[639]: <info>  (eth0): link connected
...
Sep 09 15:56:27 repo03.server.tld NetworkManager[639]: <info>  (eth0): link disconnected (deferring action for 4 seconds)
Sep 09 15:56:30 repo03.server.tld NetworkManager[639]: <info>  (eth0): link connected


During this time in bootup the interface is pingable, then goes dead for a few seconds.

Comment 8 Thomas Haller 2015-09-09 15:57:05 UTC
The logfile in comment #1 doesn't show anything wrong. Could you please reproduce the problem with debug logging enabled?


Edit /etc/NetworkManager/NetworkManager.conf and add

[logging]
level=DEBUG
domains=ALL


and reboot.


Thank you

Comment 9 brendan.beveridge@optus.com.au 2015-09-10 00:03:21 UTC
Created attachment 1071974 [details]
journalctl with debug enabled

Comment 10 brendan.beveridge@optus.com.au 2015-09-10 00:05:46 UTC
One thing I have noticed over several reboots is that once or twice the mount succeeded at boot, which suggests some form of race condition.

This has only happened when NetworkManager is in debug logging.

Comment 11 brendan.beveridge@optus.com.au 2015-09-10 00:08:38 UTC
I've also tested on another piece of physical hardware (same model) but in a different enclosure with the same results

Comment 12 Dan Williams 2015-09-15 15:36:56 UTC
The latest logs indicate that static addressing is being used on the 'eth0' interface.  Also, NFS appears to wait for networking to become "online", which is fine (and necessary, of course).

At 09:49:13 the interface gets configured and has a carrier:

IP4_ADDRESS_0=10.255.137.16/25 10.255.137.1
IP4_NAMESERVERS=172.24.224.10 211.29.133.82 172.24.225.10

but then about 30 seconds later NFS can't find the server:

Sep 10 09:49:46 repo03.server.tld mount[702]: mount.nfs: Failed to resolve server nfs01.syd-vlan50.optusnet.com.au: Name or service not known
Sep 10 09:49:46 repo03.server.tld systemd[1]: srv-packages.mount mount process exited, code=exited status=32

One thing to try is to add GATEWAY_PING_TIMEOUT=20 to /etc/sysconfig/network-scripts/ifcfg-eth0 which will cause NM to block networking until either a ping to the gateway succeeds, or 20 seconds have elapsed.  This will tell us whether the hardware can actually talk to the network even though it has said it is ready.
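
The behaviour described above (block until a gateway ping succeeds or the timeout elapses) can be approximated outside NetworkManager to check the same thing by hand. This is a minimal sketch, not NetworkManager's implementation: the names ping_once and wait_for_gateway are hypothetical, the probe shells out to the Linux iputils ping command (its -c and -W options are assumed to be available), and the gateway address is the one from the IP4_ADDRESS_0 line above.

```python
import subprocess
import time


def ping_once(host):
    """Send one ICMP echo with the system ping; True if a reply arrived."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],   # 1 packet, 1 second wait
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_for_gateway(host, timeout=20.0, interval=1.0, probe=ping_once,
                     clock=time.monotonic, sleep=time.sleep):
    """Block until probe(host) succeeds or `timeout` seconds have elapsed.

    Returns the elapsed seconds on success, or None on timeout.
    """
    start = clock()
    while True:
        if probe(host):
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)


if __name__ == "__main__":
    print(wait_for_gateway("10.255.137.1", timeout=20.0))
```

A result of several seconds, measured right after the interface is reported as activated, would indicate that the link is not yet passing traffic at the moment networking is declared online.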

Comment 13 Dan Williams 2016-01-19 16:11:16 UTC
Brendan, any chance you can try the GATEWAY_PING_TIMEOUT option from comment 12?

Comment 14 Thomas Haller 2016-01-19 17:03:28 UTC
Hi Brendan,

We are considering whether this needs to be fixed for the future 7.3 release.

But it's unclear whether there is a real issue, whether the issue still exists or whether it can be fixed by a configuration change.


It would be appreciated if you'd be able to provide the requested information. Thanks.

Comment 18 Thomas Haller 2016-05-22 11:30:04 UTC
Hi Brendan,

I am closing this bug with "INSUFFICIENT_DATA".

As detailed in comment 12, from the logfile it looks like NetworkManager is correctly declaring the network up only after IP and DNS are configured.

It looks like at that point your environment is not yet ready for networking. NetworkManager cannot know that, and you'd have to explicitly configure connection.gateway-ping-timeout.


Please feel free to reopen the bug if you can provide more debug information, especially from testing with connection.gateway-ping-timeout.


Thank you.

Comment 19 Red Hat Bugzilla 2023-09-14 03:04:56 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days