Bug 1260824 - nfs mount does not mount at boot time [NEEDINFO]
Status: CLOSED INSUFFICIENT_DATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: NetworkManager
Version: 7.1
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Thomas Haller
QA Contact: Desktop QE
Depends On:
Blocks: 1203710 1301628 1313485
Reported: 2015-09-07 21:13 EDT by brendan.beveridge@optus.com.au
Modified: 2016-05-22 07:30 EDT (History)
CC List: 9 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-05-22 07:30:04 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
dcbw: needinfo? (brendan.beveridge)


Attachments
journalctl (132.68 KB, text/x-vhdl)
2015-09-07 21:13 EDT, brendan.beveridge@optus.com.au
systemctl dependencies (712 bytes, text/plain)
2015-09-07 21:13 EDT, brendan.beveridge@optus.com.au
systemd analyze blame (1.70 KB, text/plain)
2015-09-07 21:14 EDT, brendan.beveridge@optus.com.au
systemctl status (703 bytes, text/plain)
2015-09-07 21:14 EDT, brendan.beveridge@optus.com.au
journalctl with debug enabled (291.99 KB, text/x-vhdl)
2015-09-09 20:03 EDT, brendan.beveridge@optus.com.au

Description brendan.beveridge@optus.com.au 2015-09-07 21:13:06 EDT
Created attachment 1071163 [details]
journalctl

Description of problem:


Version-Release number of selected component (if applicable):
systemd-208-20.el7_1.5.x86_64

How reproducible:
Varies depending on hardware.
Tested on IBM HS22 physical blade which exhibits the problem
Tested on KVM Virtual machine which does not exhibit the problem

Steps to Reproduce:
1. install os
2. install nfs-utils
3. add nfs mount to fstab
4. enable NetworkManager-wait-online
5. reboot and check systemctl status $mount.mount
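In step 5, `$mount.mount` stands for the systemd unit generated from the fstab entry; later log excerpts in this report show `/srv/packages` becoming `srv-packages.mount`. The following is a minimal Python sketch of that path-to-unit-name mapping, assuming the usual systemd escaping rules. The function name `mount_unit_name` is made up for illustration, and where the `systemd-escape` tool is available it should be preferred over this approximation.

```python
import string

# Characters systemd leaves unescaped in unit names (assumed rule set).
_ALLOWED = set(string.ascii_letters + string.digits + ":_.")


def mount_unit_name(mount_point: str) -> str:
    """Approximate the systemd mount unit name for a mount point path."""
    # Drop empty components so leading, trailing and repeated slashes vanish.
    parts = [p for p in mount_point.split("/") if p]
    if not parts:
        return "-.mount"  # the root file system
    path = "/".join(parts)
    out = []
    for index, byte in enumerate(path.encode("utf-8")):
        ch = chr(byte)
        if ch == "/":
            out.append("-")
        elif ch in _ALLOWED and not (ch == "." and index == 0):
            out.append(ch)
        else:
            # Everything else, including a literal '-', becomes \xNN.
            out.append("\\x%02x" % byte)
    return "".join(out) + ".mount"
```

With this, `systemctl status` would be given the result of `mount_unit_name("/srv/packages")` for the mount used in this report.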

Actual results:

mount.nfs: Failed to resolve server nfs.server.tld: Name or service not known

Expected results:

mount successfully at boot

Additional info:

As you can see in the journalctl log, it reaches the network-online state before trying the mount. Once I get a login on the box I can manually mount the mount point with no issues.
Running the same config on a VM works as expected. I'm guessing NM is saying the network is done when it's not.

Please let me know if there's anything else I can add.
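The error under "Actual results" is a name-resolution failure rather than an NFS protocol error, so one way to narrow the problem down is to check whether the server name resolves at the moment the mount is attempted. A minimal Python sketch, assuming the standard library resolver is representative of what `mount.nfs` sees; `can_resolve` and its `resolver` parameter are hypothetical names introduced here so the check can be exercised without a network:

```python
import socket


def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return True if hostname resolves, False on a resolver error."""
    try:
        resolver(hostname, None)
    except socket.gaierror:
        # Same class of failure as "Name or service not known".
        return False
    return True
```

Calling `can_resolve("nfs.server.tld")` from a unit ordered after `network-online.target` would show whether DNS is usable at that point in boot.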
Comment 1 brendan.beveridge@optus.com.au 2015-09-07 21:13:44 EDT
Created attachment 1071164 [details]
systemctl dependencies
Comment 2 brendan.beveridge@optus.com.au 2015-09-07 21:14:12 EDT
Created attachment 1071165 [details]
systemd analyze blame
Comment 3 brendan.beveridge@optus.com.au 2015-09-07 21:14:36 EDT
Created attachment 1071166 [details]
systemctl status
Comment 5 Lukáš Nykrýn 2015-09-08 03:14:28 EDT
Can you retest it with the 7.2 beta? There were some minor changes in ordering. But from our point of view the order is fine. Let's reassign this to NM.
Comment 6 brendan.beveridge@optus.com.au 2015-09-09 01:50:35 EDT
Tested with the 7.2 beta, still the same result.

The key issue is that NetworkManager claims eth0 is connected, but it doesn't seem to be:

NetworkManager[645]: <info>  (eth0): Activation: successful, device activated.
...
NetworkManager[645]: <info>  (eth0): link disconnected (deferring action for 4 seconds)
NetworkManager[645]: <info>  (eth0): link connected
NetworkManager[645]: <info>  startup complete
...
systemd[1]: Started Network Manager Wait Online.
systemd[1]: Reached target Network is Online.
systemd[1]: Starting Network is Online.
...
systemd[1]: Mounting /srv/packages...
...
systemd[1]: srv-packages.mount mount process exited, code=exited status=32
systemd[1]: Failed to mount /srv/packages.
Comment 7 brendan.beveridge@optus.com.au 2015-09-09 02:08:37 EDT
One thing we noticed is that eth0 seems to come up and then get reinitialized later, i.e.:

Sep 09 15:56:30 repo03.server.tld NetworkManager[639]: <info>  (eth0): link connected
...
Sep 09 15:56:27 repo03.server.tld NetworkManager[639]: <info>  (eth0): link disconnected (deferring action for 4 seconds)
Sep 09 15:56:30 repo03.server.tld NetworkManager[639]: <info>  (eth0): link connected


During this time in bootup the interface is pingable, then goes dead for a few seconds.
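The pattern described in comments 6 and 7 (activation reported as successful, then the link dropping shortly afterwards) can be searched for mechanically in a saved journal. A minimal Python sketch, assuming log lines shaped like the excerpts quoted in this report; the function name and the sample list are illustrative only:

```python
import re

# Matches "(eth0): <event>" for the three events of interest.
LINK_EVENT = re.compile(
    r"\((?P<dev>[^)]+)\): "
    r"(?P<event>Activation: successful|link disconnected|link connected)"
)

# Lines copied from the excerpt in comment 6.
SAMPLE_LOG = [
    "NetworkManager[645]: <info>  (eth0): Activation: successful, device activated.",
    "NetworkManager[645]: <info>  (eth0): link disconnected (deferring action for 4 seconds)",
    "NetworkManager[645]: <info>  (eth0): link connected",
    "NetworkManager[645]: <info>  startup complete",
]


def carrier_flaps_after_activation(lines, device="eth0"):
    """Return True if the device lost carrier after activation succeeded."""
    activated = False
    for line in lines:
        match = LINK_EVENT.search(line)
        if not match or match.group("dev") != device:
            continue
        event = match.group("event")
        if event == "Activation: successful":
            activated = True
        elif event == "link disconnected" and activated:
            return True
    return False
```

Feeding it the lines of a `journalctl -b` dump would flag boots where the carrier dropped after NetworkManager had already declared the device activated.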
Comment 8 Thomas Haller 2015-09-09 11:57:05 EDT
The logfile in comment #1 doesn't show anything wrong. Could you please reproduce the problem with debug logging enabled?


Edit /etc/NetworkManager/NetworkManager.conf and add

[logging]
level=DEBUG
domains=ALL


and reboot.


Thank you
Comment 9 brendan.beveridge@optus.com.au 2015-09-09 20:03:21 EDT
Created attachment 1071974 [details]
journalctl with debug enabled
Comment 10 brendan.beveridge@optus.com.au 2015-09-09 20:05:46 EDT
One thing I have noticed over several reboots is that once or twice the mount succeeded at boot, which suggests some form of race condition.

This has only happened when NetworkManager is in debug logging.
Comment 11 brendan.beveridge@optus.com.au 2015-09-09 20:08:38 EDT
I've also tested on another piece of physical hardware (same model) but in a different enclosure with the same results
Comment 12 Dan Williams 2015-09-15 11:36:56 EDT
The latest logs indicate that static addressing is being used on the 'eth0' interface.  Also, NFS appears to wait for networking to become "online", which is fine (and necessary, of course).

At 09:49:13  the interface gets configured and has a carrier:

IP4_ADDRESS_0=10.255.137.16/25 10.255.137.1
IP4_NAMESERVERS=172.24.224.10 211.29.133.82 172.24.225.10

but then 30 seconds later NFS can't find the server:

Sep 10 09:49:46 repo03.server.tld mount[702]: mount.nfs: Failed to resolve server nfs01.syd-vlan50.optusnet.com.au: Name or service not known
Sep 10 09:49:46 repo03.server.tld systemd[1]: srv-packages.mount mount process exited, code=exited status=32

One thing to try is to add GATEWAY_PING_TIMEOUT=20 to /etc/sysconfig/network-scripts/ifcfg-eth0 which will cause NM to block networking until either a ping to the gateway succeeds, or 20 seconds have elapsed.  This will tell us whether the hardware can actually talk to the network even though it has said it is ready.
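The behaviour described for GATEWAY_PING_TIMEOUT (block until a gateway ping succeeds or the timeout elapses, whichever comes first) can be illustrated with a short Python sketch. This is not NetworkManager's implementation; `wait_for_gateway` and `ping_once` are hypothetical helpers, and the `ping -c 1 -W 1` invocation assumes the iputils flavour of `ping`.

```python
import subprocess
import time


def ping_once(host):
    """Send one ICMP echo with the system ping binary; True on a reply."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_for_gateway(probe, timeout=20.0, interval=1.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Call probe() until it returns True or timeout seconds have passed.

    Returns True if the probe succeeded, False if the timeout expired.
    """
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

For the addressing quoted above, a manual check could look like `wait_for_gateway(lambda: ping_once("10.255.137.1"))`. A result of False after 20 seconds would suggest the link is not passing traffic even though the interface reports a carrier.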
Comment 13 Dan Williams 2016-01-19 11:11:16 EST
Brendan, any chance you can try the GATEWAY_PING_TIMEOUT option from comment 12?
Comment 14 Thomas Haller 2016-01-19 12:03:28 EST
Hi Brendan,

We are considering whether this needs to be fixed for the future 7.3 release.

But it's unclear whether there is a real issue, whether the issue still exists or whether it can be fixed by a configuration change.


It would be appreciated if you'd be able to provide the requested information. Thanks.
Comment 18 Thomas Haller 2016-05-22 07:30:04 EDT
Hi Brendan,

I am closing this bug with "INSUFFICIENT_DATA".

As detailed in comment 12, from the logfile it looks like NetworkManager is correctly declaring the network up only after IP and DNS are configured.

It looks like at that point your environment is not yet ready for networking. NetworkManager cannot know that, and you'd have to explicitly configure connection.gateway-ping-timeout.


Please feel free to reopen the bug if you can provide more debug information, especially from testing with connection.gateway-ping-timeout.


Thank you.
