Bug 1260824
Summary: | nfs mount does not mount at boot time | | |
---|---|---|---
Product: | Red Hat Enterprise Linux 7 | Reporter: | brendan.beveridge <brendan.beveridge>
Component: | NetworkManager | Assignee: | Thomas Haller <thaller>
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Desktop QE <desktop-qa-list>
Severity: | high | Docs Contact: |
Priority: | medium | |
Version: | 7.1 | CC: | atragler, brendan.beveridge, btotty, dcbw, kzhang, lnykryn, rkhan, systemd-maint-list, thaller
Target Milestone: | rc | |
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2016-05-22 11:30:04 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1203710, 1301628, 1313485 | |

Attachments:
Created attachment 1071164 [details]
systemctl dependencies
Created attachment 1071165 [details]
systemd analyze blame
Created attachment 1071166 [details]
systemctl status
Can you retest it with 7.2 beta? There were some minor changes in ordering. But from our point of view the order is fine. Let's reassign this to NM.

Tested with 7.2 beta, still the same result. The key issue is that NetworkManager claims eth0 is connected, but it doesn't seem to be:

NetworkManager[645]: <info> (eth0): Activation: successful, device activated.
...
NetworkManager[645]: <info> (eth0): link disconnected (deferring action for 4 seconds)
NetworkManager[645]: <info> (eth0): link connected
NetworkManager[645]: <info> startup complete
...
systemd[1]: Started Network Manager Wait Online.
systemd[1]: Reached target Network is Online.
systemd[1]: Starting Network is Online.
...
systemd[1]: Mounting /srv/packages...
...
systemd[1]: srv-packages.mount mount process exited, code=exited status=32
systemd[1]: Failed to mount /srv/packages.

One thing we noticed is that eth0 seems to come up and then get reinitialized later, i.e.:

Sep 09 15:56:30 repo03.server.tld NetworkManager[639]: <info> (eth0): link connected
...
Sep 09 15:56:27 repo03.server.tld NetworkManager[639]: <info> (eth0): link disconnected (deferring action for 4 seconds)
Sep 09 15:56:30 repo03.server.tld NetworkManager[639]: <info> (eth0): link connected

During this time in bootup the interface is pingable, then goes dead for a few seconds.

The logfile in comment #1 does not show anything obviously wrong. Could you please reproduce the problem with debug logging enabled? Edit /etc/NetworkManager/NetworkManager.conf and add

[logging]
level=DEBUG
domains=ALL

and reboot. Thank you.

Created attachment 1071974 [details]
journalctl with debug enabled
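
For anyone triaging a similar journal, the disconnect/reconnect pattern described above can be picked out mechanically. The following is only a rough sketch: the function name `find_link_flaps` and the regular expression are made up for illustration, and the pattern assumes the NetworkManager message format exactly as quoted in this bug (other versions may log differently).

```python
import re

# Assumed message format, taken from the log lines quoted in this bug:
#   NetworkManager[639]: <info> (eth0): link disconnected (deferring action for 4 seconds)
LINK_EVENT = re.compile(
    r"NetworkManager\[\d+\]: <info>\s+\((?P<dev>[^)]+)\): "
    r"link (?P<state>connected|disconnected)"
)

def find_link_flaps(lines):
    """Return (device, down_line, up_line) for each disconnect followed by a reconnect."""
    pending = {}  # device -> "link disconnected" line still waiting for a reconnect
    flaps = []
    for line in lines:
        match = LINK_EVENT.search(line)
        if match is None:
            continue
        dev, state = match.group("dev"), match.group("state")
        if state == "disconnected":
            pending[dev] = line
        elif dev in pending:
            flaps.append((dev, pending.pop(dev), line))
    return flaps

# Lines copied from the comment above, in the order they were quoted.
SAMPLE = [
    "Sep 09 15:56:30 repo03.server.tld NetworkManager[639]: <info> (eth0): link connected",
    "Sep 09 15:56:27 repo03.server.tld NetworkManager[639]: <info> (eth0): link disconnected (deferring action for 4 seconds)",
    "Sep 09 15:56:30 repo03.server.tld NetworkManager[639]: <info> (eth0): link connected",
]

if __name__ == "__main__":
    for dev, down, up in find_link_flaps(SAMPLE):
        print(dev, "flapped:")
        print("  ", down)
        print("  ", up)
```

Feeding it the output of `journalctl -b` line by line should list every carrier flap per device, which makes it easier to compare against the time the mount was attempted.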
One thing I have noticed over several reboots is that once or twice the mount succeeded at boot, which tells me some form of race condition is happening. This has only happened when NetworkManager is in debug logging.

I've also tested on another piece of physical hardware (same model) but in a different enclosure, with the same results.

The latest logs indicate that static addressing is being used on the 'eth0' interface. Also, NFS appears to wait for networking to become "online", which is fine (and necessary, of course). At 09:49:13 the interface gets configured and has a carrier:

IP4_ADDRESS_0=10.255.137.16/25 10.255.137.1
IP4_NAMESERVERS=172.24.224.10 211.29.133.82 172.24.225.10

but then 30 seconds later NFS can't find the server:

Sep 10 09:49:46 repo03.server.tld mount[702]: mount.nfs: Failed to resolve server nfs01.syd-vlan50.optusnet.com.au: Name or service not known
Sep 10 09:49:46 repo03.server.tld systemd[1]: srv-packages.mount mount process exited, code=exited status=32

One thing to try is to add

GATEWAY_PING_TIMEOUT=20

to /etc/sysconfig/network-scripts/ifcfg-eth0, which will cause NM to block networking until either a ping to the gateway succeeds or 20 seconds have elapsed. This will tell us whether the hardware can actually talk to the network even though it has said it is ready.

Brendan, any chance you can try the GATEWAY_PING_TIMEOUT option from comment 12?

Hi Brendan, we are considering whether this needs to be fixed for the future 7.3 release. But it's unclear whether there is a real issue, whether the issue still exists, or whether it can be fixed by a configuration change. It would be appreciated if you could provide the requested information. Thanks.

Hi Brendan, I am closing this bug with "INSUFFICIENT_DATA". As detailed in comment 12, from the logfile it looks like NetworkManager is correctly declaring the network up only after IP and DNS are configured. It looks like at that point your environment is not yet ready for networking. NetworkManager cannot know that, and you'd have to explicitly configure connection.gateway-ping-timeout. Please feel invited to re-open the bug if you can provide more debug information, especially from testing with connection.gateway-ping-timeout. Thank you.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days
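
To make the suggested behaviour concrete: the option is described above as "block until either a ping to the gateway succeeds, or 20 seconds have elapsed". The sketch below only illustrates that wait-or-time-out logic; it is not NetworkManager's implementation. The names `wait_for_gateway` and `ping_once` are hypothetical, and `ping_once` assumes an iputils-style `ping` that accepts `-c` and `-W`.

```python
import subprocess
import time

def ping_once(host):
    """Send one echo request; True if a reply arrived (assumes iputils ping)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def wait_for_gateway(probe, timeout_s, interval_s=1.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Block until probe() returns True or timeout_s seconds have elapsed.

    Returns True if the gateway answered, False if the wait timed out.
    """
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(interval_s, remaining))

if __name__ == "__main__":
    # Gateway address taken from the IP4_ADDRESS_0 line in this bug.
    reachable = wait_for_gateway(lambda: ping_once("10.255.137.1"), timeout_s=20)
    print("gateway reachable" if reachable else "timed out waiting for gateway")
```

Running something like this by hand right after boot on the affected blade would show roughly how long the link takes to pass traffic, which is the number a ping timeout would need to cover.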
Created attachment 1071163 [details]
journalctl

Description of problem:

Version-Release number of selected component (if applicable):
systemd-208-20.el7_1.5.x86_64

How reproducible:
Varies depending on hardware.
Tested on an IBM HS22 physical blade, which exhibits the problem.
Tested on a KVM virtual machine, which does not exhibit the problem.

Steps to Reproduce:
1. install os
2. install nfs-utils
3. add nfs mount to fstab
4. enable NetworkManager-wait-online
5. reboot and check systemctl status $mount.mount

Actual results:
mount.nfs: Failed to resolve server nfs.server.tld: Name or service not known

Expected results:
mount successfully at boot

Additional info:
As you can see in the journalctl log, it reaches the network online state before trying the mount. Once I get a login on the box I can manually mount the mount point with no issues. Running the same config on a VM works as expected. I'm guessing NM is saying the network is done when it's not.

Please let me know if there's anything else I can add.
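
A note on step 5: `$mount.mount` stands for the systemd unit generated from the fstab mount point, e.g. `/srv/packages` becomes `srv-packages.mount` in the logs above. The sketch below approximates that path-to-unit-name mapping. The function name `mount_unit_name` is made up here, and this is a simplified reading of the escaping rules in systemd.unit(5); the `systemd-escape` tool (with `--path --suffix=mount`) is the authoritative way to get the name.

```python
def mount_unit_name(path):
    """Approximate the systemd mount unit name for an absolute mount point.

    Simplified sketch: drops empty path components, maps "/" to "-", and
    escapes any byte outside [A-Za-z0-9:_.] (or a leading ".") as \\xNN.
    """
    parts = [p for p in path.split("/") if p]
    if not parts:
        return "-.mount"  # the root file system
    trimmed = "/".join(parts)
    out = []
    for index, ch in enumerate(trimmed):
        if ch == "/":
            out.append("-")
        elif (ch.isascii() and ch.isalnum()) or ch in ":_" or (ch == "." and index > 0):
            out.append(ch)
        else:
            # C-style hex escape, one per UTF-8 byte
            out.extend("\\x%02x" % byte for byte in ch.encode("utf-8"))
    return "".join(out) + ".mount"

if __name__ == "__main__":
    print(mount_unit_name("/srv/packages"))
```

The resulting name is what goes after `systemctl status` in step 5.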