Bug 1858930

Summary: nfs mounts will block cloud instances with cloud-init from starting up
Product: Red Hat Enterprise Linux 7
Reporter: anhvo
Component: nfs-utils
Assignee: Steve Dickson <steved>
Status: CLOSED WONTFIX
QA Contact: Yongcheng Yang <yoyang>
Severity: high
Priority: unspecified
Version: 7.8
CC: xzhou
Target Milestone: rc
Hardware: All
OS: Linux
Last Closed: 2020-11-11 21:38:12 UTC
Type: Bug

Description anhvo 2020-07-20 19:42:41 UTC
Description of problem:

Current RHEL images on Azure that are provisioned with cloud-init time out during a stop/start if an NFSv3 mount entry is present in /etc/fstab.

Steps to Reproduce:
1. On Microsoft Azure, deploy the latest RHEL 7.8 image (RedHat:RHEL:7.8:7.8.2020050910 as of this writing).
2. Add an NFSv3 mount entry to /etc/fstab.
3. Stop and then start the VM through the Azure portal.

Actual results:
The VM does not start properly (it hangs during boot).

Expected results:
The VM starts properly.

Additional info:
Our investigation found the following cause:
1) During a stop/start, cloud-init runs mount -a to mount all entries in /etc/fstab. This is necessary because the ephemeral resource disk must be reformatted after a stop/start, since the VM will likely land on a new host.
2) cloud-init.service itself runs before network-online.target, while rpc-statd.service and rpc-statd-notify.service, which are required to bring up the NFSv3 mount, run after network-online.target. The mount therefore hangs: cloud-init waits for the NFS mount, the mount waits for rpc-statd, and rpc-statd waits for network-online.target, which cannot be reached until cloud-init finishes.
3) Changing the rpc-statd*.service units to run After=network.target instead of After=network-online.target makes the problem go away.
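The ordering problem in items 2 and 3 can be illustrated with a small dependency model. This is a hypothetical, simplified sketch built only from the ordering relationships stated in this bug, not parsed from the unit files actually shipped in RHEL 7.8. The node "nfs-mount" is invented here to stand for the NFSv3 fstab entry that cloud-init mounts via mount -a. The sketch uses Python's standard graphlib module (Python 3.9+) to look for a cycle; in this model, a cycle corresponds to the boot hang.

```python
from __future__ import annotations

from graphlib import CycleError, TopologicalSorter


def build_graph(statd_after: str) -> dict[str, set[str]]:
    """Map each node to the set of nodes that must finish before it.

    Simplified model of this bug report, not parsed from real unit files.
    "nfs-mount" is a hypothetical node for the NFSv3 fstab entry.
    """
    return {
        # network-online.target is ordered after network.target, and
        # cloud-init.service is ordered Before=network-online.target.
        "network-online.target": {"network.target", "cloud-init.service"},
        # The statd units are ordered after whichever target is configured.
        "rpc-statd.service": {statd_after},
        "rpc-statd-notify.service": {statd_after},
        # The NFSv3 mount cannot complete until rpc-statd is running.
        "nfs-mount": {"rpc-statd.service"},
        # cloud-init runs "mount -a", so it cannot finish before the mount does.
        "cloud-init.service": {"nfs-mount"},
    }


def boot_order(statd_after: str) -> list[str] | None:
    """Return one valid start order, or None if the model contains a cycle."""
    sorter = TopologicalSorter(build_graph(statd_after))
    try:
        return list(sorter.static_order())
    except CycleError as err:
        # err.args[1] lists the nodes on the detected cycle.
        print("ordering cycle:", " -> ".join(err.args[1]))
        return None


if __name__ == "__main__":
    # Ordering described in the bug: statd after network-online.target.
    print(boot_order("network-online.target"))
    # Workaround from item 3: statd after network.target.
    print(boot_order("network.target"))
```

With After=network-online.target the model reports a cycle through cloud-init.service, nfs-mount, rpc-statd.service and network-online.target. With After=network.target it produces a valid order. The model only restates the reasoning above; it does not show whether ordering statd after network.target is safe in general.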

There is a similar bug for Fedora (not related to cloud-init)
https://bugzilla.redhat.com/show_bug.cgi?id=1183293

Comment 2 Yongcheng Yang 2020-07-21 01:22:05 UTC
(In reply to anhvo from comment #0)
...
> There is a similar bug for Fedora (not related to cloud-init)
> https://bugzilla.redhat.com/show_bug.cgi?id=1183293

Please note that the fix in that bug (changing to network.target) was later reverted.

See http://git.linux-nfs.org/?p=steved/nfs-utils.git;a=commit;h=9d4fc3fb5133be2df69fa380f80d1c660827fd1b

Comment 3 anhvo 2020-08-04 20:14:03 UTC
Would it be better, then, to start After NetworkManager-wait-online.service?

Comment 4 Yongcheng Yang 2020-08-05 07:28:54 UTC
(In reply to anhvo from comment #3)
> Would it be better, then, to start After NetworkManager-wait-online.service?

I don't know much about it.

But I just found this in nm-online(1):

This tool is not very useful to call directly. It is however used by NetworkManager-wait-online.service with --wait-for-startup argument. This is used to delay the service and indirectly network-online.target, until networking is up. *Don't* order your own systemd services after NetworkManager-wait-online.service *directly*. Instead if necessary, order your services after *network-online.target*. Even better is to have your services react to network changes dynamically and don't order them with respect to network-online.target at all.

Comment 6 Chris Williams 2020-11-11 21:38:12 UTC
Red Hat Enterprise Linux 7 shipped its final minor release on September 29th, 2020. 7.9 was the last minor release scheduled for RHEL 7.
From initial triage, it does not appear that the remaining Bugzillas meet the inclusion criteria for Maintenance Support 2 Phase, so they will now be closed.

From the RHEL life cycle page:
https://access.redhat.com/support/policy/updates/errata#Maintenance_Support_2_Phase
"During Maintenance Support 2 Phase for Red Hat Enterprise Linux version 7, Red Hat defined Critical and Important impact Security Advisories (RHSAs) and selected (at Red Hat discretion) Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available."

If this BZ was closed in error and meets the above criteria, please reopen it, flag it for 7.9.z, provide suitable business and technical justifications, and follow the process for Accelerated Fixes:
https://source.redhat.com/groups/public/pnt-cxno/pnt_customer_experience_and_operations_wiki/support_delivery_accelerated_fix_release_handbook  

Feature requests can be reopened and moved to RHEL 8 if the desired functionality is not already present in the product.

Please reach out to the applicable Product Experience Engineer[0] if you have any questions or concerns.  

[0] https://bugzilla.redhat.com/page.cgi?id=agile_component_mapping.html&product=Red+Hat+Enterprise+Linux+7