Bug 1773637

Summary: cloud-init may rewrite network interface config during subsequent reboots if config-drive is not detected.
Product: Red Hat Enterprise Linux 7 Reporter: Matt Flusche <mflusche>
Component: cloud-init Assignee: Eduardo Otubo <eterrell>
Status: CLOSED CURRENTRELEASE QA Contact: xiachen
Severity: high Docs Contact:
Priority: high    
Version: 7.0 CC: carolgrey98, dhill, huzhao, jgreguske, leiwang, linl, merry678garcia, mkalinin, ribarry, xiachen, yacao, yuxisun
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-11-05 12:20:01 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Matt Flusche 2019-11-18 15:27:40 UTC
Description of problem:

For unknown reasons, cloud-init may fail to detect a config-drive during boot and then reset the networking to the initial (first-boot) config.

Reference for this issue:  https://bugzilla.redhat.com/show_bug.cgi?id=1760806

It would seem that cloud-init should still detect it is not a first boot based on the contents of /var/lib/cloud/instance and skip the network config.


Version-Release number of selected component (if applicable):
cloud-init-18.5-3.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. create cloud instance with config-drive.
2. modify config of primary interface.
3. reboot and verify that the interface config does not change; /var/log/cloud-init.log should contain a message like the following:

2019-11-18 14:45:47,532 - stages.py[DEBUG]: No network config applied. Neither a new instance nor datasource network update on 'System boot' event

4. remove cloud-init partition.
5. reboot and verify cloud-init rewrites interface's networking config.
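
The log check in step 3 can be scripted. The following is a minimal sketch, not part of cloud-init: the helper name `network_config_skipped` and its `since` parameter are made up for illustration, and the marker text is taken from the log line quoted above. It assumes every log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp, as in that example.

```python
# Marker text copied from the log line quoted in step 3.
NO_NETCFG_MARKER = "No network config applied"


def network_config_skipped(log_text: str, since: str = "") -> bool:
    """Return True if a log line at or after `since` reports a skipped network config.

    `since` is a 'YYYY-MM-DD HH:MM:SS' string (hypothetical parameter); it lets
    the caller ignore entries written during earlier boots. Timestamps in this
    format sort correctly when compared as plain strings.
    """
    for line in log_text.splitlines():
        if line[:19] >= since and NO_NETCFG_MARKER in line:
            return True
    return False
```

On a test instance this could be called as `network_config_skipped(open("/var/log/cloud-init.log").read(), since="2019-11-18 14:45:00")` after the reboot in step 3. A False result after step 5 would be consistent with cloud-init having reapplied the network config.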

Actual results:
The contents of /etc/sysconfig/network-scripts/ifcfg-XX are reset to the default (first-boot) config.

Expected results:
No change to /etc/sysconfig/network-scripts/ifcfg-XX  after first boot.
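
One way to verify the expected result is to record a checksum of the interface file after step 2 and compare it after each reboot. This is only a sketch; the helper names `file_digest` and `config_unchanged` are hypothetical, and the path passed in is whichever ifcfg-XX file was modified.

```python
import hashlib
from pathlib import Path


def file_digest(path: str) -> str:
    """Return the SHA-256 hex digest of the file at `path`."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def config_unchanged(path: str, recorded_digest: str) -> bool:
    """Return True if the file still matches the digest recorded before reboot."""
    return file_digest(path) == recorded_digest
```

The digest has to be stored somewhere that survives the reboot (a file outside the network-scripts directory, for example) and passed back in as `recorded_digest`.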

Additional info:

Comment 2 David Hill 2020-03-03 21:15:36 UTC
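The problem here is that if the datasource is not available, we fall back to the Fallback/None datasource, which changes the instance-id to iid-datasource-none [1]. That value no longer matches the instance-id saved on the previous boot, so the check below returns True and the instance is treated as new: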

~~~

    def is_new_instance(self):
        previous = self.previous_iid()
        ret = (previous == NO_PREVIOUS_INSTANCE_ID or
               previous != self.datasource.get_instance_id())
        return ret
~~~
and previous_iid is defined by this:
~~~

    def previous_iid(self):
        if self._previous_iid is not None:
            return self._previous_iid

        dp = self.paths.get_cpath('data')
        iid_fn = os.path.join(dp, 'instance-id')
        try:
            self._previous_iid = util.load_file(iid_fn).strip()
        except Exception:
            self._previous_iid = NO_PREVIOUS_INSTANCE_ID

        LOG.debug("previous iid found to be %s", self._previous_iid)
        return self._previous_iid
~~~

This is super easy to reproduce: attach a CD containing the metadata and boot a new instance. Once it has booted, shut it down, detach the CD and boot it again. This is what's happening on an atomic host [2], and it has the effect of wiping the network configuration as well as reverting many settings to their defaults.
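
The comparison can be tried outside cloud-init with a standalone sketch. This is not the cloud-init code itself: the function below takes both IDs as plain arguments, the sentinel value for NO_PREVIOUS_INSTANCE_ID is a placeholder, and "abc-123" is a made-up instance-id. Only iid-datasource-none comes from the fallback datasource described above.

```python
# Placeholder sentinel; the real constant lives in cloud-init and its value
# is not shown in this report.
NO_PREVIOUS_INSTANCE_ID = "NO_PREVIOUS_INSTANCE_ID"

# Instance-id reported when cloud-init falls back to the None datasource.
FALLBACK_IID = "iid-datasource-none"


def is_new_instance(previous_iid: str, current_iid: str) -> bool:
    """Same comparison as the method quoted above, with both IDs passed in."""
    return previous_iid == NO_PREVIOUS_INSTANCE_ID or previous_iid != current_iid


# First boot: nothing saved yet, so the instance is new.
first_boot = is_new_instance(NO_PREVIOUS_INSTANCE_ID, "abc-123")

# Reboot with the config-drive attached: IDs match, not a new instance.
reboot_with_drive = is_new_instance("abc-123", "abc-123")

# Reboot with the config-drive missing: the fallback ID differs from the
# saved one, so the instance is wrongly treated as new.
reboot_without_drive = is_new_instance("abc-123", FALLBACK_IID)

print(first_boot, reboot_with_drive, reboot_without_drive)  # True False True
```

The third case is the one reported here: nothing about the machine changed, but the ID mismatch alone is enough to trigger first-boot handling.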


[1] https://cloudinit.readthedocs.io/en/latest/topics/datasources/fallback.html
[2] https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_atomic_host/7/html/installation_and_configuration_guide/types_of_installation#vmware_installation

Comment 5 Eduardo Otubo 2020-09-17 12:16:23 UTC
You're using cloud-init-18.5-3.el7.x86_64; this bug should be fixed in cloud-init-18.5-5.el7.x86_64. Please give it a try.
