Bug 1593010
| Summary: | [cloud-init][RHVM]cloud-init network configuration does not persist reboot [RHEL 7.8] | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | amashah |
| Component: | cloud-init | Assignee: | Eduardo Otubo <eterrell> |
| Status: | CLOSED ERRATA | QA Contact: | Yuhui Jiang <yujiang> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 7.0 | CC: | amashah, bpelled, danken, dfu, dholler, dhuertas, dsh, eraviv, eterrell, fgarciad, fsun, greartes, huzhao, ikke, jcoscia, jgreguske, jhunsaker, jonathan.moore, ldu, linl, lrotenbe, lsurette, mkalinin, mkarg, mrezanin, mtessun, pengpengs, Rhev-m-bugs, ribarry, rmccabe, samuel.jon.gunnarsson, sirao, srevivo, svigan, vkuznets, vmware-gos-qa, vshypygu, xiachen, yacao, yanjin, yujiang |
| Target Milestone: | pre-dev-freeze | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | cloud-init-18.5-5.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1706482 1750710 | Environment: | |
| Last Closed: | 2020-03-31 20:07:08 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Network | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1736852, 1706482, 1710945, 1750710 | | |
| Attachments: | | | |

Description
amashah
2018-06-19 20:38:56 UTC
Amar, could you please get a sosreport from the VM with the issue if it's a RHEL VM or the cloud-init logs and version inside the VM if it's not a RHEL VM ?

Not sure, but maybe this BZ should be against RHEL product - cloud-init component instead of RHV ?

(In reply to Javier Coscia from comment #2)
> Amar, could you please get a sosreport from the VM with the issue if it's a
> RHEL VM or the cloud-init logs and version inside the VM if it's not a RHEL
> VM ?
>
> Not sure, but maybe this BZ should be against RHEL product - cloud-init
> component instead of RHV ?

I believe so. Calling it Ryan. But customer in-guest logs are paramount for further debugging.

It could be a configuration issue. The network config module will run if the instance id changes, and not being able to retrieve any instance id will also do it. Without knowing the configuration and having the logs, it's hard to say what's going on.

FYI, I have moved on from cloud-init. It's now owned by the Amnon's team in virt.

Just from a quick look, the instance id is changing. Is something like a config drive being removed after the first run, or similar?

(In reply to Ryan McCabe from comment #9)
> Just from a quick look, the instance id is changing. Is something like a
> config drive being removed after the first run, or similar?

Yes, RHV attaches a virtual CD on initial boot. It no longer exists in further boots. Why?

(In reply to Eduardo Otubo from comment #22)
>
> Ok, I didn't know there was a switch on/off on RHV GUI for cloud-init. Does
> this switch always worked, meaning, is this behavior a regression? Also, on
> the cloud-init switch, there a hint saying "Set up early initialization of
> Linux virtual machine using cloud-init.", this leads me to understand that
> cloud-init will be used for initial setup, I wouldn't expect it to be on
> after the reboot.
>

My understanding is that this switch wasn't working as expected. Well, actually it works: when turned on, it sends the information to the VM. But when it's off and the cloud-init service is enabled on boot on the VM, it tries to fetch cloud-init information which wasn't requested through RHV. It looks like this is something the cloud-init service does when enabled on boot.

> > I think that with the cloud-init service up and running on boot in the VM,
> > and the fact that we are sending an empty cloud-init config/parameters,
> > should be enough to not do changes on the VM.
>
> Not sure I followed you here. Can you elaborate?
>

If you turn on the cloud-init switch on RHV and fill in the information needed, like DNS, IP information, etc, then you're asking RHV to send this information to the VM and you will be asking cloud-init service to actually do something with this data. On the other hand, if you leave the cloud-init switch off, you should be telling the cloud-init service on the VM, "don't do anything / I don't want cloud-init to fetch anything".

> > Not sure what instance-id is here. Could you please help me understanding
> > this so I can see if it can be passed through kernel line before boot ?
>
> Whenever you create a new instance (a new VM) based on an image, or even
> installing a new VM with the distro image, cloud-init will generate a new
> instance-id for that specific vm. When you remove the Data Source and reboot
> (in this case the distro image on the virtual CD-ROM), cloud-init thinks
> it's a new instance, regenerates a new instance-id and run again (that's
> when the network config is overwritten).

This is the weird part, I'm not creating/installing a new VM, nor a VM based on an image. I assume that cloud-init will not generate a new instance-id in this case.
I created the VM, installed it manually, no automation or anything different than a normal install. I then installed the cloud-init package inside the VM, shut it down and then used the cloud-init switch on with the static IP information, for example.

> What I mean is: cloud-init here is working as designed.
>
> You could pass this instance-id via configuration or even on the kernel line
> so it wouldn't reset and regenerate the instance-id on the first reboot.

According to the above information, my understanding is that cloud-init is trying to fetch new data because it finds the instance-id changed and since it cannot find the data to use (because we didn't specify any), it overwrites network config. Is this correct ?

I can second what Javier said in comment #23 and I can easily reproduce. If I create a cloned VM of a template, on the first run with cloud-init enabled, the VM gets the correct IP / DNS etc. as specified for cloud init. On a subsequent power off / run cycle, cloud-init again kicks in during the reboot, but fails to find the necessary information and decides to fall back to a standard ifcfg-eth0 file, e.g. one that tries to get information via dhcp. Please let me know if you need anything from the RHHI setup I use here and I will gladly provide it.

oh and btw - how do I actually disable cloud-init? I tried systemctl disable / mask, touching /etc/cloud/cloud-init.disabled and none of them worked.

What is the status of this bug? I have encountered this bug as well and was wondering if there are any workarounds ?

No updates so far, but it's on the third place on my priority list. I'm expecting to take a look at this next week. Sorry for the delay.

Ok thanks for the update. As a workaround I masked out the following services:
cloud-config.service
cloud-final.service
cloud-init-local.service
cloud-init.service
Is that an overkill? I found out that it was not sufficient to mask cloud-init service alone.
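The masking workaround described above can be sketched as a small shell function. This is only a minimal sketch under stated assumptions, not a tested procedure: the function name `mask_cloud_init_units` and the `DRY_RUN` variable are hypothetical helpers introduced for illustration, the unit names are the four listed in the comment above, and `systemctl mask` needs root inside the guest.

```shell
#!/bin/sh
# Units named in the workaround above.
CLOUD_INIT_UNITS="cloud-config.service cloud-final.service cloud-init-local.service cloud-init.service"

# mask_cloud_init_units: mask every cloud-init unit so none of them runs on boot.
# With DRY_RUN=1 (a hypothetical helper variable) the commands are only printed.
mask_cloud_init_units() {
    for unit in $CLOUD_INIT_UNITS; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "systemctl mask $unit"
        else
            systemctl mask "$unit" || return 1
        fi
    done
}
```

Running `DRY_RUN=1 mask_cloud_init_units` prints the four commands without touching the system. The same loop with `systemctl unmask` would presumably undo the workaround when a fresh cloud-init run is wanted.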
This also happens with imported VMs from other KVM platforms that have cloud-init packages enabled on boot. I have just tested an import from an Apache CloudStack based platform where I had to mask the cloud-init services.

Regards,
Sammi

Having a similar issue with IBM PowerVC 1.4.1.0 and cloud-init-0.7.9-24.el7_5.1.ppc64le. Configuration is also reset to DHCP after reboots. Adding "network: config: disabled" to /etc/cloud/cloud.cfg as a workaround has been a solution for us.

As far as I understand how cloud-init works, it uses the Config Drive (provided by the CDROM image) to rely on an instance-id which is set to the vm. If the Config Drive is removed (which in this case it is) the instance-id is gone, cloud-init thinks it's a new instance-id and runs again, removing all network configuration set before reboot. This behaviour works as designed, there's no bug in cloud-init as I see it. From here we have two solutions:

1) Do not remove Config Drive. The CDROM will remain attached to the vm, cloud-init will read the instance-id every time it needs to be rebooted and no network configuration will be lost.
2) Pass the instance-id information in the kernel command line in the form: cc:instance-id:<name> (I'm not sure about the kernel variable name)

IMHO the cleaner workaround is to mask cloud-init services. It is easier to unmask if one wants new config.

Since which version does cloud-init reset everything on a missing config source? I find it very surprising that we have not heard about this years ago.

I would like to see a cloud-init feature where I can ask "keep on next boot, unless you find a fresh data source". Would that be possible, Eduardo?

From cloud-init's side:
=======================
According to cloud-init 0.7.9 documentation cloud-init is configured to run by default on each boot [1] and to render the user-selected network configuration on first boot [2]. Also, in absence of a data source to configure the network, it will fall back to configuring DHCP on eth0 [2].
If a VM is run once, and then in the next regular run the cloud-init flag is not selected in the VM configuration in engine, there is no data-source and cloud-init falls back to dhcp as documented.

It is possible to modify cloud-init's behaviour with a 'marker' file, documented as follows:
* disabling cloud-init altogether [1] with: touch /etc/cloud/cloud-init.disabled
* preventing cloud-init from configuring the network [2] with: echo 'network: {config: disabled}' >> /etc/cloud/cloud.cfg

This can be accomplished by adding the above commands to the custom_script that cloud-init runs at the last stage of its operation [3].

There is possibly a third 'hack' that would not require any marker file: assign your static IP to a NIC not named 'eth0'. I have not tested it myself but it looks like a corollary of [2].

[1] https://cloudinit.readthedocs.io/en/0.7.9/topics/boot.html#generator
[2] https://cloudinit.readthedocs.io/en/0.7.9/topics/boot.html#local
[3] https://cloudinit.readthedocs.io/en/0.7.9/topics/boot.html#final

From engine's side:
=======================
When a VM is started in 'Run once' mode, the initialization parameters supplied for that run are always passed by engine to cloud-init in the guest for application. But if a VM is started in 'Run' mode, the initialization parameters are passed to cloud-init on the guest only if this is the first run (be it 'Run' or 'Run once'). On every consecutive run in 'Run' mode no parameters are passed to the guest, and therefore (as I quoted from the cloud-init documentation earlier in this thread) cloud-init falls back to DHCP configuration on the guest.

This is not an overlooked occurrence on engine's behalf but rather the designated behaviour. When this behaviour was introduced into engine the reasoning was that after the initial configuration of the VM, there is no reason to resend the configuration on every 'Run' but only on 'Run once'.
That's because 'Run once' may be used for out-of-the-ordinary instantiations of the VM. Due to the behaviour of the current cloud-init package, this causes an unexpected side effect that can be dealt with by disabling cloud-init in one of the methods I described earlier in this thread.

fwiw, adding the following as `custom script` in `Use Cloud-Init` RHV UI works, the VM will get the correct static IP information and will build the ifcfg-ethX correctly, then will disable network configuration from cloud-init

~~~
#cloud-config
runcmd:
  - [ sh, -c, 'echo "network: {config: disabled}" >> /etc/cloud/cloud.cfg' ]
~~~

(In reply to Dan Kenigsberg from comment #37)
> IMHO the cleaner workaround is to mask cloud-init services. It is easier to
> unmask if one wants new config.
>
> Since which version does cloud-init reset everything on a missing config
> source? I find it very surprising that we have not heard about this years
> ago.
>
> I would like to see a cloud-init feature where I can ask "keep on next boot,
> unless you find a fresh data source". Would that be possible, Eduardo?

Yes, this is technically possible. But the question here: Do we want it? There's cleaner ways to solve this issue as pointed on other comments. And besides, this change would have to be sent upstream, and I think this feature would hardly make it to public release.

What is the cleaner way to express "keep on next boot, unless you find a fresh data source"? I'm aware only of "keep on next boot, unless someone miraculously removes a line from /etc/cloud/cloud.cfg". (and of course, we are Red Hat, we are Upstream First).

(In reply to Dan Kenigsberg from comment #41)
> What is the cleaner way to express "keep on next boot, unless you find a
> fresh data source"?

The cleaner way is to configure the guest to have a variable with the instance-id on the kernel line.

> I'm aware only of "keep on next boot, unless someone miraculously removes a
> line from /etc/cloud/cloud.cfg".
Actually what we have today is: Keep on next boot, unless the data source is removed

Let us assume for a minute that we cannot change the age-old ovirt semantics of passing the data source to the VM only when we want to change something in cloud-init. What should I do in the guest so that:
- if no data source is found, nothing changes
- if data source is found, it is acted upon

From what I gather from your explanation, once we run

    grubby --update-kernel=`grubby --default-kernel` --args=cc:instance-id

we would not get cloud-init acting on new data sources. Please correct me if I am wrong.

I've spoken to Ryan Harper (maintainer) on IRC and he advised to use the option `manual_cache_clean' on the configuration file:

    # manual cache clean.
    # By default, the link from /var/lib/cloud/instance to
    # the specific instance in /var/lib/cloud/instances/ is removed on every
    # boot. The cloud-init code then searches for a DataSource on every boot
    # if your DataSource will not be present on every boot, then you can set
    # this option to 'True', and maintain (remove) that link before the image
    # will be booted as a new instance.
    # default is False
    manual_cache_clean: False

(source: https://git.launchpad.net/cloud-init/tree/doc/examples/cloud-config.txt#n466)

Perhaps this should be the cleanest way to avoid changing anything outside cloud-init and fixing the issue at the same time. Please give it a try and let me know if I can help with anything else.

I hope Liran can try it out, but would you please confirm what happens if manual_cache_clean is set to True, and a new DataSource is added to the VM? Would cloud-init act upon it?

(In reply to Dan Kenigsberg from comment #45)
> I hope Liran can try it out, but would you please confirm what happens if
> manual_cache_clean is set to True, and a new DataSource is added to the VM?
> Would cloud-init act upon it?

In this case cloud-init won't create a new instance-id whenever a new data source is added.
In this case you might want to use `cloud-init clean' before a boot on a different data source that you want to use.

(In reply to Eduardo Otubo from comment #46)
> (In reply to Dan Kenigsberg from comment #45)
> > I hope Liran can try it out, but would you please confirm what happens if
> > manual_cache_clean is set to True, and a new DataSource is added to the VM?
> > Would cloud-init act upon it?
>
> In this case cloud-init won't create a new instance-id whenever a new data
> source is added. In this case you might want to use `cloud-init clean'
> before a boot on a different data source that you want to use.

Thanks. This means that this configurable does not give me what I am looking for.

My request is for a cloud-init mode where
- if a data source is found, it is acted upon
- if no data source is found, nothing changes

Liran tells me that this is actually the behavior for any nic other than eth0. I'd like to extend it to eth0, too.

(In reply to Dan Kenigsberg from comment #47)
> (In reply to Eduardo Otubo from comment #46)
> > (In reply to Dan Kenigsberg from comment #45)
> > > I hope Liran can try it out, but would you please confirm what happens if
> > > manual_cache_clean is set to True, and a new DataSource is added to the VM?
> > > Would cloud-init act upon it?
> >
> > In this case cloud-init won't create a new instance-id whenever a new data
> > source is added. In this case you might want to use `cloud-init clean'
> > before a boot on a different data source that you want to use.
>
> Thanks. This means that this configurable does not give me what I am looking
> for.
>
> My request is for a cloud-init mode where
> - if a data source is found, it is acted upon
> - if no data source is found, nothing changes
>
> Liran tells me that this is actually the behavior for any nic other than
> eth0. I'd like to extend it to eth0, too.

I understand your point now. How's the priority and the scope of this BZ exactly?
I can try to push a behaviour like that to the configuration file on cloud-init upstream, but this might take a little while for it to be merged. OR, we could take this configuration outside cloud-init: you just have to use `manual_cache_clean' when you want things to remain the same and `cloud-init clean' when you want a new data source to be acted upon.

According to this thread[0], Ryan pointed out that the most recent version of cloud-init should fix the issue:

"Current cloud-init uses a systemd-generator detect if it's booting on a system or platform which is providing a datasource. In the scenario from the bug, it initially provides a cdrom with the NoCloud datasource and then the cdrom is removed. After rebooting the VM, the cdrom is no longer present, cloud-init generator will not find the cdrom with NoCloud seed and it will disable itself (no need to mask services). If the cdrom is provided again with a new instance-id cloud-init will run as if it's a new instance."

I prepared an unofficial 18.5 rebased package[1] for another BZ, please give it a try and check if this issue is gone.

[0] https://lists.launchpad.net/cloud-init/msg00197.html
[1] http://people.redhat.com/~eterrell/cloud-init/18.5/rhel7/bz1632967/

Hi Eduardo,
I retested with your test build cloud-init-18.5-0.el7.otubo201903151232.x86_64 on the ESXi platform. After reboot the static IP is still replaced with a DHCP IP. It seems it doesn't work.
BR,
Lili Du

Liran, the el8 string tells me that you are testing an el8 build; it is unlikely to be installable in any el7. It would be lovely if you can try it on your favorite el8 guest; we can discuss backporting later.

@Liran, thanks very much for pointing to the correct link!
@Eduardo, I retested with cloud-init-18.2-1.el8.otubo201905200930.noarch.rpm from the link http://people.redhat.com/~eterrell/cloud-init/1593010/. After customizing the guest to use a static IP with cloud-init and booting up the cloned guest, the IP does not change to static; it is still DHCP, so the cloud-init IP customization failed.
I attach the cloud-init log, please help review.

Created attachment 1571778
cloud-init.log-05/22

Created attachment 1573764
initial-boot-logs

I'm having the same problem. The network configuration of cloud-init was not applied on the initial boot (the first boot, before rebooting).
I checked a few types of configurations, including IPv6/IPv4 Static and DHCP. Other cloud-init configurations are set correctly.
Just to make it clear, with the given RPM (cloud-init-18.2-1.el8.otubo201905200930.noarch.rpm) the basic flow of cloud-init networks is broken (a regression), which makes it impossible to test the current bug. Eduardo, can you please check it and provide another RPM to test?

*** Bug 1680465 has been marked as a duplicate of this bug. ***

*** Bug 1534694 has been marked as a duplicate of this bug. ***

Comment#53 for summary.

After lots of debugging and discussions with QA and maintainers on IRC we found the root cause to be a bug in the VMware datasource itself. VMware resets the datasource on every boot so it can add more customizations every time, but this behavior is just wrong from cloud-init's perspective. There's already an open bug on Launchpad (https://bugs.launchpad.net/cloud-init/+bug/1835205, also linked above) and people from VMware are already working on it. Once it's complete I'll backport. Or, depending on the time frame, issue a new rebase.

Setting back to NEW since there's no work to be done at the moment.

Fix included in cloud-init-18.5-5.el7

Patch 90300 (Fix for network configuration not persisting after reboot) committed with hash 9969cf3eaa23398816d140b319b3277465aa4bb8

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1155

From bug 1862100, we found that the fix for this bug introduced one change: 'ds-identify' was added, which affects custom DataSource configuration. As bug 1593010 was mentioned in the cloud-init Changelog, the key information should be updated here.

*Important*: 'ds-identify' is required for a custom DataSource to work properly in cloud-init 18.5-5+. When creating a custom datasource, follow this document to configure your datasource. Using ds-identify ensures that cloud-init recognizes your datasource upon upgrade and reboot.
https://cloudinit.readthedocs.io/en/latest/topics/datasources.html#creation
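
For guests that cannot yet move to the fixed package, the two marker-based workarounds quoted earlier in this thread (touching /etc/cloud/cloud-init.disabled, and appending 'network: {config: disabled}' to /etc/cloud/cloud.cfg) can be sketched as shell functions. This is a minimal sketch, not a verified fix: the function names `disable_network_config` and `disable_cloud_init` are hypothetical, the optional path argument exists only so the functions can be tried on a scratch file, and one commenter above reported that the cloud-init.disabled marker did not work on their system.

```shell
#!/bin/sh
# disable_network_config [FILE]: append the "network: {config: disabled}" line
# to FILE (default /etc/cloud/cloud.cfg) unless an identical line already exists.
# Assumes FILE, if present, ends with a newline.
disable_network_config() {
    cfg="${1:-/etc/cloud/cloud.cfg}"
    marker='network: {config: disabled}'
    if [ -f "$cfg" ] && grep -qxF "$marker" "$cfg"; then
        return 0   # already present, leave the file unchanged
    fi
    printf '%s\n' "$marker" >> "$cfg"
}

# disable_cloud_init [FILE]: create the marker file the thread cites for
# disabling cloud-init altogether (default /etc/cloud/cloud-init.disabled).
disable_cloud_init() {
    touch "${1:-/etc/cloud/cloud-init.disabled}"
}
```

Unlike the plain `echo ... >>` form, the first function can be called repeatedly (for example from a custom script that runs on more than one boot) without stacking duplicate lines in cloud.cfg.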