Bug 1762509
| Summary: | [vsphere] [upi] [ci] After node is rebooted reverts to a DHCP configuration | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Joseph Callen <jcallen> |
| Component: | RHCOS | Assignee: | Colin Walters <walters> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Michael Nguyen <mnguyen> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.3.0 | CC: | bbreard, dustymabe, imcleod, jerzhang, jligon, lszaszki, miabbott, nstielau, sdodson, walters |
| Target Milestone: | --- | ||
| Target Release: | 4.3.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2019-11-08 00:27:40 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Joseph Callen
2019-10-16 21:01:16 UTC
*** Bug 1762285 has been marked as a duplicate of this bug. ***

> Ignition creates the interface correctly then within our terraform there is a reboot: https://github.com/openshift/installer/pull/2554

OK one thing I discovered too is we definitely broke https://github.com/coreos/ignition-dracut/pull/98 somehow in 4.3; with a default `cosa run` with qemu I see an auto-generated ifcfg file from dracut in /etc/sysconfig/network-scripts in a local RHCOS 43.81 build but not in a 4.2 build. Investigating 🕵

OK here's the smoking gun:
```
[root@coreos ~]# rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
● pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c3885598b7a6073dea581adc3c1c543debf64c803cd3132472c7f4ba4f86c3af
CustomOrigin: Provisioned from oscontainer
Version: 420.8.20190624.0 (2019-06-24T12:26:59Z)
[root@coreos ~]# systemctl status import-state
Unit import-state.service could not be found.
[root@coreos ~]#
```
versus:
```
[root@coreos ~]# rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
* ostree://f68ac4e44c1f7e2cd6b066f6974d29cd0debfc1e6a806d391fa9e434ca9dacd5
Version: 43.81.20191023.2 (2019-10-23T20:02:27Z)
[root@coreos ~]# systemctl status import-state
● import-state.service - Import network configuration from initramfs
Loaded: loaded (/usr/lib/systemd/system/import-state.service; enabled; vendor preset: enabled)
Active: active (exited) since Wed 2019-10-23 20:14:41 UTC; 3s ago
Process: 1319 ExecStart=/usr/libexec/import-state (code=exited, status=0/SUCCESS)
Main PID: 1319 (code=exited, status=0/SUCCESS)
Oct 23 20:14:41 coreos systemd[1]: Starting Import network configuration from initramfs...
Oct 23 20:14:41 coreos systemd[1]: Started Import network configuration from initramfs.
[root@coreos ~]#
```
Testing with the restart service removed (https://github.com/jcpowermac/installer/blob/vmware_on_aws/upi/vsphere/vmware/machine/ignition.tf) and with https://releases-rhcos-art.cloud.privileged.psi.redhat.com/storage/releases/rhcos-4.3/43.81.20191028.0/x86_64/rhcos-43.81.20191028.0-vmware.x86_64.ova still resulted in a reboot that reverted the static IP address. Still not built with MR562?
[core@master-0 ~]$ cat /etc/sysconfig/network-scripts/ifcfg-ens192
# Generated by dracut initrd
NAME="ens192"
DEVICE="ens192"
ONBOOT=yes
NETBOOT=yes
UUID="14ac921f-fdbc-486f-95e9-682f62f185a5"
IPV6INIT=yes
BOOTPROTO=dhcp
TYPE=Ethernet
[core@master-0 ~]$

Joseph, that build you tried with did not have the fix in MR652. Could you try with a build after 43.81.20191025.1?

Currently downloading: rhcos-43.81.20191029.2-vmware.x86_64.ova
https://releases-rhcos-art.cloud.privileged.psi.redhat.com/storage/releases/rhcos-4.3/43.81.20191029.2/x86_64/rhcos-43.81.20191029.2-vmware.x86_64.ova
I will test and update the BZ.

This version, rhcos-43.81.20191029.2-vmware, still has the issue where a reboot resets the network configuration. I watched the console: it was set to a static IP address, then rebooted, resulting in this configuration:
[core@master-0 ~]$ cat /etc/sysconfig/network-scripts/ifcfg-ens192
# Generated by dracut initrd
NAME="ens192"
DEVICE="ens192"
ONBOOT=yes
NETBOOT=yes
UUID="9cae5f95-b0e0-4994-b501-fc37f98091c4"
IPV6INIT=yes
BOOTPROTO=dhcp
TYPE=Ethernet
The restart service that was in the vSphere terraform has been removed: https://github.com/jcpowermac/installer/blob/vmware_on_aws/upi/vsphere/vmware/machine/ignition.tf
I have been testing the above terraform with 4.2 without issue.

Hmm. Can you run `rpm-ostree status` there too? In other words, are you *sure* you booted into that new image? Try also the `systemctl status import-state` I posted above.

Yes, I am 100% certain it is the correct version:
From terraform.tfvars:
vm_template = "rhcos-43.81.20191029.2-vmware.x86_64"
From vCenter:
10/29/2019, 11:43:25 AM rhcos-43.81.20191029.2-vmware.x86_64 cloned to master-0 on 10.2.32.6, in SDDC-Datacenter
[core@master-0 ~]$ sudo rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
● pivot://registry.svc.ci.openshift.org/origin/4.3-2019-10-29-152807@sha256:6dbbcd404d6d00ac202ce6f64a5bbce5ba640d5afcc5c822ccf9a2c7dea9310d
CustomOrigin: Managed by machine-config-operator
Version: 43.81.20191029.2 (2019-10-29T08:56:39Z)
ostree://60e0be487c3bd70d32c4a2268ec90fbcfbf8a5e2830ebb3854a040f6be851ebb
Version: 43.81.20191029.2 (2019-10-29T08:56:39Z)
[core@master-0 ~]$ sudo systemctl status import-state
Unit import-state.service could not be found.
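
The ifcfg-ens192 contents pasted earlier in this bug can be checked mechanically rather than by eye. The following is a minimal Python sketch, not part of RHCOS or the installer: `parse_ifcfg` and `reverted_to_dhcp` are hypothetical helper names, the sample text is copied from the first paste above, and it assumes the `# Generated by dracut initrd` header comment is a reliable marker for a dracut-written file.

```python
import shlex

DRACUT_MARKER = "# Generated by dracut initrd"  # header seen in the pasted files


def parse_ifcfg(text):
    """Parse shell-style KEY=value lines from an ifcfg file into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # shlex.split removes the optional quoting that ifcfg values may carry
        parts = shlex.split(raw)
        settings[key.strip()] = parts[0] if parts else ""
    return settings


def reverted_to_dhcp(text):
    """Return True if the file has the dracut marker and requests DHCP."""
    generated = DRACUT_MARKER in text
    bootproto = parse_ifcfg(text).get("BOOTPROTO", "").lower()
    return generated and bootproto == "dhcp"


SAMPLE = """\
# Generated by dracut initrd
NAME="ens192"
DEVICE="ens192"
ONBOOT=yes
NETBOOT=yes
UUID="14ac921f-fdbc-486f-95e9-682f62f185a5"
IPV6INIT=yes
BOOTPROTO=dhcp
TYPE=Ethernet
"""

if __name__ == "__main__":
    print(reverted_to_dhcp(SAMPLE))
```

On a node, the same check could be pointed at the text of /etc/sysconfig/network-scripts/ifcfg-ens192 after the reboot. A file written from the Ignition config would be expected to lack the dracut header, although that is an assumption about what the terraform-provided file contains.
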
Hmm. So this is probably https://github.com/coreos/ignition-dracut/pull/128 which we haven't built yet into RHCOS but I will do soon. But what I don't understand right now is why this isn't affecting 4.2.

I don't think it's https://github.com/coreos/ignition-dracut/pull/128. vSphere for one doesn't use persist-ifcfg, and this is upon a reboot. As in, on firstboot the machine comes up fine, and in the real root the correct networking is there. After the reboot, another DHCP happens for some reason and overwrites the ifcfg file on the system. From the logs I saw before it seems that NM notices there is an existing ifcfg file, but doesn't seem to understand what it is, and opts to DHCP instead.

> From the logs I saw before it seems that NM notices there is an existing ifcfg file, but doesn't seem to understand what it is, and opts to dhcp instead.

Ohh. Then it's likely https://github.com/coreos/ignition-dracut/pull/130
Possibly an SELinux policy change between RHEL 8.0 and 8.1 tightened up access to unlabeled_t?

Hm, possible, I don't know the exact timing of the bug. That said, the vSphere files are being dropped in via Ignition directly, as in they're ifcfg files in the Ignition config for the system. I'm under the impression that's a different path entirely?

I believe 43.81.20191029.5 will fix this. (Though that build includes crio-1.16, which may or may not work after dependent PRs have merged; we'll see.)

Just tried to test and it was pivoted back to .3
[core@master-0 ~]$ sudo rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
● pivot://registry.svc.ci.openshift.org/origin/4.3-2019-10-29-180250@sha256:88406db92d5249d005226cb87adef0fd28cd9e5a73e76eb2d60499c6108fafb0
CustomOrigin: Managed by machine-config-operator
Version: 43.81.20191029.3 (2019-10-29T11:53:53Z)
ostree://f03bf128bcc7b8fd9163273bf3f2e1f6f4ec413d79721d669a5ffa76ed2b6b52
Version: 43.81.20191029.5 (2019-10-29T18:06:49Z)
[core@master-0 ~]$
OK, there's apparently yet *another* bug going on here. I'm seeing dracut still start dhclient in the initramfs, even without rd.neednet. Ah, it's adding clevis that broke this:
cmdline() {
    echo "rd.neednet=1"
}
See also https://bugzilla.redhat.com/show_bug.cgi?id=1687753
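
To make the effect of that `cmdline()` hook concrete: once `rd.neednet=1` ends up in the arguments dracut sees, networking is requested in the initramfs even though nobody passed the option on the kernel command line. Below is a minimal Python sketch of deciding whether a given argument string requests initramfs networking. `neednet_enabled` is a hypothetical helper, not a dracut API. Treating `0`, `no` and `off` as false and letting the last occurrence win are assumptions about dracut's boolean argument handling, and how the argument string is collected from a node is left out.

```python
import shlex

# Assumption: values that switch a boolean dracut argument off
FALSE_VALUES = {"0", "no", "off"}


def neednet_enabled(cmdline):
    """Return True if the argument string turns rd.neednet on.

    The last rd.neednet token wins (assumed); a bare 'rd.neednet'
    with no value is treated as enabled.
    """
    enabled = False
    for token in shlex.split(cmdline):
        key, sep, value = token.partition("=")
        if key != "rd.neednet":
            continue
        enabled = (not sep) or value not in FALSE_VALUES
    return enabled


if __name__ == "__main__":
    # The string the clevis module's cmdline() hook echoes, per the comment above
    print(neednet_enabled("rd.neednet=1"))
    # A command line without the option
    print(neednet_enabled("console=tty0 ignition.firstboot"))
```
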
MR 678 landed in RHCOS 43.81.201911071801.0 today. @jcallen, can you give it another try with that image?

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days