Bug 1762509 - [vsphere] [upi] [ci] After node is rebooted reverts to a DHCP configuration
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.3.0
Assignee: Colin Walters
QA Contact: Michael Nguyen
URL:
Whiteboard:
Duplicates: 1762285
Depends On:
Blocks:
Reported: 2019-10-16 21:01 UTC by Joseph Callen
Modified: 2023-09-14 05:44 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-11-08 00:27:40 UTC
Target Upstream Version:
Embargoed:



Description Joseph Callen 2019-10-16 21:01:16 UTC
Description of problem:

Ignition creates the interface correctly then within our terraform there is a reboot:
https://github.com/openshift/installer/blob/0b573c545f34c0ae54c885960f0181eafa498dbf/upi/vsphere/machine/ignition.tf#L44-L58

Once that reboot occurs, the interface configuration file changes from a static definition to DHCP. See the Slack thread for additional details:
https://coreos.slack.com/archives/C999USB0D/p1571255388217700

[root@bootstrap-0 network-scripts]# cat ifcfg-ens192
# Generated by dracut initrd
NAME="ens192"
DEVICE="ens192"
ONBOOT=yes
NETBOOT=yes
UUID="f56f763a-8851-49ff-9e56-428d94c6632e"
IPV6INIT=yes
BOOTPROTO=dhcp
TYPE=Ethernet
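
A minimal sketch of how the symptom above could be checked automatically: it parses the ifcfg text and reports whether the file carries the dracut marker comment and requests DHCP. The function names and the `SAMPLE` constant are illustrative and not part of any RHCOS tooling.

```python
import shlex

# Marker comment seen at the top of the regenerated file in this report.
DRACUT_MARKER = "# Generated by dracut initrd"


def parse_ifcfg(text):
    """Parse KEY=value lines of an ifcfg file into a dict, dropping shell quotes."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        parts = shlex.split(raw)  # handles values such as "ens192" or yes
        settings[key.strip()] = parts[0] if parts else ""
    return settings


def reverted_to_dhcp(text):
    """Return True if the file looks dracut-generated and asks for DHCP."""
    generated = any(l.strip() == DRACUT_MARKER for l in text.splitlines())
    bootproto = parse_ifcfg(text).get("BOOTPROTO", "").lower()
    return generated and bootproto == "dhcp"


# Contents copied from the ifcfg-ens192 output in this report.
SAMPLE = """# Generated by dracut initrd
NAME="ens192"
DEVICE="ens192"
ONBOOT=yes
NETBOOT=yes
UUID="f56f763a-8851-49ff-9e56-428d94c6632e"
IPV6INIT=yes
BOOTPROTO=dhcp
TYPE=Ethernet
"""

if __name__ == "__main__":
    print(reverted_to_dhcp(SAMPLE))
```

On a node, the same check could be pointed at the text of /etc/sysconfig/network-scripts/ifcfg-ens192 after the reboot.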


RHCOS versions:
rhcos-43devel.80.20191015.0-vmware.x86_64.ova
rhcos-43.80.20191002.1-vmware.x86_64.ova

Does not happen in:
rhcos-42.80.20191002.0-vmware.ova



Comment 1 Joseph Callen 2019-10-17 15:00:53 UTC
*** Bug 1762285 has been marked as a duplicate of this bug. ***

Comment 3 Colin Walters 2019-10-23 18:57:22 UTC
> Ignition creates the interface correctly then within our terraform there is a reboot:

https://github.com/openshift/installer/pull/2554

Comment 4 Colin Walters 2019-10-23 19:27:25 UTC
OK, one more thing I discovered: we definitely broke https://github.com/coreos/ignition-dracut/pull/98 somehow in 4.3.
With a default `cosa run` under qemu, I see an auto-generated ifcfg file from dracut in /etc/sysconfig/network-scripts in
a local RHCOS 43.81 build, but not in a 4.2 build.  Investigating 🕵

Comment 5 Colin Walters 2019-10-23 20:15:14 UTC
OK here's the smoking gun:

```
[root@coreos ~]# rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
● pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c3885598b7a6073dea581adc3c1c543debf64c803cd3132472c7f4ba4f86c3af
              CustomOrigin: Provisioned from oscontainer
                   Version: 420.8.20190624.0 (2019-06-24T12:26:59Z)

[root@coreos ~]# systemctl status import-state
Unit import-state.service could not be found.
[root@coreos ~]# 
```

versus:

```
[root@coreos ~]# rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
* ostree://f68ac4e44c1f7e2cd6b066f6974d29cd0debfc1e6a806d391fa9e434ca9dacd5
                   Version: 43.81.20191023.2 (2019-10-23T20:02:27Z)
[root@coreos ~]# systemctl status import-state
● import-state.service - Import network configuration from initramfs
   Loaded: loaded (/usr/lib/systemd/system/import-state.service; enabled; vendor preset: enabled)
   Active: active (exited) since Wed 2019-10-23 20:14:41 UTC; 3s ago
  Process: 1319 ExecStart=/usr/libexec/import-state (code=exited, status=0/SUCCESS)
 Main PID: 1319 (code=exited, status=0/SUCCESS)

Oct 23 20:14:41 coreos systemd[1]: Starting Import network configuration from initramfs...
Oct 23 20:14:41 coreos systemd[1]: Started Import network configuration from initramfs.
[root@coreos ~]# 

```

Comment 7 Joseph Callen 2019-10-28 16:20:34 UTC
Testing with the restart service removed:
https://github.com/jcpowermac/installer/blob/vmware_on_aws/upi/vsphere/vmware/machine/ignition.tf

and

Testing with https://releases-rhcos-art.cloud.privileged.psi.redhat.com/storage/releases/rhcos-4.3/43.81.20191028.0/x86_64/rhcos-43.81.20191028.0-vmware.x86_64.ova

resulted in a reboot that reverted the static IP address.  Was this image still not built with MR562?

[core@master-0 ~]$ cat /etc/sysconfig/network-scripts/ifcfg-ens192
# Generated by dracut initrd
NAME="ens192"
DEVICE="ens192"
ONBOOT=yes
NETBOOT=yes
UUID="14ac921f-fdbc-486f-95e9-682f62f185a5"
IPV6INIT=yes
BOOTPROTO=dhcp
TYPE=Ethernet
[core@master-0 ~]$

Comment 8 Micah Abbott 2019-10-29 13:16:38 UTC
Joseph, the build you tried did not have the fix from MR652.

Could you try with a build after 43.81.20191025.1?

Comment 9 Joseph Callen 2019-10-29 13:26:09 UTC
Currently downloading: rhcos-43.81.20191029.2-vmware.x86_64.ova

https://releases-rhcos-art.cloud.privileged.psi.redhat.com/storage/releases/rhcos-4.3/43.81.20191029.2/x86_64/rhcos-43.81.20191029.2-vmware.x86_64.ova

I will test and update the BZ.

Comment 10 Joseph Callen 2019-10-29 15:10:16 UTC
This version: rhcos-43.81.20191029.2-vmware
still has the issue where a reboot resets the network configuration.

I watched the console: it was set to a static IP address, then rebooted, resulting in this configuration:

[core@master-0 ~]$ cat /etc/sysconfig/network-scripts/ifcfg-ens192
# Generated by dracut initrd
NAME="ens192"
DEVICE="ens192"
ONBOOT=yes
NETBOOT=yes
UUID="9cae5f95-b0e0-4994-b501-fc37f98091c4"
IPV6INIT=yes
BOOTPROTO=dhcp
TYPE=Ethernet

The restart service that was in vSphere terraform has been removed:
https://github.com/jcpowermac/installer/blob/vmware_on_aws/upi/vsphere/vmware/machine/ignition.tf

I have been testing the above terraform w/4.2 without issue.

Comment 11 Colin Walters 2019-10-29 15:29:12 UTC
Hmm.  Can you run `rpm-ostree status` there too?  In other words are you *sure* you booted into that new image?

Try also the `systemctl status import-state` I posted above.

Comment 12 Joseph Callen 2019-10-29 15:48:43 UTC
Yes I am 100% certain it is the correct version:

From terraform.tfvars:
vm_template = "rhcos-43.81.20191029.2-vmware.x86_64"

From vCenter:
10/29/2019, 11:43:25 AM rhcos-43.81.20191029.2-vmware.x86_64 cloned to master-0 on 10.2.32.6, in SDDC-Datacenter


[core@master-0 ~]$ sudo rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
● pivot://registry.svc.ci.openshift.org/origin/4.3-2019-10-29-152807@sha256:6dbbcd404d6d00ac202ce6f64a5bbce5ba640d5afcc5c822ccf9a2c7dea9310d
              CustomOrigin: Managed by machine-config-operator
                   Version: 43.81.20191029.2 (2019-10-29T08:56:39Z)

  ostree://60e0be487c3bd70d32c4a2268ec90fbcfbf8a5e2830ebb3854a040f6be851ebb
                   Version: 43.81.20191029.2 (2019-10-29T08:56:39Z)

[core@master-0 ~]$ sudo systemctl status import-state
Unit import-state.service could not be found.

Comment 13 Colin Walters 2019-10-29 18:40:03 UTC
Hmm.  So this is probably https://github.com/coreos/ignition-dracut/pull/128
which we haven't built yet into RHCOS but I will do soon.

But what I don't understand right now is why this isn't affecting 4.2.

Comment 14 Yu Qi Zhang 2019-10-29 18:45:42 UTC
I don't think it's https://github.com/coreos/ignition-dracut/pull/128.
vSphere, for one, doesn't use persist-ifcfg, and this happens upon a reboot.
That is, on firstboot the machine comes up fine, and in the real root
the correct networking is there. After the reboot, another DHCP request happens
for some reason and overwrites the ifcfg file on the system. From the
logs I saw before it seems that NM notices there is an existing ifcfg
file, but doesn't seem to understand what it is, and opts to dhcp
instead.

Comment 15 Colin Walters 2019-10-29 18:49:45 UTC
> From the
logs I saw before it seems that NM notices there is an existing ifcfg
file, but doesn't seem to understand what it is, and opts to dhcp
instead.

Ohh.  Then it's likely https://github.com/coreos/ignition-dracut/pull/130
Possibly an SELinux policy change between RHEL 8.0 and 8.1 tightened access to unlabeled_t?

Comment 16 Yu Qi Zhang 2019-10-29 19:14:37 UTC
Hm, possible, I don't know the exact timing of the bug.

That said the vsphere files are being dropped in via ignition directly,
as in they're ifcfg files in the ignition config for the system. I'm
under the impression that's a different path entirely?
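
To illustrate the path described here, the following sketch builds an Ignition config that writes a static ifcfg file directly into the real root. It assumes Ignition spec 2.2.0 field names (`storage.files`, `contents.source` as a data URL). The addresses come from the 192.0.2.0/24 documentation range and the ifcfg keys are placeholders, not the values used by the installer's Terraform.

```python
import base64
import json


def ifcfg_file_entry(device, contents):
    """Build a storage.files entry (assumed spec 2.2.0 layout) for an ifcfg file."""
    encoded = base64.b64encode(contents.encode("utf-8")).decode("ascii")
    return {
        "filesystem": "root",
        "path": f"/etc/sysconfig/network-scripts/ifcfg-{device}",
        "mode": 0o644,  # serialized as decimal 420 in JSON
        "contents": {
            "source": f"data:text/plain;charset=utf-8;base64,{encoded}",
        },
    }


# Hypothetical static configuration; every value here is a placeholder.
STATIC_IFCFG = """TYPE=Ethernet
BOOTPROTO=none
NAME=ens192
DEVICE=ens192
ONBOOT=yes
IPADDR=192.0.2.10
PREFIX=24
GATEWAY=192.0.2.1
DNS1=192.0.2.53
"""

config = {
    "ignition": {"version": "2.2.0"},
    "storage": {"files": [ifcfg_file_entry("ens192", STATIC_IFCFG)]},
}

if __name__ == "__main__":
    print(json.dumps(config, indent=2))
```

Because the file is written by Ignition itself, it does not pass through the initramfs network state import discussed earlier in this bug.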

Comment 17 Colin Walters 2019-10-29 20:32:58 UTC
I believe 43.81.20191029.5 will fix this.

Comment 18 Colin Walters 2019-10-29 20:33:42 UTC
(Though, that build includes crio-1.16 which may or may not work after dependent PRs have merged, we'll see)

Comment 19 Joseph Callen 2019-10-30 01:11:49 UTC
I just tried to test, and the node was pivoted back to .3:

[core@master-0 ~]$ sudo rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
● pivot://registry.svc.ci.openshift.org/origin/4.3-2019-10-29-180250@sha256:88406db92d5249d005226cb87adef0fd28cd9e5a73e76eb2d60499c6108fafb0
              CustomOrigin: Managed by machine-config-operator
                   Version: 43.81.20191029.3 (2019-10-29T11:53:53Z)

  ostree://f03bf128bcc7b8fd9163273bf3f2e1f6f4ec413d79721d669a5ffa76ed2b6b52
                   Version: 43.81.20191029.5 (2019-10-29T18:06:49Z)
[core@master-0 ~]$
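
A rough sketch of the "which image am I actually booted into?" check, applied to the human-readable `rpm-ostree status` text shown above. It treats a leading `●` or `*` as the booted marker, as seen in the outputs in this bug. The helper names are illustrative, and parsing `rpm-ostree status --json` would likely be more robust in practice.

```python
BOOTED_MARKERS = ("●", "*")


def parse_deployments(status_text):
    """Return a list of {'origin', 'booted', 'version'} dicts from status text."""
    deployments = []
    in_list = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped == "Deployments:":
            in_list = True
            continue
        if not in_list or not stripped:
            continue
        if "://" in stripped:
            # Deployment header line, e.g. "● pivot://..." or "ostree://..."
            booted = stripped[0] in BOOTED_MARKERS
            origin = stripped.lstrip("●* ").strip()
            deployments.append({"origin": origin, "booted": booted, "version": None})
        elif stripped.startswith("Version:") and deployments:
            # "Version: 43.81.20191029.3 (2019-10-29T11:53:53Z)" -> first token
            deployments[-1]["version"] = stripped.split(":", 1)[1].split()[0]
    return deployments


def booted_version(status_text):
    """Version of the booted deployment, or None if no marker is found."""
    for dep in parse_deployments(status_text):
        if dep["booted"]:
            return dep["version"]
    return None


# Status output copied from the comment above.
STATUS = """State: idle
AutomaticUpdates: disabled
Deployments:
● pivot://registry.svc.ci.openshift.org/origin/4.3-2019-10-29-180250@sha256:88406db92d5249d005226cb87adef0fd28cd9e5a73e76eb2d60499c6108fafb0
              CustomOrigin: Managed by machine-config-operator
                   Version: 43.81.20191029.3 (2019-10-29T11:53:53Z)

  ostree://f03bf128bcc7b8fd9163273bf3f2e1f6f4ec413d79721d669a5ffa76ed2b6b52
                   Version: 43.81.20191029.5 (2019-10-29T18:06:49Z)
"""

if __name__ == "__main__":
    print(booted_version(STATUS))
```

For the output above, the booted deployment is the `pivot://` entry at 43.81.20191029.3, while the 43.81.20191029.5 image is present but not booted.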

Comment 20 Colin Walters 2019-11-01 20:58:38 UTC
https://github.com/openshift/installer/pull/2609

Comment 21 Colin Walters 2019-11-04 20:24:24 UTC
OK, there's apparently yet *another* bug going on here.  I'm seeing dracut still start dhclient in the initramfs, even without rd.neednet.

Comment 22 Colin Walters 2019-11-04 20:36:35 UTC
Ah, it's adding clevis that broke this:

cmdline() {
    echo "rd.neednet=1"
}
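
As a sketch of how one might look for this, the snippet below scans several command-line sources for `rd.neednet=1`. It assumes the argument can arrive either on the kernel command line or in a fragment embedded in the initramfs (dracut reads `/etc/cmdline.d/*.conf`). The sample strings and the fragment file name are hypothetical.

```python
def kernel_args(cmdline):
    """Split a command-line string into a dict; bare flags map to None."""
    args = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        args[key] = value if sep else None
    return args


def neednet_sources(sources):
    """Return the names of the sources that set rd.neednet=1.

    `sources` maps a label (for example a file path) to its text content.
    """
    return [
        name
        for name, text in sources.items()
        if kernel_args(text).get("rd.neednet") == "1"
    ]


# Hypothetical inputs: a kernel command line without rd.neednet, and an
# initramfs fragment that adds it, mirroring the situation described above.
SOURCES = {
    "/proc/cmdline": "BOOT_IMAGE=/ostree/vmlinuz rw root=/dev/sda4 console=tty0",
    "/etc/cmdline.d/example-clevis.conf": "rd.neednet=1",
}

if __name__ == "__main__":
    print(neednet_sources(SOURCES))
```

Checking only /proc/cmdline would miss the second source, which matches the observation that dhclient still started "even without rd.neednet".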

See also https://bugzilla.redhat.com/show_bug.cgi?id=1687753

Comment 24 Micah Abbott 2019-11-07 20:57:34 UTC
MR 678 landed in today's RHCOS build, 43.81.201911071801.0.

@jcallen can you give it another try with that image?

Comment 25 Red Hat Bugzilla 2023-09-14 05:44:38 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

