1975701 – [vsphere][upi] Network is changed to dhcp configuration after second reboot when Tang disk encryption is enabled


Keywords:
Status:	CLOSED UPSTREAM
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	RHCOS
Sub Component:
Version:	4.8
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	RHCOS Bug Triage
QA Contact:	Michael Nguyen
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-06-24 08:46 UTC by jima
Modified:	2022-08-11 14:21 UTC
CC List:	12 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-08-11 14:21:38 UTC
Target Upstream Version:
Embargoed:


Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	coreos fedora-coreos-tracker issues 886	0	None	open	Automatic network config propagation in the initramfs when using Tang	2021-06-29 21:41:18 UTC

Comment 1 Colin Walters 2021-06-24 14:18:52 UTC

Fundamentally right now, if you want to use Clevis+Tang, you need to represent the network state as kernel cmdline options; you can't encode the static IP in `/etc/sysconfig/network-scripts` because that only affects the real root filesystem.

Or to state this more obviously: the IP information in `/etc` is encrypted by LUKS, but we need the IP information to access the key.

We have added some basic support to inject a subset of network configuration into the initramfs, but for now I'd go with encoding into the kernel cmdline.

Hmm...we have special afterburn guestinfo injection for static IP: see the docs bits in
https://github.com/openshift/installer/pull/3533/files
and https://coreos.github.io/afterburn/usage/initrd-network-cmdline/#vmware
which...I'm not sure made it to the official docs yet.  But it also only works on the initial boot by design right now.  Perhaps we could make it an option to make it persist to the kernel cmdline.
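
For illustration, a minimal sketch of how the guestinfo injection mentioned above might be driven from a shell. It assumes the property name `guestinfo.afterburn.initrd.network-kargs` as described in the Afterburn VMware documentation linked above, and that `govc` is the tool used to set VM extra config. The VM name and addresses are hypothetical example values, and the script only prints the command rather than running it. As noted above, this only affects the initial boot.

```shell
#!/bin/sh
# Sketch only: builds the Afterburn guestinfo property for initramfs network
# kernel arguments on VMware. VM name and addresses are hypothetical.

# Print "guestinfo.afterburn.initrd.network-kargs=<kargs>" for the kargs in $1.
afterburn_network_kargs_property() {
    printf 'guestinfo.afterburn.initrd.network-kargs=%s\n' "$1"
}

# Print (do not run) a govc command that would set the property.
# $1 = VM name, $2 = kernel arguments string.
print_govc_command() {
    printf "govc vm.change -vm '%s' -e '%s'\n" \
        "$1" "$(afterburn_network_kargs_property "$2")"
}

print_govc_command 'worker-0' \
    'ip=10.10.10.10::10.10.10.1:255.255.255.0::ens192:none:8.8.8.8'
```

Printing the command instead of executing it keeps the sketch safe to run anywhere; whether the property is honoured depends on the RHCOS and Afterburn versions in use.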

Comment 2 Dusty Mabe 2021-06-24 15:00:29 UTC

Hey @jima - How are you configuring network right now? i.e. where and how do you specify the IP address when provisioning the machine?

Comment 5 jima 2021-06-25 02:52:21 UTC

We use the upstream Terraform script to deploy OCP for UPI on vSphere.
You can refer to [1] for the ifcfg configuration file template.

[1] https://github.com/openshift/installer/blob/release-4.7/upi/vsphere/vm/ifcfg.tmpl

Comment 6 Dusty Mabe 2021-06-25 18:17:17 UTC

Got ya. Yeah, I'm not sure of the best way forward for you right now, but it may work to just add extra kernel arguments to what you're already doing:

```
    kernelArguments:
      - rd.neednet=1
      - ip=10.10.10.10::10.10.10.1:255.255.255.0::ens192:none:8.8.8.8
```

You can populate the `ip=` string with something like this short bash script:

```
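# Static network settings for the node (example values).
ip='10.10.10.10'
gateway='10.10.10.1'
netmask='255.255.255.0'
hostname=''        # left empty: no hostname is set via the kernel argument
interface='ens192'
nameserver='8.8.8.8'
# dracut field order: ip=<client-ip>:<peer>:<gateway>:<netmask>:<hostname>:<interface>:<autoconf>:<dns>
# The peer field is left empty; "none" disables autoconfiguration (static addressing).
echo "ip=${ip}::${gateway}:${netmask}:${hostname}:${interface}:none:${nameserver}"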
```
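
To confirm after a reboot that the `rd.neednet=1` and `ip=` arguments above actually persisted, something like the following sketch could be run on the node. The function names are hypothetical, and the check only inspects the kernel command line; it does not verify that the initramfs applied the configuration.

```shell
#!/bin/sh
# Sketch only: report whether a kernel command line carries the static-network
# arguments discussed above. Function names are hypothetical.

# Succeeds if the whitespace-separated cmdline in $1 contains the exact word $2.
cmdline_has_arg() {
    # $1 is intentionally unquoted so the shell splits it into words.
    for arg in $1; do
        [ "$arg" = "$2" ] && return 0
    done
    return 1
}

# Prints the first ip=... argument found in $1; fails if there is none.
cmdline_ip_arg() {
    for arg in $1; do
        case "$arg" in
            ip=*) printf '%s\n' "$arg"; return 0 ;;
        esac
    done
    return 1
}

# Read the running kernel's command line (empty if /proc/cmdline is unavailable).
cmdline="$(cat /proc/cmdline 2>/dev/null || true)"
if cmdline_has_arg "$cmdline" 'rd.neednet=1' && cmdline_ip_arg "$cmdline" >/dev/null; then
    echo "static network kargs present: $(cmdline_ip_arg "$cmdline")"
else
    echo "static network kargs missing; the initramfs may fall back to DHCP"
fi
```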

Comment 9 Dusty Mabe 2021-06-29 20:33:03 UTC

upstream issue to discuss this problem: https://bugzilla.redhat.com/show_bug.cgi?id=1975701

Comment 10 Benjamin Gilbert 2021-06-29 21:41:20 UTC

The upstream issue is actually https://github.com/coreos/fedora-coreos-tracker/issues/886.

Comment 11 Dusty Mabe 2021-06-30 02:14:07 UTC

Shoot - really messed up that copy/pasta - 🤪

Comment 14 Bob Furu 2021-10-11 18:38:02 UTC

Created https://github.com/openshift/openshift-docs/pull/37356 to add the known issue and workaround to the 4.9 release notes. Changing to ON_QA.

@jima - please review for QE. Thanks!

Comment 15 Bob Furu 2021-10-11 19:47:53 UTC

I have also created https://github.com/openshift/openshift-docs/pull/37362 to update the 4.8+ product docs and have tagged QE and SMEs for review.

Comment 16 jima 2021-10-12 01:13:00 UTC

Bob, thanks for updating the release notes and product docs. I added some comments in the PRs above.

Since the doc PRs only highlight the issue and provide a workaround to avoid it, but do not fix it in code, it's better to use this bug to track the code fix, so I'm changing the status back to NEW.

Comment 17 Bob Furu 2021-10-12 01:43:21 UTC

Thank you for the feedback, @jima - makes sense to me. I'll address your feedback in my PRs referenced above and leave this BZ status for the RHCOS coding professionals :).

Comment 19 Micah Abbott 2021-10-15 13:58:43 UTC

It appears this has transitioned to a Docs issue, with Bob working on it.  I've changed the subcomponent + assignee accordingly.

Comment 20 Micah Abbott 2021-10-15 15:17:39 UTC

Looks like I was misguided; this is still an issue and maps to https://issues.redhat.com/browse/GRPA-4048

Comment 21 Bob Furu 2021-10-20 22:20:28 UTC

A Known Issue was added in the OCP 4.9 Release Notes: https://github.com/openshift/openshift-docs/pull/37558.
It was determined, however, that a workaround should not be included in the release notes because it requires a Support Exception. Therefore, I closed the workaround PR: https://github.com/openshift/openshift-docs/pull/37362

Comment 24 Timothée Ravier 2022-08-11 14:21:38 UTC

The RFE tracking implementation for this feature is https://issues.redhat.com/browse/RFE-1764
The Epic on the CoreOS board is https://issues.redhat.com/browse/COS-886
Upstream tracking issue is https://github.com/coreos/fedora-coreos-tracker/issues/886
Will close this one.
