2030289 – SNO with static IPv6 address is unreachable when booting from the internal drive for the first time

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Advanced Cluster Management for Kubernetes
Classification:	Red Hat
Component:	Infrastructure Operator
Sub Component:
Version:	rhacm-2.5
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	urgent
Target Milestone:	---
Target Release:	rhacm-2.5
Assignee:	Mat Kowalski
QA Contact:
Docs Contact:	Derek
URL:
Whiteboard:
Depends On:
Blocks:	2041889 2049034

Reported:	2021-12-08 11:33 UTC by Marius Cornea
Modified:	2022-10-03 20:19 UTC
CC List:	9 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	2041889 2049034
Environment:
Last Closed:	2022-10-03 20:19:59 UTC
Target Upstream Version:
Embargoed:

Attachments
console.log (147.36 KB, image/png) 2021-12-08 11:33 UTC, Marius Cornea	no flags

Links
System	ID	Priority	Status	Summary	Last Updated
Github	openshift assisted-service pull 3199	None	Merged	MGMT-8894: Simplify IPv6 Network Manager configuration	2022-02-01 09:59:29 UTC
Github	openshift assisted-service pull 3226	None	Merged	MGMT-8894: Catch FCs and nightlies when comparing OCP versions	2022-02-01 09:59:36 UTC
Github	openshift assisted-service pull 3230	None	Merged	Bug 2049034: Simplify IPv6 Network Manager configuration	2022-02-05 01:14:58 UTC
Github	stolostron backlog issues 19592	None	None	None	2022-02-05 01:15:01 UTC
Red Hat Issue Tracker	MGMT-8894	None	None	None	2022-01-12 12:22:27 UTC
Red Hat Issue Tracker	MGMTBUGSM-60	None	None	None	2022-02-01 11:46:21 UTC

Internal Links: 2040195

Description Marius Cornea 2021-12-08 11:33:58 UTC

Created attachment 1845231
console.log

Description of problem:

SNO with static IPv6 address is unreachable when booting from the internal drive for the first time, after the node has booted from ISO and content was written to disk.

Version-Release number of selected component (if applicable):
hub: ocp 4.9.10 + acm 2.4.1
spoke: 4.10.0-0.nightly-2021-12-06-162419

How reproducible:
100%

Steps to Reproduce:
1. Deploy SNO with static IPv6 address 
2. Wait for the node to boot from internal drive during the installation process

Actual results:
Node is unreachable over the network; the network interfaces don't have the IP address set.

Expected results:
Node gets the network configuration applied as defined in NMStateConfig

Additional info:

The nmconnection file exists in /etc/NetworkManager/system-connections but not under /etc/NetworkManager/system-connections-merged; see the attached screenshot from the console.

Note: after editing /etc/NetworkManager/conf.d/01-ipv6.conf so that the keyfile path points to /etc/NetworkManager/system-connections and restarting NetworkManager, the network configuration gets applied and the installation proceeds.
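
The workaround above could look roughly like the following sketch. It assumes that 01-ipv6.conf is an INI-style NetworkManager configuration file and that the directory is selected by the `path` option of its `[keyfile]` section. The helper name `set_keyfile_path` and the sample file content are hypothetical; the real content of the file on the node is not shown in this report.

```python
import configparser
import io


def set_keyfile_path(conf_text: str, new_path: str) -> str:
    """Return conf_text with the [keyfile] 'path' option set to new_path.

    Note: configparser drops comments, so this is only a sketch and not a
    lossless editor for NetworkManager configuration files.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    if not parser.has_section("keyfile"):
        parser.add_section("keyfile")
    parser.set("keyfile", "path", new_path)
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


# Assumed original content; the actual file may differ.
original = "[keyfile]\npath=/etc/NetworkManager/system-connections-merged\n"
print(set_keyfile_path(original, "/etc/NetworkManager/system-connections"))
```

After the file is rewritten, NetworkManager still has to be restarted for the change to take effect, as described in the note.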

Comment 1 Mat Kowalski 2021-12-09 15:20:05 UTC

* Does the same issue happen for multi-node cluster?
* Does the same issue happen for a statically set IPv4 address?

Comment 2 Mat Kowalski 2021-12-09 15:21:25 UTC

Can you please attach all the CRs (as well as custom manifests and configs, if existing) used to deploy the cluster?

Comment 5 Marius Cornea 2021-12-09 15:58:37 UTC

(In reply to Mat Kowalski from comment #1)
> * Does the same issue happen for multi-node cluster?

I haven't tried a multi-node cluster; the setup that I currently use targets SNO spoke clusters.
 
> * Does the same issue happen for a statically set IPv4 address?

I haven't tried with IPv4 but based on the initial analysis I suspect the same issue would reproduce with IPv4 as the NetworkManager keyfile from /etc/NetworkManager/system-connections would not get loaded.

Comment 6 nshidlin 2022-01-10 07:58:51 UTC

(In reply to Mat Kowalski from comment #1)
> * Does the same issue happen for a statically set IPv4 address?
This does not reproduce for IPv4 using 4.10.0-0.nightly-2022-01-08-215919. With both OVN and SDN networking, the nodes receive the defined static IP after reboot.

Comment 7 Mat Kowalski 2022-01-10 18:01:31 UTC

From a further discussion - this is a regression between 4.9 and 4.10 affecting only IPv6 static configuration.

Comment 8 yevgeny shnaidman 2022-01-12 11:39:36 UTC

In the case of IPv6 static configuration, there are some config files that should be added to the ignition. They are not added because the code looks at the cluster to check whether the StaticNetworkConfig is defined. Since we moved to V2/infra env, the staticNetworkConfig is defined in the InfraEnv only:

https://github.com/openshift/assisted-service/blob/c7e6388c9cfd4a67e48760ab6fcfd830a0a3ae42/internal/ignition/ignition.go#L954
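
The lookup problem described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the assisted-service code (which is Go). `Cluster`, `InfraEnv` and both function names are hypothetical. The only behaviour taken from this comment is that the static network config is now defined on the InfraEnv while the check still reads the cluster.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cluster:
    # Hypothetical stand-in for the cluster object; empty in the V2 flow.
    static_network_config: Optional[str] = None


@dataclass
class InfraEnv:
    # Hypothetical stand-in for the InfraEnv, where the config now lives.
    static_network_config: Optional[str] = None


def has_static_config_cluster_only(cluster: Cluster) -> bool:
    # Behaviour described in the comment: only the cluster is consulted,
    # so a config defined on the InfraEnv is never seen.
    return bool(cluster.static_network_config)


def has_static_config(cluster: Cluster, infra_env: InfraEnv) -> bool:
    # Shown only for contrast: a check that also consults the InfraEnv.
    # This is an illustration, not necessarily how the issue was fixed.
    return bool(cluster.static_network_config or infra_env.static_network_config)


cluster = Cluster()
infra_env = InfraEnv(static_network_config="<nmstate yaml>")
print(has_static_config_cluster_only(cluster))  # False
print(has_static_config(cluster, infra_env))    # True
```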

Comment 9 nshidlin 2022-01-12 11:49:19 UTC

(In reply to yevgeny shnaidman from comment #8)
> In case of the IPv6 static configuration, there are some config file that
> should be added to the ignition. They are not added because the code is
> looking at the cluster in order to check if the StaticNetworkConfig is
> defined. Since we moved to V2/infra env, the staticNetworkConfig is defined
> in the InfraEnv only:
> 
> https://github.com/openshift/assisted-service/blob/
> c7e6388c9cfd4a67e48760ab6fcfd830a0a3ae42/internal/ignition/ignition.go#L954

How do we explain then that this flow is working with OCP 4.9?

Comment 10 Mat Kowalski 2022-01-13 11:26:54 UTC

>How do we explain then that this flow is working with OCP 4.9?

The bug in assisted (https://issues.redhat.com/browse/MGMT-8894) does not affect RHCOS/OCP 4.9, which works correctly despite it. Only RHCOS/OCP 4.10 is affected. We need to investigate what has changed in the directories inside /etc/NetworkManager between those two versions and assess what the underlying issue is that we have exposed.

Comment 11 Mat Kowalski 2022-01-14 09:31:40 UTC

TLDR for anyone not following the chain of dependent tickets - the issue here is a combination of 2 problems

1) moving to InfraEnvs in Assisted Installer and missing logic for handling static network configuration in InfraEnv instead of Cluster object
2) bug in systemd-preset not handling mountpoints with special characters (https://bugzilla.redhat.com/show_bug.cgi?id=1952686)

https://github.com/openshift/assisted-service/pull/3199 is supposed to fix the issue by heavily simplifying the logic used for manual network configuration in case of IPv6 stack being used.
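
Regarding point 2 above, the following sketch shows why a mount point path can end up with special characters in its systemd unit name. It is a simplified re-implementation of systemd's path escaping rules (what `systemd-escape --path` does), not systemd code. Treating /etc/NetworkManager/system-connections-merged as the mount point in question is an assumption made for illustration; the linked bug has the authoritative details.

```python
def systemd_escape_path(path: str) -> str:
    """Simplified sketch of systemd path escaping for unit names."""
    # Drop leading, trailing and duplicate slashes.
    trimmed = "/".join(part for part in path.split("/") if part)
    if not trimmed:
        return "-"  # the root directory escapes to a single dash
    keep = set(
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789:_."
    )
    out = []
    for i, byte in enumerate(trimmed.encode("utf-8")):
        ch = chr(byte)
        if ch == "/":
            out.append("-")
        elif ch in keep and not (i == 0 and ch == "."):
            out.append(ch)
        else:
            # Everything else, including a literal '-', becomes \xNN.
            out.append("\\x%02x" % byte)
    return "".join(out)


# Assumed example path; the resulting unit name contains backslash escapes.
print(systemd_escape_path("/etc/NetworkManager/system-connections-merged") + ".mount")
```

The dashes in the directory name are escaped as `\x2d`, so the unit name contains backslashes, which is the kind of special character tooling has to handle correctly.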

Comment 12 Marius Cornea 2022-01-18 12:14:46 UTC

(In reply to Mat Kowalski from comment #11)
> TLDR for anyone not following the chain of dependent tickets - the issue
> here is a combination of 2 problems
> 
> 1) moving to InfraEnvs in Assisted Installer and missing logic for handling
> static network configuration in InfraEnv instead of Cluster object
> 2) bug in systemd-preset not handling mountpoints with special characters
> (https://bugzilla.redhat.com/show_bug.cgi?id=1952686)
> 
> https://github.com/openshift/assisted-service/pull/3199 is supposed to fix
> the issue by heavily simplifying the logic used for manual network
> configuration in case of IPv6 stack being used.

Hi Mat,

I am seeing the same issue (losing connectivity) on a node that was upgraded from 4.9 to 4.10. Should your fix handle this case as well, or should I file a new BZ to keep track of the upgrade use case?

Thanks

Comment 13 Mat Kowalski 2022-01-18 12:59:04 UTC

No, this fix does not handle upgrades. Please open a separate bug that links to this one.

Comment 19 nshidlin 2022-03-13 08:56:54 UTC

Verified with:
ACM 2.5.0-DOWNSTREAM-2022-03-09-19-54-43
OCP 4.10.3