Bug 2030289 - SNO with static IPv6 address is unreachable when booting from the internal drive for the first time
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Advanced Cluster Management for Kubernetes
Classification: Red Hat
Component: Infrastructure Operator
Version: rhacm-2.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: rhacm-2.5
Assignee: Mat Kowalski
QA Contact: Derek
URL:
Whiteboard:
Depends On:
Blocks: 2041889 2049034
Reported: 2021-12-08 11:33 UTC by Marius Cornea
Modified: 2022-10-03 20:19 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2041889 2049034
Environment:
Last Closed: 2022-10-03 20:19:59 UTC
Target Upstream Version:
Embargoed:


Attachments
console.log (147.36 KB, image/png)
2021-12-08 11:33 UTC, Marius Cornea


Links
* GitHub openshift/assisted-service pull 3199 (Merged): MGMT-8894: Simplify IPv6 Network Manager configuration. Last updated 2022-02-01 09:59:29 UTC
* GitHub openshift/assisted-service pull 3226 (Merged): MGMT-8894: Catch FCs and nightlies when comparing OCP versions. Last updated 2022-02-01 09:59:36 UTC
* GitHub openshift/assisted-service pull 3230 (Merged): Bug 2049034: Simplify IPv6 Network Manager configuration. Last updated 2022-02-05 01:14:58 UTC
* GitHub stolostron backlog issue 19592. Last updated 2022-02-05 01:15:01 UTC
* Red Hat Issue Tracker MGMT-8894. Last updated 2022-01-12 12:22:27 UTC
* Red Hat Issue Tracker MGMTBUGSM-60. Last updated 2022-02-01 11:46:21 UTC

Internal Links: 2040195

Description Marius Cornea 2021-12-08 11:33:58 UTC
Created attachment 1845231
console.log

Description of problem:

SNO with a static IPv6 address is unreachable when booting from the internal drive for the first time, after the node has booted from the ISO and the content has been written to disk.

Version-Release number of selected component (if applicable):
hub: ocp 4.9.10 + acm 2.4.1
spoke: 4.10.0-0.nightly-2021-12-06-162419

How reproducible:
100%

Steps to Reproduce:
1. Deploy SNO with a static IPv6 address
2. Wait for the node to boot from the internal drive during the installation process

Actual results:
Node is unreachable over the network; the network interfaces don't have the IP address set.

Expected results:
The node gets the network configuration defined in the NMStateConfig applied.

Additional info:

The nmconnection file exists in /etc/NetworkManager/system-connections but not in /etc/NetworkManager/system-connections-merged; see the attached console screenshot.
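The observation above (a keyfile present in one directory but missing from the other) can be checked with a small diagnostic. This is a minimal sketch, not tooling from this bug: the function names are hypothetical, and it only compares `*.nmconnection` file names in the two directories named in the report.

```python
from pathlib import Path
from typing import List, Set


def _nmconnection_names(directory: Path) -> Set[str]:
    # A missing directory is treated as empty rather than as an error.
    if not directory.is_dir():
        return set()
    return {p.name for p in directory.glob("*.nmconnection")}


def missing_from_merged(src_dir: str, merged_dir: str) -> List[str]:
    """Return keyfile names present in src_dir but absent from merged_dir (hypothetical helper)."""
    src = _nmconnection_names(Path(src_dir))
    merged = _nmconnection_names(Path(merged_dir))
    return sorted(src - merged)


if __name__ == "__main__":
    for name in missing_from_merged(
        "/etc/NetworkManager/system-connections",
        "/etc/NetworkManager/system-connections-merged",
    ):
        print(f"not in merged directory: {name}")
```

On an affected node, the static IPv6 keyfile would be expected to show up in this output.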

Note: after adjusting /etc/NetworkManager/conf.d/01-ipv6.conf to set the keyfile path to /etc/NetworkManager/system-connections and restarting NetworkManager, the network configuration gets applied and the installation proceeds.
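The workaround above can be sketched as a small config rewrite. The report does not show the contents of 01-ipv6.conf, so this assumes it uses NetworkManager's standard `[keyfile]` section with a `path=` key; the example input and the function name are hypothetical. Python's `configparser` drops comments when it rewrites a file, and NetworkManager still has to be restarted afterwards (for example with `systemctl restart NetworkManager`).

```python
import configparser
import io


def set_keyfile_path(conf_text: str, keyfile_path: str) -> str:
    """Return conf_text with [keyfile] path= set to keyfile_path (hypothetical helper)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case as written instead of lowercasing
    parser.read_string(conf_text)
    if not parser.has_section("keyfile"):
        parser.add_section("keyfile")
    parser.set("keyfile", "path", keyfile_path)
    out = io.StringIO()
    # Keep the plain key=value form used in NetworkManager config files.
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


# Hypothetical input: the real 01-ipv6.conf contents are not part of this report.
example = "[keyfile]\npath=/etc/NetworkManager/system-connections-merged\n"
print(set_keyfile_path(example, "/etc/NetworkManager/system-connections"))
```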

Comment 1 Mat Kowalski 2021-12-09 15:20:05 UTC
* Does the same issue happen for multi-node cluster?
* Does the same issue happen for a statically set IPv4 address?

Comment 2 Mat Kowalski 2021-12-09 15:21:25 UTC
Can you please attach all the CRs (as well as custom manifests and configs, if any) used to deploy the cluster?

Comment 5 Marius Cornea 2021-12-09 15:58:37 UTC
(In reply to Mat Kowalski from comment #1)
> * Does the same issue happen for multi-node cluster?

I haven't tried a multi-node cluster; the setup that I currently use is targeted at SNO spoke clusters.
 
> * Does the same issue happen for a statically set IPv4 address?

I haven't tried with IPv4 but based on the initial analysis I suspect the same issue would reproduce with IPv4 as the NetworkManager keyfile from /etc/NetworkManager/system-connections would not get loaded.

Comment 6 nshidlin 2022-01-10 07:58:51 UTC
(In reply to Mat Kowalski from comment #1)
> * Does the same issue happen for a statically set IPv4 address?
This does not reproduce for IPv4 using 4.10.0-0.nightly-2022-01-08-215919. With both OVN and SDN networking, the nodes receive the defined static IP after reboot.

Comment 7 Mat Kowalski 2022-01-10 18:01:31 UTC
Following further discussion: this is a regression between 4.9 and 4.10 that affects only IPv6 static configuration.

Comment 8 yevgeny shnaidman 2022-01-12 11:39:36 UTC
In the case of IPv6 static configuration, there are some config files that should be added to the Ignition config. They are not added because the code checks the cluster to determine whether StaticNetworkConfig is defined. Since we moved to V2/InfraEnv, staticNetworkConfig is defined only in the InfraEnv:

https://github.com/openshift/assisted-service/blob/c7e6388c9cfd4a67e48760ab6fcfd830a0a3ae42/internal/ignition/ignition.go#L954
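The lookup mismatch described above can be illustrated with a simplified model. This is not the assisted-service code (which is written in Go); the classes and function names are hypothetical, and the eventual fix (PR 3199) simplified the IPv6 configuration logic rather than just changing this check. The sketch only shows why a check against the cluster misses configuration that now lives on the InfraEnv.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cluster:
    # Hypothetical model: before V2, the static network config could be reached via the cluster.
    static_network_config: Optional[str] = None


@dataclass
class InfraEnv:
    # Since the move to V2/InfraEnv, the config is defined on the InfraEnv only.
    static_network_config: Optional[str] = None


def needs_ipv6_static_files_buggy(cluster: Cluster, infra_env: InfraEnv) -> bool:
    """Mirrors the described bug: only the cluster is consulted."""
    return bool(cluster.static_network_config)


def needs_ipv6_static_files(cluster: Cluster, infra_env: InfraEnv) -> bool:
    """Consults the InfraEnv first, falling back to the cluster."""
    return bool(infra_env.static_network_config or cluster.static_network_config)


cluster = Cluster()
infra_env = InfraEnv(static_network_config="<nmstate yaml>")
print(needs_ipv6_static_files_buggy(cluster, infra_env))  # False: extra files are skipped
print(needs_ipv6_static_files(cluster, infra_env))        # True
```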

Comment 9 nshidlin 2022-01-12 11:49:19 UTC
(In reply to yevgeny shnaidman from comment #8)
> In case of the IPv6 static configuration, there are some config file that
> should be added to the ignition. They are not added because the code is
> looking at the cluster in order to check if the StaticNetworkConfig is
> defined. Since we moved to V2/infra env, the staticNetworkConfig is defined
> in the InfraEnv only:
> 
> https://github.com/openshift/assisted-service/blob/c7e6388c9cfd4a67e48760ab6fcfd830a0a3ae42/internal/ignition/ignition.go#L954

How do we explain then that this flow is working with OCP 4.9?

Comment 10 Mat Kowalski 2022-01-13 11:26:54 UTC
> How do we explain then that this flow is working with OCP 4.9?

The bug in Assisted Installer (https://issues.redhat.com/browse/MGMT-8894) does not affect RHCOS/OCP 4.9, which works correctly despite it. Only RHCOS/OCP 4.10 is affected. We need to investigate what changed in the directories under /etc/NetworkManager between those two versions and assess what the underlying issue we have exposed really is.

Comment 11 Mat Kowalski 2022-01-14 09:31:40 UTC
TL;DR for anyone not following the chain of dependent tickets: the issue here is a combination of two problems:

1) the move to InfraEnvs in Assisted Installer, with missing logic for handling static network configuration defined in the InfraEnv instead of the Cluster object
2) bug in systemd-preset not handling mountpoints with special characters (https://bugzilla.redhat.com/show_bug.cgi?id=1952686)

https://github.com/openshift/assisted-service/pull/3199 is supposed to fix the issue by heavily simplifying the logic used for manual network configuration when the IPv6 stack is used.
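Item 2 concerns mount points with special characters. systemd derives a mount unit name from the mount point path (the same transformation `systemd-escape --path` performs), and characters such as `-` become `\x2d` escape sequences. The sketch below is a simplified Python approximation of that escaping (it does not validate `..` components) that shows the unit name produced for the system-connections-merged directory from the description. That this particular mount point is the one hitting bug 1952686 is an assumption here; the linked bug has the details.

```python
def systemd_escape_path(path: str) -> str:
    """Simplified approximation of `systemd-escape --path` (hypothetical helper)."""
    # Leading, trailing and repeated slashes are dropped.
    parts = [p for p in path.split("/") if p]
    if not parts:
        return "-"  # the root directory maps to "-" (as in "-.mount")
    joined = "/".join(parts).encode("utf-8")
    out = []
    for i, byte in enumerate(joined):
        ch = chr(byte)
        if ch == "/":
            out.append("-")
        elif ch.isascii() and (ch.isalnum() or ch in ":_.") and not (i == 0 and ch == "."):
            out.append(ch)
        else:
            # '-', a leading '.', and every other byte become \xNN
            out.append(f"\\x{byte:02x}")
    return "".join(out)


print(systemd_escape_path("/etc/NetworkManager/system-connections-merged") + ".mount")
# prints: etc-NetworkManager-system\x2dconnections\x2dmerged.mount
```

The backslashes in such unit names are what tooling that matches unit names by pattern has to handle correctly.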

Comment 12 Marius Cornea 2022-01-18 12:14:46 UTC
(In reply to Mat Kowalski from comment #11)
> TLDR for anyone not following the chain of dependent tickets - the issue
> here is a combination of 2 problems
> 
> 1) moving to InfraEnvs in Assisted Installer and missing logic for handling
> static network configuration in InfraEnv instead of Cluster object
> 2) bug in systemd-preset not handling mountpoints with special characters
> (https://bugzilla.redhat.com/show_bug.cgi?id=1952686)
> 
> https://github.com/openshift/assisted-service/pull/3199 is supposed to fix
> the issue by heavily simplifying the logic used for manual network
> configuration in case of IPv6 stack being used.

Hi Mat,

I am seeing the same issue (loss of connectivity) on a node that was upgraded from 4.9 to 4.10. Should your fix handle this case as well, or should I file a new BZ to track the upgrade use case?

Thanks

Comment 13 Mat Kowalski 2022-01-18 12:59:04 UTC
No, this fix does not handle upgrades. Please open a separate bug and link it to this one.

Comment 19 nshidlin 2022-03-13 08:56:54 UTC
Verified with:
ACM 2.5.0-DOWNSTREAM-2022-03-09-19-54-43
OCP 4.10.3

