Bug 2030289

Summary: SNO with static IPv6 address is unreachable when booting from the internal drive for the first time
Product: Red Hat Advanced Cluster Management for Kubernetes
Reporter: Marius Cornea <mcornea>
Component: Infrastructure Operator
Assignee: Mat Kowalski <mko>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: urgent
Docs Contact: Derek <dcadzow>
Priority: unspecified
Version: rhacm-2.5
CC: achernet, agurenko, aos-bugs, ccrum, mfilanov, mko, trwest, yfirst, yshnaidm
Target Milestone: ---
Keywords: Regression
Target Release: rhacm-2.5
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2041889 2049034
Environment:
Last Closed: 2022-10-03 20:19:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2041889, 2049034    
Attachments:
console.log (Flags: none)

Description Marius Cornea 2021-12-08 11:33:58 UTC
Created attachment 1845231
console.log

Description of problem:

SNO with static IPv6 address is unreachable when booting from the internal drive for the first time, after the node has booted from ISO and content was written to disk.

Version-Release number of selected component (if applicable):
hub: ocp 4.9.10 + acm 2.4.1
spoke: 4.10.0-0.nightly-2021-12-06-162419

How reproducible:
100%

Steps to Reproduce:
1. Deploy an SNO cluster with a static IPv6 address.
2. Wait for the node to boot from the internal drive during the installation process.

Actual results:
Node is unreachable over the network; the network interfaces don't have the IP address set.

Expected results:
The node applies the network configuration defined in NMStateConfig.

Additional info:

The nmconnection file exists in /etc/NetworkManager/system-connections but not in /etc/NetworkManager/system-connections-merged; see the attached console.log.

Note: after editing /etc/NetworkManager/conf.d/01-ipv6.conf to set the keyfile path to /etc/NetworkManager/system-connections and restarting NetworkManager, the network configuration gets applied and the installation proceeds.
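
The workaround in the note above comes down to changing a single setting in a NetworkManager configuration file. Below is a minimal Go sketch of that edit. It assumes 01-ipv6.conf contains a [keyfile] section whose path= entry points at /etc/NetworkManager/system-connections-merged. The report implies this but does not show the file. The function name setKeyfilePath is hypothetical. The sketch only rewrites the text and does not restart NetworkManager, which the note says is also required.

```go
package main

import (
	"fmt"
	"strings"
)

// setKeyfilePath (hypothetical helper) returns conf with the "path=" entry of
// the [keyfile] section replaced by newPath. Other sections are untouched.
// Sketch only: it matches entries written exactly as "path=<value>" and does
// not add a path entry if the [keyfile] section lacks one.
func setKeyfilePath(conf, newPath string) string {
	lines := strings.Split(conf, "\n")
	section := ""
	for i, line := range lines {
		trimmed := strings.TrimSpace(line)
		if strings.HasPrefix(trimmed, "[") && strings.HasSuffix(trimmed, "]") {
			section = trimmed // entering a new section, e.g. "[keyfile]"
			continue
		}
		if section == "[keyfile]" && strings.HasPrefix(trimmed, "path=") {
			lines[i] = "path=" + newPath
		}
	}
	return strings.Join(lines, "\n")
}

func main() {
	// Assumed file contents; the real 01-ipv6.conf on the node may differ.
	orig := "[keyfile]\npath=/etc/NetworkManager/system-connections-merged\n"
	fmt.Print(setKeyfilePath(orig, "/etc/NetworkManager/system-connections"))
}
```

Parsing by section ensures that a path= key in any other section is left alone. In practice, the edit would be followed by a NetworkManager restart, as described in the note.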

Comment 1 Mat Kowalski 2021-12-09 15:20:05 UTC
* Does the same issue happen for multi-node cluster?
* Does the same issue happen for a statically set IPv4 address?

Comment 2 Mat Kowalski 2021-12-09 15:21:25 UTC
Can you please attach all the CRs (as well as custom manifests and configs, if existing) used to deploy the cluster?

Comment 5 Marius Cornea 2021-12-09 15:58:37 UTC
(In reply to Mat Kowalski from comment #1)
> * Does the same issue happen for multi-node cluster?

I haven't tried a multi-node cluster; the setup I currently use is targeted at SNO spoke clusters.
 
> * Does the same issue happen for a statically set IPv4 address?

I haven't tried with IPv4, but based on the initial analysis I suspect the same issue would reproduce with IPv4, since the NetworkManager keyfile from /etc/NetworkManager/system-connections would not get loaded.

Comment 6 nshidlin 2022-01-10 07:58:51 UTC
(In reply to Mat Kowalski from comment #1)
> * Does the same issue happen for a statically set IPv4 address?
This does not reproduce for IPv4 using 4.10.0-0.nightly-2022-01-08-215919. With both OVN and SDN networking, the nodes receive the defined static IP after reboot.

Comment 7 Mat Kowalski 2022-01-10 18:01:31 UTC
Following further discussion: this is a regression between 4.9 and 4.10 that affects only static IPv6 configuration.

Comment 8 yevgeny shnaidman 2022-01-12 11:39:36 UTC
For static IPv6 configuration, some config files should be added to the ignition. They are not added because the code checks the cluster to determine whether StaticNetworkConfig is defined. Since we moved to V2/InfraEnv, staticNetworkConfig is defined only in the InfraEnv:

https://github.com/openshift/assisted-service/blob/c7e6388c9cfd4a67e48760ab6fcfd830a0a3ae42/internal/ignition/ignition.go#L954
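
The logic described in comment 8 can be illustrated with a simplified Go sketch. The Cluster and InfraEnv types and both functions are hypothetical stand-ins, not the actual assisted-service code at the linked ignition.go line. They only show why a check that reads the static network config from the cluster misses it once that config lives on the InfraEnv. The InfraEnv-based variant is for illustration and is not the fix that was merged.

```go
package main

import "fmt"

// Hypothetical, simplified models; the real assisted-service types differ.
type Cluster struct {
	StaticNetworkConfig string // left empty in the V2/InfraEnv flow
}

type InfraEnv struct {
	StaticNetworkConfig string // where the V2 flow stores static network config
}

// needsStaticNetworkFilesClusterOnly (hypothetical) mimics the described bug:
// it only inspects the Cluster, so it never sees config stored on the InfraEnv.
func needsStaticNetworkFilesClusterOnly(c *Cluster, _ *InfraEnv) bool {
	return c != nil && c.StaticNetworkConfig != ""
}

// needsStaticNetworkFilesFromInfraEnv (hypothetical) inspects the InfraEnv instead.
func needsStaticNetworkFilesFromInfraEnv(_ *Cluster, ie *InfraEnv) bool {
	return ie != nil && ie.StaticNetworkConfig != ""
}

func main() {
	cluster := &Cluster{}
	// Placeholder content; a real NMStateConfig carries interface definitions.
	infraEnv := &InfraEnv{StaticNetworkConfig: "interfaces: []"}

	fmt.Println(needsStaticNetworkFilesClusterOnly(cluster, infraEnv))  // false: extra IPv6 files skipped
	fmt.Println(needsStaticNetworkFilesFromInfraEnv(cluster, infraEnv)) // true
}
```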

Comment 9 nshidlin 2022-01-12 11:49:19 UTC
(In reply to yevgeny shnaidman from comment #8)
> In case of the IPv6 static configuration, there are some config file that
> should be added to the ignition. They are not added because the code is
> looking at the cluster in order to check if the StaticNetworkConfig is
> defined. Since we moved to V2/infra env, the staticNetworkConfig is defined
> in the InfraEnv only:
> 
> https://github.com/openshift/assisted-service/blob/c7e6388c9cfd4a67e48760ab6fcfd830a0a3ae42/internal/ignition/ignition.go#L954

How do we explain then that this flow is working with OCP 4.9?

Comment 10 Mat Kowalski 2022-01-13 11:26:54 UTC
> How do we explain then that this flow is working with OCP 4.9?

The bug in assisted (https://issues.redhat.com/browse/MGMT-8894) does not affect RHCOS/OCP 4.9; that version works correctly despite it. Only RHCOS/OCP 4.10 is affected. We need to investigate what changed in the directories under /etc/NetworkManager between those two versions and determine the underlying issue that we have exposed.

Comment 11 Mat Kowalski 2022-01-14 09:31:40 UTC
TL;DR for anyone not following the chain of dependent tickets: the issue here is a combination of two problems:

1) the move to InfraEnvs in the Assisted Installer, with missing logic for handling static network configuration on the InfraEnv instead of the Cluster object
2) a bug in systemd-preset not handling mountpoints with special characters (https://bugzilla.redhat.com/show_bug.cgi?id=1952686)

https://github.com/openshift/assisted-service/pull/3199 should fix the issue by greatly simplifying the logic used for manual network configuration when the IPv6 stack is used.
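
Neither comment says which "special characters" trip up systemd-preset. As background only: systemd derives mount unit names from mount paths and hex-escapes characters such as "-" as \x2d. A path like /etc/NetworkManager/system-connections-merged therefore maps to a unit name containing escape sequences. Whether this is exactly the case covered by bug 1952686 is an assumption. The Go sketch below approximates `systemd-escape --path` for simple absolute paths. escapePath is a hypothetical helper, and the mount point is chosen for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// escapePath (hypothetical helper) approximates `systemd-escape --path`:
// it trims surrounding slashes, turns "/" into "-", and hex-escapes bytes
// outside [A-Za-z0-9:_.] (plus a leading ".") as \xNN. Edge cases such as
// repeated slashes and the root path "/" are not handled.
func escapePath(p string) string {
	p = strings.Trim(p, "/")
	var b strings.Builder
	for i := 0; i < len(p); i++ {
		c := p[i]
		switch {
		case c == '/':
			b.WriteByte('-')
		case c == '.' && i == 0:
			fmt.Fprintf(&b, `\x%02x`, c) // a leading dot is escaped
		case c >= 'a' && c <= 'z', c >= 'A' && c <= 'Z', c >= '0' && c <= '9',
			c == ':', c == '_', c == '.':
			b.WriteByte(c)
		default:
			fmt.Fprintf(&b, `\x%02x`, c) // e.g. '-' becomes \x2d
		}
	}
	return b.String()
}

func main() {
	// Mount point chosen for illustration only.
	fmt.Println(escapePath("/etc/NetworkManager/system-connections-merged") + ".mount")
}
```

Running it prints etc-NetworkManager-system\x2dconnections\x2dmerged.mount. The expected result is that `systemd-escape --path --suffix=mount` produces the same name for that path. A tool that compares or globs unit names without accounting for these escapes can fail to match the unit.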

Comment 12 Marius Cornea 2022-01-18 12:14:46 UTC
(In reply to Mat Kowalski from comment #11)
> TLDR for anyone not following the chain of dependent tickets - the issue
> here is a combination of 2 problems
> 
> 1) moving to InfraEnvs in Assisted Installer and missing logic for handling
> static network configuration in InfraEnv instead of Cluster object
> 2) bug in systemd-preset not handling mountpoints with special characters
> (https://bugzilla.redhat.com/show_bug.cgi?id=1952686)
> 
> https://github.com/openshift/assisted-service/pull/3199 is supposed to fix
> the issue by heavily simplifying the logic used for manual network
> configuration in case of IPv6 stack being used.

Hi Mat,

I am seeing the same issue (loss of connectivity) on a node that was upgraded from 4.9 to 4.10. Should your fix handle this case as well, or should I file a new BZ to track the upgrade use case?

Thanks

Comment 13 Mat Kowalski 2022-01-18 12:59:04 UTC
No, this fix does not handle upgrades. Please open a separate bug and link it to this one.

Comment 19 nshidlin 2022-03-13 08:56:54 UTC
Verified with:
ACM 2.5.0-DOWNSTREAM-2022-03-09-19-54-43
OCP 4.10.3