Created attachment 1845231 [details]
console.log

Description of problem:
SNO with a static IPv6 address is unreachable when booting from the internal drive for the first time, after the node has booted from the ISO and the content was written to disk.

Version-Release number of selected component (if applicable):
hub: OCP 4.9.10 + ACM 2.4.1
spoke: 4.10.0-0.nightly-2021-12-06-162419

How reproducible:
100%

Steps to Reproduce:
1. Deploy SNO with a static IPv6 address
2. Wait for the node to boot from the internal drive during the installation process

Actual results:
The node is unreachable over the network; the network interfaces don't have the IP address set.

Expected results:
The node gets the network configuration applied as defined in NMStateConfig.

Additional info:
The nmconnection file exists in /etc/NetworkManager/system-connections but not under /etc/NetworkManager/system-connections-merged; see the attached screenshot from the console.

Note: after adjusting /etc/NetworkManager/conf.d/01-ipv6.conf to set the keyfile path to /etc/NetworkManager/system-connections and restarting NetworkManager, the network configuration gets applied and the installation moves further.
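The workaround noted above can be sketched roughly as follows. This is a minimal illustration, not the exact file from the affected node: it assumes the `path` option of the `[keyfile]` section in NetworkManager.conf is what 01-ipv6.conf overrides (verify against the NetworkManager.conf man page), and it writes into a temp directory so the sketch can run unprivileged.

```shell
# Sketch of the workaround described in the bug (illustrative, not the
# exact file from the node). On a real node NM_CONF_DIR would be
# /etc/NetworkManager/conf.d; a temp dir is used here so this runs
# without root.
NM_CONF_DIR="$(mktemp -d)"

# Point the keyfile plugin at the directory where the generated
# nmconnection file actually lives.
cat > "${NM_CONF_DIR}/01-ipv6.conf" <<'EOF'
[keyfile]
path=/etc/NetworkManager/system-connections
EOF

cat "${NM_CONF_DIR}/01-ipv6.conf"

# On the real node, apply the change with:
#   systemctl restart NetworkManager
```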
* Does the same issue happen for a multi-node cluster?
* Does the same issue happen for a statically set IPv4 address?
Can you please attach all the CRs (as well as any custom manifests and configs) used to deploy the cluster?
(In reply to Mat Kowalski from comment #1)
> * Does the same issue happen for multi-node cluster?

I haven't tried a multi-node cluster; the setup I currently use targets SNO spoke clusters.

> * Does the same issue happen for a statically set IPv4 address?

I haven't tried with IPv4, but based on the initial analysis I suspect the same issue would reproduce, since the NetworkManager keyfile from /etc/NetworkManager/system-connections would not get loaded either way.
(In reply to Mat Kowalski from comment #1)
> * Does the same issue happen for a statically set IPv4 address?

This does not reproduce for IPv4 using 4.10.0-0.nightly-2022-01-08-215919. With both OVN and SDN networking, the nodes receive the defined static IP after reboot.
From further discussion: this is a regression between 4.9 and 4.10 affecting only static IPv6 configuration.
In the case of a static IPv6 configuration, there are some config files that should be added to the ignition. They are not added because the code looks at the Cluster object to check whether the StaticNetworkConfig is defined. Since we moved to V2/InfraEnv, the staticNetworkConfig is defined in the InfraEnv only:

https://github.com/openshift/assisted-service/blob/c7e6388c9cfd4a67e48760ab6fcfd830a0a3ae42/internal/ignition/ignition.go#L954
(In reply to yevgeny shnaidman from comment #8)
> In case of the IPv6 static configuration, there are some config file that
> should be added to the ignition. They are not added because the code is
> looking at the cluster in order to check if the StaticNetworkConfig is
> defined. Since we moved to V2/infra env, the staticNetworkConfig is defined
> in the InfraEnv only:
>
> https://github.com/openshift/assisted-service/blob/c7e6388c9cfd4a67e48760ab6fcfd830a0a3ae42/internal/ignition/ignition.go#L954

How do we explain, then, that this flow is working with OCP 4.9?
> How do we explain then that this flow is working with OCP 4.9?

The bug in assisted (https://issues.redhat.com/browse/MGMT-8894) does not affect RHCOS/OCP 4.9, so that version works correctly despite it. Only RHCOS/OCP 4.10 is affected. We need to investigate what has changed regarding the directories inside /etc/NetworkManager between those two versions and assess what the issue we have exposed really is.
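As a starting point for that investigation, the two keyfile locations mentioned in this bug can be compared directly on an affected node. A minimal sketch (paths are the ones named in this report; on machines without NetworkManager the directories are simply reported as missing):

```shell
# List both keyfile locations so the generated nmconnection file can be
# compared between the directory assisted-service writes to and the
# overlay directory NetworkManager actually reads from. Intended to run
# on the affected node; elsewhere the directories may just be missing.
list_nm_keyfiles() {
    for d in /etc/NetworkManager/system-connections \
             /etc/NetworkManager/system-connections-merged; do
        echo "== $d =="
        ls -A "$d" 2>/dev/null || echo "(missing or unreadable)"
    done
}

list_nm_keyfiles
```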
TL;DR for anyone not following the chain of dependent tickets - the issue here is a combination of two problems:

1) the move to InfraEnvs in Assisted Installer, with missing logic for handling static network configuration in the InfraEnv instead of the Cluster object
2) a bug in systemd-preset not handling mountpoints with special characters (https://bugzilla.redhat.com/show_bug.cgi?id=1952686)

https://github.com/openshift/assisted-service/pull/3199 is supposed to fix the issue by heavily simplifying the logic used for manual network configuration when an IPv6 stack is used.
(In reply to Mat Kowalski from comment #11)
> TLDR for anyone not following the chain of dependent tickets - the issue
> here is a combination of 2 problems
>
> 1) moving to InfraEnvs in Assisted Installer and missing logic for handling
> static network configuration in InfraEnv instead of Cluster object
> 2) bug in systemd-preset not handling mountpoints with special characters
> (https://bugzilla.redhat.com/show_bug.cgi?id=1952686)
>
> https://github.com/openshift/assisted-service/pull/3199 is supposed to fix
> the issue by heavily simplifying the logic used for manual network
> configuration in case of IPv6 stack being used.

Hi Mat, I am seeing the same issue (losing connectivity) on a node upgraded from 4.9 to 4.10. Should your fix handle this case as well, or should I file a new BZ to track the upgrade use case? Thanks
No, this fix does not handle the upgrade case. Please open a separate bug and link this one from it.
Verified with:
ACM 2.5.0-DOWNSTREAM-2022-03-09-19-54-43
OCP 4.10.3