Description of problem:

1. The OCP 4.9.38 on Z build, using the RHCOS 4.9 GA version, fails to install in zVM environments, with the install not progressing past the network cluster operator.
2. The OCP 4.9.38 on Z build, using the same RHCOS 4.9 GA version, installs successfully in multiple KVM environments.

Here is the output of the OCP CLI commands "oc get clusterversion", "oc get nodes", and "oc get co" when attempting an install in a zVM environment:

[root@ospamgr1 ~]# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          51m     Unable to apply 4.9.38: an unknown error has occurred: MultipleErrors

[root@ospamgr1 ~]# oc get nodes
NAME                                          STATUS     ROLES    AGE   VERSION
master-0.pok-25.ocptest.pok.stglabs.ibm.com   NotReady   master   50m   v1.22.8+f34b40c
master-1.pok-25.ocptest.pok.stglabs.ibm.com   NotReady   master   50m   v1.22.8+f34b40c
master-2.pok-25.ocptest.pok.stglabs.ibm.com   NotReady   master   50m   v1.22.8+f34b40c

[root@ospamgr1 ~]# oc get co
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication
baremetal
cloud-controller-manager                   4.9.38    True        False         False      50m
cloud-credential                                     True        False         False      50m
cluster-autoscaler
config-operator
console
csi-snapshot-controller
dns
etcd
image-registry
ingress
insights
kube-apiserver
kube-controller-manager
kube-scheduler
kube-storage-version-migrator
machine-api
machine-approver
machine-config
marketplace
monitoring
network                                              False       True          True       50m     The network is starting up
node-tuning
openshift-apiserver
openshift-controller-manager
openshift-samples
operator-lifecycle-manager
operator-lifecycle-manager-catalog
operator-lifecycle-manager-packageserver
service-ca
storage
[root@ospamgr1 ~]#

Version-Release number of selected component (if applicable):
1. OCP 4.9.38 at https://mirror.openshift.com/pub/openshift-v4/s390x/clients/ocp/4.9.38
2. RHCOS 4.9.0 at https://mirror.openshift.com/pub/openshift-v4/s390x/dependencies/rhcos/4.9/4.9.0

How reproducible:
1. Consistently reproducible in an OCP on Z zVM environment.
2. Consistently NOT reproducible in an OCP on Z KVM environment.

Steps to Reproduce:
1. Attempt to install the OCP 4.9.38 on Z build in a zVM environment.

Actual results:
The installation does not progress past the OCP 4.9.38 on Z network operator.

Expected results:
The installation should complete successfully.
Additional info:

Here is the console output of an attempted "oc adm must-gather":

[root@ospamgr1 ~]# oc adm must-gather
[must-gather      ] OUT the server could not find the requested resource (get imagestreams.image.openshift.io must-gather)
[must-gather      ] OUT
[must-gather      ] OUT Using must-gather plug-in image: registry.redhat.io/openshift4/ose-must-gather:latest
When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information:
ClusterID: 7fd28f1d-3ee8-4815-b2a6-ddf935e06199
ClusterVersion: Installing "4.9.38" for 34 minutes: Unable to apply 4.9.38: an unknown error has occurred: MultipleErrors
ClusterOperators:
    clusteroperator/authentication is not available (<missing>) because <missing>
    clusteroperator/baremetal is not available (<missing>) because <missing>
    clusteroperator/cluster-autoscaler is not available (<missing>) because <missing>
    clusteroperator/config-operator is not available (<missing>) because <missing>
    clusteroperator/console is not available (<missing>) because <missing>
    clusteroperator/csi-snapshot-controller is not available (<missing>) because <missing>
    clusteroperator/dns is not available (<missing>) because <missing>
    clusteroperator/etcd is not available (<missing>) because <missing>
    clusteroperator/image-registry is not available (<missing>) because <missing>
    clusteroperator/ingress is not available (<missing>) because <missing>
    clusteroperator/insights is not available (<missing>) because <missing>
    clusteroperator/kube-apiserver is not available (<missing>) because <missing>
    clusteroperator/kube-controller-manager is not available (<missing>) because <missing>
    clusteroperator/kube-scheduler is not available (<missing>) because <missing>
    clusteroperator/kube-storage-version-migrator is not available (<missing>) because <missing>
    clusteroperator/machine-api is not available (<missing>) because <missing>
    clusteroperator/machine-approver is not available (<missing>) because <missing>
    clusteroperator/machine-config is not available (<missing>) because <missing>
    clusteroperator/marketplace is not available (<missing>) because <missing>
    clusteroperator/monitoring is not available (<missing>) because <missing>
    clusteroperator/network is not available (The network is starting up) because DaemonSet "openshift-ovn-kubernetes/ovn-ipsec" rollout is not making progress - last change 2022-06-09T16:16:46Z
DaemonSet "openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - pod ovnkube-node-9sn8d is in CrashLoopBackOff State
DaemonSet "openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - pod ovnkube-node-dhx2p is in CrashLoopBackOff State
DaemonSet "openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - pod ovnkube-node-wzc67 is in CrashLoopBackOff State
DaemonSet "openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - last change 2022-06-09T16:16:46Z
    clusteroperator/node-tuning is not available (<missing>) because <missing>
    clusteroperator/openshift-apiserver is not available (<missing>) because <missing>
    clusteroperator/openshift-controller-manager is not available (<missing>) because <missing>
    clusteroperator/openshift-samples is not available (<missing>) because <missing>
    clusteroperator/operator-lifecycle-manager is not available (<missing>) because <missing>
    clusteroperator/operator-lifecycle-manager-catalog is not available (<missing>) because <missing>
    clusteroperator/operator-lifecycle-manager-packageserver is not available (<missing>) because <missing>
    clusteroperator/service-ca is not available (<missing>) because <missing>
    clusteroperator/storage is not available (<missing>) because <missing>

[must-gather      ] OUT namespace/openshift-must-gather-6qp27 created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-wtxkg created
[must-gather      ] OUT pod for plug-in image registry.redhat.io/openshift4/ose-must-gather:latest created
[must-gather-l5jvv] OUT gather did not start: timed out waiting for the condition
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-wtxkg deleted
[must-gather      ] OUT namespace/openshift-must-gather-6qp27 deleted

Error running must-gather collection:
    gather did not start for pod must-gather-l5jvv: timed out waiting for the condition

Falling back to `oc adm inspect clusteroperators.v1.config.openshift.io` to collect basic cluster information.
Gathering data for ns/openshift-cloud-controller-manager-operator...
Gathering data for ns/openshift-cloud-controller-manager...
Gathering data for ns/openshift-cloud-credential-operator...
Gathering data for ns/openshift-machine-api...
Gathering data for ns/openshift-config...
Gathering data for ns/openshift-config-managed...
Gathering data for ns/openshift-etcd-operator...
Gathering data for ns/openshift-etcd...
Gathering data for ns/openshift-kube-apiserver-operator...
Gathering data for ns/openshift-kube-apiserver...
Gathering data for ns/openshift-kube-controller-manager...
Gathering data for ns/openshift-kube-controller-manager-operator...
Gathering data for ns/openshift-kube-scheduler-operator...
Gathering data for ns/openshift-kube-scheduler...
Gathering data for ns/openshift-kube-storage-version-migrator-operator...
Gathering data for ns/openshift-cluster-machine-approver...
Gathering data for ns/openshift-machine-config-operator...
Gathering data for ns/openshift-multus...
Gathering data for ns/openshift-ovn-kubernetes...
Gathering data for ns/openshift-host-network...
Gathering data for ns/openshift-network-diagnostics...
Gathering data for ns/openshift-network-operator...
Gathering data for ns/openshift-cluster-samples-operator...
Wrote inspect data to must-gather.local.685000384472291690/inspect.local.2710222056658389937.

error running backup collection: errors ocurred while gathering data: [
    skipping gathering securitycontextconstraints.security.openshift.io due to error: the server doesn't have a resource type "securitycontextconstraints",
    skipping gathering podnetworkconnectivitychecks.controlplane.operator.openshift.io due to error: the server doesn't have a resource type "podnetworkconnectivitychecks",
    skipping gathering apirequestcounts.apiserver.openshift.io due to error: the server doesn't have a resource type "apirequestcounts",
    skipping gathering namespaces/openshift-kube-storage-version-migrator due to error: namespaces "openshift-kube-storage-version-migrator" not found,
    skipping gathering controllerconfigs.machineconfiguration.openshift.io due to error: the server doesn't have a resource type "controllerconfigs",
    skipping gathering namespaces/openshift-multus due to error: one or more errors ocurred while gathering pod-specific data for namespace: openshift-multus [
        one or more errors ocurred while gathering container data for pod network-metrics-daemon-gpp62: [previous terminated container "network-metrics-daemon" in pod "network-metrics-daemon-gpp62" not found, container "network-metrics-daemon" in pod "network-metrics-daemon-gpp62" is waiting to start: ContainerCreating, container "kube-rbac-proxy" in pod "network-metrics-daemon-gpp62" is waiting to start: ContainerCreating, previous terminated container "kube-rbac-proxy" in pod "network-metrics-daemon-gpp62" not found],
        one or more errors ocurred while gathering container data for pod network-metrics-daemon-lnptv: [container "network-metrics-daemon" in pod "network-metrics-daemon-lnptv" is waiting to start: ContainerCreating, previous terminated container "network-metrics-daemon" in pod "network-metrics-daemon-lnptv" not found, previous terminated container "kube-rbac-proxy" in pod "network-metrics-daemon-lnptv" not found, container "kube-rbac-proxy" in pod "network-metrics-daemon-lnptv" is waiting to start: ContainerCreating],
        one or more errors ocurred while gathering container data for pod network-metrics-daemon-sh2c5: [previous terminated container "network-metrics-daemon" in pod "network-metrics-daemon-sh2c5" not found, container "network-metrics-daemon" in pod "network-metrics-daemon-sh2c5" is waiting to start: ContainerCreating, previous terminated container "kube-rbac-proxy" in pod "network-metrics-daemon-sh2c5" not found, container "kube-rbac-proxy" in pod "network-metrics-daemon-sh2c5" is waiting to start: ContainerCreating]],
    skipping gathering namespaces/openshift-ovn-kubernetes due to error: one or more errors ocurred while gathering pod-specific data for namespace: openshift-ovn-kubernetes [
        one or more errors ocurred while gathering container data for pod ovn-ipsec-8bkvt: [container "ovn-ipsec" in pod "ovn-ipsec-8bkvt" is waiting to start: PodInitializing, previous terminated container "ovn-ipsec" in pod "ovn-ipsec-8bkvt" not found],
        one or more errors ocurred while gathering container data for pod ovn-ipsec-dwv2r: [previous terminated container "ovn-ipsec" in pod "ovn-ipsec-dwv2r" not found, container "ovn-ipsec" in pod "ovn-ipsec-dwv2r" is waiting to start: PodInitializing],
        one or more errors ocurred while gathering container data for pod ovn-ipsec-lwrz2: [container "ovn-ipsec" in pod "ovn-ipsec-lwrz2" is waiting to start: PodInitializing, previous terminated container "ovn-ipsec" in pod "ovn-ipsec-lwrz2" not found]],
    skipping gathering namespaces/openshift-network-diagnostics due to error: one or more errors ocurred while gathering pod-specific data for namespace: openshift-network-diagnostics [
        one or more errors ocurred while gathering container data for pod network-check-target-9hmqn: [container "network-check-target-container" in pod "network-check-target-9hmqn" is waiting to start: ContainerCreating, previous terminated container "network-check-target-container" in pod "network-check-target-9hmqn" not found],
        one or more errors ocurred while gathering container data for pod network-check-target-dsmpc: [container "network-check-target-container" in pod "network-check-target-dsmpc" is waiting to start: ContainerCreating, previous terminated container "network-check-target-container" in pod "network-check-target-dsmpc" not found],
        one or more errors ocurred while gathering container data for pod network-check-target-mcsl4: [previous terminated container "network-check-target-container" in pod "network-check-target-mcsl4" not found, container "network-check-target-container" in pod "network-check-target-mcsl4" is waiting to start: ContainerCreating]],
    skipping gathering configs.samples.operator.openshift.io/cluster due to error: configs.samples.operator.openshift.io "cluster" not found,
    skipping gathering templates.template.openshift.io due to error: the server doesn't have a resource type "templates",
    skipping gathering imagestreams.image.openshift.io due to error: the server doesn't have a resource type "imagestreams"]

Reprinting Cluster State:
When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information:
ClusterID: 7fd28f1d-3ee8-4815-b2a6-ddf935e06199
ClusterVersion: Installing "4.9.38" for 45 minutes: Working towards 4.9.38: 592 of 738 done (80% complete)
ClusterOperators:
    clusteroperator/authentication is not available (<missing>) because <missing>
    clusteroperator/baremetal is not available (<missing>) because <missing>
    clusteroperator/cluster-autoscaler is not available (<missing>) because <missing>
    clusteroperator/config-operator is not available (<missing>) because <missing>
    clusteroperator/console is not available (<missing>) because <missing>
    clusteroperator/csi-snapshot-controller is not available (<missing>) because <missing>
    clusteroperator/dns is not available (<missing>) because <missing>
    clusteroperator/etcd is not available (<missing>) because <missing>
    clusteroperator/image-registry is not available (<missing>) because <missing>
    clusteroperator/ingress is not available (<missing>) because <missing>
    clusteroperator/insights is not available (<missing>) because <missing>
    clusteroperator/kube-apiserver is not available (<missing>) because <missing>
    clusteroperator/kube-controller-manager is not available (<missing>) because <missing>
    clusteroperator/kube-scheduler is not available (<missing>) because <missing>
    clusteroperator/kube-storage-version-migrator is not available (<missing>) because <missing>
    clusteroperator/machine-api is not available (<missing>) because <missing>
    clusteroperator/machine-approver is not available (<missing>) because <missing>
    clusteroperator/machine-config is not available (<missing>) because <missing>
    clusteroperator/marketplace is not available (<missing>) because <missing>
    clusteroperator/monitoring is not available (<missing>) because <missing>
    clusteroperator/network is not available (The network is starting up) because DaemonSet "openshift-ovn-kubernetes/ovn-ipsec" rollout is not making progress - last change 2022-06-09T16:16:46Z
DaemonSet "openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - pod ovnkube-node-dhx2p is in CrashLoopBackOff State
DaemonSet "openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - pod ovnkube-node-wzc67 is in CrashLoopBackOff State
DaemonSet "openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - pod ovnkube-node-9sn8d is in CrashLoopBackOff State
DaemonSet "openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - last change 2022-06-09T16:16:46Z
    clusteroperator/node-tuning is not available (<missing>) because <missing>
    clusteroperator/openshift-apiserver is not available (<missing>) because <missing>
    clusteroperator/openshift-controller-manager is not available (<missing>) because <missing>
    clusteroperator/openshift-samples is not available (<missing>) because <missing>
    clusteroperator/operator-lifecycle-manager is not available (<missing>) because <missing>
    clusteroperator/operator-lifecycle-manager-catalog is not available (<missing>) because <missing>
    clusteroperator/operator-lifecycle-manager-packageserver is not available (<missing>) because <missing>
    clusteroperator/service-ca is not available (<missing>) because <missing>
    clusteroperator/storage is not available (<missing>) because <missing>

error: gather did not start for pod must-gather-l5jvv: timed out waiting for the condition

Thank you.
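Since the must-gather pod itself cannot start (the pod network is what is broken), the crashlooping network pods can still be inspected directly through the API server. A minimal sketch, assuming the pod and container names shown in the output above:

    oc adm inspect clusteroperator/network
    oc -n openshift-ovn-kubernetes get pods -o wide
    oc -n openshift-ovn-kubernetes logs ovnkube-node-9sn8d -c ovnkube-node --previous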
Created attachment 1888403 [details]
partial must-gather for OCP 4.9.38 zVM install issue

Partial "oc adm must-gather" for OCP 4.9.38 zVM install issue.

Thank you.
ovnkube-node pods fail with:

I0609 16:55:39.356410   37236 gateway_localnet.go:173] Node local addresses initialized to: map[10.129.0.2:{10.129.0.0 fffffe00} 10.20.116.12:{10.20.116.0 ffffff00} 127.0.0.1:{127.0.0.0 ff000000} ::1:{::1 ffffffffffffffffffffffffffffffff} fe80::8808:c4ff:fed3:412:{fe80:: ffffffffffffffff0000000000000000} fe80::943d:51ff:fe27:b2fc:{fe80:: ffffffffffffffff0000000000000000}]
I0609 16:55:39.356500   37236 helper_linux.go:73] Found default gateway interface enc2e0 10.20.116.1
F0609 16:55:39.356532   37236 ovnkube.go:130] could not find IP addresses: failed to lookup link br-ex: Link not found

Kyle,

Which was the last 4.9 build that worked for you?

Prashanth
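For reference, one way to confirm the missing br-ex link directly on an affected node; a sketch, assuming the node names from the cluster above and a reachable kubelet for oc debug:

    oc debug node/master-0.pok-25.ocptest.pok.stglabs.ibm.com
    # inside the debug shell:
    chroot /host
    ip link show br-ex      # fails with 'Device "br-ex" does not exist.' if configure-ovs never created the bridge
    nmcli connection show   # check whether the br-ex / ovs-if-br-ex profiles were created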
Prashanth,

1. The last OCP 4.9 on Z build that installs properly in a zVM environment is this build's predecessor, 4.9.37.
2. The OCP on Z 4.9.37 build was released on June 3, 2022.

Thank you,
Kyle
Thanks, Kyle.

The difference between 4.9.37 and 4.9.38 seems to be https://github.com/openshift/machine-config-operator/pull/3160 in the machine-config-operator, which looks related to what we are seeing.

@jcaamano - could the issue we are hitting be caused by the above change?
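For reference, the component-level difference between two z-stream payloads can be checked with `oc adm release info --commits`; a sketch, assuming the standard s390x release pullspecs for these builds:

    oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.9.37-s390x | grep machine-config-operator
    oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.9.38-s390x | grep machine-config-operator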
Yes, it could be. That change has a known issue, with a tentative fix here: https://github.com/openshift/machine-config-operator/pull/3183
Assigning this bug to the Networking team.

Hello Jaime,

Can you please set the blocker flag to '+' or '-' based on your assessment?
I can't be completely sure it is the same issue without a node journal. @krmoser.com, would you be able to provide one?
Setting the target release to 4.9.z and the blocker flag to '+' based on input from comment#5 and comment#7.
Prashanth,

Please let us know where you would like the node journal collected from, and the commands to do so.

Thank you,
Kyle
Hi Kyle,

A "journalctl" on the master nodes should help.

Thanks,
Prashanth
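For example, to capture a full journal per master (a sketch; it assumes SSH access as the core user, and the node names are taken from this report):

    ssh core@master-0.pok-25.ocptest.pok.stglabs.ibm.com 'journalctl --no-pager' > master-0-journal.log
    # or, from a host with oc access, via the kubelet:
    oc adm node-logs master-0.pok-25.ocptest.pok.stglabs.ibm.com > master-0-journal.log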
Created attachment 1889463 [details]
master-0 journalctl logs
Created attachment 1889464 [details]
master-1 journalctl logs
Created attachment 1889465 [details]
master-2 journalctl logs
Thanks @krmoser.com

Looks like something different:

Jun 13 15:41:58 master-0.pok-96.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1390]: Brought up connection ovs-if-br-ex successfully
Jun 13 15:41:58 master-0.pok-96.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1390]: + nmcli c mod ovs-if-br-ex connection.autoconnect yes

Jun 13 15:41:56 master-1.pok-96.ocptest.pok.stglabs.ibm.com NetworkManager[1370]: ((libnm-core/nm-connection.c:186)): assertion '<dropped>' failed
Jun 13 15:41:56 master-1.pok-96.ocptest.pok.stglabs.ibm.com NetworkManager[1370]: <warn>  [1655134916.9590] keyfile: commit: failure to write 13d22672-3c1f-4735-8db5-60cd18e60b8d ((null)) to "/etc/NetworkManager/systemConnectionsMerged/ovs-if-br-ex.nmconnection": error writing to file '/etc/NetworkManager/systemConnectionsMerged/ovs-if-br-ex.nmconnection': failed rename /etc/NetworkManager/systemConnectionsMerged/ovs-if-br-ex.nmconnection.1E9SN1 to /etc/NetworkManager/systemConnectionsMerged/ovs-if-br-ex.nmconnection: Permission denied
Jun 13 15:41:56 master-1.pok-96.ocptest.pok.stglabs.ibm.com NetworkManager[1370]: <info>  [1655134916.9590] audit: op="connection-update" uuid="13d22672-3c1f-4735-8db5-60cd18e60b8d" name="ovs-if-br-ex" args="connection.autoconnect,connection.timestamp" pid=1785 uid=0 result="fail" reason="failed to update connection: error writing to file '/etc/NetworkManager/systemConnectionsMerged/ovs-if-br-ex.nmconnection': failed rename /etc/NetworkManager/systemConnectionsMerged/ovs-if-br-ex.nmconnection.1E9SN1 to /etc/NetworkManager/systemConnectionsMerged/ovs-if-br-ex.nmconnection: Permission denied"
Jun 13 15:41:56 master-1.pok-96.ocptest.pok.stglabs.ibm.com configure-ovs.sh[1404]: Error: Failed to modify connection 'ovs-if-br-ex': failed to update connection: error writing to file '/etc/NetworkManager/systemConnectionsMerged/ovs-if-br-ex.nmconnection': failed rename /etc/NetworkManager/systemConnectionsMerged/ovs-if-br-ex.nmconnection.1E9SN1 to /etc/NetworkManager/systemConnectionsMerged/ovs-if-br-ex.nmconnection: Permission denied

Could you please provide the ownership and permissions of /etc/NetworkManager/systemConnectionsMerged and /etc/NetworkManager/systemConnectionsMerged/ovs-if-br-ex.nmconnection, as well as /var/log/audit/audit.log, on any of the nodes?

It looks like there are no write permissions on /etc/NetworkManager/systemConnectionsMerged, but there should be, as that dir is created via /etc/tmpfiles.d/nm.conf containing:

d /etc/NetworkManager/systemConnectionsMerged 0755 root root - -
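For example, the requested details, plus the SELinux labels that turn out to matter here, could be collected on any master with something like (a sketch):

    ls -al  /etc/NetworkManager/systemConnectionsMerged
    ls -alZ /etc/NetworkManager/systemConnectionsMerged   # -Z adds the SELinux security contexts
    ausearch -m AVC -ts recent                            # recent SELinux denials from /var/log/audit/audit.log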
Jaime,

Thanks for the assistance. Here's the requested information. There is no /etc/NetworkManager/systemConnectionsMerged/ovs-if-br-ex.nmconnection file.

1. /etc/NetworkManager/systemConnectionsMerged :
======================================================================================================================================
[root@master-2 ~]# ls -al /etc/NetworkManager/systemConnectionsMerged
total 4
drwxr-xr-x. 1 root root 140 Jun 13 15:41 .
drwxr-xr-x. 8 root root 165 Jun 13 15:41 ..
-rw-------. 1 root root 406 Jun 13 15:40 default_connection.nmconnection
[root@master-2 ~]#

2. /etc/NetworkManager/systemConnectionsMerged/ovs-if-br-ex.nmconnection :
======================================================================================================================================
[root@master-2 ~]# ls -al /etc/NetworkManager/systemConnectionsMerged/ovs-if-br-ex.nmconnection
ls: cannot access '/etc/NetworkManager/systemConnectionsMerged/ovs-if-br-ex.nmconnection': No such file or directory
[root@master-2 ~]#

3. /var/log/audit/audit.log :
======================================================================================================================================
[root@master-2 ~]# ls -al /var/log/audit/audit.log
-rw-------. 1 root root 92813 Jun 13 20:15 /var/log/audit/audit.log
[root@master-2 ~]#

Thank you,
Kyle
So it looks like the overlay /etc/NetworkManager/systemConnectionsMerged is not registered with SELinux as a valid location for NetworkManager to manage its connection profiles. If we manually copy a profile there and run restorecon, which we do when trying to copy a static IP configuration, it ends up with scontext NetworkManager_t instead of the expected NetworkManager_var_run_t or NetworkManager_etc_rw_t.

We recently introduced a change with https://github.com/openshift/machine-config-operator/pull/3160 that attempts to configure something after this copy through nmcli, and that step fails.

Trying to work around it with https://github.com/openshift/machine-config-operator/pull/3188 by using `nmcli clone` instead of a manual copy. Could you give it a shot?

Marking as a duplicate of 2095264.

*** This bug has been marked as a duplicate of bug 2095264 ***
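For illustration, the difference between the two approaches (a simplified sketch, not the literal MCO change; paths and profile names are examples):

    # manual copy: the file is created by cp/restorecon rather than by
    # NetworkManager, so it carries an unexpected SELinux context and
    # NetworkManager's later updates fail with Permission denied
    cp /etc/NetworkManager/system-connections/static-ip.nmconnection \
       /etc/NetworkManager/systemConnectionsMerged/
    restorecon /etc/NetworkManager/systemConnectionsMerged/static-ip.nmconnection

    # nmcli clone: NetworkManager itself writes the new profile, so it gets
    # the label NetworkManager expects and subsequent 'nmcli c mod' calls succeed
    nmcli connection clone static-ip static-ip-cloned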
Folks,

Please let us know when an OCP 4.9.38 on Z successor build containing the proposed fix is available to test.

Thank you,
Kyle
Folks,

It appears that the same issue exists in this week's publicly available OCP 4.9 post-GA build, 4.9.39.

Thank you,
Kyle
Folks,

The OCP on Z Solution Test team has successfully tested the following OCP 4.9 on Z builds for both connected and disconnected installs:

1. OCP 4.9.40
2. OCP 4.9.41

Thank you,
Kyle