2098424 – Metal Day 1 4.11 - deployments with bond fail - workers stuck in provisioning

Summary: Metal Day 1 4.11 - deployments with bond fail - workers stuck in provisioning

Keywords:
Status:	CLOSED DUPLICATE of bug 2092650
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Installer
Sub Component:
Version:	4.11
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	aos-install
QA Contact:	Gaoyun Pei
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2022-06-19 06:37 UTC by Yoav Porag
Modified:	2022-06-22 06:35 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-06-22 05:29:45 UTC
Target Upstream Version:
Embargoed:

Description Yoav Porag 2022-06-19 06:37:38 UTC

Version:
4.11.0-0.nightly-2022-06-15-222801

Platform:

IPI on virtual BM

What happened?

When testing 4.11 with bonded interfaces, the deployment fails to complete. Multiple cluster operators do not become available and the workers are not provisioned. The same configuration works on 4.10. The failure is consistent across multiple bond modes, including 802.3ad and active-backup.

What did you expect to happen?

The deployment should have succeeded.

Anything else we need to know?

Must-gather: http://rhos-compute-node-10.lab.eng.rdu2.redhat.com/logs/bond_failure_must_gather_160622.tar.gz 

The no-DHCP workaround was applied at pre-deployment according to https://docs.google.com/document/d/1AiviH6t24tOs9vQELLvojpc6_5eff8OKUNY3ApsH8po/edit#
The following networkConfig segment was added to the install config under every node:
        networkConfig:
          routes:
            config:
            - destination: 0.0.0.0/0
              next-hop-address: 192.168.123.1
              next-hop-interface: bond0
          dns-resolver:
            config:
              server:
              - 192.168.123.1
          interfaces:
          - name: bond0
            type: bond
            state: up
            ipv4:
              address:
              - ip: 192.168.123.150
                prefix-length: 24
              enabled: true
              dhcp: false
            link-aggregation:
              mode: 802.3ad
              options:
                miimon: '100'
              port:
              - enp0s4
              - enp0s5



[kni@provisionhost-0-0 ~]$ oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.11.0-0.nightly-2022-06-15-222801   False       False         True       22h     OAuthServerRouteEndpointAccessibleControllerAvailable: failed to retrieve route from cache: route.route.openshift.io "oauth-openshift" not found...
baremetal                                  4.11.0-0.nightly-2022-06-15-222801   True        False         False      22h     
cloud-controller-manager                   4.11.0-0.nightly-2022-06-15-222801   True        False         False      22h     
cloud-credential                           4.11.0-0.nightly-2022-06-15-222801   True        False         False      22h     
cluster-autoscaler                         4.11.0-0.nightly-2022-06-15-222801   True        False         False      22h     
config-operator                            4.11.0-0.nightly-2022-06-15-222801   True        False         False      22h     
console                                    4.11.0-0.nightly-2022-06-15-222801   False       False         True       22h     RouteHealthAvailable: console route is not admitted
csi-snapshot-controller                    4.11.0-0.nightly-2022-06-15-222801   True        False         False      22h     
dns                                        4.11.0-0.nightly-2022-06-15-222801   True        False         False      22h     
etcd                                       4.11.0-0.nightly-2022-06-15-222801   True        False         True       22h     UpgradeBackupControllerDegraded: unable to retrieve cluster version, no completed update was found in cluster version status history: [{Partial 2022-06-16 13:37:28 +0000 UTC <nil> 4.11.0-0.nightly-2022-06-15-222801 registry.ci.openshift.org/ocp/release@sha256:bceac2ed723ce186c56b1db5e7b17cf0ef0a62e6bbfba5d545d419c3018498b2 false }]
image-registry                             4.11.0-0.nightly-2022-06-15-222801   True        False         False      22h     
ingress                                                                         False       True          True       22h     The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.)
insights                                   4.11.0-0.nightly-2022-06-15-222801   True        False         False      5s      
kube-apiserver                             4.11.0-0.nightly-2022-06-15-222801   True        False         False      22h     
kube-controller-manager                    4.11.0-0.nightly-2022-06-15-222801   True        False         False      22h     
kube-scheduler                             4.11.0-0.nightly-2022-06-15-222801   True        False         False      22h     
kube-storage-version-migrator              4.11.0-0.nightly-2022-06-15-222801   True        False         False      22h     
machine-api                                4.11.0-0.nightly-2022-06-15-222801   True        False         False      22h     
machine-approver                           4.11.0-0.nightly-2022-06-15-222801   True        False         False      22h     
machine-config                             4.11.0-0.nightly-2022-06-15-222801   True        False         False      22h     
marketplace                                4.11.0-0.nightly-2022-06-15-222801   True        False         False      22h     
monitoring                                                                      False       True          True       21h     Rollout of the monitoring stack failed and is degraded. Please investigate the degraded status error.
network                                    4.11.0-0.nightly-2022-06-15-222801   True        True          False      22h     Deployment "/openshift-network-diagnostics/network-check-source" is waiting for other operators to become ready
node-tuning                                4.11.0-0.nightly-2022-06-15-222801   True        False         False      22h     
openshift-apiserver                        4.11.0-0.nightly-2022-06-15-222801   True        False         False      22h     
openshift-controller-manager               4.11.0-0.nightly-2022-06-15-222801   True        False         False      22h     
openshift-samples                          4.11.0-0.nightly-2022-06-15-222801   True        False         False      21h     
operator-lifecycle-manager                 4.11.0-0.nightly-2022-06-15-222801   True        False         False      22h     
operator-lifecycle-manager-catalog         4.11.0-0.nightly-2022-06-15-222801   True        False         False      22h     
operator-lifecycle-manager-packageserver   4.11.0-0.nightly-2022-06-15-222801   True        False         False      22h     
service-ca                                 4.11.0-0.nightly-2022-06-15-222801   True        False         False      22h     
storage                                    4.11.0-0.nightly-2022-06-15-222801   True        False         False      22h     

[kni@provisionhost-0-0 ~]$ oc get bmh -A
NAMESPACE               NAME                   STATE                    CONSUMER                                  ONLINE   ERROR   AGE
openshift-machine-api   openshift-master-0-0   externally provisioned   ocp-edge-cluster-0-b8c7d-master-0         true             22h
openshift-machine-api   openshift-master-0-1   externally provisioned   ocp-edge-cluster-0-b8c7d-master-1         true             22h
openshift-machine-api   openshift-master-0-2   externally provisioned   ocp-edge-cluster-0-b8c7d-master-2         true             22h
openshift-machine-api   openshift-worker-0-0   provisioning             ocp-edge-cluster-0-b8c7d-worker-0-tq56h   true             22h
openshift-machine-api   openshift-worker-0-1   provisioning             ocp-edge-cluster-0-b8c7d-worker-0-47v92   true             22h
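When triaging listings like the two above, it can help to filter them mechanically. The sketch below is a rough illustration, not an official tool: it assumes the text passed in starts at the header row (no shell prompt line) and is aligned with spaces. Columns are split by header offsets rather than by whitespace, because some cells are empty (the VERSION of ingress and monitoring) or contain spaces ("externally provisioned"). The function names are hypothetical.

```python
import re


def parse_table(text):
    """Parse column-aligned `oc get` output into a list of dicts.

    Column boundaries are taken from where each header word starts, so
    empty cells and cells containing spaces are handled.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    # (column name, start offset) for every header word.
    cols = [(m.group(), m.start()) for m in re.finditer(r"\S+", lines[0])]
    rows = []
    for line in lines[1:]:
        row = {}
        for i, (name, start) in enumerate(cols):
            # The last column runs to the end of the line.
            end = cols[i + 1][1] if i + 1 < len(cols) else None
            row[name] = line[start:end].strip()
        rows.append(row)
    return rows


def unhealthy_operators(rows):
    """Cluster operators that are not Available or are Degraded."""
    return [r for r in rows
            if r["AVAILABLE"] != "True" or r["DEGRADED"] == "True"]


def hosts_in_state(rows, state="provisioning"):
    """BareMetalHosts whose STATE column equals `state` exactly."""
    return [r for r in rows if r["STATE"] == state]
```

Applied to the listings above, these filters would be expected to flag authentication, console, etcd, ingress and monitoring among the operators, and the two worker hosts among the BareMetalHosts. On a live cluster, parsing `oc get ... -o json` is likely more robust than parsing the table text.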

Comment 1 Yoav Porag 2022-06-22 05:29:45 UTC

Works now, probably as a result of the fix made for https://bugzilla.redhat.com/show_bug.cgi?id=2098430.
Closing the bug.

Comment 2 Yoav Porag 2022-06-22 06:35:54 UTC


*** This bug has been marked as a duplicate of bug 2092650 ***