Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2085335

Summary: Update from 4.8.39 to 4.9.31 is failing on OCP with dualstack cluster network
Product: OpenShift Container Platform
Reporter: Ashish Vyawahare <avyawahare87>
Component: Etcd
Assignee: Dean West <dwest>
Status: CLOSED DUPLICATE
QA Contact: ge liu <geliu>
Severity: high
Priority: medium
Version: 4.8
CC: smerrow, tjungblu
Hardware: All
OS: Linux
Type: Bug
Last Closed: 2022-09-08 14:14:33 UTC
Attachments:
- ClusterUpdateError
- must-gather data part1 (namespaces from assisted-installer to openshift-kni-infra)
- must-gather part2 (namespaces openshift-kube-apiserver and openshift-kube-apiserver-operator)
- must-gather part3 (namespaces from openshift-kube-controller-manager to openshift-vsphere-infra)

Description Ashish Vyawahare 2022-05-13 04:12:14 UTC
Created attachment 1879295 [details]
ClusterUpdateError

Description of problem:
We created an OCP cluster (version 4.8.39) with a dual-stack cluster network.
The VM interface on every node (3 master nodes and 2 worker nodes) is configured with both IPv4 and IPv6 addresses.
Cluster installation works fine.

However, the cluster update from 4.8.39 to 4.9.31 fails.
The update is stuck in the Partial state with the following failure reason:

"
EtcdCertSignerControllerDegraded: [x509: certificate is valid for 10.30.1.4, not 2101::4, x509: certificate is valid for ::1, 10.30.1.4, 127.0.0.1, ::1, not 2101::4]
"

[core@ocp-avyaw-nyu1mo-ctrl-3 ~]$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.39    True        True          12h     Unable to apply 4.9.31: wait has exceeded 40 minutes for these operators: etcd
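The EtcdCertSignerControllerDegraded message above is a standard Go x509 hostname-verification failure: the etcd serving certificate lists only the node's IPv4 address and loopback in its IP SANs, so any client connecting to the node's IPv6 address 2101::4 rejects the certificate. The mismatch can be illustrated locally with openssl (a self-contained sketch using a throwaway certificate whose SANs mirror the failing one, not the cluster's actual secrets; requires OpenSSL 1.1.1+):

```shell
# Generate a throwaway certificate whose IP SANs mirror the failing one
# in this bug: IPv4 node address and loopback only, no IPv6 node address.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout /tmp/etcd-test-key.pem -out /tmp/etcd-test-cert.pem \
  -subj "/CN=etcd-serving-test" \
  -addext "subjectAltName=IP:10.30.1.4,IP:127.0.0.1"

# List the IP SANs: 2101::4 is absent, which is why verification of a
# connection to that address fails with "certificate is valid for ...,
# not 2101::4".
openssl x509 -in /tmp/etcd-test-cert.pem -noout -ext subjectAltName
```

On a live cluster, the same `openssl x509 -noout -ext subjectAltName` step can be pointed at the etcd serving certificates (stored as secrets in the openshift-etcd namespace) to confirm whether they carry both the IPv4 and IPv6 node addresses; a dual-stack-ready certificate must list both.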

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create an OCP cluster (4.8.39) with a dual-stack network.
2. Update the cluster to 4.9.31.

Actual results:
Cluster update is stuck in the Partial state with the error EtcdCertSignerControllerDegraded: [x509: certificate is valid for 10.30.1.4, not 2101::4, x509: certificate is valid for ::1, 10.30.1.4, 127.0.0.1, ::1, not 2101::4]


Expected results:

The cluster update from 4.8.39 to 4.9.31 completes successfully.

Additional info:

ClusterID: 9ff77ed0-e858-4b07-b30d-ab5f4692dddf
ClusterVersion: Updating to "4.9.31" from "4.8.39" for 13 hours: Working towards 4.9.31: 71 of 738 done (9% complete)
ClusterOperators:
	clusteroperator/authentication is degraded because APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-oauth-apiserver ()
OAuthServerDeploymentDegraded: 1 of 3 requested instances are unavailable for oauth-openshift.openshift-authentication ()
	clusteroperator/etcd is degraded because EtcdCertSignerControllerDegraded: [x509: certificate is valid for 10.30.1.4, not 2101::4, x509: certificate is valid for ::1, 10.30.1.4, 127.0.0.1, ::1, not 2101::4]
	clusteroperator/machine-config is degraded because Unable to apply 4.9.31: timed out waiting for the condition during syncRequiredMachineConfigPools: error pool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 1, updated: 1, unavailable: 1)
	clusteroperator/openshift-apiserver is degraded because APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-apiserver ()



[core@ocp-avyaw-nyu1mo-ctrl-3 ~]$ oc describe network
Name:         cluster
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         Network
Metadata:
  Creation Timestamp:  2022-05-12T13:19:32Z
  Generation:          2
  Managed Fields:
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        .:
        f:clusterNetwork:
        f:externalIP:
          .:
          f:policy:
        f:networkType:
        f:serviceNetwork:
      f:status:
    Manager:      cluster-bootstrap
    Operation:    Update
    Time:         2022-05-12T13:19:32Z
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        f:clusterNetwork:
        f:networkType:
        f:serviceNetwork:
    Manager:         cluster-network-operator
    Operation:       Update
    Time:            2022-05-12T13:21:11Z
  Resource Version:  3087
  UID:               a21bdaa7-de95-4b7f-8b40-ed84946fe11d
Spec:
  Cluster Network:
    Cidr:         10.128.0.0/14
    Host Prefix:  23
    Cidr:         2001::/60
    Host Prefix:  64
  External IP:
    Policy:
  Network Type:  Contrail
  Service Network:
    172.30.0.0/16
    2222::/108
Status:
  Cluster Network:
    Cidr:         10.128.0.0/14
    Host Prefix:  23
    Cidr:         2001::/60
    Host Prefix:  64
  Network Type:   Contrail
  Service Network:
    172.30.0.0/16
    2222::/108
Events:  <none>

Comment 1 Ashish Vyawahare 2022-05-13 06:15:04 UTC
Created attachment 1879313 [details]
must-gather data part1 --> namespace included from assisted-installer to openshift-kni-infra

Comment 2 Ashish Vyawahare 2022-05-13 06:33:13 UTC
Created attachment 1879315 [details]
must-gather part2 --> Included namespaces openshift-kube-apiserver and openshift-kube-apiserver-operator

Comment 3 Ashish Vyawahare 2022-05-13 06:37:17 UTC
Created attachment 1879316 [details]
must-gather part3 --> Included namespaces from openshift-kube-controller-manager to openshift-vsphere-infra

Comment 12 Ashish Vyawahare 2022-06-28 06:27:35 UTC
Hi,
Any update on this bug?

I see there is a similar bug: https://bugzilla.redhat.com/show_bug.cgi?id=2046335