Bug 2039656 - [EgressIP] Configuring EgressIPs on master nodes caused etcd Degraded
Summary: [EgressIP] Configuring EgressIPs on master nodes caused etcd Degraded
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Patryk Diak
QA Contact: huirwang
URL:
Whiteboard:
Duplicates: 2050403
Depends On:
Blocks:
Reported: 2022-01-12 07:37 UTC by huirwang
Modified: 2022-02-21 08:59 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-02-21 08:58:45 UTC
Target Upstream Version:
Embargoed:


Description huirwang 2022-01-12 07:37:22 UTC
Description of problem:
Tested on an AWS cluster using OpenShift SDN: configuring EgressIPs on master nodes caused the etcd cluster operator to report Degraded.

Version-Release number of selected component (if applicable):
4.10.0-0.nightly-2022-01-11-065245

How reproducible:


Steps to Reproduce:
$ oc get hostsubnet
NAME                                         HOST                                         HOST IP        SUBNET          EGRESS CIDRS   EGRESS IPS
ip-10-0-128-161.us-west-2.compute.internal   ip-10-0-128-161.us-west-2.compute.internal   10.0.128.161   10.128.0.0/23                  ["10.0.128.100"]
ip-10-0-129-201.us-west-2.compute.internal   ip-10-0-129-201.us-west-2.compute.internal   10.0.129.201   10.129.2.0/23                  []
ip-10-0-136-54.us-west-2.compute.internal    ip-10-0-136-54.us-west-2.compute.internal    10.0.136.54    10.131.0.0/23                  []
ip-10-0-177-232.us-west-2.compute.internal   ip-10-0-177-232.us-west-2.compute.internal   10.0.177.232   10.129.0.0/23                  ["10.0.177.100"]
ip-10-0-238-94.us-west-2.compute.internal    ip-10-0-238-94.us-west-2.compute.internal    10.0.238.94    10.130.0.0/23                  ["10.0.238.100"]
ip-10-0-239-251.us-west-2.compute.internal   ip-10-0-239-251.us-west-2.compute.internal   10.0.239.251   10.128.2.0/23                  []
$ oc get netnamespace test
NAME   NETID     EGRESS IPS
test   5166387   ["10.0.128.100","10.0.238.100","10.0.177.100"]
$ oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-128-161.us-west-2.compute.internal   Ready    master   75m   v1.22.1+6859754
ip-10-0-129-201.us-west-2.compute.internal   Ready    worker   65m   v1.22.1+6859754
ip-10-0-136-54.us-west-2.compute.internal    Ready    worker   67m   v1.22.1+6859754
ip-10-0-177-232.us-west-2.compute.internal   Ready    master   75m   v1.22.1+6859754
ip-10-0-238-94.us-west-2.compute.internal    Ready    master   73m   v1.22.1+6859754
ip-10-0-239-251.us-west-2.compute.internal   Ready    worker   65m   v1.22.1+6859754
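The listings above show that every egress IP of the `test` netnamespace is hosted on a master node. The Python sketch below is not part of the original report; the variable and function names are hypothetical. It makes that cross-reference explicit using only values copied from the `oc get hostsubnet`, `oc get netnamespace test` and `oc get nodes` output.

```python
"""Group the egress IPs from `oc get hostsubnet` by the role of the hosting node.

All data is copied verbatim from the output in this report; names are hypothetical.
"""

# HOSTSUBNET name -> EGRESS IPS column of `oc get hostsubnet`
HOSTSUBNET_EGRESS_IPS = {
    "ip-10-0-128-161.us-west-2.compute.internal": ["10.0.128.100"],
    "ip-10-0-129-201.us-west-2.compute.internal": [],
    "ip-10-0-136-54.us-west-2.compute.internal": [],
    "ip-10-0-177-232.us-west-2.compute.internal": ["10.0.177.100"],
    "ip-10-0-238-94.us-west-2.compute.internal": ["10.0.238.100"],
    "ip-10-0-239-251.us-west-2.compute.internal": [],
}

# NAME -> ROLES column of `oc get nodes`
NODE_ROLES = {
    "ip-10-0-128-161.us-west-2.compute.internal": "master",
    "ip-10-0-129-201.us-west-2.compute.internal": "worker",
    "ip-10-0-136-54.us-west-2.compute.internal": "worker",
    "ip-10-0-177-232.us-west-2.compute.internal": "master",
    "ip-10-0-238-94.us-west-2.compute.internal": "master",
    "ip-10-0-239-251.us-west-2.compute.internal": "worker",
}

# EGRESS IPS column of `oc get netnamespace test`
NETNAMESPACE_EGRESS_IPS = ["10.0.128.100", "10.0.238.100", "10.0.177.100"]


def egress_ips_by_role(egress, roles):
    """Return {role: set of egress IPs hosted on nodes with that role}."""
    grouped = {}
    for node, ips in egress.items():
        for ip in ips:
            grouped.setdefault(roles[node], set()).add(ip)
    return grouped


if __name__ == "__main__":
    grouped = egress_ips_by_role(HOSTSUBNET_EGRESS_IPS, NODE_ROLES)
    for role, ips in sorted(grouped.items()):
        print(role, sorted(ips))
```

With this data only a `master` key is produced, and its set equals the three egress IPs of the netnamespace, i.e. no egress IP landed on a worker node.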

$ oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.10.0-0.nightly-2022-01-11-065245   True        False         False      50m     
baremetal                                  4.10.0-0.nightly-2022-01-11-065245   True        False         False      66m     
cloud-controller-manager                   4.10.0-0.nightly-2022-01-11-065245   True        False         False      68m     
cloud-credential                           4.10.0-0.nightly-2022-01-11-065245   True        False         False      68m     
cluster-autoscaler                         4.10.0-0.nightly-2022-01-11-065245   True        False         False      66m     
config-operator                            4.10.0-0.nightly-2022-01-11-065245   True        False         False      67m     
console                                    4.10.0-0.nightly-2022-01-11-065245   True        False         False      54m     
csi-snapshot-controller                    4.10.0-0.nightly-2022-01-11-065245   True        False         False      67m     
dns                                        4.10.0-0.nightly-2022-01-11-065245   True        False         False      66m     
etcd                                       4.10.0-0.nightly-2022-01-11-065245   True        False         True       65m     EtcdCertSignerControllerDegraded: [x509: certificate is valid for 10.0.128.161, not 10.0.128.100, x509: certificate is valid for ::1, 10.0.128.161, 127.0.0.1, ::1, not 10.0.128.100]
image-registry                             4.10.0-0.nightly-2022-01-11-065245   True        False         False      59m     
ingress                                    4.10.0-0.nightly-2022-01-11-065245   True        False         False      58m     
insights                                   4.10.0-0.nightly-2022-01-11-065245   True        False         False      61m     
kube-apiserver                             4.10.0-0.nightly-2022-01-11-065245   True        False         False      61m     
kube-controller-manager                    4.10.0-0.nightly-2022-01-11-065245   True        False         False      65m     
kube-scheduler                             4.10.0-0.nightly-2022-01-11-065245   True        False         False      65m     
kube-storage-version-migrator              4.10.0-0.nightly-2022-01-11-065245   True        False         False      67m     
machine-api                                4.10.0-0.nightly-2022-01-11-065245   True        False         False      62m     
machine-approver                           4.10.0-0.nightly-2022-01-11-065245   True        False         False      66m     
machine-config                             4.10.0-0.nightly-2022-01-11-065245   True        False         False      65m     
marketplace                                4.10.0-0.nightly-2022-01-11-065245   True        False         False      66m     
monitoring                                 4.10.0-0.nightly-2022-01-11-065245   True        False         False      57m     
network                                    4.10.0-0.nightly-2022-01-11-065245   True        False         False      68m     
node-tuning                                4.10.0-0.nightly-2022-01-11-065245   True        False         False      66m     
openshift-apiserver                        4.10.0-0.nightly-2022-01-11-065245   True        False         False      60m     
openshift-controller-manager               4.10.0-0.nightly-2022-01-11-065245   True        False         False      59m     
openshift-samples                          4.10.0-0.nightly-2022-01-11-065245   True        False         False      59m     
operator-lifecycle-manager                 4.10.0-0.nightly-2022-01-11-065245   True        False         False      66m     
operator-lifecycle-manager-catalog         4.10.0-0.nightly-2022-01-11-065245   True        False         False      66m     
operator-lifecycle-manager-packageserver   4.10.0-0.nightly-2022-01-11-065245   True        False         False      60m     
service-ca                                 4.10.0-0.nightly-2022-01-11-065245   True        False         False      67m     
storage                                    4.10.0-0.nightly-2022-01-11-065245   True        False         False      66m  

$ oc get co etcd  -o yaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  annotations:
    exclude.release.openshift.io/internal-openshift-hosted: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
  creationTimestamp: "2022-01-12T06:18:07Z"
  generation: 1
  name: etcd
  ownerReferences:
  - apiVersion: config.openshift.io/v1
    kind: ClusterVersion
    name: version
    uid: 54a22c6d-0e41-4cb9-9e76-7c74e0a87ace
  resourceVersion: "46701"
  uid: 42d0b9d0-a75e-4c94-859d-0f944a02bbd9
spec: {}
status:
  conditions:
  - lastTransitionTime: "2022-01-12T06:20:40Z"
    reason: ControllerStarted
    status: Unknown
    type: RecentBackup
  - lastTransitionTime: "2022-01-12T07:25:51Z"
    message: 'EtcdCertSignerControllerDegraded: [x509: certificate is valid for 10.0.177.232,
      not 10.0.177.100, x509: certificate is valid for ::1, 10.0.177.232, 127.0.0.1,
      ::1, not 10.0.177.100, x509: certificate is valid for 10.0.128.161, not 10.0.128.100,
      x509: certificate is valid for ::1, 10.0.128.161, 127.0.0.1, ::1, not 10.0.128.100,
      x509: certificate is valid for 10.0.238.94, not 10.0.238.100, x509: certificate
      is valid for ::1, 10.0.238.94, 127.0.0.1, ::1, not 10.0.238.100]'
    reason: EtcdCertSignerController_Error
    status: "True"
    type: Degraded
  - lastTransitionTime: "2022-01-12T06:31:49Z"
    message: |-
      NodeInstallerProgressing: 3 nodes are at revision 6
      EtcdMembersProgressing: No unstarted etcd members found
    reason: AsExpected
    status: "False"
    type: Progressing
  - lastTransitionTime: "2022-01-12T06:22:41Z"
    message: |-
      StaticPodsAvailable: 3 nodes are active; 3 nodes are at revision 6
      EtcdMembersAvailable: 3 members are available
    reason: AsExpected
    status: "True"
    type: Available
  - lastTransitionTime: "2022-01-12T06:20:40Z"
    message: All is well
    reason: AsExpected
    status: "True"
    type: Upgradeable
  extension: null
  relatedObjects:
  - group: operator.openshift.io
    name: cluster
    resource: etcds
  - group: ""
    name: openshift-config
    resource: namespaces
  - group: ""
    name: openshift-config-managed
    resource: namespaces
  - group: ""
    name: openshift-etcd-operator
    resource: namespaces
  - group: ""
    name: openshift-etcd
    resource: namespaces
  versions:
  - name: raw-internal
    version: 4.10.0-0.nightly-2022-01-11-065245
  - name: etcd
    version: 4.10.0-0.nightly-2022-01-11-065245
  - name: operator
    version: 4.10.0-0.nightly-2022-01-11-065245
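Each x509 error in the Degraded message pairs the addresses a certificate covers with the address that was rejected. The Python sketch below is an illustration, not tooling from this bug; the regular expression and function name are assumptions based only on the message text above. It extracts those pairs. Reading the result as "the etcd cert signer saw the egress IP as an address of the master node" is an inference from the error text, not something this report confirms.

```python
import re

# Degraded message copied from `oc get co etcd -o yaml` above (YAML line wrapping undone).
MESSAGE = (
    "EtcdCertSignerControllerDegraded: [x509: certificate is valid for 10.0.177.232, "
    "not 10.0.177.100, x509: certificate is valid for ::1, 10.0.177.232, 127.0.0.1, "
    "::1, not 10.0.177.100, x509: certificate is valid for 10.0.128.161, not 10.0.128.100, "
    "x509: certificate is valid for ::1, 10.0.128.161, 127.0.0.1, ::1, not 10.0.128.100, "
    "x509: certificate is valid for 10.0.238.94, not 10.0.238.100, x509: certificate "
    "is valid for ::1, 10.0.238.94, 127.0.0.1, ::1, not 10.0.238.100]"
)

# Assumed shape of each error: "certificate is valid for <addresses>, not <rejected IP>".
PATTERN = re.compile(r"certificate is valid for (.+?), not ([0-9a-fA-F:.]+)")


def rejected_ips(message):
    """Return {rejected IP: set of addresses the certificate(s) were valid for}."""
    result = {}
    for valid_for, rejected in PATTERN.findall(message):
        addresses = {addr.strip() for addr in valid_for.split(",")}
        result.setdefault(rejected, set()).update(addresses)
    return result


if __name__ == "__main__":
    for ip, valid_for in sorted(rejected_ips(MESSAGE).items()):
        print(ip, "not in", sorted(valid_for))
```

The rejected addresses are exactly the three egress IPs from `oc get hostsubnet`, and each one is paired with the HOST IP of the master node that hosts it (for example 10.0.128.100 with 10.0.128.161).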

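Actual results:
The etcd cluster operator reports Degraded=True (reason EtcdCertSignerController_Error). For each master node hosting an egress IP, the x509 errors state that the certificate is valid for the node's own IP (for example 10.0.128.161) but not for the egress IP (10.0.128.100).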

Expected results:
Configuring EgressIPs on master nodes should not cause the etcd cluster operator to become Degraded.

Additional info:

Comment 5 Jason Boxman 2022-01-18 17:02:17 UTC
I've created a draft PR[0] that adds this known issue to the release notes.

[0] https://github.com/openshift/openshift-docs/pull/40711

Comment 6 Mike Fiedler 2022-01-20 15:38:09 UTC
Removing TestBlocker.

Comment 7 Mike Fiedler 2022-01-20 15:38:48 UTC
@huirwang Let me know if you disagree with removing TestBlocker.

Comment 10 Patryk Diak 2022-02-21 08:58:45 UTC
This is a documented limitation and should be addressed in a future release.

Comment 11 Patryk Diak 2022-02-21 08:59:43 UTC
*** Bug 2050403 has been marked as a duplicate of this bug. ***
