Bug 1978845 - Scaled ingress replicas following sharded pattern don't balance evenly across multi-AZ
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 4.6
Target Release: 4.7.z
Assignee: Miciah Dashiel Butler Masters
QA Contact: jechen
Depends On: 1900819
Blocks: 1977480
Reported: 2021-07-03 00:14 UTC by OpenShift BugZilla Robot
Modified: 2021-07-26 17:35 UTC (History)

Last Closed: 2021-07-26 17:35:23 UTC
System                  ID                                           Private  Priority  Status  Summary                                                         Last Updated
Github                  openshift cluster-ingress-operator pull 632  0        None      open    [release-4.7] Bug 1978845: Specify topology spread constraints  2021-07-03 00:14:50 UTC
Red Hat Product Errata  RHBA-2021:2762                               0        None      None    None                                                            2021-07-26 17:35:39 UTC

Comment 2 jechen 2021-07-13 19:51:08 UTC
Verified in 4.7.0-0.nightly-2021-07-13-133801

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-07-13-133801   True        False         29m     Cluster version is 4.7.0-0.nightly-2021-07-13-133801

$ oc get node
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-134-71.us-east-2.compute.internal    Ready    master   54m   v1.20.0+bd7b30d
ip-10-0-158-37.us-east-2.compute.internal    Ready    worker   47m   v1.20.0+bd7b30d
ip-10-0-164-100.us-east-2.compute.internal   Ready    master   55m   v1.20.0+bd7b30d
ip-10-0-190-137.us-east-2.compute.internal   Ready    worker   46m   v1.20.0+bd7b30d
ip-10-0-196-211.us-east-2.compute.internal   Ready    worker   46m   v1.20.0+bd7b30d
ip-10-0-199-80.us-east-2.compute.internal    Ready    master   54m   v1.20.0+bd7b30d

1. Check the current machinesets
$ oc get machinesets -n openshift-machine-api
NAME                                   DESIRED   CURRENT   READY   AVAILABLE   AGE
jechen-0713a-cscj7-worker-us-east-2a   1         1         1       1           62m
jechen-0713a-cscj7-worker-us-east-2b   1         1         1       1           62m
jechen-0713a-cscj7-worker-us-east-2c   1         1         1       1           62m

2. Scale each of the above machinesets to 2 replicas
$ oc -n openshift-machine-api scale --replicas=2 machinesets jechen-0713a-cscj7-worker-us-east-2a
machineset.machine.openshift.io/jechen-0713a-cscj7-worker-us-east-2a scaled
$ oc -n openshift-machine-api scale --replicas=2 machinesets jechen-0713a-cscj7-worker-us-east-2b
machineset.machine.openshift.io/jechen-0713a-cscj7-worker-us-east-2b scaled
$ oc -n openshift-machine-api scale --replicas=2 machinesets jechen-0713a-cscj7-worker-us-east-2c
machineset.machine.openshift.io/jechen-0713a-cscj7-worker-us-east-2c scaled

3. Check that the machinesets scaled successfully
$ oc get machinesets -n openshift-machine-api
NAME                                   DESIRED   CURRENT   READY   AVAILABLE   AGE
jechen-0713a-cscj7-worker-us-east-2a   2         2         1       1           63m
jechen-0713a-cscj7-worker-us-east-2b   2         2         1       1           63m
jechen-0713a-cscj7-worker-us-east-2c   2         2         1       1           63m

$ oc get node |grep worker
ip-10-0-150-210.us-east-2.compute.internal   NotReady   worker   28s   v1.20.0+bd7b30d
ip-10-0-158-37.us-east-2.compute.internal    Ready      worker   51m   v1.20.0+bd7b30d
ip-10-0-180-83.us-east-2.compute.internal    NotReady   worker   33s   v1.20.0+bd7b30d
ip-10-0-190-137.us-east-2.compute.internal   Ready      worker   50m   v1.20.0+bd7b30d
ip-10-0-196-211.us-east-2.compute.internal   Ready      worker   50m   v1.20.0+bd7b30d
ip-10-0-209-75.us-east-2.compute.internal    NotReady   worker   14s   v1.20.0+bd7b30d
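
While the new workers are still NotReady, the machinesets report READY below DESIRED. A minimal sketch for spotting that state from captured command output, assuming the column layouts shown above. The helper names pending_machinesets and notready_nodes are hypothetical and are not part of oc:

```shell
# Hypothetical helper: print machinesets whose READY count is below DESIRED.
# Reads `oc get machinesets` output on stdin
# (columns: NAME DESIRED CURRENT READY AVAILABLE AGE).
# Assumption: a row with fewer than six columns has empty READY/AVAILABLE
# cells and is treated as not ready.
pending_machinesets() {
  awk 'NR > 1 && NF > 0 && (NF < 6 || $4 + 0 < $2 + 0) { print $1 }'
}

# Hypothetical helper: print nodes whose STATUS is anything other than
# plain "Ready". Reads `oc get node` output on stdin; a header row is skipped.
notready_nodes() {
  awk 'NF > 0 && $1 != "NAME" && $2 != "Ready" { print $1 }'
}

# Example with the machineset listing from step 3: all three are still pending.
pending_machinesets <<'EOF'
NAME                                   DESIRED   CURRENT   READY   AVAILABLE   AGE
jechen-0713a-cscj7-worker-us-east-2a   2         2         1       1           63m
jechen-0713a-cscj7-worker-us-east-2b   2         2         1       1           63m
jechen-0713a-cscj7-worker-us-east-2c   2         2         1       1           63m
EOF
```

Against a live cluster the same functions could be fed directly, for example `oc get machinesets -n openshift-machine-api | pending_machinesets`. Empty output would mean every machineset has caught up.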

4. Create a custom ingress controller with a routeSelector
$ cat ingressctl-route-selector.yaml
kind: IngressController
apiVersion: operator.openshift.io/v1
metadata:
  name: test
  namespace: openshift-ingress-operator
spec:
  defaultCertificate:
    name: router-certs-default
  domain: jechen-0713a.qe.devcluster.openshift.com
  replicas: 1
  endpointPublishingStrategy:
    type: NodePortService
  routeSelector:
    matchLabels:
      route: router-test

$ oc create -f ./test2/ingressctl-route-selector.yaml 
ingresscontroller.operator.openshift.io/test created

5. Scale up the ingresscontroller above to 6 replicas
$ oc -n openshift-ingress-operator scale --replicas=6 ingresscontroller/test
ingresscontroller.operator.openshift.io/test scaled

$ oc -n openshift-ingress get pod -owide | grep router-test
router-test-65c5748cb8-475w9      1/1     Running   0          52s   ip-10-0-158-37.us-east-2.compute.internal    <none>           <none>
router-test-65c5748cb8-4g6nq      1/1     Running   0          52s   ip-10-0-196-211.us-east-2.compute.internal   <none>           <none>
router-test-65c5748cb8-77l6p      1/1     Running   0          52s   ip-10-0-150-210.us-east-2.compute.internal   <none>           <none>
router-test-65c5748cb8-9cm4t      1/1     Running   0          52s   ip-10-0-190-137.us-east-2.compute.internal   <none>           <none>
router-test-65c5748cb8-bvbnr      1/1     Running   0          52s   ip-10-0-209-75.us-east-2.compute.internal    <none>           <none>
router-test-65c5748cb8-z5rr2      1/1     Running   0          52s   ip-10-0-180-83.us-east-2.compute.internal    <none>           <none>

$ oc get nodes  $(oc get pods -n openshift-ingress -l ingresscontroller.operator.openshift.io/deployment-ingresscontroller=test -o jsonpath='{range .items[*]} {.spec.nodeName}{"\n"}') -o jsonpath='{range .items[*]} {.metadata.name} {" "} {.metadata.labels.failure-domain\.beta\.kubernetes\.io/zone} {"\n"} {end}'
 ip-10-0-158-37.us-east-2.compute.internal   us-east-2a 
  ip-10-0-196-211.us-east-2.compute.internal   us-east-2c 
  ip-10-0-150-210.us-east-2.compute.internal   us-east-2a 
  ip-10-0-190-137.us-east-2.compute.internal   us-east-2b 
  ip-10-0-209-75.us-east-2.compute.internal   us-east-2c 
  ip-10-0-180-83.us-east-2.compute.internal   us-east-2b 

The ingress router pods are balanced evenly across the multi-AZ cluster: two pods in each of us-east-2a, us-east-2b, and us-east-2c.
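
The per-zone tally behind that conclusion can be reproduced from the node/zone listing above. A minimal sketch follows. count_by_zone and zone_skew are hypothetical helper names, and a zone that hosts no router pod never appears in the input, so it is not counted:

```shell
# Hypothetical helper: count entries per zone.
# Reads "<node-name> <zone>" lines on stdin; leading/trailing blanks are ignored.
count_by_zone() {
  awk 'NF >= 2 { n[$2]++ } END { for (z in n) print z, n[z] }' | sort
}

# Hypothetical helper: print (max count - min count) over the zones seen.
# 0 means perfectly even; zones with no pods at all are invisible here.
zone_skew() {
  count_by_zone | awk '
    NR == 1 { min = max = $2 + 0 }
    { c = $2 + 0; if (c < min) min = c; if (c > max) max = c }
    END { print max - min }'
}

# Example with the listing from this verification run.
count_by_zone <<'EOF'
 ip-10-0-158-37.us-east-2.compute.internal   us-east-2a
 ip-10-0-196-211.us-east-2.compute.internal   us-east-2c
 ip-10-0-150-210.us-east-2.compute.internal   us-east-2a
 ip-10-0-190-137.us-east-2.compute.internal   us-east-2b
 ip-10-0-209-75.us-east-2.compute.internal   us-east-2c
 ip-10-0-180-83.us-east-2.compute.internal   us-east-2b
EOF
# prints:
# us-east-2a 2
# us-east-2b 2
# us-east-2c 2
```

Piping the same listing into zone_skew prints 0 for this run. A larger value would indicate that the replicas are unevenly spread across the zones that have router pods.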

Comment 5 errata-xmlrpc 2021-07-26 17:35:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.21 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.


