Bug 1978845

Summary: Scaled ingress replicas following sharded pattern don't balance evenly across multi-AZ
Product: OpenShift Container Platform Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: Networking Assignee: Miciah Dashiel Butler Masters <mmasters>
Networking sub component: router QA Contact: jechen <jechen>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium CC: aclewett, aiyengar, amcdermo, aos-bugs, bmcelvee, bperkins, jechen, jelopez
Version: 4.6   
Target Milestone: ---   
Target Release: 4.7.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-07-26 17:35:23 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1900819    
Bug Blocks: 1977480    

Comment 2 jechen 2021-07-13 19:51:08 UTC
Verified in 4.7.0-0.nightly-2021-07-13-133801

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-07-13-133801   True        False         29m     Cluster version is 4.7.0-0.nightly-2021-07-13-133801

$ oc get node
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-134-71.us-east-2.compute.internal    Ready    master   54m   v1.20.0+bd7b30d
ip-10-0-158-37.us-east-2.compute.internal    Ready    worker   47m   v1.20.0+bd7b30d
ip-10-0-164-100.us-east-2.compute.internal   Ready    master   55m   v1.20.0+bd7b30d
ip-10-0-190-137.us-east-2.compute.internal   Ready    worker   46m   v1.20.0+bd7b30d
ip-10-0-196-211.us-east-2.compute.internal   Ready    worker   46m   v1.20.0+bd7b30d
ip-10-0-199-80.us-east-2.compute.internal    Ready    master   54m   v1.20.0+bd7b30d


1. Check the current machinesets
$ oc get machinesets -n openshift-machine-api
NAME                                   DESIRED   CURRENT   READY   AVAILABLE   AGE
jechen-0713a-cscj7-worker-us-east-2a   1         1         1       1           62m
jechen-0713a-cscj7-worker-us-east-2b   1         1         1       1           62m
jechen-0713a-cscj7-worker-us-east-2c   1         1         1       1           62m


2. Scale each of the above machinesets to 2 replicas
$ oc -n openshift-machine-api scale --replicas=2  machinesets jechen-0713a-cscj7-worker-us-east-2a 
machineset.machine.openshift.io/jechen-0713a-cscj7-worker-us-east-2a scaled
$ oc -n openshift-machine-api scale --replicas=2  machinesets jechen-0713a-cscj7-worker-us-east-2b
machineset.machine.openshift.io/jechen-0713a-cscj7-worker-us-east-2b scaled
$ oc -n openshift-machine-api scale --replicas=2  machinesets jechen-0713a-cscj7-worker-us-east-2c
machineset.machine.openshift.io/jechen-0713a-cscj7-worker-us-east-2c scaled

3. Check whether the machinesets scaled successfully
$ oc get machinesets -n openshift-machine-api
NAME                                   DESIRED   CURRENT   READY   AVAILABLE   AGE
jechen-0713a-cscj7-worker-us-east-2a   2         2         1       1           63m
jechen-0713a-cscj7-worker-us-east-2b   2         2         1       1           63m
jechen-0713a-cscj7-worker-us-east-2c   2         2         1       1           63m


$ oc get node |grep worker
ip-10-0-150-210.us-east-2.compute.internal   NotReady   worker   28s   v1.20.0+bd7b30d
ip-10-0-158-37.us-east-2.compute.internal    Ready      worker   51m   v1.20.0+bd7b30d
ip-10-0-180-83.us-east-2.compute.internal    NotReady   worker   33s   v1.20.0+bd7b30d
ip-10-0-190-137.us-east-2.compute.internal   Ready      worker   50m   v1.20.0+bd7b30d
ip-10-0-196-211.us-east-2.compute.internal   Ready      worker   50m   v1.20.0+bd7b30d
ip-10-0-209-75.us-east-2.compute.internal    NotReady   worker   14s   v1.20.0+bd7b30d
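
In the step 3 output, READY and AVAILABLE still lag DESIRED because the three new worker nodes are NotReady. A minimal Python sketch for spotting that lag from the `oc get machinesets` table is given here. The helper name `pending_machinesets` is hypothetical, and the sketch assumes the column layout printed above with every cell populated.

```python
def pending_machinesets(table):
    """Return {name: (desired, ready)} for machinesets where READY < DESIRED.

    Assumption: columns are whitespace-separated and every cell is filled,
    as in the output above. A blank READY cell would shift the split.
    """
    lines = [ln for ln in table.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    desired_col = header.index("DESIRED")
    ready_col = header.index("READY")
    pending = {}
    for line in lines[1:]:
        fields = line.split()
        desired = int(fields[desired_col])
        ready = int(fields[ready_col])
        if ready < desired:
            pending[fields[0]] = (desired, ready)
    return pending


# Table copied from step 3 of this comment.
STEP3_TABLE = """
NAME                                   DESIRED   CURRENT   READY   AVAILABLE   AGE
jechen-0713a-cscj7-worker-us-east-2a   2         2         1       1           63m
jechen-0713a-cscj7-worker-us-east-2b   2         2         1       1           63m
jechen-0713a-cscj7-worker-us-east-2c   2         2         1       1           63m
"""

if __name__ == "__main__":
    for name, (desired, ready) in pending_machinesets(STEP3_TABLE).items():
        print(f"{name}: {ready}/{desired} ready")
```

An empty result would indicate that every machineset has reached its desired replica count.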


4. Create a custom ingress controller with routeSelector
$ cat ingressctl-route-selector.yaml 
kind: IngressController
apiVersion: operator.openshift.io/v1
metadata:
  name: test
  namespace: openshift-ingress-operator
spec:
  defaultCertificate:
    name: router-certs-default 
  domain: jechen-0713a.qe.devcluster.openshift.com 
  replicas: 1
  endpointPublishingStrategy:
    type: NodePortService
  routeSelector:
    matchLabels:
      route: router-test


$ oc create -f ./test2/ingressctl-route-selector.yaml 
ingresscontroller.operator.openshift.io/test created


5. Scale up the ingresscontroller above to 6 replicas
$ oc -n openshift-ingress-operator scale --replicas=6 ingresscontroller/test
ingresscontroller.operator.openshift.io/test scaled


$ oc -n openshift-ingress get pod -owide | grep router-test
router-test-65c5748cb8-475w9      1/1     Running   0          52s   10.131.0.17   ip-10-0-158-37.us-east-2.compute.internal    <none>           <none>
router-test-65c5748cb8-4g6nq      1/1     Running   0          52s   10.129.2.13   ip-10-0-196-211.us-east-2.compute.internal   <none>           <none>
router-test-65c5748cb8-77l6p      1/1     Running   0          52s   10.131.2.12   ip-10-0-150-210.us-east-2.compute.internal   <none>           <none>
router-test-65c5748cb8-9cm4t      1/1     Running   0          52s   10.128.2.36   ip-10-0-190-137.us-east-2.compute.internal   <none>           <none>
router-test-65c5748cb8-bvbnr      1/1     Running   0          52s   10.128.4.11   ip-10-0-209-75.us-east-2.compute.internal    <none>           <none>
router-test-65c5748cb8-z5rr2      1/1     Running   0          52s   10.130.2.9    ip-10-0-180-83.us-east-2.compute.internal    <none>           <none>


$ oc get nodes  $(oc get pods -n openshift-ingress -l ingresscontroller.operator.openshift.io/deployment-ingresscontroller=test -o jsonpath='{range .items[*]} {.spec.nodeName}{"\n"}') -o jsonpath='{range .items[*]} {.metadata.name} {" "} {.metadata.labels.failure-domain\.beta\.kubernetes\.io/zone} {"\n"} {end}'
 ip-10-0-158-37.us-east-2.compute.internal   us-east-2a 
  ip-10-0-196-211.us-east-2.compute.internal   us-east-2c 
  ip-10-0-150-210.us-east-2.compute.internal   us-east-2a 
  ip-10-0-190-137.us-east-2.compute.internal   us-east-2b 
  ip-10-0-209-75.us-east-2.compute.internal   us-east-2c 
  ip-10-0-180-83.us-east-2.compute.internal   us-east-2b 

Ingress is balanced across the multi-AZ cluster: two router pods in each of us-east-2a, us-east-2b, and us-east-2c.
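
The balance check can also be done mechanically rather than by eye. A minimal Python sketch follows, assuming the "node zone" pair-per-line output shown above. The helper names `zone_counts` and `is_balanced` are hypothetical, and "balanced" is taken here to mean that per-zone pod counts differ by at most one, which is an interpretation and not a definition from this bug.

```python
from collections import Counter


def zone_counts(output):
    """Count router pods per zone from lines of the form '<node> <zone>'."""
    counts = Counter()
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 2:  # skip blank or malformed lines
            _node, zone = fields
            counts[zone] += 1
    return counts


def is_balanced(counts, zones):
    """True if pod counts across the given zones differ by at most one.

    `zones` lists every zone expected to host pods, so a zone with no
    pods at all is counted as zero rather than ignored.
    """
    per_zone = [counts.get(zone, 0) for zone in zones]
    return max(per_zone) - min(per_zone) <= 1


# Node-to-zone output copied from this comment.
NODE_ZONES = """
 ip-10-0-158-37.us-east-2.compute.internal   us-east-2a
  ip-10-0-196-211.us-east-2.compute.internal   us-east-2c
  ip-10-0-150-210.us-east-2.compute.internal   us-east-2a
  ip-10-0-190-137.us-east-2.compute.internal   us-east-2b
  ip-10-0-209-75.us-east-2.compute.internal   us-east-2c
  ip-10-0-180-83.us-east-2.compute.internal   us-east-2b
"""

ZONES = ["us-east-2a", "us-east-2b", "us-east-2c"]

if __name__ == "__main__":
    counts = zone_counts(NODE_ZONES)
    for zone in ZONES:
        print(zone, counts.get(zone, 0))
    print("balanced:", is_balanced(counts, ZONES))
```

Passing the expected zone list explicitly means a placement that leaves one zone empty is reported as unbalanced instead of being silently skipped.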

Comment 5 errata-xmlrpc 2021-07-26 17:35:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.21 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2762