Description of problem:
Rerunning the WSU Ansible playbook fails in TASK [Check for bootstrap CSR] when no Pending CSR exists. There are two situations:

1, All CSRs are in Approved status, so the Ansible shell command "oc get csr | awk '/system:serviceaccount:openshift-machine-config-operator:node-bootstrapper/ && /Pending/ {print $1}'" always returns "".

# oc get csr
NAME        AGE     REQUESTOR                                                                  CONDITION
csr-9csdn   50s     system:node:winworker-ay3n2                                                Approved,Issued
csr-hhkpj   3m31s   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
# oc get csr | awk '/system:serviceaccount:openshift-machine-config-operator:node-bootstrapper/ && /Pending/ {print $1}'
#
# ansible-playbook -i hosts ~/go/src/windows-machine-config-operator/tools/ansible/tasks/wsu/main.yaml
...
TASK [Check for bootstrap CSR] ***********
FAILED - RETRYING: Check for bootstrap CSR (2 retries left).
FAILED - RETRYING: Check for bootstrap CSR (1 retries left).
fatal: [40.69.171.210 -> localhost]: FAILED! => {"attempts": 2, "changed": true, "cmd": "oc get csr | awk '/system:serviceaccount:openshift-machine-config-operator:node-bootstrapper/ && /Pending/ {print $1}'", "delta": "0:00:01.098141", "end": "2019-11-26 02:51:59.182270", "rc": 0, "start": "2019-11-26 02:51:58.084129", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

PLAY RECAP ***********
40.69.171.210 : ok=2 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
localhost : ok=2 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0

2, The cluster cleans up approved CSRs after a while (possibly hours), after which the Ansible shell command also always returns "".

# oc get csr
No resources found.
# oc get csr | awk '/system:serviceaccount:openshift-machine-config-operator:node-bootstrapper/ && /Pending/ {print $1}'
No resources found.
#
# ansible-playbook -i hosts ~/go/src/windows-machine-config-operator/tools/ansible/tasks/wsu/main.yaml
...
TASK [Check for bootstrap CSR] ***********
FAILED - RETRYING: Check for bootstrap CSR (2 retries left).
FAILED - RETRYING: Check for bootstrap CSR (1 retries left).
fatal: [40.69.171.210 -> localhost]: FAILED! => {"attempts": 2, "changed": true, "cmd": "oc get csr | awk '/system:serviceaccount:openshift-machine-config-operator:node-bootstrapper/ && /Pending/ {print $1}'", "delta": "0:00:06.170185", "end": "2019-11-26 02:15:27.416078", "rc": 0, "start": "2019-11-26 02:15:21.245893", "stderr": "No resources found.", "stderr_lines": ["No resources found."], "stdout": "", "stdout_lines": []}

PLAY RECAP ***********
40.69.171.210 : ok=6 changed=4 unreachable=0 failed=1 skipped=1 rescued=0 ignored=0
localhost : ok=7 changed=6 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0

Version-Release number of selected component (if applicable):
# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.0-0.nightly-2019-11-24-183610   True        False         5m35s   Cluster version is 4.3.0-0.nightly-2019-11-24-183610

windows-machine-config-operator commit:
# git show
commit 1eb1f983774101b5077828fd2efb4dfb711d5886

How reproducible:
Always

Steps to Reproduce:
1. Install an OCP 4.3 cluster with ovn-kubernetes
2. Edit the ovn-kubernetes network configuration as follows
# oc edit Network.operator.openshift.io cluster
# oc get Network.operator.openshift.io cluster -o yaml
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  creationTimestamp: "2019-11-25T13:02:46Z"
  generation: 2
  name: cluster
  resourceVersion: "21021"
  selfLink: /apis/operator.openshift.io/v1/networks/cluster
  uid: c0315a6b-41fa-446d-971f-70c846607467
spec:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  defaultNetwork:
    ovnKubernetesConfig:
      hybridOverlayConfig:
        hybridClusterNetwork:
        - cidr: 10.132.0.0/14
          hostPrefix: 23
    type: OVNKubernetes
  logLevel: ""
  serviceNetwork:
  - 172.30.0.0/16
status: {}
3. Create a Windows instance with wni
# ./wni azure create --kubeconfig ~/window_container/azure/cluster/kubeconfig --credentials ~/.azure/osServicePrincipal.json --image-id MicrosoftWindowsServer:WindowsServer:2019-Datacenter-with-Containers:latest --instance-type Standard_D2s_v3
4. Run the WSU Ansible playbook for the first time
# ansible-playbook -i hosts ~/go/src/windows-machine-config-operator/tools/ansible/tasks/wsu/main.yaml
...
PLAY RECAP ******
40.69.171.210 : ok=11 changed=8 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
localhost : ok=7 changed=6 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
5. Rerun the WSU Ansible playbook

Actual results:
Rerunning the Ansible playbook fails when no Pending CSR exists.

Expected results:
Rerunning the Ansible playbook should not block the following tasks, even when no Pending CSR exists.

Additional info:
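Both situations reduce to the same thing: the awk filter prints a CSR name only when a line mentions both the node-bootstrapper requestor and the word Pending. Once the bootstrap CSR is Approved or has been cleaned up, stdout is empty on every attempt, so the retries can never succeed. The minimal Python sketch that follows reproduces that filter on `oc get csr` text and adds a hypothetical rerun-safe condition, `bootstrap_check_passes`, where `node_registered` is assumed to come from something like `oc get nodes`. It illustrates the expected behavior only; it is not the playbook's actual fix.

```python
from typing import List

# Requestor string matched by the awk filter in TASK [Check for bootstrap CSR].
BOOTSTRAPPER = (
    "system:serviceaccount:openshift-machine-config-operator:node-bootstrapper"
)


def pending_bootstrap_csrs(oc_get_csr_stdout: str) -> List[str]:
    """Return CSR names from `oc get csr` stdout whose line contains both the
    node-bootstrapper requestor and "Pending", mirroring the awk filter."""
    names = []
    for line in oc_get_csr_stdout.splitlines():
        if BOOTSTRAPPER in line and "Pending" in line:
            names.append(line.split()[0])  # awk's $1: the NAME column
    return names


def bootstrap_check_passes(pending: List[str], node_registered: bool) -> bool:
    """Hypothetical rerun-safe condition (an assumption, not the playbook's
    logic): pass when there is a Pending bootstrap CSR to approve, or when the
    node is already registered and there is nothing left to bootstrap."""
    return bool(pending) or node_registered


if __name__ == "__main__":
    # Situation 1 from the report: every CSR is already Approved,Issued.
    situation_1 = "\n".join([
        "NAME        AGE     REQUESTOR   CONDITION",
        "csr-9csdn   50s     system:node:winworker-ay3n2   Approved,Issued",
        "csr-hhkpj   3m31s   " + BOOTSTRAPPER + "   Approved,Issued",
    ])
    # Situation 2: CSRs were cleaned up; "No resources found." goes to
    # stderr, so the filter sees an empty stdout.
    situation_2 = ""
    for label, stdout in (("approved", situation_1), ("cleaned up", situation_2)):
        pending = pending_bootstrap_csrs(stdout)
        print(label, pending, bootstrap_check_passes(pending, node_registered=True))
```

With the original task, both inputs yield an empty list and the task fails after its retries. Under the hypothetical condition, a rerun against an already-registered node passes, while a first run still waits for a Pending CSR.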
This bug has been verified and passed on OCP 4.4.0-0.nightly-2020-01-12-032939, thanks.

Version:
# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-01-12-032939   True        False         77m     Cluster version is 4.4.0-0.nightly-2020-01-12-032939

WMCO repo:
# git show
commit 389ae941d6113d8a741719a5fb559e5deca0a506

Steps:
1, Run WSU against a Windows node attached to a 4.4 cluster
# ansible-playbook -i hosts ~/go/src/windows-machine-config-operator/tools/ansible/tasks/wsu/main.yaml -v
2, Check that the Ansible run succeeds and the Windows node is added to the cluster
# oc get node
NAME                                       STATUS   ROLES    AGE     VERSION
...
ip-10-0-31-23.us-east-2.compute.internal   Ready    worker   129m    v1.16.2
3, Run WSU again; check that the Ansible run succeeds and the Windows node is not affected
4, Delete the Windows node and run Ansible again; check that the node is added back
# oc delete node ip-10-0-31-23.us-east-2.compute.internal
node "ip-10-0-31-23.us-east-2.compute.internal" deleted
# ansible-playbook -i hosts ~/go/src/windows-machine-config-operator/tools/ansible/tasks/wsu/main.yaml -v
...
# oc get nodes
NAME                                       STATUS   ROLES    AGE     VERSION
...
ip-10-0-31-23.us-east-2.compute.internal   Ready    worker   5m42s   v1.16.2
5, Check the Windows workload
# oc create -f WinWebServer116.yaml
deployment.apps/win-webserver created
...
# oc get pod -o wide
NAME                             READY   STATUS    RESTARTS   AGE    IP           NODE                                       NOMINATED NODE   READINESS GATES
win-webserver-79b64df8b9-p2txh   1/1     Running   0          2m8s   10.132.2.2   ip-10-0-31-23.us-east-2.compute.internal   <none>           <none>
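The pod IP from step 5 can be cross-checked against the Network config in the original description: 10.132.2.2 falls inside hybridClusterNetwork (10.132.0.0/14) rather than the default clusterNetwork (10.128.0.0/14), which is consistent with the Windows pod being attached to the hybrid overlay network. A minimal sketch using Python's standard `ipaddress` module follows; the helper names are hypothetical, and it assumes `hostPrefix: 23` means each node is allocated a /23 from that range.

```python
import ipaddress

# Values copied from the report: step 2's Network config and the pod listing.
DEFAULT_CLUSTER_NETWORK = ipaddress.ip_network("10.128.0.0/14")
HYBRID_CLUSTER_NETWORK = ipaddress.ip_network("10.132.0.0/14")
HOST_PREFIX = 23
POD_IP = ipaddress.ip_address("10.132.2.2")


def host_block(ip: ipaddress.IPv4Address, host_prefix: int) -> ipaddress.IPv4Network:
    """Return the /host_prefix block containing `ip` (strict=False masks host bits)."""
    return ipaddress.ip_network(f"{ip}/{host_prefix}", strict=False)


def blocks_per_network(network: ipaddress.IPv4Network, host_prefix: int) -> int:
    """Count how many /host_prefix node subnets fit inside `network`."""
    return 2 ** (host_prefix - network.prefixlen)


if __name__ == "__main__":
    print("in hybridClusterNetwork:", POD_IP in HYBRID_CLUSTER_NETWORK)
    print("in clusterNetwork:", POD_IP in DEFAULT_CLUSTER_NETWORK)
    print("containing /23 block:", host_block(POD_IP, HOST_PREFIX))
    print("/23 blocks in hybrid range:", blocks_per_network(HYBRID_CLUSTER_NETWORK, HOST_PREFIX))
```

This only checks address arithmetic; it does not query the cluster for the subnet actually assigned to the node.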
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0581