Bug 1772192
Summary: | AWS machine-controller does not honor requeue delay | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | W. Trevor King <wking> |
Component: | Cloud Compute | Assignee: | Alberto <agarcial> |
Cloud Compute sub component: | Other Providers | QA Contact: | Jianwei Hou <jhou> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | low | ||
Priority: | medium | CC: | agarcial, jkaur |
Version: | 4.3.0 | ||
Target Milestone: | --- | ||
Target Release: | 4.5.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Last Closed: | 2020-07-13 17:12:14 UTC | Type: | Bug |
Description
W. Trevor King
2019-11-13 21:06:56 UTC
So at least the 4.3 CI thing [1] is not the GCP error wrapping thing. Here you can see delayIfRequeueAfterError triggering:

$ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-openshift-ansible-e2e-aws-scaleup-rhel7-4.3/242/artifacts/e2e-aws-scaleup-rhel7/must-gather/registry-svc-ci-openshift-org-ci-op-49m8kdi7-stable-sha256-64c63eedf863406fbc6c7515026f909a7221472cf70283708fb7010dd5e6139e/namespaces/openshift-machine-api/pods/machine-api-controllers-7445cdcd69-vlqjh/machine-controller/machine-controller/logs/current.log | grep -1 'Actuator returned requeue-after error' | tail -n3
2019-11-13T16:57:16.780894255Z W1113 16:57:16.780861 1 controller.go:321] Failed to create machine "ci-op-49m8kdi7-6df53-9bmsm-worker-us-east-1b-centos-mwlr9": requeue in: 20s
2019-11-13T16:57:16.780947512Z I1113 16:57:16.780925 1 controller.go:409] Actuator returned requeue-after error: requeue in: 20s
2019-11-13T16:57:16.781080475Z I1113 16:57:16.781054 1 controller.go:164] Reconciling Machine "ci-op-49m8kdi7-6df53-9bmsm-worker-us-east-1b-centos-mwlr9"

but still not stopping the subsequent reconciliation.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1772163#c0

Validated on:

[miyadav@miyadav ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-04-12-180647   True        False         5m22s   Cluster version is 4.5.0-0.nightly-2020-04-12-180647

Steps:

1. [miyadav@miyadav ~]$ oc project openshift-machine-api
   Now using project "openshift-machine-api" on server "https://api.miyadav-1304.qe.devcluster.openshift.com:6443".

2. [miyadav@miyadav ~]$ oc get machines
   NAME                                         PHASE     TYPE        REGION      ZONE         AGE
   miyadav-1304-plrql-master-0                  Running   m4.xlarge   us-east-2   us-east-2a   35m
   miyadav-1304-plrql-master-1                  Running   m4.xlarge   us-east-2   us-east-2b   35m
   miyadav-1304-plrql-master-2                  Running   m4.xlarge   us-east-2   us-east-2c   35m
   miyadav-1304-plrql-worker-us-east-2a-rrhzj   Running   m4.large    us-east-2   us-east-2a   5m59s
   miyadav-1304-plrql-worker-us-east-2b-zdft4   Running   m4.large    us-east-2   us-east-2b   24m
   miyadav-1304-plrql-worker-us-east-2c-g2kn7   Running   m4.large    us-east-2   us-east-2c   24m

3. [miyadav@miyadav ~]$ oc delete machine miyadav-1304-plrql-worker-us-east-2b-zdft4
   machine.machine.openshift.io "miyadav-1304-plrql-worker-us-east-2b-zdft4" deleted

4. oc logs -f machine-api-controllers-56b68ff6d7-qczp9 -c machine-controller

Actual & Expected:

I0413 03:44:29.525626 1 reconciler.go:352] miyadav-1304-plrql-worker-us-east-2b-wsnrv: Instance state still pending, returning an error to requeue
I0413 03:44:29.525635 1 machine_scope.go:80] miyadav-1304-plrql-worker-us-east-2b-wsnrv: patching machine
E0413 03:44:29.538575 1 actuator.go:65] miyadav-1304-plrql-worker-us-east-2b-wsnrv error: requeue in: 20s
E0413 03:44:29.538628 1 controller.go:275] miyadav-1304-plrql-worker-us-east-2b-wsnrv: error updating machine: requeue in: 20s
I0413 03:44:29.538639 1 controller.go:401] Actuator returned requeue-after error: requeue in: 20s
I0413 03:44:29.538724 1 recorder.go:52] controller-runtime/manager/events "msg"="Warning" "message"="requeue in: 20s" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"miyadav-1304-plrql-worker-us-east-2b-wsnrv","uid":"1e209294-8f6e-462b-a52c-849576609dad","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"27546"} "reason"="FailedCreate"
I0413 03:44:49.538922 1 controller.go:165] miyadav-1304-plrql-worker-us-east-2b-wsnrv: reconciling Machine
I0413 03:44:49.538950 1 actuator.go:97] miyadav-1304-plrql-worker-us-east-2b-wsnrv: actuator checking if machine exists
I0413 03:44:49.631983 1 reconciler.go:367] miyadav-1304-plrql-worker-us-east-2b-wsnrv: Found instance by id: i-0ad778699bb16aa1a
I0413 03:44:49.632007 1 controller.go:273] miyadav-1304-plrql-worker-us-east-2b-wsnrv: reconciling machine triggers idempotent update
I0413 03:44:49.632013 1 actuator.go:112] miyadav-1304-plrql-worker-us-east-2b-wsnrv: actuator updating machine
. . .

Additional Info: 20s difference is seen in logs honoring requeue time

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days
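
For anyone re-checking this against other machine-controller logs, the comparison made by eye above (time of the "Actuator returned requeue-after error" line versus time of the next "reconciling Machine" line) can be scripted. The following is only a rough sketch and not part of the fix. The names klog_time and requeue_gaps are made up for illustration. It assumes the klog header layout seen in the logs in this report, assumes the delay is printed as whole seconds (a value such as "1m0s" would be skipped), and assumes a single Machine is being followed. klog does not record the year, so the caller has to supply it.

```python
import re
from datetime import datetime, timedelta

# klog header: severity letter, MMDD, HH:MM:SS.microseconds. Searched rather
# than anchored, because must-gather logs prepend an RFC 3339 timestamp.
KLOG_HEADER = re.compile(r"(?:^|\s)[IWEF](\d{2})(\d{2}) (\d{2}:\d{2}:\d{2}\.\d{6}) ")
# Message strings copied from the logs in this report.
REQUEUE = re.compile(r"Actuator returned requeue-after error: requeue in: (\d+)s\b")
RECONCILE = re.compile(r"[Rr]econciling Machine\b")


def klog_time(line, year):
    """Return the klog timestamp of a line, or None if no klog header is found."""
    m = KLOG_HEADER.search(line)
    if m is None:
        return None
    month, day, clock = m.groups()
    return datetime.strptime(f"{year}-{month}-{day} {clock}", "%Y-%m-%d %H:%M:%S.%f")


def requeue_gaps(lines, year):
    """Yield (requested, observed) delays for each requeue-after message.

    'observed' is the time from the requeue-after message to the next
    "reconciling Machine" line. Assumption: only one Machine is in flight.
    """
    pending = None  # (requested delay, time the requeue-after line was logged)
    for line in lines:
        ts = klog_time(line, year)
        if ts is None:
            continue
        m = REQUEUE.search(line)
        if m:
            pending = (timedelta(seconds=int(m.group(1))), ts)
        elif pending is not None and RECONCILE.search(line):
            requested, since = pending
            yield requested, ts - since
            pending = None


# Excerpts of the two logs quoted in this report.
LOG_4_3 = [
    '2019-11-13T16:57:16.780947512Z I1113 16:57:16.780925 1 controller.go:409] Actuator returned requeue-after error: requeue in: 20s',
    '2019-11-13T16:57:16.781080475Z I1113 16:57:16.781054 1 controller.go:164] Reconciling Machine "ci-op-49m8kdi7-6df53-9bmsm-worker-us-east-1b-centos-mwlr9"',
]
LOG_4_5 = [
    'I0413 03:44:29.538639 1 controller.go:401] Actuator returned requeue-after error: requeue in: 20s',
    'I0413 03:44:49.538922 1 controller.go:165] miyadav-1304-plrql-worker-us-east-2b-wsnrv: reconciling Machine',
]

if __name__ == "__main__":
    for label, year, log in (("4.3", 2019, LOG_4_3), ("4.5", 2020, LOG_4_5)):
        for requested, observed in requeue_gaps(log, year):
            verdict = "honored" if observed >= requested else "NOT honored"
            print(f"{label}: requested {requested}, observed {observed}: {verdict}")
```

On the excerpts above, the 4.3 log shows the next reconcile starting 129 microseconds after the requeue-after message, while the 4.5 nightly log shows it starting 20.000283 seconds after, which matches the validation result.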