It seems like all your azure-credentials secrets were filled by empty strings at some point after first machine was provisioned an became running. Did you change any of those by any chance?
(In reply to Danil Grigorev from comment #1) > It seems like all your azure-credentials secrets were filled by empty > strings at some point after first machine was provisioned an became running. > Did you change any of those by any chance? Hi @jialiu can you please provide some info?
No, it is a ci cluster, as far as I know, no one touch with it after first machine was provisioned an became running.
this require further investigation. Tagging with UpcomingSprint
Copying into a public comment for search engines, the redacted failure was: Error Message: failed to reconcile machine "REDACTED": azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/REDACTED/resourceGroups/REDACTED/providers/Microsoft.Network/virtualNetworks/REDACTED/subnets/REDACTED?api-version=2018-12-01: StatusCode=401 -- Original Error: adal: Refresh request failed. Status Code = '401'. Response body: {"error":"invalid_client","error_description":"AADSTS700026: Client application has no configured keys.\r\nTrace ID: REDACTED\r\nCorrelation ID: REDACTED\r\nTimestamp: 2020-05-15 05:49:59Z","error_codes":[700026],"timestamp":"2020-05-15 05:49:59Z","trace_id":"REDACTED","correlation_id":"REDACTED"} Bug 1805163 has a similar error, although in that case it was a 400.
Hi, Can you verify that the service principal you are using: hasn't been deleted, hasn't been changed, password hasn't expired?
can we please also have must gather logs?
We should not set failed on invalid credentials. Credentials do expire from time to time and need to be refreshed. Or someone might have removed a role from a set of credentials out of band, and they could re-add them. This is especially true for just updating machines. If we're only setting failed on create, maybe that's okay. https://github.com/Azure/AKS/issues/222 This issue highlights the error "azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to" listed in #c0 . Is CI or something else setting these credentials to be very short lived, and the credentials are expiring too quickly?
Tagging as upcoming sprint, I need to investigate possible fixes for the problem and won't complete that by the end of the sprint
Verified clusterversion: 4.6.0-0.nightly-2020-08-12-003456 Create a new machine, we don't get the following error any more, will watch CI results if meet this again. E0709 16:44:26.642473 1 actuator.go:78] Machine error: failed to reconcile machine "jspeed-test-8cnpg-worker-centralus3-g6htz"s: failed to create vm jspeed-test-8cnpg-worker-centralus3-g6htz: vm jspeed-test-8cnpg-worker-centralus3-g6htz is still in provisioning state Creating, reconcile I0812 08:43:03.349919 1 controller.go:169] zhsunazure8122-6bfhc-worker-northcentralus-zhnh6: reconciling Machine I0812 08:43:03.367456 1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled" "controller"="machine_controller" "request"={"Namespace":"openshift-machine-api","Name":"zhsunazure8122-6bfhc-worker-northcentralus-zhnh6"} I0812 08:43:03.367735 1 controller.go:169] zhsunazure8122-6bfhc-worker-northcentralus-zhnh6: reconciling Machine I0812 08:43:03.367756 1 actuator.go:201] zhsunazure8122-6bfhc-worker-northcentralus-zhnh6: actuator checking if machine exists I0812 08:43:03.598401 1 controller.go:424] zhsunazure8122-6bfhc-worker-northcentralus-zhnh6: going into phase "Provisioning" I0812 08:43:03.617245 1 controller.go:313] zhsunazure8122-6bfhc-worker-northcentralus-zhnh6: reconciling machine triggers idempotent create I0812 08:43:03.617271 1 actuator.go:84] Creating machine zhsunazure8122-6bfhc-worker-northcentralus-zhnh6 I0812 08:43:03.732394 1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled" "controller"="machineset" "request"={"Namespace":"openshift-machine-api","Name":"zhsunazure8122-6bfhc-worker-northcentralus"} I0812 08:43:03.732453 1 controller.go:72] controllers/MachineSet "msg"="Reconciling" "machineset"="zhsunazure8122-6bfhc-worker-northcentralus" "namespace"="openshift-machine-api" I0812 08:43:03.778119 1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled" "controller"="machineset" "request"={"Namespace":"openshift-machine-api","Name":"zhsunazure8122-6bfhc-worker-northcentralus"} I0812 08:43:03.778163 1 controller.go:72] controllers/MachineSet "msg"="Reconciling" "machineset"="zhsunazure8122-6bfhc-worker-northcentralus" "namespace"="openshift-machine-api" I0812 08:43:03.841406 1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled" "controller"="machineset" "request"={"Namespace":"openshift-machine-api","Name":"zhsunazure8122-6bfhc-worker-northcentralus"} I0812 08:43:04.849856 1 networkinterfaces.go:226] successfully created network interface zhsunazure8122-6bfhc-worker-northcentralus-zhnh6-nic I0812 08:43:04.929418 1 virtualmachines.go:76] getting nic zhsunazure8122-6bfhc-worker-northcentralus-zhnh6-nic I0812 08:43:05.068459 1 virtualmachines.go:85] got nic zhsunazure8122-6bfhc-worker-northcentralus-zhnh6-nic I0812 08:43:05.068498 1 virtualmachines.go:87] creating vm zhsunazure8122-6bfhc-worker-northcentralus-zhnh6 I0812 08:43:10.001707 1 machine_scope.go:160] zhsunazure8122-6bfhc-worker-northcentralus-zhnh6: patching machine E0812 08:43:10.044656 1 actuator.go:78] Machine error: failed to reconcile machine "zhsunazure8122-6bfhc-worker-northcentralus-zhnh6"s: failed to create vm zhsunazure8122-6bfhc-worker-northcentralus-zhnh6: failed to create or get machine: compute.VirtualMachinesCreateOrUpdateFuture: asynchronous operation has not completed W0812 08:43:10.044687 1 controller.go:315] zhsunazure8122-6bfhc-worker-northcentralus-zhnh6: failed to create machine: requeue in: 20s I0812 08:43:10.044704 1 controller.go:405] Actuator returned requeue-after error: requeue in: 20s I0812 08:43:10.044767 1 controller.go:169] zhsunazure8122-6bfhc-worker-northcentralus-zhnh6: reconciling Machine I0812 08:43:10.044783 1 actuator.go:201] zhsunazure8122-6bfhc-worker-northcentralus-zhnh6: actuator checking if machine exists I0812 08:43:10.044850 1 recorder.go:52] controller-runtime/manager/events "msg"="Warning" "message"="CreateError: failed to reconcile machine \"zhsunazure8122-6bfhc-worker-northcentralus-zhnh6\"s: failed to create vm zhsunazure8122-6bfhc-worker-northcentralus-zhnh6: failed to create or get machine: compute.VirtualMachinesCreateOrUpdateFuture: asynchronous operation has not completed" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"zhsunazure8122-6bfhc-worker-northcentralus-zhnh6","uid":"1d8cfd28-9a7f-4da3-b164-31f775171ae7","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"52648"} "reason"="FailedCreate" I0812 08:43:10.306552 1 reconciler.go:392] Machine 3820b466-edf9-4541-83f1-351ecc91cd55 is creating I0812 08:43:10.306580 1 reconciler.go:398] Found vm for machine zhsunazure8122-6bfhc-worker-northcentralus-zhnh6 I0812 08:43:10.306591 1 controller.go:277] zhsunazure8122-6bfhc-worker-northcentralus-zhnh6: reconciling machine triggers idempotent update I0812 08:43:10.306597 1 actuator.go:168] Updating machine zhsunazure8122-6bfhc-worker-northcentralus-zhnh6 I0812 08:43:10.691715 1 machine_scope.go:160] zhsunazure8122-6bfhc-worker-northcentralus-zhnh6: patching machine E0812 08:43:10.731714 1 controller.go:284] zhsunazure8122-6bfhc-worker-northcentralus-zhnh6: instance exists but providerID or addresses has not been given to the machine yet, requeuing
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196