Bug 1759659
Summary: | [azure] MachineWithNoRunningPhase firing even when machines are running | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Abhinav Dahiya <adahiya> |
Component: | Cloud Compute | Assignee: | Alberto <agarcial> |
Cloud Compute sub component: | Other Providers | QA Contact: | Jianwei Hou <jhou> |
Status: | CLOSED CURRENTRELEASE | Docs Contact: | |
Severity: | medium | ||
Priority: | unspecified | CC: | agarcial, inecas, jhou, vjaypurk, vlaad, wking, xtian, zhsun |
Version: | 4.3.0 | ||
Target Milestone: | --- | ||
Target Release: | 4.3.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-05-14 14:20:51 UTC | Type: | Bug |
Description
Abhinav Dahiya
2019-10-08 19:27:16 UTC
MachineWithNoRunningPhase is fired when a machine is provisioned. The alert is cleared soon after the machine reaches the 'Running' phase. I think this is working correctly, so moving to verified.

Something is preventing those machines from being gracefully terminated. That might be, e.g., PDBs or something else. Can we please get must-gather logs?

Hey Vedanti, the alert is legitimately firing because the machines are stuck "deleting". From the logs, it seems the secret referenced by those machines lacks permission to perform the delete operation:

2020-06-03T05:11:20.873424112Z E0603 05:11:20.873381 1 actuator.go:84] Machine error: failed to delete machine "cluster-wtcln-worker-eastus-4shjw": failed to delete machine: failed to delete vm cluster-wtcln-worker-eastus-4shjw in resource group cluster-wtcln-rg: compute.VirtualMachinesClient#Delete: Failure sending request: StatusCode=403 -- Original Error: Code="AuthorizationFailed" Message="The client '9b1463d5-687a-4397-a17d-b53e20961ee7' with object id '9b1463d5-687a-4397-a17d-b53e20961ee7' does not have authorization to perform action 'Microsoft.Compute/virtualMachines/delete' over scope '/subscriptions/117eba7c-c7b7-43a6-b9d6-d0f257dd71a5/resourceGroups/cluster-wtcln-rg/providers/Microsoft.Compute/virtualMachines/cluster-wtcln-worker-eastus-4shjw' or the scope is invalid. If access was recently granted, please refresh your credentials."

2020-06-03T05:11:30.956944884Z E0603 05:11:30.956903 1 actuator.go:84] Machine error: failed to delete machine "cluster-wtcln-worker-eastus-fnhsj": failed to delete machine: failed to delete vm cluster-wtcln-worker-eastus-fnhsj in resource group cluster-wtcln-rg: azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/117eba7c-c7b7-43a6-b9d6-d0f257dd71a5/resourceGroups/cluster-wtcln-rg/providers/Microsoft.Compute/virtualMachines/cluster-wtcln-worker-eastus-fnhsj?api-version=2018-10-01: StatusCode=0 -- Original Error: adal: Failed to execute the refresh request. Error = 'Post https://login.microsoftonline.com/4087a2a7-6506-40c1-86b4-1d0404c4969e/oauth2/token?api-version=1.0: dial tcp: lookup login.microsoftonline.com on 172.30.0.10:53: read udp 10.128.0.4:59119->172.30.0.10:53: i/o timeout'

Have the permissions referenced by azure-cloud-credentials been manipulated out of band? Can you please open a new BZ to track and discuss this?

The "If access was recently granted, please refresh your credentials" error seems to have moved to bug 1846292.
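
For anyone triaging similar must-gather output, the two log lines quoted in this bug show different failure signatures: one is a 403 with Code="AuthorizationFailed", the other a token refresh that timed out on a DNS lookup. The following is only a rough sketch of how such lines could be sorted. The function name classify and the category labels are hypothetical and are not part of the machine-api operator; the patterns assume the log format matches the lines quoted in this bug.

```python
import re

# Hypothetical patterns, assuming log lines shaped like the ones in this bug.
MACHINE_RE = re.compile(r'failed to delete machine "([^"]+)"')
STATUS_RE = re.compile(r'StatusCode=(\d+)')
# Matches Code="..." with a quoted value; "StatusCode=403" does not match
# because its value is not quoted.
CODE_RE = re.compile(r'Code="([^"]+)"')


def classify(line):
    """Return the machine name, HTTP status, and a coarse failure kind."""
    machine = MACHINE_RE.search(line)
    status = STATUS_RE.search(line)
    code = CODE_RE.search(line)

    if code and code.group(1) == "AuthorizationFailed":
        kind = "permission"      # credential lacks the delete permission
    elif "Failed to refresh the Token" in line:
        kind = "token-refresh"   # could not obtain a token at all
    else:
        kind = "other"

    return {
        "machine": machine.group(1) if machine else None,
        "status": int(status.group(1)) if status else None,
        "kind": kind,
    }
```

Separating the two kinds matters because a "permission" hit points at the role assigned to the credential, while a "token-refresh" hit with an i/o timeout points at cluster DNS or egress rather than at permissions.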