Description of problem:

Failed jobs:
https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-4.2/59
https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-4.2/57
https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-4.2/39
https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-4.2/35

Failed error:
Aug 20 23:44:06.601 - 120s W ns/openshift-must-gather-v7xkc pod/must-gather-l46fv node/ci-op-7lyqsprv-282fe-kk5rn-master-1 pod has been pending longer than a minute
Aug 20 23:45:55.642 W persistentvolume/pvc-7ca2b921-c3a4-11e9-8c87-000d3a3ec41e compute.DisksClient#Delete: Failure sending request: StatusCode=0 -- Original Error: autorest/azure: Service returned an error. Status=<nil> Code="OperationNotAllowed" Message="Disk ci-op-7lyqsprv-282fe-kk5rn-dynamic-pvc-7ca2b921-c3a4-11e9-8c87-000d3a3ec41e is attached to VM /subscriptions/d38f1e38-4bed-438e-b227-833f997adf6a/resourceGroups/ci-op-7lyqsprv-282fe-kk5rn-rg/providers/Microsoft.Compute/virtualMachines/ci-op-7lyqsprv-282fe-kk5rn-worker-centralus2-q5979."
Aug 20 23:46:12.185 I ns/openshift-must-gather-v7xkc pod/must-gather-l46fv Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:068e9da3c7056e6d0044de04a2bba814a9df098350b9336c735e900650f4b465" already present on machine
Aug 20 23:46:12.646 I ns/openshift-must-gather-v7xkc pod/must-gather-l46fv Created container copy
Aug 20 23:46:12.691 I ns/openshift-must-gather-v7xkc pod/must-gather-l46fv Started container copy
Aug 20 23:46:17.263 W persistentvolume/pvc-6a4b7673-c3a4-11e9-96e8-000d3a949a12 compute.DisksClient#Delete: Failure sending request: StatusCode=0 -- Original Error: autorest/azure: Service returned an error. Status=<nil> Code="OperationNotAllowed" Message="Disk ci-op-7lyqsprv-282fe-kk5rn-dynamic-pvc-6a4b7673-c3a4-11e9-96e8-000d3a949a12 is attached to VM /subscriptions/d38f1e38-4bed-438e-b227-833f997adf6a/resourceGroups/ci-op-7lyqsprv-282fe-kk5rn-rg/providers/Microsoft.Compute/virtualMachines/ci-op-7lyqsprv-282fe-kk5rn-worker-centralus2-q5979."
Aug 20 23:46:22.083 W ns/openshift-must-gather-v7xkc pod/must-gather-l46fv node/ci-op-7lyqsprv-282fe-kk5rn-master-1 graceful deletion within 0s
Aug 20 23:46:22.121 W ns/openshift-must-gather-v7xkc pod/must-gather-l46fv node/ci-op-7lyqsprv-282fe-kk5rn-master-1 deleted
Aug 20 23:47:52.738 W clusteroperator/monitoring changed Upgradeable to False: RollOutInProgress: Rollout of the monitoring stack is in progress. Please wait until it finishes.
Aug 20 23:48:18.332 W clusteroperator/monitoring changed Upgradeable to True
Aug 20 23:48:18.403 W clusteroperator/monitoring changed Upgradeable to False: RollOutInProgress: Rollout of the monitoring stack is in progress. Please wait until it finishes.
Aug 20 23:48:43.686 W clusteroperator/monitoring changed Upgradeable to True

Version-Release number of selected component (if applicable):
redhat-openshift-release-informing#redhat-canary-openshift-ocp-installer-e2e-azure-4.2

How reproducible:
Sometimes

Failing test case:
[sig-autoscaling] [HPA] Horizontal pod autoscaling (scale resource: CPU) [sig-autoscaling] ReplicationController light Should scale from 2 pods to 1 pod [Suite:openshift/conformance/parallel] [Suite:k8s]
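For context on what the failing test exercises: it drives an autoscaling/v1 HorizontalPodAutoscaler that targets CPU utilization on a lightly loaded ReplicationController and expects it to scale from 2 replicas down to 1. Below is a minimal sketch in Go of such an HPA object; the names ("rc-light", namespace "e2e-hpa") and the 50% CPU target are placeholders I chose for illustration, not the exact objects the upstream test creates, and scale-down only works when the cluster's resource-metrics pipeline (owned by monitoring in OCP) is reporting CPU for the pods.

package main

import (
	"fmt"

	autoscalingv1 "k8s.io/api/autoscaling/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// int32Ptr is a small helper for the optional int32 fields in the HPA spec.
func int32Ptr(i int32) *int32 { return &i }

func main() {
	// Hypothetical HPA roughly matching what the e2e test exercises:
	// scale a CPU-based ReplicationController between 1 and 2 replicas.
	hpa := &autoscalingv1.HorizontalPodAutoscaler{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "rc-light",  // placeholder name
			Namespace: "e2e-hpa",   // placeholder namespace
		},
		Spec: autoscalingv1.HorizontalPodAutoscalerSpec{
			ScaleTargetRef: autoscalingv1.CrossVersionObjectReference{
				APIVersion: "v1",
				Kind:       "ReplicationController",
				Name:       "rc-light",
			},
			MinReplicas:                    int32Ptr(1),
			MaxReplicas:                    2,
			TargetCPUUtilizationPercentage: int32Ptr(50), // assumed target
		},
	}

	// Print the spec; in a real cluster this object would be created through a
	// client-go clientset, and the HPA controller (kube-controller-manager)
	// would reconcile replicas based on reported CPU utilization.
	fmt.Printf("%+v\n", hpa.Spec)
}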
It's not clear to me yet why this was assigned to the cluster DNS component. Needs triage.
Let's try kube-controller-manager since the failure mentions HPA.
These failures happened on a single day almost two weeks ago and haven't reappeared in the past few days, so I'm lowering the priority and moving this to 4.3.
According to Derek, HPA is owned by monitoring team, moving accordingly.
Likely the HPA StackDriver tests will be disabled. Marking as a duplicate of the other bug. *** This bug has been marked as a duplicate of bug 1750851 ***