Bug 1689146
| Summary: | [cloud] Never see Progressing=True in upgrade for clusteroperator cluster-autoscaler | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | sunzhaohua <zhsun> |
| Component: | Cloud Compute | Assignee: | Brad Ison <brad.ison> |
| Status: | CLOSED ERRATA | QA Contact: | sunzhaohua <zhsun> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.1.0 | CC: | aos-cloud, brad.ison, decarr, jchaloup, mgugino, xtian |
| Target Milestone: | --- | | |
| Target Release: | 4.1.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-06-04 10:45:52 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
sunzhaohua
2019-03-15 09:45:42 UTC
Is the cluster autoscaler deployed in this environment? If not, you're never going to see it go Progressing, because there is no operand to wait on.

I deployed the autoscaler, then upgraded to 4.0.0-0.nightly-2019-03-14-135819. Maybe it happens so fast that I couldn't see it.

$ oc get clusterautoscaler
NAME      AGE
default   117s

$ oc get machineautoscaler
NAME          REF KIND     REF NAME                              MIN   MAX   AGE
autoscale-a   MachineSet   zhsun1-rxgd4-worker-ap-northeast-1a   1     3     18s
autoscale-c   MachineSet   zhsun1-rxgd4-worker-ap-northeast-1c   1     3     29s
autoscale-d   MachineSet   zhsun1-rxgd4-worker-ap-northeast-1d   1     3     49s

$ oc get pod
NAME                                              READY   STATUS    RESTARTS   AGE
cluster-autoscaler-default-774f5b4c7-plwdb        1/1     Running   0          2m6s
cluster-autoscaler-operator-df46df49b-slgmv       1/1     Running   1          23m
clusterapi-manager-controllers-7fb5fcdb87-2b2bs   4/4     Running   0          22m
machine-api-operator-6997c457b8-pw2sn             1/1     Running   0          22m

$ oc adm upgrade --to 4.0.0-0.nightly-2019-03-14-135819
Updating to 4.0.0-0.nightly-2019-03-14-135819

sunzhaohua, based on your comment (https://bugzilla.redhat.com/show_bug.cgi?id=1689146#c3), are you saying you are no longer able to reproduce the issue?

Jan, no, I can reproduce it every time. I mean that I still couldn't see Progressing=True during the upgrade after I deployed the autoscaler.

Do you have logs from the cluster-autoscaler-operator pod?
We have a log statement, ```glog.Infof("Syncing to version %v", r.releaseVersion)```, which is emitted when we set status=Progressing.
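For context, the shape of that code path might look roughly like the following minimal sketch. This is not the actual cluster-autoscaler-operator code; the reconciler type and its fields are illustrative only. The idea is that Progressing=True is only reported while an operand exists and its version lags the operator's release version.

```go
// Minimal sketch (not the actual operator code) of the idea behind the
// "Syncing to version %v" log line: report Progressing=True only while the
// operand is present and its version lags the operator's release version.
package main

import "log"

// condition mirrors the shape of a ClusterOperator status condition.
type condition struct {
	Type   string // e.g. "Available", "Progressing"
	Status string // "True" or "False"
}

// reconciler holds the inputs the status sync would need; all fields are
// hypothetical names for illustration.
type reconciler struct {
	releaseVersion string // version the operator wants to roll out
	operandVersion string // version the running cluster-autoscaler reports
	operandExists  bool   // whether a cluster-autoscaler deployment exists
}

// syncStatus decides which conditions to report for the clusteroperator.
func (r *reconciler) syncStatus() []condition {
	if r.operandExists && r.operandVersion != r.releaseVersion {
		// This is the point where a statement like the quoted glog.Infof
		// would fire during an upgrade.
		log.Printf("Syncing to version %v", r.releaseVersion)
		return []condition{
			{Type: "Available", Status: "True"},
			{Type: "Progressing", Status: "True"},
		}
	}
	return []condition{
		{Type: "Available", Status: "True"},
		{Type: "Progressing", Status: "False"},
	}
}

func main() {
	r := &reconciler{
		releaseVersion: "4.0.0-0.nightly-2019-03-20-153904",
		operandVersion: "4.0.0-0.nightly-2019-03-19-004004",
		operandExists:  true,
	}
	log.Printf("conditions: %+v", r.syncStatus())
}
```

With that in mind, the logs below are what to look for: if the "Syncing to version" line never appears, the operator never entered the Progressing path.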
Before upgrade log:
$ oc logs -f cluster-autoscaler-operator-5c548c64b5-mfrvb
I0321 03:42:00.543450 1 main.go:14] Go Version: go1.10.8
I0321 03:42:00.544605 1 main.go:15] Go OS/Arch: linux/amd64
I0321 03:42:00.544626 1 main.go:16] Version: cluster-autoscaler-operator v4.0.22-201903161424-dirty
W0321 03:42:00.653609 1 machineautoscaler_controller.go:118] Removing support for unregistered target type: cluster.k8s.io/v1alpha1, Kind=MachineDeployment
W0321 03:42:00.654425 1 machineautoscaler_controller.go:118] Removing support for unregistered target type: cluster.k8s.io/v1alpha1, Kind=MachineSet
I0321 03:42:00.654809 1 main.go:30] Starting cluster-autoscaler-operator
I0321 03:42:00.654961 1 leaderelection.go:205] attempting to acquire leader lease openshift-machine-api/cluster-autoscaler-operator-leader...
I0321 03:42:00.668584 1 status.go:136] Setting operator to available
I0321 03:42:00.668690 1 status.go:97] Setting operator version to: 4.0.0-0.nightly-2019-03-19-004004
I0321 03:42:00.679447 1 status.go:109] operator status not current; Updating operator
I0321 03:42:00.686451 1 leaderelection.go:214] successfully acquired lease openshift-machine-api/cluster-autoscaler-operator-leader
...
I0321 05:35:52.138271 1 machineautoscaler_controller.go:153] Reconciling MachineAutoscaler openshift-machine-api/autoscale-a
I0321 05:35:52.163000 1 machineautoscaler_controller.go:153] Reconciling MachineAutoscaler openshift-machine-api/autoscale-a
I0321 05:35:52.167299 1 machineautoscaler_controller.go:153] Reconciling MachineAutoscaler openshift-machine-api/autoscale-a
I0321 05:36:08.554285 1 machineautoscaler_controller.go:153] Reconciling MachineAutoscaler openshift-machine-api/autoscale-b
I0321 05:36:08.569716 1 machineautoscaler_controller.go:153] Reconciling MachineAutoscaler openshift-machine-api/autoscale-b
I0321 05:36:42.031018 1 clusterautoscaler_controller.go:122] Reconciling ClusterAutoscaler default
I0321 05:36:42.031056 1 clusterautoscaler_controller.go:216] Creating cluster-autoscaler deployment openshift-machine-api/default
I0321 05:36:42.052843 1 clusterautoscaler_controller.go:122] Reconciling ClusterAutoscaler default
I0321 05:36:42.075085 1 clusterautoscaler_controller.go:122] Reconciling ClusterAutoscaler default
I0321 05:36:42.086717 1 clusterautoscaler_controller.go:122] Reconciling ClusterAutoscaler default
I0321 05:36:42.106773 1 clusterautoscaler_controller.go:122] Reconciling ClusterAutoscaler default
I0321 05:37:06.969623 1 clusterautoscaler_controller.go:122] Reconciling ClusterAutoscaler default
E0321 05:43:15.878000 1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=13369, ErrCode=NO_ERROR, debug=""
E0321 05:43:15.878776 1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=13369, ErrCode=NO_ERROR, debug=""
E0321 05:43:15.879257 1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=13369, ErrCode=NO_ERROR, debug=""
E0321 05:43:15.879359 1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=13369, ErrCode=NO_ERROR, debug=""
E0321 05:43:15.879261 1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=13369, ErrCode=NO_ERROR, debug=""
W0321 05:43:16.163743 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *unstructured.Unstructured ended with: unexpected object: &{map[message:too old resource version: 15769 (77787) reason:Gone code:410 kind:Status apiVersion:v1 metadata:map[] status:Failure]}
W0321 05:43:16.364825 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *unstructured.Unstructured ended with: unexpected object: &{map[status:Failure message:too old resource version: 77224 (77787) reason:Gone code:410 kind:Status apiVersion:v1 metadata:map[]]}
W0321 05:43:16.498326 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1alpha1.ClusterAutoscaler ended with: too old resource version: 77534 (81743)
I0321 05:43:17.369594 1 machineautoscaler_controller.go:153] Reconciling MachineAutoscaler openshift-machine-api/autoscale-a
I0321 05:43:17.374819 1 machineautoscaler_controller.go:153] Reconciling MachineAutoscaler openshift-machine-api/autoscale-b
I0321 05:43:17.505730 1 clusterautoscaler_controller.go:122] Reconciling ClusterAutoscaler default
error: unexpected EOF
After upgrade log:
$ oc logs -f cluster-autoscaler-operator-5d866c497-jhzrx
I0321 06:05:56.171001 1 main.go:14] Go Version: go1.10.8
I0321 06:05:56.171406 1 main.go:15] Go OS/Arch: linux/amd64
I0321 06:05:56.171421 1 main.go:16] Version: cluster-autoscaler-operator v4.0.22-201903161424-dirty
W0321 06:05:56.422828 1 machineautoscaler_controller.go:118] Removing support for unregistered target type: cluster.k8s.io/v1alpha1, Kind=MachineDeployment
W0321 06:05:56.423339 1 machineautoscaler_controller.go:118] Removing support for unregistered target type: cluster.k8s.io/v1alpha1, Kind=MachineSet
I0321 06:05:56.423705 1 main.go:30] Starting cluster-autoscaler-operator
I0321 06:05:56.423830 1 leaderelection.go:205] attempting to acquire leader lease openshift-machine-api/cluster-autoscaler-operator-leader...
I0321 06:05:56.440400 1 status.go:136] Setting operator to available
I0321 06:05:56.440432 1 status.go:97] Setting operator version to: 4.0.0-0.nightly-2019-03-20-153904
I0321 06:05:56.445727 1 status.go:109] operator status not current; Updating operator
I0321 06:06:42.484309 1 leaderelection.go:214] successfully acquired lease openshift-machine-api/cluster-autoscaler-operator-leader
I0321 06:06:42.685102 1 clusterautoscaler_controller.go:122] Reconciling ClusterAutoscaler default
I0321 06:06:42.687792 1 clusterautoscaler_controller.go:122] Reconciling ClusterAutoscaler default
I0321 06:06:42.685102 1 machineautoscaler_controller.go:153] Reconciling MachineAutoscaler openshift-machine-api/autoscale-a
I0321 06:06:42.695697 1 machineautoscaler_controller.go:153] Reconciling MachineAutoscaler openshift-machine-api/autoscale-b
I0321 06:06:42.712879 1 clusterautoscaler_controller.go:122] Reconciling ClusterAutoscaler default
I0321 06:06:42.739058 1 clusterautoscaler_controller.go:122] Reconciling ClusterAutoscaler default
I0321 06:06:42.750482 1 clusterautoscaler_controller.go:122] Reconciling ClusterAutoscaler default
I0321 06:06:42.792135 1 clusterautoscaler_controller.go:122] Reconciling ClusterAutoscaler default
I0321 06:07:08.852923 1 clusterautoscaler_controller.go:122] Reconciling ClusterAutoscaler default
I0321 06:07:08.860783 1 clusterautoscaler_controller.go:122] Reconciling ClusterAutoscaler default
It doesn't appear a clusterautoscaler deployment was ever created. A ClusterAutoscaler CR is not created by default; it has to be created by the user (or some other automation), which triggers the cluster-autoscaler deployment (see the sketch at the end of this report). Without a deployment present, there is nothing for us to upgrade, so we don't report Progressing.

(In reply to Michael Gugino from comment #8) Disregard this ^^. It looks like your install/upgrade is using an old version of cluster-autoscaler-operator that does not have the latest code for enabling status=Progressing. I think this should be fixed by https://github.com/openshift/cluster-autoscaler-operator/pull/79

Verified. During the upgrade from 4.0.0-0.9 to 4.0.0-0.10 we can see:

$ oc get clusteroperator
NAME                 VERSION      AVAILABLE   PROGRESSING   FAILING   SINCE
authentication       4.0.0-0.10   True        False         False     3s
cloud-credential     4.0.0-0.10   True        False         False     3h56m
cluster-autoscaler   4.0.0-0.10   True        True          False     3h57m

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758
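As a rough illustration of the point above about the ClusterAutoscaler CR, creating it programmatically with the dynamic client might look like the sketch below. This is not taken from the operator or its documentation: the autoscaling.openshift.io/v1 group/version and the empty spec are assumptions that may differ between releases, and in practice most users would simply `oc apply` an equivalent YAML manifest.

```go
// Sketch: create the cluster-scoped ClusterAutoscaler "default" resource,
// which causes the operator to create the cluster-autoscaler deployment.
package main

import (
	"context"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load credentials from the default kubeconfig location (~/.kube/config).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// Assumed group/version/resource for the ClusterAutoscaler CRD;
	// verify against your cluster (e.g. `oc api-resources | grep autoscaling`).
	gvr := schema.GroupVersionResource{
		Group:    "autoscaling.openshift.io",
		Version:  "v1",
		Resource: "clusterautoscalers",
	}

	ca := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "autoscaling.openshift.io/v1",
		"kind":       "ClusterAutoscaler",
		"metadata":   map[string]interface{}{"name": "default"},
		"spec":       map[string]interface{}{}, // defaults only, for illustration
	}}

	// ClusterAutoscaler is cluster-scoped, so no namespace is set.
	if _, err := client.Resource(gvr).Create(context.TODO(), ca, metav1.CreateOptions{}); err != nil {
		log.Fatal(err)
	}
	log.Println("created ClusterAutoscaler/default")
}
```

Once this resource exists, the operator creates the cluster-autoscaler deployment, and that operand's version bump during an upgrade is what drives Progressing=True on the cluster-autoscaler clusteroperator.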