Description of problem:
Never see Progressing=True for the clusteroperator cluster-autoscaler during an upgrade.

Version-Release number of selected component (if applicable):
clusterversion: 4.0.0-0.nightly-2019-03-13-233958

How reproducible:
Always

Steps to Reproduce:
1. Install a 4.0 cluster with version 4.0.0-0.nightly-2019-03-13-233958.
2. Upgrade the cluster to 4.0.0-0.nightly-2019-03-14-135819.
3. Watch the clusteroperator cluster-autoscaler status during the upgrade.

Actual results:
Progressing=True is never reported during the upgrade.

Every 2.0s: oc get clusteroperator

NAME                 VERSION                             AVAILABLE   PROGRESSING   FAILING   SINCE
cluster-autoscaler   4.0.0-0.nightly-2019-03-14-135819   True        False         False     3m28s

Expected results:
Progressing=True is reported while the operator upgrades.

Additional info:
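The upgrade was triggered and the ClusterOperator was watched roughly as follows (a sketch of the commands used; the 2-second interval matches the watch output above):

# Trigger the upgrade to the target nightly
$ oc adm upgrade --to 4.0.0-0.nightly-2019-03-14-135819

# Poll the ClusterOperator conditions every 2 seconds during the upgrade
$ watch -n 2 oc get clusteroperator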
Is the cluster autoscaler deployed in this environment? If not, you're never going to see it go progressing because there is no operand to wait on.
I deployed the autoscaler, then upgraded to 4.0.0-0.nightly-2019-03-14-135819. Maybe it happens so fast that I couldn't see it.

$ oc get clusterautoscaler
NAME      AGE
default   117s

$ oc get machineautoscaler
NAME          REF KIND     REF NAME                              MIN   MAX   AGE
autoscale-a   MachineSet   zhsun1-rxgd4-worker-ap-northeast-1a   1     3     18s
autoscale-c   MachineSet   zhsun1-rxgd4-worker-ap-northeast-1c   1     3     29s
autoscale-d   MachineSet   zhsun1-rxgd4-worker-ap-northeast-1d   1     3     49s

$ oc get pod
NAME                                              READY   STATUS    RESTARTS   AGE
cluster-autoscaler-default-774f5b4c7-plwdb        1/1     Running   0          2m6s
cluster-autoscaler-operator-df46df49b-slgmv       1/1     Running   1          23m
clusterapi-manager-controllers-7fb5fcdb87-2b2bs   4/4     Running   0          22m
machine-api-operator-6997c457b8-pw2sn             1/1     Running   0          22m

$ oc adm upgrade --to 4.0.0-0.nightly-2019-03-14-135819
Updating to 4.0.0-0.nightly-2019-03-14-135819
sunzhaohua, based on your comment (https://bugzilla.redhat.com/show_bug.cgi?id=1689146#c3), are you saying you are no longer able to reproduce the issue?
Jan, no, I can still reproduce it every time. I mean I still can't see "Progressing=True" during the upgrade even after deploying the autoscaler.
Do you have logs from the cluster-autoscaler-operator pod? We have a log statement, ```glog.Infof("Syncing to version %v", r.releaseVersion)```, which is emitted when we set status=Progressing.
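For reference, grepping the operator logs for that statement should show whether it ever fired, something along these lines (assuming the operator deployment lives in openshift-machine-api, per the leader lease name):

# Look for the version-sync message that accompanies the Progressing=True transition
$ oc -n openshift-machine-api logs deployment/cluster-autoscaler-operator | grep "Syncing to version"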
Before upgrade log:

$ oc logs -f cluster-autoscaler-operator-5c548c64b5-mfrvb
I0321 03:42:00.543450 1 main.go:14] Go Version: go1.10.8
I0321 03:42:00.544605 1 main.go:15] Go OS/Arch: linux/amd64
I0321 03:42:00.544626 1 main.go:16] Version: cluster-autoscaler-operator v4.0.22-201903161424-dirty
W0321 03:42:00.653609 1 machineautoscaler_controller.go:118] Removing support for unregistered target type: cluster.k8s.io/v1alpha1, Kind=MachineDeployment
W0321 03:42:00.654425 1 machineautoscaler_controller.go:118] Removing support for unregistered target type: cluster.k8s.io/v1alpha1, Kind=MachineSet
I0321 03:42:00.654809 1 main.go:30] Starting cluster-autoscaler-operator
I0321 03:42:00.654961 1 leaderelection.go:205] attempting to acquire leader lease openshift-machine-api/cluster-autoscaler-operator-leader...
I0321 03:42:00.668584 1 status.go:136] Setting operator to available
I0321 03:42:00.668690 1 status.go:97] Setting operator version to: 4.0.0-0.nightly-2019-03-19-004004
I0321 03:42:00.679447 1 status.go:109] operator status not current; Updating operator
I0321 03:42:00.686451 1 leaderelection.go:214] successfully acquired lease openshift-machine-api/cluster-autoscaler-operator-leader
...
I0321 05:35:52.138271 1 machineautoscaler_controller.go:153] Reconciling MachineAutoscaler openshift-machine-api/autoscale-a
I0321 05:35:52.163000 1 machineautoscaler_controller.go:153] Reconciling MachineAutoscaler openshift-machine-api/autoscale-a
I0321 05:35:52.167299 1 machineautoscaler_controller.go:153] Reconciling MachineAutoscaler openshift-machine-api/autoscale-a
I0321 05:36:08.554285 1 machineautoscaler_controller.go:153] Reconciling MachineAutoscaler openshift-machine-api/autoscale-b
I0321 05:36:08.569716 1 machineautoscaler_controller.go:153] Reconciling MachineAutoscaler openshift-machine-api/autoscale-b
I0321 05:36:42.031018 1 clusterautoscaler_controller.go:122] Reconciling ClusterAutoscaler default
I0321 05:36:42.031056 1 clusterautoscaler_controller.go:216] Creating cluster-autoscaler deployment openshift-machine-api/default
I0321 05:36:42.052843 1 clusterautoscaler_controller.go:122] Reconciling ClusterAutoscaler default
I0321 05:36:42.075085 1 clusterautoscaler_controller.go:122] Reconciling ClusterAutoscaler default
I0321 05:36:42.086717 1 clusterautoscaler_controller.go:122] Reconciling ClusterAutoscaler default
I0321 05:36:42.106773 1 clusterautoscaler_controller.go:122] Reconciling ClusterAutoscaler default
I0321 05:37:06.969623 1 clusterautoscaler_controller.go:122] Reconciling ClusterAutoscaler default
E0321 05:43:15.878000 1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=13369, ErrCode=NO_ERROR, debug=""
E0321 05:43:15.878776 1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=13369, ErrCode=NO_ERROR, debug=""
E0321 05:43:15.879257 1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=13369, ErrCode=NO_ERROR, debug=""
E0321 05:43:15.879359 1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=13369, ErrCode=NO_ERROR, debug=""
E0321 05:43:15.879261 1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=13369, ErrCode=NO_ERROR, debug=""
W0321 05:43:16.163743 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *unstructured.Unstructured ended with: unexpected object: &{map[message:too old resource version: 15769 (77787) reason:Gone code:410 kind:Status apiVersion:v1 metadata:map[] status:Failure]}
W0321 05:43:16.364825 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *unstructured.Unstructured ended with: unexpected object: &{map[status:Failure message:too old resource version: 77224 (77787) reason:Gone code:410 kind:Status apiVersion:v1 metadata:map[]]}
W0321 05:43:16.498326 1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1alpha1.ClusterAutoscaler ended with: too old resource version: 77534 (81743)
I0321 05:43:17.369594 1 machineautoscaler_controller.go:153] Reconciling MachineAutoscaler openshift-machine-api/autoscale-a
I0321 05:43:17.374819 1 machineautoscaler_controller.go:153] Reconciling MachineAutoscaler openshift-machine-api/autoscale-b
I0321 05:43:17.505730 1 clusterautoscaler_controller.go:122] Reconciling ClusterAutoscaler default
error: unexpected EOF

After upgrade log:

$ oc logs -f cluster-autoscaler-operator-5d866c497-jhzrx
I0321 06:05:56.171001 1 main.go:14] Go Version: go1.10.8
I0321 06:05:56.171406 1 main.go:15] Go OS/Arch: linux/amd64
I0321 06:05:56.171421 1 main.go:16] Version: cluster-autoscaler-operator v4.0.22-201903161424-dirty
W0321 06:05:56.422828 1 machineautoscaler_controller.go:118] Removing support for unregistered target type: cluster.k8s.io/v1alpha1, Kind=MachineDeployment
W0321 06:05:56.423339 1 machineautoscaler_controller.go:118] Removing support for unregistered target type: cluster.k8s.io/v1alpha1, Kind=MachineSet
I0321 06:05:56.423705 1 main.go:30] Starting cluster-autoscaler-operator
I0321 06:05:56.423830 1 leaderelection.go:205] attempting to acquire leader lease openshift-machine-api/cluster-autoscaler-operator-leader...
I0321 06:05:56.440400 1 status.go:136] Setting operator to available
I0321 06:05:56.440432 1 status.go:97] Setting operator version to: 4.0.0-0.nightly-2019-03-20-153904
I0321 06:05:56.445727 1 status.go:109] operator status not current; Updating operator
I0321 06:06:42.484309 1 leaderelection.go:214] successfully acquired lease openshift-machine-api/cluster-autoscaler-operator-leader
I0321 06:06:42.685102 1 clusterautoscaler_controller.go:122] Reconciling ClusterAutoscaler default
I0321 06:06:42.687792 1 clusterautoscaler_controller.go:122] Reconciling ClusterAutoscaler default
I0321 06:06:42.685102 1 machineautoscaler_controller.go:153] Reconciling MachineAutoscaler openshift-machine-api/autoscale-a
I0321 06:06:42.695697 1 machineautoscaler_controller.go:153] Reconciling MachineAutoscaler openshift-machine-api/autoscale-b
I0321 06:06:42.712879 1 clusterautoscaler_controller.go:122] Reconciling ClusterAutoscaler default
I0321 06:06:42.739058 1 clusterautoscaler_controller.go:122] Reconciling ClusterAutoscaler default
I0321 06:06:42.750482 1 clusterautoscaler_controller.go:122] Reconciling ClusterAutoscaler default
I0321 06:06:42.792135 1 clusterautoscaler_controller.go:122] Reconciling ClusterAutoscaler default
I0321 06:07:08.852923 1 clusterautoscaler_controller.go:122] Reconciling ClusterAutoscaler default
I0321 06:07:08.860783 1 clusterautoscaler_controller.go:122] Reconciling ClusterAutoscaler default
It doesn't appear a clusterautoscaler deployment was ever created. A ClusterAutoscaler CR is not created by default, has to be done by the user (or some other automation), this will trigger a clusterautoscaler deployment. Without a deployment present, there is nothing for us to upgrade, so we don't report progressing.
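For illustration, creating that CR amounts to roughly the following (a sketch: the autoscaling.openshift.io/v1 API group is an assumption here and may differ on older builds; the object is named "default", matching the resource shown in comment 3):

# Create a minimal ClusterAutoscaler; the operator reconciles it into an operand deployment
$ oc apply -f - <<EOF
apiVersion: autoscaling.openshift.io/v1
kind: ClusterAutoscaler
metadata:
  name: default
EOF

# The operand deployment should then exist
$ oc -n openshift-machine-api get deployment cluster-autoscaler-default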
(In reply to Michael Gugino from comment #8)
> It doesn't appear a clusterautoscaler deployment was ever created. A
> ClusterAutoscaler CR is not created by default, has to be done by the user
> (or some other automation), this will trigger a clusterautoscaler
> deployment. Without a deployment present, there is nothing for us to
> upgrade, so we don't report progressing.

Disregard this ^^. Looks like your install/upgrade is using an old version of cluster-autoscaler-operator that does not have the latest code for enabling status=Progressing.
I think this should be fixed by: https://github.com/openshift/cluster-autoscaler-operator/pull/79
Verified. During an upgrade from 4.0.0-0.9 to 4.0.0-0.10 we can see:

$ oc get clusteroperator
NAME                 VERSION      AVAILABLE   PROGRESSING   FAILING   SINCE
authentication       4.0.0-0.10   True        False         False     3s
cloud-credential     4.0.0-0.10   True        False         False     3h56m
cluster-autoscaler   4.0.0-0.10   True        True          False     3h57m
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758