+++ This bug was initially created as a clone of Bug #1873900 +++

With 4.6.0-0.nightly-2020-08-25-204643:

$ grep 'Loading payload\|Payload loaded\|Running sync.*in state\|Desired version from spec\|Result of work' /tmp/xx/namespaces/openshift-cluster-version/pods/cluster-version-operator-54589d4748-fj56v/cluster-version-operator/cluster-version-operator/logs/current.log
2020-08-28T03:36:00.384964545Z I0828 03:36:00.356808 1 sync_worker.go:485] Running sync 4.6.0-0.nightly-2020-08-25-204643 (force=false) on generation 1 in state Reconciling at attempt 0
2020-08-28T03:36:00.384964545Z I0828 03:36:00.356823 1 sync_worker.go:493] Loading payload
2020-08-28T03:36:01.854233287Z I0828 03:36:01.854208 1 sync_worker.go:593] Payload loaded from registry.svc.ci.openshift.org/ocp/release@sha256:56945dc7218d758e25ffe990374668890e8c77d72132c98e0cc8f6272c063cc7 with hash VdMtCylIGgw=
2020-08-28T03:39:05.050142724Z I0828 03:39:05.050129 1 sync_worker.go:485] Running sync 4.6.0-0.nightly-2020-08-25-204643 (force=false) on generation 1 in state Reconciling at attempt 0
2020-08-28T03:39:05.050166757Z I0828 03:39:05.050159 1 sync_worker.go:493] Loading payload
2020-08-28T03:39:05.770317300Z I0828 03:39:05.770275 1 sync_worker.go:593] Payload loaded from registry.svc.ci.openshift.org/ocp/release@sha256:56945dc7218d758e25ffe990374668890e8c77d72132c98e0cc8f6272c063cc7 with hash VdMtCylIGgw=
2020-08-28T03:42:05.761345833Z I0828 03:42:05.761295 1 task_graph.go:546] Result of work: []
2020-08-28T03:44:56.717918691Z I0828 03:44:56.717873 1 sync_worker.go:485] Running sync 4.6.0-0.nightly-2020-08-25-204643 (force=false) on generation 1 in state Reconciling at attempt 1
2020-08-28T03:45:40.127607986Z I0828 03:45:40.127597 1 task_graph.go:546] Result of work: []
...
2020-08-28T05:46:20.965430733Z I0828 05:46:20.965389 1 sync_worker.go:485] Running sync 4.6.0-0.nightly-2020-08-25-204643 (force=false) on generation 1 in state Reconciling at attempt 35
2020-08-28T05:47:00.525075212Z I0828 05:47:00.525051 1 task_graph.go:546] Result of work: []
2020-08-28T05:47:32.154197038Z I0828 05:47:32.154182 1 sync_worker.go:485] Running sync 4.6.0-0.nightly-2020-08-25-204643 (force=false) on generation 2 in state Updating at attempt 0
2020-08-28T05:47:32.285316684Z I0828 05:47:32.285292 1 task_graph.go:546] Result of work: []
2020-08-28T05:48:17.099951081Z I0828 05:48:17.096699 1 task_graph.go:546] Result of work: []
2020-08-28T05:49:38.130488133Z I0828 05:49:38.130460 1 sync_worker.go:485] Running sync 4.6.0-0.nightly-2020-08-25-204643 (force=false) on generation 3 in state Updating at attempt 0
2020-08-28T05:49:38.246716527Z I0828 05:49:38.246705 1 task_graph.go:546] Result of work: []
2020-08-28T05:50:02.959409172Z I0828 05:50:02.959384 1 task_graph.go:546] Result of work: [Could not update deployment "openshift-cloud-credential-operator/cloud-credential-operator" (193 of 598) Could not update role "openshift-machine-api/prometheus-k8s-cluster-autoscaler-operator" (222 of 598) Could not update clusterrole "system:openshift:aggregate-snapshots-to-view" (243 of 598) Could not update configmap "openshift-image-registry/trusted-ca" (258 of 598) Could not update service "openshift-ingress-operator/metrics" (277 of 598) Could not update rolebinding "openshift/cluster-samples-operator-openshift-edit" (322 of 598) Could not update deployment "openshift-monitoring/cluster-monitoring-operator" (359 of 598) Could not update clusterrolebinding "console-extensions-reader" (375 of 598) Could not update rolebinding "openshift-config/insights-operator" (407 of 598) Could not update deployment "openshift-operator-lifecycle-manager/olm-operator" (433 of 598) Could not update servicemonitor "openshift-marketplace/marketplace-operator" (454 of 598) Could not update service "openshift-service-ca-operator/metrics" (460 of 598)]
2020-08-28T05:50:02.960435734Z I0828 05:50:02.960413 1 sync_worker.go:485] Running sync 4.6.0-0.nightly-2020-08-25-204643 (force=false) on generation 4 in state Updating at attempt 0
2020-08-28T05:50:04.939552794Z I0828 05:50:04.939539 1 task_graph.go:546] Result of work: []
2020-08-28T05:50:46.585629809Z I0828 05:50:46.585626 1 task_graph.go:546] Result of work: []
2020-08-28T05:51:33.800694010Z I0828 05:51:33.800672 1 sync_worker.go:485] Running sync 4.6.0-0.nightly-2020-08-25-204643 (force=false) on generation 5 in state Updating at attempt 0
2020-08-28T05:51:33.931867692Z I0828 05:51:33.931861 1 task_graph.go:546] Result of work: []
2020-08-28T05:56:53.162544970Z I0828 05:56:53.162531 1 task_graph.go:546] Result of work: [Cluster operator csi-snapshot-controller has not yet reported success]
2020-08-28T05:56:53.163581403Z I0828 05:56:53.163565 1 sync_worker.go:485] Running sync 4.6.0-0.nightly-2020-08-25-204643 (force=false) on generation 6 in state Updating at attempt 0
2020-08-28T05:56:53.285166167Z I0828 05:56:53.285162 1 task_graph.go:546] Result of work: []
2020-08-28T05:57:54.624727704Z I0828 05:57:54.624715 1 task_graph.go:546] Result of work: []
2020-08-28T05:58:25.498627106Z I0828 05:58:25.498596 1 sync_worker.go:485] Running sync 4.6.0-0.nightly-2020-08-25-204643 (force=false) on generation 7 in state Updating at attempt 0
2020-08-28T05:58:25.616645308Z I0828 05:58:25.616634 1 task_graph.go:546] Result of work: []
2020-08-28T05:59:10.409241880Z I0828 05:59:10.409232 1 task_graph.go:546] Result of work: []
2020-08-28T06:02:01.365927807Z I0828 06:02:01.365884 1 sync_worker.go:485] Running sync 4.6.0-0.nightly-2020-08-25-204643 (force=false) on generation 7 in state Reconciling at attempt 0
2020-08-28T06:02:44.773794796Z I0828 06:02:44.773786 1 task_graph.go:546] Result of work: []
2020-08-28T06:05:35.730466807Z I0828 06:05:35.730422 1 sync_worker.go:485] Running sync 4.6.0-0.nightly-2020-08-25-204643 (force=false) on generation 7 in state Reconciling at attempt 0
2020-08-28T06:06:19.137044203Z I0828 06:06:19.137032 1 task_graph.go:546] Result of work: []
2020-08-28T06:09:10.093712198Z I0828 06:09:10.093665 1 sync_worker.go:485] Running sync 4.6.0-0.nightly-2020-08-25-204643 (force=false) on generation 7 in state Reconciling at attempt 0
2020-08-28T10:30:29.184128618Z I0828 10:30:29.184105 1 cvo.go:435] Desired version from spec is v1.Update{Version:"", Image:"quay.io/openshift-release-dev/ocp-release-nightly@sha256:567be653c695fce831f848537f64ba7e06c86d47af8e8dd15718e8586a1f087d", Force:false}
2020-08-28T10:31:39.714275754Z I0828 10:31:39.714250 1 cvo.go:435] Desired version from spec is v1.Update{Version:"", Image:"quay.io/openshift-release-dev/ocp-release-nightly@sha256:567be653c695fce831f848537f64ba7e06c86d47af8e8dd15718e8586a1f087d", Force:false}
...
2020-08-28T13:44:13.345879177Z I0828 13:44:13.345865 1 cvo.go:435] Desired version from spec is v1.Update{Version:"", Image:"quay.io/openshift-release-dev/ocp-release-nightly@sha256:567be653c695fce831f848537f64ba7e06c86d47af8e8dd15718e8586a1f087d", Force:false}
2020-08-28T13:47:18.337033333Z I0828 13:47:18.336993 1 cvo.go:435] Desired version from spec is v1.Update{Version:"", Image:"registry.svc.ci.openshift.org/ocp/release:4.6.0-0.nightly-2020-08-27-005538", Force:true}
...
2020-08-28T14:03:28.722402381Z I0828 14:03:28.722348 1 cvo.go:435] Desired version from spec is v1.Update{Version:"", Image:"registry.svc.ci.openshift.org/ocp/release:4.6.0-0.nightly-2020-08-27-005538", Force:true}
2020-08-28T14:05:18.370948271Z I0828 14:05:18.370906 1 cvo.go:435] Desired version from spec is v1.Update{Version:"", Image:"quay.io/openshift-release-dev/ocp-release-nightly@sha256:567be653c695fce831f848537f64ba7e06c86d47af8e8dd15718e8586a1f087d", Force:false}
...
2020-08-28T23:29:02.546827101Z I0828 23:29:02.546802 1 cvo.go:435] Desired version from spec is v1.Update{Version:"", Image:"quay.io/openshift-release-dev/ocp-release-nightly@sha256:567be653c695fce831f848537f64ba7e06c86d47af8e8dd15718e8586a1f087d", Force:false}

Once the desired version is changed, and it changes a few times, there is no sign from the sync worker that it is still alive. Not clear where it is getting stuck, or if it is not stuck and just iterating in some low-log pathway.

--- Additional comment from W. Trevor King on 2020-08-30 22:10:22 UTC ---

http://virt-openshift-05.lab.eng.nay.redhat.com/wduan/logs/must-gather.local.8306168597559322770_0829.tar.gz

--- Additional comment from W. Trevor King on 2020-08-31 21:46:17 UTC ---

Moving back to NEW. The landed logging is not enough to actually fix anything, it will just help us understand the next time we catch this in a must-gather.

--- Additional comment from W. Trevor King on 2020-09-11 05:11:11 UTC ---

Landing more logging. Still not clear enough to fix the issue.

--- Additional comment from W. Trevor King on 2020-09-14 20:04:18 UTC ---

Recent 4.6.0-0.ci-2020-09-14-081525 -> 4.6.0-0.ci-2020-09-14-103240 job [1] had still not acknowledged the update by gather-time:

$ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade-4.6/1305495056425160704/artifacts/e2e-gcp-upgrade/clusterversion.json | jq -r '.items[].status.history[] | .startedTime + " " + .completionTime + " " + .version + " " + .state + " " + (.verified | tostring)'
2020-09-14T13:21:41Z 2020-09-14T13:56:47Z 4.6.0-0.ci-2020-09-14-081525 Completed false

Running CVO is from before install-complete:

$ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade-4.6/1305495056425160704/artifacts/e2e-gcp-upgrade/pods/openshift-cluster-version_cluster-version-operator-55d9c8686d-smksh_cluster-version-operator.log | head -n1
I0914 13:32:27.214436 1 start.go:21] ClusterVersionOperator v1.0.0-372-g6d56c655-dirty

From those same logs, here's the update to 4.6.0-0.ci-2020-09-14-103240 making it through:

I0914 13:59:51.815658 1 cvo.go:435] Desired version from spec is v1.Update{Version:"", Image:"registry.svc.ci.openshift.org/ocp/release:4.6.0-0.ci-2020-09-14-103240", Force:true}
I0914 13:59:51.815686 1 sync_worker.go:243] Cancel the sync worker's current loop
I0914 13:59:51.815700 1 sync_worker.go:249] Notify the sync worker that new work is available
I0914 13:59:51.815716 1 status.go:159] Synchronizing errs=field.ErrorList{} status=&cvo.SyncWorkerStatus{Generation:1, Step:"ApplyResources", Failure:error(nil), Fraction:0.78571427, Completed:0, Reconciling:true, Initial:false, VersionHash:"EeHTCcLjL-Y=", LastProgress:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}, Actual:v1.Release{Version:"4.6.0-0.ci-2020-09-14-081525", Image:"registry.svc.ci.openshift.org/ocp/release@sha256:139d14b24df55f821d7027db5804d7261d8eec024a6e2a55d972f344869b0ce8", URL:"", Channels:[]string(nil)}, Verified:false}
I0914 13:59:51.815781 1 status.go:79] merge into existing history completed=false desired=v1.Release{Version:"4.6.0-0.ci-2020-09-14-081525", Image:"registry.svc.ci.openshift.org/ocp/release@sha256:139d14b24df55f821d7027db5804d7261d8eec024a6e2a55d972f344869b0ce8", URL:"", Channels:[]string(nil)} last=&v1.UpdateHistory{State:"Completed", StartedTime:v1.Time{Time:time.Time{wall:0x0, ext:63735686501, loc:(*time.Location)(0x28cfd60)}}, CompletionTime:(*v1.Time)(0xc000855ca0), Version:"4.6.0-0.ci-2020-09-14-081525", Image:"registry.svc.ci.openshift.org/ocp/release@sha256:139d14b24df55f821d7027db5804d7261d8eec024a6e2a55d972f344869b0ce8", Verified:false}
E0914 13:59:51.815800 1 task.go:81] error running apply for configmap "openshift-console-operator/console-operator-config" (390 of 602): context canceled
I0914 13:59:51.815874 1 task_graph.go:483] Canceled worker 1 while waiting for work
...
E0914 13:59:51.815971 1 task.go:81] error running apply for service "openshift-machine-config-operator/machine-config-daemon" (494 of 602): context canceled
I0914 13:59:51.817013 1 task_graph.go:483] Canceled worker 0 while waiting for work
...
I0914 14:00:02.786276 1 cvo.go:435] Desired version from spec is v1.Update{Version:"", Image:"registry.svc.ci.openshift.org/ocp/release:4.6.0-0.ci-2020-09-14-103240", Force:true}
I0914 14:00:02.786311 1 sync_worker.go:222] Update work is equal to current target; no change required

and all subsequent sync-loop goroutine work hangs. Maybe in RunGraph? Maybe on consistentReporter.lock?
I think I need to catch this in a live cluster so I can SIGQUIT [2] and get a stack dump.

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade-4.6/1305495056425160704
[2]: https://golang.org/pkg/os/signal/#hdr-Default_behavior_of_signals_in_Go_programs

--- Additional comment from W. Trevor King on 2020-09-14 23:49:27 UTC ---

I think I reproduced this by installing 4.6.0-fc.5 and updating back and forth between fc.4 and fc.5 until the sync worker hung. Then in one terminal I streamed down the logs:

$ oc -n openshift-cluster-version logs -f cluster-version-operator-659794d67-x6l84

while in another terminal I exec'ed in and SIGQUIT the CVO:

$ oc -n openshift-cluster-version exec -it cluster-version-operator-659794d67-x6l84 -- bash
[root@ci-ln-gjxvdvb-f76d1-jrtws-master-1 /]# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.9 0.5 964340 89732 ? Ssl 23:03 0:10 /usr/bin/cluster-version-operator start --release-image=registry.svc.ci.openshift.org/ocp/release@sha256:5883d0db15939484bd477147e6949c53fbc6f55
root 16 0.2 0.0 11840 2952 pts/0 Ss 23:22 0:00 bash
root 30 0.0 0.0 51768 3424 pts/0 R+ 23:22 0:00 ps aux
[root@ci-ln-gjxvdvb-f76d1-jrtws-master-1 /]# kill -s QUIT 1
[root@ci-ln-gjxvdvb-f76d1-jrtws-master-1 /]# command terminated with exit code 137

to get a traceback. The sync worker goroutine looked like:

goroutine 113 [chan receive, 5 minutes]:
github.com/openshift/cluster-version-operator/pkg/payload.RunGraph(0x1c31fc0, 0xc000c7e960, 0xc001c853c0, 0x2, 0xc0019c2360, 0x0, 0x0, 0x0)
    /go/src/github.com/openshift/cluster-version-operator/pkg/payload/task_graph.go:510 +0x450
github.com/openshift/cluster-version-operator/pkg/cvo.(*SyncWorker).apply(0xc00018a300, 0x1c31fc0, 0xc000c7e960, 0xc0007a8000, 0xc001c399e0, 0x2, 0x1beaa00, 0xc0009d8050, 0x18ee2a0, 0xc001b38600)
    /go/src/github.com/openshift/cluster-version-operator/pkg/cvo/sync_worker.go:685 +0x558
github.com/openshift/cluster-version-operator/pkg/cvo.(*SyncWorker).syncOnce(0xc00018a300, 0x1c31fc0, 0xc000c7e960, 0xc001c399e0, 0x10, 0x1beaa00, 0xc0009d8050, 0xc000aa2480, 0xc00109f6e4, 0xa)
    /go/src/github.com/openshift/cluster-version-operator/pkg/cvo/sync_worker.go:596 +0x34f
github.com/openshift/cluster-version-operator/pkg/cvo.(*SyncWorker).Start.func1.1(0xc001c399e0, 0xc00018a300, 0x1c31f40, 0xc000b0e0c0, 0x1bf70c0, 0xc000115860, 0x1967d48, 0x7, 0x10, 0x0, ...)
    /go/src/github.com/openshift/cluster-version-operator/pkg/cvo/sync_worker.go:324 +0x264
github.com/openshift/cluster-version-operator/pkg/cvo.(*SyncWorker).Start.func1()
    /go/src/github.com/openshift/cluster-version-operator/pkg/cvo/sync_worker.go:325 +0x366
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc000f3bf40)
    /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x5f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc001c85f40, 0x1beb200, 0xc000b64120, 0xc000b0e001, 0xc000038960)
    /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xa3
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000f3bf40, 0x989680, 0x0, 0x1, 0xc000038960)
    /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x98
k8s.io/apimachinery/pkg/util/wait.Until(...)
    /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90
github.com/openshift/cluster-version-operator/pkg/cvo.(*SyncWorker).Start(0xc00018a300, 0x1c31f40, 0xc000b0e0c0, 0x10, 0x1967d48, 0x7, 0x1bf70c0, 0xc000115860)
    /go/src/github.com/openshift/cluster-version-operator/pkg/cvo/sync_worker.go:256 +0x14c
created by github.com/openshift/cluster-version-operator/pkg/cvo.(*Operator).Run
    /go/src/github.com/openshift/cluster-version-operator/pkg/cvo/cvo.go:310 +0x43b

And since the hung CVO was from fc.5, we can resolve those line numbers in [1]:

$ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.6.0-fc.5-x86_64 | grep cluster-version-operator
  cluster-version-operator https://github.com/openshift/cluster-version-operator d6475eff00192a56cd6a833aa7f73d6f4f4a2cef

So it was hung in the RunGraph call that attempts release image reconciliation [2] while it's waiting to read from resultCh [3]. I probably flubbed context expiration handling in the new-in-4.6 code [4] (a minimal sketch of this hang pattern follows a few comments below).

[1]: https://github.com/openshift/cluster-version-operator/tree/d6475eff00192a56cd6a833aa7f73d6f4f4a2cef
[2]: https://github.com/openshift/cluster-version-operator/blob/d6475eff00192a56cd6a833aa7f73d6f4f4a2cef/pkg/cvo/sync_worker.go#L685
[3]: https://github.com/openshift/cluster-version-operator/blob/d6475eff00192a56cd6a833aa7f73d6f4f4a2cef/pkg/payload/task_graph.go#L510
[4]: https://github.com/openshift/cluster-version-operator/pull/264

--- Additional comment from W. Trevor King on 2020-09-16 16:38:57 UTC ---

One workaround is to kill the cluster-version operator pod. The deployment controller will create a replacement, and that replacement will begin syncing manifests again without getting stuck in the racy update-triggers-a-sync-worker-cancel hole.

--- Additional comment from W. Trevor King on 2020-09-16 18:02:06 UTC ---

Took a stab at doc text, although I'm also fine with this introduced-in-4.6-and-fixed-before-GA bug not having doc text.

--- Additional comment from OpenShift Automated Release Tooling on 2020-09-18 07:06:11 UTC ---

Elliott changed bug status from MODIFIED to ON_QA.

--- Additional comment from Johnny Liu on 2020-09-21 08:20:05 UTC ---

> I think I reproduced this by installing 4.6.0-fc.5 and updating back and forth between fc.4 and fc.5 until the sync worker hung.

I tried this scenario for about 7 rounds (back to fc.4, then forth to fc.5 again); each upgrade completed successfully. @wking, do you have an idea how to reproduce this issue?

--- Additional comment from W. Trevor King on 2020-09-21 17:24:17 UTC ---

Per the description in [1], you want to have a big stack of parallel nodes, so that the workers will not be able to consume all the nodes in the queue before your target-release bump gets them canceled. We have a big stack of parallel nodes when we are in the reconciliation phase, but during that phase the CVO is mostly doing quick "yeah, the in-cluster state is fine for this manifest" checks. You might have more luck trying to catch it if you re-target mid-update while the CVO is early in the process of reconciling runlevel 50, after the kube-storage-version-migrator operator levels in runlevel 40. But we also landed a unit test, so if we can't actually reproduce the breakage after a reasonable effort, I'm fine with just putting the new code through the usual regression testing.

[1]: https://github.com/openshift/cluster-version-operator/pull/455/commits/632e763fa209c3e1bd07098c2e33599f8c663e3e
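[Editor's note] For illustration, here is a minimal, self-contained Go sketch of the hang pattern diagnosed above. This is not the CVO's actual RunGraph code; the names (runGraphSketch, resultCh, manifests) and timings are made up. The idea it demonstrates is the one in the comments: workers bail out on context cancellation without sending anything, while the collector keeps doing a bare receive on its result channel, so if the target-release bump cancels the workers before they drain the queue, the collector blocks forever.

package main

import (
	"context"
	"fmt"
	"time"
)

func runGraphSketch(ctx context.Context, manifests []string) {
	resultCh := make(chan error)

	for _, m := range manifests {
		m := m // capture loop variable (pre-Go-1.22 semantics)
		go func() {
			select {
			case <-ctx.Done():
				// Canceled worker exits without sending a result.
				return
			case <-time.After(50 * time.Millisecond):
				resultCh <- fmt.Errorf("applied %s", m)
			}
		}()
	}

	for range manifests {
		// BUG: a bare receive never observes cancellation. Once the canceled
		// workers above have exited, nothing will ever send here, and this
		// goroutine sits in "chan receive" forever -- the state the SIGQUIT
		// dump showed for the sync worker. A fix is to select on ctx.Done()
		// here as well, or to guarantee that every worker sends exactly one
		// result even when it is canceled.
		fmt.Println("result:", <-resultCh)
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	cancel() // stand-in for a new desired version canceling the sync worker's loop
	runGraphSketch(ctx, []string{"manifest-a", "manifest-b"})
}

Run as-is, Go's runtime deadlock detector aborts the program ("all goroutines are asleep"); in the real operator other goroutines keep the process alive, so the sync loop just goes quiet, which matches the logs in the description. On an affected cluster, the workaround from the comment above is simply to delete the stuck pod (oc -n openshift-cluster-version delete pod <cluster-version-operator-pod-name>) and let the Deployment controller recreate it.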
--- Additional comment from Johnny Liu on 2020-09-24 14:04:34 UTC ---

Thanks for your hint. I finally reproduced this issue against a cluster with 20 workers, updating back and forth between fc.4 and fc.5; when I tried to upgrade back to fc.4 in the 3rd round, the sync worker got hung.

[root@preserve-jialiu-ansible ~]# oc adm upgrade --to-image=quay.io/openshift-release-dev/ocp-release@sha256:960ec73733150827076cbb5fa2c1f5aaa9a94bfbce1b4897e46432a56ac976c1 --allow-explicit-upgrade
warning: The requested upgrade image is not one of the available updates. You have used --allow-explicit-upgrade to the update to proceed anyway
Updating to release image quay.io/openshift-release-dev/ocp-release@sha256:960ec73733150827076cbb5fa2c1f5aaa9a94bfbce1b4897e46432a56ac976c1

[root@preserve-jialiu-ansible ~]# oc get clusterversion -o json | jq -r '.items[].spec'
{
  "channel": "stable-4.6",
  "clusterID": "ab92b0b2-0d37-4f5c-83f9-e467ec43af1a",
  "desiredUpdate": {
    "force": false,
    "image": "quay.io/openshift-release-dev/ocp-release@sha256:960ec73733150827076cbb5fa2c1f5aaa9a94bfbce1b4897e46432a56ac976c1",
    "version": ""
  },
  "upstream": "https://api.openshift.com/api/upgrades_info/v1/graph"
}

[root@preserve-jialiu-ansible ~]# oc get clusterversion
NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-fc.5   True        False         107m    Cluster version is 4.6.0-fc.5

On the same cluster, I ran updates back and forth between fc.5 and 4.6.0-0.nightly-2020-09-22-073212 for 4 rounds; all passed.

--- Additional comment from W. Trevor King on 2020-09-24 17:04:42 UTC ---

> On the same cluster, I ran updates back and forth between fc.5 and 4.6.0-0.nightly-2020-09-22-073212 for 4 rounds; all passed.

Only the outgoing version should matter, so fc.5 -> patched nightly would still be vulnerable to the race. But still, VERIFIED makes sense to me, and we'll hear about it in CI if we missed anything else related to accepting update requests.

--- Additional comment from errata-xmlrpc on 2020-10-06 18:03:45 UTC ---

This bug has been added to advisory RHBA-2020:54579 by OpenShift Release Team Bot (ocp-build/buildvm.openshift.eng.bos.redhat.com)

--- Additional comment from errata-xmlrpc on 2020-10-26 00:34:19 UTC ---

Bug report changed to RELEASE_PENDING status by Errata System. Advisory RHBA-2020:4196-05 has been changed to PUSH_READY status.
https://errata.devel.redhat.com/advisory/54579

--- Additional comment from errata-xmlrpc on 2020-10-27 16:36:11 UTC ---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196
Hi Jack, I'm going to verify this. Could you please give me a hint on how to trigger the trace-level logging? Thanks.
v5 should give you access to:

$ git --no-pager grep 'klog.V.5' origin/release-4.5 -- pkg/cvo
origin/release-4.5:pkg/cvo/cvo_test.go:    klog.V(5).Infof("updated object to %#v", obj)
origin/release-4.5:pkg/cvo/status.go:      klog.V(5).Infof("initialize new history completed=%t desired=%#v", completed, desired)
origin/release-4.5:pkg/cvo/status.go:      klog.V(5).Infof("merge into existing history completed=%t desired=%#v last=%#v", completed, desired, last)
origin/release-4.5:pkg/cvo/status.go:      klog.V(5).Infof("must add a new history entry completed=%t desired=%#v != last=%#v", completed, desired, last)
origin/release-4.5:pkg/cvo/status.go:      if klog.V(5) && len(config.Status.History) > 1 {
origin/release-4.5:pkg/cvo/status.go:      klog.V(5).Infof("Synchronizing errs=%#v status=%#v", validationErrs, status)
origin/release-4.5:pkg/cvo/sync_worker.go: klog.V(5).Infof("Starting sync worker")
origin/release-4.5:pkg/cvo/sync_worker.go: klog.V(5).Infof("Stopped worker")
origin/release-4.5:pkg/cvo/sync_worker.go: klog.V(5).Infof("Wait finished")
origin/release-4.5:pkg/cvo/sync_worker.go: klog.V(5).Infof("Work updated")
origin/release-4.5:pkg/cvo/sync_worker.go: klog.V(5).Infof("No change, waiting")
origin/release-4.5:pkg/cvo/sync_worker.go: klog.V(5).Infof("No work, waiting")
origin/release-4.5:pkg/cvo/sync_worker.go: klog.V(5).Infof("Previous sync status: %#v", reporter.previousStatus)
origin/release-4.5:pkg/cvo/sync_worker.go: klog.V(5).Infof("Sync succeeded, reconciling")
origin/release-4.5:pkg/cvo/sync_worker.go: klog.V(5).Infof("Worker shut down")
origin/release-4.5:pkg/cvo/sync_worker.go: klog.V(5).Infof("Dropping status report from earlier in sync loop")
origin/release-4.5:pkg/cvo/sync_worker.go: klog.V(5).Infof("Status change %#v", update)
origin/release-4.5:pkg/cvo/sync_worker.go: if klog.V(5) {
origin/release-4.5:pkg/cvo/sync_worker.go: klog.V(5).Infof("Manifest: %s", string(task.Manifest.Raw))

For example, 4.5.37 CI doesn't have 'merge into existing history':

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-launch-aws/1377215339498049536/artifacts/launch/pods/openshift-cluster-version_cluster-version-operator-8dc8b5fc4-fxhnt_cluster-version-operator.log | grep 'merge into existing history'

While recent 4.5 nightlies do:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.5-e2e-aws/1377872648524009472/artifacts/e2e-aws/gather-extra/artifacts/pods/openshift-cluster-version_cluster-version-operator-744cf5c97f-fkd2r_cluster-version-operator.log | grep 'merge into existing history' | head -n1
I0402 07:15:39.344456 1 status.go:79] merge into existing history completed=false desired=v1.Update{Version:"4.5.0-0.nightly-2021-04-02-061143", Image:"registry.build02.ci.openshift.org/ci-op-jdcvygb3/release@sha256:573aec4324c3013f6305b807b3e383dfda96f503d70386221f58966483b6a00d", Force:false} last=&v1.UpdateHistory{State:"Partial", StartedTime:v1.Time{Time:time.Time{wall:0x0, ext:63752942657, loc:(*time.Location)(0x271c5c0)}}, CompletionTime:(*v1.Time)(nil), Version:"4.5.0-0.nightly-2021-04-02-061143", Image:"registry.build02.ci.openshift.org/ci-op-jdcvygb3/release@sha256:573aec4324c3013f6305b807b3e383dfda96f503d70386221f58966483b6a00d", Verified:false}
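[Editor's note] For readers unfamiliar with the verbosity gating behind those lines, here is a tiny standalone sketch of how klog's V() levels work. It uses k8s.io/klog/v2 and is not CVO code; the point is only that klog.V(5) messages are emitted when the process runs at verbosity 5 or higher.

package main

import (
	"flag"

	"k8s.io/klog/v2"
)

func main() {
	klog.InitFlags(nil) // registers -v, -logtostderr, and friends
	flag.Parse()        // e.g. run with -v=5 to see the trace-level line below

	klog.Info("always logged, like the CVO's default-verbosity messages")
	klog.V(5).Infof("merge into existing history completed=%t", false) // gated behind V(5)
	klog.Flush()
}

The git grep above remains the authoritative list of which CVO messages sit behind klog.V(5) on the release-4.5 branch; how a particular CVO build's verbosity is configured is outside the scope of this sketch.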
Trevor, I appreciate your help.

For a freshly installed cluster with 4.5.0-0.nightly-2021-04-02-061143, we can see many v5 "merge into existing history" logs in the CVO pod log:

# cat 1.log | grep "merge into existing history"
I0404 08:40:17.040642 1 status.go:79] merge into existing history completed=true desired=v1.Update{Version:"4.5.0-0.nightly-2021-04-02-061143", Image:"registry.ci.openshift.org/ocp/release@sha256:573aec4324c3013f6305b807b3e383dfda96f503d70386221f58966483b6a00d", Force:false} last=&v1.UpdateHistory{State:"Completed", StartedTime:v1.Time{Time:time.Time{wall:0x0, ext:63753119866, loc:(*time.Location)(0x271c5c0)}}, CompletionTime:(*v1.Time)(0xc000aa15e0), Version:"4.5.0-0.nightly-2021-04-02-061143", Image:"registry.ci.openshift.org/ocp/release@sha256:573aec4324c3013f6305b807b3e383dfda96f503d70386221f58966483b6a00d", Verified:false}

While a 4.5.36 cluster does not have that log:

# cat 2.log | grep "merge into existing history completed"

Moving it to verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.38 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1300