Bug 1866554 - Cluster version operator does not manage shareProcessNamespace on pods and their consumers
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 4.1.z
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.6.0
Assignee: Vadim Rutkovsky
QA Contact: Wenjing Zheng
URL:
Whiteboard:
Duplicates: 1857782
Depends On:
Blocks: 1868478
 
Reported: 2020-08-05 21:22 UTC by W. Trevor King
Modified: 2020-10-27 16:25 UTC (History)
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The cluster-version operator did not reconcile dnsPolicy, shareProcessNamespace, or terminationGracePeriodSeconds in managed manifests.
Consequence: The value set in the manifest used to create the in-cluster object was preserved, regardless of the value in future manifests.
Fix: The cluster-version operator now reconciles all PodSpec properties set in operator manifests.
Result: Cluster-version operator managed objects now reconcile to remove any drift as operators adjust the properties in their manifests.
Clone Of:
Environment:
Last Closed: 2020-10-27 16:25:04 UTC
Target Upstream Version:




Links
Github openshift cluster-version-operator pull 428 (closed): Bug 1866554: lib/resourcemerge/core: set ShareProcessNamespace, DNSConfig and TerminationGracePeriodSeconds (last updated 2020-12-10 17:19:40 UTC)
Red Hat Product Errata RHBA-2020:4196 (last updated 2020-10-27 16:25:24 UTC)

Description W. Trevor King 2020-08-05 21:22:43 UTC
The image-registry folks added shareProcessNamespace to a Deployment between 4.1 and 4.2:

$ git --no-pager log --oneline -G shareProcessNamespace origin/release-4.2..origin/master -- manifests
...no hits...
$ git --no-pager log --oneline -G shareProcessNamespace origin/release-4.1..origin/master -- manifests
cc9e9fe05 (origin/pr/364) Integrating watchdog as a sidecar to registry operator.
3803d25ff Revert "Integrating watchdog as a sidecar to registry operator."
ffbb403ef (origin/pr/342) Integrating watchdog as a sidecar to registry operator.

But the CVO does not reconcile that property today.  That means whatever value was set when the in-cluster object was created is preserved, regardless of the value in future manifests.  We should start reconciling this property, audit for other missing pod properties, and then backport the fix as far as we can, excepting end-of-life versions.
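To illustrate the drift described above, here is a minimal, hypothetical Go sketch (the type and function names are illustrative, not the CVO's actual code): a merge routine that skips a pointer field never corrects an object created before the field existed, while one that copies it does.

```go
package main

import "fmt"

// PodSpecView is a trimmed, hypothetical stand-in for the PodSpec fields at issue.
type PodSpecView struct {
	ServiceAccountName    string
	ShareProcessNamespace *bool
}

// reconcileOld mimics the pre-fix behavior: some fields are merged,
// but ShareProcessNamespace is simply absent from the routine.
func reconcileOld(existing, required *PodSpecView) {
	if required.ServiceAccountName != "" {
		existing.ServiceAccountName = required.ServiceAccountName
	}
	// ShareProcessNamespace was skipped entirely, so in-cluster drift persisted.
}

// reconcileNew mimics the fix: a value set in the manifest is copied over.
func reconcileNew(existing, required *PodSpecView) {
	reconcileOld(existing, required)
	if required.ShareProcessNamespace != nil {
		v := *required.ShareProcessNamespace
		existing.ShareProcessNamespace = &v
	}
}

func main() {
	t := true
	born41 := &PodSpecView{}                            // object created from a 4.1 manifest
	manifest42 := &PodSpecView{ShareProcessNamespace: &t} // 4.2 manifest adds the field

	reconcileOld(born41, manifest42)
	fmt.Println(born41.ShareProcessNamespace == nil) // true: the new field is never applied

	reconcileNew(born41, manifest42)
	fmt.Println(*born41.ShareProcessNamespace) // true: the manifest value now wins
}
```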

Spun out from bug 1857782.

Comment 3 W. Trevor King 2020-08-12 20:46:15 UTC
Test plan for this based on [1]:

1. Create a 4.1 cluster.
2. Confirm that 'oc -n openshift-image-registry get -o yaml deployment cluster-image-registry-operator' does not include shareProcessNamespace.
3. Confirm that 'oc -n openshift-dns-operator get -o yaml deployment dns-operator' does not include .spec.template.spec.terminationGracePeriodSeconds (it will include .spec.template.spec.containers[].terminationGracePeriodSeconds).
4. Update through 4.2 -> 4.3 -> 4.4 -> 4.5.
5. Confirm that the registry deployment still lacks shareProcessNamespace and the DNS operator still has the mispositioned terminationGracePeriodSeconds.
6. Update to a 4.6 nightly with the fix.
7. Confirm that the registry deployment includes 'shareProcessNamespace: true' and DNS operator has the correctly-positioned terminationGracePeriodSeconds after the manifest change from [2] (both of which landed in 4.2 manifests but had no effect because of this CVO bug).

The only dnsPolicy consumer in 4.6.0-fc.0 is [3], which hasn't changed since 4.1.  So no easy testing ideas there, but also no downside if we've somehow botched it in this fix.  Standard regression testing should be sufficient.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1857782#c9
[2]: https://github.com/openshift/cluster-dns-operator/pull/114
[3]: https://github.com/openshift/cluster-dns-operator/blame/4f21006b681c1165abc96173c8cc51ad1f50f90e/manifests/0000_70_dns-operator_02-deployment.yaml#L16

Comment 4 Wenjing Zheng 2020-08-13 12:12:54 UTC
No payload has been available to verify this bug so far.

Comment 5 Wenjing Zheng 2020-08-17 07:44:36 UTC
Since 4.1 and 4.2 are sunset, I tried to upgrade from 4.1 through 4.2->4.3->4.4->4.5->4.6; the machine-config-operator pod cannot start up, with the error below:
$ oc logs pods/machine-config-operator-677c5786c8-zdvhn -n openshift-machine-config-operator
I0817 07:31:12.744568       1 start.go:46] Version: 4.6.0-0.nightly-2020-08-16-072105 (Raw: v4.6.0-202008130129.p0-dirty, Hash: b7f3c7043aa9e6a5ca4718f53e26a1db9c5716f6)
I0817 07:31:12.747468       1 leaderelection.go:242] attempting to acquire leader lease  openshift-machine-config-operator/machine-config...
I0817 07:33:10.566770       1 leaderelection.go:252] successfully acquired lease openshift-machine-config-operator/machine-config
I0817 07:33:11.183070       1 operator.go:270] Starting MachineConfigOperator
E0817 07:33:13.375289       1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 239 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x18151e0, 0x2a37200)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0xa3
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x82
panic(0x18151e0, 0x2a37200)
	/opt/rh/go-toolset-1.14/root/usr/lib/go-toolset-1.14-golang/src/runtime/panic.go:969 +0x166
github.com/openshift/machine-config-operator/lib/resourcemerge.ensureControllerConfigSpec(0xc00082f80f, 0xc000133798, 0xc000c7d5e0, 0xb, 0x0, 0x0, 0xc000c7d718, 0x3, 0xc000dfced0, 0x26, ...)
	/go/src/github.com/openshift/machine-config-operator/lib/resourcemerge/machineconfig.go:83 +0x19f
github.com/openshift/machine-config-operator/lib/resourcemerge.EnsureControllerConfig(0xc00082f80f, 0xc000133680, 0x16c3311, 0x10, 0xc000dfcf60, 0x24, 0xc000d7bc60, 0x19, 0x0, 0x0, ...)
	/go/src/github.com/openshift/machine-config-operator/lib/resourcemerge/machineconfig.go:19 +0xd4
github.com/openshift/machine-config-operator/lib/resourceapply.ApplyControllerConfig(0x7f10896bc890, 0xc000096a90, 0xc000133400, 0x7f10896bc890, 0xc000096a90, 0x5aba, 0x5b33)
	/go/src/github.com/openshift/machine-config-operator/lib/resourceapply/machineconfig.go:67 +0x185
github.com/openshift/machine-config-operator/pkg/operator.(*Operator).syncMachineConfigController(0xc000596000, 0xc000121880, 0xc01334abea, 0x6e164d6c948)
	/go/src/github.com/openshift/machine-config-operator/pkg/operator/sync.go:468 +0x438
github.com/openshift/machine-config-operator/pkg/operator.(*Operator).syncAll(0xc000596000, 0xc00082fca8, 0x6, 0x6, 0xc0007c2c01, 0x413893)
	/go/src/github.com/openshift/machine-config-operator/pkg/operator/sync.go:69 +0x177
github.com/openshift/machine-config-operator/pkg/operator.(*Operator).sync(0xc000596000, 0xc0007dac90, 0x30, 0x0, 0x0)
	/go/src/github.com/openshift/machine-config-operator/pkg/operator/operator.go:362 +0x40a
github.com/openshift/machine-config-operator/pkg/operator.(*Operator).processNextWorkItem(0xc000596000, 0x203000)
	/go/src/github.com/openshift/machine-config-operator/pkg/operator/operator.go:318 +0xd2
github.com/openshift/machine-config-operator/pkg/operator.(*Operator).worker(0xc000596000)
	/go/src/github.com/openshift/machine-config-operator/pkg/operator/operator.go:307 +0x2b
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc000428030)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x5f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000428030, 0x1cc05a0, 0xc000bc0000, 0xc000406001, 0xc000094180)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xa3
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000428030, 0x3b9aca00, 0x0, 0x1, 0xc000094180)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0xe2
k8s.io/apimachinery/pkg/util/wait.Until(0xc000428030, 0x3b9aca00, 0xc000094180)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x4d
created by github.com/openshift/machine-config-operator/pkg/operator.(*Operator).Run
	/go/src/github.com/openshift/machine-config-operator/pkg/operator/operator.go:276 +0x3dc
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x1a8 pc=0x13ab99f]

goroutine 239 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:55 +0x105
panic(0x18151e0, 0x2a37200)
	/opt/rh/go-toolset-1.14/root/usr/lib/go-toolset-1.14-golang/src/runtime/panic.go:969 +0x166
github.com/openshift/machine-config-operator/lib/resourcemerge.ensureControllerConfigSpec(0xc00082f80f, 0xc000133798, 0xc000c7d5e0, 0xb, 0x0, 0x0, 0xc000c7d718, 0x3, 0xc000dfced0, 0x26, ...)
	/go/src/github.com/openshift/machine-config-operator/lib/resourcemerge/machineconfig.go:83 +0x19f
github.com/openshift/machine-config-operator/lib/resourcemerge.EnsureControllerConfig(0xc00082f80f, 0xc000133680, 0x16c3311, 0x10, 0xc000dfcf60, 0x24, 0xc000d7bc60, 0x19, 0x0, 0x0, ...)
	/go/src/github.com/openshift/machine-config-operator/lib/resourcemerge/machineconfig.go:19 +0xd4
github.com/openshift/machine-config-operator/lib/resourceapply.ApplyControllerConfig(0x7f10896bc890, 0xc000096a90, 0xc000133400, 0x7f10896bc890, 0xc000096a90, 0x5aba, 0x5b33)
	/go/src/github.com/openshift/machine-config-operator/lib/resourceapply/machineconfig.go:67 +0x185
github.com/openshift/machine-config-operator/pkg/operator.(*Operator).syncMachineConfigController(0xc000596000, 0xc000121880, 0xc01334abea, 0x6e164d6c948)
	/go/src/github.com/openshift/machine-config-operator/pkg/operator/sync.go:468 +0x438
github.com/openshift/machine-config-operator/pkg/operator.(*Operator).syncAll(0xc000596000, 0xc00082fca8, 0x6, 0x6, 0xc0007c2c01, 0x413893)
	/go/src/github.com/openshift/machine-config-operator/pkg/operator/sync.go:69 +0x177
github.com/openshift/machine-config-operator/pkg/operator.(*Operator).sync(0xc000596000, 0xc0007dac90, 0x30, 0x0, 0x0)
	/go/src/github.com/openshift/machine-config-operator/pkg/operator/operator.go:362 +0x40a
github.com/openshift/machine-config-operator/pkg/operator.(*Operator).processNextWorkItem(0xc000596000, 0x203000)
	/go/src/github.com/openshift/machine-config-operator/pkg/operator/operator.go:318 +0xd2
github.com/openshift/machine-config-operator/pkg/operator.(*Operator).worker(0xc000596000)
	/go/src/github.com/openshift/machine-config-operator/pkg/operator/operator.go:307 +0x2b
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc000428030)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x5f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000428030, 0x1cc05a0, 0xc000bc0000, 0xc000406001, 0xc000094180)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xa3
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000428030, 0x3b9aca00, 0x0, 0x1, 0xc000094180)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0xe2
k8s.io/apimachinery/pkg/util/wait.Until(0xc000428030, 0x3b9aca00, 0xc000094180)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x4d
created by github.com/openshift/machine-config-operator/pkg/operator.(*Operator).Run
	/go/src/github.com/openshift/machine-config-operator/pkg/operator/operator.go:276 +0x3dc

Comment 6 Vadim Rutkovsky 2020-08-17 08:07:39 UTC
(In reply to Wenjing Zheng from comment #5)
> Since 4.1 and 4.2 are sunset, I tried to upgrade from 4.1 through
> 4.2->4.3->4.4->4.5->4.6; the machine-config-operator pod cannot start up,
> with the error below:
...

That looks like a different issue. Did the image-registry pod have spec.shareProcessNamespace set?

Comment 7 Wenjing Zheng 2020-08-17 08:23:38 UTC
comment #5 is about https://bugzilla.redhat.com/show_bug.cgi?id=1861404

Comment 8 Wenjing Zheng 2020-08-17 08:30:26 UTC
(In reply to Vadim Rutkovsky from comment #6)
> (In reply to Wenjing Zheng from comment #5)
> > Since 4.1 and 4.2 are sunset, I tried to upgrade from 4.1 through
> > 4.2->4.3->4.4->4.5->4.6; the machine-config-operator pod cannot start
> > up, with the error below:
> ...
> 
> That looks like a different issue. Did image-registry pod had
> spec.shareProcessNamespace set?

After upgrading to the latest 4.6 nightly build, the image-registry pod has NO spec.shareProcessNamespace set in the current cluster (machine-config remains at 4.5).

Comment 9 Hongan Li 2020-08-17 09:05:55 UTC
I checked the DNS operator settings in the same upgrade cluster (with the machine-config-operator pod error), but it seems the DNS operator has the correctly-positioned terminationGracePeriodSeconds and dnsPolicy:

$ oc -n openshift-dns-operator get deploy/dns-operator -o go-template='{{.spec.template.spec.terminationGracePeriodSeconds}}'
2

$ oc -n openshift-dns-operator get deploy/dns-operator -o go-template='{{.spec.template.spec.dnsPolicy}}'
Default

Comment 10 W. Trevor King 2020-08-18 03:43:13 UTC
I am confused about the earlier verification attempt.  Bug 1861404 is talking about a 4.6.0-0.nightly-2020-07-25-091217 target, but that is quite old, long before the PR addressing this bug landed:

$ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.6.0-0.nightly-2020-07-25-091217 | grep cluster-version-operator
  cluster-version-operator                       https://github.com/openshift/cluster-version-operator                       a49fef5c66c6b0707c54fd93f84d2f51d3d28aca
$ git log --oneline origin/master | grep -n 'a49fef5c\|a03a8957'
2:a03a8957 Merge pull request #428 from vrutkovs/shareProcessNamespace
39:a49fef5c Merge pull request #411 from deads2k/emit-events-on-update

And there should be no need to involve nightlies for earlier 4.y, where we have official releases available.  You should be able to use:

$ for V in 4.1 4.2 4.3 4.4 4.5; do curl -sH 'Accept:application/json' "https://api.openshift.com/api/upgrades_info/v1/graph?channel=stable-${V}" | jq -r '.nodes[].version' | sort -V | tail -n1; done
4.1.41
4.2.36
4.3.31
4.4.16
4.5.5

and then a hop to a recent 4.6 nightly.  In fact, 4.6.0-fc.1 is modern enough to include the patch:

$ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.6.0-fc.1-x86_64 | grep cluster-version-operator
  cluster-version-operator                       https://github.com/openshift/cluster-version-operator                       71aef74480d199fe96a590f2f1e4e8056a9cb687
$ git log --oneline origin/master | grep -n '71aef744\|a03a8957'
1:71aef744 Merge pull request #430 from wking/drop-available-deployment-check
2:a03a8957 Merge pull request #428 from vrutkovs/shareProcessNamespace

If you do not see the expected behavior in the next verification attempt, can you link a must-gather with the final state from this bug?

Comment 11 Wenjing Zheng 2020-08-18 13:50:37 UTC
I still cannot see the "shareProcessNamespace" option when the cluster is upgraded to 4.6.0-0.nightly-2020-08-18-055142. (We can ignore bug https://bugzilla.redhat.com/show_bug.cgi?id=1861404: it has not been fixed and still exists in the current latest 4.6 nightly build; it was just reported against 4.6.0-0.nightly-2020-07-25-091217.)

Upgrade path: 4.1.41-x86_64->4.2.36-x86_64->4.3.31-x86_64->4.4.16-x86_64->4.5.5-x86_64->4.6.0-0.nightly-2020-08-18-055142


4.2-4.5 has "shareProcessNamespace": https://github.com/openshift/cluster-image-registry-operator/blob/release-4.5/manifests/07-operator.yaml#L20
4.6 has no "shareProcessNamespace":
https://github.com/openshift/cluster-image-registry-operator/blob/master/manifests/07-operator.yaml
https://github.com/openshift/cluster-image-registry-operator/pull/587

Comment 13 Kirsten Garrison 2020-08-18 17:45:36 UTC
Are we sure these nightlies are up to date? I know the name is, but are the contents?

Comment 14 W. Trevor King 2020-08-18 22:19:57 UTC
(In reply to Wenjing Zheng from comment #11)
> 4.2-4.5 has "shareProcessNamespace"...

In your test cluster?  Or are you just talking about the source repositories?  I would expect the born-in-4.1 test cluster to lack shareProcessNamespace until it was updated to a release which had both the CVO patch from this bug and a manifest which requested shareProcessNamespace be set.

> https://github.com/openshift/cluster-image-registry-operator/pull/587

Huh, I hadn't realized that they'd removed it in 4.6.  I dunno if there are any 4.6 nightlies which have both our CVO patch from this bug landed (Aug. 12th and later [1]), but which still have shareProcessNamespace set (Aug. 5th and earlier [2]).  Seems unlikely.  I guess we could build a release image like that, if we wanted.

[1]: https://github.com/openshift/cluster-version-operator/pull/428#event-3649049638
[2]: https://github.com/openshift/cluster-image-registry-operator/pull/587#event-3626507070

Comment 15 W. Trevor King 2020-08-18 22:31:44 UTC
Ah, the 4.6 change means we can verify via a shorter update path that sticks to 4.6:

$ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.6.0-fc.0-x86_64 | grep 'cluster-version-operator\|cluster-image-registry-operator'
  cluster-image-registry-operator                https://github.com/openshift/cluster-image-registry-operator                8eb457b2b93324c1954f5af439fb9c4612a93fc9
  cluster-version-operator                       https://github.com/openshift/cluster-version-operator                       d2fc678353769e10a614fb98c15279da3b2b0ca5
$ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.6.0-fc.1-x86_64 | grep 'cluster-version-operator\|cluster-image-registry-operator'
  cluster-image-registry-operator                https://github.com/openshift/cluster-image-registry-operator                5eda706684c9ec69ae8be5745f0daf740fe947fe
  cluster-version-operator                       https://github.com/openshift/cluster-version-operator                       71aef74480d199fe96a590f2f1e4e8056a9cb687
$ git --no-pager -C cluster-image-registry-operator log --oneline --first-parent origin/master | grep '66bf2fe\|8eb457b2\|5eda7066'
5eda70668 Merge pull request #586 from ricardomaraschini/bz-1857684
66bf2feb3 Merge pull request #587 from ricardomaraschini/remove-shared-namespace
8eb457b2b Merge pull request #584 from dmage/ignore-invalid-refs
$ git --no-pager -C cluster-version-operator log --oneline --first-parent origin/master | grep 'a03a895\|d2fc6783\|71aef744'
71aef744 Merge pull request #430 from wking/drop-available-deployment-check
a03a8957 Merge pull request #428 from vrutkovs/shareProcessNamespace
d2fc6783 Merge pull request #423 from wking/clarify-currently-installed

So you should be able to validate with:

1. Install 4.6.0-fc.0.  Verify that shareProcessNamespace is set.
2. Update to 4.6.0-fc.1.  Verify that shareProcessNamespace is not set.

Comment 16 W. Trevor King 2020-08-19 01:04:24 UTC
Testing:

1. Launch 4.6.0-fc.0 with cluster-bot:

     launch quay.io/openshift-release-dev/ocp-release:4.6.0-fc.0-x86_64

2. Set a channel:

     $ oc patch clusterversion version --type json -p '[{"op": "add", "path": "/spec/channel", "value": "candidate-4.6"}]'

3. Check that the property is set:

     $ oc -n openshift-image-registry get -o jsonpath='{.spec.template.spec.shareProcessNamespace}{"\n"}' deployment cluster-image-registry-operator
     true

4. Update to 4.6.0-fc.1:

     $ oc adm upgrade --to 4.6.0-fc.1
     $ sleep # wait for the update to complete


5. Check that the property is not set:

     $ oc -n openshift-image-registry get -o jsonpath='{.spec.template.spec.shareProcessNamespace}{"\n"}' deployment cluster-image-registry-operator
     true

Huh.  Ah, because setBoolPtr is treating "unset in the manifest" as "operator does not care", not "return to the Kube default for this property" [1].  I'll check with the registry folks about that distinction.  Logs and artifacts and whatnot for my run will be in [2] once cluster-bot times the job out and collects them.

[1]: https://github.com/openshift/cluster-version-operator/blob/47d87e1083cbc6921e0485a1c71eb91525ae5d4f/lib/resourcemerge/core.go#L539-L540
[2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-gcp/1295851130839896064
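The semantics described above can be sketched in a few lines of Go. This is a simplified, hypothetical rendering of the setBoolPtr helper's behavior (not the CVO's exact code): a nil required value preserves whatever is in-cluster, which is why the fc.0 value of 'true' sticks after updating to fc.1, while an explicit value forces a reset.

```go
package main

import "fmt"

// setBoolPtr sketches the described resourcemerge semantics: nil in the
// manifest means "operator does not care", so the in-cluster value is
// preserved rather than returned to the Kubernetes default.
func setBoolPtr(modified *bool, existing **bool, required *bool) {
	if required == nil {
		return // manifest unset: keep whatever the cluster has
	}
	if *existing == nil || **existing != *required {
		*modified = true
		v := *required
		*existing = &v
	}
}

func main() {
	t := true
	inCluster := &t // fc.0 manifest set shareProcessNamespace: true
	modified := false

	// fc.1 dropped the field from the manifest (required == nil):
	setBoolPtr(&modified, &inCluster, nil)
	fmt.Println(*inCluster, modified) // true false: the old value sticks

	// An explicit 'false' in the manifest does force the reset:
	f := false
	setBoolPtr(&modified, &inCluster, &f)
	fmt.Println(*inCluster, modified) // false true
}
```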

Comment 17 Wenjing Zheng 2020-08-19 07:43:55 UTC
(In reply to W. Trevor King from comment #16)
> Testing:
> 
> 4. Update to 4.6.0-fc.1:
> 
>      $ oc adm upgrade --to 4.6.0-fc.1
>      $ sleep # wait for the update to complete
> 

I am stuck at step #4 and cannot upgrade to 4.6.0-fc.1. Here is some information:
$ oc adm upgrade
Cluster version is 4.6.0-fc.0
Updates:
VERSION  IMAGE
4.6.0-fc.1 quay.io/openshift-release-dev/ocp-release@sha256:b0fcdaaac358ad352bb4a948ac1f88ad728c4b9b044c13a9e1294706d643dc7c
$ oc adm upgrade --to=4.6.0-fc.1
Updating to 4.6.0-fc.1
$ oc get clusterversion
NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-fc.0   True        False         121m    Cluster version is 4.6.0-fc.0
$ oc get all -n openshift-cluster-version
W0819 15:38:53.829116    5731 warnings.go:67] batch/v1beta1 CronJob is deprecated in v1.22+, unavailable in v1.25+
NAME                                            READY   STATUS    RESTARTS   AGE
pod/cluster-version-operator-77555b6fd9-g86w5   1/1     Running   0          4h2m

NAME                               TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/cluster-version-operator   ClusterIP   172.30.48.43   <none>        9099/TCP   4h29m

NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cluster-version-operator   1/1     1            1           4h29m

NAME                                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/cluster-version-operator-5cf8c489d9   0         0         0       4h29m
replicaset.apps/cluster-version-operator-77555b6fd9   1         1         1       4h29m
$ oc get clusterversion -o json|jq -r '.items[].spec'
{
  "channel": "candidate-4.6",
  "clusterID": "fd5470e8-1ab1-4350-9334-ff097c9d2364",
  "upstream": "https://api.openshift.com/api/upgrades_info/v1/graph"
}
$ oc version
Client Version: 4.6.0-fc.1
Server Version: 4.6.0-fc.0

Comment 18 Wenjing Zheng 2020-08-19 07:46:16 UTC
(In reply to W. Trevor King from comment #14)
> (In reply to Wenjing Zheng from comment #11)
> > 4.2-4.5 has "shareProcessNamespace"...
> 
> In your test cluster?  Or are you just talking about the source
> repositories?  I would expect the born-in-4.1 test cluster to lack
> shareProcessNamespace until it was updated to a release which had both the
> CVO patch from this bug and a manifest which requested shareProcessNamespace
> be set.
> 
My cluster never has "shareProcessNamespace", matching your expectation; I was saying it exists in the source repo. Sorry for the confusion!

Comment 19 Wenjing Zheng 2020-08-19 08:33:24 UTC
Correction for the output of the command $ oc get clusterversion -o json | jq -r '.items[].spec'; the output should be:
{
  "channel": "candidate-4.6",
  "clusterID": "fd5470e8-1ab1-4350-9334-ff097c9d2364",
  "desiredUpdate": {
    "force": false,
    "image": "quay.io/openshift-release-dev/ocp-release@sha256:b0fcdaaac358ad352bb4a948ac1f88ad728c4b9b044c13a9e1294706d643dc7c",
    "version": "4.6.0-fc.1"
  },
  "upstream": "https://api.openshift.com/api/upgrades_info/v1/graph"
}

Comment 20 W. Trevor King 2020-08-19 22:51:58 UTC
> Ah, because setBoolPtr is treating "unset in the manifest" as "operator does not care", not "return to the Kube default for this property"...

Registry folks have [1] in flight with an explicit 'false'.
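For reference, such a manifest change would look roughly like the fragment below. This is an illustrative sketch of the relevant part of an operator Deployment manifest, not the exact diff from the registry PR:

```yaml
# manifests/07-operator.yaml (fragment; illustrative, not the PR's exact diff)
spec:
  template:
    spec:
      # Explicit 'false' gives the CVO a concrete required value, so it
      # resets clusters that still carry 'true' from an earlier manifest.
      shareProcessNamespace: false
```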

> I am stuck at step #4 and cannot upgrade to 4.6.0-fc.1
> ...
>   "clusterID": "fd5470e8-1ab1-4350-9334-ff097c9d2364",

Huh.  Pulling more of the ClusterVersion from the final Insights tarball from that cluster:

$ tar -xOz config/version.json <20200819071410-807d045c114f47899b5ea002f7c1a7aa | jq '{spec, status}'
{
  "spec": {
    "clusterID": "fd5470e8-1ab1-4350-9334-ff097c9d2364",
    "desiredUpdate": {
      "version": "4.6.0-fc.1",
      "image": "quay.io/openshift-release-dev/ocp-release@sha256:b0fcdaaac358ad352bb4a948ac1f88ad728c4b9b044c13a9e1294706d643dc7c",
      "force": false
    },
    "upstream": "xxxxx://xxx.xxxxxxxxx.xxx/xxx/xxxxxxxxxxxxx/xx/xxxxx",
    "channel": "candidate-4.6"
  },
  "status": {
    "desired": {
      "version": "4.6.0-fc.0",
      "image": "quay.io/openshift-release-dev/ocp-release@sha256:45e6bc583040384efb4033b22c58f054b12ac32c7874554885d74a0faf6fef79",
      "force": false
    },
    "history": [
      {
        "state": "Completed",
        "startedTime": "2020-08-19T03:09:41Z",
        "completionTime": "2020-08-19T03:34:06Z",
        "version": "4.6.0-fc.0",
        "image": "quay.io/openshift-release-dev/ocp-release@sha256:45e6bc583040384efb4033b22c58f054b12ac32c7874554885d74a0faf6fef79",
        "verified": false
      }
    ],
    "observedGeneration": 2,
    "versionHash": "kVdi1UOYMBM=",
    "conditions": [
      {
        "type": "Available",
        "status": "True",
        "lastTransitionTime": "2020-08-19T03:34:06Z",
        "message": "Done applying 4.6.0-fc.0"
      },
      {
        "type": "Failing",
        "status": "False",
        "lastTransitionTime": "2020-08-19T03:34:06Z"
      },
      {
        "type": "Progressing",
        "status": "False",
        "lastTransitionTime": "2020-08-19T03:34:06Z",
        "message": "Cluster version is 4.6.0-fc.0"
      },
      {
        "type": "RetrievedUpdates",
        "status": "True",
        "lastTransitionTime": "2020-08-19T05:29:32Z"
      }
    ],
    "availableUpdates": [
      {
        "version": "4.6.0-fc.1",
        "image": "quay.io/openshift-release-dev/ocp-release@sha256:b0fcdaaac358ad352bb4a948ac1f88ad728c4b9b044c13a9e1294706d643dc7c",
        "force": false
      }
    ]
  }
}

Not clear to me why the CVO is neither accepting the requested desiredUpdate nor complaining with a condition about why it isn't accepting it.  If you can reproduce, can you attach CVO logs from your stuck cluster?

[1]: https://github.com/openshift/cluster-image-registry-operator/pull/591

Comment 21 Wenjing Zheng 2020-08-20 03:10:43 UTC
-bash-4.2$ ./oc adm upgrade
Cluster version is 4.6.0-fc.0

Updates:

VERSION    IMAGE
4.6.0-fc.1 quay.io/openshift-release-dev/ocp-release@sha256:b0fcdaaac358ad352bb4a948ac1f88ad728c4b9b044c13a9e1294706d643dc7c
-bash-4.2$ ./oc get clusterversion -o json|jq -r '.items[].spec'
{
  "channel": "candidate-4.6",
  "clusterID": "8882329b-b6b6-4752-a560-915775f4b1b4",
  "upstream": "https://api.openshift.com/api/upgrades_info/v1/graph"
}
-bash-4.2$ ./oc adm upgrade --to 4.6.0-fc.1
Updating to 4.6.0-fc.1
-bash-4.2$ ./oc get clusterversion -o json|jq -r '.items[].spec'
{
  "channel": "candidate-4.6",
  "clusterID": "8882329b-b6b6-4752-a560-915775f4b1b4",
  "desiredUpdate": {
    "force": false,
    "image": "quay.io/openshift-release-dev/ocp-release@sha256:b0fcdaaac358ad352bb4a948ac1f88ad728c4b9b044c13a9e1294706d643dc7c",
    "version": "4.6.0-fc.1"
  },
  "upstream": "https://api.openshift.com/api/upgrades_info/v1/graph"
}

Comment 22 Wenjing Zheng 2020-08-20 03:46:20 UTC
The upgrade to 4.6.0-fc.1 is successful now; QE will wait for a nightly with the PR below [1] to have more confidence. Thanks for your support, @wking.

[1] https://github.com/openshift/cluster-image-registry-operator/pull/591

Comment 23 W. Trevor King 2020-08-21 13:54:01 UTC
4.6.0-0.nightly-2020-08-21-084833 and later have the explicit false.

Comment 24 Wenjing Zheng 2020-08-24 10:26:16 UTC
Upgrade from 4.6.0-fc.0 to 4.6.0-0.nightly-2020-08-24-034934:
$ oc -n openshift-image-registry get -o jsonpath='{.spec.template.spec.shareProcessNamespace}{"\n"}' deployment cluster-image-registry-operator
true
$ oc adm upgrade --to-image=registry.svc.ci.openshift.org/ocp/release:4.6.0-0.nightly-2020-08-24-034934 --force=true --allow-explicit-upgrade
wait..
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-08-24-034934   True        False         60s     Cluster version is 4.6.0-0.nightly-2020-08-24-034934
$ oc -n openshift-image-registry get -o jsonpath='{.spec.template.spec.shareProcessNamespace}{"\n"}' deployment cluster-image-registry-operator
false

Comment 25 Oleg Bulatov 2020-09-03 23:04:37 UTC
*** Bug 1857782 has been marked as a duplicate of this bug. ***

Comment 27 errata-xmlrpc 2020-10-27 16:25:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

