Bug 2090274

Summary: when downgrading from OCP 4.11 to OCP 4.10, the openshift-controller-manager-operator pod goes into CrashLoopBackOff
Product: OpenShift Container Platform Reporter: zhou ying <yinzhou>
Component: openshift-controller-manager Assignee: jawed <jkhelil>
Sub component: controller-manager QA Contact: Jitendar Singh <jitsingh>
Status: CLOSED NOTABUG Docs Contact:
Severity: medium    
Priority: high CC: cdaley, jitsingh, jkhelil, talessio
Version: 4.10 Keywords: TestBlocker
Target Milestone: ---   
Target Release: 4.12.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: devex
Last Closed: 2023-01-24 13:40:33 UTC Type: Bug

Description zhou ying 2022-05-25 13:21:07 UTC
Description of problem:
When downgrading from OCP 4.11 to OCP 4.10, the openshift-controller-manager-operator pod goes into CrashLoopBackOff.

Version-Release number of selected component (if applicable):


How reproducible:
always

Steps to Reproduce:
1. Launch an OCP 4.10 cluster;
2. Upgrade to OCP 4.11;
3. Update the nodes.config cluster object: set the workerLatencyProfile to "MediumUpdateAverageReaction" (a verification sketch follows these steps):
oc patch nodes.config cluster --type='json' -p='[{"op": "add", "path": "/spec/workerLatencyProfile", "value":"MediumUpdateAverageReaction"}]'
4. Downgrade the cluster from OCP 4.11 back to OCP 4.10.
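
(Not part of the original report: a quick, hedged way to confirm the profile from step 3 actually landed before starting the downgrade.)

# Read back the workerLatencyProfile from the cluster-scoped nodes.config object:
oc get nodes.config cluster -o jsonpath='{.spec.workerLatencyProfile}'
# Expected output: MediumUpdateAverageReaction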

Actual results:
4. The downgrade fails, with the openshift-controller-manager-operator pod in CrashLoopBackOff:
oc describe pod/openshift-controller-manager-operator-6cc779df49-jjtbg
Name:                 openshift-controller-manager-operator-6cc779df49-jjtbg
Namespace:            openshift-controller-manager-operator
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 ip-10-0-191-50.ap-southeast-1.compute.internal/10.0.191.50
Start Time:           Wed, 25 May 2022 20:38:41 +0800
Labels:               app=openshift-controller-manager-operator
                      pod-template-hash=6cc779df49
Annotations:          k8s.v1.cni.cncf.io/network-status:
                        [{
                            "name": "openshift-sdn",
                            "interface": "eth0",
                            "ips": [
                                "10.130.0.55"
                            ],
                            "default": true,
                            "dns": {}
                        }]
                      k8s.v1.cni.cncf.io/networks-status:
                        [{
                            "name": "openshift-sdn",
                            "interface": "eth0",
                            "ips": [
                                "10.130.0.55"
                            ],
                            "default": true,
                            "dns": {}
                        }]
                      openshift.io/scc: anyuid
Status:               Running
IP:                   10.130.0.55
IPs:
  IP:           10.130.0.55
Controlled By:  ReplicaSet/openshift-controller-manager-operator-6cc779df49
Containers:
  openshift-controller-manager-operator:
    Container ID:  cri-o://abdd964b5d5973f96a020bf8c0acc42202e57b8af5557439f6675eba09ca2c26
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5ed785112aa91b0e1ef2a68949b672f0240649a81829a47cd64ea8183cf8a3e7
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5ed785112aa91b0e1ef2a68949b672f0240649a81829a47cd64ea8183cf8a3e7
    Port:          8443/TCP
    Host Port:     0/TCP
    Command:
      cluster-openshift-controller-manager-operator
      operator
    Args:
      --config=/var/run/configmaps/config/config.yaml
      -v=4
    State:       Waiting
      Reason:    CrashLoopBackOff
    Last State:  Terminated
      Reason:    Error
      Message:   it.JitterUntil(0x0, 0x12a05f200, 0x0, 0x5, 0xc000071fd0)
                 k8s.io/apimachinery.0/pkg/util/wait/wait.go:133 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(...)
  k8s.io/apimachinery.0/pkg/util/wait/wait.go:90
k8s.io/apimachinery/pkg/util/wait.Forever(0x0, 0xc0007a4ba0)
  k8s.io/apimachinery.0/pkg/util/wait/wait.go:81 +0x28
created by k8s.io/component-base/logs.InitLogs
  k8s.io/component-base.0/logs/logs.go:179 +0x85

goroutine 78 [select]:
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0001138c0, {0x2715700, 0xc000804540}, 0x1, 0xc000116300)
  k8s.io/apimachinery.0/pkg/util/wait/wait.go:167 +0x13b
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x742246, 0x12a05f200, 0x0, 0x70, 0x2329000)
  k8s.io/apimachinery.0/pkg/util/wait/wait.go:133 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(...)
  k8s.io/apimachinery.0/pkg/util/wait/wait.go:90
k8s.io/apimachinery/pkg/util/wait.Forever(0x4cce7a, 0xc00048e780)
  k8s.io/apimachinery.0/pkg/util/wait/wait.go:81 +0x28
created by k8s.io/component-base/logs.InitLogs
  k8s.io/component-base.0/logs/logs.go:179 +0x85

goroutine 80 [syscall]:
os/signal.signal_recv()
  runtime/sigqueue.go:169 +0x98
os/signal.loop()
  os/signal/signal_unix.go:24 +0x19
created by os/signal.Notify.func1.1
  os/signal/signal.go:151 +0x2c

goroutine 86 [runnable]:
k8s.io/apiserver/pkg/server.SetupSignalContext.func1()
  k8s.io/apiserver.0/pkg/server/signal.go:47
created by k8s.io/apiserver/pkg/server.SetupSignalContext
  k8s.io/apiserver.0/pkg/server/signal.go:47 +0xe7

goroutine 87 [runnable]:
github.com/openshift/library-go/pkg/controller/controllercmd.(*ControllerCommandConfig).NewCommandWithContext.func1.1()
  github.com/openshift/library-go.0-20211220195323-eca2c467c492/pkg/controller/controllercmd/cmd.go:94
created by github.com/openshift/library-go/pkg/controller/controllercmd.(*ControllerCommandConfig).NewCommandWithContext.func1
  github.com/openshift/library-go.0-20211220195323-eca2c467c492/pkg/controller/controllercmd/cmd.go:94 +0x1d4

      Exit Code:    255
      Started:      Wed, 25 May 2022 21:04:38 +0800
      Finished:     Wed, 25 May 2022 21:04:38 +0800
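
For reference, a hedged triage sketch (these exact commands are not from the report; they are the usual way to recover the full panic from a crash-looping pod):

NS=openshift-controller-manager-operator
# Check restart counts across the namespace:
oc -n "$NS" get pods
# Pull the logs of the previous (crashed) container to see the complete stack trace,
# using the pod name from the describe output above:
oc -n "$NS" logs openshift-controller-manager-operator-6cc779df49-jjtbg --previous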


Expected results:
4. No error; the downgrade completes and the operator pod runs normally.

Additional info:

Comment 7 jawed 2022-08-05 11:47:26 UTC
@cdaley After diffing the 4.10 and 4.11 deployments of the controller-manager-operator, I can see that the following configuration is present in 4.11 but not in 4.10:
securityContext:
  runAsNonRoot: true
  runAsUser: 65534
  seccompProfile:
    type: RuntimeDefault

It comes from this commit: https://github.com/openshift/cluster-openshift-controller-manager-operator/commit/718fce194896f5e96e10f45c79c68640d1e1caf9
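
A hedged sketch (not from the comment) for inspecting the securityContext the operator deployment actually carries on a live cluster; this assumes the field sits at the container level, as the diff above suggests:

oc -n openshift-controller-manager-operator get deployment \
  openshift-controller-manager-operator \
  -o jsonpath='{.spec.template.spec.containers[0].securityContext}'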

When I look at other operators in OpenShift, they have this securityContext configuration too, and the change likewise arrived in 4.11, yet they behave correctly after downgrading (a comparison sketch follows below).
I think we need to identify what processes run during a downgrade; there might be a necessary check/delete step before deploying it.
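
A hedged comparison sketch (the namespace list is illustrative, not from the comment):

for ns in openshift-apiserver-operator openshift-controller-manager-operator; do
  echo "== $ns"
  # Print each deployment's container-level securityContext in the namespace:
  oc -n "$ns" get deploy -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.template.spec.containers[0].securityContext}{"\n"}{end}'
done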

Unfortunately, I have not been able to identify this.