Bug 1942552 - OCP update from 4.7.1 to 4.7.2 hangs during openshift-apiserver CO update
Summary: OCP update from 4.7.1 to 4.7.2 hangs during openshift-apiserver CO update
Status: CLOSED DUPLICATE of bug 1942725
Product: OpenShift Container Platform
Classification: Red Hat
Component: openshift-apiserver
Version: 4.7
Hardware: s390x
OS: Unspecified
Priority: unspecified
Severity: medium
Assignee: Standa Laznicka
QA Contact: Xingxing Xia
Reported: 2021-03-24 14:30 UTC by Romain P
Modified: 2021-04-15 11:45 UTC

Last Closed: 2021-04-15 11:45:50 UTC


Attachments
describe pod apiserver-658846ccbd-8pqrb (11.14 KB, text/plain)
2021-03-24 14:30 UTC, Romain P

Description Romain P 2021-03-24 14:30:17 UTC
Created attachment 1765948
describe pod apiserver-658846ccbd-8pqrb

Description of problem:
I started an OCP update from 4.7.1 to 4.7.2.
The update hangs while upgrading the openshift-apiserver ClusterOperator (7 of 31 / 23%):
"APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-apiserver (2 crashlooping containers are waiting in apiserver-658846ccbd-8pqrb pod)"
One openshift-apiserver pod is in CrashLoopBackOff. See the output below.

Version-Release number of selected component (if applicable):

How reproducible:
Unknown: I don't have another cluster on which to repeat the test.

Steps to Reproduce:
1. Start OCP update from 4.7.1 to 4.7.2

Actual results:
The update hangs indefinitely.
The openshift-apiserver ClusterOperator cannot complete its update.
The openshift-apiserver ClusterOperator is in a Degraded state.

Expected results:
The update completes successfully and the cluster runs OCP 4.7.2.

Additional info:
oc get clusteroperators
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.7.1     True        False         False      8d
baremetal                                  4.7.1     True        False         False      15d
cloud-credential                           4.7.1     True        False         False      15d
cluster-autoscaler                         4.7.1     True        False         False      15d
config-operator                            4.7.2     True        False         False      15d
console                                    4.7.1     True        False         False      13d
csi-snapshot-controller                    4.7.1     True        False         False      13d
dns                                        4.7.1     True        False         False      15d
etcd                                       4.7.2     True        False         False      15d
image-registry                             4.7.1     True        False         False      13d
ingress                                    4.7.1     True        False         False      15d
insights                                   4.7.1     True        False         False      15d
kube-apiserver                             4.7.2     True        False         False      15d
kube-controller-manager                    4.7.2     True        False         False      15d
kube-scheduler                             4.7.2     True        False         False      15d
kube-storage-version-migrator              4.7.1     True        False         False      13d
machine-api                                4.7.2     True        False         False      15d
machine-approver                           4.7.1     True        False         False      15d
machine-config                             4.7.1     True        False         False      13d
marketplace                                4.7.1     True        False         False      13d
monitoring                                 4.7.1     True        False         False      15d
network                                    4.7.1     True        False         False      15d
node-tuning                                4.7.1     True        False         False      13d
openshift-apiserver                        4.7.2     True        False         True       13d
openshift-controller-manager               4.7.1     True        False         False      13d
openshift-samples                          4.7.1     True        False         False      13d
operator-lifecycle-manager                 4.7.1     True        False         False      15d
operator-lifecycle-manager-catalog         4.7.1     True        False         False      15d
operator-lifecycle-manager-packageserver   4.7.1     True        False         False      13d
service-ca                                 4.7.1     True        False         False      15d
storage                                    4.7.1     True        False         False      15d
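
In the table above, the operators already reporting 4.7.2 are the ones the update has reached, and openshift-apiserver is the only one with DEGRADED=True. The minimal Python sketch below pulls those two lists out of `oc get clusteroperators` text output. It assumes the six-column layout shown here and a target version of 4.7.2; the function names are hypothetical, and the sample rows are copied from the table.

```python
from __future__ import annotations

TARGET = "4.7.2"  # assumption: the version this update is heading to

# Three rows copied from the `oc get clusteroperators` output in this report.
SAMPLE = """
NAME                    VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
etcd                    4.7.2     True        False         False      15d
console                 4.7.1     True        False         False      13d
openshift-apiserver     4.7.2     True        False         True       13d
"""


def parse_clusteroperators(text: str) -> list[dict[str, str]]:
    """Split each data row into a dict keyed by the header's column names."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def summarize(rows: list[dict[str, str]], target: str = TARGET) -> tuple[list[str], list[str]]:
    """Return (operators not yet at `target`, operators reporting Degraded=True)."""
    pending = [row["NAME"] for row in rows if row["VERSION"] != target]
    degraded = [row["NAME"] for row in rows if row["DEGRADED"] == "True"]
    return pending, degraded


if __name__ == "__main__":
    pending, degraded = summarize(parse_clusteroperators(SAMPLE))
    print("still on another version:", pending)
    print("degraded:", degraded)
```

For the three sample rows, this reports console as still pending and openshift-apiserver as degraded. Because it splits on whitespace, a row with an empty column would be misparsed; the JSON output (`oc get clusteroperators -o json`) is the sturdier input for anything beyond a quick check.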

oc get pods -n openshift-apiserver
NAME                         READY   STATUS             RESTARTS   AGE
apiserver-575f7f5c59-l4jbr   2/2     Running            0          13d
apiserver-575f7f5c59-qz6wc   2/2     Running            0          13d
apiserver-658846ccbd-8pqrb   0/2     CrashLoopBackOff   106        4h6m
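
In this listing, the two Running pods share one pod-template-hash (575f7f5c59) and the crashlooping pod carries another (658846ccbd). This is consistent with a Deployment rolling update that cannot make progress because its first new pod never becomes ready. The Python sketch below groups pods by that hash and flags pods that are not fully ready. It assumes the usual `<deployment>-<pod-template-hash>-<suffix>` pod naming, and the function names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict

# Rows copied from the `oc get pods -n openshift-apiserver` output in this report.
SAMPLE = """
NAME                         READY   STATUS             RESTARTS   AGE
apiserver-575f7f5c59-l4jbr   2/2     Running            0          13d
apiserver-575f7f5c59-qz6wc   2/2     Running            0          13d
apiserver-658846ccbd-8pqrb   0/2     CrashLoopBackOff   106        4h6m
"""


def template_hash(pod_name: str) -> str:
    """Return the pod-template-hash part of a Deployment-managed pod name.

    Assumes <deployment>-<pod-template-hash>-<random suffix> naming.
    """
    return pod_name.rsplit("-", 2)[-2]


def is_fully_ready(ready: str) -> bool:
    """'2/2' -> True, '0/2' -> False."""
    up, total = ready.split("/")
    return up == total


def group_by_template_hash(text: str) -> dict[str, list[tuple[str, str, str]]]:
    """Map pod-template-hash -> [(pod name, READY, STATUS), ...]."""
    groups: dict[str, list[tuple[str, str, str]]] = defaultdict(list)
    rows = [line for line in text.strip().splitlines() if line.strip()]
    for line in rows[1:]:  # skip the header row
        name, ready, status = line.split()[:3]
        groups[template_hash(name)].append((name, ready, status))
    return dict(groups)


if __name__ == "__main__":
    for pod_hash, pods in group_by_template_hash(SAMPLE).items():
        for name, ready, status in pods:
            flag = "" if is_fully_ready(ready) else "  <-- not ready"
            print(f"{pod_hash}  {name}  {ready}  {status}{flag}")
```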

oc logs apiserver-658846ccbd-8pqrb -n openshift-apiserver openshift-apiserver
Copying system trust bundle
cp: cannot remove '/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem': Read-only file system
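
The error above is the kernel's EROFS ("Read-only file system"): the path under /etc/pki/ca-trust/extracted/pem/ sits on a filesystem that is mounted read-only inside the container. The minimal Python sketch below is not how the apiserver's startup script works. It only illustrates how that condition can be detected up front with os.statvfs() and the ST_RDONLY flag, and how an EROFS failure during a copy can be turned into a clearer error. The helper names is_read_only_mount and copy_bundle are hypothetical.

```python
import errno
import os
import shutil
import tempfile


def is_read_only_mount(path: str) -> bool:
    """Return True if the filesystem holding `path` is mounted read-only.

    Uses statvfs(), so this works on Linux and other POSIX systems only.
    """
    return bool(os.statvfs(path).f_flag & os.ST_RDONLY)


def copy_bundle(src: str, dst: str) -> None:
    """Copy `src` over `dst`, turning EROFS into a more descriptive error."""
    try:
        shutil.copyfile(src, dst)
    except OSError as exc:
        if exc.errno == errno.EROFS:
            # Same condition that cp reported as "Read-only file system".
            raise RuntimeError(
                f"{os.path.dirname(dst) or '.'} is on a read-only filesystem; "
                "check whether the pod's SCC sets readOnlyRootFilesystem"
            ) from exc
        raise


if __name__ == "__main__":
    # Demo against a scratch directory, which is writable on a normal host.
    with tempfile.TemporaryDirectory() as scratch:
        src = os.path.join(scratch, "system-bundle.pem")
        dst = os.path.join(scratch, "tls-ca-bundle.pem")
        with open(src, "w") as handle:
            handle.write("-----BEGIN CERTIFICATE-----\n")
        print("read-only:", is_read_only_mount(scratch))
        copy_bundle(src, dst)
        print("copied to", dst)
```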

I attached the output of "oc describe pod apiserver-658846ccbd-8pqrb -n openshift-apiserver".

Comment 1 Stefan Schimanski 2021-03-25 08:40:35 UTC
Is this connected to https://bugzilla.redhat.com/show_bug.cgi?id=1942725, i.e. StackRox installation?

Comment 2 Romain P 2021-03-25 15:35:24 UTC
(In reply to Stefan Schimanski from comment #1)
> Is this connected to https://bugzilla.redhat.com/show_bug.cgi?id=1942725,
> i.e. StackRox installation?

Hi Stefan,
I didn't install StackRox, but yes, it was related to a wrong SCC being applied in openshift-apiserver, as described in your link.

The failing pod was annotated with this SCC: openshift.io/scc: logging-elk-filebeat-ds
whereas the working pods were annotated with: openshift.io/scc: node-exporter
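
The comparison above reads the openshift.io/scc annotation, which SCC admission records on each pod. It can be scripted against the output of `oc get pods -n openshift-apiserver -o json`. The Python sketch below flags pods that were admitted under a different SCC from their siblings. The function names and the trimmed sample JSON are hypothetical, and the annotation values are the ones quoted in this comment.

```python
from __future__ import annotations

import json
from collections import Counter

SCC_ANNOTATION = "openshift.io/scc"


def scc_by_pod(pod_list_json: str) -> dict[str, str]:
    """Map pod name -> value of its openshift.io/scc annotation."""
    data = json.loads(pod_list_json)
    result: dict[str, str] = {}
    for item in data.get("items", []):
        meta = item.get("metadata", {})
        annotations = meta.get("annotations") or {}
        result[meta.get("name", "<unnamed>")] = annotations.get(SCC_ANNOTATION, "<none>")
    return result


def outliers(sccs: dict[str, str]) -> dict[str, str]:
    """Pods whose SCC differs from the one most of their siblings got."""
    if not sccs:
        return {}
    usual, _count = Counter(sccs.values()).most_common(1)[0]
    return {pod: scc for pod, scc in sccs.items() if scc != usual}


def _pod(name: str, scc: str) -> dict:
    return {"metadata": {"name": name, "annotations": {SCC_ANNOTATION: scc}}}


# Trimmed-down stand-in for the real JSON, using the SCC values from this report.
SAMPLE = json.dumps({
    "kind": "List",
    "items": [
        _pod("apiserver-575f7f5c59-l4jbr", "node-exporter"),
        _pod("apiserver-575f7f5c59-qz6wc", "node-exporter"),
        _pod("apiserver-658846ccbd-8pqrb", "logging-elk-filebeat-ds"),
    ],
})

if __name__ == "__main__":
    for pod, scc in outliers(scc_by_pod(SAMPLE)).items():
        print(f"{pod} was admitted under SCC {scc}")
```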

I really don't know why this SCC was applied to this pod.
We were able to work around this manually and complete the upgrade to 4.7.2.

Unfortunately, I don't have another cluster on which to retry the 4.7.1 => 4.7.2 update and try to reproduce the issue.

Comment 3 Standa Laznicka 2021-04-15 11:45:50 UTC
I can see that this has the same symptoms as the referenced StackRox BZ (a read-only root filesystem for the openshift-apiserver pod). Closing this as a duplicate, since the fix is the same in both cases.
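
Both bugs come down to a read-only root filesystem in the openshift-apiserver pod. One way to look for candidate SCCs is to list the ones that set readOnlyRootFilesystem: true, ranked by priority as OpenShift does when several SCCs could admit a pod: highest priority first, with an unset priority treated as 0. The Python sketch below works on `oc get scc -o json` output. The function name and sample SCCs are hypothetical. It also leaves out the restrictiveness comparison that OpenShift applies between the priority and name keys, so here ties are broken by name only.

```python
from __future__ import annotations

import json


def read_only_sccs(scc_list_json: str) -> list[tuple[int, str]]:
    """Return (priority, name) for every SCC with readOnlyRootFilesystem: true.

    Ordered highest priority first (an unset priority counts as 0), then by
    name. OpenShift also compares restrictiveness between those two keys;
    that step is deliberately left out of this sketch.
    """
    data = json.loads(scc_list_json)
    found = []
    for scc in data.get("items", []):
        if scc.get("readOnlyRootFilesystem"):
            found.append((scc.get("priority") or 0, scc["metadata"]["name"]))
    return sorted(found, key=lambda entry: (-entry[0], entry[1]))


# Hypothetical SCCs; these names and values are illustrative, not from this cluster.
SAMPLE = json.dumps({
    "kind": "List",
    "items": [
        {"metadata": {"name": "example-ro-high"}, "priority": 10,
         "readOnlyRootFilesystem": True},
        {"metadata": {"name": "example-ro-default"}, "priority": None,
         "readOnlyRootFilesystem": True},
        {"metadata": {"name": "example-writable"}, "priority": 20,
         "readOnlyRootFilesystem": False},
    ],
})

if __name__ == "__main__":
    for priority, name in read_only_sccs(SAMPLE):
        print(f"{name}: priority {priority}, readOnlyRootFilesystem=true")
```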

*** This bug has been marked as a duplicate of bug 1942725 ***

