Bug 1847368 - API server crash loop on ARO
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 4.3.z
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.5.0
Assignee: Alberto
QA Contact: Ke Wang
URL:
Whiteboard:
Depends On: 1847185
Blocks: 1847419
Reported: 2020-06-16 09:47 UTC by Alberto
Modified: 2020-07-13 17:44 UTC (History)
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1847185
Cloned To: 1847419
Environment:
Last Closed: 2020-07-13 17:44:18 UTC
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Github kubernetes kubernetes pull 92166 None closed fix: GetLabelsForVolume panic issue for azure disk PV 2020-08-07 09:05:16 UTC
Github openshift origin pull 25158 None closed Bug 1847368: [release-4.5]: UPSTREAM: 92166: fix: GetLabelsForVolume panic issue for azure disk PV 2020-08-07 09:05:16 UTC
Red Hat Product Errata RHBA-2020:2409 None None None 2020-07-13 17:44:40 UTC

Comment 1 Stefan Schimanski 2020-06-16 13:54:25 UTC
This is probably https://github.com/kubernetes/kubernetes/pull/92166.
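The linked upstream PR fixes a panic in GetLabelsForVolume for azure disk PVs, which is consistent with this crash loop: a managed disk provisioned in a non-zoned region carries no zone information, and dereferencing it unguarded panics the apiserver. A minimal sketch of the guard pattern (the `disk` type and `labelsForDisk` helper here are hypothetical simplifications, not the actual upstream code):

```go
package main

import "fmt"

// disk models the zone metadata of an azure managed disk; zones is
// empty for a disk provisioned with zoned: "false" (hypothetical type).
type disk struct {
	zones []string
}

// labelsForDisk returns topology labels for a disk, returning an empty
// map when no zone exists instead of indexing into an empty slice
// (which is the kind of unguarded access that panics).
func labelsForDisk(d *disk) map[string]string {
	labels := map[string]string{}
	if d == nil || len(d.zones) == 0 {
		return labels // non-zoned disk: no failure-domain labels
	}
	labels["failure-domain.beta.kubernetes.io/zone"] = d.zones[0]
	return labels
}

func main() {
	fmt.Println(len(labelsForDisk(&disk{})))                                                       // non-zoned: 0 labels
	fmt.Println(labelsForDisk(&disk{zones: []string{"eastus-1"}})["failure-domain.beta.kubernetes.io/zone"]) // zoned: eastus-1
}
```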

Comment 2 Stefan Schimanski 2020-06-17 10:11:16 UTC
Talked to Alberto (https://coreos.slack.com/archives/CB48XQ4KZ/p1592388418137000?thread_ts=1592315769.107100&cid=CB48XQ4KZ). This is not a regression in 4.5; it was preexisting. Moving it off the blocker list.

Comment 5 Ke Wang 2020-06-23 10:18:57 UTC
Verified with OCP build 4.5.0-0.nightly-2020-06-23-052343; steps are below.

- Create a StorageClass and PVC in a non-zoned region:
$ cat sc-non-zoned.yaml 
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
  labels:
    kubernetes.io/cluster-service: "true"
  name: managed-premium-nonzoned
parameters:
  kind: Managed
  storageaccounttype: Premium_LRS
  zoned: "false"
provisioner: kubernetes.io/azure-disk
volumeBindingMode: WaitForFirstConsumer

$ oc apply -f sc-non-zoned.yaml
storageclass.storage.k8s.io/managed-premium-nonzoned created

$ oc get sc
NAME                        PROVISIONER                RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
managed-premium (default)   kubernetes.io/azure-disk   Delete          WaitForFirstConsumer   true                   76m
managed-premium-nonzoned    kubernetes.io/azure-disk   Delete          WaitForFirstConsumer   false                  36m

$ cat pvc-nonzoned.yaml 
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: azure-managed-non
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: managed-premium-nonzoned
  resources:
    requests:
      storage: 5Gi


$ oc apply -f pvc-nonzoned.yaml
persistentvolumeclaim/azure-managed-non created

$ oc get pvc
NAME                STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS               AGE
azure-managed-non   Pending                                      managed-premium-nonzoned   76s

$ cat mypod-non-zoned.yaml 
kind: Pod
apiVersion: v1
metadata:
  name: mypod
spec:
  containers:
  - name: mypod
    image: nginx:1.15.5
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 250m
        memory: 256Mi
    volumeMounts:
    - mountPath: "/mnt/azure"
      name: volume
  volumes:
    - name: volume
      persistentVolumeClaim:
        claimName: azure-managed-non

$ oc create -f mypod-non-zoned.yaml 
pod/mypod created



Check the created pod's status:
$ oc get pod/mypod
NAME    READY   STATUS    RESTARTS   AGE
mypod   1/1     Running   0          27s

$ oc describe pod/mypod
Name:         mypod
Namespace:    default
...
Status:       Running
...
Volumes:
  volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  azure-managed-non
    ReadOnly:   false
  default-token-rjg64:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-rjg64
    Optional:    false
...

- Create a PVC in a zoned region. A default zoned StorageClass already exists, so there is no need to create a new one:
$ oc get sc
NAME                        PROVISIONER                RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
managed-premium (default)   kubernetes.io/azure-disk   Delete          WaitForFirstConsumer   true                   76m

$ cat pvc-zoned.yaml 
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: azure-managed-disk
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: managed-premium
  resources:
    requests:
      storage: 5Gi
      

$ oc apply -f pvc-zoned.yaml 
persistentvolumeclaim/azure-managed-disk created

$ oc get pvc
NAME                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS               AGE
azure-managed-disk   Bound    pvc-f86422fb-576f-455a-98d4-f864b4a7bf6f   5Gi        RWO            managed-premium            17m
...

$ cat mypod-zoned.yaml 
kind: Pod
apiVersion: v1
metadata:
  name: mypod1
spec:
  containers:
  - name: mypod1
    image: nginx:1.15.5
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 250m
        memory: 256Mi
    volumeMounts:
    - mountPath: "/mnt/azure"
      name: volume
  volumes:
    - name: volume
      persistentVolumeClaim:
        claimName: azure-managed-disk
        
$ oc apply -f mypod-zoned.yaml 
pod/mypod1 created

$ oc get pods
NAME     READY   STATUS    RESTARTS   AGE
mypod    1/1     Running   0          35m
mypod1   1/1     Running   0          19m

$ oc get pods -A | grep -E 'apiserver|NAME' | grep -vE 'installer|revision|catalog'
NAMESPACE                                          NAME                                                         READY   STATUS      RESTARTS   AGE
openshift-apiserver-operator                       openshift-apiserver-operator-9c88c497-gwtpm                  1/1     Running     2          91m
openshift-apiserver                                apiserver-dfd78fb66-9f9bf                                    1/1     Running     0          80m
openshift-apiserver                                apiserver-dfd78fb66-cdz5c                                    1/1     Running     0          79m
openshift-apiserver                                apiserver-dfd78fb66-ntxgv                                    1/1     Running     0          81m
openshift-kube-apiserver-operator                  kube-apiserver-operator-6fc9948f46-sxqhb                     1/1     Running     2          91m
openshift-kube-apiserver                           kube-apiserver-kewang23azure51-5whj2-master-0                4/4     Running     0          66m
openshift-kube-apiserver                           kube-apiserver-kewang23azure51-5whj2-master-1                4/4     Running     0          70m
openshift-kube-apiserver                           kube-apiserver-kewang23azure51-5whj2-master-2                4/4     Running     0          69m

From the above test results, the kube-apiservers do not crash regardless of whether the sc and pvc are created in a zoned or non-zoned region, so moving the bug to VERIFIED.
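The restart check above can also be scripted. A small sketch (the `check_restarts` helper is hypothetical) that scans `oc get pods -A`-style output and flags any kube-apiserver pod whose RESTARTS column is non-zero; the sample here is fed from a here-doc copied from the listing above rather than a live cluster:

```shell
# Hypothetical helper: read pod listing lines on stdin, flag any
# kube-apiserver pod (column 2) with a non-zero RESTARTS count (column 5).
check_restarts() {
  awk '$2 ~ /^kube-apiserver-/ && $5 != "0" { bad=1; print $2, "restarted", $5, "times" }
       END { if (!bad) print "no kube-apiserver restarts" }'
}

# Sample input taken from the verification run above
# (in a live cluster this would be: oc get pods -A | check_restarts).
check_restarts <<'EOF'
NAMESPACE NAME READY STATUS RESTARTS AGE
openshift-kube-apiserver kube-apiserver-master-0 4/4 Running 0 66m
openshift-kube-apiserver kube-apiserver-master-1 4/4 Running 0 70m
openshift-kube-apiserver kube-apiserver-master-2 4/4 Running 0 69m
EOF
```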

Comment 6 errata-xmlrpc 2020-07-13 17:44:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

