Bug 1869479 - [OCP v46] The Compliance Operator goes in a Pending state on vSphere cluster due to thin provision failure
Summary: [OCP v46] The Compliance Operator goes in a Pending state on vSphere cluster ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Compliance Operator
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.6.0
Assignee: Juan Antonio Osorio
QA Contact: Prashant Dhamdhere
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-08-18 05:04 UTC by Prashant Dhamdhere
Modified: 2020-10-27 16:28 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 16:28:36 UTC
Target Upstream Version:
Embargoed:




Links:
  GitHub: openshift compliance-operator pull 404 (closed) - Make access modes and storage class configurable for PVs (last updated 2020-10-08 11:34:55 UTC)
  Red Hat Product Errata: RHBA-2020:4196 (last updated 2020-10-27 16:28:53 UTC)

Description Prashant Dhamdhere 2020-08-18 05:04:48 UTC
Description of problem: 

The Compliance Operator scan gets stuck in a Pending state on the vSphere cluster because PersistentVolume provisioning fails with 'thin', the default storage class on vSphere.

$ oc get pods 
NAME                                               READY   STATUS     RESTARTS   AGE 
compliance-operator-869646dd4f-dnqjv               1/1     Running    0          16m 
compliance-operator-869646dd4f-f7lv2               1/1     Running    0          16m 
compliance-operator-869646dd4f-l657h               1/1     Running    0          16m 
ocp4-pp-f478f5897-2tgxq                            1/1     Running    0          14m 
rhcos4-pp-d8ddb8fdc-lzfr6                          1/1     Running    0          14m 
worker-scan-jimaqeci-7287-gqntb-worker-dj8gj-pod   1/2     NotReady   0          10m 
worker-scan-jimaqeci-7287-gqntb-worker-hmnpd-pod   1/2     NotReady   0          10m 
worker-scan-rs-66564cb46-nnchw                     0/1     Pending    0          10m   <<----- 

$ oc describe pod worker-scan-rs-66564cb46-nnchw |tail -2 
  Warning  FailedScheduling  <unknown>        0/5 nodes are available: 5 pod has unbound immediate PersistentVolumeClaims. 
  Warning  FailedScheduling  <unknown>        0/5 nodes are available: 5 pod has unbound immediate PersistentVolumeClaims. 


$ oc get pvc 
NAME          STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE 
worker-scan   Pending                                      thin           11m 


$ oc describe pvc worker-scan 
Name:          worker-scan 
Namespace:     openshift-compliance 
StorageClass:  thin 
Status:        Pending             <<----- 
Volume:         
Labels:        complianceScan=worker-scan 
Annotations:   volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/vsphere-volume 
Finalizers:    [kubernetes.io/pvc-protection] 
Capacity:       
Access Modes:   
VolumeMode:    Filesystem 
Mounted By:    worker-scan-rs-66564cb46-nnchw 
Events:        <none> 


$ oc get sc 
NAME             PROVISIONER                    RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE 
thin (default)   kubernetes.io/vsphere-volume   Delete          Immediate           false                  76m 

$ oc describe sc thin 
Name:                  thin 
IsDefaultClass:        Yes 
Annotations:           storageclass.kubernetes.io/is-default-class=true 
Provisioner:           kubernetes.io/vsphere-volume 
Parameters:            diskformat=thin 
AllowVolumeExpansion:  <unset> 
MountOptions:          <none> 
ReclaimPolicy:         Delete 
VolumeBindingMode:     Immediate 
Events:                <none> 


Version-Release number of selected component (if applicable): 

4.6.0-0.nightly-2020-08-17-225638 

How reproducible: 

Always 

Steps to Reproduce: 

1. Clone the compliance-operator git repo

$ git clone https://github.com/openshift/compliance-operator.git  

2. Create the 'openshift-compliance' namespace

$ oc create -f compliance-operator/deploy/ns.yaml    

3. Switch to the 'openshift-compliance' namespace

$ oc project openshift-compliance  

4. Deploy the CustomResourceDefinitions.

$ for f in $(ls -1 compliance-operator/deploy/crds/*crd.yaml); do oc create -f $f; done  

5. Deploy compliance-operator.  

$ oc create -f compliance-operator/deploy/  

6. Deploy ComplianceSuite CR to perform a scan 

$ oc create -f - <<EOF 
apiVersion: compliance.openshift.io/v1alpha1 
kind: ComplianceSuite 
metadata: 
  name: example-compliancesuite 
spec: 
  autoApplyRemediations: false 
  schedule: "0 1 * * *" 
  scans: 
    - name: worker-scan 
      profile: xccdf_org.ssgproject.content_profile_moderate 
      content: ssg-rhcos4-ds.xml 
      contentImage: quay.io/complianceascode/ocp4:latest 
      debug: true 
      nodeSelector: 
        node-role.kubernetes.io/worker: "" 
EOF 

7. Monitor the pod status; the worker-scan-rs pod goes into a Pending state.

$ oc get pods -w 

8. Describe the PVC and check its status; it shows Pending.

$ oc get pvc 
$ oc describe pvc worker-scan 
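
Optionally, the provisioning events for the claim can be listed directly; this assumes the PVC name worker-scan from the scan above.

$ oc get events -n openshift-compliance --field-selector involvedObject.name=worker-scan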


Actual results: 

The worker-scan-rs pod and its PersistentVolumeClaim remain in a Pending state on the vSphere cluster because thin provisioning fails, and the scan gets stuck.

Expected results: 

The scan should not get stuck in a Pending state on the vSphere cluster. In addition, the storage class and access modes used to store raw scan results should be configurable in the Compliance Operator.
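
A minimal sketch of the kind of scan-spec stanza that would address this, matching the rawResultStorage fields verified later in comment 5 (the 'gold' storage class name is only an example):

  rawResultStorage:
    storageClassName: gold      # any StorageClass available in the cluster
    pvAccessModes:
      - ReadWriteOnce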

Additional info: 

The Compliance Operator scan goes into the Pending state on the vSphere cluster due to BZ 1863009.

Comment 1 Juan Antonio Osorio 2020-08-18 07:04:05 UTC
Addressing here: https://github.com/openshift/compliance-operator/pull/404

Comment 4 Prashant Dhamdhere 2020-09-09 14:11:36 UTC
This looks good. Now the Compliance Operator scan does not go into a Pending state on the vSphere cluster, and the
persistent volume gets created with the default 'thin' vSphere storage class without an issue.

Verified on:

OCP version: 4.6.0-0.nightly-2020-09-08-123737
Compliance Operator : v0.1.15

$ oc get sc
NAME             PROVISIONER                    RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
thin (default)   kubernetes.io/vsphere-volume   Delete          Immediate           false                  5h6m

$ oc describe sc thin
Name:                  thin
IsDefaultClass:        Yes
Annotations:           storageclass.kubernetes.io/is-default-class=true
Provisioner:           kubernetes.io/vsphere-volume
Parameters:            diskformat=thin
AllowVolumeExpansion:  <unset>
MountOptions:          <none>
ReclaimPolicy:         Delete
VolumeBindingMode:     Immediate
Events:                <none>

$ oc get pods
NAME                                                 READY   STATUS    RESTARTS   AGE
compliance-operator-869646dd4f-vlvbj                 1/1     Running   0          8m25s
ocp4-pp-6786c5f5b-vf48p                              1/1     Running   0          7m21s
rhcos4-pp-78c8cc9d44-666wc                           1/1     Running   0          7m22s
worker-scan-rs-5fcd885449-q4jtk                      1/1     Running   0          45s   <<----- [running]
worker-scan-vs-miyadav-0909-rpkms-worker-rrq7s-pod   2/2     Running   0          45s

$ oc get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                              STORAGECLASS   REASON   AGE
pvc-f8fb177e-9478-4cdc-8f31-af1d2716e4e2   1Gi        RWO            Delete           Bound    openshift-compliance/worker-scan   thin                    66s

$ oc get pvc
NAME          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
worker-scan   Bound    pvc-f8fb177e-9478-4cdc-8f31-af1d2716e4e2   1Gi        RWO            thin           73s

$  oc describe pvc worker-scan
Name:          worker-scan
Namespace:     openshift-compliance
StorageClass:  thin
Status:        Bound
Volume:        pvc-f8fb177e-9478-4cdc-8f31-af1d2716e4e2
Labels:        compliance.openshift.io/scan-name=worker-scan
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/vsphere-volume
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      1Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Mounted By:    worker-scan-rs-5fcd885449-q4jtk
Events:
  Type    Reason                 Age   From                         Message
  ----    ------                 ----  ----                         -------
  Normal  ProvisioningSucceeded  96s   persistentvolume-controller  Successfully provisioned volume pvc-f8fb177e-9478-4cdc-8f31-af1d2716e4e2 using kubernetes.io/vsphere-volume


$ oc get pods 
NAME                                                 READY   STATUS      RESTARTS   AGE
aggregator-pod-worker-scan                           0/1     Completed   0          86s
compliance-operator-869646dd4f-vlvbj                 1/1     Running     0          15m
ocp4-pp-6786c5f5b-vf48p                              1/1     Running     0          14m
rhcos4-pp-78c8cc9d44-666wc                           1/1     Running     0          14m
worker-scan-vs-miyadav-0909-rpkms-worker-rrq7s-pod   0/2     Completed   0          7m46s

$ oc get compliancesuite
NAME                      PHASE   RESULT
example-compliancesuite   DONE    NON-COMPLIANT

Comment 5 Prashant Dhamdhere 2020-09-09 14:25:26 UTC
Also, I have verified that the storage class and access modes are configurable for PVs through both the ComplianceSuite
and ComplianceScan CRs.


Verified on:

OCP version: 4.6.0-0.nightly-2020-09-08-123737
Compliance Operator: v0.1.15

ComplianceSuite :

$ oc create -f - <<EOF 
> kind: StorageClass
> apiVersion: storage.k8s.io/v1
> metadata:
>   name: gold
> provisioner: kubernetes.io/aws-ebs
> reclaimPolicy: Delete
> volumeBindingMode: WaitForFirstConsumer
> parameters:
>   type: gp2 
> EOF
storageclass.storage.k8s.io/gold created

$ oc get sc
NAME            PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
gold            kubernetes.io/aws-ebs   Delete          WaitForFirstConsumer   false                  5s
gp2 (default)   kubernetes.io/aws-ebs   Delete          WaitForFirstConsumer   true                   10h
gp2-csi         ebs.csi.aws.com         Delete          WaitForFirstConsumer   true                   10h

$ oc create -f - <<EOF
> apiVersion: compliance.openshift.io/v1alpha1
> kind: ComplianceSuite
> metadata:
>   name: example-compliancesuite
> spec:
>   autoApplyRemediations: false
>   schedule: "0 1 * * *"
>   scans:
>     - name: worker-scan
>       profile: xccdf_org.ssgproject.content_profile_moderate
>       content: ssg-rhcos4-ds.xml
>       contentImage: quay.io/complianceascode/ocp4:latest
>       rule: "xccdf_org.ssgproject.content_rule_no_netrc_files"
>       debug: true
>       nodeSelector:
>         node-role.kubernetes.io/worker: ""
>       rawResultStorage:
>        storageClassName: gold
>        pvAccessModes:
>            - ReadWriteOnce
> EOF
compliancesuite.compliance.openshift.io/example-compliancesuite created


$ oc get pods 
NAME                                                         READY   STATUS      RESTARTS   AGE
aggregator-pod-worker-scan                                   0/1     Completed   0          23s
compliance-operator-869646dd4f-ls7nx                         1/1     Running     0          3h45m
ocp4-pp-6786c5f5b-dr6lf                                      1/1     Running     0          3h44m
rhcos4-pp-78c8cc9d44-76rcd                                   1/1     Running     0          3h44m
worker-scan-ip-10-0-156-82.us-east-2.compute.internal-pod    0/2     Completed   0          63s
worker-scan-ip-10-0-177-248.us-east-2.compute.internal-pod   0/2     Completed   0          63s
worker-scan-ip-10-0-223-164.us-east-2.compute.internal-pod   0/2     Completed   0          64s
worker-scan-rs-5fcd885449-mrdwp                              1/1     Running     0          64s  <<---- [running]

$ oc get compliancesuite
NAME                      PHASE   RESULT
example-compliancesuite   DONE    COMPLIANT

$ oc get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                              STORAGECLASS   REASON   AGE
pvc-21c12275-752c-4e6e-bd51-9608c9053b28   1Gi        RWO            Delete           Bound    openshift-compliance/worker-scan   gold                    30s

$ oc get pvc
NAME          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
worker-scan   Bound    pvc-21c12275-752c-4e6e-bd51-9608c9053b28   1Gi        RWO            gold           41s


$ oc describe pvc worker-scan
Name:          worker-scan
Namespace:     openshift-compliance
StorageClass:  gold       <<----
Status:        Bound
Volume:        pvc-21c12275-752c-4e6e-bd51-9608c9053b28
Labels:        compliance.openshift.io/scan-name=worker-scan
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/aws-ebs
               volume.kubernetes.io/selected-node: ip-10-0-177-248.us-east-2.compute.internal
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      1Gi
Access Modes:  RWO   <<----
VolumeMode:    Filesystem
Mounted By:    <none>
Events:
  Type    Reason                 Age   From                         Message
  ----    ------                 ----  ----                         -------
  Normal  WaitForFirstConsumer   90s   persistentvolume-controller  waiting for first consumer to be created before binding
  Normal  ProvisioningSucceeded  84s   persistentvolume-controller  Successfully provisioned volume pvc-21c12275-752c-4e6e-bd51-9608c9053b28 using kubernetes.io/aws-ebs


ComplianceScan:


$ oc create -f - <<EOF
> apiVersion: compliance.openshift.io/v1alpha1
> kind: ComplianceScan
> metadata:
>   name: master-scan
> spec:
>   profile: xccdf_org.ssgproject.content_profile_e8
>   content: ssg-rhcos4-ds.xml
>   contentImage: quay.io/complianceascode/ocp4:latest
>   rule: xccdf_org.ssgproject.content_rule_accounts_no_uid_except_zero
>   debug: true
>   nodeSelector:
>     node-role.kubernetes.io/master: ""
>   rawResultStorage:
>     storageClassName: gold
>     pvAccessModes:
>       - ReadWriteOnce
> EOF
compliancescan.compliance.openshift.io/master-scan created


$ oc get pods
NAME                                                         READY   STATUS      RESTARTS   AGE
aggregator-pod-master-scan                                   0/1     Completed   0          55s
aggregator-pod-worker-scan                                   0/1     Completed   0          24m
compliance-operator-869646dd4f-ls7nx                         1/1     Running     0          4h9m
master-scan-ip-10-0-159-106.us-east-2.compute.internal-pod   0/2     Completed   0          115s
master-scan-ip-10-0-168-192.us-east-2.compute.internal-pod   0/2     Completed   0          115s
master-scan-ip-10-0-214-130.us-east-2.compute.internal-pod   0/2     Completed   0          115s
ocp4-pp-6786c5f5b-dr6lf                                      1/1     Running     0          4h8m
rhcos4-pp-78c8cc9d44-76rcd                                   1/1     Running     0          4h8m
worker-scan-ip-10-0-156-82.us-east-2.compute.internal-pod    0/2     Completed   0          24m
worker-scan-ip-10-0-177-248.us-east-2.compute.internal-pod   0/2     Completed   0          24m
worker-scan-ip-10-0-223-164.us-east-2.compute.internal-pod   0/2     Completed   0          24m


$ oc get compliancescan
NAME          PHASE   RESULT
master-scan   DONE    COMPLIANT
worker-scan   DONE    COMPLIANT

$ oc get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                              STORAGECLASS   REASON   AGE
pvc-21c12275-752c-4e6e-bd51-9608c9053b28   1Gi        RWO            Delete           Bound    openshift-compliance/worker-scan   gold                    23m
pvc-fdc3b24c-3133-49a0-b0bd-21f5f4bb41dc   1Gi        RWO            Delete           Bound    openshift-compliance/master-scan   gold                    27s  <<----

$ oc get pvc
NAME          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
master-scan   Bound    pvc-fdc3b24c-3133-49a0-b0bd-21f5f4bb41dc   1Gi        RWO            gold           43s <<----
worker-scan   Bound    pvc-21c12275-752c-4e6e-bd51-9608c9053b28   1Gi        RWO            gold           23m


$ oc describe pvc master-scan
Name:          master-scan
Namespace:     openshift-compliance
StorageClass:  gold      <<-----
Status:        Bound
Volume:        pvc-fdc3b24c-3133-49a0-b0bd-21f5f4bb41dc
Labels:        compliance.openshift.io/scan-name=master-scan
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/aws-ebs
               volume.kubernetes.io/selected-node: ip-10-0-177-248.us-east-2.compute.internal
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      1Gi
Access Modes:  RWO  <<-----
VolumeMode:    Filesystem
Mounted By:    <none>
Events:
  Type    Reason                 Age    From                         Message
  ----    ------                 ----   ----                         -------
  Normal  WaitForFirstConsumer   3m42s  persistentvolume-controller  waiting for first consumer to be created before binding
  Normal  ProvisioningSucceeded  3m37s  persistentvolume-controller  Successfully provisioned volume pvc-fdc3b24c-3133-49a0-b0bd-21f5f4bb41dc using kubernetes.io/aws-ebs

Comment 7 errata-xmlrpc 2020-10-27 16:28:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

