Bug 1952344 - OCS 4.8: v4.8.0-359 - storagecluster is in progressing state
Summary: OCS 4.8: v4.8.0-359 - storagecluster is in progressing state
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: Multi-Cloud Object Gateway
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: OCS 4.8.0
Assignee: Danny
QA Contact: Vijay Avuthu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-04-22 06:06 UTC by Vijay Avuthu
Modified: 2021-08-03 18:16 UTC
CC List: 6 users

Fixed In Version: 4.8.0-361.ci
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-08-03 18:15:57 UTC
Embargoed:




Links
System ID                                 Private  Priority  Status  Summary                                     Last Updated
Github noobaa/noobaa-operator pull 618    0        None      open    changed noobaa-endpoint security context    2021-04-22 07:35:58 UTC
Red Hat Product Errata RHBA-2021:3003     0        None      None    None                                        2021-08-03 18:16:21 UTC

Description Vijay Avuthu 2021-04-22 06:06:46 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

The OCS storage cluster is stuck in the Progressing state.

$ oc get csv
NAME                         DISPLAY                       VERSION        REPLACES   PHASE
ocs-operator.v4.8.0-359.ci   OpenShift Container Storage   4.8.0-359.ci              Succeeded
$ 
$ oc get storagecluster
NAME                 AGE   PHASE         EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   36m   Progressing              2021-04-22T05:10:13Z   4.8.0
$ 


Version of all relevant components (if applicable):

openshift installer (4.8.0-0.nightly-2021-04-21-201409)
quay.io/rhceph-dev/ocs-registry:4.8.0-359.ci

Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)?


Is there any workaround available to the best of your knowledge?
NA

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
1/1

Can this issue be reproduced from the UI?
Not tried

If this is a regression, please provide more details to justify this:
Yes

Steps to Reproduce:
1. Install OCS using ocs-ci.
2. Verify the storage cluster is in the Ready state (see the sketch after these steps).
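
A minimal sketch of the verification in step 2, assuming the default openshift-storage namespace and the ocs-storagecluster name (both match the output below; the exact check ocs-ci performs is not shown in this report):

# Check the StorageCluster phase; it should settle on "Ready".
$ oc get storagecluster ocs-storagecluster -n openshift-storage -o jsonpath='{.status.phase}{"\n"}'
# Optionally wait for the Available condition (timeout value is arbitrary here):
$ oc wait storagecluster/ocs-storagecluster -n openshift-storage --for=condition=Available --timeout=15m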


Actual results:

The storage cluster is in the Progressing state.

Expected results:

The storage cluster should reach the Ready phase.

Additional info:

$ oc describe storagecluster ocs-storagecluster
Name:         ocs-storagecluster
Namespace:    openshift-storage
Labels:       <none>
Annotations:  uninstall.ocs.openshift.io/cleanup-policy: delete
              uninstall.ocs.openshift.io/mode: graceful
API Version:  ocs.openshift.io/v1
Kind:         StorageCluster
Metadata:
  Creation Timestamp:  2021-04-22T05:10:13Z
  Finalizers:
    storagecluster.ocs.openshift.io
  Generation:  2
  Managed Fields:
    API Version:  ocs.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        .:
        f:resources:
          .:
          f:mds:
          f:mgr:
          f:mon:
          f:noobaa-core:
          f:noobaa-db:
          f:noobaa-endpoint:
            .:
            f:limits:
              .:
              f:memory:
            f:requests:
              .:
              f:memory:
          f:rgw:
    Manager:      kubectl-create
    Operation:    Update
    Time:         2021-04-22T05:10:13Z
    API Version:  ocs.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:uninstall.ocs.openshift.io/cleanup-policy:
          f:uninstall.ocs.openshift.io/mode:
        f:finalizers:
          .:
          v:"storagecluster.ocs.openshift.io":
      f:spec:
        f:arbiter:
        f:encryption:
          .:
          f:kms:
        f:externalStorage:
        f:managedResources:
          .:
          f:cephBlockPools:
          f:cephConfig:
          f:cephDashboard:
          f:cephFilesystems:
          f:cephObjectStoreUsers:
          f:cephObjectStores:
        f:resources:
          f:noobaa-endpoint:
            f:limits:
              f:cpu:
            f:requests:
              f:cpu:
        f:storageDeviceSets:
        f:version:
      f:status:
        .:
        f:conditions:
        f:failureDomain:
        f:failureDomainKey:
        f:failureDomainValues:
        f:images:
          .:
          f:ceph:
            .:
            f:actualImage:
            f:desiredImage:
          f:noobaaCore:
            .:
            f:actualImage:
            f:desiredImage:
          f:noobaaDB:
            .:
            f:actualImage:
            f:desiredImage:
        f:nodeTopologies:
          .:
          f:labels:
            .:
            f:kubernetes.io/hostname:
            f:topology.rook.io/rack:
        f:phase:
        f:relatedObjects:
    Manager:         ocs-operator
    Operation:       Update
    Time:            2021-04-22T05:12:29Z
  Resource Version:  74451
  UID:               6ccc75ea-2b85-4770-82aa-b0f2ede6d5b3
Spec:
  Arbiter:
  Encryption:
    Kms:
  External Storage:
  Managed Resources:
    Ceph Block Pools:
    Ceph Config:
    Ceph Dashboard:
    Ceph Filesystems:
    Ceph Object Store Users:
    Ceph Object Stores:
  Resources:
    Mds:
    Mgr:
    Mon:
    Noobaa - Core:
    Noobaa - Db:
    Noobaa - Endpoint:
      Limits:
        Cpu:     1
        Memory:  500Mi
      Requests:
        Cpu:     1
        Memory:  500Mi
    Rgw:
  Storage Device Sets:
    Config:
    Count:  1
    Data PVC Template:
      Metadata:
      Spec:
        Access Modes:
          ReadWriteOnce
        Resources:
          Requests:
            Storage:         100Gi
        Storage Class Name:  thin
        Volume Mode:         Block
      Status:
    Name:  ocs-deviceset
    Placement:
    Portable:  true
    Prepare Placement:
    Replica:  3
    Resources:
  Version:  4.8.0
Status:
  Conditions:
    Last Heartbeat Time:   2021-04-22T05:51:12Z
    Last Transition Time:  2021-04-22T05:10:15Z
    Message:               Reconcile completed successfully
    Reason:                ReconcileCompleted
    Status:                True
    Type:                  ReconcileComplete
    Last Heartbeat Time:   2021-04-22T05:13:31Z
    Last Transition Time:  2021-04-22T05:10:16Z
    Message:               Reconcile completed successfully
    Reason:                ReconcileCompleted
    Status:                True
    Type:                  Available
    Last Heartbeat Time:   2021-04-22T05:51:12Z
    Last Transition Time:  2021-04-22T05:13:32Z
    Message:               Waiting on Nooba instance to finish initialization
    Reason:                NoobaaInitializing
    Status:                True
    Type:                  Progressing
    Last Heartbeat Time:   2021-04-22T05:13:31Z
    Last Transition Time:  2021-04-22T05:10:13Z
    Message:               Reconcile completed successfully
    Reason:                ReconcileCompleted
    Status:                False
    Type:                  Degraded
    Last Heartbeat Time:   2021-04-22T05:13:31Z
    Last Transition Time:  2021-04-22T05:10:16Z
    Message:               Reconcile completed successfully
    Reason:                ReconcileCompleted
    Status:                True
    Type:                  Upgradeable
  Failure Domain:          rack
  Failure Domain Key:      topology.rook.io/rack
  Failure Domain Values:
    rack0
    rack1
    rack2
  Images:
    Ceph:
      Actual Image:   quay.io/rhceph-dev/rhceph@sha256:d2e99edf733960244256ad82e761acffe9f09e76749bb769469b4b929b25c509
      Desired Image:  quay.io/rhceph-dev/rhceph@sha256:d2e99edf733960244256ad82e761acffe9f09e76749bb769469b4b929b25c509
    Noobaa Core:
      Actual Image:   quay.io/rhceph-dev/mcg-core@sha256:6379f378916ba2c70774bde2765b62116c3d2c998a74fcc6eedfc767d6ab052c
      Desired Image:  quay.io/rhceph-dev/rhceph@sha256:d2e99edf733960244256ad82e761acffe9f09e76749bb769469b4b929b25c509
    Noobaa DB:
      Actual Image:   registry.redhat.io/rhel8/postgresql-12@sha256:f4e5c728b644bf1888ec8086424852ed74b5596a511be29e636fb10218fc9b6f
      Desired Image:  registry.redhat.io/rhel8/postgresql-12@sha256:f4e5c728b644bf1888ec8086424852ed74b5596a511be29e636fb10218fc9b6f
  Node Topologies:
    Labels:
      kubernetes.io/hostname:
        compute-0
        compute-1
        compute-2
      topology.rook.io/rack:
        rack0
        rack1
        rack2
  Phase:  Progressing
  Related Objects:
    API Version:       noobaa.io/v1alpha1
    Kind:              NooBaa
    Name:              noobaa
    Namespace:         openshift-storage
    Resource Version:  58963
    UID:               d3f23b88-db30-4f46-b7af-2a6ce1ce933b
Events:                <none>
$ 
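
Since the only Progressing condition above has reason NoobaaInitializing and the related object is the NooBaa CR, a quick follow-up is to look at that CR and its pods directly (a sketch; the app=noobaa label selector is an assumption and is not taken from this report):

$ oc get noobaa noobaa -n openshift-storage -o jsonpath='{.status.phase}{"\n"}'
$ oc get pods -n openshift-storage -l app=noobaa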



job: https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/2145/console

Comment 2 Vijay Avuthu 2021-04-22 06:18:22 UTC
> From rook-ceph-operator-55f9f45b79-8x5pb log:

2021-04-22 05:22:41.667702 E | ceph-cluster-controller: failed to retrieve ceph cluster "ocs-storagecluster-cephcluster" in namespace "openshift-storage" to update status to &{Health:{Status:HEALTH_OK Checks:map[]} FSID:2a2c311d-a484-48ad-ba58-8a466e229ced ElectionEpoch:12 Quorum:[0 1 2] QuorumNames:[a b c] MonMap:{Epoch:3 FSID:2a2c311d-a484-48ad-ba58-8a466e229ced CreatedTime:2021-04-22 05:11:13.191104 ModifiedTime:2021-04-22 05:12:05.515110 Mons:[{Name:a Rank:0 Address:172.30.127.81:6789/0 PublicAddr:172.30.127.81:6789/0 PublicAddrs:{Addrvec:[{Type:v2 Addr:172.30.127.81:3300 Nonce:0} {Type:v1 Addr:172.30.127.81:6789 Nonce:0}]}} {Name:b Rank:1 Address:172.30.13.140:6789/0 PublicAddr:172.30.13.140:6789/0 PublicAddrs:{Addrvec:[{Type:v2 Addr:172.30.13.140:3300 Nonce:0} {Type:v1 Addr:172.30.13.140:6789 Nonce:0}]}} {Name:c Rank:2 Address:172.30.215.129:6789/0 PublicAddr:172.30.215.129:6789/0 PublicAddrs:{Addrvec:[{Type:v2 Addr:172.30.215.129:3300 Nonce:0} {Type:v1 Addr:172.30.215.129:6789 Nonce:0}]}}]} OsdMap:{OsdMap:{Epoch:61 NumOsd:3 NumUpOsd:3 NumInOsd:3 Full:false NearFull:false NumRemappedPgs:0}} PgMap:{PgsByState:[{StateName:active+clean Count:176}] Version:0 NumPgs:176 DataBytes:448340658 UsedBytes:3765305344 AvailableBytes:318357241856 TotalBytes:322122547200 ReadBps:1279 WriteBps:92812 ReadOps:2 WriteOps:1 RecoveryBps:0 RecoveryObjectsPerSec:0 RecoveryKeysPerSec:0 CacheFlushBps:0 CacheEvictBps:0 CachePromoteBps:0} MgrMap:{Epoch:11 ActiveGID:24521 ActiveName:a ActiveAddr:10.129.2.18:6801/20 Available:true Standbys:[]} Fsmap:{Epoch:11 ID:1 Up:1 In:1 Max:1 ByRank:[{FilesystemID:1 Rank:0 Name:ocs-storagecluster-cephfilesystem-a Status:up:active Gid:4392} {FilesystemID:1 Rank:0 Name:ocs-storagecluster-cephfilesystem-b Status:up:standby-replay Gid:15073}] UpStandby:0}}
E0422 05:22:42.677150       7 reflector.go:138] pkg/mod/k8s.io/client-go.0/tools/cache/reflector.go:167: Failed to watch *v1.CephObjectRealm: the server has received too many requests and has asked us to try again later (get cephobjectrealms.ceph.rook.io)
E0422 05:22:42.677150       7 reflector.go:138] pkg/mod/k8s.io/client-go.0/tools/cache/reflector.go:167: Failed to watch *v1.ConfigMap: the server has received too many requests and has asked us to try again later (get configmaps)
E0422 05:22:42.677150       7 reflector.go:138] pkg/mod/k8s.io/client-go.0/tools/cache/reflector.go:167: Failed to watch *v1.Pod: the server has received too many requests and has asked us to try again later (get pods)
E0422 05:22:42.677156       7 reflector.go:138] pkg/mod/k8s.io/client-go.0/tools/cache/reflector.go:167: Failed to watch *v1.CephNFS: the server has received too many requests and has asked us to try again later (get cephnfses.ceph.rook.io)

> must gather logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/vavuthu-ocs48/vavuthu-ocs48_20210422T040618/logs/failed_testcase_ocs_logs_1619064742/test_deployment_ocs_logs/
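
Given that the linked fix (noobaa-operator pull 618) changed the noobaa-endpoint security context, one way to compare a broken cluster with a fixed one is to dump that deployment's security context (a sketch; the noobaa-endpoint Deployment name is an assumption and does not appear in the logs above):

$ oc get deployment noobaa-endpoint -n openshift-storage -o jsonpath='{.spec.template.spec.containers[0].securityContext}{"\n"}'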

Comment 8 Vijay Avuthu 2021-06-15 11:26:47 UTC
Verified with ocs-registry:4.8.0-417.ci and deployment is successful.

Job: https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/3781/consoleFull

Comment 10 errata-xmlrpc 2021-08-03 18:15:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Container Storage 4.8.0 container images bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3003

