Description of problem (please be as detailed as possible and provide log snippets):
rook-ceph-operator is in CrashLoopBackOff after upgrading from 4.12.10 to 4.12.11.

Version of all relevant components (if applicable):
ODF 4.12.11

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Customer is concerned this may persist and affect reconcile operations.

Is there any workaround available to the best of your knowledge?
None

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
4

Is this issue reproducible?
Unknown at this time, but seen on 2 customer clusters running the same version.

Can this issue reproduce from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Question: what was the state of the cluster before the upgrade? Were the mons in quorum?
Hello,

We don't have logs from before the upgrade, but the customer stated there were no issues prior.

Current state:

  cluster:
    id:     e6d56c78-3697-4095-ba29-658a2745359e
    health: HEALTH_OK

  services:
    mon: 5 daemons, quorum a,c,d,e,f (age 5m)
    mgr: a(active, since 4d), standbys: b
    mds: 1/1 daemons up, 1 hot standby
    osd: 16 osds: 16 up (since 15h), 16 in (since 11w)
    rgw: 2 daemons active (2 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   12 pools, 449 pgs
    objects: 376.25k objects, 1.2 TiB
    usage:   4.9 TiB used, 18 TiB / 23 TiB avail
    pgs:     449 active+clean

  io:
    client:   257 KiB/s rd, 22 MiB/s wr, 54 op/s rd, 2.86k op/s wr
Malay - could you pls share the steps to verify this bug?
Steps to verify would be:
1. Set up a stretch cluster on ODF 4.12.10.
2. Upgrade to ODF 4.12.11 and observe that the rook-ceph-operator pod has gone into CLBO.
3. Upgrade to ODF 4.12.12; the rook-ceph-operator pod should now be back up and running.

@tnielsen please add if anything else needs to be checked.
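For anyone scripting the CLBO check in the steps above, here is a minimal sketch rather than an official verification tool. It assumes the operator runs in the default `openshift-storage` namespace and carries the `app=rook-ceph-operator` label, and that `oc` is logged in to the cluster; the function names are hypothetical. Adjust the namespace and selector if your deployment differs.

```python
import json
import subprocess


def crashlooping_pods(pod_list):
    """Return names of pods that have a container waiting in CrashLoopBackOff.

    `pod_list` is the parsed JSON of `oc get pods -o json` (a Kubernetes PodList).
    """
    names = []
    for pod in pod_list.get("items", []):
        statuses = pod.get("status", {}).get("containerStatuses", [])
        for cs in statuses:
            # A crash-looping container sits in state.waiting with this reason.
            waiting = cs.get("state", {}).get("waiting") or {}
            if waiting.get("reason") == "CrashLoopBackOff":
                names.append(pod["metadata"]["name"])
                break
    return names


def get_operator_pods(namespace="openshift-storage",
                      selector="app=rook-ceph-operator"):
    """Fetch the operator pod list via `oc`.

    The namespace and label selector defaults are assumptions, not taken
    from this bug report.
    """
    result = subprocess.run(
        ["oc", "get", "pods", "-n", namespace, "-l", selector, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

Calling `crashlooping_pods(get_operator_pods())` after each upgrade should return the operator pod name on 4.12.11 and an empty list once the pod is healthy again on 4.12.12.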
Another verification step could be to look at the rook operator log and see many messages such as those described in https://bugzilla.redhat.com/show_bug.cgi?id=2187952#c28
Closing the bug as we don't intend to test the fix in 4.12.12 for the reason above.