Bug 2216139

Summary: [GSS] Unable to recreate noobaa once it's deleted
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: amansan <amanzane>
Component: rook
Assignee: Blaine Gardner <brgardne>
Status: ASSIGNED
QA Contact: Neha Berry <nberry>
Severity: high
Priority: high
Version: 4.10
CC: brgardne, jalbo, mduasope, nbecker, odf-bz-bot, rafrojas, tnielsen
Target Milestone: ---
Flags: tnielsen: needinfo? (mduasope)
       brgardne: needinfo? (amanzane)
       brgardne: needinfo? (rafrojas)
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description amansan 2023-06-20 09:11:22 UTC
Description of problem (please be detailed as possible and provide log snippets):

Due to an inconsistency in noobaa-db-pg-0, the customer finally agreed to rebuild NooBaa to ensure a stable configuration.

Version of all relevant components (if applicable):

ODF 4.10

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

Yes. NooBaa cannot be rebuilt, and the customer needs it to configure their applications.


Is there any workaround available to the best of your knowledge?

No


Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

3

Is this issue reproducible?

Only at the customer site.

Actual results:

NooBaa is not recreated after deletion.

Expected results:

NooBaa is recreated and working.

Additional info:

Comment 13 Blaine Gardner 2023-07-11 21:51:29 UTC
@amanzane please collect an OCP must-gather. I can see that the configmaps have deletion timestamps, but they are not being deleted by the openshift system. I don't see any logs from OpenShift (like the kubelet) that would indicate what might be going wrong there.
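For reference, a full OCP must-gather is typically collected with the standard `oc adm must-gather` invocation; the destination directory name below is just an example:

```shell
# Collect a full OpenShift must-gather (run with cluster-admin privileges).
oc adm must-gather --dest-dir=./must-gather-output

# Package the result for attachment to the support case.
tar -czf must-gather-output.tar.gz must-gather-output/
```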

I don't believe there is an RBAC issue. If that were the case, Rook would be reporting an error related to permissions. Instead, it is reporting a timeout waiting for the configmap to be deleted.

It's possible that this is an openshift bug of some kind, or that the OCP cluster is in a degraded state.
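While the must-gather is being collected, the stuck configmaps can also be inspected directly. This is a sketch assuming the configmaps live in the `openshift-storage` namespace; a non-empty `deletionTimestamp` together with remaining finalizers means Kubernetes is waiting on some controller to release the object before removing it:

```shell
# List each configmap with its deletion timestamp and any finalizers still set.
# A row with a DELETED timestamp and non-empty FINALIZERS is stuck on a controller.
oc get configmaps -n openshift-storage \
  -o custom-columns='NAME:.metadata.name,DELETED:.metadata.deletionTimestamp,FINALIZERS:.metadata.finalizers'
```

Manually patching finalizers off an object can unstick deletion, but it bypasses controller cleanup and should only be done under support guidance.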

Comment 18 Blaine Gardner 2023-07-19 21:19:27 UTC
I think just `oc adm node-logs` will be sufficient. It should contain logs for kubelet and other host processes.

https://access.redhat.com/documentation/en-us/openshift_container_platform/4.13/html/support/gathering-cluster-data#querying-cluster-node-journal-logs_gathering-cluster-data
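As a sketch of what that looks like (the node name below is a placeholder):

```shell
# Kubelet journal entries from all worker nodes.
oc adm node-logs --role=worker -u kubelet

# Or from a single node:
oc adm node-logs <node-name> -u kubelet
```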