Bug 2073499

Summary: ReconcileRootSecret error in noobaa-oparator pod logs
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation Reporter: Filip Balák <fbalak>
Component: Multi-Cloud Object GatewayAssignee: Alexander Indenbaum <aindenba>
Status: CLOSED CURRENTRELEASE QA Contact: Filip Balák <fbalak>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.10CC: aindenba, ebenahar, mbukatov, mmuench, nladha, ocs-bugs, odf-bz-bot
Target Milestone: ---Flags: sheggodu: needinfo? (aindenba)
sheggodu: needinfo? (aindenba)
Target Release: ODF 4.12.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 4.12.0-86' Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-02-08 14:06:28 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Filip Balák 2022-04-08 15:41:54 UTC
Description of problem (please be detailed as possible and provide log
snippests):
In earlier versions (<4.10) there used to be a message in logs (without timestamps):
level=info msg="✅ Exists: Secret \"ocs-kms-token\"\n"
level=info msg="✅ (Optional) Exists: Secret \"ocs-kms-ca-secret\"\n"
level=info msg="✅ (Optional) Exists:  \"ocs-kms-client-cert\"\n"
level=info msg="✅ (Optional) Exists:  \"ocs-kms-client-key\"\n"
level=info msg="✅ (Optional) Exists: Secret \"ocs-kms-ca-secret\"\n"
level=info msg="✅ (Optional) Exists:  \"ocs-kms-client-cert\"\n"
level=info msg="✅ (Optional) Exists:  \"ocs-kms-client-key\"\n"
level=info msg="✅ Exists: Secret \"ocs-kms-token\"\n"
level=info msg="ReconcileRootSecret: found root secret in external KMS successfully" sys=openshift-storage/noobaa

For version ODF 4.10 this changes to error message:
level=info msg="✅ (Optional) Exists:  \"ocs-kms-ca-secret\"\n"
level=info msg="✅ (Optional) Exists:  \"ocs-kms-client-cert\"\n"
level=info msg="✅ (Optional) Exists:  \"ocs-kms-client-key\"\n"
level=info msg="✅ Exists:  \"ocs-kms-token\"\n"
level=info msg="setKMSConditionType vault" sys=openshift-storage/noobaa
level=error msg="ReconcileRootSecret, KMS Get error No Secret Data found for Secret ID" sys=openshift-storage/noobaa
level=info msg="setKMSConditionStatus Init" sys=openshift-storage/noobaa

From logs it looks like there is a change in kms functionality that could break encryption process.

Version of all relevant components (if applicable):
ODF 4.10.0-217

Steps to Reproduce:
1. Install ODF with cluster wide encryption.
2. Check noobaa-operator logs


Actual results:
There is an error log:ReconcileRootSecret, KMS Get error No Secret Data found for Secret ID

Expected results:
There used to be log messagge: ReconcileRootSecret: found root secret in external KMS successfully

Additional info:
ODF 4.10 noobaa-operator logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j-010vukv11cs33-t1/j-010vukv11cs33-t1_20220404T095039/logs/failed_testcase_ocs_logs_1649069382/test_noobaa_kms_validation_ocs_logs/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-3a6619c418c74824b6a69dfce26f2070aae8b7d56b7cea756ab68d5e2c459e07/namespaces/openshift-storage/pods/noobaa-operator-bcd7f9c57-v2x2g/noobaa-operator/noobaa-operator/logs/current.log
ODF 4.9 noobaa-operator logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j-002vukv11cs33-t1/j-002vukv11cs33-t1_20220328T062106/logs/failed_testcase_ocs_logs_1648453414/test_failure_propagator_ocs_logs/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-1b03fa95946c0265bfa57a63f2a5c141c16524a4db5bad56406c2026028cbb60/namespaces/openshift-storage/pods/noobaa-operator-55755594b9-p27l4/noobaa-operator/noobaa-operator/logs/current.log

Comment 2 Alexander Indenbaum 2022-04-26 14:31:48 UTC
Hello Filip, 

The KNS module implementation was changed in the following PR https://github.com/noobaa/noobaa-operator/pull/797. The log messages are no longer expected to match.

There are new mechanisms for troubleshooting: new condition statuses on NooBaa CR. The internal state of the KMS subsystem is exposed via two new condition types:
       "KMS-Status"
       "KMS-Type"
Both condition statuses are set during each reconciliation cycle.

So regarding KMS-Type, per log the type is vault, as expected.
> 2022-04-04T10:38:11.259392453Z time="2022-04-04T10:38:11Z" level=info msg="setKMSConditionType vault" sys=openshift-storage/noobaa


There are several possible values for the "KMS-Status": 
	"Init" - External KMS initialized
	"Sync" - The root key was synchronized from external KMS
	"Invalid" - Invalid external KMS definition
	"ErrorRead" - Error reading secret from external KMS
	"ErrorWrite" - Error writing initial root key to eternal KMS


During installation, the expected status transition is "Init" -> "Sync". After the initial transition, the status should remain "Sync"
                                

And indeed according to logs:
> 2022-04-04T10:36:37.040505500Z time="2022-04-04T10:36:37Z" level=info msg="setKMSConditionStatus Init" sys=openshift-storage/noobaa
> 2022-04-04T10:37:37.634139287Z time="2022-04-04T10:37:37Z" level=info msg="setKMSConditionStatus Sync" sys=openshift-storage/noobaa
> 2022-04-04T10:38:11.081543011Z time="2022-04-04T10:38:11Z" level=info msg="setKMSConditionStatus Sync" sys=openshift-storage/noobaa
> 2022-04-04T10:38:11.278200788Z time="2022-04-04T10:38:11Z" level=info msg="setKMSConditionStatus Sync" sys=openshift-storage/noobaa
...

Let me describe the logic of status transitions. "Init" happens only once during the initial system startup, the operator tries to fetch the master root key from the external KMS storage, however initially, it is empty, so a random master key is generated by the operator and written into the external KMS storage. Upon successful write completion, the status is set to "Init". Upon subsequent operator KMS reconciliation, the operator would read successfully the key from the external KMS storage and the status is set to "Sync".

So while the initial transition to "Init" both those log messages are expected:
> 2022-04-04T10:36:37.018701650Z time="2022-04-04T10:36:37Z" level=error msg="ReconcileRootSecret, KMS Get error No Secret Data found for Secret ID" sys=openshift-storage/noobaa
> 2022-04-04T10:36:37.040505500Z time="2022-04-04T10:36:37Z" level=info msg="setKMSConditionStatus Init" sys=openshift-storage/noobaa

Guess the phrasing which includes the word "error" is not the best one for the expected state.

So to sum up, the logs indicate an expected status transition. Let me know if you have any more question.

Thank you,
~baum

Comment 13 Elad 2022-11-03 08:06:00 UTC
Any news here? Is the bug supposed to be fixed now? If so, please move to ON_QA

Comment 17 Filip Balák 2022-11-09 13:55:39 UTC
Error log message "ReconcileRootSecret, KMS Get error No Secret Data found for Secret ID" is still present in logs but from context it seems that this is expected (a query for the secret that is created later?). The message is present before Init and Sync happen.

Tested with:
ocp 4.12.0-0.nightly-2022-11-07-181244
odf 4.12.0-91