Bug 2073499 - ReconcileRootSecret error in noobaa-oparator pod logs [NEEDINFO]
Summary: ReconcileRootSecret error in noobaa-oparator pod logs
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: Multi-Cloud Object Gateway
Version: 4.10
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
: ODF 4.12.0
Assignee: Alexander Indenbaum
QA Contact: Filip Balák
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2022-04-08 15:41 UTC by Filip Balák
Modified: 2023-08-09 16:49 UTC (History)
7 users (show)

Fixed In Version: 4.12.0-86'
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-02-08 14:06:28 UTC
Embargoed:
sheggodu: needinfo? (aindenba)
sheggodu: needinfo? (aindenba)


Attachments (Terms of Use)

Description Filip Balák 2022-04-08 15:41:54 UTC
Description of problem (please be detailed as possible and provide log
snippests):
In earlier versions (<4.10) there used to be a message in logs (without timestamps):
level=info msg="✅ Exists: Secret \"ocs-kms-token\"\n"
level=info msg="✅ (Optional) Exists: Secret \"ocs-kms-ca-secret\"\n"
level=info msg="✅ (Optional) Exists:  \"ocs-kms-client-cert\"\n"
level=info msg="✅ (Optional) Exists:  \"ocs-kms-client-key\"\n"
level=info msg="✅ (Optional) Exists: Secret \"ocs-kms-ca-secret\"\n"
level=info msg="✅ (Optional) Exists:  \"ocs-kms-client-cert\"\n"
level=info msg="✅ (Optional) Exists:  \"ocs-kms-client-key\"\n"
level=info msg="✅ Exists: Secret \"ocs-kms-token\"\n"
level=info msg="ReconcileRootSecret: found root secret in external KMS successfully" sys=openshift-storage/noobaa

For version ODF 4.10 this changes to error message:
level=info msg="✅ (Optional) Exists:  \"ocs-kms-ca-secret\"\n"
level=info msg="✅ (Optional) Exists:  \"ocs-kms-client-cert\"\n"
level=info msg="✅ (Optional) Exists:  \"ocs-kms-client-key\"\n"
level=info msg="✅ Exists:  \"ocs-kms-token\"\n"
level=info msg="setKMSConditionType vault" sys=openshift-storage/noobaa
level=error msg="ReconcileRootSecret, KMS Get error No Secret Data found for Secret ID" sys=openshift-storage/noobaa
level=info msg="setKMSConditionStatus Init" sys=openshift-storage/noobaa

From logs it looks like there is a change in kms functionality that could break encryption process.

Version of all relevant components (if applicable):
ODF 4.10.0-217

Steps to Reproduce:
1. Install ODF with cluster wide encryption.
2. Check noobaa-operator logs


Actual results:
There is an error log:ReconcileRootSecret, KMS Get error No Secret Data found for Secret ID

Expected results:
There used to be log messagge: ReconcileRootSecret: found root secret in external KMS successfully

Additional info:
ODF 4.10 noobaa-operator logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j-010vukv11cs33-t1/j-010vukv11cs33-t1_20220404T095039/logs/failed_testcase_ocs_logs_1649069382/test_noobaa_kms_validation_ocs_logs/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-3a6619c418c74824b6a69dfce26f2070aae8b7d56b7cea756ab68d5e2c459e07/namespaces/openshift-storage/pods/noobaa-operator-bcd7f9c57-v2x2g/noobaa-operator/noobaa-operator/logs/current.log
ODF 4.9 noobaa-operator logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j-002vukv11cs33-t1/j-002vukv11cs33-t1_20220328T062106/logs/failed_testcase_ocs_logs_1648453414/test_failure_propagator_ocs_logs/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-1b03fa95946c0265bfa57a63f2a5c141c16524a4db5bad56406c2026028cbb60/namespaces/openshift-storage/pods/noobaa-operator-55755594b9-p27l4/noobaa-operator/noobaa-operator/logs/current.log

Comment 2 Alexander Indenbaum 2022-04-26 14:31:48 UTC
Hello Filip, 

The KNS module implementation was changed in the following PR https://github.com/noobaa/noobaa-operator/pull/797. The log messages are no longer expected to match.

There are new mechanisms for troubleshooting: new condition statuses on NooBaa CR. The internal state of the KMS subsystem is exposed via two new condition types:
       "KMS-Status"
       "KMS-Type"
Both condition statuses are set during each reconciliation cycle.

So regarding KMS-Type, per log the type is vault, as expected.
> 2022-04-04T10:38:11.259392453Z time="2022-04-04T10:38:11Z" level=info msg="setKMSConditionType vault" sys=openshift-storage/noobaa


There are several possible values for the "KMS-Status": 
	"Init" - External KMS initialized
	"Sync" - The root key was synchronized from external KMS
	"Invalid" - Invalid external KMS definition
	"ErrorRead" - Error reading secret from external KMS
	"ErrorWrite" - Error writing initial root key to eternal KMS


During installation, the expected status transition is "Init" -> "Sync". After the initial transition, the status should remain "Sync"
                                

And indeed according to logs:
> 2022-04-04T10:36:37.040505500Z time="2022-04-04T10:36:37Z" level=info msg="setKMSConditionStatus Init" sys=openshift-storage/noobaa
> 2022-04-04T10:37:37.634139287Z time="2022-04-04T10:37:37Z" level=info msg="setKMSConditionStatus Sync" sys=openshift-storage/noobaa
> 2022-04-04T10:38:11.081543011Z time="2022-04-04T10:38:11Z" level=info msg="setKMSConditionStatus Sync" sys=openshift-storage/noobaa
> 2022-04-04T10:38:11.278200788Z time="2022-04-04T10:38:11Z" level=info msg="setKMSConditionStatus Sync" sys=openshift-storage/noobaa
...

Let me describe the logic of status transitions. "Init" happens only once during the initial system startup, the operator tries to fetch the master root key from the external KMS storage, however initially, it is empty, so a random master key is generated by the operator and written into the external KMS storage. Upon successful write completion, the status is set to "Init". Upon subsequent operator KMS reconciliation, the operator would read successfully the key from the external KMS storage and the status is set to "Sync".

So while the initial transition to "Init" both those log messages are expected:
> 2022-04-04T10:36:37.018701650Z time="2022-04-04T10:36:37Z" level=error msg="ReconcileRootSecret, KMS Get error No Secret Data found for Secret ID" sys=openshift-storage/noobaa
> 2022-04-04T10:36:37.040505500Z time="2022-04-04T10:36:37Z" level=info msg="setKMSConditionStatus Init" sys=openshift-storage/noobaa

Guess the phrasing which includes the word "error" is not the best one for the expected state.

So to sum up, the logs indicate an expected status transition. Let me know if you have any more question.

Thank you,
~baum

Comment 13 Elad 2022-11-03 08:06:00 UTC
Any news here? Is the bug supposed to be fixed now? If so, please move to ON_QA

Comment 17 Filip Balák 2022-11-09 13:55:39 UTC
Error log message "ReconcileRootSecret, KMS Get error No Secret Data found for Secret ID" is still present in logs but from context it seems that this is expected (a query for the secret that is created later?). The message is present before Init and Sync happen.

Tested with:
ocp 4.12.0-0.nightly-2022-11-07-181244
odf 4.12.0-91


Note You need to log in before you can comment on or make changes to this bug.