Bug 1934990
Summary: | Ceph health ERR post node drain on KMS encryption enabled cluster | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat OpenShift Container Storage | Reporter: | Persona non grata <nobody+410372>
Component: | rook | Assignee: | Sébastien Han <shan>
Status: | CLOSED ERRATA | QA Contact: | Persona non grata <nobody+410372>
Severity: | high | Docs Contact: |
Priority: | unspecified | |
Version: | 4.7 | CC: | bniver, jthottan, madam, muagarwa, nberry, nojha, ocs-bugs, owasserm, shan
Target Milestone: | --- | Keywords: | AutomationBackLog, Reopened
Target Release: | OCS 4.7.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | No Doc Update
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2021-05-19 09:20:01 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Persona non grata
2021-03-04 07:17:47 UTC

This sounds related to the KMS keys not being loaded properly on restart, but I thought that was already fixed by https://github.com/rook/rook/pull/7240 a couple of weeks ago. Seb PTAL.

The error is clearly indicated in the OSD deployment logs: ['error performing token check: Vault is sealed']. Please fix your setup.

The wrong kubeconfig was shared offline, so I'm re-opening. After looking at the logs, one PG state is active+clean+inconsistent. I ran "ceph pg deep-scrub 1.50" and then instructed Ceph to repair it with "ceph pg repair 1.50"; Ceph health is now OK.

I believe Ceph would have eventually repaired the PG during the next deep-scrub. Auto-repair works well on BlueStore. Josh/Neha for confirmation.

Thanks.

(In reply to Sébastien Han from comment #7)
> The wrong kubeconfig was shared offline, so I'm re-opening. After looking at
> the logs, one PG state is active+clean+inconsistent.
> I ran "ceph pg deep-scrub 1.50" and then instructed Ceph to repair it with
> "ceph pg repair 1.50"; Ceph health is now OK.
>
> I believe Ceph would have eventually repaired the PG during the next
> deep-scrub. Auto-repair works well on BlueStore.
> Josh/Neha for confirmation.
>
> Thanks.

This is true when osd_scrub_auto_repair is enabled, and it repairs up to osd_scrub_auto_repair_num_errors errors.

Thanks. As far as I can tell, osd_scrub_auto_repair is disabled by default; is it advisable to enable it for OCS by default?

(In reply to Sébastien Han from comment #9)
> Thanks. As far as I can tell, osd_scrub_auto_repair is disabled by default;
> is it advisable to enable it for OCS by default?

I think so - it is advisable to enable it in a cluster that has only BlueStore OSDs.

Neha,

Can't we auto-detect that at OSD startup and set osd_scrub_auto_repair to true?
Rook can force-enable it in the meantime.

(In reply to Sébastien Han from comment #11)
> Neha,
>
> Can't we auto-detect that at OSD startup and set osd_scrub_auto_repair
> to true?

We are considering enabling it by default in the next release, so it is not worth the extra complexity.

> Rook can force-enable it in the meantime.

Sure.

I guess we are fixing this in rook; please revert back if that is not correct.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2041
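
For anyone hitting the same state, a minimal sketch of the manual check-and-repair sequence described in this thread, assuming the inconsistent placement group is 1.50 (as in this report) and that the Ceph CLI is reachable, for example from the rook-ceph toolbox pod:

```
# Confirm the unhealthy state; an inconsistent PG shows up as
# active+clean+inconsistent in the health detail output.
ceph health detail

# Trigger a deep scrub of the affected PG, then ask Ceph to repair it.
ceph pg deep-scrub 1.50
ceph pg repair 1.50

# Watch the cluster return to HEALTH_OK once the repair completes.
ceph status
```

On BlueStore-backed OSDs the repair is generally safe because per-object checksums identify which copy is bad, which is why auto-repair is considered well suited to BlueStore-only clusters.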
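
The thread leaves the actual change to Rook, but as an illustration of the setting being discussed, here is a hedged sketch of enabling scrub auto-repair cluster-wide with the Ceph CLI, assuming a Ceph release with the centralized config store (Nautilus or later):

```
# Let deep scrubs repair inconsistencies automatically (BlueStore-only clusters).
ceph config set osd osd_scrub_auto_repair true

# Verify the setting, and check the error-count threshold: auto-repair only runs
# when a scrub finds no more than osd_scrub_auto_repair_num_errors errors;
# otherwise the PG is left for manual repair.
ceph config get osd osd_scrub_auto_repair
ceph config get osd osd_scrub_auto_repair_num_errors
```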
This sounds related to the KMS keys not being loaded properly on restart, but I thought it was already fixed by https://github.com/rook/rook/pull/7240 a couple weeks ago. Seb PTAL The error is clearly indicated in the osd deployment logs: ['error performing token check: Vault is sealed']% Please fix your setup. The wrong kubeconfig was shared offline, so I'm re-opening, after looking at the logs, one PR state is active+clean+inconsistent. I ran "ceph pg deep-scrub 1.50" and then instructed Ceph to repair it with "ceph pg repair 1.50", now Ceph health is ok. I believe Ceph would have eventually repaired the PG during the next deep-scrub. Auto-repair works well on Bluestore. Josh/Neha for confirmation. Thanks. (In reply to Sébastien Han from comment #7) > The wrong kubeconfig was shared offline, so I'm re-opening, after looking at > the logs, one PR state is active+clean+inconsistent. > I ran "ceph pg deep-scrub 1.50" and then instructed Ceph to repair it with > "ceph pg repair 1.50", now Ceph health is ok. > > I believe Ceph would have eventually repaired the PG during the next > deep-scrub. Auto-repair works well on Bluestore. > Josh/Neha for confirmation. > > Thanks. This is true when osd_scrub_auto_repair is enabled and it repairs up to osd_scrub_auto_repair_num_errors errors. Thanks, as far as I can tell osd_scrub_auto_repair is disabled by default, is it advisable to enable it for OCS by default? (In reply to Sébastien Han from comment #9) > Thanks, as far as I can tell osd_scrub_auto_repair is disabled by default, > is it advisable to enable it for OCS by default? I think so - it is advisable to enable it in a cluster which has only BlueStore OSDs. Neha, Can't we auto-detect that from the OSD startup and set osd_scrub_auto_repair to true? Rook can force enable it in the meantime. (In reply to Sébastien Han from comment #11) > Neha, > > Can't we auto-detect that from the OSD startup and set osd_scrub_auto_repair > to true? We are considering enabling it by default in the next release, so not worth the extra complexity. > Rook can force enable it in the meantime. sure I guess we are fixing this in rook, please revert back if that is not correct. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2041 |