Bug 2032656
| Summary: | Rook not recovering when deleting osd deployment with kms encryption | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Shay Rozen <srozen> | ||||
| Component: | rook | Assignee: | Sébastien Han <shan> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Rachael <rgeorge> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 4.9 | CC: | ebenahar, madam, mmuench, muagarwa, nberry, ocs-bugs, odf-bz-bot, rperiyas, shan | ||||
| Target Milestone: | --- | ||||||
| Target Release: | ODF 4.10.0 | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | 4.10.0-113 | Doc Type: | No Doc Update | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2022-04-13 18:50:40 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
Shay, do you have a must-gather or can I access the env? Thanks

Part of the latest resync https://github.com/red-hat-storage/rook/pull/325

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.10.0 enhancement, security & bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1372
Created attachment 1846296
Rook log.

Description of problem (please be as detailed as possible and provide log snippets):
When an OSD deployment is deleted, Rook normally recovers by recreating it. However, when KMS encryption is enabled, Rook does not recover from the deployment deletion.

Version of all relevant components (if applicable):
All versions

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Can't recover from OSD deployment deletion while KMS encryption is enabled.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
4

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Also

If this is a regression, please provide more details to justify this:
No

Steps to Reproduce:
1. Install OCP 4.9 + ODF 4.9 with KMS encryption.
2. After all OSDs are up and running, delete an OSD deployment.
3. Check whether the OSD is up.

Actual results:
The OSD pod does not recover when KMS encryption is enabled. Without KMS encryption, the OSD pod recovers.

Expected results:
All OSD pods should be up after one of the OSD deployments is deleted.

Additional info:
There are multiple errors in the Rook log:

Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 2e48f23b-faaa-4e17-8879-1d1ba219e59d
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/sbin/cryptsetup --batch-mode --key-file - luksFormat /mnt/ocs-deviceset-thin-0-data-0j8s9q
 stderr: Device /mnt/ocs-deviceset-thin-0-data-0j8s9q is in use. Can not proceed with format operation.
Running command: /usr/sbin/cryptsetup --key-file - --allow-discards luksOpen /mnt/ocs-deviceset-thin-0-data-0j8s9q ceph-2e48f23b-faaa-4e17-8879-1d1ba219e59d-sdc-block-dmcrypt
 stderr: Cannot use device /mnt/ocs-deviceset-thin-0-data-0j8s9q which is in use (already mapped or mounted).
Running command: /usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-3
Running command: /usr/bin/chown -R ceph:ceph /dev/mapper/ceph-2e48f23b-faaa-4e17-8879-1d1ba219e59d-sdc-block-dmcrypt
 stderr: chown: cannot access '/dev/mapper/ceph-2e48f23b-faaa-4e17-8879-1d1ba219e59d-sdc-block-dmcrypt': No such file or directory
--> Was unable to complete a new OSD, will rollback changes
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.3 --yes-i-really-mean-it
 stderr: purged osd.3
Traceback (most recent call last):
  File "/usr/sbin/ceph-volume", line 11, in <module>
    load_entry_point('ceph-volume==1.0.0', 'console_scripts', 'ceph-volume')()
  File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 40, in __init__
    self.main(self.argv)
  File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 59, in newfunc
    return f(*a, **kw)
  File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 152, in main
    terminal.dispatch(self.mapper, subcommand_args)
  File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
    instance.main()
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/raw/main.py", line 32, in main
    terminal.dispatch(self.mapper, self.argv)
  File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
    instance.main()
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/raw/prepare.py", line 169, in main
    self.safe_prepare(self.args)
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/raw/prepare.py", line 91, in safe_prepare
    self.prepare()
  File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/raw/prepare.py", line 134, in prepare
    tmpfs,
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/raw/prepare.py", line 58, in prepare_bluestore
    prepare_utils.link_block(block, osd_id)
  File "/usr/lib/python3.6/site-packages/ceph_volume/util/prepare.py", line 370, in link_block
    _link_device(block_device, 'block', osd_id)
  File "/usr/lib/python3.6/site-packages/ceph_volume/util/prepare.py", line 336, in _link_device
    system.chown(device)
  File "/usr/lib/python3.6/site-packages/ceph_volume/util/system.py", line 123, in chown
    process.run(['chown', '-R', 'ceph:ceph', path])
  File "/usr/lib/python3.6/site-packages/ceph_volume/process.py", line 153, in run
    raise RuntimeError(msg)
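
For anyone triaging a similar failure, the two cryptsetup stderr messages quoted in this report can serve as a quick signature check against a saved Rook log. The following is only a minimal sketch under stated assumptions: the function name find_busy_devices and the idea of scanning the log as plain text are illustrative and are not part of Rook or ceph-volume, and the patterns are copied from the messages in this report, so other cryptsetup versions may word them differently.

```python
import re

# Patterns copied from the cryptsetup stderr lines quoted in this bug report.
# Assumption: other cryptsetup versions may phrase these messages differently.
DEVICE_IN_USE_PATTERNS = [
    re.compile(
        r"Device (?P<dev>\S+) is in use\. "
        r"Can not proceed with format operation\."
    ),
    re.compile(
        r"Cannot use device (?P<dev>\S+) which is in use "
        r"\(already mapped or mounted\)\."
    ),
]


def find_busy_devices(log_text):
    """Return a sorted list of device paths that cryptsetup reported as in use.

    Hypothetical helper for log triage; it only inspects text and does not
    talk to a cluster.
    """
    devices = set()
    for pattern in DEVICE_IN_USE_PATTERNS:
        for match in pattern.finditer(log_text):
            devices.add(match.group("dev"))
    return sorted(devices)


if __name__ == "__main__":
    # Two stderr lines taken from the log in this report.
    sample = (
        "stderr: Device /mnt/ocs-deviceset-thin-0-data-0j8s9q is in use. "
        "Can not proceed with format operation.\n"
        "stderr: Cannot use device /mnt/ocs-deviceset-thin-0-data-0j8s9q "
        "which is in use (already mapped or mounted).\n"
    )
    for device in find_busy_devices(sample):
        print(device)
```

A non-empty result suggests that the OSD prepare job tried to format or open a block device that was still mapped, which matches the symptom described above. It does not by itself confirm the root cause.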