Previously, the ceph-ansible utility was unable to purge a cluster with encrypted OSD devices because the underlying ceph-disk utility could not destroy the partition table on an encrypted device with the "--zap-disk" option. The source code has been fixed so that ceph-disk can use the "--zap-disk" option on encrypted devices. As a result, ceph-ansible purges clusters with encrypted OSD devices as expected.
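For clusters still running an unfixed build, the manual equivalent of the repaired behaviour is to close the dm-crypt mappings that hold the OSD partitions open and then destroy the partition table. A rough sketch with standard tools (device and mapping names are illustrative, not taken from any particular host):

~]# dmsetup ls --target crypt       # find the active dm-crypt mappings
~]# cryptsetup close <mapping-name> # release each mapping backed by the disk
~]# sgdisk --zap-all /dev/sdb       # destroy the GPT and MBR structures
~]# partprobe /dev/sdb              # ask the kernel to re-read the table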
Created attachment 1242381: ansible playbook log
Description of problem:
I have a cluster with 8 colocated encrypted OSDs and 2 encrypted OSDs with dedicated journals. purge-cluster.yml fails on this cluster:
TASK [zap osd disks] ***********************************************************
failed: [magna058] (item=/dev/sdb) => {"changed": true, "cmd": "ceph-disk zap \"/dev/sdb\"", "delta": "0:05:08.418620", "end": "2017-01-19 06:06:24.523658", "failed": true, "item": "/dev/sdb", "rc": 1, "start": "2017-01-19 06:01:16.105038", "stderr": "\u0007Caution: invalid backup GPT header, but valid main header; regenerating\nbackup header from main header.\n\nWarning! Main and backup partition tables differ! Use the 'c' and 'e' options\non the recovery & transformation menu to examine the two tables.\n\nWarning! One or more CRCs don't match. You should repair the disk!\n\nceph-disk: Error: partprobe /dev/sdb failed : Error: Partition(s) 1, 2 on /dev/sdb have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.", "stdout": "\u0007\u0007****************************************************************************\nCaution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk\nverification and recovery are STRONGLY recommended.\n****************************************************************************\nWarning: The kernel is still using the old partition table.\nThe new table will be used at the next reboot.\nGPT data structures destroyed! You may now partition the disk using fdisk or\nother utilities.\nCreating new GPT entries.\nWarning: The kernel is still using the old partition table.\nThe new table will be used at the next reboot.\nThe operation has completed successfully.", "stdout_lines": ["\u0007\u0007****************************************************************************", "Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk", "verification and recovery are STRONGLY recommended.", "****************************************************************************", "Warning: The kernel is still using the old partition table.", "The new table will be used at the next reboot.", "GPT data structures destroyed! You may now partition the disk using fdisk or", "other utilities.", "Creating new GPT entries.", "Warning: The kernel is still using the old partition table.", "The new table will be used at the next reboot.", "The operation has completed successfully."], "warnings": []}
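The partprobe error at the end is the key symptom: the kernel cannot re-read the partition table because partitions 1 and 2 are still in use, which on an encrypted OSD usually means an open dm-crypt mapping still references them. A hedged way to confirm this on the failing host (paths follow this report's /dev/sdb):

~]# ls /sys/block/sdb/sdb1/holders/ # a dm-* entry here means the partition is pinned
~]# dmsetup info -c                 # list active device-mapper devices and their state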
Version-Release number of selected component (if applicable):
ceph-ansible-2.1.3-1.el7scon.noarch
ansible-2.2.1.0-1.el7.noarch
How reproducible:
Always
Steps to Reproduce:
1. Deploy a Ceph cluster with a mix of colocated and dedicated-journal encrypted OSDs.
2. Purge the cluster (a sample invocation is sketched below).
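For completeness, the purge was driven roughly like this (assuming it is run from the ceph-ansible checkout with the inventory shown under "Additional info"; the playbook's location can differ between ceph-ansible releases):

~]# ansible-playbook -i hosts purge-cluster.yml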
Additional info:
The encrypted partitions are not purged:
~]# ceph-disk list
/dev/dm-0 other, unknown
/dev/dm-1 other, xfs
/dev/dm-2 other, unknown
/dev/dm-3 other, xfs
/dev/sda :
/dev/sda1 other, ext4, mounted on /
/dev/sdb :
/dev/sdb1 other, crypto_LUKS
/dev/sdc :
/dev/sdc1 other, crypto_LUKS
/dev/sdd :
/dev/sdd1 ceph journal (dmcrypt LUKS /dev/dm-0)
/dev/sdd2 ceph journal (dmcrypt LUKS /dev/dm-2)
The hosts file:
[mons]
magna028 monitor_address="10.8.128.28"
magna031
magna046
[osds]
magna046 devices="[ '/dev/sdb', '/dev/sdc', '/dev/sdd' ]"
magna052 devices="[ '/dev/sdb', '/dev/sdc', '/dev/sdd' ]"
magna058 devices="[ '/dev/sdb', '/dev/sdc' ]"
magna031 devices="[ '/dev/sdb', '/dev/sdc' ]"
[mdss]
magna061
[rgws]
magna061
[clients]
magna061
The playbook log is attached.
From looking at the log it seems like the 'zap ceph journal partitions' task was skipped. I've opened an upstream PR that I think might address this: https://github.com/ceph/ceph-ansible/pull/1311
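Until that fix lands, a hedged manual workaround for the leftover journal disk on the dedicated-journal hosts (mapping and device names taken from the ceph-disk list output above; adjust to your layout):

~]# dmsetup remove /dev/dm-0 /dev/dm-2 # close the journal dm-crypt mappings
~]# ceph-disk zap /dev/sdd             # the journal disk can then be zapped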
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:0515