1414647 – [ceph-ansible]: purge cluster fails to zap OSD disks having encrypted OSDs

Summary: [ceph-ansible]: purge cluster fails to zap OSD disks having encrypted OSDs

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Storage Console
Classification:	Red Hat Storage
Component:	ceph-ansible
Sub Component:
Version:	2
Hardware:	Unspecified
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	2
Assignee:	Sébastien Han
QA Contact:	Tejas
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2017-01-19 06:48 UTC by Tejas
Modified:	2017-03-14 15:53 UTC
CC List:	11 users
Fixed In Version:	ceph-ansible-2.1.9-1.el7scon
Doc Type:	Bug Fix
Doc Text:	Previously, the ceph-ansible utility was unable to purge a cluster with encrypted OSD devices because the underlying ceph-disk utility was unable to destroy the partition table on an encrypted device by using the "--zap-disk" option. The underlying source code has been fixed, allowing ceph-disk to use the "--zap-disk" option on encrypted devices. As a result, ceph-ansible can purge clusters with encrypted OSD devices as expected.
Clone Of:
Environment:
Last Closed:	2017-03-14 15:53:53 UTC
Embargoed:

Attachments
ansible playbook log (30.43 KB, text/plain) 2017-01-19 06:48 UTC, Tejas	no flags
ansible playbook log (11.16 KB, text/plain) 2017-02-13 07:04 UTC, Tejas	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2017:0515	0	normal	SHIPPED_LIVE	Important: ansible and ceph-ansible security, bug fix, and enhancement update	2017-04-18 21:12:31 UTC

Description Tejas 2017-01-19 06:48:16 UTC

Created attachment 1242381
ansible playbook log

Description of problem:
I have a cluster with 8 colocated-journal encrypted OSDs and 2 encrypted OSDs with dedicated journals. purge-cluster.yml fails on this cluster:

TASK [zap osd disks] ***********************************************************

failed: [magna058] (item=/dev/sdb) => {"changed": true, "cmd": "ceph-disk zap \"/dev/sdb\"", "delta": "0:05:08.418620", "end": "2017-01-19 06:06:24.523658", "failed": true, "item": "/dev/sdb", "rc": 1, "start": "2017-01-19 06:01:16.105038", "stderr": "\u0007Caution: invalid backup GPT header, but valid main header; regenerating\nbackup header from main header.\n\nWarning! Main and backup partition tables differ! Use the 'c' and 'e' options\non the recovery & transformation menu to examine the two tables.\n\nWarning! One or more CRCs don't match. You should repair the disk!\n\nceph-disk: Error: partprobe /dev/sdb failed : Error: Partition(s) 1, 2 on /dev/sdb have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use.  As a result, the old partition(s) will remain in use.  You should reboot now before making further changes.", "stdout": "\u0007\u0007****************************************************************************\nCaution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk\nverification and recovery are STRONGLY recommended.\n****************************************************************************\nWarning: The kernel is still using the old partition table.\nThe new table will be used at the next reboot.\nGPT data structures destroyed! You may now partition the disk using fdisk or\nother utilities.\nCreating new GPT entries.\nWarning: The kernel is still using the old partition table.\nThe new table will be used at the next reboot.\nThe operation has completed successfully.", "stdout_lines": ["\u0007\u0007****************************************************************************", "Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk", "verification and recovery are STRONGLY recommended.", "****************************************************************************", "Warning: The kernel is still using the old partition table.", "The new table will be used at the next reboot.", "GPT data structures destroyed! You may now partition the disk using fdisk or", "other utilities.", "Creating new GPT entries.", "Warning: The kernel is still using the old partition table.", "The new table will be used at the next reboot.", "The operation has completed successfully."], "warnings": []}
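The decisive line in this result is the partprobe error: the stdout shows the GPT being destroyed and recreated, but the kernel could not reload the partition table because partitions 1 and 2 on /dev/sdb were, in the log's own words, "probably" still in use. As a minimal sketch of how to pull that out of a failed task result, the hypothetical `diagnose_zap` helper below applies two regular expressions to a trimmed copy of the result above. The match strings come only from this one log; they are an assumption, not a stable ceph-disk interface.

```python
import re

# Trimmed copy of the failed "zap osd disks" result shown above (only the fields used below).
RESULT = {
    "item": "/dev/sdb",
    "rc": 1,
    "stderr": (
        "ceph-disk: Error: partprobe /dev/sdb failed : Error: Partition(s) 1, 2 "
        "on /dev/sdb have been written, but we have been unable to inform the "
        "kernel of the change, probably because it/they are in use."
    ),
}


def diagnose_zap(result: dict) -> str:
    """Heuristically summarize a 'ceph-disk zap' task result (hypothetical helper)."""
    if result.get("rc", 0) == 0:
        return "ok"
    stderr = result.get("stderr", "")
    probe = re.search(r"partprobe (\S+) failed", stderr)
    if probe and "unable to inform the kernel" in stderr:
        parts = re.search(r"Partition\(s\) ([\d, ]+) on", stderr)
        # "1, 2" -> [1, 2]; int() tolerates the surrounding spaces
        nums = [int(n) for n in parts.group(1).split(",")] if parts else []
        return f"{probe.group(1)}: partitions {nums} still in use, kernel kept the old table"
    return "zap failed for another reason"


print(diagnose_zap(RESULT))
```

On the result above this prints `/dev/sdb: partitions [1, 2] still in use, kernel kept the old table`, which separates this failure mode from other non-zero exits of the zap task.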

Version-Release number of selected component (if applicable):
ceph-ansible-2.1.3-1.el7scon.noarch
ansible-2.2.1.0-1.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Deploy a Ceph cluster with a mix of colocated-journal and dedicated-journal encrypted OSDs.
2. Purge the cluster.



Additional info:

The encrypted partitions are not purged:

 ~]# ceph-disk list
/dev/dm-0 other, unknown
/dev/dm-1 other, xfs
/dev/dm-2 other, unknown
/dev/dm-3 other, xfs
/dev/sda :
 /dev/sda1 other, ext4, mounted on /
/dev/sdb :
 /dev/sdb1 other, crypto_LUKS
/dev/sdc :
 /dev/sdc1 other, crypto_LUKS
/dev/sdd :
 /dev/sdd1 ceph journal (dmcrypt LUKS /dev/dm-0)
 /dev/sdd2 ceph journal (dmcrypt LUKS /dev/dm-2)
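One way to confirm whether a purge left encrypted partitions behind is to scan the `ceph-disk list` output for `crypto_LUKS` partitions and dmcrypt journal mappings. The sketch below does this with plain string matching on the listing above. The parsing rules are assumptions inferred only from the lines shown here, not from a documented ceph-disk output format, and `leftover_encrypted` is a hypothetical helper.

```python
import re

# The "ceph-disk list" output from the report, taken after the failed purge.
CEPH_DISK_LIST = """\
/dev/dm-0 other, unknown
/dev/dm-1 other, xfs
/dev/dm-2 other, unknown
/dev/dm-3 other, xfs
/dev/sda :
 /dev/sda1 other, ext4, mounted on /
/dev/sdb :
 /dev/sdb1 other, crypto_LUKS
/dev/sdc :
 /dev/sdc1 other, crypto_LUKS
/dev/sdd :
 /dev/sdd1 ceph journal (dmcrypt LUKS /dev/dm-0)
 /dev/sdd2 ceph journal (dmcrypt LUKS /dev/dm-2)
"""


def leftover_encrypted(listing: str) -> dict:
    """Map each partition that still carries LUKS data to a short description."""
    found = {}
    for line in listing.splitlines():
        line = line.strip()
        if not line.startswith("/dev/") or line.endswith(":"):
            continue  # skip blank lines and whole-disk headers such as "/dev/sdb :"
        dev, _, desc = line.partition(" ")
        if "crypto_LUKS" in desc:
            found[dev] = "crypto_LUKS partition"
        else:
            m = re.search(r"dmcrypt LUKS (/dev/dm-\d+)", desc)
            if m:
                found[dev] = f"dm-crypt journal mapped to {m.group(1)}"
    return found


for dev, what in leftover_encrypted(CEPH_DISK_LIST).items():
    print(dev, "->", what)
```

For the listing above this reports /dev/sdb1 and /dev/sdc1 as crypto_LUKS partitions and /dev/sdd1 and /dev/sdd2 as journals still mapped to /dev/dm-0 and /dev/dm-2; an empty result would indicate a clean purge.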


The hosts file:

[mons]
magna028 monitor_address="10.8.128.28"
magna031
magna046

[osds]
magna046 devices="[ '/dev/sdb', '/dev/sdc', '/dev/sdd' ]"
magna052 devices="[ '/dev/sdb', '/dev/sdc', '/dev/sdd' ]"
magna058 devices="[ '/dev/sdb', '/dev/sdc' ]"
magna031 devices="[ '/dev/sdb', '/dev/sdc' ]"

[mdss]
magna061

[rgws]
magna061

[clients]
magna061
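The `devices` host variable in the [osds] section is the per-host list of OSD disks; the "zap osd disks" task in the log above loops over such items (for example item=/dev/sdb). Ansible parses these inline variables itself. The sketch below only mimics that with `ast.literal_eval` for the quoted list form used in this hosts file, to show the devices per host. `parse_devices` is a hypothetical helper, not part of ceph-ansible.

```python
import ast

# The [osds] section of the hosts file above, one 'host devices="[...]"' line per host.
OSDS = """\
magna046 devices="[ '/dev/sdb', '/dev/sdc', '/dev/sdd' ]"
magna052 devices="[ '/dev/sdb', '/dev/sdc', '/dev/sdd' ]"
magna058 devices="[ '/dev/sdb', '/dev/sdc' ]"
magna031 devices="[ '/dev/sdb', '/dev/sdc' ]"
"""


def parse_devices(section: str) -> dict:
    """Return {host: [device, ...]} from 'host devices="[...]"' inventory lines."""
    hosts = {}
    for line in section.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        host, _, rest = line.partition(" devices=")
        # rest is a double-quoted, Python-style list literal; drop the outer quotes
        hosts[host] = ast.literal_eval(rest.strip().strip('"'))
    return hosts


devices = parse_devices(OSDS)
total = sum(len(d) for d in devices.values())  # 3 + 3 + 2 + 2
print(devices)
print("total OSD devices:", total)
```

The total is 10 devices, which is consistent with the 8 + 2 = 10 encrypted OSDs described in the report.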



The playbook log is attached.

Comment 3 seb 2017-01-19 14:07:05 UTC

I can almost reproduce this. The proper fix belongs in ceph-disk, but we can work around it in ceph-ansible; I'm working on a fix.

Comment 4 seb 2017-01-19 14:30:55 UTC

The fix is part of this PR: https://github.com/ceph/ceph-ansible/pull/1235

Comment 6 Ken Dreyer (Red Hat) 2017-02-07 17:24:10 UTC

Sebastien, what is the next step with this BZ?

I'm guessing QE should re-test with ceph-ansible-2.1.6-1.el7scon?

Comment 7 seb 2017-02-07 21:19:28 UTC

Correct, Ken.

Comment 9 Tejas 2017-02-13 07:04:51 UTC

Created attachment 1249770
ansible playbook log

Comment 11 seb 2017-02-13 10:09:51 UTC

Can you share the ansible play of the purge playbook?

Comment 12 Tejas 2017-02-13 10:12:28 UTC

It's in attachment 1249770.

Comment 13 seb 2017-02-15 14:31:07 UTC

Can you run Ansible with -vvvv and attach the debug output as a log file?
Thanks!

Comment 14 Andrew Schoen 2017-02-15 15:56:41 UTC

From the log, it seems that the 'zap ceph journal partitions' task was skipped. I've opened an upstream PR that I think might address this: https://github.com/ceph/ceph-ansible/pull/1311

Comment 15 Andrew Schoen 2017-02-15 19:18:29 UTC

The PR (#1311) has been merged and backported to stable-2.1; v2.1.9 of ceph-ansible will contain the fix for this issue.

Comment 18 Tejas 2017-02-17 11:10:37 UTC

Verified in version:
ceph-ansible-2.1.9-1.el7scon.noarch
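The fix first ships in ceph-ansible-2.1.9-1.el7scon (the "Fixed In Version" field), while the bug was reported against 2.1.3-1 and a retest was first proposed with 2.1.6-1. A minimal sketch for checking whether an installed package string includes the fix, assuming it follows the `ceph-ansible-X.Y.Z-R...` pattern seen in this report, is a numeric tuple comparison. This is a simplification and not a substitute for RPM's own version comparison rules; `parse_nvr` and `has_fix` are hypothetical helpers.

```python
import re

FIXED_IN = "ceph-ansible-2.1.9-1.el7scon"  # "Fixed In Version" from this bug


def parse_nvr(nvr: str) -> tuple:
    """Extract (major, minor, patch, release) from e.g. 'ceph-ansible-2.1.9-1.el7scon.noarch'."""
    m = re.match(r"ceph-ansible-(\d+)\.(\d+)\.(\d+)-(\d+)", nvr)
    if not m:
        raise ValueError(f"unrecognized package string: {nvr}")
    return tuple(int(g) for g in m.groups())


FIXED = parse_nvr(FIXED_IN)


def has_fix(installed: str) -> bool:
    # Simplified numeric comparison; it ignores the dist tag and RPM's full ordering rules.
    return parse_nvr(installed) >= FIXED


for pkg in ("ceph-ansible-2.1.3-1.el7scon.noarch",
            "ceph-ansible-2.1.6-1.el7scon.noarch",
            "ceph-ansible-2.1.9-1.el7scon.noarch"):
    print(pkg, "fixed" if has_fix(pkg) else "not fixed")
```

Of the three builds mentioned in this report, only 2.1.9-1 is at or above the fixed version, which matches the verification in comment 18.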

Comment 20 errata-xmlrpc 2017-03-14 15:53:53 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:0515
