Bug 1334034

Summary: OSD creation failure with physical disks
Product: [Red Hat Storage] Red Hat Storage Console
Reporter: Nishanth Thomas <nthomas>
Component: ceph-ansible
Assignee: Alfredo Deza <adeza>
Status: CLOSED ERRATA
QA Contact: Daniel Horák <dahorak>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 2
CC: adeza, aschoen, ceph-eng-bugs, dahorak, kdreyer, mkudlej, nthomas, sankarshan, sds-qe-bugs
Target Milestone: ---   
Target Release: 2   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: ceph-ansible-1.0.5-8.el7scon
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-08-23 19:50:09 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Nishanth Thomas 2016-05-07 11:31:48 UTC
OSD creation failed for disks with the error below:

TASK: [ceph-osd | fix partitions gpt header or labels of the osd disks] ******* 
failed: [dhcp-126-123.lab.eng.brq.redhat.com] => (item=[{'changed': False, 'end': '2016-05-05 09:45:15.887391', 'failed': False, 'stdout': u'', 'cmd': 'parted --script /dev/vde print > /dev/null 2>&1', 'rc': 1, 'start': '2016-05-05 09:45:15.876054', 'item': u'/dev/vde', 'warnings': [], 'delta': '0:00:00.011337', 'invocation': {'module_name': u'shell', 'module_complex_args': {}, 'module_args': u'parted --script /dev/vde print > /dev/null 2>&1'}, 'stdout_lines': [], 'failed_when_result': False, 'stderr': u''}, u'/dev/vde']) => {"changed": false, "cmd": "sgdisk --zap-all --clear --mbrtogpt -g -- /dev/vde", "delta": "0:00:01.245651", "end": "2016-05-05 09:45:19.166575", "item": [{"changed": false, "cmd": "parted --script /dev/vde print > /dev/null 2>&1", "delta": "0:00:00.011337", "end": "2016-05-05 09:45:15.887391", "failed": false, "failed_when_result": false, "invocation": {"module_args": "parted --script /dev/vde print > /dev/null 2>&1", "module_complex_args": {}, "module_name": "shell"}, "item": "/dev/vde", "rc": 1, "start": "2016-05-05 09:45:15.876054", "stderr": "", "stdout": "", "stdout_lines": [], "warnings": []}, "/dev/vde"], "rc": 3, "start": "2016-05-05 09:45:17.920924", "warnings": []}
stderr: Caution: invalid main GPT header, but valid backup; regenerating main header
from backup!

Warning! Main partition table CRC mismatch! Loaded backup partition table
instead of main partition table!

Warning! One or more CRCs don't match. You should repair the disk!

Invalid partition data!
stdout: Caution! After loading partitions, the CRC doesn't check out!
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
Information: Creating fresh partition table; will override earlier problems!
Non-GPT disk; not saving changes. Use -g to override.

FATAL: all hosts have already failed -- aborting

================================
A bunch of OSDs failed with this error, and all of them were physical disks.
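
In the log above, sgdisk prints "GPT data structures destroyed!" and then exits with rc 3. That suggests a second run against the now-blank disk might succeed. Below is a minimal Python sketch of that retry idea. It is an assumption for illustration only, not necessarily what the upstream fix does. The command line is copied from the failing task; zap_disk and its run parameter are hypothetical names.

```python
import subprocess

# Command line copied from the failing task in the log above.
ZAP_ARGS = ["sgdisk", "--zap-all", "--clear", "--mbrtogpt", "-g", "--"]


def zap_disk(device, run=subprocess.run, attempts=2):
    """Run the sgdisk zap command, retrying while it exits non-zero.

    `run` is injectable (hypothetical parameter) so the retry logic can be
    exercised without touching a real disk.
    Returns the list of return codes observed, in order.
    """
    codes = []
    for _ in range(attempts):
        result = run(ZAP_ARGS + [device], capture_output=True, text=True)
        codes.append(result.returncode)
        if result.returncode == 0:
            break  # disk zapped cleanly, no further attempts needed
    return codes
```

Calling zap_disk("/dev/vde") would run the real command and destroy the partition data on that device, so the sketch should only be tried with a stub for run or on a disposable disk.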

Comment 2 Alfredo Deza 2016-05-09 10:46:54 UTC
What happens if you re-run the OSD configuration for that server? (I added the upstream ticket for ceph-ansible)

Comment 3 Alfredo Deza 2016-05-09 15:57:57 UTC
Pull request merged upstream https://github.com/ceph/ceph-ansible/pull/766

Comment 4 Nishanth Thomas 2016-05-10 05:05:56 UTC
Daniel,

Can you provide this info?

Comment 5 Daniel Horák 2016-05-10 07:22:15 UTC
Nishanth,

I'm not sure how I can check that. Is it possible to relaunch the OSD configuration from the USM web UI or API?

Comment 6 monti lawrence 2016-05-10 20:08:20 UTC
Alfredo will get changes downstream.

Comment 10 Daniel Horák 2016-08-02 15:29:38 UTC
I've retested the initial scenario with disks that were not properly cleaned, and all the expected OSDs were created.

Tested on:
USM Server/ceph-installer server (RHEL 7.2):
  ceph-ansible-1.0.5-31.el7scon.noarch
  ceph-installer-1.0.14-1.el7scon.noarch
  rhscon-ceph-0.0.39-1.el7scon.x86_64
  rhscon-core-0.0.39-1.el7scon.x86_64
  rhscon-core-selinux-0.0.39-1.el7scon.noarch
  rhscon-ui-0.0.51-1.el7scon.noarch
  salt-2015.5.5-1.el7.noarch
  salt-master-2015.5.5-1.el7.noarch
  salt-selinux-0.0.39-1.el7scon.noarch

Ceph node (RHEL 7.2):
  ceph-base-10.2.2-32.el7cp.x86_64
  ceph-common-10.2.2-32.el7cp.x86_64
  ceph-osd-10.2.2-32.el7cp.x86_64
  ceph-selinux-10.2.2-32.el7cp.x86_64
  libcephfs1-10.2.2-32.el7cp.x86_64
  python-cephfs-10.2.2-32.el7cp.x86_64
  rhscon-agent-0.0.16-1.el7scon.noarch
  rhscon-core-selinux-0.0.39-1.el7scon.noarch
  salt-2015.5.5-1.el7.noarch
  salt-minion-2015.5.5-1.el7.noarch
  salt-selinux-0.0.39-1.el7scon.noarch

>> VERIFIED
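
The tested build (ceph-ansible-1.0.5-31.el7scon) is later than the Fixed In Version (ceph-ansible-1.0.5-8.el7scon), although a plain string comparison of "31" and "8" would say otherwise. Below is a rough Python sketch of a numeric check. It is a simplification: it assumes dotted numeric versions and a release that starts with an integer, and it does not reproduce RPM's full version-comparison rules. The function names are hypothetical.

```python
import re


def split_nvr(package):
    """Split 'name-version-release[.arch]' into (name, version, release)."""
    name, version, release = package.rsplit("-", 2)
    return name, version, release


def sort_key(package):
    """Build a comparable key: numeric version tuple plus leading release number."""
    _, version, release = split_nvr(package)
    version_parts = tuple(int(part) for part in version.split("."))
    match = re.match(r"\d+", release)
    if match is None:
        raise ValueError("release does not start with a number: " + release)
    return version_parts, int(match.group())


def is_at_least(installed, fixed):
    """True if `installed` is the same package at or after the `fixed` build."""
    if split_nvr(installed)[0] != split_nvr(fixed)[0]:
        raise ValueError("package names differ")
    return sort_key(installed) >= sort_key(fixed)


print(is_at_least("ceph-ansible-1.0.5-31.el7scon.noarch",
                  "ceph-ansible-1.0.5-8.el7scon"))
```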

Comment 12 errata-xmlrpc 2016-08-23 19:50:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2016:1754