OSD creation failed for the disks with the error below:

TASK: [ceph-osd | fix partitions gpt header or labels of the osd disks] *******
failed: [dhcp-126-123.lab.eng.brq.redhat.com] => (item=[{'changed': False, 'end': '2016-05-05 09:45:15.887391', 'failed': False, 'stdout': u'', 'cmd': 'parted --script /dev/vde print > /dev/null 2>&1', 'rc': 1, 'start': '2016-05-05 09:45:15.876054', 'item': u'/dev/vde', 'warnings': [], 'delta': '0:00:00.011337', 'invocation': {'module_name': u'shell', 'module_complex_args': {}, 'module_args': u'parted --script /dev/vde print > /dev/null 2>&1'}, 'stdout_lines': [], 'failed_when_result': False, 'stderr': u''}, u'/dev/vde']) => {"changed": false, "cmd": "sgdisk --zap-all --clear --mbrtogpt -g -- /dev/vde", "delta": "0:00:01.245651", "end": "2016-05-05 09:45:19.166575", "item": [{"changed": false, "cmd": "parted --script /dev/vde print > /dev/null 2>&1", "delta": "0:00:00.011337", "end": "2016-05-05 09:45:15.887391", "failed": false, "failed_when_result": false, "invocation": {"module_args": "parted --script /dev/vde print > /dev/null 2>&1", "module_complex_args": {}, "module_name": "shell"}, "item": "/dev/vde", "rc": 1, "start": "2016-05-05 09:45:15.876054", "stderr": "", "stdout": "", "stdout_lines": [], "warnings": []}, "/dev/vde"], "rc": 3, "start": "2016-05-05 09:45:17.920924", "warnings": []}

stderr:
Caution: invalid main GPT header, but valid backup; regenerating main header from backup!
Warning! Main partition table CRC mismatch! Loaded backup partition table instead of main partition table!
Warning! One or more CRCs don't match. You should repair the disk!
Invalid partition data!

stdout:
Caution! After loading partitions, the CRC doesn't check out!
GPT data structures destroyed! You may now partition the disk using fdisk or other utilities.
Information: Creating fresh partition table; will override earlier problems!
Non-GPT disk; not saving changes. Use -g to override.

FATAL: all hosts have already failed -- aborting
================================

A number of OSDs failed with this error, and all of them were physical disks.
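The stderr/stdout above show a disk whose primary GPT header is corrupt while the backup copy is still valid, which makes the playbook's single sgdisk invocation exit with rc 3. A manual cleanup on an affected node might look like the sketch below; this is only a workaround sketch, not the fix that later landed upstream, and it destroys all data on the device (/dev/vde is the device from the log):

    # Hedged workaround sketch: fully wipe a disk with a corrupt primary GPT
    # before retrying OSD creation. DESTROYS ALL DATA on /dev/vde.
    sgdisk --zap-all -- /dev/vde || true                 # first pass may exit non-zero
                                                         # on a half-corrupt GPT
    sgdisk --zap-all --clear --mbrtogpt -g -- /dev/vde   # second pass; the same command
                                                         # the playbook runs
    wipefs --all /dev/vde                                # clear leftover FS/RAID signatures
    partprobe /dev/vde                                   # have the kernel re-read the table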
What happens if you re-run the OSD configuration for that server? (I have added the upstream ticket for ceph-ansible.)
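A quick way to tell whether a disk is still in the bad state before re-running is the same probe the playbook itself uses (device path taken from the log above):

    # The playbook probes each disk with parted; a non-zero exit here means the
    # partition table is still unreadable and the zap task will be attempted again.
    parted --script /dev/vde print > /dev/null 2>&1
    echo "parted exit code: $?"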
Pull request merged upstream https://github.com/ceph/ceph-ansible/pull/766
Daniel, can you provide this info?
Nishanth, I'm not sure how I can check that. Is it possible to relaunch the OSD configuration from the USM web UI or API?
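For reference, ceph-installer exposes a REST API for OSD provisioning, so relaunching outside the web UI should be possible in principle. A minimal sketch, assuming the /api/osd/configure/ endpoint on the installer's default port; the payload fields shown (host, devices) are assumptions and not confirmed against this deployment:

    # Hedged sketch: re-trigger OSD configuration through ceph-installer's REST API.
    # Hostname and device are taken from the failure log above; the payload shape
    # is an assumption, not taken from this report.
    curl -s -X POST http://ceph-installer.example.com:8181/api/osd/configure/ \
         -H 'Content-Type: application/json' \
         -d '{"host": "dhcp-126-123.lab.eng.brq.redhat.com", "devices": ["/dev/vde"]}'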
Alfredo will get the changes downstream.
I've retested the initial scenario with disks that had not been properly cleaned, and all the expected OSDs were created.

Tested on:

USM Server / ceph-installer server (RHEL 7.2):
ceph-ansible-1.0.5-31.el7scon.noarch
ceph-installer-1.0.14-1.el7scon.noarch
rhscon-ceph-0.0.39-1.el7scon.x86_64
rhscon-core-0.0.39-1.el7scon.x86_64
rhscon-core-selinux-0.0.39-1.el7scon.noarch
rhscon-ui-0.0.51-1.el7scon.noarch
salt-2015.5.5-1.el7.noarch
salt-master-2015.5.5-1.el7.noarch
salt-selinux-0.0.39-1.el7scon.noarch

Ceph node (RHEL 7.2):
ceph-base-10.2.2-32.el7cp.x86_64
ceph-common-10.2.2-32.el7cp.x86_64
ceph-osd-10.2.2-32.el7cp.x86_64
ceph-selinux-10.2.2-32.el7cp.x86_64
libcephfs1-10.2.2-32.el7cp.x86_64
python-cephfs-10.2.2-32.el7cp.x86_64
rhscon-agent-0.0.16-1.el7scon.noarch
rhscon-core-selinux-0.0.39-1.el7scon.noarch
salt-2015.5.5-1.el7.noarch
salt-minion-2015.5.5-1.el7.noarch
salt-selinux-0.0.39-1.el7scon.noarch

>> VERIFIED
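The verification commands themselves aren't listed above; typical post-deployment checks might look like the following sketch (the device path is the one from the original failure):

    # Hedged sketch of post-deployment checks: confirm the new OSDs came up.
    ceph -s            # overall cluster health
    ceph osd tree      # every expected OSD should report status "up"
    lsblk /dev/vde     # the partitions ceph-disk created on the formerly bad disk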
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2016:1754