Bug 1541016

Summary: Ceph installation fails when using NVME disk
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Vikhyat Umrao <vumrao>
Component: Ceph-Ansible
Assignee: Sébastien Han <shan>
Status: CLOSED ERRATA
QA Contact: Vasishta <vashastr>
Severity: high
Priority: high
Version: 3.0
CC: adeza, aschoen, ceph-eng-bugs, ceph-qe-bugs, edonnell, gabrioux, gmeno, hnallurv, jtudelag, kdreyer, mhackett, nlevine, nthomas, sankarshan, shan, tserlin, vashastr
Target Milestone: z5
Keywords: TestOnly
Target Release: 3.0
Hardware: x86_64
OS: Linux
Fixed In Version: RHEL: ceph-ansible-3.0.23-1.el7cp; Ubuntu: ceph-ansible_3.0.23-2redhat1
Doc Type: Bug Fix
Doc Text: Ceph installation no longer fails when using NVME disk.
Last Closed: 2018-08-09 18:27:11 UTC
Type: Bug
Bug Blocks: 1553254

Description Vikhyat Umrao 2018-02-01 14:26:03 UTC
Description of problem:
Ceph installation fails when using NVME disk

ceph-ansible fails to find the path to the NVMe partition because it does not form the partition path correctly: it appends the partition number without the "p" separator that NVMe device names require, producing /dev/nvme0n11 instead of /dev/nvme0n1p1.

failed: [osd-node-1] (item=/dev/nvme0n1) => {"changed": false, "cmd": ["ceph-disk", "activate", "/dev/nvme0n11"], "delta": "0:00:00.060679", "end": "2018-01-31 11:04:07.468680", "failed": true, "item": "/dev/nvme0n1", "msg": "non-zero return code", "rc": 1, "start": "2018-01-31 11:04:07.408001", "stderr": "ceph-disk: Error: /dev/nvme0n11 does not exist", "stderr_lines": ["ceph-disk: Error: /dev/nvme0n11 does not exist"], "stdout": "", "stdout_lines": []}

The failing task belongs to the ceph-osd role, in tasks/activate_osds.yml.
The task name is: "activate osd(s) when device is a disk"

The customer patched the installer to get it working, but this is only a workaround.
This is how the regexp in the Ansible task was changed:

- name: activate osd(s) when device is a disk
  # original command, commented out by the customer:
  #command: ceph-disk activate {{ item | regex_replace('^(\/dev\/cciss\/c[0-9]{1}d[0-9]{1})$', '\\1p') }}1
  command: ceph-disk activate "{{ item }}p1"
  with_items:
    - "{{ devices | unique }}"
  changed_when: false
  register: activate_osd_disk
  when:
    - not osd_auto_discovery
    - not dmcrypt
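
Note that appending "p1" unconditionally only works because every entry in this customer's devices list is an NVMe disk; a plain SCSI disk such as /dev/sdb would wrongly become /dev/sdbp1. A sketch of a more general approach, extending the existing regexp so the "p" separator is inserted only for cciss and NVMe style device names (an illustration, not necessarily the shipped patch):

- name: activate osd(s) when device is a disk
  command: ceph-disk activate {{ item | regex_replace('^(\/dev\/(cciss\/c[0-9]{1}d[0-9]{1}|nvme[0-9]+n[0-9]+))$', '\\1p') }}1
  # remaining keys (with_items, changed_when, register, when) unchanged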

Version-Release number of selected component (if applicable):
Red Hat Ceph Storage 3.0

How reproducible:
Always at the customer site.

Comment 3 Vikhyat Umrao 2018-02-01 14:34:58 UTC
This bug is for a NON-containerized installation. For containerized installations there is a separate bug - https://bugzilla.redhat.com/show_bug.cgi?id=1537980

Comment 5 Guillaume Abrioux 2018-02-16 07:33:09 UTC
The fix is included downstream starting with v3.0.23.

Comment 16 Vasishta 2018-03-28 06:56:05 UTC
Worked fine with ceph-ansible-3.0.28-1.el7cp.noarch.

Moving to VERIFIED state.

Comment 23 Neil Levine 2018-06-08 16:38:55 UTC
Target release for this ticket needs fixing.

Comment 25 Vasishta 2018-07-16 05:13:05 UTC
Hi Ken, 

Yes, VERIFIED is the right status for the BZ.


Regards,
Vasishta Shastry
QE, Ceph

Comment 27 errata-xmlrpc 2018-08-09 18:27:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2375