Bug 1687828

Summary: [cee/sd][ceph-ansible] rolling_update.yml does not restart NVMe OSDs running in containers
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Tomas Petr <tpetr>
Component: Ceph-Ansible
Assignee: Dimitri Savineau <dsavinea>
Status: CLOSED ERRATA
QA Contact: Vasishta <vashastr>
Severity: medium
Docs Contact: Bara Ancincova <bancinco>
Priority: urgent
Version: 3.2
CC: anharris, aschoen, ceph-eng-bugs, ceph-qe-bugs, dsavinea, gabrioux, gmeno, nthomas, sankarshan, tchandra, tserlin, ukurundw
Target Milestone: z2
Target Release: 3.2
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: RHEL: ceph-ansible-3.2.11-1.el7cp Ubuntu: ceph-ansible_3.2.11-2redhat1
Doc Type: Bug Fix
Doc Text:
.The `rolling_update.yml` playbook now restarts all OSDs as expected
Due to a bug in a regular expression, the `rolling_update.yml` playbook did not restart OSDs that used Non-volatile Memory Express (NVMe) devices. The regular expression has been fixed, and `rolling_update.yml` now restarts all OSDs as expected.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-04-30 15:57:07 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1629656    

Description Tomas Petr 2019-03-12 13:05:23 UTC
Description of problem:
The following pre-task in the rolling_update.yml playbook fails on hosts with NVMe devices, where the OSD service name has the form ceph-osd@nvmeXn1.service:
349     - name: get osd unit names - container
350       shell: systemctl list-units | grep -E "loaded * active" | grep -oE "ceph-osd@([0-9]{1,}|[a-z]+).service"

https://github.com/ceph/ceph-ansible/blob/v3.2.8/infrastructure-playbooks/rolling_update.yml#L349
-----------
$ systemctl list-units | grep -E "loaded * active" | grep -oE "ceph-osd@([0-9]{1,}|[a-z]+).service"
$ echo $?
1
-----------
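
The grep finds nothing because on this host the containerized OSD units are named after the NVMe block devices rather than after numeric OSD IDs or plain disk names, and neither "[0-9]{1,}" nor "[a-z]+" can match a device name that mixes letters and digits. A minimal sketch of the mismatch, using illustrative unit names (nvme0n1 and sdb are assumptions, not taken from this report):
-----------
$ # hypothetical unit names for illustration only
$ echo "ceph-osd@nvme0n1.service" | grep -oE "ceph-osd@([0-9]{1,}|[a-z]+).service"
$ echo $?
1
$ echo "ceph-osd@sdb.service" | grep -oE "ceph-osd@([0-9]{1,}|[a-z]+).service"
ceph-osd@sdb.service
-----------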


There are two ways to modify it:
A) Modify the line to match NVMe device names directly:
350       shell: systemctl list-units | grep -E "loaded * active" | grep -oE "ceph-osd@([0-9]{1,}|[a-z]+|nvme.*).service"
$ systemctl list-units | grep -E "loaded * active" | grep -oE "ceph-osd@([0-9]{1,}|[a-z]+|nvme.*).service"
ceph-osd
ceph-osd
ceph-osd
ceph-osd
ceph-osd
ceph-osd
ceph-osd
ceph-osd
ceph-osd
ceph-osd

B) Edit the regular expression to cover container services named after devices:
350       shell: systemctl list-units | grep -E "loaded * active" | grep -oE "ceph-osd@([0-9]{1,}|[a-z0-9]+).service"

"[a-z0-9]+" should work for any combination of characters "a-z" and "0-9" of length 1 character and more, that should include nvmeXn1 too.

$ systemctl list-units | grep -E "loaded * active" | grep -oE "ceph-osd@([0-9]{1,}|[a-z0-9]+).service"
ceph-osd
ceph-osd
ceph-osd
ceph-osd
ceph-osd
ceph-osd
ceph-osd
ceph-osd
ceph-osd
ceph-osd
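
Before editing the playbook, the candidate pattern can be sanity-checked against sample unit names covering the three naming schemes involved here (a numeric OSD ID, a plain disk name, and an NVMe name); the names are illustrative assumptions, not taken from this report:
-----------
$ # hypothetical unit names for illustration only
$ printf 'ceph-osd@3.service\nceph-osd@sdb.service\nceph-osd@nvme0n1.service\n' | grep -oE "ceph-osd@([0-9]{1,}|[a-z0-9]+).service"
ceph-osd@3.service
ceph-osd@sdb.service
ceph-osd@nvme0n1.service
-----------
Option B is the narrower change, since it only widens the character class instead of adding an open-ended "nvme.*" alternative.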


Version-Release number of selected component (if applicable):
Ceph-ansible-3.28

How reproducible:
always

Steps to Reproduce:
1. deploy containerized Ceph with NVMe devices
2. run the rolling_update.yml playbook (a typical invocation is sketched after this list)
3. the "get osd unit names - container" task fails for services named after NVMe devices
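
For step 2, a typical invocation looks like the sketch below; the working directory and inventory path are assumptions based on a default ceph-ansible installation, not taken from this report:
-----------
$ # paths are assumptions; adjust to your environment
$ cd /usr/share/ceph-ansible
$ cp infrastructure-playbooks/rolling_update.yml .
$ ansible-playbook -i /etc/ansible/hosts rolling_update.yml
-----------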

Actual results:
The "get osd unit names - container" task matches no units, so the NVMe-backed OSDs are not restarted and the play fails.

Expected results:
All containerized OSDs, including those named after NVMe devices, are matched and restarted, and the rolling upgrade completes.

Additional info:

Comment 1 Tomas Petr 2019-03-12 13:06:06 UTC
correct ceph-ansible version is 3.2.8-1

Comment 12 errata-xmlrpc 2019-04-30 15:57:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:0911