Bug 1687828

Summary: [cee/sd][ceph-ansible] rolling_update.yml does not restart NVMe OSDs running in containers
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Tomas Petr <tpetr>
Component: Ceph-Ansible
Assignee: Dimitri Savineau <dsavinea>
Status: CLOSED ERRATA
QA Contact: Vasishta <vashastr>
Severity: medium
Docs Contact: Bara Ancincova <bancinco>
Priority: urgent
Version: 3.2
CC: anharris, aschoen, ceph-eng-bugs, ceph-qe-bugs, dsavinea, gabrioux, gmeno, nthomas, sankarshan, tchandra, tserlin, ukurundw
Target Milestone: z2
Target Release: 3.2
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: RHEL: ceph-ansible-3.2.11-1.el7cp Ubuntu: ceph-ansible_3.2.11-2redhat1
Doc Type: Bug Fix
Doc Text:
.The `rolling_update.yml` playbook now restarts all OSDs as expected
Due to a bug in a regular expression, the `rolling_update.yml` playbook did not restart OSDs that used Non-volatile Memory Express (NVMe) devices. The regular expression has been fixed, and `rolling_update.yml` now restarts all OSDs as expected.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-04-30 15:57:07 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1629656    

Description Tomas Petr 2019-03-12 13:05:23 UTC
Description of problem:
The following pre-task in the rolling_update.yml playbook fails on hosts with NVMe devices, where the OSD service name has the form ceph-osd@nvmeXn1.service:
349     - name: get osd unit names - container
350       shell: systemctl list-units | grep -E "loaded * active" | grep -oE "ceph-osd@([0-9]{1,}|[a-z]+).service"

https://github.com/ceph/ceph-ansible/blob/v3.2.8/infrastructure-playbooks/rolling_update.yml#L349
-----------
$ systemctl list-units | grep -E "loaded * active" | grep -oE "ceph-osd@([0-9]{1,}|[a-z]+).service"
$ echo $?
1
-----------
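
The grep finds nothing because on this host the containerized OSD units are named after the NVMe block devices rather than after numeric OSD IDs or plain disk names, and neither "[0-9]{1,}" nor "[a-z]+" can match a device name that mixes letters and digits. A minimal sketch of the mismatch, using illustrative unit names (nvme0n1 and sdb are assumptions, not taken from this report):
-----------
$ # hypothetical unit names for illustration only
$ echo "ceph-osd@nvme0n1.service" | grep -oE "ceph-osd@([0-9]{1,}|[a-z]+).service"
$ echo $?
1
$ echo "ceph-osd@sdb.service" | grep -oE "ceph-osd@([0-9]{1,}|[a-z]+).service"
ceph-osd@sdb.service
-----------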


There are two ways to modify it:
A) Modify the line to match NVMe device names directly:
350       shell: systemctl list-units | grep -E "loaded * active" | grep -oE "ceph-osd@([0-9]{1,}|[a-z]+|nvme.*).service"
$ systemctl list-units | grep -E "loaded * active" | grep -oE "ceph-osd@([0-9]{1,}|[a-z]+|nvme.*).service"
ceph-osd
ceph-osd
ceph-osd
ceph-osd
ceph-osd
ceph-osd
ceph-osd
ceph-osd
ceph-osd
ceph-osd

B) Edit the regular expression to cover container services named after devices:
350       shell: systemctl list-units | grep -E "loaded * active" | grep -oE "ceph-osd@([0-9]{1,}|[a-z0-9]+).service"

"[a-z0-9]+" should work for any combination of characters "a-z" and "0-9" of length 1 character and more, that should include nvmeXn1 too.

$ systemctl list-units | grep -E "loaded * active" | grep -oE "ceph-osd@([0-9]{1,}|[a-z0-9]+).service"
ceph-osd
ceph-osd
ceph-osd
ceph-osd
ceph-osd
ceph-osd
ceph-osd
ceph-osd
ceph-osd
ceph-osd
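
Before editing the playbook, the candidate pattern can be sanity-checked against sample unit names covering the three naming schemes involved here (a numeric OSD ID, a plain disk name, and an NVMe name); the names are illustrative assumptions, not taken from this report:
-----------
$ # hypothetical unit names for illustration only
$ printf 'ceph-osd@3.service\nceph-osd@sdb.service\nceph-osd@nvme0n1.service\n' | grep -oE "ceph-osd@([0-9]{1,}|[a-z0-9]+).service"
ceph-osd@3.service
ceph-osd@sdb.service
ceph-osd@nvme0n1.service
-----------
Option B is the narrower change, since it only widens the character class instead of adding an open-ended "nvme.*" alternative.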


Version-Release number of selected component (if applicable):
Ceph-ansible-3.28

How reproducible:
always

Steps to Reproduce:
1. deploy containerized Ceph with NVMe devices
2. run the rolling_update.yml playbook (a typical invocation is sketched after this list)
3. the "get osd unit names - container" task fails for services named after NVMe devices
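
For step 2, a typical invocation looks like the sketch below; the working directory and inventory path are assumptions based on a default ceph-ansible installation, not taken from this report:
-----------
$ # paths are assumptions; adjust to your environment
$ cd /usr/share/ceph-ansible
$ cp infrastructure-playbooks/rolling_update.yml .
$ ansible-playbook -i /etc/ansible/hosts rolling_update.yml
-----------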

Actual results:
The "get osd unit names - container" task matches no units, so the NVMe-backed OSDs are not restarted and the play fails.

Expected results:
All containerized OSDs, including those named after NVMe devices, are matched and restarted, and the rolling upgrade completes.

Additional info:

Comment 1 Tomas Petr 2019-03-12 13:06:06 UTC
correct ceph-ansible version is 3.2.8-1

Comment 12 errata-xmlrpc 2019-04-30 15:57:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:0911