Bug 1687828 - [cee/sd][ceph-ansible] rolling-update.yml does not restart nvme osds running in containers
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Ceph-Ansible
Version: 3.2
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: medium
Target Milestone: z2
Target Release: 3.2
Assignee: Dimitri Savineau
QA Contact: Vasishta
Docs Contact: Bara Ancincova
URL:
Whiteboard:
Depends On:
Blocks: 1629656
 
Reported: 2019-03-12 13:05 UTC by Tomas Petr
Modified: 2019-08-08 03:01 UTC
CC: 12 users

Fixed In Version: RHEL: ceph-ansible-3.2.11-1.el7cp Ubuntu: ceph-ansible_3.2.11-2redhat1
Doc Type: Bug Fix
Doc Text:
.The `rolling-upgrade.yml` playbook now restarts all OSDs as expected

Due to a bug in a regular expression, the `rolling-upgrade.yml` playbook did not restart OSDs that used Non-volatile Memory Express devices. The regular expression has been fixed, and `rolling-upgrade.yml` now restarts all OSDs as expected.
Clone Of:
Environment:
Last Closed: 2019-04-30 15:57:07 UTC




Links
- Red Hat Product Errata RHSA-2019:0911 (2019-04-30 15:57:22 UTC)
- GitHub ceph/ceph-ansible pull 3708 (2019-03-12 15:26:29 UTC)
- GitHub ceph/ceph-ansible pull 3746 (2019-03-26 19:52:16 UTC)
- Red Hat Knowledge Base Solution 3981231: Ceph - ceph-ansible rolling-update.yml does not restart nvme osds running in containers (2019-03-12 13:31:50 UTC)

Description Tomas Petr 2019-03-12 13:05:23 UTC
Description of problem:
The following pre-task in the rolling_update.yml playbook fails with NVMe devices, where the name of the service is like ceph-osd@nvme0n1.service:
349     - name: get osd unit names - container
350       shell: systemctl list-units | grep -E "loaded * active" | grep -oE "ceph-osd@([0-9]{1,}|[a-z]+).service"

https://github.com/ceph/ceph-ansible/blob/v3.2.8/infrastructure-playbooks/rolling_update.yml#L349
-----------
$ systemctl list-units | grep -E "loaded * active" | grep -oE "ceph-osd@([0-9]{1,}|[a-z]+).service"
$ echo $?
1
-----------
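The mismatch is easy to reproduce in isolation. A minimal sketch, assuming sample unit names in the three styles that occur in practice (numeric OSD ids, plain device names, NVMe devices):

```shell
# Feed sample unit names through the original pattern; "nvme0n1" mixes
# letters and digits, so neither [0-9]{1,} nor [a-z]+ can match it.
printf 'ceph-osd@0.service\nceph-osd@sdb.service\nceph-osd@nvme0n1.service\n' \
  | grep -oE "ceph-osd@([0-9]{1,}|[a-z]+).service"
# prints only ceph-osd@0.service and ceph-osd@sdb.service
```

The NVMe unit is silently dropped from the list, so the playbook never restarts that OSD.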


There are two ways to fix it:
A) Modify the line to match NVMe devices directly:
350       shell: systemctl list-units | grep -E "loaded * active" | grep -oE "ceph-osd@([0-9]{1,}|[a-z]+|nvme.*).service"
$ systemctl list-units | grep -E "loaded * active" | grep -oE "ceph-osd@([0-9]{1,}|[a-z]+|nvme.*).service"
ceph-osd@nvme0n1.service
ceph-osd@nvme1n1.service
ceph-osd@nvme2n1.service
ceph-osd@nvme3n1.service
ceph-osd@nvme4n1.service
ceph-osd@nvme5n1.service
ceph-osd@nvme6n1.service
ceph-osd@nvme7n1.service
ceph-osd@nvme8n1.service
ceph-osd@nvme9n1.service

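Note that option A only special-cases NVMe; any other device name mixing letters and digits would still be missed. A quick check, using a hypothetical partition-style name "sda1":

```shell
# The option-A pattern matches nvme* names but still rejects other
# mixed letter/digit names such as the hypothetical "sda1".
printf 'ceph-osd@nvme0n1.service\nceph-osd@sda1.service\n' \
  | grep -oE "ceph-osd@([0-9]{1,}|[a-z]+|nvme.*).service"
# prints only ceph-osd@nvme0n1.service
```

Only the NVMe unit is printed, which is why option B below is the more general fix.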
B) Edit the regular expression to cover containers named after devices:
350       shell: systemctl list-units | grep -E "loaded * active" | grep -oE "ceph-osd@([0-9]{1,}|[a-z0-9]+).service"

"[a-z0-9]+" matches any combination of the characters "a-z" and "0-9" of length one or more, which also covers names like nvmeXn1.

$ systemctl list-units | grep -E "loaded * active" | grep -oE "ceph-osd@([0-9]{1,}|[a-z0-9]+).service"
ceph-osd@nvme0n1.service
ceph-osd@nvme1n1.service
ceph-osd@nvme2n1.service
ceph-osd@nvme3n1.service
ceph-osd@nvme4n1.service
ceph-osd@nvme5n1.service
ceph-osd@nvme6n1.service
ceph-osd@nvme7n1.service
ceph-osd@nvme8n1.service
ceph-osd@nvme9n1.service

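The same three-style check confirms that the option-B pattern covers numeric OSD ids, plain device names, and NVMe names alike (sample unit names only):

```shell
# All three unit-name styles match once [a-z]+ is widened to [a-z0-9]+.
printf 'ceph-osd@0.service\nceph-osd@sdb.service\nceph-osd@nvme0n1.service\n' \
  | grep -oE "ceph-osd@([0-9]{1,}|[a-z0-9]+).service"
# prints all three unit names
```

Since [a-z0-9]+ also matches pure digits, the [0-9]{1,} alternative becomes redundant, but keeping it is harmless.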

Version-Release number of selected component (if applicable):
ceph-ansible-3.2.8

How reproducible:
always

Steps to Reproduce:
1. Deploy containerized Ceph with NVMe devices.
2. Run the rolling_update.yml playbook.
3. The playbook fails for NVMe-named services.

Actual results:
The playbook fails to restart OSDs whose service units are named after NVMe devices.

Expected results:
The playbook restarts all OSD units, including NVMe-backed ones.

Additional info:

Comment 1 Tomas Petr 2019-03-12 13:06:06 UTC
correct ceph-ansible version is 3.2.8-1

Comment 12 errata-xmlrpc 2019-04-30 15:57:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:0911

