Bug 1630430

Summary: OSP 11 -> 12 upgrade with ceph - Switch to containerized ceph daemons fails with more than 99 OSDs
Product: [Red Hat Storage] Red Hat Ceph Storage
Component: Ceph-Ansible
Version: 3.0
Hardware: x86_64
OS: Linux
Severity: high
Priority: high
Status: CLOSED ERRATA
Reporter: Matt Flusche <mflusche>
Assignee: Sébastien Han <shan>
QA Contact: Vasishta <vashastr>
Docs Contact: Bara Ancincova <bancinco>
CC: aschoen, ceph-eng-bugs, ceph-qe-bugs, edonnell, gmeno, hnallurv, karatecletus323, kdreyer, nthomas, sankarshan, shan, tchandra, vakulkar, vashastr, yrabl
Target Milestone: z1
Target Release: 3.1
Fixed In Version: RHEL: ceph-ansible-3.1.9-1.el7cp; Ubuntu: ceph-ansible_3.1.9-2redhat1
Doc Type: Bug Fix
Doc Text:
.Ansible successfully converts non-containerized Ceph deployments containing more than 99 OSDs to containers

The `ceph-ansible` utility failed to convert bare-metal Ceph Storage clusters that contained more than 99 OSDs to containers because of insufficient regular expressions used in the `switch-from-non-containerized-to-containerized-ceph-daemons.yml` playbook. The playbook has been updated, and converting non-containerized clusters to containerized ones works as expected.

Last Closed: 2018-11-09 01:00:34 UTC
Type: Bug
Bug Blocks: 1584264    

Description Matt Flusche 2018-09-18 15:46:04 UTC
Description of problem:

https://github.com/ceph/ceph-ansible/issues/3128

The shell command in the "collect running osds and ceph-disk unit(s)" task needs to be changed to:

systemctl list-units | grep "loaded active" | grep -Eo 'ceph-osd@[0-9]{1,3}.service|ceph-disk@dev-[a-z]{3,4}[0-9]{1}.service'
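
To illustrate the failure mode (a minimal sketch, assuming the pre-fix pattern used the two-digit bound [0-9]{1,2}, which is the "old RE" tested later in this bug):

$ echo 'ceph-osd@100.service' | grep -Eo 'ceph-osd@[0-9]{1,2}.service'
$ echo 'ceph-osd@100.service' | grep -Eo 'ceph-osd@[0-9]{1,3}.service'
ceph-osd@100.service

The first command prints nothing because a two-digit bound can never cover a three-digit OSD ID, so those units are silently skipped during collection; the widened bound matches them.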

Comment 4 Matt Wisch 2018-09-18 16:00:42 UTC
{1,3} is just what I used to work around the issue with 200 OSDs, but {1,} may be better to avoid capping it artificially again.
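
For illustration (my own throwaway examples, not from the playbook): in extended regular expressions {1,} is equivalent to +, so either form removes the length cap entirely:

$ echo 'ceph-osd@1000.service' | grep -Eo 'ceph-osd@[0-9]{1,}.service'
ceph-osd@1000.service
$ echo 'ceph-osd@1000.service' | grep -Eo 'ceph-osd@[0-9]+.service'
ceph-osd@1000.service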

Comment 5 Sébastien Han 2018-10-12 15:34:41 UTC
The fix is present in https://github.com/ceph/ceph-ansible/releases/tag/v3.0.46

Comment 16 Vasishta 2018-10-29 10:43:46 UTC
Hi,

Looking at the fix [1], we can see that the only change is to a regular expression.
I think we can verify the fix by checking that the updated RE parses numbers above 99, unlike the old RE.

There was a similar bug, BZ 1612854, which was verified in the same manner.

Regards,
Vasishta Shastry
QE, Ceph

Comment 17 Vasishta 2018-10-29 13:01:13 UTC
Following the plan mentioned in comment 16:

I added OSD service names with IDs 0-10 and 100-110 to a test file and tried to parse all of the names. With the old RE, only IDs 0-10 were parsed; with the new RE, all of the service names were parsed.

$ for i in {0..10};do echo ceph-osd@$i.service >> test_file ;done
$ for i in {100..110};do echo ceph-osd@$i.service >> test_file ;done

$ grep -Eo 'ceph-osd@[0-9]{1,2}.service' test_file
ceph-osd@0.service
.
.
ceph-osd@10.service


$ grep -Eo 'ceph-osd@[0-9]+.service' test_file
ceph-osd@0.service
.
.
ceph-osd@10.service
ceph-osd@100.service
.
.
ceph-osd@110.service


We are planning to move this BZ to the VERIFIED state on the morning of 30 Oct (IST).
Please let us know if there are any concerns/suggestions.

Regards,
Vasishta Shastry
QE, Ceph

Comment 20 Sébastien Han 2018-10-31 10:45:04 UTC
LGTM, thanks Bara!

Comment 22 errata-xmlrpc 2018-11-09 01:00:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3530