Bug 1624962

Summary: [RFE] Set the noup flag during scale-out and unset it when all new OSD daemons are running
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Vikhyat Umrao <vumrao>
Component: Ceph-Ansible
Assignee: Sébastien Han <shan>
Status: CLOSED ERRATA
QA Contact: Vasishta <vashastr>
Severity: high
Docs Contact: Bara Ancincova <bancinco>
Priority: high
Version: 3.0
CC: anharris, aschoen, ceph-eng-bugs, gabrioux, gmeno, hnallurv, mamccoma, nthomas, sankarshan, shan, tnielsen, tserlin
Target Milestone: rc
Keywords: FutureFeature
Target Release: 3.2
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: RHEL: ceph-ansible-3.2.0-0.1.beta6.el7cp Ubuntu: ceph-ansible_3.2.0~beta6-2redhat1
Doc Type: Enhancement
Doc Text:
.The `noup` flag is now set before creating OSDs to distribute PGs properly
The `ceph-ansible` utility now sets the `noup` flag before creating OSDs to prevent them from changing their status to `up` before all OSDs are created. Previously, if the flag was not set, placement groups (PGs) were created on only one OSD and got stuck in creation or activation. With this update, the `noup` flag is set before creating OSDs and unset after the creation is complete. As a result, PGs are distributed properly among all OSDs.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-01-03 19:01:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1651060
Bug Blocks: 1629656

Description Vikhyat Umrao 2018-09-03 17:18:51 UTC
Description of problem:
[RFE] Set the noup flag during scale-out and unset it when all new OSD daemons are running

Version-Release number of selected component (if applicable):
RHCS 3

How reproducible:
Ansible adds OSDs one by one. If the noup flag is not set, a large number of PGs can be created on only one OSD, and those PGs get stuck in creation/activation.

This feature will let all of the new OSDs come online first; when noup is then unset, PG distribution happens properly across all of them. A sketch of the intended workflow is shown below.
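
(Sketch only; the exact commands the playbook ends up running are an implementation detail and are not taken from this report:)

    # On a monitor node, before adding the new OSDs:
    ceph osd set noup

    # Run ceph-ansible to add the new OSDs; the daemons start but remain
    # marked "down" because of the noup flag.

    # Once all new OSD daemons are running, unset the flag so they are marked
    # "up" together and PGs are distributed across all of them at once:
    ceph osd unset noup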

Comment 1 Vikhyat Umrao 2018-09-03 17:21:20 UTC
For details on why PGs get stuck in activating, see this KCS article: https://access.redhat.com/solutions/3526531. This is a new safeguard in Luminous (RHCS 3) that prevents a large number of PGs from being mapped to a single OSD.
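
(As a pointer only, not taken from this report: standard Ceph CLI commands along these lines can be used to spot PGs stuck in that state:)

    ceph health detail               # lists PGs reported as stuck activating
    ceph pg dump_stuck inactive      # dumps PGs that are not active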

Comment 3 mamccoma 2018-09-21 19:01:56 UTC
*** Test from lab environment with "noup" flag ***:

Before any changes in my environment (baseline):

[root@vm250-137 ~]# ceph -s
  cluster:
    id:     256b60c8-8d8e-47bb-9dfe-492055072a7e
    health: HEALTH_WARN
            application not enabled on 1 pool(s)
            1/3 mons down, quorum vm250-8,vm250-137
 
  services:
    mon:         3 daemons, quorum vm250-8,vm250-137, out of quorum: vm250-194
    mgr:         vm250-137(active), standbys: vm250-8
    osd:         9 osds: 9 up, 9 in
    rgw:         2 daemons active
    tcmu-runner: 2 daemons active
 
  data:
    pools:   9 pools, 576 pgs
    objects: 231 objects, 3873 bytes
    usage:   1112 MB used, 87876 MB / 88988 MB avail
    pgs:     576 active+clean
 
  io:
    client:   170 B/s rd, 0 op/s rd, 0 op/s wr


[root@vm250-137 ~]# ceph osd tree
ID CLASS WEIGHT  TYPE NAME          STATUS REWEIGHT PRI-AFF 
-1       0.08464 root default                               
-5       0.02888     host vm250-248                         
 1   hdd 0.00929         osd.1          up  1.00000 1.00000 
 3   hdd 0.00980         osd.3          up  1.00000 1.00000 
 6   hdd 0.00980         osd.6          up  1.00000 1.00000 
-7       0.02788     host vm251-254                         
 2   hdd 0.00929         osd.2          up  1.00000 1.00000 
 5   hdd 0.00929         osd.5          up  1.00000 1.00000 
 8   hdd 0.00929         osd.8          up  1.00000 1.00000 
-3       0.02788     host vm253-212                         
 0   hdd 0.00929         osd.0          up  1.00000 1.00000 
 4   hdd 0.00929         osd.4          up  1.00000 1.00000 
 7   hdd 0.00929         osd.7          up  1.00000 1.00000 
--------------------------------------------------------------------------------

** Remove OSDs/OSD node (vm251-254) and apply flags to simulate adding a new node with OSDs:
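
(The flags were presumably applied with standard commands along these lines; the exact invocation was not captured in this session:)

    ceph osd set noup
    ceph osd set nobackfill
    ceph osd set norecover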

[root@vm250-137 ~]# ceph -s
  cluster:
    id:     256b60c8-8d8e-47bb-9dfe-492055072a7e
    health: HEALTH_WARN
            noup,nobackfill,norecover flag(s) set
            55/693 objects misplaced (7.937%)
            Degraded data redundancy: 176/693 objects degraded (25.397%), 352 pgs unclean, 21 pgs degraded, 352 pgs undersized
            application not enabled on 1 pool(s)
            1/3 mons down, quorum vm250-8,vm250-137
 
  services:
    mon:         3 daemons, quorum vm250-8,vm250-137, out of quorum: vm250-194
    mgr:         vm250-137(active), standbys: vm250-8
    osd:         6 osds: 6 up, 6 in; 224 remapped pgs
                 flags noup,nobackfill,norecover
    rgw:         2 daemons active
    tcmu-runner: 2 daemons active
 
  data:
    pools:   9 pools, 576 pgs
    objects: 231 objects, 3873 bytes
    usage:   750 MB used, 59087 MB / 59837 MB avail
    pgs:     176/693 objects degraded (25.397%)
             55/693 objects misplaced (7.937%)
             331 active+undersized
             214 active+clean+remapped
             21  active+undersized+degraded
             10  active+clean
 
  io:
    client:   127 B/s rd, 0 op/s rd, 0 op/s wr


[root@vm250-137 ~]# ceph osd tree
ID CLASS WEIGHT  TYPE NAME          STATUS REWEIGHT PRI-AFF 
-1       0.05676 root default                               
-5       0.02888     host vm250-248                         
 1   hdd 0.00929         osd.1          up  1.00000 1.00000 
 3   hdd 0.00980         osd.3          up  1.00000 1.00000 
 6   hdd 0.00980         osd.6          up  1.00000 1.00000 
-3       0.02788     host vm253-212                         
 0   hdd 0.00929         osd.0          up  1.00000 1.00000 
 4   hdd 0.00929         osd.4          up  1.00000 1.00000 
 7   hdd 0.00929         osd.7          up  1.00000 1.00000 
-------------------------------------------------------------------------------

** The ceph-ansible playbook fails on this non-containerized task near the end of the run, but it still appears to have applied the changes successfully: **

TASK [ceph-osd : manually prepare ceph "filestore" non-containerized osd disk(s) with collocated osd data and journal] *****
changed: [vm251-254] => (item=[{'_ansible_parsed': True, 'stderr_lines': [], u'cmd': u"parted --script /dev/sdb print | egrep -sq '^ 1.*ceph'", u'end': u'2018-09-21 14:44:00.954490', '_ansible_no_log': False, u'stdout': u'', '_ansible_item_result': True, u'changed': False, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': True, u'_raw_params': u"parted --script /dev/sdb print | egrep -sq '^ 1.*ceph'", u'removes': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'start': u'2018-09-21 14:44:00.910970', u'delta': u'0:00:00.043520', 'item': u'/dev/sdb', u'rc': 1, u'msg': u'non-zero return code', 'stdout_lines': [], 'failed_when_result': False, u'stderr': u'', '_ansible_ignore_errors': None, u'failed': False}, u'/dev/sdb'])
changed: [vm251-254] => (item=[{'_ansible_parsed': True, 'stderr_lines': [], u'cmd': u"parted --script /dev/sdc print | egrep -sq '^ 1.*ceph'", u'end': u'2018-09-21 14:44:01.467960', '_ansible_no_log': False, u'stdout': u'', '_ansible_item_result': True, u'changed': False, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': True, u'_raw_params': u"parted --script /dev/sdc print | egrep -sq '^ 1.*ceph'", u'removes': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'start': u'2018-09-21 14:44:01.408488', u'delta': u'0:00:00.059472', 'item': u'/dev/sdc', u'rc': 1, u'msg': u'non-zero return code', 'stdout_lines': [], 'failed_when_result': False, u'stderr': u'', '_ansible_ignore_errors': None, u'failed': False}, u'/dev/sdc'])
changed: [vm251-254] => (item=[{'_ansible_parsed': True, 'stderr_lines': [], u'cmd': u"parted --script /dev/sdd print | egrep -sq '^ 1.*ceph'", u'end': u'2018-09-21 14:44:02.006833', '_ansible_no_log': False, u'stdout': u'', '_ansible_item_result': True, u'changed': False, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': True, u'_raw_params': u"parted --script /dev/sdd print | egrep -sq '^ 1.*ceph'", u'removes': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'start': u'2018-09-21 14:44:01.959515', u'delta': u'0:00:00.047318', 'item': u'/dev/sdd', u'rc': 1, u'msg': u'non-zero return code', 'stdout_lines': [], 'failed_when_result': False, u'stderr': u'', '_ansible_ignore_errors': None, u'failed': False}, u'/dev/sdd'])
failed: [vm251-254] (item=[{'_ansible_parsed': True, 'stderr_lines': [], u'cmd': u"parted --script /dev/sdd print | egrep -sq '^ 1.*ceph'", u'end': u'2018-09-21 14:44:02.437479', '_ansible_no_log': False, u'stdout': u'', '_ansible_item_result': True, u'changed': False, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': True, u'_raw_params': u"parted --script /dev/sdd print | egrep -sq '^ 1.*ceph'", u'removes': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'start': u'2018-09-21 14:44:02.417113', u'delta': u'0:00:00.020366', 'item': u'/dev/sdd', u'rc': 1, u'msg': u'non-zero return code', 'stdout_lines': [], 'failed_when_result': False, u'stderr': u'', '_ansible_ignore_errors': None, u'failed': False}, u'/dev/sdd']) => {"changed": true, "cmd": ["ceph-disk", "prepare", "--cluster", "ceph", "--filestore", "/dev/sdd"], "delta": "0:00:01.538766", "end": "2018-09-21 14:44:46.513581", "item": [{"_ansible_ignore_errors": null, "_ansible_item_result": true, "_ansible_no_log": false, "_ansible_parsed": true, "changed": false, "cmd": "parted --script /dev/sdd print | egrep -sq '^ 1.*ceph'", "delta": "0:00:00.020366", "end": "2018-09-21 14:44:02.437479", "failed": false, "failed_when_result": false, "invocation": {"module_args": {"_raw_params": "parted --script /dev/sdd print | egrep -sq '^ 1.*ceph'", "_uses_shell": true, "chdir": null, "creates": null, "executable": null, "removes": null, "stdin": null, "warn": true}}, "item": "/dev/sdd", "msg": "non-zero return code", "rc": 1, "start": "2018-09-21 14:44:02.417113", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}, "/dev/sdd"], "msg": "non-zero return code", "rc": 1, "start": "2018-09-21 14:44:44.974815", "stderr": "Could not create partition 2 from 34 to 1048609\nError encountered; not saving changes.\n'/sbin/sgdisk --new=2:0:+512M --change-name=2:ceph journal --partition-guid=2:e07fb99d-f87e-4a44-aa6b-e6f466f7aef2 --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/sdd' failed with status code 4", "stderr_lines": ["Could not create partition 2 from 34 to 1048609", "Error encountered; not saving changes.", "'/sbin/sgdisk --new=2:0:+512M --change-name=2:ceph journal --partition-guid=2:e07fb99d-f87e-4a44-aa6b-e6f466f7aef2 --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/sdd' failed with status code 4"], "stdout": "", "stdout_lines": []}
failed: [vm251-254] (item=[{'_ansible_parsed': True, 'stderr_lines': [], u'cmd': u"parted --script /dev/sdb print | egrep -sq '^ 1.*ceph'", u'end': u'2018-09-21 14:44:02.846679', '_ansible_no_log': False, u'stdout': u'', '_ansible_item_result': True, u'changed': False, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': True, u'_raw_params': u"parted --script /dev/sdb print | egrep -sq '^ 1.*ceph'", u'removes': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'start': u'2018-09-21 14:44:02.830390', u'delta': u'0:00:00.016289', 'item': u'/dev/sdb', u'rc': 1, u'msg': u'non-zero return code', 'stdout_lines': [], 'failed_when_result': False, u'stderr': u'', '_ansible_ignore_errors': None, u'failed': False}, u'/dev/sdb']) => {"changed": true, "cmd": ["ceph-disk", "prepare", "--cluster", "ceph", "--filestore", "/dev/sdb"], "delta": "0:00:00.308885", "end": "2018-09-21 14:44:47.732516", "item": [{"_ansible_ignore_errors": null, "_ansible_item_result": true, "_ansible_no_log": false, "_ansible_parsed": true, "changed": false, "cmd": "parted --script /dev/sdb print | egrep -sq '^ 1.*ceph'", "delta": "0:00:00.016289", "end": "2018-09-21 14:44:02.846679", "failed": false, "failed_when_result": false, "invocation": {"module_args": {"_raw_params": "parted --script /dev/sdb print | egrep -sq '^ 1.*ceph'", "_uses_shell": true, "chdir": null, "creates": null, "executable": null, "removes": null, "stdin": null, "warn": true}}, "item": "/dev/sdb", "msg": "non-zero return code", "rc": 1, "start": "2018-09-21 14:44:02.830390", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}, "/dev/sdb"], "msg": "non-zero return code", "rc": 1, "start": "2018-09-21 14:44:47.423631", "stderr": "ceph-disk: Error: Device is mounted: /dev/sdb1", "stderr_lines": ["ceph-disk: Error: Device is mounted: /dev/sdb1"], "stdout": "", "stdout_lines": []}
failed: [vm251-254] (item=[{'_ansible_parsed': True, 'stderr_lines': [], u'cmd': u"parted --script /dev/sdc print | egrep -sq '^ 1.*ceph'", u'end': u'2018-09-21 14:44:03.289726', '_ansible_no_log': False, u'stdout': u'', '_ansible_item_result': True, u'changed': False, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': True, u'_raw_params': u"parted --script /dev/sdc print | egrep -sq '^ 1.*ceph'", u'removes': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'start': u'2018-09-21 14:44:03.273331', u'delta': u'0:00:00.016395', 'item': u'/dev/sdc', u'rc': 1, u'msg': u'non-zero return code', 'stdout_lines': [], 'failed_when_result': False, u'stderr': u'', '_ansible_ignore_errors': None, u'failed': False}, u'/dev/sdc']) => {"changed": true, "cmd": ["ceph-disk", "prepare", "--cluster", "ceph", "--filestore", "/dev/sdc"], "delta": "0:00:00.306034", "end": "2018-09-21 14:44:48.455621", "item": [{"_ansible_ignore_errors": null, "_ansible_item_result": true, "_ansible_no_log": false, "_ansible_parsed": true, "changed": false, "cmd": "parted --script /dev/sdc print | egrep -sq '^ 1.*ceph'", "delta": "0:00:00.016395", "end": "2018-09-21 14:44:03.289726", "failed": false, "failed_when_result": false, "invocation": {"module_args": {"_raw_params": "parted --script /dev/sdc print | egrep -sq '^ 1.*ceph'", "_uses_shell": true, "chdir": null, "creates": null, "executable": null, "removes": null, "stdin": null, "warn": true}}, "item": "/dev/sdc", "msg": "non-zero return code", "rc": 1, "start": "2018-09-21 14:44:03.273331", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}, "/dev/sdc"], "msg": "non-zero return code", "rc": 1, "start": "2018-09-21 14:44:48.149587", "stderr": "ceph-disk: Error: Device is mounted: /dev/sdc1", "stderr_lines": ["ceph-disk: Error: Device is mounted: /dev/sdc1"], "stdout": "", "stdout_lines": []}

PLAY RECAP *****************************************************************************************************************
vm251-254                  : ok=63   changed=4    unreachable=0    failed=1   


--------------------------------------------------------------------------------

*** Changes after the run of the playbook ***

ID CLASS WEIGHT  TYPE NAME          STATUS REWEIGHT PRI-AFF
-1	 0.08464 root default
-5	 0.02888     host vm250-248
 1   hdd 0.00929         osd.1          up  1.00000 1.00000
 3   hdd 0.00980         osd.3          up  1.00000 1.00000
 6   hdd 0.00980         osd.6          up  1.00000 1.00000
-7	 0.02788     host vm251-254
 2   hdd 0.00929         osd.2        down        0 1.00000
 5   hdd 0.00929         osd.5        down        0 1.00000
 8   hdd 0.00929         osd.8        down        0 1.00000
-3	 0.02788     host vm253-212
 0   hdd 0.00929         osd.0          up  1.00000 1.00000
 4   hdd 0.00929         osd.4          up  1.00000 1.00000
 7   hdd 0.00929         osd.7          up  1.00000 1.00000



[root@vm250-137 ~]# ceph osd unset nobackfill
nobackfill is unset
[root@vm250-137 ~]# ceph osd unset norecover
norecover is unset
[root@vm250-137 ~]# ceph osd unset noup
noup is unset
[root@vm250-137 ~]# ceph osd tree
ID CLASS WEIGHT  TYPE NAME          STATUS REWEIGHT PRI-AFF 
-1       0.08464 root default                               
-5       0.02888     host vm250-248                         
 1   hdd 0.00929         osd.1          up  1.00000 1.00000 
 3   hdd 0.00980         osd.3          up  1.00000 1.00000 
 6   hdd 0.00980         osd.6          up  1.00000 1.00000 
-7       0.02788     host vm251-254                         
 2   hdd 0.00929         osd.2          up  1.00000 1.00000 
 5   hdd 0.00929         osd.5          up  1.00000 1.00000 
 8   hdd 0.00929         osd.8          up  1.00000 1.00000 
-3       0.02788     host vm253-212                         
 0   hdd 0.00929         osd.0          up  1.00000 1.00000 
 4   hdd 0.00929         osd.4          up  1.00000 1.00000 
 7   hdd 0.00929         osd.7          up  1.00000 1.00000


** The cluster has now successfully rebalanced: **

[root@vm250-137 ~]# ceph -s
  cluster:
    id:     256b60c8-8d8e-47bb-9dfe-492055072a7e
    health: HEALTH_WARN
            application not enabled on 1 pool(s)
            1/3 mons down, quorum vm250-8,vm250-137
 
  services:
    mon:         3 daemons, quorum vm250-8,vm250-137, out of quorum: vm250-194
    mgr:         vm250-137(active), standbys: vm250-8
    osd:         9 osds: 9 up, 9 in
    rgw:         2 daemons active
    tcmu-runner: 2 daemons active
 
  data:
    pools:   9 pools, 576 pgs
    objects: 231 objects, 3873 bytes
    usage:   1090 MB used, 87898 MB / 88988 MB avail
    pgs:     576 active+clean
 
  io:
    client:   85 B/s rd, 0 op/s rd, 0 op/s wr

Comment 4 Sébastien Han 2018-09-25 13:24:09 UTC
Vikhyat, for day-2 operations it is encouraged to use the osd-configure.yml playbook, which adds new OSDs. So we are going to add your request to this playbook.
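
(For illustration only, not the actual ceph-ansible change: the requested behaviour amounts to wrapping that playbook's OSD-creation steps roughly like this:)

    ceph osd set noup      # before any new OSD is created
    # ... playbook creates and starts the new OSD daemons ...
    ceph osd unset noup    # after all new OSD daemons are running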

Comment 5 Sébastien Han 2018-10-17 15:18:04 UTC
Present in https://github.com/ceph/ceph-ansible/releases/tag/v3.2.0beta6

Comment 10 Sébastien Han 2018-11-07 10:05:31 UTC
lgtm

Comment 11 Vasishta 2018-11-19 12:39:09 UTC
Observed that the noup flag was set and unset as required.
Moving to VERIFIED state.

ceph-ansible-3.2.0-0.1.rc3.el7cp.noarch

Comment 13 errata-xmlrpc 2019-01-03 19:01:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0020