Bug 1297978 - rhel-osp-director: after scale down the number of ceph nodes (from 3 to 1) osdmap remains 3 osds: 1 up, 1 in
Status: CLOSED NOTABUG
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 8.0 (Liberty)
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: medium
Target Milestone: ga
Target Release: 8.0 (Liberty)
Assigned To: Giulio Fidente
QA Contact: Yogev Rabl
Docs Contact: Derek
Keywords: TestOnly, Triaged, ZStream
Duplicates: 1311997
Depends On:
Blocks:
Reported: 2016-01-12 17:04 EST by Omri Hochman
Modified: 2018-05-02 06:53 EDT
CC List: 21 users

See Also:
Fixed In Version: openstack-puppet-modules-7.0.16-1.el7ost
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-03-29 14:15:44 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 2187251 None None None 2016-03-02 02:26 EST
OpenStack gerrit 294091 None None None 2016-03-17 11:16 EDT

Description Omri Hochman 2016-01-12 17:04:30 EST
rhel-osp-director: after scale down the number of ceph nodes (from 3 to 1) osdmap remains 3 osds: 1 up, 1 in.


Environment (OSP-D-7.2-GA): 
----------------------------
ceph-0.94.1-13.el7cp.x86_64
ceph-mon-0.94.1-13.el7cp.x86_64
ceph-osd-0.94.1-13.el7cp.x86_64
ceph-common-0.94.1-13.el7cp.x86_64
python-rdomanager-oscplugin-0.0.10-22.el7ost.noarch
python-heatclient-0.6.0-1.el7ost.noarch
openstack-heat-api-2015.1.2-4.el7ost.noarch
heat-cfntools-1.2.8-2.el7.noarch
openstack-heat-templates-0-0.8.20150605git.el7ost.noarch
instack-0.0.7-2.el7ost.noarch
instack-undercloud-2.1.2-36.el7ost.noarch


Description:
------------
I attempted to scale down my 3-node Ceph deployment to 1 Ceph node. After the scale-down, 'ceph status' still reports 3 OSDs, of which only 1 is up (the real situation is 1 OSD available and 1 up).

Deployment command:
openstack overcloud deploy --templates --control-scale 3 --compute-scale 1 --ceph-storage-scale 3 -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml --ntp-server 10.5.26.10 --timeout 90


Before Scale down: 
-------------------
[root@overcloud-controller-0 ~]# ceph status
    cluster 4d13892c-b403-11e5-8522-525400c91767
     health HEALTH_OK
     monmap e1: 3 mons at {overcloud-controller-0=192.168.0.10:6789/0,overcloud-controller-1=192.168.0.11:6789/0,overcloud-controller-2=192.168.0.12:6789/0}
            election epoch 12, quorum 0,1,2 overcloud-controller-0,overcloud-controller-1,overcloud-controller-2
     osdmap e135: 3 osds: 3 up, 3 in
      pgmap v476: 224 pgs, 4 pools, 45659 kB data, 19 objects
            28331 MB used, 953 GB / 981 GB avail
                 224 active+clean
  client io 81 B/s wr, 0 op/s


-------------------------------------------------------------------------------

Scale-down command:
openstack overcloud deploy --templates --control-scale 3 --compute-scale 1 --ceph-storage-scale 3 -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml --ntp-server 10.5.26.10 --timeout 90
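
Note: the paste above repeats --ceph-storage-scale 3; a scale-down from 3 to 1 Ceph nodes is normally requested by re-running the same deploy command with the count reduced. An illustrative form, keeping the other flags unchanged:

openstack overcloud deploy --templates --control-scale 3 --compute-scale 1 --ceph-storage-scale 1 -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml --ntp-server 10.5.26.10 --timeout 90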

After Scale down: 
------------------
[root@overcloud-controller-0 ~]# ceph status
    cluster 4d13892c-b403-11e5-8522-525400c91767
     health HEALTH_WARN
            183 pgs degraded
            63 pgs stale
            183 pgs stuck degraded
            2 pgs stuck inactive 
            63 pgs stuck stale   
            185 pgs stuck unclean
            183 pgs stuck undersized
            183 pgs undersized   
            recovery 15/38 objects degraded (39.474%)
            too many PGs per OSD (311 > max 300)
            pool vms pg_num 64 > pgp_num 56
            pool images pg_num 64 > pgp_num 56
            pool volumes pg_num 64 > pgp_num 56
     monmap e1: 3 mons at {overcloud-controller-0=192.168.0.10:6789/0,overcloud-controller-1=192.168.0.11:6789/0,overcloud-controller-2=192.168.0.12:6789/0}
            election epoch 12, quorum 0,1,2 overcloud-controller-0,overcloud-controller-1,overcloud-controller-2
     osdmap e144: 3 osds: 1 up, 1 in
      pgmap v727: 248 pgs, 4 pools, 45659 kB data, 19 objects
            12852 MB used, 38334 MB / 51187 MB avail
            15/38 objects degraded (39.474%)
                 183 active+undersized+degraded
                  63 stale+active+clean
                   2 creating
  client io 61 B/s rd, 0 op/s


Ceph health:
------------
[root@overcloud-controller-0 ~]# ceph health
HEALTH_WARN 183 pgs degraded; 63 pgs stale; 183 pgs stuck degraded; 2 pgs stuck inactive; 63 pgs stuck stale; 185 pgs stuck unclean; 183 pgs stuck undersized; 183 pgs undersized; recovery 15/38 objects degraded (39.474%); too many PGs per OSD (311 > max 300); pool vms pg_num 64 > pgp_num 56; pool images pg_num 64 > pgp_num 56; pool volumes pg_num 64 > pgp_num 56
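
For reference, a minimal sketch of the manual cleanup usually needed to drop the stale OSD entries once the removed nodes are gone; the OSD IDs below are illustrative, so check 'ceph osd tree' first to see which OSDs the deleted nodes left behind:

[root@overcloud-controller-0 ~]# ceph osd tree                # identify the OSDs that stayed down after the scale-down
[root@overcloud-controller-0 ~]# ceph osd out osd.1           # mark the stale OSD out (ID illustrative)
[root@overcloud-controller-0 ~]# ceph osd crush remove osd.1  # drop it from the CRUSH map
[root@overcloud-controller-0 ~]# ceph auth del osd.1          # delete its cephx key
[root@overcloud-controller-0 ~]# ceph osd rm osd.1            # remove it from the osdmap

(repeat for each leftover OSD, e.g. osd.2)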
Comment 6 Mike Burns 2016-02-25 08:50:13 EST
*** Bug 1311997 has been marked as a duplicate of this bug. ***
Comment 8 Jaromir Coufal 2016-03-17 10:49:06 EDT
Scaling down the number of Ceph nodes is not supported so far; that will need to be an RFE. We will, though, need to prevent users from reaching this state.
Comment 9 Emilien Macchi 2016-03-21 13:54:20 EDT
The patch is merged upstream.
Comment 16 Omri Hochman 2016-05-31 18:34:25 EDT
Verified on OSPd 9:
[stack@undercloud72 ~]$ rpm -qa | grep heat
openstack-heat-api-6.0.0-3.el7ost.noarch
openstack-tripleo-heat-templates-liberty-2.0.0-8.el7ost.noarch
openstack-heat-common-6.0.0-3.el7ost.noarch
openstack-tripleo-heat-templates-2.0.0-8.el7ost.noarch
heat-cfntools-1.3.0-2.el7ost.noarch
openstack-heat-templates-0-0.8.20150605git.el7ost.noarch
openstack-heat-api-cfn-6.0.0-3.el7ost.noarch
python-heatclient-1.0.0-1.el7ost.noarch
openstack-tripleo-heat-templates-kilo-2.0.0-8.el7ost.noarch
openstack-heat-engine-6.0.0-3.el7ost.noarch


[root@undercloud72 ~]# nova list
+--------------------------------------+-------------------------+--------+------------+-------------+-----------------------+
| ID                                   | Name                    | Status | Task State | Power State | Networks              |
+--------------------------------------+-------------------------+--------+------------+-------------+-----------------------+
| e7882f51-cea8-4fd6-9825-d4ce94209f66 | overcloud-cephstorage-0 | ACTIVE | -          | Running     | ctlplane=192.168.0.8  |
| b12ab653-57d4-44a8-9eac-62114ac0fc36 | overcloud-cephstorage-1 | ACTIVE | -          | Running     | ctlplane=192.168.0.7  |
| ffb95cfc-cb2f-4baa-8f28-6f6accbf3efb | overcloud-controller-0  | ACTIVE | -          | Running     | ctlplane=192.168.0.11 |
| 00b2396a-19da-4cf9-a666-3e0ce0ed659c | overcloud-controller-1  | ACTIVE | -          | Running     | ctlplane=192.168.0.10 |
| 0c271562-1660-467e-97d4-f89a5f454407 | overcloud-controller-2  | ACTIVE | -          | Running     | ctlplane=192.168.0.12 |
| 4139094d-6f7f-48ce-a715-67c603083b1a | overcloud-novacompute-0 | ACTIVE | -          | Running     | ctlplane=192.168.0.9  |
| 09888732-7c25-4d97-9c9d-9a9d9cfa0b7a | overcloud-novacompute-1 | ACTIVE | -          | Running     | ctlplane=192.168.0.13 |
+--------------------------------------+-------------------------+--------+------------+-------------+-----------------------+


openstack overcloud deploy --templates --control-scale 3 --compute-scale 2 --ceph-storage-scale 1   --neutron-network-type vxlan --neutron-tunnel-types vxlan  --ntp-server 10.5.26.10 --timeout 90 -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e network-environment.yaml

..
..
2016-05-31 22:28:51 [overcloud-CephStorageNodesPostDeployment-cbztvrgeq2kl-ExtraConfig-jmi4qykg2gpw]: UPDATE_COMPLETE Stack UPDATE completed successfully
2016-05-31 22:28:52 [ExtraConfig]: UPDATE_COMPLETE state changed
2016-05-31 22:28:53 [NetworkDeployment]: SIGNAL_COMPLETE Unknown
2016-05-31 22:28:57 [overcloud-CephStorageNodesPostDeployment-cbztvrgeq2kl]: UPDATE_COMPLETE Stack UPDATE completed successfully
Stack overcloud UPDATE_COMPLETE
Overcloud Endpoint: http://10.19.184.210:5000/v2.0
Overcloud Deployed
[stack@undercloud72 ~]$ nova list
+--------------------------------------+-------------------------+--------+------------+-------------+-----------------------+
| ID                                   | Name                    | Status | Task State | Power State | Networks              |
+--------------------------------------+-------------------------+--------+------------+-------------+-----------------------+
| e7882f51-cea8-4fd6-9825-d4ce94209f66 | overcloud-cephstorage-0 | ACTIVE | -          | Running     | ctlplane=192.168.0.8  |
| ffb95cfc-cb2f-4baa-8f28-6f6accbf3efb | overcloud-controller-0  | ACTIVE | -          | Running     | ctlplane=192.168.0.11 |
| 00b2396a-19da-4cf9-a666-3e0ce0ed659c | overcloud-controller-1  | ACTIVE | -          | Running     | ctlplane=192.168.0.10 |
| 0c271562-1660-467e-97d4-f89a5f454407 | overcloud-controller-2  | ACTIVE | -          | Running     | ctlplane=192.168.0.12 |
| 4139094d-6f7f-48ce-a715-67c603083b1a | overcloud-novacompute-0 | ACTIVE | -          | Running     | ctlplane=192.168.0.9  |
| 09888732-7c25-4d97-9c9d-9a9d9cfa0b7a | overcloud-novacompute-1 | ACTIVE | -          | Running     | ctlplane=192.168.0.13 |
+--------------------------------------+-------------------------+--------+------------+-------------+-----------------------+
Comment 17 Omri Hochman 2016-05-31 22:40:35 EDT
Setting FailedQA for further investigation.

After scaling down to 1 Ceph node, 'ceph status' shows:

[heat-admin@overcloud-controller-1 ~]$ sudo ceph status 
    cluster 1442b054-d029-11e5-bc14-525400c91767
     health HEALTH_WARN
            192 pgs degraded
            192 pgs stuck degraded
            192 pgs stuck unclean
            192 pgs stuck undersized
            192 pgs undersized
     monmap e1: 3 mons at {overcloud-controller-0=10.19.105.15:6789/0,overcloud-controller-1=10.19.105.14:6789/0,overcloud-controller-2=10.19.105.11:6789/0}
            election epoch 8, quorum 0,1,2 overcloud-controller-2,overcloud-controller-1,overcloud-controller-0
     osdmap e19: 2 osds: 1 up, 1 in
      pgmap v1605: 192 pgs, 5 pools, 0 bytes data, 0 objects
            4102 MB used, 414 GB / 437 GB avail
                 192 active+undersized+degraded

------------------------------------------------------
Result:   osdmap e19: 2 osds: 1 up, 1 in
Expected: osdmap e19: 1 osds: 1 up, 1 in
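
For reference, a quick way to confirm that the leftover record is just a stale osdmap/CRUSH entry from the deleted node (the ID is whatever that node carried), before cleaning it up as sketched under the original report:

[heat-admin@overcloud-controller-1 ~]$ sudo ceph osd tree   # the removed node's OSD should show as down
[heat-admin@overcloud-controller-1 ~]$ sudo ceph osd stat   # prints the osd count, e.g. "2 osds: 1 up, 1 in"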
Comment 18 Emilien Macchi 2016-06-01 17:26:24 EDT
Could you provide system logs? I want to see the Ceph OSD/MON and Puppet run logs.

Thanks
Comment 20 jomurphy 2017-03-29 14:15:44 EDT
Scaling down to one node is not supported, so this bug is being closed as an unsupported configuration.
Comment 21 Amit Ugol 2018-05-02 06:53:17 EDT
Closed; no need for needinfo.
