Bug 1297978

Summary: rhel-osp-director: after scale down the number of ceph nodes (from 3 to 1) osdmap remains 3 osds: 1 up, 1 in
Product: Red Hat OpenStack
Reporter: Omri Hochman <ohochman>
Component: rhosp-director
Assignee: Giulio Fidente <gfidente>
Status: CLOSED NOTABUG
QA Contact: Yogev Rabl <yrabl>
Severity: medium
Docs Contact: Derek <dcadzow>
Priority: urgent
Version: 8.0 (Liberty)
CC: augol, dbecker, elicohen, emacchi, ggillies, hbrock, jcoufal, jguiditt, jomurphy, jraju, jschluet, jslagle, kbasil, mburns, mcornea, morazi, ohochman, rhel-osp-director-maint, roxenham, sasha, srevivo, yrabl
Target Milestone: ga
Keywords: TestOnly, Triaged, ZStream
Target Release: 8.0 (Liberty)
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: openstack-puppet-modules-7.0.16-1.el7ost
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-03-29 18:15:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Omri Hochman 2016-01-12 22:04:30 UTC
rhel-osp-director: after scaling down the number of Ceph nodes (from 3 to 1), the osdmap still reports 3 osds: 1 up, 1 in.


Environment (OSP-D-7.2-GA): 
----------------------------
ceph-0.94.1-13.el7cp.x86_64
ceph-mon-0.94.1-13.el7cp.x86_64
ceph-osd-0.94.1-13.el7cp.x86_64
ceph-common-0.94.1-13.el7cp.x86_64
python-rdomanager-oscplugin-0.0.10-22.el7ost.noarch
python-heatclient-0.6.0-1.el7ost.noarch
openstack-heat-api-2015.1.2-4.el7ost.noarch
heat-cfntools-1.2.8-2.el7.noarch
openstack-heat-templates-0-0.8.20150605git.el7ost.noarch
instack-0.0.7-2.el7ost.noarch
instack-undercloud-2.1.2-36.el7ost.noarch


Description:
------------
I attempted to scale down my deployment from 3 Ceph nodes to 1 Ceph node. After the scale-down, 'ceph status' still reports 3 OSDs with only 1 up, while the actual situation is that only 1 OSD exists (1 up, 1 in).
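
For context, Ceph does not drop an OSD from the osdmap on its own when the node hosting it disappears; the entry remains until it is removed explicitly, which is why the count can stay at 3. The stale entries can be inspected from any monitor node; a minimal check, assuming the admin keyring is available on the controllers as in the outputs below:

# summary line of the osdmap (count / up / in)
ceph osd stat
# full CRUSH tree; OSDs whose host was deleted still appear, marked down
ceph osd tree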

Deployment command :
openstack overcloud deploy --templates --control-scale 3 --compute-scale 1 --ceph-storage-scale 3 -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml --ntp-server 10.5.26.10 --timeout 90


Before Scale down: 
-------------------
[root@overcloud-controller-0 ~]# ceph status
    cluster 4d13892c-b403-11e5-8522-525400c91767
     health HEALTH_OK
     monmap e1: 3 mons at {overcloud-controller-0=192.168.0.10:6789/0,overcloud-controller-1=192.168.0.11:6789/0,overcloud-controller-2=192.168.0.12:6789/0}
            election epoch 12, quorum 0,1,2 overcloud-controller-0,overcloud-controller-1,overcloud-controller-2
     osdmap e135: 3 osds: 3 up, 3 in
      pgmap v476: 224 pgs, 4 pools, 45659 kB data, 19 objects
            28331 MB used, 953 GB / 981 GB avail
                 224 active+clean
  client io 81 B/s wr, 0 op/s


-------------------------------------------------------------------------------

Scale-down command :
openstack overcloud deploy --templates --control-scale 3 --compute-scale 1 --ceph-storage-scale 3 -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml --ntp-server 10.5.26.10 --timeout 90
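
(The command above repeats --ceph-storage-scale 3 from the original deployment; the scale-down was presumably re-run with a reduced count, along the lines of this hypothetical invocation:)

openstack overcloud deploy --templates --control-scale 3 --compute-scale 1 --ceph-storage-scale 1 -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml --ntp-server 10.5.26.10 --timeout 90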

After Scale down: 
------------------
[root@overcloud-controller-0 ~]# ceph status
    cluster 4d13892c-b403-11e5-8522-525400c91767
     health HEALTH_WARN
            183 pgs degraded
            63 pgs stale
            183 pgs stuck degraded
            2 pgs stuck inactive 
            63 pgs stuck stale   
            185 pgs stuck unclean
            183 pgs stuck undersized
            183 pgs undersized   
            recovery 15/38 objects degraded (39.474%)
            too many PGs per OSD (311 > max 300)
            pool vms pg_num 64 > pgp_num 56
            pool images pg_num 64 > pgp_num 56
            pool volumes pg_num 64 > pgp_num 56
     monmap e1: 3 mons at {overcloud-controller-0=192.168.0.10:6789/0,overcloud-controller-1=192.168.0.11:6789/0,overcloud-controller-2=192.168.0.12:6789/0}
            election epoch 12, quorum 0,1,2 overcloud-controller-0,overcloud-controller-1,overcloud-controller-2
     osdmap e144: 3 osds: 1 up, 1 in
      pgmap v727: 248 pgs, 4 pools, 45659 kB data, 19 objects
            12852 MB used, 38334 MB / 51187 MB avail
            15/38 objects degraded (39.474%)
                 183 active+undersized+degraded
                  63 stale+active+clean
                   2 creating
  client io 61 B/s rd, 0 op/s


Ceph health:
------------
[root@overcloud-controller-0 ~]# ceph health
HEALTH_WARN 183 pgs degraded; 63 pgs stale; 183 pgs stuck degraded; 2 pgs stuck inactive; 63 pgs stuck stale; 185 pgs stuck unclean; 183 pgs stuck undersized; 183 pgs undersized; recovery 15/38 objects degraded (39.474%); too many PGs per OSD (311 > max 300); pool vms pg_num 64 > pgp_num 56; pool images pg_num 64 > pgp_num 56; pool volumes pg_num 64 > pgp_num 56
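
For reference, the three "pg_num 64 > pgp_num 56" warnings above are independent of the stale-OSD problem and can normally be cleared by raising pgp_num to match pg_num on each pool; a sketch, using the pool names from the output:

# bring the placement count in line with pg_num for each affected pool
ceph osd pool set vms pgp_num 64
ceph osd pool set images pgp_num 64
ceph osd pool set volumes pgp_num 64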

Comment 6 Mike Burns 2016-02-25 13:50:13 UTC
*** Bug 1311997 has been marked as a duplicate of this bug. ***

Comment 8 Jaromir Coufal 2016-03-17 14:49:06 UTC
Scaling down Ceph nodes is not supported so far; this will need to be an RFE. We will, however, need to prevent users from reaching this state.

Comment 9 Emilien Macchi 2016-03-21 17:54:20 UTC
Patch is merged upstream.

Comment 16 Omri Hochman 2016-05-31 22:34:25 UTC
Verified on OSP-d 9:
[stack@undercloud72 ~]$ rpm -qa | grep heat
openstack-heat-api-6.0.0-3.el7ost.noarch
openstack-tripleo-heat-templates-liberty-2.0.0-8.el7ost.noarch
openstack-heat-common-6.0.0-3.el7ost.noarch
openstack-tripleo-heat-templates-2.0.0-8.el7ost.noarch
heat-cfntools-1.3.0-2.el7ost.noarch
openstack-heat-templates-0-0.8.20150605git.el7ost.noarch
openstack-heat-api-cfn-6.0.0-3.el7ost.noarch
python-heatclient-1.0.0-1.el7ost.noarch
openstack-tripleo-heat-templates-kilo-2.0.0-8.el7ost.noarch
openstack-heat-engine-6.0.0-3.el7ost.noarch


[root@undercloud72 ~]# nova list
+--------------------------------------+-------------------------+--------+------------+-------------+-----------------------+
| ID                                   | Name                    | Status | Task State | Power State | Networks              |
+--------------------------------------+-------------------------+--------+------------+-------------+-----------------------+
| e7882f51-cea8-4fd6-9825-d4ce94209f66 | overcloud-cephstorage-0 | ACTIVE | -          | Running     | ctlplane=192.168.0.8  |
| b12ab653-57d4-44a8-9eac-62114ac0fc36 | overcloud-cephstorage-1 | ACTIVE | -          | Running     | ctlplane=192.168.0.7  |
| ffb95cfc-cb2f-4baa-8f28-6f6accbf3efb | overcloud-controller-0  | ACTIVE | -          | Running     | ctlplane=192.168.0.11 |
| 00b2396a-19da-4cf9-a666-3e0ce0ed659c | overcloud-controller-1  | ACTIVE | -          | Running     | ctlplane=192.168.0.10 |
| 0c271562-1660-467e-97d4-f89a5f454407 | overcloud-controller-2  | ACTIVE | -          | Running     | ctlplane=192.168.0.12 |
| 4139094d-6f7f-48ce-a715-67c603083b1a | overcloud-novacompute-0 | ACTIVE | -          | Running     | ctlplane=192.168.0.9  |
| 09888732-7c25-4d97-9c9d-9a9d9cfa0b7a | overcloud-novacompute-1 | ACTIVE | -          | Running     | ctlplane=192.168.0.13 |
+--------------------------------------+-------------------------+--------+------------+-------------+-----------------------+


openstack overcloud deploy --templates --control-scale 3 --compute-scale 2 --ceph-storage-scale 1   --neutron-network-type vxlan --neutron-tunnel-types vxlan  --ntp-server 10.5.26.10 --timeout 90 -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e network-environment.yaml

..
..
2016-05-31 22:28:51 [overcloud-CephStorageNodesPostDeployment-cbztvrgeq2kl-ExtraConfig-jmi4qykg2gpw]: UPDATE_COMPLETE Stack UPDATE completed successfully
2016-05-31 22:28:52 [ExtraConfig]: UPDATE_COMPLETE state changed
2016-05-31 22:28:53 [NetworkDeployment]: SIGNAL_COMPLETE Unknown
2016-05-31 22:28:57 [overcloud-CephStorageNodesPostDeployment-cbztvrgeq2kl]: UPDATE_COMPLETE Stack UPDATE completed successfully
Stack overcloud UPDATE_COMPLETE
Overcloud Endpoint: http://10.19.184.210:5000/v2.0
Overcloud Deployed
[stack@undercloud72 ~]$ nova list
+--------------------------------------+-------------------------+--------+------------+-------------+-----------------------+
| ID                                   | Name                    | Status | Task State | Power State | Networks              |
+--------------------------------------+-------------------------+--------+------------+-------------+-----------------------+
| e7882f51-cea8-4fd6-9825-d4ce94209f66 | overcloud-cephstorage-0 | ACTIVE | -          | Running     | ctlplane=192.168.0.8  |
| ffb95cfc-cb2f-4baa-8f28-6f6accbf3efb | overcloud-controller-0  | ACTIVE | -          | Running     | ctlplane=192.168.0.11 |
| 00b2396a-19da-4cf9-a666-3e0ce0ed659c | overcloud-controller-1  | ACTIVE | -          | Running     | ctlplane=192.168.0.10 |
| 0c271562-1660-467e-97d4-f89a5f454407 | overcloud-controller-2  | ACTIVE | -          | Running     | ctlplane=192.168.0.12 |
| 4139094d-6f7f-48ce-a715-67c603083b1a | overcloud-novacompute-0 | ACTIVE | -          | Running     | ctlplane=192.168.0.9  |
| 09888732-7c25-4d97-9c9d-9a9d9cfa0b7a | overcloud-novacompute-1 | ACTIVE | -          | Running     | ctlplane=192.168.0.13 |
+--------------------------------------+-------------------------+--------+------------+-------------+-----------------------+
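
Once the stack update completes, the osdmap can be checked directly from a controller; a quick sketch of the follow-up check ('ceph osd stat' prints only the osdmap summary line):

sudo ceph osd stat
# expected after scaling to a single Ceph node: 1 osds: 1 up, 1 in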

Comment 17 Omri Hochman 2016-06-01 02:40:35 UTC
Setting failed_qa for further investigation.

After scaling down to 1 Ceph node, 'ceph status' shows:

[heat-admin@overcloud-controller-1 ~]$ sudo ceph status 
    cluster 1442b054-d029-11e5-bc14-525400c91767
     health HEALTH_WARN
            192 pgs degraded
            192 pgs stuck degraded
            192 pgs stuck unclean
            192 pgs stuck undersized
            192 pgs undersized
     monmap e1: 3 mons at {overcloud-controller-0=10.19.105.15:6789/0,overcloud-controller-1=10.19.105.14:6789/0,overcloud-controller-2=10.19.105.11:6789/0}
            election epoch 8, quorum 0,1,2 overcloud-controller-2,overcloud-controller-1,overcloud-controller-0
     osdmap e19: 2 osds: 1 up, 1 in
      pgmap v1605: 192 pgs, 5 pools, 0 bytes data, 0 objects
            4102 MB used, 414 GB / 437 GB avail
                 192 active+undersized+degraded

------------------------------------------------------
result : osdmap e19: 2 osds: 1 up, 1 in
expected : osdmap e19: 1 osds: 1 up, 1 in
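
For reference, a leftover entry like this can be purged by hand from a monitor node once the Ceph node is gone; a minimal sketch, assuming the stale OSD is osd.1 (read the actual ID from 'ceph osd tree'):

# mark the departed OSD out, drop it from the CRUSH map,
# delete its cephx key, and remove it from the osdmap
ceph osd out 1
ceph osd crush remove osd.1
ceph auth del osd.1
ceph osd rm 1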

Comment 18 Emilien Macchi 2016-06-01 21:26:24 UTC
Could you provide system logs? I want to see the Ceph osd/mon logs and the puppet run logs.

Thanks

Comment 20 jomurphy 2017-03-29 18:15:44 UTC
Scaling down to one node is not supported. Therefore this bug is being closed as not a supported configuration.

Comment 21 Amit Ugol 2018-05-02 10:53:17 UTC
Closed, no need for needinfo.