rhel-osp-director: The number of OSDs is smaller than expected.

Environment:
openstack-puppet-modules-9.0.0-0.20160802183056.8c758d6.el7ost.noarch
openstack-tripleo-heat-templates-5.0.0-0.20160907212643.90c852e.1.el7ost.noarch
instack-undercloud-5.0.0-0.20160907134010.649dc3f.el7ost.noarch

Steps to reproduce:
1. Prepare a setup with Ceph nodes equipped with multiple (5) disks:
   a. Set a particular disk for the OS with a root_device hint (an example hint command is shown after the fdisk output below).
   b. With flavors, ensure that the nodes with multiple disks are used as Ceph nodes.
2. Create a ceph.yaml with:

parameter_defaults:
  ExtraConfig:
    ceph::profile::params::osds:
      '/dev/sdb': {}
      '/dev/sdc': {}
      '/dev/sdd': {}

3. Run the overcloud deployment with:

openstack overcloud deploy --templates --control-scale 3 --compute-scale 2 --ceph-storage-scale 3 --neutron-network-type vxlan --neutron-tunnel-types vxlan -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e network-environment.yaml --control-flavor control --compute-flavor compute --ceph-storage-flavor ceph-storage -e ceph.yaml --ntp-server clock.redhat.com

4. After the deployment completes successfully, run a quick check on Ceph.

Result:

[root@overcloud-controller-1 ~]# ceph status
    cluster b381483c-7ea8-11e6-8b3f-5254003ec994
     health HEALTH_WARN
            clock skew detected on mon.overcloud-controller-2, mon.overcloud-controller-0
            Monitor clock skew detected
     monmap e2: 3 mons at {overcloud-controller-0=10.19.95.25:6789/0,overcloud-controller-1=10.19.95.21:6789/0,overcloud-controller-2=10.19.95.24:6789/0}
            election epoch 6, quorum 0,1,2 overcloud-controller-1,overcloud-controller-2,overcloud-controller-0
     osdmap e35: 5 osds: 5 up, 5 in
            flags sortbitwise
      pgmap v4737: 224 pgs, 6 pools, 68853 kB data, 1494 objects
            389 MB used, 9186 GB / 9186 GB avail
                 224 active+clean
  client io 2764 B/s rd, 0 op/s rd, 0 op/s wr

Note that there are 5 OSDs instead of the expected 9.

[root@overcloud-controller-1 ~]# ceph osd tree
ID WEIGHT  TYPE NAME                         UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 8.97095 root default
-2 5.38257     host overcloud-cephstorage-0
 0 1.79419         osd.0                          up  1.00000          1.00000
 3 1.79419         osd.3                          up  1.00000          1.00000
 4 1.79419         osd.4                          up  1.00000          1.00000
-3 1.79419     host overcloud-cephstorage-1
 1 1.79419         osd.1                          up  1.00000          1.00000
-4 1.79419     host overcloud-cephstorage-2
 2 1.79419         osd.2                          up  1.00000          1.00000

[stack@undercloud ~]$ nova list | grep ceph
| 69d91ca7-db49-474e-8965-dbca19fc765f | overcloud-cephstorage-0 | ACTIVE | - | Running | ctlplane=192.168.0.6  |
| d351f4c3-742d-40d5-99e7-480ab6b86ca0 | overcloud-cephstorage-1 | ACTIVE | - | Running | ctlplane=192.168.0.15 |
| 8f8d54ca-3544-443f-9355-61885bd47e68 | overcloud-cephstorage-2 | ACTIVE | - | Running | ctlplane=192.168.0.16 |

[stack@undercloud ~]$ for i in 192.168.0.6 192.168.0.15 192.168.0.16; do echo "######################################################"; echo $i; echo "######################################################"; ssh heat-admin@$i "sudo fdisk -l"; done
######################################################
192.168.0.6
######################################################

WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.
Disk /dev/sdc: 1979.1 GB, 1979120091136 bytes, 3865468928 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: gpt

#         Start          End    Size  Type            Name
 1     10487808   3865468894    1.8T  unknown         ceph data
 2         2048     10487807      5G  unknown         ceph journal

Disk /dev/sda: 1979.1 GB, 1979120091136 bytes, 3865468928 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk /dev/sdb: 1979.1 GB, 1979120091136 bytes, 3865468928 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: gpt

#         Start          End    Size  Type            Name
 1     10487808   3865468894    1.8T  unknown         ceph data
 2         2048     10487807      5G  unknown         ceph journal

Disk /dev/sdd: 1979.1 GB, 1979120091136 bytes, 3865468928 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: gpt

#         Start          End    Size  Type            Name
 1     10487808   3865468894    1.8T  unknown         ceph data
 2         2048     10487807      5G  unknown         ceph journal

Disk /dev/sde: 82.9 GB, 82896224256 bytes, 161906688 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000ecf7a

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1            2048        4095        1024   83  Linux
/dev/sde2   *        4096   161903069    80949487   83  Linux

WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.
WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.

######################################################
192.168.0.15
######################################################

WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.
Disk /dev/sdb: 1979.1 GB, 1979120091136 bytes, 3865468928 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: gpt

#         Start          End    Size  Type            Name
 1     10487808   3865468894    1.8T  unknown         ceph data
 2         2048     10487807      5G  unknown         ceph journal

Disk /dev/sda: 1979.1 GB, 1979120091136 bytes, 3865468928 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk /dev/sdc: 1979.1 GB, 1979120091136 bytes, 3865468928 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: gpt

#         Start          End    Size  Type            Name
 1     10487808   3865468894    1.8T  unknown         ceph data
 2         2048     10487807      5G  unknown         ceph journal

Disk /dev/sdd: 1979.1 GB, 1979120091136 bytes, 3865468928 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: gpt

#         Start          End    Size  Type            Name
 1     10487808   3865468894    1.8T  unknown         ceph data
 2         2048     10487807      5G  unknown         ceph journal

Disk /dev/sde: 82.9 GB, 82896224256 bytes, 161906688 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000cf73e

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1            2048        4095        1024   83  Linux
/dev/sde2   *        4096   161903069    80949487   83  Linux

WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.
WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.

######################################################
192.168.0.16
######################################################

WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.
WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.
WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.
Disk /dev/sda: 1979.1 GB, 1979120091136 bytes, 3865468928 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk /dev/sdc: 1979.1 GB, 1979120091136 bytes, 3865468928 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: gpt

#         Start          End    Size  Type            Name
 1     10487808   3865468894    1.8T  unknown         ceph data
 2         2048     10487807      5G  unknown         ceph journal

Disk /dev/sdb: 1979.1 GB, 1979120091136 bytes, 3865468928 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: gpt

#         Start          End    Size  Type            Name
 1     10487808   3865468894    1.8T  unknown         ceph data
 2         2048     10487807      5G  unknown         ceph journal

Disk /dev/sdd: 1979.1 GB, 1979120091136 bytes, 3865468928 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: gpt

#         Start          End    Size  Type            Name
 1     10487808   3865468894    1.8T  unknown         ceph data
 2         2048     10487807      5G  unknown         ceph journal

Disk /dev/sde: 82.9 GB, 82896224256 bytes, 161906688 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x00039053

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1            2048        4095        1024   83  Linux
/dev/sde2   *        4096   161903069    80949487   83  Linux

Expected result: 9 OSDs.
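For reference, the root_device hint from step 1a can be set per node on the undercloud with a command along these lines (the node UUID and device are illustrative; matching by serial or WWN is generally more reliable than by device name):

ironic node-update <node-uuid> add properties/root_device='{"name": "/dev/sde"}'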
It looks like ceph-disk prepare succeeded on all nodes. Can you add the output of:

# ceph-disk list

from one of the failing nodes to confirm they all have the correct FSID? This seems to be a duplicate of BZ 1371218, only for some reason happening with much lower parallelism.
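For comparison, the FSID of the newly deployed cluster can be read on one of the controllers, for example with:

# ceph fsid
# grep fsid /etc/ceph/ceph.conf

Any "ceph data" partition that reports a different FSID would have been prepared for a different cluster.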
######################################################
192.168.0.6
######################################################
/dev/sda other, unknown
/dev/sdb :
 /dev/sdb2 ceph journal, for /dev/sdb1
 /dev/sdb1 ceph data, active, cluster ceph, osd.4, journal /dev/sdb2
/dev/sdc :
 /dev/sdc2 ceph journal, for /dev/sdc1
 /dev/sdc1 ceph data, active, cluster ceph, osd.0, journal /dev/sdc2
/dev/sdd :
 /dev/sdd2 ceph journal, for /dev/sdd1
 /dev/sdd1 ceph data, active, cluster ceph, osd.3, journal /dev/sdd2
/dev/sde :
 /dev/sde1 other, iso9660
 /dev/sde2 other, ext4, mounted on /
######################################################
192.168.0.15
######################################################
/dev/sda other, unknown
/dev/sdb :
 /dev/sdb2 ceph journal, for /dev/sdb1
 /dev/sdb1 ceph data, prepared, unknown cluster 74c2353a-7a1a-11e6-90da-5254003ec994, osd.3, journal /dev/sdb2
/dev/sdc :
 /dev/sdc2 ceph journal, for /dev/sdc1
 /dev/sdc1 ceph data, prepared, unknown cluster 74c2353a-7a1a-11e6-90da-5254003ec994, osd.0, journal /dev/sdc2
/dev/sdd :
 /dev/sdd2 ceph journal, for /dev/sdd1
 /dev/sdd1 ceph data, active, cluster ceph, osd.1, journal /dev/sdd2
/dev/sde :
 /dev/sde1 other, iso9660
 /dev/sde2 other, ext4, mounted on /
######################################################
192.168.0.16
######################################################
/dev/sda other, unknown
/dev/sdb :
 /dev/sdb2 ceph journal, for /dev/sdb1
 /dev/sdb1 ceph data, prepared, unknown cluster 74c2353a-7a1a-11e6-90da-5254003ec994, osd.5, journal /dev/sdb2
/dev/sdc :
 /dev/sdc2 ceph journal, for /dev/sdc1
 /dev/sdc1 ceph data, prepared, unknown cluster 74c2353a-7a1a-11e6-90da-5254003ec994, osd.1, journal /dev/sdc2
/dev/sdd :
 /dev/sdd2 ceph journal, for /dev/sdd1
 /dev/sdd1 ceph data, active, cluster ceph, osd.2, journal /dev/sdd2
/dev/sde :
 /dev/sde1 other, iso9660
 /dev/sde2 other, ext4, mounted on /
Not a duplicate. It looks like some disks were used by a different Ceph deployment and were not re-used in the latest deployment attempt: by default, the director will not re-use disks that contain data from a different Ceph cluster. You can zap the disks automatically with every new deployment attempt, as described in [1]. If that works, then I think we can close this as NOTABUG.

1. https://access.redhat.com/documentation/en/red-hat-openstack-platform/9/single/red-hat-ceph-storage-for-the-overcloud/#Formatting_Ceph_Storage_Nodes_Disks_to_GPT
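For reference, a minimal sketch of the approach described in [1]: a first-boot script, wired in through OS::TripleO::NodeUserData, that zaps every disk not holding the root file system before the Ceph puppet run. The paths and the script body below are illustrative rather than the verbatim documentation example, and note that the script destroys all data on the non-root disks.

# /home/stack/templates/firstboot/wipe-disks.yaml (illustrative path)
heat_template_version: 2014-10-16

description: >
  First-boot configuration that zaps every disk not holding the root
  file system, so leftover Ceph partitions from a previous cluster do
  not block ceph-disk on the next deployment.

resources:
  userdata:
    type: OS::Heat::MultipartMime
    properties:
      parts:
        - config: {get_resource: wipe_disks}

  wipe_disks:
    type: OS::Heat::SoftwareConfig
    properties:
      config: |
        #!/bin/bash
        set -eu
        # Iterate over whole disks only (no partitions) and skip the one mounted as /.
        for DEV in $(lsblk -dn -o NAME,TYPE | awk '$2 == "disk" {print $1}'); do
          if lsblk -n -o MOUNTPOINT "/dev/${DEV}" | grep -qx '/'; then
            echo "Skipping /dev/${DEV}: it holds the root file system"
          else
            echo "Zapping /dev/${DEV}"
            sgdisk -Z "/dev/${DEV}"   # wipe GPT and MBR data structures
            sgdisk -g "/dev/${DEV}"   # write a fresh, empty GPT label
          fi
        done

outputs:
  OS::stack_id:
    value: {get_resource: userdata}

# /home/stack/templates/firstboot/wipe-disks-env.yaml (illustrative path)
resource_registry:
  OS::TripleO::NodeUserData: /home/stack/templates/firstboot/wipe-disks.yaml

The environment file is then added to the existing deploy command with an extra -e /home/stack/templates/firstboot/wipe-disks-env.yaml. Since OS::TripleO::NodeUserData applies to every overcloud node at first boot, in practice you may want to guard the script further, for example by checking the hostname so it only wipes the Ceph Storage nodes.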
Per the last comment, the disks needed to be zapped.