Bug 1377852 - rhel-osp-director: The number of OSDs is smaller than expected.
Summary: rhel-osp-director: The number of OSDs is smaller than expected.
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 9.0 (Mitaka)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ga
Target Release: 9.0 (Mitaka)
Assignee: Angus Thomas
QA Contact: Omri Hochman
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-09-20 19:07 UTC by Alexander Chuzhoy
Modified: 2016-09-20 20:29 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-09-20 20:28:41 UTC
Target Upstream Version:



Description Alexander Chuzhoy 2016-09-20 19:07:10 UTC
rhel-osp-director: The number of OSDs is smaller than expected.

Environment:
openstack-puppet-modules-9.0.0-0.20160802183056.8c758d6.el7ost.noarch
openstack-tripleo-heat-templates-5.0.0-0.20160907212643.90c852e.1.el7ost.noarch
instack-undercloud-5.0.0-0.20160907134010.649dc3f.el7ost.noarch


Steps to reproduce:
1. Prepare a setup with Ceph nodes equipped with multiple (5) disks (a sketch of these hints follows the steps below):
   a. Set a particular disk for the OS with a root_device hint.
   b. Using flavors, ensure that the nodes with multiple disks are used as Ceph nodes.
2. Create a ceph.yaml containing:
parameter_defaults:
  ExtraConfig:
    ceph::profile::params::osds:
        '/dev/sdb': {}
        '/dev/sdc': {}
        '/dev/sdd': {}


3. Run the overcloud deployment with:
openstack overcloud deploy --templates --control-scale 3 --compute-scale 2 --ceph-storage-scale 3 --neutron-network-type vxlan --neutron-tunnel-types vxlan -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e network-environment.yaml --control-flavor control --compute-flavor compute --ceph-storage-flavor ceph-storage -e ceph.yaml --ntp-server clock.redhat.com


4. After the deployment completes successfully, run a quick check on Ceph.
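
(For reference, a rough sketch of the hints from step 1, run on the undercloud. The node UUID is a placeholder, the root disk is assumed to be the small /dev/sde seen in the fdisk output below, and the flavor/profile names are assumed to match the deploy command in step 3:)

  # pin the OS install to a specific disk via a root_device hint
  ironic node-update <node-uuid> add properties/root_device='{"name": "/dev/sde"}'
  # tag the multi-disk nodes with the ceph-storage profile
  ironic node-update <node-uuid> replace properties/capabilities='profile:ceph-storage,boot_option:local'
  # have the ceph-storage flavor schedule only onto nodes carrying that profile
  openstack flavor set --property "capabilities:profile"="ceph-storage" ceph-storage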

Result:

[root@overcloud-controller-1 ~]# ceph status
    cluster b381483c-7ea8-11e6-8b3f-5254003ec994
     health HEALTH_WARN
            clock skew detected on mon.overcloud-controller-2, mon.overcloud-controller-0
            Monitor clock skew detected
     monmap e2: 3 mons at {overcloud-controller-0=10.19.95.25:6789/0,overcloud-controller-1=10.19.95.21:6789/0,overcloud-controller-2=10.19.95.24:6789/0}
            election epoch 6, quorum 0,1,2 overcloud-controller-1,overcloud-controller-2,overcloud-controller-0
     osdmap e35: 5 osds: 5 up, 5 in
            flags sortbitwise
      pgmap v4737: 224 pgs, 6 pools, 68853 kB data, 1494 objects
            389 MB used, 9186 GB / 9186 GB avail
                 224 active+clean
  client io 2764 B/s rd, 0 op/s rd, 0 op/s wr


Note that there are only 5 OSDs instead of the expected 9 (3 Ceph nodes x 3 OSD disks each).



[root@overcloud-controller-1 ~]# ceph osd tree
ID WEIGHT  TYPE NAME                        UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 8.97095 root default
-2 5.38257     host overcloud-cephstorage-0
 0 1.79419         osd.0                         up  1.00000          1.00000
 3 1.79419         osd.3                         up  1.00000          1.00000
 4 1.79419         osd.4                         up  1.00000          1.00000
-3 1.79419     host overcloud-cephstorage-1
 1 1.79419         osd.1                         up  1.00000          1.00000
-4 1.79419     host overcloud-cephstorage-2
 2 1.79419         osd.2                         up  1.00000          1.00000




[stack@undercloud ~]$ nova list|grep ceph
| 69d91ca7-db49-474e-8965-dbca19fc765f | overcloud-cephstorage-0 | ACTIVE | -          | Running     | ctlplane=192.168.0.6  |
| d351f4c3-742d-40d5-99e7-480ab6b86ca0 | overcloud-cephstorage-1 | ACTIVE | -          | Running     | ctlplane=192.168.0.15 |
| 8f8d54ca-3544-443f-9355-61885bd47e68 | overcloud-cephstorage-2 | ACTIVE | -          | Running     | ctlplane=192.168.0.16 |
[stack@undercloud ~]$ for i in 192.168.0.6 192.168.0.15 192.168.0.16; do echo "######################################################"; echo $i; echo "######################################################"; ssh heat-admin@$i "sudo fdisk -l"; done                                                                                                                                                                                  
######################################################                                                                                                                                                               
192.168.0.6                                                                                                                                                                                                          
######################################################                                                                                                                                                              
WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.                                                                                                     

Disk /dev/sdc: 1979.1 GB, 1979120091136 bytes, 3865468928 sectors
Units = sectors of 1 * 512 = 512 bytes                           
Sector size (logical/physical): 512 bytes / 512 bytes            
I/O size (minimum/optimal): 512 bytes / 512 bytes                
Disk label type: gpt                                             


#         Start          End    Size  Type            Name
 1     10487808   3865468894    1.8T  unknown         ceph data
 2         2048     10487807      5G  unknown         ceph journal

Disk /dev/sda: 1979.1 GB, 1979120091136 bytes, 3865468928 sectors
Units = sectors of 1 * 512 = 512 bytes                           
Sector size (logical/physical): 512 bytes / 512 bytes            
I/O size (minimum/optimal): 512 bytes / 512 bytes                


Disk /dev/sdb: 1979.1 GB, 1979120091136 bytes, 3865468928 sectors
Units = sectors of 1 * 512 = 512 bytes                           
Sector size (logical/physical): 512 bytes / 512 bytes            
I/O size (minimum/optimal): 512 bytes / 512 bytes                
Disk label type: gpt                                             


#         Start          End    Size  Type            Name
 1     10487808   3865468894    1.8T  unknown         ceph data
 2         2048     10487807      5G  unknown         ceph journal

Disk /dev/sdd: 1979.1 GB, 1979120091136 bytes, 3865468928 sectors
Units = sectors of 1 * 512 = 512 bytes                           
Sector size (logical/physical): 512 bytes / 512 bytes            
I/O size (minimum/optimal): 512 bytes / 512 bytes                
Disk label type: gpt                                             


#         Start          End    Size  Type            Name
 1     10487808   3865468894    1.8T  unknown         ceph data
 2         2048     10487807      5G  unknown         ceph journal

Disk /dev/sde: 82.9 GB, 82896224256 bytes, 161906688 sectors
Units = sectors of 1 * 512 = 512 bytes                      
Sector size (logical/physical): 512 bytes / 512 bytes       
I/O size (minimum/optimal): 512 bytes / 512 bytes           
Disk label type: dos                                        
Disk identifier: 0x000ecf7a                                 

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1            2048        4095        1024   83  Linux 
/dev/sde2   *        4096   161903069    80949487   83  Linux 
WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.
WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.
######################################################                                                          
192.168.0.15                                                                                                    
######################################################                                                         
WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.

Disk /dev/sdb: 1979.1 GB, 1979120091136 bytes, 3865468928 sectors
Units = sectors of 1 * 512 = 512 bytes                           
Sector size (logical/physical): 512 bytes / 512 bytes            
I/O size (minimum/optimal): 512 bytes / 512 bytes                
Disk label type: gpt                                             


#         Start          End    Size  Type            Name
 1     10487808   3865468894    1.8T  unknown         ceph data
 2         2048     10487807      5G  unknown         ceph journal

Disk /dev/sda: 1979.1 GB, 1979120091136 bytes, 3865468928 sectors
Units = sectors of 1 * 512 = 512 bytes                           
Sector size (logical/physical): 512 bytes / 512 bytes            
I/O size (minimum/optimal): 512 bytes / 512 bytes                


Disk /dev/sdc: 1979.1 GB, 1979120091136 bytes, 3865468928 sectors
Units = sectors of 1 * 512 = 512 bytes                           
Sector size (logical/physical): 512 bytes / 512 bytes            
I/O size (minimum/optimal): 512 bytes / 512 bytes                
Disk label type: gpt                                             


#         Start          End    Size  Type            Name
 1     10487808   3865468894    1.8T  unknown         ceph data
 2         2048     10487807      5G  unknown         ceph journal

Disk /dev/sdd: 1979.1 GB, 1979120091136 bytes, 3865468928 sectors
Units = sectors of 1 * 512 = 512 bytes                           
Sector size (logical/physical): 512 bytes / 512 bytes            
I/O size (minimum/optimal): 512 bytes / 512 bytes                
Disk label type: gpt                                             


#         Start          End    Size  Type            Name
 1     10487808   3865468894    1.8T  unknown         ceph data
 2         2048     10487807      5G  unknown         ceph journal

Disk /dev/sde: 82.9 GB, 82896224256 bytes, 161906688 sectors
Units = sectors of 1 * 512 = 512 bytes                      
Sector size (logical/physical): 512 bytes / 512 bytes       
I/O size (minimum/optimal): 512 bytes / 512 bytes           
Disk label type: dos                                        
Disk identifier: 0x000cf73e                                 

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1            2048        4095        1024   83  Linux 
/dev/sde2   *        4096   161903069    80949487   83  Linux 
WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.
WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.
######################################################                                                          
192.168.0.16                                                                                                    
######################################################                                                        
WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.
WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.
WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.

Disk /dev/sda: 1979.1 GB, 1979120091136 bytes, 3865468928 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/sdc: 1979.1 GB, 1979120091136 bytes, 3865468928 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: gpt


#         Start          End    Size  Type            Name
 1     10487808   3865468894    1.8T  unknown         ceph data
 2         2048     10487807      5G  unknown         ceph journal

Disk /dev/sdb: 1979.1 GB, 1979120091136 bytes, 3865468928 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: gpt


#         Start          End    Size  Type            Name
 1     10487808   3865468894    1.8T  unknown         ceph data
 2         2048     10487807      5G  unknown         ceph journal

Disk /dev/sdd: 1979.1 GB, 1979120091136 bytes, 3865468928 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: gpt


#         Start          End    Size  Type            Name
 1     10487808   3865468894    1.8T  unknown         ceph data
 2         2048     10487807      5G  unknown         ceph journal

Disk /dev/sde: 82.9 GB, 82896224256 bytes, 161906688 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x00039053

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1            2048        4095        1024   83  Linux
/dev/sde2   *        4096   161903069    80949487   83  Linux




Expected result:
9 osds

Comment 2 Giulio Fidente 2016-09-20 19:33:43 UTC
It looks like ceph-disk prepare succeeded on all nodes. Can you add the output of:

  # ceph-disk list

from one of the failing nodes to confirm they all have the correct FSID?

This seems to be a duplicate of BZ 1371218, only for some reason it is happening with much lower parallelism.

Comment 3 Alexander Chuzhoy 2016-09-20 20:00:49 UTC
######################################################
192.168.0.6
#######################################################
/dev/sda other, unknown
/dev/sdb :
 /dev/sdb2 ceph journal, for /dev/sdb1
 /dev/sdb1 ceph data, active, cluster ceph, osd.4, journal /dev/sdb2
/dev/sdc :
 /dev/sdc2 ceph journal, for /dev/sdc1
 /dev/sdc1 ceph data, active, cluster ceph, osd.0, journal /dev/sdc2
/dev/sdd :
 /dev/sdd2 ceph journal, for /dev/sdd1
 /dev/sdd1 ceph data, active, cluster ceph, osd.3, journal /dev/sdd2
/dev/sde :
 /dev/sde1 other, iso9660
 /dev/sde2 other, ext4, mounted on /
######################################################
192.168.0.15
#######################################################
/dev/sda other, unknown
/dev/sdb :
 /dev/sdb2 ceph journal, for /dev/sdb1
 /dev/sdb1 ceph data, prepared, unknown cluster 74c2353a-7a1a-11e6-90da-5254003ec994, osd.3, journal /dev/sdb2
/dev/sdc :
 /dev/sdc2 ceph journal, for /dev/sdc1
 /dev/sdc1 ceph data, prepared, unknown cluster 74c2353a-7a1a-11e6-90da-5254003ec994, osd.0, journal /dev/sdc2
/dev/sdd :
 /dev/sdd2 ceph journal, for /dev/sdd1
 /dev/sdd1 ceph data, active, cluster ceph, osd.1, journal /dev/sdd2
/dev/sde :
 /dev/sde1 other, iso9660
 /dev/sde2 other, ext4, mounted on /
######################################################
192.168.0.16
#######################################################
/dev/sda other, unknown
/dev/sdb :
 /dev/sdb2 ceph journal, for /dev/sdb1
 /dev/sdb1 ceph data, prepared, unknown cluster 74c2353a-7a1a-11e6-90da-5254003ec994, osd.5, journal /dev/sdb2
/dev/sdc :
 /dev/sdc2 ceph journal, for /dev/sdc1
 /dev/sdc1 ceph data, prepared, unknown cluster 74c2353a-7a1a-11e6-90da-5254003ec994, osd.1, journal /dev/sdc2
/dev/sdd :
 /dev/sdd2 ceph journal, for /dev/sdd1
 /dev/sdd1 ceph data, active, cluster ceph, osd.2, journal /dev/sdd2
/dev/sde :
 /dev/sde1 other, iso9660
 /dev/sde2 other, ext4, mounted on /

Comment 4 Giulio Fidente 2016-09-20 20:25:29 UTC
Not a duplicate. It looks like some disks were used for a different Ceph deployment and were not re-used in the latest deployment attempt. This is because, by default, the director will not re-use disks that contain data from a different Ceph cluster.

You can zap the disks automatically with every new deployment attempt as described in [1]. If that works, then I think we can close this as NOTABUG.

1. https://access.redhat.com/documentation/en/red-hat-openstack-platform/9/single/red-hat-ceph-storage-for-the-overcloud/#Formatting_Ceph_Storage_Nodes_Disks_to_GPT
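
(For reference, the linked document describes automating the wipe with a first-boot script. A rough manual alternative, assuming the node IPs from comment 3 and the three OSD disks listed in ceph.yaml, would be to zap the leftover GPT labels from the undercloud before redeploying:)

  # destroy the stale Ceph partition tables so the disks get re-used on the next deploy
  for ip in 192.168.0.6 192.168.0.15 192.168.0.16; do
    for disk in /dev/sdb /dev/sdc /dev/sdd; do
      ssh heat-admin@$ip "sudo sgdisk --zap-all $disk"
    done
  done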

Comment 5 Alexander Chuzhoy 2016-09-20 20:28:41 UTC
Per the last comment, the disks needed to be zapped.

