Bug 1358627 - Properly created OSD not synced and visible in USM
Summary: Properly created OSD not synced and visible in USM
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Storage Console
Classification: Red Hat
Component: Ceph
Version: 2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 2
Assignee: Shubhendu Tripathi
QA Contact: Daniel Horák
URL:
Whiteboard:
Depends On:
Blocks: Console-2-DevFreeze
Reported: 2016-07-21 07:56 UTC by Daniel Horák
Modified: 2016-08-23 19:57 UTC (History)

Fixed In Version: rhscon-core-0.0.37-1.el7scon,rhscon-ceph-0.0.37-1.el7scon
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-08-23 19:57:41 UTC
Target Upstream Version:


Links
System ID Priority Status Summary Last Updated
Gerrithub.io 285233 None None None 2016-07-27 15:02:23 UTC
Gerrithub.io 285234 None None None 2016-07-27 15:03:15 UTC
Gerrithub.io 285235 None None None 2016-07-27 15:02:49 UTC
Red Hat Product Errata RHEA-2016:1754 normal SHIPPED_LIVE New packages: Red Hat Storage Console 2.0 2017-04-18 19:09:06 UTC

Description Daniel Horák 2016-07-21 07:56:23 UTC
Description of problem:
  With a specific disk configuration, the Ceph cluster is created correctly, but some OSDs are not visible in USM.

  I have a cluster with the following disk configuration on two nodes (all disks are configured as SSD [1]):
  - NODE3: 8 spare disks (sizes: 11G, 11G, 11G, 1T, 1T, 1T, 1T, 1T)
  - NODE4: 8 spare disks (sizes: 6G, 11G, 16G, 100G, 1T, 1T, 1T, 1T)
  
  NOTE: NODE1 and NODE2 have a different disk layout, and 6 OSDs were properly created on each of them.

  Cluster creation succeeds, but the task reports a problem with adding 2 OSDs on each of the above-mentioned nodes:
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    OSD addition failed for [NODE3:map[/dev/vdh:/dev/vdd] NODE3:map[/dev/vdi:/dev/vdb] NODE4:map[/dev/vdf:/dev/vdb] NODE4:map[/dev/vdg:/dev/vdc]]
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
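
To compare the failure message with what is actually present on the nodes, the message can be split into per-OSD entries. This is only a sketch: the function name is hypothetical, and reading each `map[...]` pair as data device followed by journal device is an assumption, although it agrees with the `ceph-disk list` output in this report.

```python
import re

def parse_failed_osds(message):
    """Return (node, data_device, journal_device) tuples from the task message.

    Assumes entries of the form NODE:map[/dev/data:/dev/journal].
    """
    pattern = re.compile(r"([\w.-]+):map\[(/dev/\w+):(/dev/\w+)\]")
    return pattern.findall(message)

message = (
    "OSD addition failed for [NODE3:map[/dev/vdh:/dev/vdd] "
    "NODE3:map[/dev/vdi:/dev/vdb] NODE4:map[/dev/vdf:/dev/vdb] "
    "NODE4:map[/dev/vdg:/dev/vdc]]"
)
failed = parse_failed_osds(message)
for node, data, journal in failed:
    print(node, data, journal)
```

The resulting tuples can then be checked by hand against the data and journal devices reported on each node.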

  And I don't see those problematic OSDs on the OSD summary page (Clusters->"CLUSTER"->OSDs).

  But when I check directly on the nodes and in Ceph, all OSDs were created properly and are present:
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    [root@NODE3 ~]# ceph-disk list
    /dev/vda :
     /dev/vda1 other, swap
     /dev/vda2 other, xfs, mounted on /
    /dev/vdb :
     /dev/vdb1 ceph journal, for /dev/vdf1
     /dev/vdb2 ceph journal, for /dev/vdi1
    /dev/vdc :
     /dev/vdc1 ceph journal, for /dev/vde1
    /dev/vdd :
     /dev/vdd1 ceph journal, for /dev/vdg1
     /dev/vdd2 ceph journal, for /dev/vdh1
    /dev/vde :
     /dev/vde1 ceph data, active, cluster TestClusterA, osd.12, journal /dev/vdc1
    /dev/vdf :
     /dev/vdf1 ceph data, active, cluster TestClusterA, osd.15, journal /dev/vdb1
    /dev/vdg :
     /dev/vdg1 ceph data, active, cluster TestClusterA, osd.13, journal /dev/vdd1
    /dev/vdh :
     /dev/vdh1 ceph data, active, cluster TestClusterA, osd.14, journal /dev/vdd2
    /dev/vdi :
     /dev/vdi1 ceph data, active, cluster TestClusterA, osd.16, journal /dev/vdb2
    [root@NODE3 ~]# lsblk 
    NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    vda    253:0    0   20G  0 disk 
    ├─vda1 253:1    0    2G  0 part 
    └─vda2 253:2    0   18G  0 part /
    vdb    253:16   0   11G  0 disk 
    ├─vdb1 253:17   0    5G  0 part 
    └─vdb2 253:18   0    5G  0 part 
    vdc    253:32   0   11G  0 disk 
    └─vdc1 253:33   0    5G  0 part 
    vdd    253:48   0   11G  0 disk 
    ├─vdd1 253:49   0    5G  0 part 
    └─vdd2 253:50   0    5G  0 part 
    vde    253:64   0    1T  0 disk 
    └─vde1 253:65   0 1024G  0 part /var/lib/ceph/osd/TestClusterA-12
    vdf    253:80   0    1T  0 disk 
    └─vdf1 253:81   0 1024G  0 part /var/lib/ceph/osd/TestClusterA-15
    vdg    253:96   0    1T  0 disk 
    └─vdg1 253:97   0 1024G  0 part /var/lib/ceph/osd/TestClusterA-13
    vdh    253:112  0    1T  0 disk 
    └─vdh1 253:113  0 1024G  0 part /var/lib/ceph/osd/TestClusterA-14
    vdi    253:128  0    1T  0 disk 
    └─vdi1 253:129  0 1024G  0 part /var/lib/ceph/osd/TestClusterA-16
    
    [root@NODE4 ~]# ceph-disk list
    /dev/vda :
     /dev/vda1 other, swap
     /dev/vda2 other, xfs, mounted on /
    /dev/vdb :
     /dev/vdb1 ceph journal, for /dev/vdf1
    /dev/vdc :
     /dev/vdc2 ceph journal, for /dev/vdg1
     /dev/vdc1 ceph journal, for /dev/vdh1
    /dev/vdd :
     /dev/vdd2 ceph journal, for /dev/vde1
     /dev/vdd1 ceph journal, for /dev/vdi1
    /dev/vde :
     /dev/vde1 ceph data, active, cluster TestClusterA, osd.21, journal /dev/vdd2
    /dev/vdf :
     /dev/vdf1 ceph data, active, cluster TestClusterA, osd.17, journal /dev/vdb1
    /dev/vdg :
     /dev/vdg1 ceph data, active, cluster TestClusterA, osd.19, journal /dev/vdc2
    /dev/vdh :
     /dev/vdh1 ceph data, active, cluster TestClusterA, osd.18, journal /dev/vdc1
    /dev/vdi :
     /dev/vdi1 ceph data, active, cluster TestClusterA, osd.20, journal /dev/vdd1
    [root@NODE4 ~]# lsblk 
    NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    vda    253:0    0   20G  0 disk 
    ├─vda1 253:1    0    2G  0 part 
    └─vda2 253:2    0   18G  0 part /
    vdb    253:16   0    6G  0 disk 
    └─vdb1 253:17   0    5G  0 part 
    vdc    253:32   0   11G  0 disk 
    ├─vdc1 253:33   0    5G  0 part 
    └─vdc2 253:34   0    5G  0 part 
    vdd    253:48   0   16G  0 disk 
    ├─vdd1 253:49   0    5G  0 part 
    └─vdd2 253:50   0    5G  0 part 
    vde    253:64   0  100G  0 disk 
    └─vde1 253:65   0  100G  0 part /var/lib/ceph/osd/TestClusterA-21
    vdf    253:80   0    1T  0 disk 
    └─vdf1 253:81   0 1024G  0 part /var/lib/ceph/osd/TestClusterA-17
    vdg    253:96   0    1T  0 disk 
    └─vdg1 253:97   0 1024G  0 part /var/lib/ceph/osd/TestClusterA-19
    vdh    253:112  0    1T  0 disk 
    └─vdh1 253:113  0 1024G  0 part /var/lib/ceph/osd/TestClusterA-18
    vdi    253:128  0    1T  0 disk 
    └─vdi1 253:129  0 1024G  0 part /var/lib/ceph/osd/TestClusterA-20
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
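
The OSD IDs and their journals can be pulled out of the `ceph-disk list` output with a small parser, which makes it easier to compare against the OSD summary page in USM. This is a minimal sketch that assumes the line format shown in this report; the function name is hypothetical, and other ceph-disk versions may print a different layout.

```python
import re

# Matches lines such as:
#  /dev/vdh1 ceph data, active, cluster TestClusterA, osd.14, journal /dev/vdd2
DATA_LINE = re.compile(
    r"^\s*(/dev/\w+) ceph data, (\w+), cluster (\S+), osd\.(\d+), journal (/dev/\w+)"
)

def parse_ceph_disk_list(output):
    """Map OSD id -> details for every 'ceph data' partition in the output."""
    osds = {}
    for line in output.splitlines():
        match = DATA_LINE.match(line)
        if match:
            data, state, cluster, osd_id, journal = match.groups()
            osds[int(osd_id)] = {
                "data": data,
                "state": state,
                "cluster": cluster,
                "journal": journal,
            }
    return osds

# Excerpt of the NODE3 output from this report.
sample = """\
/dev/vdd :
 /dev/vdd1 ceph journal, for /dev/vdg1
 /dev/vdd2 ceph journal, for /dev/vdh1
/dev/vdg :
 /dev/vdg1 ceph data, active, cluster TestClusterA, osd.13, journal /dev/vdd1
/dev/vdh :
 /dev/vdh1 ceph data, active, cluster TestClusterA, osd.14, journal /dev/vdd2
"""
osds = parse_ceph_disk_list(sample)
for osd_id in sorted(osds):
    print(osd_id, osds[osd_id]["data"], osds[osd_id]["journal"])
```

Journal-only lines are skipped on purpose, since each data line already names its journal partition.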

  Ceph also knows about all OSDs (6 OSDs per node on NODE1 and NODE2, 5 OSDs per node on NODE3 and NODE4):
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    [root@MON1 ~]# ceph --cluster TestClusterA osd stat
         osdmap e204: 22 osds: 22 up, 22 in
                flags sortbitwise
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
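
The expected total follows from the per-node counts: 6 + 6 + 5 + 5 = 22. A short sketch of that check is below. It assumes the plain-text `osd stat` format shown in this report, and the function name is hypothetical.

```python
import re

def parse_osd_stat(output):
    """Extract total/up/in OSD counts from plain-text `ceph osd stat` output."""
    match = re.search(r"(\d+) osds: (\d+) up, (\d+) in", output)
    if match is None:
        raise ValueError("unrecognized 'osd stat' output")
    total, up, osds_in = (int(value) for value in match.groups())
    return {"total": total, "up": up, "in": osds_in}

output = """\
     osdmap e204: 22 osds: 22 up, 22 in
            flags sortbitwise
"""
stat = parse_osd_stat(output)

# Per-node OSD counts as stated in this report.
osds_per_node = {"NODE1": 6, "NODE2": 6, "NODE3": 5, "NODE4": 5}
expected_total = sum(osds_per_node.values())

print(stat, expected_total)
```

If the total reported by Ceph matches the sum while USM lists fewer OSDs, the discrepancy is on the USM side rather than in the cluster itself.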

Version-Release number of selected component (if applicable):
  USM Server (RHEL 7.2):
  ceph-ansible-1.0.5-31.el7scon.noarch
  ceph-installer-1.0.14-1.el7scon.noarch
  rhscon-ceph-0.0.33-1.el7scon.x86_64
  rhscon-core-0.0.34-1.el7scon.x86_64
  rhscon-core-selinux-0.0.34-1.el7scon.noarch
  rhscon-ui-0.0.48-1.el7scon.noarch
  
  Ceph MON node (RHEL 7.2):
  calamari-server-1.4.7-1.el7cp.x86_64
  ceph-base-10.2.2-24.el7cp.x86_64
  ceph-common-10.2.2-24.el7cp.x86_64
  ceph-mon-10.2.2-24.el7cp.x86_64
  ceph-selinux-10.2.2-24.el7cp.x86_64
  libcephfs1-10.2.2-24.el7cp.x86_64
  python-cephfs-10.2.2-24.el7cp.x86_64
  rhscon-agent-0.0.15-1.el7scon.noarch
  rhscon-core-selinux-0.0.34-1.el7scon.noarch
  
  Ceph OSD node (RHEL 7.2):
  ceph-base-10.2.2-24.el7cp.x86_64
  ceph-common-10.2.2-24.el7cp.x86_64
  ceph-osd-10.2.2-24.el7cp.x86_64
  ceph-selinux-10.2.2-24.el7cp.x86_64
  libcephfs1-10.2.2-24.el7cp.x86_64
  python-cephfs-10.2.2-24.el7cp.x86_64
  rhscon-agent-0.0.15-1.el7scon.noarch
  rhscon-core-selinux-0.0.34-1.el7scon.noarch

How reproducible:
  100%

Steps to Reproduce:
1. Prepare nodes for the USM cluster and, on some of the nodes, prepare the following disks:
    - nodeA: 8 SSD disks (11G, 11G, 11G, 1T, 1T, 1T, 1T, 1T)
    - nodeB: 8 SSD disks (6G, 11G, 16G, 100G, 1T, 1T, 1T, 1T)

  Disks were created via this command:
    `qemu-img create -f qcow2 ${IMAGES_PATH}${NODE_NAME}-${DISK_NUMBER}.img ${SIZE}`
  and configured as SSD[1].

2. Create a Ceph cluster via USM with a 5 GB journal.
3. Check the "Create Cluster" task and the OSD summary page (Clusters->"CLUSTER"->OSDs).
4. Check the number and list of OSDs in Ceph.
  On the MON node:
  # ceph --cluster TestClusterA osd stat
  On the OSD node:
  # lsblk
  # ceph-disk list
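
For step 1, the `qemu-img` invocations for one node can be generated from the list of disk sizes. This is only a sketch of the command template quoted in that step: the images path and the numbering of disks from 1 are assumptions, and the function name is hypothetical. It builds the argument lists without running anything.

```python
def qemu_img_commands(images_path, node_name, sizes):
    """Build one `qemu-img create` argument list per disk size.

    Follows the template from step 1:
    qemu-img create -f qcow2 ${IMAGES_PATH}${NODE_NAME}-${DISK_NUMBER}.img ${SIZE}
    Disk numbering from 1 is an assumption.
    """
    return [
        ["qemu-img", "create", "-f", "qcow2",
         f"{images_path}{node_name}-{number}.img", size]
        for number, size in enumerate(sizes, start=1)
    ]

# Disk sizes for nodeB as listed in step 1; the path is a placeholder.
node_b_sizes = ["6G", "11G", "16G", "100G", "1T", "1T", "1T", "1T"]
commands = qemu_img_commands("/path/to/images/", "nodeB", node_b_sizes)
for command in commands:
    print(" ".join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` on the hypervisor once the path has been adjusted.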

Actual results:
  Some OSDs were properly created, but USM reports them as Failed and they are not visible there.

Expected results:
  All properly created OSDs are visible in USM.

Additional info:
  [1] `echo 0 > /sys/block/vdX/queue/rotational`

Comment 2 Daniel Horák 2016-07-28 12:50:31 UTC
Tested on:
  USM Server (RHEL 7.2):
  ceph-ansible-1.0.5-31.el7scon.noarch
  ceph-installer-1.0.14-1.el7scon.noarch
  rhscon-ceph-0.0.37-1.el7scon.x86_64
  rhscon-core-0.0.37-1.el7scon.x86_64
  rhscon-core-selinux-0.0.37-1.el7scon.noarch
  rhscon-ui-0.0.51-1.el7scon.noarch

  Ceph MON (RHEL 7.2):
  calamari-server-1.4.7-1.el7cp.x86_64
  ceph-base-10.2.2-30.el7cp.x86_64
  ceph-common-10.2.2-30.el7cp.x86_64
  ceph-mon-10.2.2-30.el7cp.x86_64
  ceph-selinux-10.2.2-30.el7cp.x86_64
  libcephfs1-10.2.2-30.el7cp.x86_64
  python-cephfs-10.2.2-30.el7cp.x86_64
  rhscon-agent-0.0.16-1.el7scon.noarch
  rhscon-core-selinux-0.0.37-1.el7scon.noarch

  Ceph OSD (RHEL 7.2):
  ceph-base-10.2.2-30.el7cp.x86_64
  ceph-common-10.2.2-30.el7cp.x86_64
  ceph-osd-10.2.2-30.el7cp.x86_64
  ceph-selinux-10.2.2-30.el7cp.x86_64
  libcephfs1-10.2.2-30.el7cp.x86_64
  python-cephfs-10.2.2-30.el7cp.x86_64
  rhscon-agent-0.0.16-1.el7scon.noarch
  rhscon-core-selinux-0.0.37-1.el7scon.noarch

All created OSDs are visible in USM, and none is marked as Failed, on the same configuration as described in Comment 0.

Comment 3 Daniel Horák 2016-07-28 12:58:49 UTC
Moving to VERIFIED as per Comment 2.

Comment 5 errata-xmlrpc 2016-08-23 19:57:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2016:1754

