Bug 1655378

Summary: ceph-disk dedicated_devices scenario is failing to start OSD service
Product: Red Hat Ceph Storage
Component: Ceph-Ansible
Version: 3.2
Target Release: 3.2
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Severity: urgent
Priority: urgent
Status: CLOSED ERRATA
Keywords: Regression
Reporter: Ramakrishnan Periyasamy <rperiyas>
Assignee: leseb <shan>
QA Contact: Ramakrishnan Periyasamy <rperiyas>
Docs Contact:
CC: aschoen, ceph-eng-bugs, gmeno, hnallurv, nthomas, sankarshan, seb, tserlin, vashastr
Whiteboard:
Fixed In Version: RHEL: ceph-ansible-3.2.0-0.1.rc8.el7cp; Ubuntu: ceph-ansible_3.2.0~rc8-2redhat1
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Type: Bug
Last Closed: 2019-01-03 19:02:28 UTC
Attachments: Ansible log file. (flags: none)

Description Ramakrishnan Periyasamy 2018-12-03 04:22:59 UTC
Created attachment 1510792 [details]
Ansible log file.

Description of problem:
The ceph-disk dedicated_devices scenario fails to start the OSD service in containers; the issue was observed on multiple machines.

/etc/ansible/hosts inventory data:
[osds]
cephqe-node3 dedicated_devices="['/dev/nvme0n1','/dev/sdd']" devices="['/dev/sdb','/dev/sdc']" osd_scenario="non-collocated" dmcrypt="True" osd_objectstore="filestore"
cephqe-node4 dedicated_devices="['/dev/nvme0n1','/dev/sdd']" devices="['/dev/sdb','/dev/sdc']" osd_scenario="non-collocated" osd_objectstore="filestore"
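
As a quick sanity check (not part of the original report), the inline inventory variables above can be dumped per host with ad-hoc debug calls; the inventory path and the [osds] group name are taken from the report, the rest is only a sketch:

ansible -i /etc/ansible/hosts osds -m debug -a "var=devices"
ansible -i /etc/ansible/hosts osds -m debug -a "var=dedicated_devices"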

The OSDs repeatedly fail to start their daemons.

journalctl error messages on the OSD node
-----------------------------------------
Dec 03 09:45:47 cephqe-node4.lab.eng.blr.redhat.com ceph-osd-run.sh[120149]: 2018-12-03 09:45:47  /entrypoint.sh: static: does not generate config
Dec 03 09:45:47 cephqe-node4.lab.eng.blr.redhat.com ceph-osd-run.sh[120149]: /dev/nvme0n1p1
Dec 03 09:45:48 cephqe-node4.lab.eng.blr.redhat.com ceph-osd-run.sh[120149]: main_activate: path = /dev/sdb1
Dec 03 09:45:48 cephqe-node4.lab.eng.blr.redhat.com ceph-osd-run.sh[120149]: get_dm_uuid: get_dm_uuid /dev/sdb1 uuid path is /sys/dev/block/8:17/dm/uuid
Dec 03 09:45:48 cephqe-node4.lab.eng.blr.redhat.com ceph-osd-run.sh[120149]: command: Running command: /usr/sbin/blkid -o udev -p /dev/sdb1
Dec 03 09:45:48 cephqe-node4.lab.eng.blr.redhat.com ceph-osd-run.sh[120149]: command: Running command: /sbin/blkid -p -s TYPE -o value -- /dev/sdb1
Dec 03 09:45:48 cephqe-node4.lab.eng.blr.redhat.com ceph-osd-run.sh[120149]: command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
Dec 03 09:45:49 cephqe-node4.lab.eng.blr.redhat.com ceph-osd-run.sh[120149]: mount: Mounting /dev/sdb1 on /var/lib/ceph/tmp/mnt.3fi6L2 with options noatime,largeio,inode64,swalloc
Dec 03 09:45:49 cephqe-node4.lab.eng.blr.redhat.com ceph-osd-run.sh[120149]: command_check_call: Running command: /usr/bin/mount -t xfs -o noatime,largeio,inode64,swalloc -- /dev/sdb1 /var/lib/ceph/tmp/mnt.3fi6L2
Dec 03 09:45:49 cephqe-node4.lab.eng.blr.redhat.com ceph-osd-run.sh[120149]: command: Running command: /usr/sbin/restorecon /var/lib/ceph/tmp/mnt.3fi6L2
Dec 03 09:45:49 cephqe-node4.lab.eng.blr.redhat.com ceph-osd-run.sh[120149]: activate: Cluster uuid is aade34f6-53bb-43fa-8d9b-d5a51a4be74d
Dec 03 09:45:49 cephqe-node4.lab.eng.blr.redhat.com ceph-osd-run.sh[120149]: command: Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
Dec 03 09:45:50 cephqe-node4.lab.eng.blr.redhat.com ceph-osd-run.sh[120149]: activate: Cluster name is ceph
Dec 03 09:45:50 cephqe-node4.lab.eng.blr.redhat.com ceph-osd-run.sh[120149]: activate: OSD uuid is a23a3aa3-8205-44d0-b4ce-a768f40ea602
Dec 03 09:45:50 cephqe-node4.lab.eng.blr.redhat.com ceph-osd-run.sh[120149]: activate: OSD id is 6
Dec 03 09:45:50 cephqe-node4.lab.eng.blr.redhat.com ceph-osd-run.sh[120149]: command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup init
Dec 03 09:45:50 cephqe-node4.lab.eng.blr.redhat.com ceph-osd-run.sh[120149]: command: Running command: /usr/bin/ceph-detect-init --default sysvinit
Dec 03 09:45:50 cephqe-node4.lab.eng.blr.redhat.com ceph-osd-run.sh[120149]: activate: Marking with init system none
Dec 03 09:45:50 cephqe-node4.lab.eng.blr.redhat.com ceph-osd-run.sh[120149]: command: Running command: /usr/sbin/restorecon -R /var/lib/ceph/tmp/mnt.3fi6L2/none
Dec 03 09:45:50 cephqe-node4.lab.eng.blr.redhat.com ceph-osd-run.sh[120149]: command: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/tmp/mnt.3fi6L2/none
Dec 03 09:45:50 cephqe-node4.lab.eng.blr.redhat.com ceph-osd-run.sh[120149]: activate: ceph osd.6 data dir is ready at /var/lib/ceph/tmp/mnt.3fi6L2
Dec 03 09:45:50 cephqe-node4.lab.eng.blr.redhat.com ceph-osd-run.sh[120149]: move_mount: Moving mount to final location...
Dec 03 09:45:50 cephqe-node4.lab.eng.blr.redhat.com ceph-osd-run.sh[120149]: command_check_call: Running command: /bin/mount -o noatime,largeio,inode64,swalloc -- /dev/sdb1 /var/lib/ceph/osd/ceph-6
Dec 03 09:45:50 cephqe-node4.lab.eng.blr.redhat.com ceph-osd-run.sh[120149]: command_check_call: Running command: /bin/umount -l -- /var/lib/ceph/tmp/mnt.3fi6L2
Dec 03 09:45:50 cephqe-node4.lab.eng.blr.redhat.com ceph-osd-run.sh[120149]: Waiting for /dev/sdb2 to show up
Dec 03 09:45:51 cephqe-node4.lab.eng.blr.redhat.com ceph-osd-run.sh[120149]: Waiting for /dev/sdb2 to show up
Dec 03 09:45:52 cephqe-node4.lab.eng.blr.redhat.com ceph-osd-run.sh[120149]: Waiting for /dev/sdb2 to show up
Dec 03 09:45:53 cephqe-node4.lab.eng.blr.redhat.com ceph-osd-run.sh[120149]: Waiting for /dev/sdb2 to show up
Dec 03 09:45:54 cephqe-node4.lab.eng.blr.redhat.com ceph-osd-run.sh[120149]: Waiting for /dev/sdb2 to show up
Dec 03 09:45:55 cephqe-node4.lab.eng.blr.redhat.com ceph-osd-run.sh[120149]: Waiting for /dev/sdb2 to show up
Dec 03 09:45:56 cephqe-node4.lab.eng.blr.redhat.com ceph-osd-run.sh[120149]: Waiting for /dev/sdb2 to show up
Dec 03 09:45:57 cephqe-node4.lab.eng.blr.redhat.com ceph-osd-run.sh[120149]: Waiting for /dev/sdb2 to show up
Dec 03 09:45:58 cephqe-node4.lab.eng.blr.redhat.com ceph-osd-run.sh[120149]: Waiting for /dev/sdb2 to show up
Dec 03 09:45:59 cephqe-node4.lab.eng.blr.redhat.com ceph-osd-run.sh[120149]: Waiting for /dev/sdb2 to show up
Dec 03 09:46:00 cephqe-node4.lab.eng.blr.redhat.com systemd[1]: ceph-osd@sdb.service: main process exited, code=exited, status=124/n/a
Dec 03 09:46:00 cephqe-node4.lab.eng.blr.redhat.com docker[171709]: Error response from daemon: No such container: ceph-osd-cephqe-node4-sdb
Dec 03 09:46:00 cephqe-node4.lab.eng.blr.redhat.com systemd[1]: Unit ceph-osd@sdb.service entered failed state.
Dec 03 09:46:00 cephqe-node4.lab.eng.blr.redhat.com systemd[1]: ceph-osd@sdb.service failed.
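
For reference, the unit log above can be re-captured on the failing node with something like the following; the unit name is taken from the systemd messages above, and the flags are only a suggestion:

journalctl -u ceph-osd@sdb.service --no-pager
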
[ubuntu@cephqe-node4 ~]$ lsblk
NAME                        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                           8:0    0   279G  0 disk 
├─sda1                        8:1    0     1G  0 part /boot
└─sda2                        8:2    0   278G  0 part 
  ├─rhel_cephqe--node4-root 253:0    0    50G  0 lvm  /
  ├─rhel_cephqe--node4-swap 253:1    0     4G  0 lvm  [SWAP]
  └─rhel_cephqe--node4-home 253:2    0   224G  0 lvm  /home
sdb                           8:16   0   1.1T  0 disk 
└─sdb1                        8:17   0   1.1T  0 part 
sdc                           8:32   0   1.1T  0 disk 
└─sdc1                        8:33   0   1.1T  0 part 
sdd                           8:48   0 223.1G  0 disk 
└─sdd1                        8:49   0     5G  0 part 
nvme0n1                     259:0    0 931.5G  0 disk 
└─nvme0n1p1                 259:1    0     5G  0 part 
[ubuntu@cephqe-node4 ~]$ 


Version-Release number of selected component (if applicable):
ceph version 12.2.8-44.el7cp (31b25bc7bd626437e6b3da93b2de79640d8ccf61) luminous (stable)
ceph-ansible-3.2.0-0.1.rc5.el7cp.noarch
ansible-2.6.8-1.el7ae.noarch

How reproducible:
4/4

Steps to Reproduce:
1. Run the ceph-ansible playbook with the ceph-disk dedicated_devices (non-collocated) OSD scenario; a command sketch follows.
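
A minimal sketch of step 1 for a containerized RHCS 3.x deployment, assuming the stock playbook location under /usr/share/ceph-ansible and the /etc/ansible/hosts inventory shown above (paths and playbook name may differ in other setups):

cd /usr/share/ceph-ansible
cp site-docker.yml.sample site-docker.yml    # only if the playbook has not been prepared yet
ansible-playbook -i /etc/ansible/hosts site-docker.yml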

Actual results:
OSD creation failed.

Expected results:
OSD creation should succeed.

Additional info:
The Ansible log file is attached.

Comment 10 errata-xmlrpc 2019-01-03 19:02:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0020