Bug 1309926 - Overcloud Deploy OSPd on RHEL 7.2 fails on Ceph Install
Status: CLOSED DUPLICATE of bug 1304367
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-puppet-modules
Version: 7.0 (Kilo)
Hardware: x86_64 Linux
Priority: high  Severity: high
Target Milestone: ga
Target Release: 8.0 (Liberty)
Assigned To: Emilien Macchi
QA Contact: yeylon@redhat.com
Depends On: 1297251
Blocks: 1261979
Reported: 2016-02-18 19:49 EST by Mike Burns
Modified: 2016-11-22 15:20 EST (History)
CC: 46 users

Doc Type: Bug Fix
Clone Of: 1297251
Last Closed: 2016-02-25 17:12:25 EST
Type: Bug




External Trackers
Tracker ID Priority Status Summary Last Updated
OpenStack gerrit 276141 None None None 2016-02-18 19:49 EST

Description Mike Burns 2016-02-18 19:49:52 EST
Clone to ensure the patch makes it back into OSP 8 since it was fixed in OSP 7

+++ This bug was initially created as a clone of Bug #1297251 +++

Description of problem:

The overcloud deploy reports as completed, but Ceph does not come up.
Version-Release number of selected component (if applicable):
[root@osp7-director ~]# rpm -qa | grep oscplugin
python-rdomanager-oscplugin-0.0.10-22.el7ost.noarch
[root@osp7-director ~]# uname -a
Linux osp7-director.cisco.com 3.10.0-327.3.1.el7.x86_64 #1 SMP Fri Nov 20 05:40:26 EST 2015 x86_64 x86_64 x86_64 GNU/Linux
[root@osp7-director ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.2 (Maipo)


How reproducible:
Reproducible with a custom ceph.yaml. The ceph configuration includes SSD disks. It is the same ceph.yaml that worked fine with RHEL 7.1 and y1.

Steps to Reproduce:
1. Have a ceph.yaml as below:
[root@osp7-director ~]# cat /usr/share/openstack-tripleo-heat-templates/puppet/hieradata/ceph.yaml
ceph::profile::params::osd_journal_size: 20000
ceph::profile::params::osd_pool_default_pg_num: 128
ceph::profile::params::osd_pool_default_pgp_num: 128
ceph::profile::params::osd_pool_default_size: 3
ceph::profile::params::osd_pool_default_min_size: 1
ceph::profile::params::manage_repo: false
ceph::profile::params::authentication_type: cephx
ceph::profile::params::osds:
    '/dev/sdd':
        journal: '/dev/sdb'
    '/dev/sde':
        journal: '/dev/sdb'
    '/dev/sdf':
        journal: '/dev/sdb'
    '/dev/sdg':
        journal: '/dev/sdb'
    '/dev/sdh':
        journal: '/dev/sdc'
    '/dev/sdi':
        journal: '/dev/sdc'
    '/dev/sdj':
        journal: '/dev/sdc'
    '/dev/sdk':
        journal: '/dev/sdc'

ceph_pools:
  - "%{hiera('cinder_rbd_pool_name')}"
  - "%{hiera('nova::compute::rbd::libvirt_images_rbd_pool')}"
  - "%{hiera('glance::backend::rbd::rbd_store_pool')}"


2. Run overcloud deploy as below:
#!/bin/bash
export HEAT_INCLUDE_PASSWORD=1
openstack overcloud deploy --templates \
-e /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry-puppet.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/templates/network-environment.yaml \
-e /home/stack/templates/storage-environment.yaml \
-e /home/stack/templates/cisco-plugins.yaml \
--control-flavor control --compute-flavor compute --ceph-storage-flavor CephStorage \
--compute-scale 6 --control-scale 3  --ceph-storage-scale 3 \
--libvirt-type kvm \
--ntp-server 171.68.38.66 \
--neutron-network-type vlan \
--neutron-tunnel-type vlan \
--neutron-bridge-mappings datacentre:br-ex,physnet-tenant:br-tenant,floating:br-floating \
--neutron-network-vlan-ranges physnet-tenant:250:749,floating:160:160 \
--neutron-disable-tunneling --timeout 300 \
--verbose --debug --log-file overcloud_new.log

3. The command completes as:
DEBUG: os_cloud_config.utils.clients Creating nova client.
Overcloud Endpoint: http://173.36.215.91:5000/v2.0/
Overcloud Deployed
DEBUG: openstackclient.shell clean_up DeployOvercloud


Actual results:
However ceph is not up.
[heat-admin@overcloud-cephstorage-0 ~]$ sudo -i 


ID WEIGHT    TYPE NAME                        UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-1 128.87988 root default                                                       
-2  42.95996     host overcloud-cephstorage-1                                   
 0   5.37000         osd.0                       down        0          1.00000 
 3   5.37000         osd.3                       down        0          1.00000 
 6   5.37000         osd.6                       down        0          1.00000 
 9   5.37000         osd.9                       down        0          1.00000 
12   5.37000         osd.12                      down        0          1.00000 
15   5.37000         osd.15                      down        0          1.00000 
18   5.37000         osd.18                      down        0          1.00000 
21   5.37000         osd.21                      down        0          1.00000 
-3  42.95996     host overcloud-cephstorage-2                                   
 1   5.37000         osd.1                       down        0          1.00000 
 4   5.37000         osd.4                       down        0          1.00000 
 7   5.37000         osd.7                       down        0          1.00000 
10   5.37000         osd.10                      down        0          1.00000 
13   5.37000         osd.13                      down        0          1.00000 
16   5.37000         osd.16                      down        0          1.00000 
19   5.37000         osd.19                      down        0          1.00000 
22   5.37000         osd.22                      down        0          1.00000 
-4  42.95996     host overcloud-cephstorage-0                                   
 2   5.37000         osd.2                       down        0          1.00000 
 5   5.37000         osd.5                       down        0          1.00000 
 8   5.37000         osd.8                       down        0          1.00000 
11   5.37000         osd.11                      down        0          1.00000 
14   5.37000         osd.14                      down        0          1.00000 
17   5.37000         osd.17                      down        0          1.00000 
20   5.37000         osd.20                      down        0          1.00000 
23   5.37000         osd.23                      down        0          1.00000 

Expected results:

The OSDs should be up and running.
Additional info:

The ceph logs have:

2016-01-10 14:05:16.996011 7f25093d67c0 -1 ^[[0;31m ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-0: (2) No such file or directory^[[0m
2016-01-10 14:05:17.799063 7f964a2397c0  0 ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff), process ceph-osd, pid 2460
2016-01-10 14:05:17.805586 7f964a2397c0  1 journal _open /dev/sdb4 fd 4: 5368709120 bytes, block size 4096 bytes, directio = 0, aio = 0
2016-01-10 14:05:20.995530 7fb805ba87c0  0 ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff), process ceph-osd, pid 3290
2016-01-10 14:05:20.995823 7fb805ba87c0 -1 ^[[0;31m ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-0: (2) No such file or directory^[[0m

The partitions have been initialized with wipe.yaml and also manually as below
=============================================================================
service ceph -a stop
umount -a
for i in `cat /proc/partitions | egrep "5767168000|36700" | awk '{print $4}'`
do
dd if=/dev/zero of=/dev/$i bs=4M count=100
parted --align optimal -s /dev/$i mklabel gpt
done


cat /proc/partitions - 
[root@overcloud-cephstorage-0 ~]# cat /proc/partitions 
major minor  #blocks  name

   8       16  367001600 sdb
   8       17    5242880 sdb1
   8       18    5242880 sdb2
   8       19    5242880 sdb3
   8       20    5242880 sdb4
   8        0  419430400 sda
   8        1       1024 sda1
   8        2  419422972 sda2
   8       32  367001600 sdc
   8       33    5242880 sdc1
   8       34    5242880 sdc2
   8       35    5242880 sdc3
   8       36    5242880 sdc4
   8       48 5767168000 sdd
   8       49 5767166956 sdd1
   8       80 5767168000 sdf
   8       81 5767166956 sdf1
   8       64 5767168000 sde
   8       65 5767166956 sde1
   8       96 5767168000 sdg
   8       97 5767166956 sdg1
   8      112 5767168000 sdh
   8      113 5767166956 sdh1
   8      128 5767168000 sdi
   8      129 5767166956 sdi1
   8      144 5767168000 sdj
   8      145 5767166956 sdj1
   8      160 5767168000 sdk
   8      161 5767166956 sdk1

Check the partition sizes of sdb and sdc (the journals): they are not 20G per ceph.yaml but only 5G. It is unclear where the 5G partition size came from.
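The 5G figure can be read directly from /proc/partitions, whose size column is in 1 KiB blocks. A minimal sketch (sample lines copied from the listing above; the sd[bc] filter just selects the journal disks):

```shell
#!/bin/sh
# Convert /proc/partitions sizes (1 KiB blocks) to GiB for the
# journal partitions. Sample lines copied from the listing above.
partitions='   8       17    5242880 sdb1
   8       18    5242880 sdb2
   8       33    5242880 sdc1'
echo "$partitions" | awk 'NF == 4 && $4 ~ /^sd[bc][0-9]+$/ {
    printf "%s: %.0f GiB\n", $4, $3 / 1024 / 1024
}'
# 5242880 KiB blocks = 5 GiB per journal partition, not the
# 20000 MB requested by osd_journal_size in ceph.yaml.
```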

Reboot the storage servers one after the other - 
==============================================
Now ceph comes up.

[root@overcloud-cephstorage-0 ~]# ceph -s 
    cluster a406713a-b7e2-11e5-84b2-0025b522225f
     health HEALTH_OK
     monmap e1: 3 mons at {overcloud-controller-0=10.22.120.54:6789/0,overcloud-controller-1=10.22.120.51:6789/0,overcloud-controller-2=10.22.120.52:6789/0}
            election epoch 6, quorum 0,1,2 overcloud-controller-1,overcloud-controller-2,overcloud-controller-0
     osdmap e68: 24 osds: 24 up, 24 in
      pgmap v92: 256 pgs, 4 pools, 0 bytes data, 0 objects
            836 MB used, 128 TB / 128 TB avail
                 256 active+clean

--- Additional comment from Rama on 2016-01-10 20:11 EST ---



--- Additional comment from Rama on 2016-01-12 14:28:26 EST ---

This problem is consistently reproducible. Repeated the install and it is the same problem.
Overcloud deploy reported completion:
[2016-01-12 08:25:41,375] DEBUG    cliff.commandmanager found command 'hypervisor_stats_show
[2016-01-12 10:40:44,552] os_cloud_config.utils.clients Creating nova client

Took around 2 hours 15 minutes.


Before rebooting the storage nodes
==================================
[root@overcloud-cephstorage-0 ~]# ceph osd tree  | grep down | wc -l 
24
[root@overcloud-cephstorage-0 ~]# ceph -s
    cluster 213cf9f0-b949-11e5-b2d6-0025b522225f
     health HEALTH_WARN
            256 pgs stuck inactive
            256 pgs stuck unclean
     monmap e2: 3 mons at {overcloud-controller-0=10.22.120.52:6789/0,overcloud-controller-1=10.22.120.51:6789/0,overcloud-controller-2=10.22.120.54:6789/0}
            election epoch 8, quorum 0,1,2 overcloud-controller-1,overcloud-controller-0,overcloud-controller-2
     osdmap e52: 24 osds: 0 up, 0 in
      pgmap v53: 256 pgs, 4 pools, 0 bytes data, 0 objects
            0 kB used, 0 kB / 0 kB avail
                 256 creating

After rebooting the first node.
[heat-admin@overcloud-cephstorage-0 ~]$ ceph -s
2016-01-12 14:10:42.254246 7f8808635700 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
2016-01-12 14:10:42.254249 7f8808635700  0 librados: client.admin initialization error (2) No such file or directory


Then start the monitors running on all the 3 controllers.
[root@overcloud-cephstorage-2 ~]# ceph -s 
    cluster 213cf9f0-b949-11e5-b2d6-0025b522225f
     health HEALTH_WARN
            1 mons down, quorum 0,1 overcloud-controller-1,overcloud-controller-0
     monmap e2: 3 mons at {overcloud-controller-0=10.22.120.52:6789/0,overcloud-controller-1=10.22.120.51:6789/0,overcloud-controller-2=10.22.120.54:6789/0}
            election epoch 18, quorum 0,1 overcloud-controller-1,overcloud-controller-0
     osdmap e84: 24 osds: 24 up, 24 in
      pgmap v123: 256 pgs, 4 pools, 0 bytes data, 0 objects
            850 MB used, 128 TB / 128 TB avail

--- Additional comment from Rama on 2016-01-12 14:52:19 EST ---

After restarting the monitors post-reboot, we find the health is OK too.
[root@overcloud-controller-0 ~]# ceph -s 
    cluster 213cf9f0-b949-11e5-b2d6-0025b522225f
     health HEALTH_OK
     monmap e2: 3 mons at {overcloud-controller-0=10.22.120.52:6789/0,overcloud-controller-1=10.22.120.51:6789/0,overcloud-controller-2=10.22.120.54:6789/0}
            election epoch 30, quorum 0,1,2 overcloud-controller-1,overcloud-controller-0,overcloud-controller-2
     osdmap e84: 24 osds: 24 up, 24 in
      pgmap v132: 256 pgs, 4 pools, 0 bytes data, 0 objects
            846 MB used, 128 TB / 128 TB avail
                 256 active+clean

--- Additional comment from Giulio Fidente on 2016-01-15 10:56:55 EST ---

hi Rama, thanks for the report

I understand it is reproducible and will try to reproduce it myself; meantime, when in comment #3 you write:

[heat-admin@overcloud-cephstorage-0 ~]$ ceph -s
2016-01-12 14:10:42.254246 7f8808635700 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
2016-01-12 14:10:42.254249 7f8808635700  0 librados: client.admin initialization error (2) No such file or directory

I think that is normal when run as non-root; only root can read the client.admin keyring.

It would indeed be useful to know if that command works as expected after the initial deployment (before rebooting the storage nodes) when run as root.

I will also check the journal logs meantime, thanks.

--- Additional comment from Rama on 2016-01-15 12:07:10 EST ---

Hi Giulio,
I agree about the typo of running ceph -s as a non-root user.
But in the same comment 3, I did capture output as root, wherein the OSDs are all reported as down before reboot. The command does work.
It isn't clear whether it is activate, blkid, or something else that fixes these issues after reboot.

--- Additional comment from Giulio Fidente on 2016-01-15 13:26:15 EST ---

Understood. I was unable to reproduce this with the standard settings though; could you please also check if there are OSD processes running on the storage nodes before they are rebooted?

--- Additional comment from Steve Reichard on 2016-01-16 19:59:27 EST ---

I was able to reproduce this in an internal cluster (non-UCS).

I will try to reproduce again and check the OSD processes before rebooting.

--- Additional comment from Rama on 2016-01-18 04:09:19 EST ---

Here is more information, which suggests that ceph-disk activate isn't happening.
Without Reboot
=============
Both ceph -s and ceph osd tree report the OSDs as down.

Cannot start a single OSD manually
[root@overcloud-cephstorage-0 ~]# /etc/init.d/ceph restart osd.0
/etc/init.d/ceph: osd.0 not found (/etc/ceph/ceph.conf defines , /var/lib/ceph defines )

[root@overcloud-cephstorage-0 ~]# cd  /var/lib/ceph/osd/ceph-0
[root@overcloud-cephstorage-0 ceph-0]# ls
[root@overcloud-cephstorage-0 ceph-0]# ls -l 
total 0

Nothing in each ceph directory.
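One quick way to spot this state across all OSDs is to list data directories that exist but are empty, meaning prepare created them but activate never mounted anything there. A sketch; find_unactivated is a hypothetical helper, and the base directory is a parameter so it is not hard-wired to /var/lib/ceph/osd:

```shell
#!/bin/sh
# Report OSD data directories that are present but empty -- a sign
# that ceph-disk prepare ran but activate never mounted the disk.
find_unactivated() {
    base=${1:-/var/lib/ceph/osd}
    for d in "$base"/ceph-*; do
        [ -d "$d" ] || continue
        # An activated OSD data dir contains files such as 'whoami'.
        if [ -z "$(ls -A "$d" 2>/dev/null)" ]; then
            echo "unactivated: $d"
        fi
    done
}

find_unactivated
```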

Manually activate one OSD
=========================
[root@overcloud-cephstorage-0 ceph]# /usr/sbin/ceph-disk -v activate /dev/sdd1 
INFO:ceph-disk:Running command: /sbin/blkid -p -s TYPE -ovalue -- /dev/sdd1
INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
DEBUG:ceph-disk:Mounting /dev/sdd1 on /var/lib/ceph/tmp/mnt.lW9aIY with options noatime,inode64
INFO:ceph-disk:Running command: /usr/bin/mount -t xfs -o noatime,inode64 -- /dev/sdd1 /var/lib/ceph/tmp/mnt.lW9aIY
DEBUG:ceph-disk:Cluster uuid is 19f4189a-bdae-11e5-8937-0025b522225f
INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
DEBUG:ceph-disk:Cluster name is ceph
DEBUG:ceph-disk:OSD uuid is 93ccf964-6282-49fe-acbc-5dfa1d3f9ec7
DEBUG:ceph-disk:OSD id is 9
INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup init
DEBUG:ceph-disk:Marking with init system sysvinit
DEBUG:ceph-disk:ceph osd.9 data dir is ready at /var/lib/ceph/tmp/mnt.lW9aIY
DEBUG:ceph-disk:Moving mount to final location...
INFO:ceph-disk:Running command: /bin/mount -o noatime,inode64 -- /dev/sdd1 /var/lib/ceph/osd/ceph-9
INFO:ceph-disk:Running command: /bin/umount -l -- /var/lib/ceph/tmp/mnt.lW9aIY
DEBUG:ceph-disk:Starting ceph osd.9...
INFO:ceph-disk:Running command: /usr/sbin/service ceph --cluster ceph start osd.9
=== osd.9 === 
create-or-move updated item name 'osd.9' weight 5.37 at location {host=overcloud-cephstorage-0,root=default} to crush map
Starting Ceph osd.9 on overcloud-cephstorage-0...
Running as unit run-15939.service.
[root@overcloud-cephstorage-0 ceph]# ps -ef | grep ceph 
root     15941     1  0 04:00 ?        00:00:00 /bin/bash -c ulimit -n 32768; /usr/bin/ceph-osd -i 9 --pid-file /var/run/ceph/osd.9.pid -c /etc/ceph/ce
ph.conf --cluster ceph -f
root     15944 15941  2 04:00 ?        00:00:00 /usr/bin/ceph-osd -i 9 --pid-file /var/run/ceph/osd.9.pid -c /etc/ceph/ceph.conf --cluster ceph -f
root     16069 14392  0 04:00 pts/0    00:00:00 grep --color=auto ceph

The ceph process is started now.

Run the following on each node
==============================
for i in sdd1 sde1 sdf1 sdg1 sdh1 sdi1 sdj1 sdk1; do 
/usr/sbin/ceph-disk -v activate /dev/$i; 
done 

[root@overcloud-cephstorage-2 ~]# ceph -s 
    cluster 19f4189a-bdae-11e5-8937-0025b522225f
     health HEALTH_OK
     monmap e1: 3 mons at {overcloud-controller-0=10.22.120.52:6789/0,overcloud-controller-1=10.22.120.51:6789/0,overcloud-controller-2=10.22.120.55:6789/0}
            election epoch 6, quorum 0,1,2 overcloud-controller-1,overcloud-controller-0,overcloud-controller-2
     osdmap e71: 24 osds: 24 up, 24 in
      pgmap v88: 256 pgs, 4 pools, 0 bytes data, 0 objects
            861 MB used, 128 TB / 128 TB avail
                 256 active+clean

--- Additional comment from Rama on 2016-01-18 04:12:10 EST ---

[root@overcloud-controller-0 ceph]# grep -i journal /etc/ceph/ceph.conf 
[root@overcloud-controller-0 ceph]# 

Also, there isn't any journal size in ceph.conf, so the default of 5G still applies.

--- Additional comment from Giulio Fidente on 2016-01-18 12:12:02 EST ---

hi,

I've just reproduced this as well. The default configuration works fine, but this will be hit when the osds hiera is changed to make puppet prepare a local disk as an OSD; I suspect there could be an issue with the puppet-ceph module and I'm currently investigating it.

--- Additional comment from Steve Reichard on 2016-01-18 17:59:38 EST ---

Giulio,

I reproduced twice more, once without the increased journal size.

Giulio, if you'd like access to my config to work the issue, let me know and I'll send creds.


It is 3 controllers, 2 computes (only 1 being used - race BZ), and 3 ceph OSDs (rx720xds with 12 HDD & 3 SSD).

--- Additional comment from David Gurtner on 2016-01-21 02:10:08 EST ---

When using entire disks for the OSDs instead of directories, the puppet-ceph module expects activation to happen via udev. The relevant snippet is:

https://github.com/openstack/puppet-ceph/blob/master/manifests/osd.pp#L89-L96
set -ex
if ! test -b ${data} ; then
  mkdir -p ${data}
fi
# activate happens via udev when using the entire device
if ! test -b ${data} || ! test -b ${data}1 ; then
  ceph-disk activate ${data} || true
fi

This would also explain the behavior with the OSDs coming up after a reboot.

I would suggest looking into why udev doesn't activate the OSDs anymore, i.e. what changes in the full image could trigger this behavior.

Maybe a case for the Ceph devs?
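The condition in the snippet above can be read as: run ceph-disk activate explicitly only when ${data} is a directory or an unpartitioned device; when both the device and its first partition exist as block devices, activation is left entirely to udev. A sketch of that same decision, with the activate call replaced by an echo so the chosen branch is visible (decide_activation is a hypothetical name):

```shell
#!/bin/sh
# Mirror the osd.pp logic: explicit activation unless we are using a
# whole block device whose first partition already exists (in which
# case puppet-ceph expects udev to trigger the activation).
decide_activation() {
    data=$1
    if ! test -b "$data" || ! test -b "${data}1"; then
        echo "explicit: ceph-disk activate $data"
    else
        echo "implicit: leave activation to udev for $data"
    fi
}

decide_activation /srv/osd-dir   # not a block device: explicit activate
```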

--- Additional comment from Hugh Brock on 2016-01-22 07:50:48 EST ---

Given we haven't changed this code between 7.1 and 7.2, is it possible this is a RHEL 7.2 problem, or a ceph-vs.-rhel-7.2 problem? Adding needinfo on jdurgin to see.

--- Additional comment from Rama on 2016-01-22 09:31:23 EST ---

Here is what I did to workaround the problem for now to move ahead.
(1) Update wipe-disk for NodeUserData and redeploy overcloud
        { for disk in sdb sdc sdd sde sdf sdg sdh sdi sdj sdk
        do
           sgdisk -Z /dev/$disk
           sgdisk -g /dev/$disk
        done } > /root/wipe-disk.txt
        { for disk in sdb sdc
        do
          ptype1=45b0969e-9b03-4f30-b4c6-b4b80ceff106
          sgdisk --new=1:0:+19080MiB  --change-name="1:ceph journal"  --typecode="1:$ptype1" /dev/$disk
          sgdisk --new=2:19082MiB:+19080MiB  --change-name="2:ceph journal"  --typecode="2:$ptype1" /dev/$disk
          sgdisk --new=3:38163MiB:+19080MiB  --change-name="3:ceph journal"  --typecode="3:$ptype1" /dev/$disk
          sgdisk --new=4:57244MiB:+19080MiB  --change-name="4:ceph journal"  --typecode="4:$ptype1" /dev/$disk
        done } >> /root/wipe-disk.txt
(2) Update ceph.conf in all 3 controllers and 3 storage nodes.
(3) ceph-disk activate /dev/$i for each data partition
    Check ceph -s and ceph daemon osd.${i} config get osd_journal_size
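The journal layout in step (1) can also be generated instead of hand-written. A sketch that only prints the sgdisk commands for review (gen_journal_cmds is a hypothetical helper; it uses a uniform 2 MiB gap between partitions, so the offsets differ by a sector or two from the hand-picked values above):

```shell
#!/bin/sh
# Print sgdisk commands creating N ceph-journal partitions on a disk.
# Nothing is executed; review the output before running any of it.
ptype=45b0969e-9b03-4f30-b4c6-b4b80ceff106   # ceph journal GPT type
size=19080                                   # MiB per journal

gen_journal_cmds() {
    disk=$1
    count=$2
    start=0
    i=1
    while [ "$i" -le "$count" ]; do
        echo "sgdisk --new=$i:${start}MiB:+${size}MiB" \
             "--change-name=\"$i:ceph journal\"" \
             "--typecode=\"$i:$ptype\" $disk"
        start=$((start + size + 2))   # uniform 2 MiB gap
        i=$((i + 1))
    done
}

gen_journal_cmds /dev/sdb 4
```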

--- Additional comment from Josh Durgin on 2016-02-03 14:06:18 EST ---

Yes, this is a ceph-disk issue - similar to https://bugzilla.redhat.com/show_bug.cgi?id=1300617.

Working around it is necessary for now, until ceph-disk fixes are backported: http://www.spinics.net/lists/ceph-devel/msg28384.html

--- Additional comment from Joe Donohue on 2016-02-03 17:04:57 EST ---

Hi Rama,

Is it OK if I open this bug to Dell, as they are exposed to the same issue?

Thanks,
Joe

--- Additional comment from Giulio Fidente on 2016-02-04 05:47:15 EST ---

Thanks David and Josh, I've a tentative change here:
https://review.openstack.org/#/c/276141

--- Additional comment from Giulio Fidente on 2016-02-04 11:41:40 EST ---

Waiting for udev to settle after disk prepare didn't help.

Josh, do you have better ideas on if and which workaround we could put in place?

--- Additional comment from Giulio Fidente on 2016-02-04 12:08:23 EST ---

If the workaround is to remove the ceph udev rules entirely, maybe this should be addressed in the package?

--- Additional comment from Joe Donohue on 2016-02-08 09:27:39 EST ---

Opening this bugzilla to Dell with Cisco's permission. Please take care to avoid exposing sensitive information in comments.

--- Additional comment from Loic Dachary on 2016-02-08 11:55:37 EST ---

@Giulio I'm not sure what you mean by "addressed in the package"?

--- Additional comment from Loic Dachary on 2016-02-08 12:15:16 EST ---

Removing udev files in the context of puppet-ceph makes sense, as it implements one specific deployment strategy. Removing them from the packages means ruling out all deployment strategies based on udev rules, which is too wide in scope, IMHO.

--- Additional comment from Giulio Fidente on 2016-02-08 12:19:22 EST ---

David, based on the previous comments, do you think we should go ahead and maybe update the existing submission [1] so that it deletes the udev rules?

1. https://review.openstack.org/276141

--- Additional comment from arkady kanevsky on 2016-02-08 12:21:00 EST ---

Looks like a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1298620

--- Additional comment from Giulio Fidente on 2016-02-08 12:25:02 EST ---



--- Additional comment from Sage Weil on 2016-02-09 13:39:54 EST ---

Removing the udev rules is ugly and annoying, but it is unlikely to break anything unless we make a large change here--and the only reason it's necessary is because we'd prefer not to make disruptive backports.  So I think it's a safe, easy path here.

When RHCS 2.0 comes around it will be different... but I'm guessing/hoping director will change at the same time?

--- Additional comment from Giulio Fidente on 2016-02-10 08:10:15 EST ---

From version #14 of the submission we're implementing a process where the rules are disabled before prepare, and later all the block devices are activated with 'ceph-disk activate-all'.

Unfortunately this still doesn't seem to work, activation completes without actually activating any ceph disk. Launching it manually after the deployment activates the disks as expected. We're probably hitting the same timing issue which causes the udev rules to fail.

--- Additional comment from Loic Dachary on 2016-02-10 10:30:44 EST ---

@Giulio could you collect information about why ceph-disk activate-all fails? With ceph-disk --verbose activate-all you should get an error message of some kind.

--- Additional comment from Giulio Fidente on 2016-02-10 10:55:43 EST ---

hi Loic, unfortunately it just does not seem to detect any new partition; see the output from patchset #14:

Exec[ceph-osd-activate-all-/dev/sdb]/returns: + ceph-disk --verbose activate-all
Exec[ceph-osd-activate-all-/dev/sdb]/returns: DEBUG:ceph-disk:Scanning /dev/disk/by-parttypeuuid
Exec[ceph-osd-activate-all-/dev/sdb]/returns: executed successfully

If I add a 'sleep 4s' before 'activate-all' (as per patchset #16) then it works, so we should have a workaround now, but it relies on a sleep.

Can you point to an event we could use to figure out when it is safe to run activate?
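One alternative to a fixed sleep is to poll for the path udev is expected to create, with a timeout. A generic sketch; wait_for_path is a hypothetical helper, and in the deployment it would be pointed at /dev/disk/by-parttypeuuid before running activate-all:

```shell
#!/bin/sh
# Poll until a path exists instead of sleeping a fixed amount.
# Returns 0 as soon as the path appears, 1 after the timeout.
wait_for_path() {
    path=$1
    timeout=${2:-10}   # seconds
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        [ -e "$path" ] && return 0
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# Usage in the deployment would look like:
# wait_for_path /dev/disk/by-parttypeuuid 30 && ceph-disk activate-all
```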

--- Additional comment from Loic Dachary on 2016-02-10 11:40:04 EST ---

@Giulio is /dev/disk/by-partuuid also empty ?

--- Additional comment from Loic Dachary on 2016-02-10 11:49:01 EST ---

@Giulio it means that 

 * partprobe /dev/sdb
 * udevadm settle

(which you do in the puppet module) can return before the udev rules that populate /dev/disk/by-*uuid are finished. This is news to me and I'll need to figure out how that can happen.

--- Additional comment from Giulio Fidente on 2016-02-10 11:59:00 EST ---

hi Loic, yes it is empty during the puppet run; if I try to list its contents in the 'activate-all' puppet exec it fails because the directory does not exist yet:

set -ex
ls -la /dev/disk/by-parttypeuuid/
ceph-disk --verbose activate-all
 returned 2 instead of one of [0]

If I run it afterwards (again, after a sleep of a few seconds) it shows up with the appropriate contents:

# ls -l /dev/disk/by-parttypeuuid/
total 0
lrwxrwxrwx. 1 root root 10 Feb 10 11:56 45b0969e-9b03-4f30-b4c6-b4b80ceff106.fffbebd9-bbdf-4a6e-9fdc-943c6382113d -> ../../sdb2
lrwxrwxrwx. 1 root root 10 Feb 10 11:56 4fbd7e29-9d25-41b8-afd0-062c0ceff05d.107f1b4e-ec68-4c43-995c-e910565b91b6 -> ../../sdb1
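The names under /dev/disk/by-parttypeuuid are '<partition-type-GUID>.<partition-GUID>', and the type GUID tells you the partition's role: 45b0969e-... is the ceph journal type and 4fbd7e29-... is ceph OSD data (the journal GUID is the same type code used in the wipe-disk workaround above). A sketch classifying such names (sample input copied from the listing):

```shell
#!/bin/sh
# Classify by-parttypeuuid entries by their ceph partition type GUID.
classify() {
    while read -r name; do
        [ -n "$name" ] || continue
        guid=${name%%.*}   # part before the first dot = type GUID
        case $guid in
            45b0969e-9b03-4f30-b4c6-b4b80ceff106) echo "$name -> journal" ;;
            4fbd7e29-9d25-41b8-afd0-062c0ceff05d) echo "$name -> osd data" ;;
            *)                                    echo "$name -> unknown" ;;
        esac
    done
}

printf '%s\n' \
  45b0969e-9b03-4f30-b4c6-b4b80ceff106.fffbebd9-bbdf-4a6e-9fdc-943c6382113d \
  4fbd7e29-9d25-41b8-afd0-062c0ceff05d.107f1b4e-ec68-4c43-995c-e910565b91b6 \
  | classify
```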

--- Additional comment from Loic Dachary on 2016-02-10 12:49:12 EST ---

@Giulio what about /dev/disk/by-partuuid ? (note the different name). Do you have 60-ceph-partuuid-workaround.rules installed ? If not which file /lib/udev/rules.d contains the string by-parttypeuuid ?

--- Additional comment from Loic Dachary on 2016-02-10 13:46:26 EST ---

Here is a theory: The puppet module does ceph-disk list after prepare, which relies on https://github.com/ceph/ceph/blob/hammer/src/ceph-disk#L2578 which calls sgdisk -i https://github.com/ceph/ceph/blob/hammer/src/ceph-disk#L2444

On a machine with udev > 214 (on your machine you have 219 according to udevadm --version), that sgdisk -i call will do the equivalent of a partprobe, removing devices and adding them again. This explains why the /dev/disk/by-parttypeuuid/ directory is not populated.

There are more details about that problem in the hammer backport which is available at https://github.com/ceph/ceph/commit/88ffcc2cbd73602819ad653695de7e8718f6707d

It is possible that this problem is also the cause of the original race you're having. ceph-disk list immediately after ceph-disk prepare would race with the ceph-disk activate run indirectly by the udev rules.

--- Additional comment from Giulio Fidente on 2016-02-10 14:49:55 EST ---

David, Loic, patchset #18 works for me, I can successfully deploy passing to ::osd an entire unpartitioned disk. Please let me know what you think.

https://review.openstack.org/#/c/276141/18

--- Additional comment from Loic Dachary on 2016-02-10 22:54:26 EST ---

@Giulio I don't see what has changed?

--- Additional comment from Giulio Fidente on 2016-02-11 06:44:29 EST ---

Loic, there is a sleep 3s before activate-all

--- Additional comment from Loic Dachary on 2016-02-11 06:59:26 EST ---

@Giulio this is very fragile. It would be better to either not run ceph-disk list before activate, to avoid running into http://tracker.ceph.com/issues/14080, or apply https://github.com/ceph/ceph/pull/7475, which is the hammer backport fixing it.

--- Additional comment from Giulio Fidente on 2016-02-11 07:50:43 EST ---

hi Loic thanks for helping.

So the module used to do a 'list' in both the prepare and the activate puppet "unless" clauses, which means it gets executed *before* prepare and then again *before* activate (which follows prepare).

This isn't related to the current change, but if useful/related, maybe we can remove it and rely only on the ls command? See the OR conditions at [1] and [2].

Alternatively, are there chances to get the ceph-disk list fixes in a Hammer package which we could include in the images?

And also, to make sure I understand it correctly, either the 'list' fix or the removal of the 'list' command will allow us to remove the sleep but won't change the fact we'll need to remove the udev rules and use activate-all as proposed in the current change, is this correct?

1. https://github.com/openstack/puppet-ceph/blob/master/manifests/osd.pp#L79
2. https://github.com/openstack/puppet-ceph/blob/master/manifests/osd.pp#L100

--- Additional comment from Giulio Fidente on 2016-02-11 09:45:24 EST ---

Loic, I tried to apply your 'list' patch to ceph-disk manually and it makes activate work without the sleep, as you suggested.

It's still necessary to remove the udev rules though.

So I think you have to vote on one of:

1. keep a sleep in the workaround
2. build a version of ceph-osd for hammer which includes your patch for ceph-disk

Unless there are better short-term alternatives?

--- Additional comment from Loic Dachary on 2016-02-11 10:19:45 EST ---

If applying the patch is not an option for some reason, another way to deal with the problem is to avoid using ceph-disk list between ceph-disk prepare and ceph-disk activate.

--- Additional comment from Loic Dachary on 2016-02-11 10:28:56 EST ---

I suggest you remove the "unless" that is at

https://review.openstack.org/#/c/276141/18/manifests/osd.pp

line 123 so that ceph-disk list is not run. ceph-disk activate is idempotent; there is no harm in running it more than once. And with that I think you can also remove the sleep that comes before the activate-all that follows.

--- Additional comment from Giulio Fidente on 2016-02-11 10:58:25 EST ---

Loic, understood. That line was there before my change, though, so we need to check with David whether it's viable. I think it will cause activate to run even when unnecessary, but if it's idempotent ... maybe we can do that.

--- Additional comment from David Gurtner on 2016-02-11 11:35:01 EST ---

The unless is needed. This is not about idempotency of the underlying change; it is about the idempotency of Puppet. Without the unless, Puppet will always run the exec, so the run is not idempotent from a Puppet point of view, which is why this test is failing:
http://logs.openstack.org/41/276141/18/check/gate-puppet-ceph-puppet-beaker-rspec-dsvm-centos7/15509b2/console.html#_2016-02-10_17_43_43_860
Specifically it fails the "catch_changes=>true" part which checks for idempotency. An exit code of 2 in Puppet means successfully applied changes, so this is considered a failure: https://docs.puppetlabs.com/puppet/latest/reference/man/apply.html#OPTIONS
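For reference, puppet apply --detailed-exitcodes uses a small bitmask: 0 means no changes, 2 means changes were applied, 4 means failures, 6 means both. A second run that still returns 2 is exactly what the catch_changes => true idempotency check flags. A sketch interpreting the code (interpret_puppet_exit is a hypothetical helper):

```shell
#!/bin/sh
# Interpret puppet apply --detailed-exitcodes results.
# 2 on a *second* run of the same manifest means it is not idempotent.
interpret_puppet_exit() {
    case $1 in
        0) echo "no changes" ;;
        2) echo "changes applied" ;;
        4) echo "failures" ;;
        6) echo "changes applied and failures" ;;
        *) echo "unexpected exit code: $1" ;;
    esac
}

interpret_puppet_exit 2   # -> changes applied
```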

--- Additional comment from Giulio Fidente on 2016-02-11 12:01:14 EST ---

Loic, David, patchset #19 works for me. I am activating only ${data}1 now and not adding the sleep. Thoughts?

--- Additional comment from Emilien Macchi on 2016-02-11 13:58:04 EST ---

Loic, look at the patch without unless, the puppet run is not idempotent.

--- Additional comment from David Gurtner on 2016-02-11 14:17:49 EST ---

Emilien, the unless is back in. Currently it fails for an unknown reason on the onlyif (this also seems to only happen in the jenkins env).

--- Additional comment from errata-xmlrpc on 2016-02-18 11:43:55 EST ---

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0265.html
Comment 1 Emilien Macchi 2016-02-25 17:12:25 EST

*** This bug has been marked as a duplicate of bug 1304367 ***
Comment 2 kobi ginon 2016-11-22 15:20:27 EST
My 2 cents for this issue.
I have this issue with OSPd 8 and RHEL 7.2.
It seems that osd.pp works fine when the OSD journal is on the same disk as the OSD.

Since osds.pp creates multiple resources which are launched in parallel, and there is a shared disk used as a journal, we run into a synchronization issue.

I have tried all the suggested fixes above, but none worked for me.
Finally I iterated over the hash supplied as the parameter to osds.pp and created each resource within the iteration.
It might not be such a clever idea, and I still need to see whether I need to add parser=future to puppet.conf, but several runs of this solution are working.

Regards
