Bug 1297251

Summary: Overcloud Deploy OSP7 y2 on RHEL 7.2 fails on Ceph Install
Product: Red Hat OpenStack Reporter: Rama <rnishtal>
Component: openstack-puppet-modules    Assignee: Giulio Fidente <gfidente>
Status: CLOSED ERRATA QA Contact: Alexander Chuzhoy <sasha>
Severity: high Docs Contact:
Priority: high    
Version: 7.0 (Kilo)    CC: alan_bishop, arkady_kanevsky, athomas, bengland, bhouser, chhudson, dgurtner, emacchi, gfidente, hbrock, jcoufal, jdonohue, jdurgin, jguiditt, jtaleric, kbader, ldachary, mburns, mcornea, mflusche, morazi, racedoro, rhel-osp-director-maint, rnishtal, sreichar, sweil, twilkins, yeylon
Target Milestone: z4    Keywords: ZStream
Target Release: 7.0 (Kilo)   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: openstack-puppet-modules-2015.1.8-51.el7ost Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1309812 1309926 (view as bug list) Environment:
Last Closed: 2016-02-18 16:43:55 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1191185, 1309812, 1309926    
Attachments:
Description Flags
Ceph Storage node[0] journalctl log none

Description Rama 2016-01-11 01:05:36 UTC
Description of problem:

The overcloud deploy reports that it completed fine, but Ceph does not come up.
Version-Release number of selected component (if applicable):
[root@osp7-director ~]# rpm -qa | grep oscplugin
python-rdomanager-oscplugin-0.0.10-22.el7ost.noarch
[root@osp7-director ~]# uname -a
Linux osp7-director.cisco.com 3.10.0-327.3.1.el7.x86_64 #1 SMP Fri Nov 20 05:40:26 EST 2015 x86_64 x86_64 x86_64 GNU/Linux
[root@osp7-director ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.2 (Maipo)


How reproducible:
Reproducible with a custom ceph.yaml. The ceph configuration includes SSD disks. It is the same ceph.yaml that worked fine with RHEL 7.1 and y1.

Steps to Reproduce:
1. Have ceph.yaml as below:
[root@osp7-director ~]# cat /usr/share/openstack-tripleo-heat-templates/puppet/hieradata/ceph.yaml
ceph::profile::params::osd_journal_size: 20000
ceph::profile::params::osd_pool_default_pg_num: 128
ceph::profile::params::osd_pool_default_pgp_num: 128
ceph::profile::params::osd_pool_default_size: 3
ceph::profile::params::osd_pool_default_min_size: 1
ceph::profile::params::manage_repo: false
ceph::profile::params::authentication_type: cephx
ceph::profile::params::osds:
    '/dev/sdd':
        journal: '/dev/sdb'
    '/dev/sde':
        journal: '/dev/sdb'
    '/dev/sdf':
        journal: '/dev/sdb'
    '/dev/sdg':
        journal: '/dev/sdb'
    '/dev/sdh':
        journal: '/dev/sdc'
    '/dev/sdi':
        journal: '/dev/sdc'
    '/dev/sdj':
        journal: '/dev/sdc'
    '/dev/sdk':
        journal: '/dev/sdc'

ceph_pools:
  - "%{hiera('cinder_rbd_pool_name')}"
  - "%{hiera('nova::compute::rbd::libvirt_images_rbd_pool')}"
  - "%{hiera('glance::backend::rbd::rbd_store_pool')}"


2. Run the overcloud deploy as below:
#!/bin/bash
export HEAT_INCLUDE_PASSWORD=1
openstack overcloud deploy --templates \
-e /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry-puppet.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/templates/network-environment.yaml \
-e /home/stack/templates/storage-environment.yaml \
-e /home/stack/templates/cisco-plugins.yaml \
--control-flavor control --compute-flavor compute --ceph-storage-flavor CephStorage \
--compute-scale 6 --control-scale 3  --ceph-storage-scale 3 \
--libvirt-type kvm \
--ntp-server 171.68.38.66 \
--neutron-network-type vlan \
--neutron-tunnel-type vlan \
--neutron-bridge-mappings datacentre:br-ex,physnet-tenant:br-tenant,floating:br-floating \
--neutron-network-vlan-ranges physnet-tenant:250:749,floating:160:160 \
--neutron-disable-tunneling --timeout 300 \
--verbose --debug --log-file overcloud_new.log

3. The command completes as:
DEBUG: os_cloud_config.utils.clients Creating nova client.
Overcloud Endpoint: http://173.36.215.91:5000/v2.0/
Overcloud Deployed
DEBUG: openstackclient.shell clean_up DeployOvercloud


Actual results:
However, Ceph is not up.
[heat-admin@overcloud-cephstorage-0 ~]$ sudo -i 


ID WEIGHT    TYPE NAME                        UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-1 128.87988 root default                                                       
-2  42.95996     host overcloud-cephstorage-1                                   
 0   5.37000         osd.0                       down        0          1.00000 
 3   5.37000         osd.3                       down        0          1.00000 
 6   5.37000         osd.6                       down        0          1.00000 
 9   5.37000         osd.9                       down        0          1.00000 
12   5.37000         osd.12                      down        0          1.00000 
15   5.37000         osd.15                      down        0          1.00000 
18   5.37000         osd.18                      down        0          1.00000 
21   5.37000         osd.21                      down        0          1.00000 
-3  42.95996     host overcloud-cephstorage-2                                   
 1   5.37000         osd.1                       down        0          1.00000 
 4   5.37000         osd.4                       down        0          1.00000 
 7   5.37000         osd.7                       down        0          1.00000 
10   5.37000         osd.10                      down        0          1.00000 
13   5.37000         osd.13                      down        0          1.00000 
16   5.37000         osd.16                      down        0          1.00000 
19   5.37000         osd.19                      down        0          1.00000 
22   5.37000         osd.22                      down        0          1.00000 
-4  42.95996     host overcloud-cephstorage-0                                   
 2   5.37000         osd.2                       down        0          1.00000 
 5   5.37000         osd.5                       down        0          1.00000 
 8   5.37000         osd.8                       down        0          1.00000 
11   5.37000         osd.11                      down        0          1.00000 
14   5.37000         osd.14                      down        0          1.00000 
17   5.37000         osd.17                      down        0          1.00000 
20   5.37000         osd.20                      down        0          1.00000 
23   5.37000         osd.23                      down        0          1.00000 

Expected results:

The OSDs should be up and running.
Additional info:

The Ceph logs have:

2016-01-10 14:05:16.996011 7f25093d67c0 -1 ^[[0;31m ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-0: (2) No such file or directory^[[0m
2016-01-10 14:05:17.799063 7f964a2397c0  0 ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff), process ceph-osd, pid 2460
2016-01-10 14:05:17.805586 7f964a2397c0  1 journal _open /dev/sdb4 fd 4: 5368709120 bytes, block size 4096 bytes, directio = 0, aio = 0
2016-01-10 14:05:20.995530 7fb805ba87c0  0 ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff), process ceph-osd, pid 3290
2016-01-10 14:05:20.995823 7fb805ba87c0 -1 ^[[0;31m ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-0: (2) No such file or directory^[[0m

The partitions have been initialized with wipe.yaml and also manually as below:
=============================================================================
service ceph -a stop
umount -a
for i in `cat /proc/partitions | egrep "5767168000|36700" | awk '{print $4}'`
do
dd if=/dev/zero of=/dev/$i bs=4M count=100
parted --align optimal -s /dev/$i mklabel gpt
done


Output of cat /proc/partitions:
[root@overcloud-cephstorage-0 ~]# cat /proc/partitions 
major minor  #blocks  name

   8       16  367001600 sdb
   8       17    5242880 sdb1
   8       18    5242880 sdb2
   8       19    5242880 sdb3
   8       20    5242880 sdb4
   8        0  419430400 sda
   8        1       1024 sda1
   8        2  419422972 sda2
   8       32  367001600 sdc
   8       33    5242880 sdc1
   8       34    5242880 sdc2
   8       35    5242880 sdc3
   8       36    5242880 sdc4
   8       48 5767168000 sdd
   8       49 5767166956 sdd1
   8       80 5767168000 sdf
   8       81 5767166956 sdf1
   8       64 5767168000 sde
   8       65 5767166956 sde1
   8       96 5767168000 sdg
   8       97 5767166956 sdg1
   8      112 5767168000 sdh
   8      113 5767166956 sdh1
   8      128 5767168000 sdi
   8      129 5767166956 sdi1
   8      144 5767168000 sdj
   8      145 5767166956 sdj1
   8      160 5767168000 sdk
   8      161 5767166956 sdk1

Note the partition sizes of sdb and sdc (the journal disks): they are only 5G, not the 20G requested in ceph.yaml. It is unclear where the 5G partition size came from.
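For what it's worth, the 5368709120 bytes reported in the journal _open line above is exactly 5 GiB, which matches the Ceph default osd_journal_size (5120 MB) rather than the 20000 MB requested in ceph.yaml. A hedged way to compare the configured value with what each OSD actually uses, once the daemons are running (the OSD ids below are examples taken from ceph osd tree for this node):

# print the journal partition table on one of the SSDs
sgdisk -p /dev/sdb
# value ceph-osd resolves from ceph.conf / compiled-in defaults
ceph-osd --cluster=ceph --show-config-value=osd_journal_size
# value actually in use by each running OSD daemon
for i in 2 5 8 11 14 17 20 23; do
  ceph daemon osd.$i config get osd_journal_size
done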

Reboot the storage servers one after the other - 
==============================================
Now ceph comes up.

[root@overcloud-cephstorage-0 ~]# ceph -s 
    cluster a406713a-b7e2-11e5-84b2-0025b522225f
     health HEALTH_OK
     monmap e1: 3 mons at {overcloud-controller-0=10.22.120.54:6789/0,overcloud-controller-1=10.22.120.51:6789/0,overcloud-controller-2=10.22.120.52:6789/0}
            election epoch 6, quorum 0,1,2 overcloud-controller-1,overcloud-controller-2,overcloud-controller-0
     osdmap e68: 24 osds: 24 up, 24 in
      pgmap v92: 256 pgs, 4 pools, 0 bytes data, 0 objects
            836 MB used, 128 TB / 128 TB avail
                 256 active+clean

Comment 2 Rama 2016-01-11 01:11:15 UTC
Created attachment 1113466 [details]
Ceph Storage node[0] journalctl log

Comment 3 Rama 2016-01-12 19:28:26 UTC
This problem is consistently reproducible. I repeated the install and hit the same problem.
The overcloud deploy reported as completed:
[2016-01-12 08:25:41,375] DEBUG    cliff.commandmanager found command 'hypervisor_stats_show
[2016-01-12 10:40:44,552] os_cloud_config.utils.clients Creating nova client

It took around 2 hours and 15 minutes.


Before rebooting the storage nodes
==================================
[root@overcloud-cephstorage-0 ~]# ceph osd tree  | grep down | wc -l 
24
[root@overcloud-cephstorage-0 ~]# ceph -s
    cluster 213cf9f0-b949-11e5-b2d6-0025b522225f
     health HEALTH_WARN
            256 pgs stuck inactive
            256 pgs stuck unclean
     monmap e2: 3 mons at {overcloud-controller-0=10.22.120.52:6789/0,overcloud-controller-1=10.22.120.51:6789/0,overcloud-controller-2=10.22.120.54:6789/0}
            election epoch 8, quorum 0,1,2 overcloud-controller-1,overcloud-controller-0,overcloud-controller-2
     osdmap e52: 24 osds: 0 up, 0 in
      pgmap v53: 256 pgs, 4 pools, 0 bytes data, 0 objects
            0 kB used, 0 kB / 0 kB avail
                 256 creating

After rebooting the first node.
[heat-admin@overcloud-cephstorage-0 ~]$ ceph -s
2016-01-12 14:10:42.254246 7f8808635700 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
2016-01-12 14:10:42.254249 7f8808635700  0 librados: client.admin initialization error (2) No such file or directory


Then start the monitors on all three controllers.
[root@overcloud-cephstorage-2 ~]# ceph -s 
    cluster 213cf9f0-b949-11e5-b2d6-0025b522225f
     health HEALTH_WARN
            1 mons down, quorum 0,1 overcloud-controller-1,overcloud-controller-0
     monmap e2: 3 mons at {overcloud-controller-0=10.22.120.52:6789/0,overcloud-controller-1=10.22.120.51:6789/0,overcloud-controller-2=10.22.120.54:6789/0}
            election epoch 18, quorum 0,1 overcloud-controller-1,overcloud-controller-0
     osdmap e84: 24 osds: 24 up, 24 in
      pgmap v123: 256 pgs, 4 pools, 0 bytes data, 0 objects
            850 MB used, 128 TB / 128 TB avail

Comment 4 Rama 2016-01-12 19:52:19 UTC
After restarting the monitors post-reboot, the health is reported as OK too.
[root@overcloud-controller-0 ~]# ceph -s 
    cluster 213cf9f0-b949-11e5-b2d6-0025b522225f
     health HEALTH_OK
     monmap e2: 3 mons at {overcloud-controller-0=10.22.120.52:6789/0,overcloud-controller-1=10.22.120.51:6789/0,overcloud-controller-2=10.22.120.54:6789/0}
            election epoch 30, quorum 0,1,2 overcloud-controller-1,overcloud-controller-0,overcloud-controller-2
     osdmap e84: 24 osds: 24 up, 24 in
      pgmap v132: 256 pgs, 4 pools, 0 bytes data, 0 objects
            846 MB used, 128 TB / 128 TB avail
                 256 active+clean

Comment 5 Giulio Fidente 2016-01-15 15:56:55 UTC
hi Rama, thanks for the report

I understand it is reproducible and I will try to reproduce it; meantime, when in comment #3 you write:

[heat-admin@overcloud-cephstorage-0 ~]$ ceph -s
2016-01-12 14:10:42.254246 7f8808635700 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
2016-01-12 14:10:42.254249 7f8808635700  0 librados: client.admin initialization error (2) No such file or directory

I think that is normal when the command is run as non-root; only root can read the client.admin keyring.

It would indeed be useful to know if that command works as expected after the initial deployment (before rebooting the storage nodes) when run as root.

I will also check the journal logs meantime, thanks.

Comment 6 Rama 2016-01-15 17:07:10 UTC
Hi Giulio,
I agree about the mistake of running ceph -s as a non-root user.
But in the same comment 3 I did capture the output as root, where the OSDs are all reported as down before the reboot. The command does work.
It isn't clear whether it is activate, blkid, or something else that fixes these issues after the reboot.

Comment 7 Giulio Fidente 2016-01-15 18:26:15 UTC
Understood. I was unable to reproduce this with the standard settings though; could you please also check if there are OSD processes running on the storage nodes before they are rebooted?

Comment 8 Steve Reichard 2016-01-17 00:59:27 UTC
I was able to reproduce this in an internal cluster (non-UCS).

I will try to reproduce again and check the OSD processes before rebooting.

Comment 9 Rama 2016-01-18 09:09:19 UTC
Here is more information, which suggests that ceph-disk activate isn't happening.
Without Reboot
=============
Both ceph -s and ceph osd tree report the OSDs as down.

Cannot start a single OSD manually
[root@overcloud-cephstorage-0 ~]# /etc/init.d/ceph restart osd.0
/etc/init.d/ceph: osd.0 not found (/etc/ceph/ceph.conf defines , /var/lib/ceph defines )

[root@overcloud-cephstorage-0 ~]# cd  /var/lib/ceph/osd/ceph-0
[root@overcloud-cephstorage-0 ceph-0]# ls
[root@overcloud-cephstorage-0 ceph-0]# ls -l 
total 0

There is nothing in any of the OSD directories.

Manually activate one OSD
=========================
[root@overcloud-cephstorage-0 ceph]# /usr/sbin/ceph-disk -v activate /dev/sdd1 
INFO:ceph-disk:Running command: /sbin/blkid -p -s TYPE -ovalue -- /dev/sdd1
INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
DEBUG:ceph-disk:Mounting /dev/sdd1 on /var/lib/ceph/tmp/mnt.lW9aIY with options noatime,inode64
INFO:ceph-disk:Running command: /usr/bin/mount -t xfs -o noatime,inode64 -- /dev/sdd1 /var/lib/ceph/tmp/mnt.lW9aIY
DEBUG:ceph-disk:Cluster uuid is 19f4189a-bdae-11e5-8937-0025b522225f
INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
DEBUG:ceph-disk:Cluster name is ceph
DEBUG:ceph-disk:OSD uuid is 93ccf964-6282-49fe-acbc-5dfa1d3f9ec7
DEBUG:ceph-disk:OSD id is 9
INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup init
DEBUG:ceph-disk:Marking with init system sysvinit
DEBUG:ceph-disk:ceph osd.9 data dir is ready at /var/lib/ceph/tmp/mnt.lW9aIY
DEBUG:ceph-disk:Moving mount to final location...
INFO:ceph-disk:Running command: /bin/mount -o noatime,inode64 -- /dev/sdd1 /var/lib/ceph/osd/ceph-9
INFO:ceph-disk:Running command: /bin/umount -l -- /var/lib/ceph/tmp/mnt.lW9aIY
DEBUG:ceph-disk:Starting ceph osd.9...
INFO:ceph-disk:Running command: /usr/sbin/service ceph --cluster ceph start osd.9
=== osd.9 === 
create-or-move updated item name 'osd.9' weight 5.37 at location {host=overcloud-cephstorage-0,root=default} to crush map
Starting Ceph osd.9 on overcloud-cephstorage-0...
Running as unit run-15939.service.
[root@overcloud-cephstorage-0 ceph]# ps -ef | grep ceph 
root     15941     1  0 04:00 ?        00:00:00 /bin/bash -c ulimit -n 32768; /usr/bin/ceph-osd -i 9 --pid-file /var/run/ceph/osd.9.pid -c /etc/ceph/ce
ph.conf --cluster ceph -f
root     15944 15941  2 04:00 ?        00:00:00 /usr/bin/ceph-osd -i 9 --pid-file /var/run/ceph/osd.9.pid -c /etc/ceph/ceph.conf --cluster ceph -f
root     16069 14392  0 04:00 pts/0    00:00:00 grep --color=auto ceph

The ceph-osd process is now started.

Run the following on each node
==============================
for i in sdd1 sde1 sdf1 sdg1 sdh1 sdi1 sdj1 sdk1; do 
/usr/sbin/ceph-disk -v activate /dev/$i; 
done 
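The same loop can also be driven from the undercloud over SSH rather than logging in to each storage node; a hedged sketch only, with the node addresses left as placeholders to replace with the ctlplane IPs:

for node in <cephstorage-0-ip> <cephstorage-1-ip> <cephstorage-2-ip>; do
  ssh heat-admin@$node 'for i in sdd1 sde1 sdf1 sdg1 sdh1 sdi1 sdj1 sdk1; do
    sudo /usr/sbin/ceph-disk -v activate /dev/$i
  done'
done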

[root@overcloud-cephstorage-2 ~]# ceph -s 
    cluster 19f4189a-bdae-11e5-8937-0025b522225f
     health HEALTH_OK
     monmap e1: 3 mons at {overcloud-controller-0=10.22.120.52:6789/0,overcloud-controller-1=10.22.120.51:6789/0,overcloud-controller-2=10.22.120.55:6789/0}
            election epoch 6, quorum 0,1,2 overcloud-controller-1,overcloud-controller-0,overcloud-controller-2
     osdmap e71: 24 osds: 24 up, 24 in
      pgmap v88: 256 pgs, 4 pools, 0 bytes data, 0 objects
            861 MB used, 128 TB / 128 TB avail
                 256 active+clean

Comment 10 Rama 2016-01-18 09:12:10 UTC
[root@overcloud-controller-0 ceph]# grep -i journal /etc/ceph/ceph.conf 
[root@overcloud-controller-0 ceph]# 

Also, there isn't any journal size set in ceph.conf, so the default of 5G still applies.

Comment 11 Giulio Fidente 2016-01-18 17:12:02 UTC
hi,

I've just reproduced this as well. The default configuration works fine, but this is hit when the osds hiera is changed to make Puppet prepare a local disk as an OSD; I suspect there could be an issue with the puppet-ceph module and I'm currently investigating it.

Comment 12 Steve Reichard 2016-01-18 22:59:38 UTC
Giulio,

I reproduced twice more, once without the increased journal size.

Giulio, if you'd like access to my config to work the issue, let me know and I'll send creds.


It is 3 controllers, 2 computes (only 1 being used due to a race BZ), and 3 Ceph OSD nodes (rx720xds with 12 HDD & 3 SSD).

Comment 13 David Gurtner 2016-01-21 07:10:08 UTC
When using entire disks for the OSDs instead of directories, the puppet-ceph module expects activation to happen via udev. The relevant snippet is:

https://github.com/openstack/puppet-ceph/blob/master/manifests/osd.pp#L89-L96
set -ex
if ! test -b ${data} ; then
  mkdir -p ${data}
fi
# activate happens via udev when using the entire device
if ! test -b ${data} || ! test -b ${data}1 ; then
  ceph-disk activate ${data} || true
fi

This would also explain the behavior with the OSDs coming up after a reboot.

I would suggest looking into why udev doesn't activate the OSDs anymore, i.e. what change in the full image could trigger this behavior.

Maybe a case for the Ceph devs?
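As a hedged starting point for that investigation, one can check which Ceph udev rules are present on a storage node and replay the block "add" events by hand to see whether the rules do their job at all (the exact rule file names are an assumption based on the usual ceph packaging):

# list the ceph-related udev rules shipped on the node
ls /lib/udev/rules.d/ | grep -i ceph
# replay "add" events for block devices and wait for udev to finish;
# if the rules work, they should call ceph-disk activate on the OSD partitions
udevadm trigger --action=add --subsystem-match=block
udevadm settle
ceph osd tree    # check whether the OSDs came up afterwards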

Comment 14 Hugh Brock 2016-01-22 12:50:48 UTC
Given we haven't changed this code between 7.1 and 7.2, is it possible this is a RHEL 7.2 problem, or a ceph-vs.-rhel-7.2 problem? Adding needinfo on jdurgin to see.

Comment 16 Rama 2016-01-22 14:31:23 UTC
Here is what I did to work around the problem for now, to move ahead.
(1) Update wipe-disk for NodeUserData and redeploy overcloud
        { for disk in sdb sdc sdd sde sdf sdg sdh sdi sdj sdk
        do
           sgdisk -Z /dev/$disk
           sgdisk -g /dev/$disk
        done } > /root/wipe-disk.txt
        { for disk in sdb sdc
        do
          ptype1=45b0969e-9b03-4f30-b4c6-b4b80ceff106
          sgdisk --new=1:0:+19080MiB  --change-name="1:ceph journal"  --typecode="1:$ptype1" /dev/$disk
          sgdisk --new=2:19082MiB:+19080MiB  --change-name="2:ceph journal"  --typecode="2:$ptype1" /dev/$disk
          sgdisk --new=3:38163MiB:+19080MiB  --change-name="3:ceph journal"  --typecode="3:$ptype1" /dev/$disk
          sgdisk --new=4:57244MiB:+19080MiB  --change-name="4:ceph journal"  --typecode="4:$ptype1" /dev/$disk
        done } >> /root/wipe-disk.txt
(2) Update ceph.conf in all 3 controllers and 3 storage nodes.
(3) Run ceph-disk activate /dev/$i for each data partition
Check ceph -s and ceph daemon osd.${i} config get osd_journal_size
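Spelling out step (3) and the check as a minimal hedged sketch for one storage node (device names and the example OSD id come from the outputs earlier in this bug; adjust per node):

for i in sdd1 sde1 sdf1 sdg1 sdh1 sdi1 sdj1 sdk1; do
  ceph-disk activate /dev/$i
done
ceph -s                                          # expect HEALTH_OK with 24 osds up/in
ceph daemon osd.2 config get osd_journal_size    # example OSD id; expect the size set in step (2)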

Comment 17 Josh Durgin 2016-02-03 19:06:18 UTC
Yes, this is a ceph-disk issue - similar to https://bugzilla.redhat.com/show_bug.cgi?id=1300617.

Working around it is necessary for now, until ceph-disk fixes are backported: http://www.spinics.net/lists/ceph-devel/msg28384.html

Comment 18 Joe Donohue 2016-02-03 22:04:57 UTC
Hi Rama,

Is it OK if I open this bug to Dell, as they are exposed to the same issue?

Thanks,
Joe

Comment 19 Giulio Fidente 2016-02-04 10:47:15 UTC
Thanks David and Josh, I've a tentative change here:
https://review.openstack.org/#/c/276141

Comment 21 Giulio Fidente 2016-02-04 16:41:40 UTC
Waiting for udev to settle after disk prepare didn't help.

Josh, do you have better ideas on if and which workaround we could put in place?

Comment 22 Giulio Fidente 2016-02-04 17:08:23 UTC
If the workaround is to remove the ceph udev rules entirely, maybe this should be addressed in the package?

Comment 23 Joe Donohue 2016-02-08 14:27:39 UTC
Opening this Bugzilla to Dell with Cisco's permission. Please take care to avoid exposing sensitive information in comments.

Comment 24 Loic Dachary 2016-02-08 16:55:37 UTC
@Giulio I'm not sure what you mean by "addressed in the package"?

Comment 25 Loic Dachary 2016-02-08 17:15:16 UTC
Removing the udev files in the context of puppet-ceph makes sense, as it implements one specific deployment strategy. Removing them from the packages means ruling out all deployment strategies based on udev rules, which is too wide in scope, IMHO.

Comment 26 Giulio Fidente 2016-02-08 17:19:22 UTC
David, based on the previous comments, do you think we should go ahead and update the existing submission [1] so that it deletes the udev rules?

1. https://review.openstack.org/276141

Comment 27 arkady kanevsky 2016-02-08 17:21:00 UTC
Looks like a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1298620

Comment 28 Giulio Fidente 2016-02-08 17:25:02 UTC
*** Bug 1298620 has been marked as a duplicate of this bug. ***

Comment 29 Sage Weil 2016-02-09 18:39:54 UTC
Removing the udev rules is ugly and annoying, but it is unlikely to break anything unless we make a large change here--and the only reason it's necessary is because we'd prefer not to make disruptive backports.  So I think it's a safe, easy path here.

When RHCS 2.0 comes around it will be different... but I'm guessing/hoping director will change at the same time?

Comment 30 Giulio Fidente 2016-02-10 13:10:15 UTC
As of patchset #14 of the submission we're implementing a process where the udev rules are disabled before prepare and all the block devices are later activated with 'ceph-disk activate-all'.

Unfortunately this still doesn't seem to work: activation completes without actually activating any Ceph disk. Launching it manually after the deployment activates the disks as expected. We're probably hitting the same timing issue that causes the udev rules to fail.
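For context, the process described above boils down to roughly the following shell steps; this is only a hedged sketch of the idea, not the actual puppet-ceph change, and the udev rule file name is an assumption:

# 1. disable the Ceph udev activation rule before preparing the disks
mv /usr/lib/udev/rules.d/95-ceph-osd.rules /root/ 2>/dev/null || true
udevadm control --reload

# 2. prepare each OSD data disk with its journal device
ceph-disk prepare /dev/sdd /dev/sdb

# 3. once udev has processed the new partitions, activate everything in one go
udevadm settle
ceph-disk --verbose activate-all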

Comment 31 Loic Dachary 2016-02-10 15:30:44 UTC
@Giulio could you collect information about why ceph-disk activate-all fails? With ceph-disk --verbose activate-all you should get an error message of some kind.

Comment 32 Giulio Fidente 2016-02-10 15:55:43 UTC
hi Loic, unfortunately it just does not seem to detect any new partitions; see the output from patchset #14:

Exec[ceph-osd-activate-all-/dev/sdb]/returns: + ceph-disk --verbose activate-all
Exec[ceph-osd-activate-all-/dev/sdb]/returns: DEBUG:ceph-disk:Scanning /dev/disk/by-parttypeuuid
Exec[ceph-osd-activate-all-/dev/sdb]/returns: executed successfully

If I add a 'sleep 4s' before 'activate-all' (as per patchset #16) then it works, so we should have a workaround now, but it relies on a sleep.

Can you point to an event we could use to figure out when it is safe to run activate?

Comment 33 Loic Dachary 2016-02-10 16:40:04 UTC
@Giulio is /dev/disk/by-partuuid also empty?

Comment 34 Loic Dachary 2016-02-10 16:49:01 UTC
@Giulio it means that 

 * partprobe /dev/sdb
 * udevadm settle

(which you do in the puppet module) can return before the udev rules that populate /dev/disk/by-*uuid are finished. This is news to me and I'll need to figure out how that can happen.
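Given that behaviour, one hedged alternative to a fixed sleep would be to poll for the by-parttypeuuid entries with a bounded timeout before running activate-all (purely a sketch, not what the review currently does):

# wait up to 30 seconds for udev to (re)create the parttypeuuid symlinks
for n in $(seq 1 30); do
  udevadm settle
  [ -d /dev/disk/by-parttypeuuid ] && ls /dev/disk/by-parttypeuuid | grep -q . && break
  sleep 1
done
ceph-disk --verbose activate-all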

Comment 35 Giulio Fidente 2016-02-10 16:59:00 UTC
hi Loic, yes it is empty during the puppet run; if I try to list its contents in the 'activate-all' puppet exec it fails because the directory does not exist yet:

set -ex
ls -la /dev/disk/by-parttypeuuid/
ceph-disk --verbose activate-all
 returned 2 instead of one of [0]

If I run it afterwards (again, after a sleep of a few seconds) the directory shows up with the appropriate contents:

# ls -l /dev/disk/by-parttypeuuid/
total 0
lrwxrwxrwx. 1 root root 10 Feb 10 11:56 45b0969e-9b03-4f30-b4c6-b4b80ceff106.fffbebd9-bbdf-4a6e-9fdc-943c6382113d -> ../../sdb2
lrwxrwxrwx. 1 root root 10 Feb 10 11:56 4fbd7e29-9d25-41b8-afd0-062c0ceff05d.107f1b4e-ec68-4c43-995c-e910565b91b6 -> ../../sdb1

Comment 36 Loic Dachary 2016-02-10 17:49:12 UTC
@Giulio what about /dev/disk/by-partuuid ? (note the different name). Do you have 60-ceph-partuuid-workaround.rules installed ? If not which file /lib/udev/rules.d contains the string by-parttypeuuid ?

Comment 37 Loic Dachary 2016-02-10 18:46:26 UTC
Here is a theory: The puppet module does ceph-disk list after prepare, which relies on https://github.com/ceph/ceph/blob/hammer/src/ceph-disk#L2578 which calls sgdisk -i https://github.com/ceph/ceph/blob/hammer/src/ceph-disk#L2444

On a machine with udev > 214 (on your machine you have 219 according to udevadm --version) that will do the equivalent of a partprobe, removing the devices and adding them again, which explains why the /dev/disk/by-parttypeuuid/ directory is not populated.

There are more details about that problem in the hammer backport which is available at https://github.com/ceph/ceph/commit/88ffcc2cbd73602819ad653695de7e8718f6707d

It is possible that this problem is also the cause of the original race you're having. ceph-disk list immediately after ceph-disk prepare would race with the ceph-disk activate run indirectly by the udev rules.
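A hedged way to observe the churn this theory predicts, on one of the storage nodes (sgdisk -i 1 /dev/sdb just mimics the query ceph-disk list performs; remove/add events for the sdb partitions in the monitor output would support the theory):

udevadm monitor --kernel --udev --subsystem-match=block &
mon=$!
sgdisk -i 1 /dev/sdb     # read partition 1 info, as ceph-disk list does internally
udevadm settle
kill $mon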

Comment 38 Giulio Fidente 2016-02-10 19:49:55 UTC
David, Loic, patchset #18 works for me; I can successfully deploy passing an entire unpartitioned disk to ::osd. Please let me know what you think.

https://review.openstack.org/#/c/276141/18

Comment 39 Loic Dachary 2016-02-11 03:54:26 UTC
@Giulio I don't see what has changed?

Comment 40 Giulio Fidente 2016-02-11 11:44:29 UTC
Loic, there is a sleep 3s before activate-all

Comment 41 Loic Dachary 2016-02-11 11:59:26 UTC
@Giulio this is very fragile. It would be better to either not run ceph-disk list before activate, so as not to run into http://tracker.ceph.com/issues/14080, or to apply https://github.com/ceph/ceph/pull/7475, which is the hammer backport fixing it.

Comment 42 Giulio Fidente 2016-02-11 12:50:43 UTC
hi Loic thanks for helping.

So the module used to do a 'list' in both the prepare and the activate puppet "unless" clauses, which means it gets executed *before* prepare and then again *before* activate (which follows prepare).

This isn't related to the current change, but if useful/related maybe we can remove it and rely only on the ls command? See the OR conditions at [1] and [2].

Alternatively, is there any chance of getting the ceph-disk list fixes into a Hammer package which we could include in the images?

And also, to make sure I understand it correctly, either the 'list' fix or the removal of the 'list' command will allow us to remove the sleep but won't change the fact we'll need to remove the udev rules and use activate-all as proposed in the current change, is this correct?

1. https://github.com/openstack/puppet-ceph/blob/master/manifests/osd.pp#L79
2. https://github.com/openstack/puppet-ceph/blob/master/manifests/osd.pp#L100

Comment 44 Giulio Fidente 2016-02-11 14:45:24 UTC
Loic, I tried to apply your 'list' patch to ceph-disk manually and it makes activate work without the sleep, as you suggested.

It's still necessary to remove the udev rules though.

So I think you have to vote on one of:

1. keep a sleep in the workaround
2. build a version of ceph-osd for hammer which includes your patch for ceph-disk

Unless there are better short-term alternatives?

Comment 45 Loic Dachary 2016-02-11 15:19:45 UTC
If applying the patch is not an option for some reason, another way to deal with the problem is to avoid using ceph-disk list between ceph-disk prepare and ceph-disk activate.

Comment 46 Loic Dachary 2016-02-11 15:28:56 UTC
I suggest you remove the "unless" that is at

https://review.openstack.org/#/c/276141/18/manifests/osd.pp

line 123 so that ceph-disk list is not run. ceph-disk activate is idempotent; there is no harm in running it more than once. And with that I think you can also remove the sleep before the activate-all that follows.

Comment 47 Giulio Fidente 2016-02-11 15:58:25 UTC
Loic, understood. That line was there before my change though so we need to understand with David if it's viable. I think it will cause the activate to run when unnecessary but if it's idempotent ... maybe we can do that.

Comment 48 David Gurtner 2016-02-11 16:35:01 UTC
The unless is needed. This is not about idempotency of the underlying change; it is about the idempotency of Puppet. Without the unless, Puppet will always run the exec, and consequently this is not idempotent from a Puppet point of view, which is why this test is failing:
http://logs.openstack.org/41/276141/18/check/gate-puppet-ceph-puppet-beaker-rspec-dsvm-centos7/15509b2/console.html#_2016-02-10_17_43_43_860
Specifically it fails the "catch_changes=>true" part which checks for idempotency. An exit code of 2 in Puppet means successfully applied changes, so this is considered a failure: https://docs.puppetlabs.com/puppet/latest/reference/man/apply.html#OPTIONS
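To illustrate the point in shell terms, an exec guarded by "unless" behaves roughly like the hedged sketch below: the activate only runs, and only reports a change, when the guard finds work to do, which is what keeps a second puppet run change-free (the mount check is just an example guard, not the one osd.pp actually uses):

data=/dev/sdd    # example device
# guard: skip activation if the data partition is already mounted as an OSD
if ! grep -q "^${data}1 " /proc/mounts; then
  ceph-disk activate "${data}1"
fi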

Comment 49 Giulio Fidente 2016-02-11 17:01:14 UTC
Loic, David, patchset #19 works for me. I am activating only ${data}1 now and not adding the sleep. Thoughts?

Comment 51 Emilien Macchi 2016-02-11 18:58:04 UTC
Loic, look at the patch: without the unless, the puppet run is not idempotent.

Comment 52 David Gurtner 2016-02-11 19:17:49 UTC
Emilien, the unless is back in. Currently it fails for an unknown reason on the onlyif (this seems to only happen in the Jenkins env).

Comment 56 errata-xmlrpc 2016-02-18 16:43:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0265.html

Comment 57 Alan Bishop 2016-02-18 18:01:17 UTC
I see nothing in the errata that addresses this particular problem. The errata only pertains to an IPv6 issue, and not the OSDs being down.