Bug 1297251
Summary:            Overcloud deploy of OSP 7 y2 on RHEL 7.2 fails on Ceph install
Product:            Red Hat OpenStack
Component:          openstack-puppet-modules
Version:            7.0 (Kilo)
Status:             CLOSED ERRATA
Severity:           high
Priority:           high
Target Milestone:   z4
Target Release:     7.0 (Kilo)
Hardware:           x86_64
OS:                 Linux
Keywords:           ZStream
Fixed In Version:   openstack-puppet-modules-2015.1.8-51.el7ost
Doc Type:           Bug Fix
Reporter:           Rama <rnishtal>
Assignee:           Giulio Fidente <gfidente>
QA Contact:         Alexander Chuzhoy <sasha>
CC:                 alan_bishop, arkady_kanevsky, athomas, bengland, bhouser, chhudson, dgurtner, emacchi, gfidente, hbrock, jcoufal, jdonohue, jdurgin, jguiditt, jtaleric, kbader, ldachary, mburns, mcornea, mflusche, morazi, racedoro, rhel-osp-director-maint, rnishtal, sreichar, sweil, twilkins, yeylon
Clones:             1309812, 1309926 (view as bug list)
Bug Blocks:         1191185, 1309812, 1309926
Type:               Bug
Last Closed:        2016-02-18 16:43:55 UTC

Description (Rama, 2016-01-11 01:05:36 UTC)
Created attachment 1113466
Ceph Storage node[0] journalctl log
This problem is consistently reproducible. Repeated the install and it is the same problem. Overcloud deploy said completed:

[2016-01-12 08:25:41,375] DEBUG cliff.commandmanager found command 'hypervisor_stats_show
[2016-01-12 10:40:44,552] os_cloud_config.utils.clients Creating nova client

It took around 2 hrs 15 mins.

Before rebooting the storage nodes
==================================

[root@overcloud-cephstorage-0 ~]# ceph osd tree | grep down | wc -l
24
[root@overcloud-cephstorage-0 ~]# ceph -s
    cluster 213cf9f0-b949-11e5-b2d6-0025b522225f
     health HEALTH_WARN
            256 pgs stuck inactive
            256 pgs stuck unclean
     monmap e2: 3 mons at {overcloud-controller-0=10.22.120.52:6789/0,overcloud-controller-1=10.22.120.51:6789/0,overcloud-controller-2=10.22.120.54:6789/0}
            election epoch 8, quorum 0,1,2 overcloud-controller-1,overcloud-controller-0,overcloud-controller-2
     osdmap e52: 24 osds: 0 up, 0 in
      pgmap v53: 256 pgs, 4 pools, 0 bytes data, 0 objects
            0 kB used, 0 kB / 0 kB avail
                 256 creating

After rebooting the first node:

[heat-admin@overcloud-cephstorage-0 ~]$ ceph -s
2016-01-12 14:10:42.254246 7f8808635700 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
2016-01-12 14:10:42.254249 7f8808635700  0 librados: client.admin initialization error (2) No such file or directory

Then start the monitors running on all 3 controllers:

[root@overcloud-cephstorage-2 ~]# ceph -s
    cluster 213cf9f0-b949-11e5-b2d6-0025b522225f
     health HEALTH_WARN
            1 mons down, quorum 0,1 overcloud-controller-1,overcloud-controller-0
     monmap e2: 3 mons at {overcloud-controller-0=10.22.120.52:6789/0,overcloud-controller-1=10.22.120.51:6789/0,overcloud-controller-2=10.22.120.54:6789/0}
            election epoch 18, quorum 0,1 overcloud-controller-1,overcloud-controller-0
     osdmap e84: 24 osds: 24 up, 24 in
      pgmap v123: 256 pgs, 4 pools, 0 bytes data, 0 objects
            850 MB used, 128 TB / 128 TB avail

After restarting the monitors, post reboot, we find the health is OK too:

[root@overcloud-controller-0 ~]# ceph -s
    cluster 213cf9f0-b949-11e5-b2d6-0025b522225f
     health HEALTH_OK
     monmap e2: 3 mons at {overcloud-controller-0=10.22.120.52:6789/0,overcloud-controller-1=10.22.120.51:6789/0,overcloud-controller-2=10.22.120.54:6789/0}
            election epoch 30, quorum 0,1,2 overcloud-controller-1,overcloud-controller-0,overcloud-controller-2
     osdmap e84: 24 osds: 24 up, 24 in
      pgmap v132: 256 pgs, 4 pools, 0 bytes data, 0 objects
            846 MB used, 128 TB / 128 TB avail
                 256 active+clean

Hi Rama, thanks for the report. I understand it is reproducible and will try to do so. Meantime, when in comment #3 you write:

[heat-admin@overcloud-cephstorage-0 ~]$ ceph -s
2016-01-12 14:10:42.254246 7f8808635700 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
2016-01-12 14:10:42.254249 7f8808635700  0 librados: client.admin initialization error (2) No such file or directory

I think that is normal when given as non-root; only root can read the client.admin keyring. It would indeed be useful to know if that command works as expected after the initial deployment (before rebooting the storage nodes) when given as root. I will also check the journal logs meantime, thanks.

Hi Giulio, I agree about the typo of running ceph -s as a non-root user. But in the same comment 3, I did capture output as root, wherein the OSDs are all reported as down before reboot. The command does work. It isn't clear whether activate, blkid or something else is fixing these issues after reboot.
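One quick way to tell OSDs that were prepared but never activated apart from ones that are actually running is a check along these lines, run as root on a storage node (just a sketch; the commands are generic and nothing here is taken from this particular environment):

# An activated OSD has its data partition mounted under /var/lib/ceph/osd, a
# whoami file in its data dir, and a running ceph-osd process; a prepared but
# never-activated one shows up as "prepared" in ceph-disk list and its
# /var/lib/ceph/osd/ceph-N directory stays empty.
ceph-disk list
mount | grep /var/lib/ceph/osd
ls /var/lib/ceph/osd/ceph-*/whoami 2>/dev/null
ps -C ceph-osd -o pid,args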
Understood. I was unable to reproduce this with the standard settings though; could you please also check whether there are OSD processes running on the storage nodes before they are rebooted?

I was able to reproduce this in an internal cluster (non-UCS).

I will try to reproduce again and capture the OSD processes before rebooting. Here is more information, which suggests that ceph-disk activate isn't happening.

Without reboot
==============

Both ceph -s and ceph osd tree report the OSDs as down. Cannot start a single OSD manually:

[root@overcloud-cephstorage-0 ~]# /etc/init.d/ceph restart osd.0
/etc/init.d/ceph: osd.0 not found (/etc/ceph/ceph.conf defines , /var/lib/ceph defines )
[root@overcloud-cephstorage-0 ~]# cd /var/lib/ceph/osd/ceph-0
[root@overcloud-cephstorage-0 ceph-0]# ls
[root@overcloud-cephstorage-0 ceph-0]# ls -l
total 0

Nothing in each ceph directory.

Manually activate one OSD
=========================

[root@overcloud-cephstorage-0 ceph]# /usr/sbin/ceph-disk -v activate /dev/sdd1
INFO:ceph-disk:Running command: /sbin/blkid -p -s TYPE -ovalue -- /dev/sdd1
INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
DEBUG:ceph-disk:Mounting /dev/sdd1 on /var/lib/ceph/tmp/mnt.lW9aIY with options noatime,inode64
INFO:ceph-disk:Running command: /usr/bin/mount -t xfs -o noatime,inode64 -- /dev/sdd1 /var/lib/ceph/tmp/mnt.lW9aIY
DEBUG:ceph-disk:Cluster uuid is 19f4189a-bdae-11e5-8937-0025b522225f
INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
DEBUG:ceph-disk:Cluster name is ceph
DEBUG:ceph-disk:OSD uuid is 93ccf964-6282-49fe-acbc-5dfa1d3f9ec7
DEBUG:ceph-disk:OSD id is 9
INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup init
DEBUG:ceph-disk:Marking with init system sysvinit
DEBUG:ceph-disk:ceph osd.9 data dir is ready at /var/lib/ceph/tmp/mnt.lW9aIY
DEBUG:ceph-disk:Moving mount to final location...
INFO:ceph-disk:Running command: /bin/mount -o noatime,inode64 -- /dev/sdd1 /var/lib/ceph/osd/ceph-9
INFO:ceph-disk:Running command: /bin/umount -l -- /var/lib/ceph/tmp/mnt.lW9aIY
DEBUG:ceph-disk:Starting ceph osd.9...
INFO:ceph-disk:Running command: /usr/sbin/service ceph --cluster ceph start osd.9
=== osd.9 ===
create-or-move updated item name 'osd.9' weight 5.37 at location {host=overcloud-cephstorage-0,root=default} to crush map
Starting Ceph osd.9 on overcloud-cephstorage-0...
Running as unit run-15939.service.

[root@overcloud-cephstorage-0 ceph]# ps -ef | grep ceph
root     15941     1  0 04:00 ?        00:00:00 /bin/bash -c ulimit -n 32768; /usr/bin/ceph-osd -i 9 --pid-file /var/run/ceph/osd.9.pid -c /etc/ceph/ceph.conf --cluster ceph -f
root     15944 15941  2 04:00 ?        00:00:00 /usr/bin/ceph-osd -i 9 --pid-file /var/run/ceph/osd.9.pid -c /etc/ceph/ceph.conf --cluster ceph -f
root     16069 14392  0 04:00 pts/0    00:00:00 grep --color=auto ceph

The ceph process is started now.
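Since activation here is expected to be triggered by udev rather than by hand, it may also be worth checking whether udev even tags the data partition with the standard Ceph OSD data partition type GUID and whether any activation was attempted at boot. A sketch only; the device name is just an example:

# Does udev report the Ceph OSD data partition type GUID on this partition?
udevadm info --query=property --name=/dev/sdd1 | grep ID_PART_ENTRY_TYPE
# expected for a data partition: ID_PART_ENTRY_TYPE=4fbd7e29-9d25-41b8-afd0-062c0ceff05d
udevadm settle
# Any ceph-disk activation attempts logged since boot?
journalctl -b | grep -i ceph-disk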
Run the following on each node
==============================

for i in sdd1 sde1 sdf1 sdg1 sdh1 sdi1 sdj1 sdk1; do /usr/sbin/ceph-disk -v activate /dev/$i; done

[root@overcloud-cephstorage-2 ~]# ceph -s
    cluster 19f4189a-bdae-11e5-8937-0025b522225f
     health HEALTH_OK
     monmap e1: 3 mons at {overcloud-controller-0=10.22.120.52:6789/0,overcloud-controller-1=10.22.120.51:6789/0,overcloud-controller-2=10.22.120.55:6789/0}
            election epoch 6, quorum 0,1,2 overcloud-controller-1,overcloud-controller-0,overcloud-controller-2
     osdmap e71: 24 osds: 24 up, 24 in
      pgmap v88: 256 pgs, 4 pools, 0 bytes data, 0 objects
            861 MB used, 128 TB / 128 TB avail
                 256 active+clean

[root@overcloud-controller-0 ceph]# grep -i journal /etc/ceph/ceph.conf
[root@overcloud-controller-0 ceph]#

Also, there isn't any journal size in ceph.conf, so the default of 5G still remains.

Hi, I've just reproduced this as well. The default configuration works fine, but this will be hit when the osds hiera is changed to make puppet prepare a local disk as OSD; I suspect there could be an issue with the puppet-ceph module and I'm currently investigating it.

Giulio, I reproduced twice more, once without the increased journal size.

Giulio, if you'd like access to my config to work the issue, let me know and I'll send creds. It is 3 controllers, 2 computes (only 1 being used - race BZ), and 3 Ceph OSD nodes (rx720xds with 12 HDD & 3 SSD).

When using entire disks for the OSDs instead of directories, the puppet-ceph module expects activation to happen via udev. The relevant snippet is https://github.com/openstack/puppet-ceph/blob/master/manifests/osd.pp#L89-L96:

set -ex
if ! test -b ${data} ; then
    mkdir -p ${data}
fi
# activate happens via udev when using the entire device
if ! test -b ${data} || ! test -b ${data}1 ; then
    ceph-disk activate ${data} || true
fi

This would also explain the behavior with the OSDs coming up after a reboot. I would suggest looking into why udev doesn't activate the OSDs anymore, i.e. what changes in the full image could trigger this behavior. Maybe a case for the Ceph devs?

Given we haven't changed this code between 7.1 and 7.2, is it possible this is a RHEL 7.2 problem, or a Ceph-vs-RHEL-7.2 problem? Adding needinfo on jdurgin to see.

Here is what I did to work around the problem for now, to move ahead:

(1) Update wipe-disk for NodeUserData and redeploy the overcloud:

{
for disk in sdb sdc sdd sde sdf sdg sdh sdi sdj sdk
do
  sgdisk -Z /dev/$disk
  sgdisk -g /dev/$disk
done
} > /root/wipe-disk.txt

{
for disk in sdb sdc
do
  ptype1=45b0969e-9b03-4f30-b4c6-b4b80ceff106
  sgdisk --new=1:0:+19080MiB --change-name="1:ceph journal" --typecode="1:$ptype1" /dev/$disk
  sgdisk --new=2:19082MiB:+19080MiB --change-name="2:ceph journal" --typecode="2:$ptype1" /dev/$disk
  sgdisk --new=3:38163MiB:+19080MiB --change-name="3:ceph journal" --typecode="3:$ptype1" /dev/$disk
  sgdisk --new=4:57244MiB:+19080MiB --change-name="4:ceph journal" --typecode="4:$ptype1" /dev/$disk
done
} >> /root/wipe-disk.txt

(2) Update ceph.conf on all 3 controllers and 3 storage nodes.

(3) Run ceph-disk activate /dev/$i for each data partition, then check ceph -s and ceph daemon osd.${i} config get osd_journal_size.

Yes, this is a ceph-disk issue - similar to https://bugzilla.redhat.com/show_bug.cgi?id=1300617. Working around it is necessary for now, until the ceph-disk fixes are backported: http://www.spinics.net/lists/ceph-devel/msg28384.html

Hi Rama, OK if I open this bug to Dell as they are exposed to the same issue.
Thanks, Joe

Thanks David and Josh, I've a tentative change here: https://review.openstack.org/#/c/276141

Waiting for udev to settle after disk prepare didn't help. Josh, do you have better ideas on if, and which, workaround we could put in place? If the workaround is to remove the ceph udev rules entirely, maybe this should be addressed in the package?

Opening this bugzilla to Dell with Cisco's permission. Please take care to avoid exposing sensitive information in comments.

@Giulio I'm not sure what you mean by "addressed in the package"? Removing udev files in the context of puppet-ceph makes sense as it implements one specific deployment strategy. Removing them from the packages means ruling out all deployment strategies based on udev rules, which is too wide in scope, IMHO.

David, based on the previous comments, do you think we should go ahead and update the existing submission [1] so that it deletes the udev rules?

1. https://review.openstack.org/276141

Looks like a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1298620

*** Bug 1298620 has been marked as a duplicate of this bug. ***

Removing the udev rules is ugly and annoying, but it is unlikely to break anything unless we make a large change here--and the only reason it's necessary is because we'd prefer not to make disruptive backports. So I think it's a safe, easy path here. When RHCS 2.0 comes around it will be different... but I'm guessing/hoping director will change at the same time?

As of version #14 of the submission we're implementing a process where the rules are disabled before prepare and we later activate all the block devices with 'ceph-disk activate-all'. Unfortunately this still doesn't seem to work; activation completes without actually activating any ceph disk. Launching it manually after the deployment activates the disks as expected. We're probably hitting the same timing issue which causes the udev rules to fail.

@Giulio could you collect information about why ceph-disk activate-all fails? With ceph-disk --verbose activate-all you should get an error message of some kind.

Hi Loic, unfortunately it just does not seem to detect any new partition; see the output from patchset #14:

Exec[ceph-osd-activate-all-/dev/sdb]/returns: + ceph-disk --verbose activate-all
Exec[ceph-osd-activate-all-/dev/sdb]/returns: DEBUG:ceph-disk:Scanning /dev/disk/by-parttypeuuid
Exec[ceph-osd-activate-all-/dev/sdb]/returns: executed successfully

If I add a 'sleep 4s' before 'activate-all' (as per patchset #16) then it will work, so that means we should have a workaround now, but it relies on a sleep. Can you point to an event we could use to figure out when it is safe to run activate?

@Giulio is /dev/disk/by-partuuid also empty?

@Giulio it means that

* partprobe /dev/sdb
* udevadm settle

(which you do in the puppet module) can return before the udev rules that populate /dev/disk/by-*uuid are finished. This is news to me and I'll need to figure out how that can happen.

Hi Loic, yes it is empty during the puppet run; if I try to list its contents in the 'activate-all' puppet exec it fails because the directory does not exist yet:

set -ex
ls -la /dev/disk/by-parttypeuuid/
ceph-disk --verbose activate-all

returned 2 instead of one of [0]

If I run it after (again, after a sleep of a few seconds) it pops up with the appropriate contents:

# ls -l /dev/disk/by-parttypeuuid/
total 0
lrwxrwxrwx. 1 root root 10 Feb 10 11:56 45b0969e-9b03-4f30-b4c6-b4b80ceff106.fffbebd9-bbdf-4a6e-9fdc-943c6382113d -> ../../sdb2
lrwxrwxrwx. 1 root root 10 Feb 10 11:56 4fbd7e29-9d25-41b8-afd0-062c0ceff05d.107f1b4e-ec68-4c43-995c-e910565b91b6 -> ../../sdb1

@Giulio what about /dev/disk/by-partuuid? (Note the different name.) Do you have 60-ceph-partuuid-workaround.rules installed? If not, which file in /lib/udev/rules.d contains the string by-parttypeuuid?

Here is a theory: the puppet module does ceph-disk list after prepare, which relies on https://github.com/ceph/ceph/blob/hammer/src/ceph-disk#L2578, which calls sgdisk -i (https://github.com/ceph/ceph/blob/hammer/src/ceph-disk#L2444). On a machine with udev > 214 (on your machine you have 219 according to udevadm --version), that will do the equivalent of partprobe and remove the devices and add them again, which explains why the /dev/disk/by-parttypeuuid/ directory is not populated. There are more details about that problem in the hammer backport, which is available at https://github.com/ceph/ceph/commit/88ffcc2cbd73602819ad653695de7e8718f6707d

It is possible that this problem is also the cause of the original race you're having: ceph-disk list immediately after ceph-disk prepare would race with the ceph-disk activate run indirectly by the udev rules.

David, Loic, patchset #18 works for me; I can successfully deploy passing an entire unpartitioned disk to ::osd. Please let me know what you think. https://review.openstack.org/#/c/276141/18

@Giulio I don't see what has changed?

Loic, there is a sleep 3s before activate-all.

@Giulio this is very fragile. It would be better to either not run ceph-disk list before activate, to not run into http://tracker.ceph.com/issues/14080, or to apply https://github.com/ceph/ceph/pull/7475 which is the hammer backport fixing it.

Hi Loic, thanks for helping. So the module used to do a 'list' in both the prepare and the activate puppet "unless" clauses, which means it gets executed *before* prepare and then again *before* activate (which follows prepare). This isn't related to the current change, but if useful/related maybe we can remove it and rely only on the ls command? See the OR conditions at [1] and [2]. Alternatively, are there chances to get the ceph-disk list fixes in a Hammer package which we could include in the images?

And also, to make sure I understand it correctly: either the 'list' fix or the removal of the 'list' command will allow us to remove the sleep, but won't change the fact that we'll need to remove the udev rules and use activate-all as proposed in the current change, is this correct?

1. https://github.com/openstack/puppet-ceph/blob/master/manifests/osd.pp#L79
2. https://github.com/openstack/puppet-ceph/blob/master/manifests/osd.pp#L100

Loic, I tried to apply your 'list' patch to ceph-disk manually and it makes activate work without the sleep, as you suggested. It's still necessary to remove the udev rules though. So I think you have to vote on one of:

1. keep a sleep in the workaround
2. build a version of ceph-osd for hammer which includes your patch for ceph-disk

Unless there are better short-term alternatives?

If applying the patch is not an option for some reason, another way to deal with the problem is to avoid using ceph-disk list between ceph-disk prepare and ceph-disk activate. I suggest you remove the "unless" that is at https://review.openstack.org/#/c/276141/18/manifests/osd.pp line 123 so that ceph-disk is not run. ceph-disk activate is idempotent, there is no harm in running it more than once. And with that I think you can also remove the sleep that is before the activate-all that follows.
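If the sleep does end up staying in as a stopgap, one less fragile option might be to wait, with a bound, on the event Giulio asked about (the by-parttypeuuid links appearing) rather than sleeping a fixed time. A sketch only; the timeout values are purely illustrative:

# Hypothetical stand-in for the fixed sleep: poll (bounded) until udev has
# published the by-parttypeuuid links, then settle and activate everything.
for _ in $(seq 1 30); do
    [ -e /dev/disk/by-parttypeuuid ] && break
    sleep 1
done
udevadm settle --timeout=10
ceph-disk --verbose activate-all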
Loic, understood. That line was there before my change though, so we need to understand with David if it's viable. I think it will cause the activate to run when unnecessary, but if it's idempotent... maybe we can do that.

The unless is needed. This is not about idempotency of the underlying change, this is about the idempotency of Puppet. Without the unless, Puppet will run the exec and consequently this is not idempotent from a Puppet point of view. Which is why this test is failing: http://logs.openstack.org/41/276141/18/check/gate-puppet-ceph-puppet-beaker-rspec-dsvm-centos7/15509b2/console.html#_2016-02-10_17_43_43_860

Specifically it fails the "catch_changes=>true" part, which checks for idempotency. An exit code of 2 in Puppet means successfully applied changes, so this is considered a failure: https://docs.puppetlabs.com/puppet/latest/reference/man/apply.html#OPTIONS

Loic, David, patchset #19 works for me. I am activating only ${data}1 now and not adding the sleep. Thoughts?

Loic, look at the patch without unless: the puppet run is not idempotent.

Emilien, the unless is back in. Currently it fails for an unknown reason on the onlyif (this seems to only happen on the jenkins env).

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0265.html

I see nothing in the errata that addresses this particular problem. The errata only pertains to an IPv6 issue, and not to the OSDs being down.