Bug 1287689

Summary:	[ramdisk] Overcloud fails to deploy
Product:	Red Hat OpenStack	Reporter:	Joe Talerico <jtaleric>
Component:	rhosp-director	Assignee:	chris alfonso <calfonso>
Status:	CLOSED DUPLICATE	QA Contact:	yeylon <yeylon>
Severity:	urgent	Docs Contact:
Priority:	urgent
Version:	7.0 (Kilo)	CC:	dtantsur, hbrock, jcoufal, jtaleric, kprabhak, mburns, mcornea, michele, rhel-osp-director-maint, srevivo
Target Milestone:	ga
Target Release:	8.0 (Liberty)
Hardware:	All
OS:	All
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2016-01-19 11:36:20 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Joe Talerico 2015-12-02 14:00:36 UTC

Description of problem:

When deploying my overcloud with OSP7 (python-rdomanager-oscplugin-0.0.10-19.el7ost.noarch), the deployment simply stops at this point: http://i.imgur.com/8ecvf7N.png

Talking with trown, we looked in the ironic-conductor logs, and it seems the disk was written; however, the error on the console suggests otherwise.

Working with Lucas, I used the upstream ramdisk/kernel and was able to get the baremetal nodes to install.

These nodes deployed OSP7 all day long until the recent update.

Comment 2 Jaromir Coufal 2015-12-09 16:15:19 UTC

Joe, on the call you mentioned this is a deployment of OSP8, but this bug is filed against the OSP7 version. Can you please clarify which deployment you are hitting this issue against? Thanks, Jarda

Comment 4 Joe Talerico 2015-12-09 17:22:11 UTC

Jarda - On the call I mentioned that this started once we moved to the RHEL 7.2 based image, which is used for both OSP7 and OSP8 going forward.

Comment 5 Marius Cornea 2016-01-04 10:13:23 UTC

I hit this in a virt environment as well when trying to deploy an overcloud for a second time (delete and redeploy). I worked around it by recreating the overcloud nodes' image files:

for image in /var/lib/libvirt/images/*baremetalbrbm*; do qemu-img create -f qcow2 "$image" 41G; done
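
For anyone repeating this workaround, here is a slightly more defensive sketch of the same idea. The directory, the "baremetalbrbm" name pattern, and the 41G size are taken from the command above. The function name recreate_images and the DRY_RUN switch are hypothetical additions for illustration only. Note that qemu-img create writes a fresh image, so the current contents of every matched disk are discarded.

```shell
#!/bin/sh
# Sketch only: recreate the overcloud node disk images in a virt environment.
# recreate_images and DRY_RUN are hypothetical names, not part of any tool.
# DRY_RUN defaults to 1, so commands are printed rather than executed.

recreate_images() {
    image_dir=${1:-/var/lib/libvirt/images}   # directory from the comment above
    pattern=${2:-baremetalbrbm}               # substring identifying node disks
    size=${3:-41G}                            # size used in the comment above

    for image in "$image_dir"/*"$pattern"*; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$image" ] || continue

        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "qemu-img create -f qcow2 $image $size"
        else
            # Destructive: replaces the existing image with an empty one.
            qemu-img create -f qcow2 "$image" "$size"
        fi
    done
}
```

Calling recreate_images with no arguments prints the commands it would run for the default directory. Setting DRY_RUN=0 before the call actually recreates the images, which presumably should only be done while the overcloud VMs are shut off.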

Comment 6 Karthik Prabhakar 2016-01-18 19:21:13 UTC

Observed a similar issue, except in my case the error is about /dev/sda1 being write protected. This happens on nodes at random, and varies across deployments.

In my case, it doesn't cause the deployment to fail - eventually (~15 min later), heat reboots the stuck node(s) back into the deployment kernel, and it succeeds on the second attempt.

Comment 7 Dmitry Tantsur 2016-01-19 11:36:20 UTC

Karthik, this does not look similar, please report separately.

*** This bug has been marked as a duplicate of bug 1296330 ***