Bug 1277673

Summary: [Director] Ceph nodes are not successfully deploying
Product: Red Hat OpenStack
Reporter: Dan Yocum <dyocum>
Component: rhosp-director
Assignee: Angus Thomas <athomas>
Status: CLOSED EOL
QA Contact: Omri Hochman <ohochman>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 7.0 (Kilo)
CC: dbecker, dtantsur, dyocum, kbasil, lbopf, mburns, morazi, rhel-osp-director-maint, rhos-docs, skinjo, srevivo
Target Milestone: ---
Keywords: Documentation
Target Release: 7.0 (Kilo)
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-28 16:16:15 UTC
Type: Bug

Description Dan Yocum 2015-11-03 19:50:01 UTC
Description of problem:

This is stated in section 9.3.2:

If Provision State is active and Power State is power on, the bare metal deployment has finished successfully. This means the problem occurred during the post-deployment configuration step.

However, I've discovered a failure mode in which the deployment has NOT finished successfully.

For some reason unknown to me, the Ceph nodes are not successfully deploying.  The error on the director is this:

dnsmasq-tftp[1531]: error 0 TFTP Aborted received from 10.3.3.115
dnsmasq-tftp[1531]: failed sending /tftpboot/undionly.kpxe to 10.3.3.115
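
To find which clients are affected when there are many such lines, the director log can be scanned for failed TFTP transfers. The following is a minimal sketch, not a supported tool: the regular expression is inferred only from the two log lines above, and the function name failed_tftp_transfers is hypothetical.

```python
import re

# Sample lines copied from the director log quoted above.
LOG = """\
dnsmasq-tftp[1531]: error 0 TFTP Aborted received from 10.3.3.115
dnsmasq-tftp[1531]: failed sending /tftpboot/undionly.kpxe to 10.3.3.115
"""

# Assumption: this pattern is derived from the two lines above,
# not from dnsmasq documentation, so other log formats may not match.
FAILED_SEND = re.compile(r"dnsmasq-tftp\[\d+\]: failed sending (\S+) to (\S+)")


def failed_tftp_transfers(log_text):
    """Map each client IP to the list of files dnsmasq failed to send to it."""
    failures = {}
    for line in log_text.splitlines():
        match = FAILED_SEND.search(line)
        if match:
            path, client = match.groups()
            failures.setdefault(client, []).append(path)
    return failures


print(failed_tftp_transfers(LOG))
```

Matching the reported client addresses against the provisioning network addresses of the Ceph nodes would confirm which machines never received the boot image.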


The boot message on the system is this:

PXE-E79: NBP is too big to fit in free base memory

And 'ironic node-list' shows this:

| f5ef4ea0-f3b7-4660-bbb5-3799c691ee33 | None | c29eb6de-2742-4324-97fb-3ed9b381744e | power on   | active          | False       |
| 457ebedb-b0e9-44c4-87de-eb1ca2450b2f | None | 8e4ea3e7-08c0-422b-b4e4-2c2214857b45 | power on   | active          | False       |
| 5fd285de-e97c-4ffb-acd7-cb5733944d60 | None | 3808266f-18e8-4856-bf1f-b989025082b6 | power on   | active          | False       |
| c00a387b-0408-4724-9188-5a1d216f8420 | None | 3e11cd99-b82a-44e7-9465-eef7bc2f235f | power on   | active          | False       |
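
To make the mismatch with section 9.3.2 concrete, the rows above can be checked against the documented criterion. This is only an illustrative sketch: the header row is not included in this report, so the column names below are assumptions, and parse_rows and looks_deployed are hypothetical helper names.

```python
# Rows copied from the 'ironic node-list' output quoted above.
ROWS = """\
| f5ef4ea0-f3b7-4660-bbb5-3799c691ee33 | None | c29eb6de-2742-4324-97fb-3ed9b381744e | power on   | active          | False       |
| 457ebedb-b0e9-44c4-87de-eb1ca2450b2f | None | 8e4ea3e7-08c0-422b-b4e4-2c2214857b45 | power on   | active          | False       |
| 5fd285de-e97c-4ffb-acd7-cb5733944d60 | None | 3808266f-18e8-4856-bf1f-b989025082b6 | power on   | active          | False       |
| c00a387b-0408-4724-9188-5a1d216f8420 | None | 3e11cd99-b82a-44e7-9465-eef7bc2f235f | power on   | active          | False       |
"""

# Assumption: column order and names are guessed, since the header is not shown.
COLUMNS = ("uuid", "name", "instance_uuid",
           "power_state", "provision_state", "maintenance")


def parse_rows(text):
    """Split each '|'-delimited row into a dict keyed by COLUMNS."""
    nodes = []
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) == len(COLUMNS):
            nodes.append(dict(zip(COLUMNS, cells)))
    return nodes


def looks_deployed(node):
    """Section 9.3.2 criterion: provision state active and power state power on."""
    return (node["provision_state"] == "active"
            and node["power_state"] == "power on")


nodes = parse_rows(ROWS)
for node in nodes:
    print(node["uuid"], looks_deployed(node))
```

Every row passes the check, even though these nodes failed at PXE boot, so the two states alone do not prove that the bare metal deployment finished.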

I don't know what's causing this.  I do know that similar systems are booting just fine with the same PXE boot image.

Comment 4 Dmitry Tantsur 2016-11-18 15:06:57 UTC
Dan, do you still experience this problem? What's your hardware model?