Bug 1271289
| Field | Value |
|---|---|
| Summary | overcloud-novacompute stuck in spawning state |
| Product | [Community] RDO |
| Reporter | Tzach Shefi (tshefi) |
| Component | rdo-manager |
| Assignee | Hugh Brock (hbrock) |
| Status | CLOSED EOL |
| QA Contact | Shai Revivo (srevivo) |
| Severity | high |
| Priority | urgent |
| Version | Liberty |
| CC | chris.brown, jcoufal, mburns, ohochman, sasha |
| Target Milestone | GA |
| Target Release | Liberty |
| Hardware | x86_64 |
| OS | Linux |
| Doc Type | Bug Fix |
| Last Closed | 2017-06-18 06:21:52 UTC |
| Type | Bug |
Description
Tzach Shefi
2015-10-13 14:29:18 UTC
Created attachment 1082468 (logs)
More debugging info: sshd and the firewall rules on the virt host are OK. I tested the following: I can ssh into the virt host from my laptop as root (10.X.X.X network), and I can also ssh from the instack VM to the virt host (192.168.122.X network).

Since the overcloud controller node was created successfully using the same ssh virt power-on method, I doubt this stopped working all of a sudden for the compute nodes; my guess is that the problem is something else. The ironic journal output (ironic.log) might shed some more light; I started going over this file a few minutes ago. The ID of the compute node stuck in spawning is 7f9f4f52-3ee6-42d9-9275-ff88582dd6e7.

Created attachment 1082835 (Ironic journal output)
So all nodes except one aren't able to get an IP address during PXE boot. `nova list` shows them with status BUILD and task state spawning. Running `ironic node-port-list` on the affected node(s) shows two MAC addresses; the top one belongs to the NIC used for PXE:

| UUID | Address |
|---|---|
| cfe91f4c-add7-439b-8df9-998f653710e5 | 00:0a:f7:79:93:2a |
| 2ece3a6e-2326-4ecc-b303-b1391e0d259c | c8:1f:66:c7:e9:2b |

The iptables chain `ironic-inspector (1 references)` has this entry, among others (columns: target, prot, opt, source, destination):

`DROP all -- anywhere anywhere MAC 00:0A:F7:79:93:2A`

Running tcpdump on the undercloud, I see the attempted BOOTP request:

`02:59:10.438750 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 00:0a:f7:79:93:2a (oui Unknown), length 548`

I then removed the bottom MAC and re-ran the deployment, which completed successfully.

Something is odd: my overcloud VMs, including the "active" controller, have only one MAC per VM, which is also evident in the virsh dumpxml files:
`ironic node-port-list 4626bf90-7f95-4bd7-8bee-5f5b0a0981c6`

| UUID | Address |
|---|---|
| 0c48962d-8b6a-440c-8545-92acd6f89aec | 00:60:0c:4e:e2:0f |

`ironic node-port-list 8738f24c-45b4-4a17-b6ad-8963723c62df`

| UUID | Address |
|---|---|
| a74b757b-09fd-494b-9dd6-1c20d4efabc2 | 00:d7:c4:3f:c9:73 |

`ironic node-port-list 9f39b3fc-6670-4ee3-9376-ad197bf00760`

| UUID | Address |
|---|---|
| 65f29507-b464-4f38-acf3-ea81563bad8a | 00:02:9d:99:2f:60 |

Shouldn't instack-virt-setup have built the VMs with the needed NICs and networks? Maybe something went wrong during that stage.

Reproduced the behavior in comment #6: one machine wasn't able to PXE boot during the deployment. I removed the second MAC shown in `ironic node-port-list <node>` and re-ran the deployment, which completed successfully.

I encountered the same issue when trying an HA deployment on bare metal: 2 out of 3 controllers in the overcloud deployment were stuck in "build" with no progress, and the deployment eventually failed on a timeout.
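The port listings in this bug suggest a simple pre-deployment check: the nodes that failed to PXE boot had two ports registered, while the VMs that deployed had exactly one. The following is a minimal sketch, assuming the legacy `ironic` CLI and the ASCII-table output shown in this report; `count_ports` and `check_nodes` are hypothetical helper names, not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the MAC addresses in `ironic node-port-list`
# table output read from stdin. Border lines (+---+) contain no '|' and are
# skipped; the header row ("| UUID | Address |") is skipped because only
# 17-character hex/colon strings are counted as MACs.
count_ports() {
  awk -F'|' 'NF >= 3 {
    mac = $3
    gsub(/ /, "", mac)
    if (length(mac) == 17 && mac ~ /^[0-9A-Fa-f:]+$/) n++
  } END { print n + 0 }'
}

# Hypothetical helper: report every node that has more than one port.
# Assumes the legacy `ironic` CLI is installed and undercloud credentials
# are sourced; the node-list filter is the same one used in the workaround.
check_nodes() {
  local node count
  for node in $(ironic node-list | awk '/power/ {print $2}'); do
    count=$(ironic node-port-list "$node" | count_ports)
    if [ "$count" -gt 1 ]; then
      echo "$node has $count ports"
    fi
  done
}
```

Run `check_nodes` on the undercloud before deploying; any node it reports is a candidate for the port cleanup described in the workaround.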
Workarounds (after failed deployment):
---------------
(1) heat stack-delete overcloud
(2) `for j in $(for i in $(ironic node-list | awk '/power/ {print $2}'); do ironic node-port-list $i | awk '/c8:1f/ {print $2}'; done); do ironic port-delete $j; done`
(3) re-run the overcloud deployment command

Environment:
------------
rdo-release-liberty-1.noarch
instack-0.0.8-1.el7.noarch
instack-undercloud-2.1.3-1.el7.noarch
openstack-tripleo-heat-templates-0.8.7-1.el7.noarch
openstack-ironic-inspector-2.2.2-1.el7.noarch

I think this can be closed, as it's very old and I don't personally encounter this issue.
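Workaround step (2) can be made safer to rerun by separating the table parsing from the deletion and adding a dry-run mode. This is a minimal sketch, assuming the same legacy `ironic` CLI commands used in step (2); `ports_with_prefix` and `delete_extra_ports` are hypothetical names, and the `c8:1f` prefix comes from the hardware in this report, so substitute the prefix of your own non-PXE NIC.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the UUIDs of ports whose MAC address starts
# with the given prefix (case-insensitive), reading `ironic node-port-list`
# table output from stdin. The header row never matches a MAC prefix.
ports_with_prefix() {
  local prefix="$1"
  awk -F'|' -v p="$prefix" 'NF >= 3 {
    uuid = $2; mac = $3
    gsub(/ /, "", uuid); gsub(/ /, "", mac)
    if (tolower(substr(mac, 1, length(p))) == tolower(p)) print uuid
  }'
}

# Hypothetical wrapper around workaround step (2): delete, on every node,
# each port whose MAC starts with PREFIX. Pass --dry-run as the second
# argument to only print the ports that would be deleted.
delete_extra_ports() {
  local prefix="$1" dry_run="$2" node port
  for node in $(ironic node-list | awk '/power/ {print $2}'); do
    for port in $(ironic node-port-list "$node" | ports_with_prefix "$prefix"); do
      if [ "$dry_run" = "--dry-run" ]; then
        echo "would delete port $port on node $node"
      else
        ironic port-delete "$port"
      fi
    done
  done
}
```

For example, run `delete_extra_ports c8:1f --dry-run` to review the list, then `delete_extra_ports c8:1f` between steps (1) and (3). Unlike `awk '/c8:1f/'`, which matches the prefix anywhere on the line, the sketch only compares the start of the Address column.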