Description of problem:
While deploying the overcloud on VMs running on a CentOS server, the controller node is listed as active, but the compute node stays stuck in the spawning state, even after a few hours.

Version-Release number of selected component (if applicable):

How reproducible:
Unsure

Steps to Reproduce:
1. Deploy the overcloud (1 compute + 1 controller) on a CentOS virt environment using the guide: https://repos.fedorapeople.org/repos/openstack-m/rdo-manager-docs/liberty/environments/virtual.html
2. Deploy the overcloud and wait for the compute node to reach the spawning state. Check the VM states with virsh.

Actual results:
Compute node doesn't complete installation.

[stack@puma53 ~]$ virsh list --all
 Id    Name                           State
----------------------------------------------------
 2     instack                        running
 9     baremetalbrbm_2                running
 -     baremetalbrbm_0                shut off
 -     baremetalbrbm_1                shut off

Expected results:
Compute node should complete installation and its VM should be running.

Additional info:
Not sure exactly when overcloudrc is created, but I noticed it's not found; maybe it hasn't been created yet. I figured that if the controller node is active I might already see this file. Adding nova logs from the instack server, as well as virsh logs in case they help.
Created attachment 1082468: logs
More debugging info: sshd and the firewall rules on the virt host are OK, having tested the following. I can ssh into the virt host from my laptop as root, which checks the 10.X.X.X network. I can also ssh from the instack VM to the virt host, which checks the 192.168.122.X network. If the overcloud controller node was created successfully and it uses the same ssh virt power-on method, I doubt this suddenly stopped working for compute nodes. My guess is that the problem is something else.
Adding ironic journal output (ironic.log), which might shed some more light; I started going over this file a few minutes ago. The ID of the compute node stuck in spawning is: 7f9f4f52-3ee6-42d9-9275-ff88582dd6e7
Created attachment 1082835: Ironic journal output
All nodes except one are unable to get an IP address during PXE boot. `nova list` shows them with status BUILD and task state spawning.

Running ironic node-port-list on the affected node(s), I see that there are 2 MAC addresses. The top one belongs to the NIC that is used for PXE:

+--------------------------------------+-------------------+
| UUID                                 | Address           |
+--------------------------------------+-------------------+
| cfe91f4c-add7-439b-8df9-998f653710e5 | 00:0a:f7:79:93:2a |
| 2ece3a6e-2326-4ecc-b303-b1391e0d259c | c8:1f:66:c7:e9:2b |
+--------------------------------------+-------------------+

iptables has this entry (among others):

Chain ironic-inspector (1 references)
target     prot opt source               destination
DROP       all  --  anywhere             anywhere             MAC 00:0A:F7:79:93:2A

Running tcpdump on the undercloud, I see the attempted bootp request:

02:59:10.438750 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 00:0a:f7:79:93:2a (oui Unknown), length 548

At some point I removed the bottom MAC and re-attempted the deployment, which then completed successfully.
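The cross-check described above (port MACs versus MACs in DROP rules) can be sketched as a few shell helpers. This is only a rough sketch: the function names are hypothetical and not part of ironic or ironic-inspector, and the parsing assumes the table and rule layouts shown in this comment. A match only shows that the MAC appears in a DROP rule, not that the rule is the cause of the failure.

```shell
# Hypothetical helpers (illustrative names, not part of any OpenStack tool).

# Print the MAC addresses found in an `ironic node-port-list` table on stdin.
port_macs() {
    awk -F'|' 'BEGIN {
        hex = "[0-9A-Fa-f][0-9A-Fa-f]"
        mac_re = "^" hex ":" hex ":" hex ":" hex ":" hex ":" hex "$"
    }
    NF >= 3 {
        mac = $3
        gsub(/ /, "", mac)
        # Header and border rows do not look like a MAC and are skipped.
        if (mac ~ mac_re) print tolower(mac)
    }'
}

# Print the MACs named in DROP rules of an `iptables -L` listing on stdin.
dropped_macs() {
    awk '$1 == "DROP" {
        for (i = 2; i < NF; i++)
            if ($i == "MAC") print tolower($(i + 1))
    }'
}

# Print the port MACs that also appear in a DROP rule.
# $1: text of the port table, $2: text of the iptables listing.
blocked_port_macs() {
    dropped=$(printf '%s\n' "$2" | dropped_macs)
    [ -n "$dropped" ] || return 0
    printf '%s\n' "$1" | port_macs | grep -F -x -e "$dropped"
}
```

On the undercloud this could be fed with the output of ironic node-port-list <node> as the first argument and the output of iptables -L ironic-inspector as the second. MACs are lowercased on both sides because the port table and the iptables listing above use different case.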
Something is odd: my overcloud VMs, including the "active" controller, have only one MAC per VM, which is also evident in the virsh dumpxml files.

ironic node-port-list 4626bf90-7f95-4bd7-8bee-5f5b0a0981c6
+--------------------------------------+-------------------+
| UUID                                 | Address           |
+--------------------------------------+-------------------+
| 0c48962d-8b6a-440c-8545-92acd6f89aec | 00:60:0c:4e:e2:0f |
+--------------------------------------+-------------------+

[stack@instack ~]$ ironic node-port-list 8738f24c-45b4-4a17-b6ad-8963723c62df
+--------------------------------------+-------------------+
| UUID                                 | Address           |
+--------------------------------------+-------------------+
| a74b757b-09fd-494b-9dd6-1c20d4efabc2 | 00:d7:c4:3f:c9:73 |
+--------------------------------------+-------------------+

[stack@instack ~]$ ironic node-port-list 9f39b3fc-6670-4ee3-9376-ad197bf00760
+--------------------------------------+-------------------+
| UUID                                 | Address           |
+--------------------------------------+-------------------+
| 65f29507-b464-4f38-acf3-ea81563bad8a | 00:02:9d:99:2f:60 |
+--------------------------------------+-------------------+

Shouldn't instack-virt-setup have built the VMs with the needed eth/networks? Maybe something went wrong during that stage.
Reproduced the behavior in comment #6. One machine wasn't able to PXE boot during the deployment. I removed the second MAC shown in ironic node-port-list <node> and re-attempted the deployment, which then completed successfully.
Encountered the same issue when I tried an HA deployment on bare metal: 2 out of 3 controllers in the overcloud deployment were stuck in "build" with no progress, and the deployment eventually failed on timeout.

Workarounds (after failed deployment):
---------------
(1) heat stack-delete overcloud
(2) for j in $(for i in `ironic node-list|awk '/power/ {print $2}'`; do ironic node-port-list $i|awk '/c8:1f/ {print $2}'; done); do ironic port-delete $j; done
(3) re-run the overcloud deployment command

Environment:
------------
rdo-release-liberty-1.noarch
instack-0.0.8-1.el7.noarch
instack-undercloud-2.1.3-1.el7.noarch
openstack-tripleo-heat-templates-0.8.7-1.el7.noarch
openstack-ironic-inspector-2.2.2-1.el7.noarch
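Since workaround (2) deletes ports in a single pass, it may be worth previewing what would be removed first. The following is a minimal dry-run sketch under stated assumptions: the function names are hypothetical, the parsing assumes the `ironic node-port-list` table layout shown in comment #6, and the c8:1f prefix is specific to the hardware in this report, so it has to be adapted for other environments.

```shell
# Hypothetical helpers (illustrative names, not part of the ironic CLI).

# Print the UUIDs of ports whose MAC starts with the given prefix.
# $1: MAC prefix (for example c8:1f); stdin: an `ironic node-port-list` table.
ports_with_mac_prefix() {
    awk -F'|' -v prefix="$1" 'NF >= 3 {
        uuid = $2
        mac = $3
        gsub(/ /, "", uuid)
        gsub(/ /, "", mac)
        # index() == 1 means the MAC begins with the prefix.
        if (index(tolower(mac), tolower(prefix)) == 1) print uuid
    }'
}

# Dry run: print the delete commands instead of executing them.
# $1: MAC prefix; stdin: an `ironic node-port-list` table.
print_port_deletes() {
    ports_with_mac_prefix "$1" | while read -r uuid; do
        echo "ironic port-delete $uuid"
    done
}
```

Piping the output of ironic node-port-list <node> into print_port_deletes c8:1f lists the commands that step (2) would effectively run for that node, so they can be reviewed before any port is deleted.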
I think this can be closed, as it's very old and I don't personally encounter this issue.