Bug 1571092 - Deploying 1 Controller 12 Compute fails due to nodes not being powered on
Summary: Deploying 1 Controller 12 Compute fails due to nodes not being powered on
Keywords:
Status: CLOSED DUPLICATE of bug 1571384
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-virtualbmc
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
: ---
Assignee: RHOS Maint
QA Contact: mlammon
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-04-24 05:59 UTC by Gurenko Alex
Modified: 2018-05-01 14:55 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-05-01 14:55:14 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Gurenko Alex 2018-04-24 05:59:35 UTC
Description of problem: I'm trying to deploy RHOS 13 with various topologies, but having more than 10 nodes of same type. I've tried 3 Controllers + 11 computes and 1 Controller + 12 Computes and fail with a same issue.


Version-Release number of selected component (if applicable): 2018-04-19.2


How reproducible: 100%


Steps to Reproduce:
1. Try to deploy 1 Controller + 12 Compute nodes with Infrared, partially (only virsh + undercloud part) or fully
2.
3.

Actual results: Stack overcloud CREATE_FAILED with overcloud.Compute.7. Nova list reports following line:

| 24df54fa-2374-431e-a2c9-cddfb6e8f709 | compute-7    | BUILD  | scheduling | NOSTATE     |                        |

and ironic log contains record like:

ironic.conductor.manager [req-ce1043d9-0179-467b-88c5-c230edd5c46f b715c5845eb34c06b950712c5e012bfa fd53286d965846608ba8f3b15674ef5e - default default] Error in deploy of node 4104c21e-fcc4-4250-9f19-70f3156cb1f6: IPMI call failed: power on.: IPMIFailure: IPMI call failed: power on.


Expected results:

All nodes are powered on and deployed successfully

Additional info:

Controller in this topology was set to 4 vCPU, 20 Gb RAM, each Compute node was set to 2 vCPU, 4 Gb of RAM to be able to fit 32 thread CPU/128 Gb server.

Powering on missed node manually works (with openstack baremetal node power on), but it fails to boot and participate in the deployment.

Comment 1 Julia Kreger 2018-04-24 20:06:27 UTC
I've encountered this elsewhere. Te underlying cause appears to be coming up from libvirt when the domain is being powered on, which cascades causing other failures elsewhere in the test environment. I think someone who has a great deal of libvirt experience should likely take a look at this

2018-04-24 15:23:56,314.314 15263 ERROR VirtualBMC [-] Error powering on the domain controller-0. Error: Network not found: no network with matching name 'data'
2018-04-24 15:23:56,314.314 15379 ERROR VirtualBMC [-] Error powering on the domain ceph-0. Error: Network not found: no network with matching name 'data'
2018-04-24 15:23:57,053.053 15552 ERROR VirtualBMC [-] Error powering on the domain compute-0. Error: Network not found: no network with matching name 'management'
2018-04-24 15:23:57,193.193 15495 ERROR VirtualBMC [-] Error powering on the domain compute-1. Error: Network not found: no network with matching name 'management'

Although libvirt does not log any errors. Compute-0's log:

2018-04-24 15:23:44.364+0000: shutting down, reason=shutdown
2018-04-24 15:23:56.454+0000: shutting down, reason=failed

Journalctl:

Apr 24 15:23:31 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]: 2018-04-24 15:23:31.742+0000: 28171: info : libvirt version: 3.9.0, package: 14.el7_5.2 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2018-03-20-14:43:16, x86-041.build.eng.bos.redhat.com)
Apr 24 15:23:31 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]: 2018-04-24 15:23:31.742+0000: 28171: info : hostname: sealusa10.mobius.lab.eng.rdu2.redhat.com
Apr 24 15:23:31 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]: 2018-04-24 15:23:31.742+0000: 28171: error : qemuMonitorIO:704 : internal error: End of file from qemu monitor
Apr 24 15:23:37 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]: 2018-04-24 15:23:37.284+0000: 28171: error : qemuMonitorIO:704 : internal error: End of file from qemu monitor
Apr 24 15:23:42 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]: 2018-04-24 15:23:42.278+0000: 28171: error : qemuMonitorIO:704 : internal error: End of file from qemu monitor
Apr 24 15:23:42 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]: 2018-04-24 15:23:42.876+0000: 28171: error : qemuMonitorIO:704 : internal error: End of file from qemu monitor
Apr 24 15:23:43 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]: 2018-04-24 15:23:43.265+0000: 28171: error : qemuMonitorIO:704 : internal error: End of file from qemu monitor
Apr 24 15:23:43 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]: 2018-04-24 15:23:43.307+0000: 28171: error : qemuMonitorIO:704 : internal error: End of file from qemu monitor
Apr 24 15:23:48 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]: 2018-04-24 15:23:48.144+0000: 28171: error : qemuMonitorIO:704 : internal error: End of file from qemu monitor
Apr 24 15:23:48 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]: 2018-04-24 15:23:48.166+0000: 28171: error : qemuMonitorIO:704 : internal error: End of file from qemu monitor
Apr 24 15:23:56 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]: 2018-04-24 15:23:56.309+0000: 28174: error : virHashSearch:727 : Hash operation not allowed during iteration
Apr 24 15:23:56 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]: 2018-04-24 15:23:56.309+0000: 28174: error : virHashSearch:727 : Hash operation not allowed during iteration
Apr 24 15:23:56 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]: 2018-04-24 15:23:56.309+0000: 28174: error : virHashSearch:727 : Hash operation not allowed during iteration
Apr 24 15:23:56 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]: 2018-04-24 15:23:56.309+0000: 28174: error : virHashSearch:727 : Hash operation not allowed during iteration
Apr 24 15:23:56 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]: 2018-04-24 15:23:56.310+0000: 28175: error : virHashSearch:727 : Hash operation not allowed during iteration
Apr 24 15:23:56 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]: 2018-04-24 15:23:56.310+0000: 28175: error : virHashSearch:727 : Hash operation not allowed during iteration
Apr 24 15:23:56 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]: 2018-04-24 15:23:56.310+0000: 28175: error : virHashSearch:727 : Hash operation not allowed during iteration
Apr 24 15:23:56 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]: 2018-04-24 15:23:56.310+0000: 28175: error : virHashSearch:727 : Hash operation not allowed during iteration
Apr 24 15:23:56 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]: 2018-04-24 15:23:56.313+0000: 28174: error : virHashSearch:727 : Hash operation not allowed during iteration
Apr 24 15:23:56 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]: 2018-04-24 15:23:56.313+0000: 28175: error : virHashSearch:727 : Hash operation not allowed during iteration
Apr 24 15:23:56 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]: 2018-04-24 15:23:56.313+0000: 28174: error : virHashSearch:727 : Hash operation not allowed during iteration
Apr 24 15:23:56 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]: 2018-04-24 15:23:56.313+0000: 28175: error : virHashSearch:727 : Hash operation not allowed during iteration
Apr 24 15:23:56 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]: 2018-04-24 15:23:56.313+0000: 28174: error : virHashSearch:727 : Hash operation not allowed during iteration
Apr 24 15:23:56 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]: 2018-04-24 15:23:56.313+0000: 28175: error : virHashSearch:727 : Hash operation not allowed during iteration
Apr 24 15:23:56 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]: 2018-04-24 15:23:56.454+0000: 34996: error : virHashSearch:727 : Hash operation not allowed during iteration
Apr 24 15:23:56 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]: 2018-04-24 15:23:56.607+0000: 28172: error : virHashSearch:727 : Hash operation not allowed during iteration
Apr 24 15:23:56 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]: 2018-04-24 15:23:56.767+0000: 34996: error : virHashSearch:727 : Hash operation not allowed during iteration
Apr 24 15:23:56 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]: 2018-04-24 15:23:56.904+0000: 28172: error : virHashSearch:727 : Hash operation not allowed during iteration

Comment 2 Gurenko Alex 2018-04-25 11:47:28 UTC
(In reply to Julia Kreger from comment #1)
> I've encountered this elsewhere. Te underlying cause appears to be coming up
> from libvirt when the domain is being powered on, which cascades causing
> other failures elsewhere in the test environment. I think someone who has a
> great deal of libvirt experience should likely take a look at this
> 
> 2018-04-24 15:23:56,314.314 15263 ERROR VirtualBMC [-] Error powering on the
> domain controller-0. Error: Network not found: no network with matching name
> 'data'
> 2018-04-24 15:23:56,314.314 15379 ERROR VirtualBMC [-] Error powering on the
> domain ceph-0. Error: Network not found: no network with matching name 'data'
> 2018-04-24 15:23:57,053.053 15552 ERROR VirtualBMC [-] Error powering on the
> domain compute-0. Error: Network not found: no network with matching name
> 'management'
> 2018-04-24 15:23:57,193.193 15495 ERROR VirtualBMC [-] Error powering on the
> domain compute-1. Error: Network not found: no network with matching name
> 'management'
> 
> Although libvirt does not log any errors. Compute-0's log:
> 
> 2018-04-24 15:23:44.364+0000: shutting down, reason=shutdown
> 2018-04-24 15:23:56.454+0000: shutting down, reason=failed
> 
> Journalctl:
> 
> Apr 24 15:23:31 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]:
> 2018-04-24 15:23:31.742+0000: 28171: info : libvirt version: 3.9.0, package:
> 14.el7_5.2 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>,
> 2018-03-20-14:43:16, x86-041.build.eng.bos.redhat.com)
> Apr 24 15:23:31 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]:
> 2018-04-24 15:23:31.742+0000: 28171: info : hostname:
> sealusa10.mobius.lab.eng.rdu2.redhat.com
> Apr 24 15:23:31 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]:
> 2018-04-24 15:23:31.742+0000: 28171: error : qemuMonitorIO:704 : internal
> error: End of file from qemu monitor
> Apr 24 15:23:37 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]:
> 2018-04-24 15:23:37.284+0000: 28171: error : qemuMonitorIO:704 : internal
> error: End of file from qemu monitor
> Apr 24 15:23:42 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]:
> 2018-04-24 15:23:42.278+0000: 28171: error : qemuMonitorIO:704 : internal
> error: End of file from qemu monitor
> Apr 24 15:23:42 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]:
> 2018-04-24 15:23:42.876+0000: 28171: error : qemuMonitorIO:704 : internal
> error: End of file from qemu monitor
> Apr 24 15:23:43 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]:
> 2018-04-24 15:23:43.265+0000: 28171: error : qemuMonitorIO:704 : internal
> error: End of file from qemu monitor
> Apr 24 15:23:43 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]:
> 2018-04-24 15:23:43.307+0000: 28171: error : qemuMonitorIO:704 : internal
> error: End of file from qemu monitor
> Apr 24 15:23:48 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]:
> 2018-04-24 15:23:48.144+0000: 28171: error : qemuMonitorIO:704 : internal
> error: End of file from qemu monitor
> Apr 24 15:23:48 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]:
> 2018-04-24 15:23:48.166+0000: 28171: error : qemuMonitorIO:704 : internal
> error: End of file from qemu monitor
> Apr 24 15:23:56 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]:
> 2018-04-24 15:23:56.309+0000: 28174: error : virHashSearch:727 : Hash
> operation not allowed during iteration
> Apr 24 15:23:56 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]:
> 2018-04-24 15:23:56.309+0000: 28174: error : virHashSearch:727 : Hash
> operation not allowed during iteration
> Apr 24 15:23:56 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]:
> 2018-04-24 15:23:56.309+0000: 28174: error : virHashSearch:727 : Hash
> operation not allowed during iteration
> Apr 24 15:23:56 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]:
> 2018-04-24 15:23:56.309+0000: 28174: error : virHashSearch:727 : Hash
> operation not allowed during iteration
> Apr 24 15:23:56 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]:
> 2018-04-24 15:23:56.310+0000: 28175: error : virHashSearch:727 : Hash
> operation not allowed during iteration
> Apr 24 15:23:56 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]:
> 2018-04-24 15:23:56.310+0000: 28175: error : virHashSearch:727 : Hash
> operation not allowed during iteration
> Apr 24 15:23:56 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]:
> 2018-04-24 15:23:56.310+0000: 28175: error : virHashSearch:727 : Hash
> operation not allowed during iteration
> Apr 24 15:23:56 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]:
> 2018-04-24 15:23:56.310+0000: 28175: error : virHashSearch:727 : Hash
> operation not allowed during iteration
> Apr 24 15:23:56 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]:
> 2018-04-24 15:23:56.313+0000: 28174: error : virHashSearch:727 : Hash
> operation not allowed during iteration
> Apr 24 15:23:56 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]:
> 2018-04-24 15:23:56.313+0000: 28175: error : virHashSearch:727 : Hash
> operation not allowed during iteration
> Apr 24 15:23:56 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]:
> 2018-04-24 15:23:56.313+0000: 28174: error : virHashSearch:727 : Hash
> operation not allowed during iteration
> Apr 24 15:23:56 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]:
> 2018-04-24 15:23:56.313+0000: 28175: error : virHashSearch:727 : Hash
> operation not allowed during iteration
> Apr 24 15:23:56 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]:
> 2018-04-24 15:23:56.313+0000: 28174: error : virHashSearch:727 : Hash
> operation not allowed during iteration
> Apr 24 15:23:56 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]:
> 2018-04-24 15:23:56.313+0000: 28175: error : virHashSearch:727 : Hash
> operation not allowed during iteration
> Apr 24 15:23:56 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]:
> 2018-04-24 15:23:56.454+0000: 34996: error : virHashSearch:727 : Hash
> operation not allowed during iteration
> Apr 24 15:23:56 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]:
> 2018-04-24 15:23:56.607+0000: 28172: error : virHashSearch:727 : Hash
> operation not allowed during iteration
> Apr 24 15:23:56 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]:
> 2018-04-24 15:23:56.767+0000: 34996: error : virHashSearch:727 : Hash
> operation not allowed during iteration
> Apr 24 15:23:56 sealusa10.mobius.lab.eng.rdu2.redhat.com libvirtd[28171]:
> 2018-04-24 15:23:56.904+0000: 28172: error : virHashSearch:727 : Hash
> operation not allowed during iteration

What's the hypervisor's RHEL version in your case? Can it be related to RHEL7.5?

Comment 3 Julia Kreger 2018-04-25 13:40:28 UTC
Seems others have encountered this: https://storyboard.openstack.org/#!/story/2001798

Comment 4 Bob Fournier 2018-05-01 14:55:14 UTC
This is the same libvirt issue that is showing up elsewhere. A fix to add a retry to  virtualbmc is in progress (https://review.openstack.org/#/c/564878/). 

The signature is this problem turning on power to virtual nodes.
 ironic.conductor.manager [req-ce1043d9-0179-467b-88c5-c230edd5c46f b715c5845eb34c06b950712c5e012bfa fd53286d965846608ba8f3b15674ef5e - default default] Error in deploy of node 4104c21e-fcc4-4250-9f19-70f3156cb1f6: IPMI call failed: power on.: IPMIFailure: IPMI call failed: power on.

I'm marking this as a dup of the bug we are using to track this issue which is flagged as a blocker.

*** This bug has been marked as a duplicate of bug 1571384 ***


Note You need to log in before you can comment on or make changes to this bug.