| Summary: | Nodes are not being started at deploy time | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Amit Ugol <augol> |
| Component: | openstack-ironic | Assignee: | Lucas Alvares Gomes <lmartins> |
| Status: | CLOSED WORKSFORME | QA Contact: | Toure Dunnon <tdunnon> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.0 (Kilo) | CC: | augol, mburns, rhel-osp-director-maint, srevivo |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | 7.0 (Kilo) | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-09-06 15:00:52 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Attachments: | | | |

Looking at the logs, this seems to be related to the hypervisor:

2016-02-16 04:22:40.455 1230 DEBUG oslo_concurrency.processutils [-] Result was 1 ssh_execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:363
2016-02-16 04:22:40.456 1230 ERROR ironic.drivers.modules.ssh [-] Cannot execute SSH cmd LC_ALL=C /usr/bin/virsh --connect qemu:///system destroy baremetalbrbm_brbm1_2. Reason: Unexpected error while running command.
Command: LC_ALL=C /usr/bin/virsh --connect qemu:///system destroy baremetalbrbm_brbm1_2
Exit code: 1
Stdout: u'\n'
Stderr: u'2016-02-16 09:22:07.728+0000: 81748: info : libvirt version: 1.2.17, package: 13.el7_2.2 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2015-11-23-07:46:04, x86-019.build.eng.bos.redhat.com)\n2016-02-16 09:22:07.728+0000: 81748: warning : virKeepAliveTimerInternal:143 : No response from client 0x7f2d25f7cf40 after 6 keepalive messages in 35 seconds\n2016-02-16 09:22:07.728+0000: 81764: warning : virKeepAliveTimerInternal:143 : No response from client 0x7f2d25f7cf40 after 6 keepalive messages in 35 seconds\nerror: Failed to destroy domain baremetalbrbm_brbm1_2\nerror: internal error: received hangup / error event on socket\n'.
...

Could you verify whether you can start these VMs manually using virsh?

Things have been running more smoothly since. Also, the CI method has changed, so the error leading to this issue is being bypassed.

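
The manual check requested above could be scripted roughly as follows. This is only a sketch: the virsh path, the connection URI, and the LC_ALL=C setting are copied from the ironic log, while the helper names (virsh_cmd, run_virsh, try_start) and the 60-second timeout are assumptions made for illustration. It has to run on the hypervisor host, as a user allowed to talk to libvirt.

```python
import os
import subprocess

VIRSH = "/usr/bin/virsh"        # path as it appears in the ironic log
LIBVIRT_URI = "qemu:///system"  # connection URI as it appears in the ironic log


def virsh_cmd(*args):
    """Build the virsh argument list for the hypervisor connection."""
    return [VIRSH, "--connect", LIBVIRT_URI, *args]


def run_virsh(*args, timeout=60):
    """Run virsh with LC_ALL=C and return (exit code, stdout, stderr)."""
    env = dict(os.environ, LC_ALL="C")
    proc = subprocess.run(
        virsh_cmd(*args),
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        timeout=timeout,  # hypothetical limit so a hung libvirt call does not block forever
    )
    return proc.returncode, proc.stdout, proc.stderr


def try_start(domain):
    """Start a domain by hand and report the state libvirt gives for it."""
    code, _, err = run_virsh("start", domain)
    if code != 0:
        return "start failed: " + err.strip()
    _, state, _ = run_virsh("domstate", domain)
    return state.strip()
```

Calling try_start("baremetalbrbm_brbm1_2") would attempt to start the domain named in the log and return either the state reported by virsh domstate or the error text from virsh start.
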
Created attachment 1127557
ironic logs

Description of problem:
Redeploying a second time after heat stack-delete overcloud will sometimes fail to start the VMs, leaving the instances in the spawning state forever from nova's point of view. The deployment ultimately fails on timeout.

Version-Release number of selected component (if applicable):
ironic 2015.1.2-2

How reproducible:
50%

Steps to Reproduce:
1. Delete an overcloud deployment
2. Re-run the same deployment

Actual results:
VMs remain in power-off state

Expected results:
All needed VMs start

Additional info:
This is the status ~30 minutes after starting the deployment:

| UUID | Name | Instance UUID | Power State | Provision State | Maintenance |
|---|---|---|---|---|---|
| 57459d9b-6b74-4aef-9218-6678a76bb787 | None | b4174381-57af-43e4-9124-1f9f52f89741 | power on | active | False |
| a474aa51-d537-4829-8586-ada9c22e75c6 | None | None | power off | available | False |
| e83c125d-f8b5-4d06-9ace-0b491bafae1a | None | 8d529d51-5035-4671-afd0-594d64804cca | power off | deploying | False |
| 01921b20-f5e9-4298-a157-901cedfafdab | None | 0402aeeb-bbb9-4844-9eb9-c8dc93cd27ee | power on | active | False |
| 32d2b69c-79aa-403e-8bd8-888d76dfbf5e | None | 3b7757d8-332b-4d86-b343-4789b6d9050c | power off | deploying | False |
| 6689daae-ba1b-467e-bf48-833c4aec3dad | None | None | power off | available | False |
| e8f68851-e1cb-46ab-ae40-d2d52483b5fe | None | 51fe1ffc-e3d4-4fbd-a83e-2eae06a1b1a4 | power on | active | False |
| 3945694d-0e54-4211-8b3a-f65d3d0e5f47 | None | 73f87561-9262-46de-b2a6-4daaff6f34ee | power on | active | False |

Attached ironic logs.
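
To spot the affected nodes quickly in a listing like the one above, the table text can be filtered for nodes that are powered off while still in the deploying state. The following is a minimal sketch, not part of any OpenStack tooling: parse_node_list and stuck_nodes are hypothetical helper names, and SAMPLE reproduces three rows from the status output in this report.

```python
SAMPLE = """
+--------------------------------------+------+--------------------------------------+-------------+-----------------+-------------+
| UUID                                 | Name | Instance UUID                        | Power State | Provision State | Maintenance |
+--------------------------------------+------+--------------------------------------+-------------+-----------------+-------------+
| 57459d9b-6b74-4aef-9218-6678a76bb787 | None | b4174381-57af-43e4-9124-1f9f52f89741 | power on    | active          | False       |
| a474aa51-d537-4829-8586-ada9c22e75c6 | None | None                                 | power off   | available       | False       |
| e83c125d-f8b5-4d06-9ace-0b491bafae1a | None | 8d529d51-5035-4671-afd0-594d64804cca | power off   | deploying       | False       |
+--------------------------------------+------+--------------------------------------+-------------+-----------------+-------------+
"""


def parse_node_list(text):
    """Turn a pipe-delimited node table into a list of dicts keyed by column name."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ border lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if all(set(cell) <= set("-:") for cell in cells):
            continue  # skip a |---|---| separator row, if present
        if header is None:
            header = cells  # first pipe row holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def stuck_nodes(rows):
    """Return UUIDs of nodes that are powered off but still marked as deploying."""
    return [
        row["UUID"]
        for row in rows
        if row["Power State"] == "power off" and row["Provision State"] == "deploying"
    ]


print(stuck_nodes(parse_node_list(SAMPLE)))
```

Nodes that are powered off and available are not reported, since that is the normal state for a node with no instance assigned.
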