Bug 1466339
Summary: | Overcloud nodes hangs on deploy intermittently. After reboot it will sometimes be imaged | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Jeremy <jmelvin> | ||||
Component: | openstack-ironic | Assignee: | RHOS Maint <rhos-maint> | ||||
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | mlammon | ||||
Severity: | medium | Docs Contact: | |||||
Priority: | medium | ||||||
Version: | 10.0 (Newton) | CC: | bfournie, jmelvin, mburns, rhel-osp-director-maint, srevivo | ||||
Target Milestone: | --- | Keywords: | Unconfirmed | ||||
Target Release: | --- | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2017-09-28 15:19:21 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
This bugzilla has been removed from the release and needs to be reviewed and triaged for another Target Release.

In the IPA screenshot we see:

"ironic_python_agent.ironic_api_client ConnectionError: HTTPConnectionPool (host=45.12.159.21, port=6385)... Failed to establish a new connection: [Errno 101] Network is unreachable"

In the ironic_conductor.log there are a few node timeouts logged. I assume these are for the nodes that exhibited deploy failures.

2017-06-28 14:40:04.306 15646 ERROR ironic.conductor.utils [req-2aa38d75-31d9-4f6b-9bc7-86fdbedf8174 - - - - -] Timeout reached while waiting for callback for node d620b044-d848-42bf-86b3-e3ba98cec67d
2017-06-28 15:12:04.327 15646 ERROR ironic.conductor.utils [req-2aa38d75-31d9-4f6b-9bc7-86fdbedf8174 - - - - -] Timeout reached while waiting for callback for node d620b044-d848-42bf-86b3-e3ba98cec67d
2017-06-28 16:27:04.417 15646 ERROR ironic.conductor.utils [req-2aa38d75-31d9-4f6b-9bc7-86fdbedf8174 - - - - -] Timeout reached while waiting for callback for node 8e7c299b-7a52-4ca7-8101-bc3ab1c51ebe
2017-06-28 16:58:04.464 15646 ERROR ironic.conductor.utils [req-2aa38d75-31d9-4f6b-9bc7-86fdbedf8174 - - - - -] Timeout reached while waiting for callback for node 8e7c299b-7a52-4ca7-8101-bc3ab1c51ebe
2017-06-28 18:09:04.575 15646 ERROR ironic.conductor.utils [req-2aa38d75-31d9-4f6b-9bc7-86fdbedf8174 - - - - -] Timeout reached while waiting for callback for node 65660b90-60f8-45e1-8bb8-216d5c6211a0

It would be useful to get the logs in /var/log/ironic/deploy for the node when an error occurs, e.g. for 65660b90-60f8-45e1-8bb8-216d5c6211a0. I'm not sure if it's possible to retrieve them at this point.

Hi Jeremy - it looks like the related case is closed. There wasn't much info in the conductor log; we'd really need the ramdisk logs to figure out what is going on here. From the conductor logs it just looks like a temporary network issue. Is there any more info we can get to try and make progress? Thanks.
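As a rough aid for reading the conductor log, the callback timeouts quoted in this bug can be grouped by node so that repeat offenders stand out. This is only a sketch: the function name `timeouts_by_node` is made up for illustration, and the regular expression assumes the log line layout shown in this report (timestamp first, node UUID last).

```python
import re
from collections import defaultdict

# Matches the conductor's callback-timeout message as quoted in this bug.
# Assumption: the line starts with a "YYYY-MM-DD HH:MM:SS.mmm" timestamp.
TIMEOUT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) .*"
    r"Timeout reached while waiting for callback for node "
    r"(?P<node>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})"
)


def timeouts_by_node(lines):
    """Map node UUID -> list of timestamps at which a callback timeout was logged."""
    result = defaultdict(list)
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            result[match.group("node")].append(match.group("ts"))
    return dict(result)
```

Passing an open file handle for the conductor log as `lines` works, since the function only iterates over lines. Lines that do not match the pattern are ignored.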
Jeremy - closing this for now as the case is closed and we don't have much to go on besides the network instability in the screenshot. Please reopen if this occurs again; we will need to debug the PXE boot process. Thank you.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days
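If this recurs, one quick data point is whether the Ironic API endpoint from the screenshot (host 45.12.159.21, port 6385) is reachable over TCP from the provisioning network. The following is a minimal sketch of such a check; the helper name `can_reach` is hypothetical, and where it can be run from (for example another host on the same network rather than the deploy ramdisk itself) is an assumption that depends on the environment.

```python
import socket


def can_reach(host, port, timeout=5.0):
    """Try a plain TCP connection; return (True, None) or (False, error)."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:
        # Covers "Network is unreachable", "Connection refused" and timeouts.
        return False, exc
```

For example, `can_reach("45.12.159.21", 6385)` returns the underlying `OSError` on failure, which should distinguish an unreachable network from a refused connection or a timeout.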
Created attachment 1292877
ironic error on the overcloud node console.

Description of problem:
During deployment some nodes hang before they get the overcloud image. On the console we see ironic errors (screenshot attached). We also see errors in ironic-conductor.log:

2017-06-28 18:09:04.575 15646 ERROR ironic.conductor.utils [req-2aa38d75-31d9-4f6b-9bc7-86fdbedf8174 - - - - -] Timeout reached while waiting for callback for node 65660b90-60f8-45e1-8bb8-216d5c6211a0

Version-Release number of selected component (if applicable):
openstack-ironic-api-6.2.2-2.el7ost.noarch Wed Jun 14 01:23:25 2017
openstack-ironic-common-6.2.2-2.el7ost.noarch
openstack-ironic-conductor-6.2.2-2.el7ost.noarch Wed Jun 14 01:23:19 2017

How reproducible:
Intermittent

Steps to Reproduce:
1. Deploy.
2. Notice that some nodes hang.

Actual results:
Some nodes hang. We have to reset and re-PXE the node, and it will sometimes work.

Expected results:
Nodes always PXE boot and deploy the overcloud image.

Additional info: