Bug 1494132
| Field | Value |
|---|---|
| Summary | dhcp-all-interfaces.sh fails due to delayed link detection during introspection |
| Product | Red Hat OpenStack |
| Reporter | Jaison Raju <jraju> |
| Component | diskimage-builder |
| Assignee | Bob Fournier <bfournie> |
| Status | CLOSED ERRATA |
| QA Contact | mlammon |
| Severity | urgent |
| Docs Contact | |
| Priority | high |
| Version | 10.0 (Newton) |
| CC | akaris, bfournie, dbecker, jraju, mburns, pablo.iranzo, slinaber |
| Target Milestone | z7 |
| Keywords | Triaged, ZStream |
| Target Release | 10.0 (Newton) |
| Hardware | All |
| OS | Linux |
| Whiteboard | |
| Fixed In Version | diskimage-builder-1.26.1-2.el7ost |
| Doc Type | Bug Fix |
| Story Points | --- |
| Clone Of | |
| Environment | |
| Last Closed | 2018-02-27 16:43:33 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | --- |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |
| Bug Depends On | |
| Bug Blocks | 1292691 |
| Attachments | |

Doc Text:
Cause: The algorithm that checks interface state on baremetal nodes did not have a proper retry mechanism.
Consequence: Under certain conditions, when the link goes up and down, the interfaces on baremetal nodes do not come up correctly and fail to get an IP address from DHCP. The following error can be seen in the logs: 'Invalid Argument'.
Fix: The retry mechanism was changed to ensure interfaces are brought up correctly.
Result: Interfaces come up and are assigned an IP address via DHCP.

Description
Jaison Raju
2017-09-21 14:07:02 UTC
The following patch fixed the issue, but I had to increase the retries to 35, as I noticed the link usually took 25-30 seconds to come up on ens255f0: https://review.openstack.org/#/c/419527/1/elements/dhcp-all-interfaces/install.d/dhcp-all-interfaces.sh

@Dmitry - yes, a backport is in progress - https://code.engineering.redhat.com/gerrit/#/c/118646/ I'm working with akaris to get more clarification on the long time (25-30 seconds) for the link to be detected. That would require another patch to the same code to increase the loop counter.

Created attachment 1329142: ipxe initialising devices
got there with: nova boot --flavor baremetal --nic net-id=<uuid> --image overcloud-full test

Created attachment 1329238: screenshots

Hi, The customer requested that https://review.openstack.org/#/c/419527/1/elements/dhcp-all-interfaces/install.d/dhcp-all-interfaces.sh PLUS an increased number of retries be included in their images and shipped as a fix. "but 20 retries didnt help . i noticed link up takes 25-35 sec." I don't know how realistic that is?

Thanks Andreas. The backport for the carrier check is in progress - https://code.engineering.redhat.com/gerrit/#/c/118646/ We'd prefer not to make the second change to increase the timeout, as this can have an effect on all deployments, especially if there are servers with unconnected NICs.

*** Bug 1320034 has been marked as a duplicate of this bug. ***

Moving this to POST as the fix has merged.

Per discussion at the GSS weekly meeting, the change to increase the time-out for NICs will not be made.

Pablo/Jaison - I don't think you need a hotfix to test out this change, as it's available here - https://errata.devel.redhat.com/advisory/32371/builds as part of the OSP-10z7 build. The pkg is diskimage-builder-1.26.1-2.el7ost.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0365
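
For readers following the retry discussion, the carrier check with a bounded retry loop can be sketched roughly as below. This is a minimal illustration and not the code shipped in dhcp-all-interfaces.sh: the function name `wait_for_carrier` and the variables `SYSFS_NET`, `MAX_TRIES` and `RETRY_DELAY` are hypothetical, and the sketch assumes that link state is read from the sysfs `carrier` file and that a failed read (for example while the interface is still down) should simply count as "no link yet".

```shell
#!/bin/bash
# Hypothetical sketch of a carrier check with retries.
# Not the shipped dhcp-all-interfaces.sh code; all names are illustrative.

SYSFS_NET="${SYSFS_NET:-/sys/class/net}"   # base directory holding per-interface entries
MAX_TRIES="${MAX_TRIES:-10}"               # number of polls before giving up
RETRY_DELAY="${RETRY_DELAY:-1}"            # seconds to sleep between polls

# Return 0 as soon as the interface reports carrier (file content "1"),
# or 1 if no carrier was seen after MAX_TRIES polls.
wait_for_carrier() {
    local iface="$1"
    local carrier_file="${SYSFS_NET}/${iface}/carrier"
    local tries=0
    local state

    while [ "$tries" -lt "$MAX_TRIES" ]; do
        # A failed read is treated the same as "no link yet" and retried.
        if state=$(cat "$carrier_file" 2>/dev/null) && [ "$state" = "1" ]; then
            return 0
        fi
        tries=$((tries + 1))
        sleep "$RETRY_DELAY"
    done
    return 1
}
```

In a sketch like this, covering the 25-35 second link-up time reported above would mean roughly `MAX_TRIES=35` with a one-second delay. The cost is that every unconnected NIC then waits the full period before being skipped, which is the concern raised in this bug against increasing the timeout for all deployments.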