Bug 1200596
Summary: | Bare metal overcloud hosts fail cloud-init metadata retrieval | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Dan Sneddon <dsneddon> |
Component: | instack | Assignee: | Jay Dobies <jason.dobies> |
Status: | CLOSED ERRATA | QA Contact: | Alexander Chuzhoy <sasha> |
Severity: | high | Docs Contact: | |
Priority: | urgent | ||
Version: | 7.0 (Kilo) | CC: | calfonso, dsneddon, emacchi, hbrock, jason.dobies, jdobies, mburns, ohochman, rhel-osp-director-maint |
Target Milestone: | ga | Keywords: | TestOnly, Triaged |
Target Release: | Director | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2015-08-05 13:50:51 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Dan Sneddon
2015-03-10 22:26:08 UTC
To confirm this bug, I tried adding this line to the /usr/bin/cloud-init script:

/usr/bin/cloud-init line 200:
    time.sleep(30)

This gave the network time to come up, and cloud-init functioned.

Maybe this patch is helping: https://review.openstack.org/#/c/163219/

(In reply to Emilien Macchi from comment #4)
> Maybe this patch is helping: https://review.openstack.org/#/c/163219/

I'm pretty sure that patch won't help. The linked patch is to a Puppet module, and in this case Puppet doesn't even run correctly because cloud-init failed.

I did a test which involved mounting the built disk images and making the one-line change at line 200 to introduce a sleep. This allowed cloud-init to get further, but it looks like the SSH keys were not transmitted, and Puppet still didn't run.

There is another patch that rlandy put in that helps, by making sure that em2 is available: https://review.gerrithub.io/#/c/218559/1/scripts/instack-build-images

I am trying this with a new build that includes that patch, and I'm going to see if cloud-init fails in the same way. If so, I'll add the sleep and try again.

cloud-init-0.7.6-2.el7 is available in Red Hat Common.

Ideally we would take advantage of systemd to make this work reliably. One option would be to insert a service that runs Before= cloud-init, e.g., something like:

[Unit]
Description=Wait for networking before starting cloud-init
Before=cloud-init-local.service cloud-init.service

[Service]
Type=oneshot
# perform some sort of network availability test
ExecStart=/bin/sh -c 'while ! curl -sf http://169.254.169.254/; do sleep 1; done'

[Install]
RequiredBy=cloud-init-local.service cloud-init.service

But this doesn't make sense as a general solution, because cloud-init does not necessarily require a network-accessible metadata service in order to operate.

Dan, is this still an issue? There are a few ideas to solve or work around the issue in the comments. Is there consensus at this point?
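
The network availability test in the unit above loops forever if the metadata service never answers. As a rough illustration only, the same probe can be written with a deadline so that boot is not blocked indefinitely. This is a minimal sketch, not anything cloud-init or instack ships: the function names, the 120-second deadline, and the injectable `probe`, `clock`, and `sleep` parameters are all assumptions made for the example. Only the metadata URL comes from the comment above.

```python
import time
import urllib.error
import urllib.request


def probe_http(url, timeout=2.0):
    """Return True if an HTTP GET to url succeeds with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, no route, timeout, or an HTTP error status.
        return False


def wait_for_metadata(url="http://169.254.169.254/", deadline=120.0,
                      interval=1.0, probe=probe_http,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll url until probe succeeds or deadline seconds have elapsed.

    Hypothetical helper: roughly mirrors the shell loop
    'while ! curl -sf URL; do sleep 1; done', but gives up after deadline.
    Returns True on success and False on timeout.
    """
    start = clock()
    while True:
        if probe(url):
            return True
        if clock() - start >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # Exit status 0 if the metadata service answered, 1 otherwise.
    raise SystemExit(0 if wait_for_metadata() else 1)
```

A script like this could stand in for the `ExecStart=` shell loop, though the objection raised above still applies: cloud-init does not always need a network-accessible metadata service.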
(In reply to chris alfonso from comment #9)
> Dan, is this still an issue? There are a few ideas to solve or work around
> the issue in the comments. Is there consensus at this point?

Chris,

This turned out to be a mostly cosmetic issue. The issue is that cloud-init starts immediately after network.target starts, but it takes a while for the network to actually activate and get a DHCP address. After cloud-init reports the errors, it continues to retry the connection, and it eventually works once the network is completely up.

The fix for this is to change the systemd startup file for cloud-init to depend on the network being online, not just on the network first being activated. This is done by making cloud-init depend on one of the following:

[Unit]
Wants=network-online.target
After=network-online.target

or it could be added to the dependencies on network.target that exist already:

[Unit]
Wants=network.target network-online.target
After=network.target network-online.target

I would still like to see this get addressed in a future fix, because it will generate needless support questions when people see the very scary failure.

(In reply to Lars Kellogg-Stedman from comment #8)
> Ideally we would take advantage of systemd to make this work reliably. One
> option would be to insert a service that runs Before= cloud-init, e.g.,
> something like:
> [...]

I tried inserting a service in between network.target and cloud-init, and it didn't make a difference. The actual fix is to make cloud-init depend on network-online.target:

[Unit]
Wants=network-online.target
After=network-online.target

Did this get fixed? Do we need a fix? Is this a blocker?

I don't think this ever got fixed, but since it's mainly a cosmetic bug I'm not sure that anyone cares enough to fix it.

Verified:

Environment:
instack-undercloud-2.1.2-21.el7ost.noarch

Able to deploy an overcloud.
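
One way to try the network-online.target dependency discussed above without editing the packaged unit file is a systemd drop-in. The sketch below only illustrates that idea and is not the change that shipped in the erratum: the drop-in file name `10-wait-online.conf` and the helper functions are hypothetical, and the default directory assumes the standard `/etc/systemd/system/<unit>.d/` drop-in location.

```python
from pathlib import Path

# Hypothetical file name; any *.conf name in the drop-in directory is read.
DROPIN_NAME = "10-wait-online.conf"


def render_dropin(target="network-online.target"):
    """Return drop-in text that orders a unit after the given target."""
    return "[Unit]\nWants={0}\nAfter={0}\n".format(target)


def write_dropin(unit="cloud-init.service", root="/etc/systemd/system"):
    """Write the drop-in under <root>/<unit>.d/ and return its path.

    Assumption: root is a directory systemd scans for unit overrides.
    Writing to the default location requires root privileges.
    """
    dropin_dir = Path(root) / (unit + ".d")
    dropin_dir.mkdir(parents=True, exist_ok=True)
    path = dropin_dir / DROPIN_NAME
    path.write_text(render_dropin())
    return path
```

After a drop-in is written, `systemctl daemon-reload` is needed for systemd to pick it up. Note also that network-online.target only delays startup if a wait-online service (for example NetworkManager-wait-online.service) is enabled on the image, so the effect would need to be checked on the overcloud hosts.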
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2015:1549