Bug 1262950
Summary: | Race condition with NetworkManager on discovery image
---|---
Product: | Red Hat Satellite
Reporter: | David Critch <dcritch>
Component: | Discovery Image
Assignee: | Lukas Zapletal <lzap>
Status: | CLOSED ERRATA
QA Contact: | Sachin Ghai <sghai>
Severity: | high
Priority: | high
Version: | 6.1.1
CC: | bbuckingham, chrobert, dcritch, hartsjc, kdixon, lzap, meeveret, mmccune, sghai, sthirugn
Target Milestone: | Unspecified
Keywords: | Triaged
Target Release: | Unused
Hardware: | Unspecified
OS: | Unspecified
URL: | http://projects.theforeman.org/issues/12429
Doc Type: | Bug Fix
Last Closed: | 2016-07-27 11:06:43 UTC
Type: | Bug
Description
David Critch
2015-09-14 17:52:08 UTC
Created attachment 1080015
journal from discovery boot
This is a dump of journalctl from the host once booted in discovery mode. The first couple of times, the host fails to send to Foreman because of DNS issues.
After sending kill -HUP to the discovery process, you can see it register properly.
Discovery starts before DHCP is fully up and cannot resolve the Foreman URL at that point. Eventually the host is on the network and can resolve Foreman, but the discovery process never seems to learn that the host is resolvable until the discovery daemon is restarted.
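To make the failure mode concrete, here is a minimal sketch of the "wait until the Foreman host resolves before registering" idea. It is not the discovery image's actual code: the function name `wait_until_resolvable`, its parameters, and the example URL are all hypothetical. It also only retries inside one process, so if the running process holds stale resolver configuration (which the kill -HUP observation may point to), retrying alone might not be enough.

```python
import socket
import time
from urllib.parse import urlparse


def wait_until_resolvable(url, attempts=10, delay=3.0,
                          resolver=socket.getaddrinfo, sleep=time.sleep):
    """Return True once the host in `url` resolves, False after `attempts` tries.

    `resolver` and `sleep` are injectable so the loop can be exercised
    without a real network (hypothetical helper, for illustration only).
    """
    host = urlparse(url).hostname
    if host is None:
        raise ValueError(f"no hostname in URL: {url!r}")
    for attempt in range(1, attempts + 1):
        try:
            resolver(host, None)  # raises socket.gaierror while DNS is not usable
            return True
        except socket.gaierror:
            if attempt < attempts:
                sleep(delay)  # give DHCP/NetworkManager time to finish
    return False
```

A caller would run something like `wait_until_resolvable("https://foreman.example.com")` (placeholder URL) and only attempt registration once it returns True.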
Hello David, I can confirm we encountered this kind of behaviour and it has already been fixed upstream. We are planning a discovery erratum in one or two months that will rebase the image and include this fix as well. I can make you another build and upload it if you want. Just let me know on IRC.

With the latest build, the discovery-register service doesn't start. Attaching the latest journalctl output.

Created attachment 1104757
journal output from foreman-discovery-image-2.1.1-1.el7sat.noarch.rpm

Yes, Brad, there is a patch pending we need to include.

David, does the above link from comment 22 work? Anyway, the 6.1.5 erratum is out and it contains a completely rebased image. It won't be compatible with OSP anymore, but this bug was filed against Satellite 6, so use it.

We track one additional race condition which hasn't been merged upstream yet. I am attaching it to this BZ; the symptoms are similar (this time foreman-proxy is not started properly): http://projects.theforeman.org/issues/12429

Moving to POST since upstream bug http://projects.theforeman.org/issues/12429 has been closed.
-------------
Kamil Madac
Solution (workaround?) is to start foreman-proxy after NetworkManager-wait-online.service is ready. More in https://github.com/theforeman/foreman-discovery-image/pull/48
-------------
Kamil Madac
My fault. I forgot to replace both lines (Wants= and After=) in foreman-proxy.service (https://github.com/lzap/foreman-discovery-image/commit/c72e80902b4cd34c4b8369f3ec118b8ef7ac9bf6). Once I did, provisioning works as expected.
-------------
Lukas Zapletal
Great, can you please confirm in the PR itself that the build I made works as expected? Or at least show us the patch you made on your own build. Thanks. It's https://github.com/theforeman/foreman-discovery-image/pull/50
-------------
Anonymous
Applied in changeset commit:foreman-discovery-image|0c18ba2a6d04e5105db1e2085fe69f091b6922c7.

@Lukas, please provide verification steps. I assume the host should have multiple interfaces to reproduce this? Please advise.

QA steps: Simply verify that discovery works with one or multiple NICs in various environments. Also, if possible, simulate slow DHCP and verify that discovery starts correctly as well. You can simulate this by turning off the DHCP server on the network, waiting until the Welcome screen appears and then turning it back on. The background process should start the discovery request, and after a few seconds you should be able to refresh the screen. The status will likely be UNKNOWN - Use Refresh button to update info; this is expected.

Verified with sat6.2 beta snap8.2.

I discovered a host with two NICs and tried to simulate the slow DHCP as suggested in comment 29. However, I'm not able to reproduce the reported issue. The host is discovered successfully and I can see it in the web UI.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2016:1501
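
For anyone patching their own build: the thread notes that both Wants= and After= in foreman-proxy.service need to reference NetworkManager-wait-online.service, and that missing one of them left the race in place. The following is a rough sketch of a check for that. The helper name `ordering_gaps` and the sample unit text in the usage note are hypothetical, not the real unit file from the image. It handles only simple cases and ignores systemd details such as drop-in files and list resets via empty assignments.

```python
def ordering_gaps(unit_text, target="NetworkManager-wait-online.service"):
    """Return which of Wants=/After= in [Unit] do not name `target`.

    Hypothetical helper for illustration; an empty list means both
    directives reference the target unit.
    """
    found = {"Wants": set(), "After": set()}
    section = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section != "Unit" or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key in found:
            # directives may repeat and hold space-separated unit names
            found[key].update(value.split())
    return [key for key in ("Wants", "After") if target not in found[key]]
```

Run against a unit whose [Unit] section has `Wants=NetworkManager-wait-online.service` but only `After=network.target`, this would report `["After"]`, which is the half-applied state described in the thread.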