Bug 1846093
Summary: | CEO chooses 1st host interface as bootstrap ip rather than one that belongs to machine network CIDR | | |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Rom Freiman <rfreiman> |
Component: | Etcd Operator | Assignee: | Dan Mace <dmace> |
Status: | CLOSED ERRATA | QA Contact: | ge liu <geliu> |
Severity: | high | | |
Priority: | high | | |
Version: | 4.5 | CC: | dhellmann, dsanzmor, eslutsky, scuppett, skolicha, steven.barre, ukalifon |
Target Release: | 4.6.0 | | |
Hardware: | All | | |
OS: | Linux | | |
Doc Type: | No Doc Update | | |
: | 1854402 | | |
Last Closed: | 2020-10-27 16:06:32 UTC | Type: | Bug |
Bug Blocks: | 1854402 | | |
Description
Rom Freiman
2020-06-10 18:20:47 UTC
This isn't a showstopper for 4.5.0 GA at this point. Setting target release to 4.6.0 (the current development branch). For fixes (if any) requested/required on prior versions, clones will be created targeting those z-stream releases as appropriate.

I'm adding UpcomingSprint because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

I tested the fix on libvirt with VMs that were connected to 2 networks. At first, I was able to discover the hosts and start an installation, but the installation hung in the middle and never completed. The bootstrap node was stuck at step 0/7 and the other nodes at step 4/7. I got the same results when I tried setting the VIP on the other network.

I then tried to reverse the order of the NICs. This time the nodes never even registered back in the service. I could see the following warnings repeating in the journal log on one of the masters:

Jun 26 15:03:17 master-1-1.****.redhat.com agent[1727]: WARNING: Unable to read board_asset_tag: open /sys/class/dmi/id/board_asset_tag: no such >
Jun 26 15:03:17 master-1-1.****.redhat.com agent[1727]: WARNING: Unable to read board_serial: open /sys/class/dmi/id/board_serial: no such file o>
Jun 26 15:03:17 master-1-1.****.redhat.com agent[1727]: WARNING: Unable to read board_vendor: open /sys/class/dmi/id/board_vendor: no such file o>
Jun 26 15:03:17 master-1-1.****.redhat.com agent[1727]: WARNING: Unable to read board_version: open /sys/class/dmi/id/board_version: no such file>
Jun 26 15:03:17 master-1-1.****.redhat.com agent[1727]: time="26-06-2020 15:03:17" level=warning msg="Could not find motherboard serial number" f>
Jun 26 15:03:17 master-1-1.****.redhat.com agent[1727]: time="26-06-2020 15:03:17" level=warning msg="Error registering host: Post http://assiste>

The latest fix got past the above error, but the cluster deployment still got stuck at a later stage. Need to test again on a clean environment.

*** Bug 1856336 has been marked as a duplicate of this bug. ***

https://bugzilla.redhat.com/show_bug.cgi?id=1856336 is a gap I missed in the fix for this, so I moved the bug back to POST and opened https://github.com/openshift/cluster-etcd-operator/pull/388 to hopefully complete the overall fix.

*** Bug 1856346 has been marked as a duplicate of this bug. ***

Closing this according to comment 12.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196
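
For readers trying to picture the behavior named in the summary, here is a minimal sketch of the difference between taking the first host address and taking the first address that belongs to a machine network CIDR. It is an illustration only, written in Python with the standard `ipaddress` module; it is not the cluster-etcd-operator code (which is written in Go), and the function name `pick_bootstrap_ip` and the sample addresses are hypothetical.

```python
import ipaddress
from typing import Iterable, Optional


def pick_bootstrap_ip(host_ips: Iterable[str],
                      machine_cidrs: Iterable[str]) -> Optional[str]:
    """Return the first host address inside any machine network CIDR.

    Hypothetical helper for illustration; returns None when no address
    matches, so the caller can fail loudly instead of guessing.
    """
    # strict=False tolerates CIDRs written with host bits set (e.g. 10.0.0.5/16).
    networks = [ipaddress.ip_network(cidr, strict=False) for cidr in machine_cidrs]
    for raw in host_ips:
        addr = ipaddress.ip_address(raw)
        # Compare only against networks of the same IP version.
        if any(addr.version == net.version and addr in net for net in networks):
            return raw
    return None


# A host attached to two networks, as in the libvirt test described above.
# The addresses are made up for this example.
host_ips = ["192.168.122.10", "10.0.0.5"]
machine_cidrs = ["10.0.0.0/16"]

naive_choice = host_ips[0]                                # first interface
cidr_choice = pick_bootstrap_ip(host_ips, machine_cidrs)  # address in the CIDR
```

With these sample values the naive choice is `192.168.122.10`, while the CIDR-aware choice is `10.0.0.5`. Reversing the order of `host_ips` changes the naive result but not the CIDR-aware one, which is why NIC ordering matters for the buggy behavior.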