Bug 1857158
Summary: | OpenStack IPI bootstrap fails when cluster name contains period | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Michal Jurc <mjurc> |
Component: | Installer | Assignee: | Mike Fedosin <mfedosin> |
Installer sub component: | OpenShift on OpenStack | QA Contact: | David Sanz <dsanzmor> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | low | ||
Priority: | low | CC: | adahiya, bleanhar, egarcia, m.andre, mfedosin, michal_mazurek, pprinett |
Version: | 4.5 | ||
Target Milestone: | --- | ||
Target Release: | 4.6.0 | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause: OpenShift can't parse cluster names with the '.' symbol to generate keepalived configuration. All data after the first '.' is truncated.
Consequence: When users tried to provision clusters on OpenStack with '.' in their names, installation failed at bootstrap with a vague error message.
Fix: Forbid the '.' in cluster names for OpenStack platform.
Result: If the cluster name contains '.' users will see a correct error message on the initial stages of installation.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2020-10-27 16:14:38 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Michal Jurc
2020-07-15 09:47:52 UTC
This is not urgent and I believe the outcome will simply be to validate that cluster name doesn't contain a dot, so for now simply choose a cluster name without a dot as a workaround. Moving to openstack to fix the validation for openstack Is this really just an OpenStack problem, though? It feels like the validation should be common to all platforms. I suppose that's because the resulting hostname for the nodes wouldn't be valid. In which case, we should match against the [a-z0-9-]+ regex. Possibly related to https://github.com/openshift/installer/pull/3900 @Martin, yep, this is specific to OpenStack. By default in the installer it is allowed to have dots in the cluster name https://github.com/openshift/installer/blob/832a6b5d31641ee99501e4fb5b6bc9acf8188741/pkg/validate/validate_test.go#L29 Lowering the severity as there is an easy workaround. Postponing to an upcoming sprint. Could this be at least documented ASAP? The installer fails with very unhelpful message and none of the node logs really hint at what's going on, which makes the detection of the underlying cause very problematic for the end user. (In reply to Mike Fedosin from comment #6) > @Martin, yep, this is specific to OpenStack. By default in the installer it > is allowed to have dots in the cluster name > https://github.com/openshift/installer/blob/ > 832a6b5d31641ee99501e4fb5b6bc9acf8188741/pkg/validate/validate_test.go#L29 Assuming that the unit test actually reflect what the installer supports. Would be good to understand a bit better what is going on for OpenStack. Let's merge the OpenStack check and leave the issue open while we investigate the root cause. The problem seems to be related to the keepalived configuration file when the cluster name contains a dot. On the bootstrap node, the VRRP instance name correctly contains the cluster name, while it's truncated to the dot for master nodes, resulting in separate VRRP groups. On bootstrap node: VRRP_Instance(m.andre_API) Transition to MASTER STATE On master node: VRRP_Instance(m_API) Transition to MASTER STATE It turns out we pass a `--cluster-config` argument to runtimecfg render keepalived config file for for the bootstrap node and not for the master nodes. And it first tries to get the cluster name from cluster-config.yml, otherwise it calls the GetKubeconfigClusterNameAndDomain() function that splits on the dot. https://github.com/openshift/baremetal-runtimecfg/blob/d5fd996/pkg/config/node.go#L247-L255 If there is no reliable way of getting the cluster name, we should forbid the dot in the cluster name for all on-prem platforms, like we did for OpenStack in https://github.com/openshift/installer/pull/3934 *** Bug 1859290 has been marked as a duplicate of this bug. *** Verified on 4.6.0-0.nightly-2020-08-13-091737 08-13 14:20:12 level=fatal msg="failed to fetch Metadata: failed to load asset \"Install Config\": invalid \"install-config.yaml\" file: metadata.name: Invalid value: \"mrnd-134.6\": cluster name can't contain \".\" character" Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196 |