Bug 2084471

Summary: Capital letters in install-config.yaml .platform.baremetal.hosts[].name cause bootkube errors
Product: OpenShift Container Platform Reporter: Omer Tuchfeld <otuchfel>
Component: Installer Assignee: Tudor Domnescu <tdomnesc>
Installer sub component: OpenShift on Bare Metal IPI QA Contact: Jad Haj Yahya <jhajyahy>
Status: CLOSED ERRATA Docs Contact: Mike Pytlak <mpytlak>
Severity: low    
Priority: low CC: janders, lshilin, mpytlak, tdomnesc, wking
Version: 4.9 Keywords: Triaged
Target Milestone: ---   
Target Release: 4.12.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Previously, installing a cluster on bare metal failed silently, with bootkube stuck in an error loop, if a host name specified in the `install-config.yaml` file contained a capital letter. This update improves how the installation program handles this condition: it now exits early with a clear error message about the host name formatting requirements. (link:https://bugzilla.redhat.com/show_bug.cgi?id=2084471[*BZ#2084471*])
Story Points: ---
Clone Of:
: 2090901 (view as bug list) Environment:
Last Closed: 2023-01-17 19:48:48 UTC Type: Bug

Description Omer Tuchfeld 2022-05-12 08:28:00 UTC
Version:
4.10.11

$ openshift-install version
>built from commit 08bc665c50ff867ffd81cfe8f485f2b7c501506b
>release image quay.io/openshift-release-dev/ocp-release@sha256:0dc1a4b4d9ea7954987f63e506474a4f0dc55e5f1ea5c1f6f1179e2c09eaffda
>release architecture amd64

Platform:
baremetal

Please specify:
I believe this affects both IPI and UPI. In this case, Assisted Installer was in use when the bug was noticed.
* IPI (automated install with `openshift-install`. If you don't know, then it's IPI)
* UPI (semi-manual installation on customized infrastructure)

What happened?

Capital letters in install-config.yaml .platform.baremetal.hosts[].name caused bootkube errors

bootkube stuck looping with the following error on repeat:

>May 09 17:12:10 My-node-0 bootkube.sh[35010]: "99_openshift-cluster-api_hosts-4.yaml": failed to create baremetalhosts.v1alpha1.metal3.io/My-node-0 -n openshift-machine-api: BareMetalHost.metal3.io "My-node-0" is invalid: metadata.name: Invalid value: "My-node-0": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')

This happens because the host names from the install config are used directly as the BareMetalHost (BMH) .metadata.name values, see [1]

What did you expect to happen?

Either one of:

1. The installer should transform the names to become valid k8s resource identifiers and make sure that there are no collisions post-transformation
or
2. The installer should have failed at a much earlier stage, with clear validation errors on those names
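For illustration, option 1 could look roughly like the sketch below. It is not the installer's code: `normalizeHostNames` is a hypothetical helper, and it only handles letter case. Other invalid characters (for example underscores) would still need the validation described in option 2.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeHostNames lowercases each host name and returns an error if two
// inputs collapse to the same lowercased name.
// Hypothetical helper for illustration only; not part of the installer.
func normalizeHostNames(names []string) ([]string, error) {
	seen := make(map[string]string, len(names)) // lowercased name -> original name
	out := make([]string, 0, len(names))
	for _, name := range names {
		lower := strings.ToLower(name)
		if prev, ok := seen[lower]; ok {
			return nil, fmt.Errorf("host names %q and %q both normalize to %q", prev, name, lower)
		}
		seen[lower] = name
		out = append(out, lower)
	}
	return out, nil
}

func main() {
	// No collision: names are simply lowercased.
	names, err := normalizeHostNames([]string{"My-node-0", "my-node-1"})
	fmt.Println(names, err)

	// Collision: both inputs become "my-node-0", so an error is returned.
	_, err = normalizeHostNames([]string{"My-node-0", "my-node-0"})
	fmt.Println(err)
}
```

Silently renaming hosts may surprise users who later look for the name they typed, which is one argument for preferring option 2.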

How to reproduce it (as minimally and precisely as possible)?

Use capital letters in the bare metal host names (.platform.baremetal.hosts[].name) in install-config.yaml

Anything else we need to know?
-

[1] https://github.com/openshift/installer/blob/8b3d14deea2e5360565b09173e6e448dff4883b5/pkg/asset/machines/baremetal/hosts.go#L96

Comment 1 W. Trevor King 2022-05-12 19:11:33 UTC
Trying to pin down the lowercase requirement [1]:

  The syntax of a legal Internet host name was specified in RFC-952
  [DNS:4].  One aspect of host name syntax is hereby changed: the
  restriction on the first character is relaxed to allow either a
  letter or a digit.  Host software MUST support this more liberal
  syntax.

And [2]:

  No distinction is made between upper and lower case.

And moving over to [3]:

  For all parts of the DNS that are part of the official protocol, all
  comparisons between character strings (e.g., labels, domain names, etc.)
  are done in a case-insensitive manner.  At present, this rule is in
  force throughout the domain system without exception.  However, future
  additions beyond current usage may need to use the full binary octet
  capabilities in names, so attempts to store domain names in 7-bit ASCII
  or use of special bytes to terminate labels, etc., should be avoided.

  When data enters the domain system, its original case should be
  preserved whenever possible.

Rooting around upstream turned up [4] which replaced "RFC 1123" with "DNS-1123" (I don't understand why), pointing at [5] and claiming:

  #39635 was rejected because it wasn't clear to the author (me) that lower-case DNS labels are in fact a Kubernetes requirement rather than from the DNS RFC 1035 or/and DNS RFC 1123.

Then [6] moved us to "a lowercase RFC 1123 subdomain", which I expect means "Kubernetes requires RFC 1123 compliance and extends that to also require lowercasing".

So we should probably slot in k8s.io/apimachinery/pkg/util/validation's IsDNS1123Subdomain (which we already use [7]) for this field too, so we enforce both the RFC constraints and Kube's case extension.

[1]: https://www.rfc-editor.org/rfc/rfc1123.html#section-2
[2]: https://www.rfc-editor.org/rfc/rfc952
[3]: https://www.rfc-editor.org/rfc/rfc1035#section-2.3.3
[4]: https://github.com/kubernetes/kubernetes/pull/39675
[5]: https://github.com/kubernetes/kubernetes/pull/39635#issuecomment-271404975
[6]: https://github.com/kubernetes/kubernetes/pull/94182
[7]: https://github.com/openshift/installer/blob/c7bd7993409f003bc0fb9105d7231253f546f1cd/pkg/validate/validate.go#L49
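
A rough, dependency-free sketch of the early check proposed above. It assumes the regular expression quoted in the bootkube error message and a 253-character limit. `validateHostName` is a hypothetical stand-in that returns a list of problems in the same style as IsDNS1123Subdomain; a real fix would presumably call the apimachinery function directly rather than copy its pattern.

```go
package main

import (
	"fmt"
	"regexp"
)

// Pattern copied from the bootkube error message in the description,
// anchored so that the whole name must match.
var dns1123SubdomainRE = regexp.MustCompile(
	`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)

// maxSubdomainLength is the assumed upper bound on the name length.
const maxSubdomainLength = 253

// validateHostName returns a list of problems with name, or an empty list
// if the name is acceptable.
// Hypothetical stand-in for illustration; not the installer's code.
func validateHostName(name string) []string {
	var errs []string
	if len(name) > maxSubdomainLength {
		errs = append(errs, fmt.Sprintf("must be no more than %d characters", maxSubdomainLength))
	}
	if !dns1123SubdomainRE.MatchString(name) {
		errs = append(errs, "must consist of lower case alphanumeric characters, '-' or '.', "+
			"and must start and end with an alphanumeric character")
	}
	return errs
}

func main() {
	for _, name := range []string{"My-node-0", "my-node-0"} {
		if errs := validateHostName(name); len(errs) > 0 {
			fmt.Printf("%q is invalid: %v\n", name, errs)
			continue
		}
		fmt.Printf("%q is valid\n", name)
	}
}
```

Running such a check while the install config is validated would reject "My-node-0" before any manifests are generated, instead of leaving bootkube to loop on the API server's rejection.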

Comment 2 Jacob Anders 2022-06-07 05:23:34 UTC
Triage notes: I think this is a valid request. However, I'd expect most operators to use lowercase without thinking about it much, hence setting severity/urgency to low.

@wking: thanks heaps for thorough research on the topic and extensive references, it will be very useful while looking at a fix.

Comment 6 errata-xmlrpc 2023-01-17 19:48:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399