Bug 2090901 - Capital letters in install-config.yaml .platform.baremetal.hosts[].name cause bootkube errors
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Advanced Cluster Management for Kubernetes
Classification: Red Hat
Component: Infrastructure Operator
Version: rhacm-2.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: rhacm-2.6
Assignee: Crystal Chun
QA Contact: Chad Crum
Docs Contact: Derek
URL:
Whiteboard:
Depends On:
Blocks: 2090903
Reported: 2022-05-26 20:59 UTC by Crystal Chun
Modified: 2022-09-06 22:31 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2084471
Environment:
Last Closed: 2022-09-06 22:30:54 UTC
Target Upstream Version:
Embargoed:
cbynum: rhacm-2.6+
cbynum: rhacm-2.6.z+


Links
System ID Private Priority Status Summary Last Updated
Github openshift assisted-service pull 3821 0 None Merged Bug 2090901: Disallow capital letters from hostnames 2022-08-15 14:26:10 UTC
Github stolostron backlog issues 22796 0 None None None 2022-05-26 22:59:43 UTC
Red Hat Issue Tracker MGMTBUGSM-415 0 None None None 2022-06-06 08:48:56 UTC
Red Hat Product Errata RHSA-2022:6370 0 None None None 2022-09-06 22:31:11 UTC

Description Crystal Chun 2022-05-26 20:59:57 UTC
+++ This bug was initially created as a clone of Bug #2084471 +++

Version:
4.10.11

$ openshift-install version
>built from commit 08bc665c50ff867ffd81cfe8f485f2b7c501506b
>release image quay.io/openshift-release-dev/ocp-release@sha256:0dc1a4b4d9ea7954987f63e506474a4f0dc55e5f1ea5c1f6f1179e2c09eaffda
>release architecture amd64

Platform:
baremetal

Please specify:
I believe this affects both IPI and UPI. In this case, the bug was noticed while using the Assisted Installer.
* IPI (automated install with `openshift-install`. If you don't know, then it's IPI)
* UPI (semi-manual installation on customized infrastructure)

What happened?

Capital letters in install-config.yaml .platform.baremetal.hosts[].name caused bootkube errors

bootkube got stuck in a loop, repeating the following error:

>May 09 17:12:10 My-node-0 bootkube.sh[35010]: "99_openshift-cluster-api_hosts-4.yaml": failed to create baremetalhosts.v1alpha1.metal3.io/My-node-0 -n openshift-machine-api: BareMetalHost.metal3.io "My-node-0" is invalid: metadata.name: Invalid value: "My-node-0": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')

This happens because the host names from the install config are used directly as the BareMetalHost (BMH) .metadata.name values; see [1].

What did you expect to happen?

Either one of:

1. The installer should transform the names to become valid k8s resource identifiers and make sure that there are no collisions post-transformation
or
2. The installer should have failed at a much earlier stage, with clear validation errors on those names
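
Option 1 above can be sketched as follows. This is a minimal illustration in Python, not the installer's code (the installer is written in Go); the function name `normalize_host_names` is hypothetical. It lowercases each name and refuses to continue if two inputs collapse to the same result.

```python
from collections import defaultdict


def normalize_host_names(names):
    """Lowercase each host name; raise if two inputs collide afterwards.

    Hypothetical helper for illustration only, not part of the installer.
    """
    by_normalized = defaultdict(list)
    for name in names:
        by_normalized[name.lower()].append(name)

    # Any normalized name that came from more than one input is a collision.
    collisions = {
        normalized: originals
        for normalized, originals in by_normalized.items()
        if len(originals) > 1
    }
    if collisions:
        raise ValueError(f"host names collide after lowercasing: {collisions}")

    # Return the normalized names in the original input order.
    return [name.lower() for name in names]


print(normalize_host_names(["My-node-0", "my-node-1"]))
```

Lowercasing alone does not make every name valid (an underscore, for example, would still be rejected), so a transformation like this would still need a validation step afterwards.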

How to reproduce it (as minimally and precisely as possible)?

Use capital letters in the bare-metal host names (.platform.baremetal.hosts[].name) in install-config.yaml.

Anything else we need to know?
-

[1] https://github.com/openshift/installer/blob/8b3d14deea2e5360565b09173e6e448dff4883b5/pkg/asset/machines/baremetal/hosts.go#L96

--- Additional comment from W. Trevor King on 2022-05-12 19:11:33 UTC ---

Trying to pin down the lowercase requirement [1]:

  The syntax of a legal Internet host name was specified in RFC-952
  [DNS:4].  One aspect of host name syntax is hereby changed: the
  restriction on the first character is relaxed to allow either a
  letter or a digit.  Host software MUST support this more liberal
  syntax.

And [2]:

  No distinction is made between upper and lower case.

And moving over to [3]:

  For all parts of the DNS that are part of the official protocol, all
  comparisons between character strings (e.g., labels, domain names, etc.)
  are done in a case-insensitive manner.  At present, this rule is in
  force throughout the domain system without exception.  However, future
  additions beyond current usage may need to use the full binary octet
  capabilities in names, so attempts to store domain names in 7-bit ASCII
  or use of special bytes to terminate labels, etc., should be avoided.

  When data enters the domain system, its original case should be
  preserved whenever possible.

Rooting around upstream turned up [4] which replaced "RFC 1123" with "DNS-1123" (I don't understand why), pointing at [5] and claiming:

  #39635 was rejected because it wasn't clear to the author (me) that lower-case DNS labels are in fact a Kubernetes requirement rather than from the DNS RFC 1035 or/and DNS RFC 1123.

Then [6] moved us to "a lowercase RFC 1123 subdomain", which I expect means "Kubernetes requires RFC 1123 compliance and extends that to also require lowercasing".

So we should probably slot in k8s.io/apimachinery/pkg/util/validation's IsDNS1123Subdomain (which we already use [7]) for this field too, so we enforce both the RFC constraints and Kube's case extension.

[1]: https://www.rfc-editor.org/rfc/rfc1123.html#section-2
[2]: https://www.rfc-editor.org/rfc/rfc952
[3]: https://www.rfc-editor.org/rfc/rfc1035#section-2.3.3
[4]: https://github.com/kubernetes/kubernetes/pull/39675
[5]: https://github.com/kubernetes/kubernetes/pull/39635#issuecomment-271404975
[6]: https://github.com/kubernetes/kubernetes/pull/94182
[7]: https://github.com/openshift/installer/blob/c7bd7993409f003bc0fb9105d7231253f546f1cd/pkg/validate/validate.go#L49
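
The check proposed in the comment above can be approximated outside Go. The sketch below is a Python stand-in for IsDNS1123Subdomain, not the real function: it uses the regex quoted in the bootkube error message and assumes the 253-character limit Kubernetes applies to DNS-1123 subdomains. The function name and the error strings are illustrative.

```python
import re

# Assumed maximum length for a DNS-1123 subdomain in Kubernetes.
MAX_LEN = 253

# Same pattern as in the bootkube error message, built from one label.
LABEL = r"[a-z0-9]([-a-z0-9]*[a-z0-9])?"
SUBDOMAIN_RE = re.compile(rf"{LABEL}(\.{LABEL})*")


def dns1123_subdomain_errors(value):
    """Return a list of problems with value; an empty list means it is valid."""
    errors = []
    if len(value) > MAX_LEN:
        errors.append(f"must be no more than {MAX_LEN} characters")
    # fullmatch requires the whole string to fit the pattern.
    if not SUBDOMAIN_RE.fullmatch(value):
        errors.append("must be a lowercase RFC 1123 subdomain")
    return errors


for name in ["My-node-0", "my-node-0", "example.com"]:
    print(name, dns1123_subdomain_errors(name))
```

Running a check like this when the install config is loaded would surface the problem as an early validation error (option 2 in the description) instead of a bootkube loop.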

Comment 1 Trey West 2022-08-24 16:17:18 UTC
On 2.6.0-DOWNSTREAM-2022-08-15-19-04-09, I see the correct error message on the agents: 
{
    "lastTransitionTime": "2022-08-24T16:07:50Z",
    "message": "The Spec could not be synced due to an input error: Hostname does not pass required regex validation: ^[a-z0-9][a-z0-9-]{0,62}(?:[.][a-z0-9-]{1,63})*$. Hostname: test-Master-0-0",
    "reason": "InputError",
    "status": "False",
    "type": "SpecSynced"
},

However, the AgentClusterInstall is in ready state:
NAME     CLUSTER   STATE
test-0   test-0    ready

@cchun, is this expected behavior?
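
The regex quoted in the agent condition above can be exercised directly. This is a small Python sketch, assuming only the pattern copied from the message; the helper name `hostname_is_valid` is hypothetical and this is not the assisted-service implementation.

```python
import re

# Pattern copied verbatim from the SpecSynced condition message.
HOSTNAME_RE = re.compile(r"^[a-z0-9][a-z0-9-]{0,62}(?:[.][a-z0-9-]{1,63})*$")


def hostname_is_valid(hostname):
    """Return True if the whole hostname fits the quoted pattern."""
    return HOSTNAME_RE.fullmatch(hostname) is not None


for hostname in ["test-Master-0-0", "test-master-0-0"]:
    print(hostname, hostname_is_valid(hostname))
```

The capital "M" is the only thing that makes the first name fail; the lowercase variant passes.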

Comment 2 Crystal Chun 2022-08-25 20:47:38 UTC
@trwest May I see the current status and state of the agents?

Comment 3 Trey West 2022-08-25 22:07:06 UTC
I found that this is expected behavior: the agent was falling back to a valid hostname provided by libvirt.

Comment 6 errata-xmlrpc 2022-09-06 22:30:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Advanced Cluster Management 2.6.0 security updates and bug fixes), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:6370

