+++ This bug was initially created as a clone of Bug #2084471 +++ Version: 4.10.11 $ openshift-install version >built from commit 08bc665c50ff867ffd81cfe8f485f2b7c501506b >release image quay.io/openshift-release-dev/ocp->release@sha256:0dc1a4b4d9ea7954987f63e506474a4f0dc55e5f1ea5c1f6f1179e2c09eaffda >release architecture amd64 Platform: baremetal Please specify: I believe this affects both of IPI and UPI, in this case Assisted Installer was used when this bug was noticed * IPI (automated install with `openshift-install`. If you don't know, then it's IPI) * UPI (semi-manual installation on customized infrastructure) What happened? Capital letters in install-config.yaml .platform.baremetal.hosts[].name caused bootkube errors bootkube stuck looping with the following error on repeat: >May 09 17:12:10 My-node-0 bootkube.sh[35010]: "99_openshift-cluster-api_hosts-4.yaml": failed to create baremetalhosts.v1alpha1.metal3.io/My-node-0 -n openshift-machine-api: BareMetalHost.metal3.io "My-node-0" is invalid: metadata.name: Invalid value: "My-node-0": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*') This happens because those install config names are used directly as BMH .metadata.name's, see [1] What did you expect to happen? Either one of: 1. The installer should transform the names to become valid k8s resource identifiers and make sure that there are no collisions post-transformation or 2. The installer should have failed at a much earlier stage, with clear validation errors on those names How to reproduce it (as minimally and precisely as possible)? Use capital letters in install config metal host names Anything else we need to know? - [1] https://github.com/openshift/installer/blob/8b3d14deea2e5360565b09173e6e448dff4883b5/pkg/asset/machines/baremetal/hosts.go#L96 --- Additional comment from W. Trevor King on 2022-05-12 19:11:33 UTC --- Trying to pin down the lowercase requirement [1]: The syntax of a legal Internet host name was specified in RFC-952 [DNS:4]. One aspect of host name syntax is hereby changed: the restriction on the first character is relaxed to allow either a letter or a digit. Host software MUST support this more liberal syntax. And [2]: No distinction is made between upper and lower case. And moving over to [3]: For all parts of the DNS that are part of the official protocol, all comparisons between character strings (e.g., labels, domain names, etc.) are done in a case-insensitive manner. At present, this rule is in force throughout the domain system without exception. However, future additions beyond current usage may need to use the full binary octet capabilities in names, so attempts to store domain names in 7-bit ASCII or use of special bytes to terminate labels, etc., should be avoided. When data enters the domain system, its original case should be preserved whenever possible. Rooting around upstream turned up [4] which replaced "RFC 1123" with "DNS-1123" (I don't understand why), pointing at [5] and claiming: #39635 was rejected because it wasn't clear to the author (me) that lower-case DNS labels are in fact a Kubernetes requirement rather than from the DNS RFC 1035 or/and DNS RFC 1123. Then [6] moved us to "a lowercase RFC 1123 subdomain", which I expect means "Kubernetes requires RFC 1123 compliance and extend that to also require lowercasing". So we should probably slot in k8s.io/apimachinery/pkg/util/validation's IsDNS1123Subdomain (which we already use [7]) for this field too, so we enforce both the RFC constrains and Kube's case extension. [1]: https://www.rfc-editor.org/rfc/rfc1123.html#section-2 [2]: https://www.rfc-editor.org/rfc/rfc952 [3]: https://www.rfc-editor.org/rfc/rfc1035#section-2.3.3 [4]: https://github.com/kubernetes/kubernetes/pull/39675 [5]: https://github.com/kubernetes/kubernetes/pull/39635#issuecomment-271404975 [6]: https://github.com/kubernetes/kubernetes/pull/94182 [7]: https://github.com/openshift/installer/blob/c7bd7993409f003bc0fb9105d7231253f546f1cd/pkg/validate/validate.go#L49
On 2.6.0-DOWNSTREAM-2022-08-15-19-04-09, I see the correct error message on the agents: { "lastTransitionTime": "2022-08-24T16:07:50Z", "message": "The Spec could not be synced due to an input error: Hostname does not pass required regex validation: ^[a-z0-9][a-z0-9-]{0,62}(?:[.][a-z0-9-]{1,63})*$. Hostname: test-Master-0-0", "reason": "InputError", "status": "False", "type": "SpecSynced" }, However, the AgentClusterInstall is in ready state: NAME CLUSTER STATE test-0 test-0 ready @cchun, is this expected behavior?
@trwest May I see the current status and state of the agents?
Found that this is expected behavior because the agent was falling back to a valid hostname provided by libvirt
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Advanced Cluster Management 2.6.0 security updates and bug fixes), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:6370