Bug 1732984
Summary: | GCP RHCOS image cannot accept hostnames greater than 64 characters | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Abhinav Dahiya <adahiya> |
Component: | RHCOS | Assignee: | Steve Milner <smilner> |
Status: | CLOSED WONTFIX | QA Contact: | Micah Abbott <miabbott> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 4.2.0 | CC: | bbreard, dustymabe, imcleod, jligon, lucab, nstielau, walters |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2019-07-26 12:49:02 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Abhinav Dahiya
2019-07-24 21:53:08 UTC
For reference, systemd-networkd does truncate the hostname to the first dot or to `HOST_NAME_MAX` (whichever comes first) when receiving an overlong one from DHCP: https://github.com/systemd/systemd/pull/7616

The hostname is invalid according to systemd (https://github.com/systemd/systemd/issues/3979#issuecomment-240887597), but we'll take a look.

> Based on https://github.com/coreos/bugs/issues/2273

That's a Container Linux bug...

> it seems like GKE doesn't have this problem and we probably should also be accepting this.

Which doesn't really have a direct relationship with GKE.

I don't think we should truncate; seems highly likely to cause problems with node identity, CSR signing etc. I think the installer PR to use shorter names is the right thing here short term. *Longer* term I think we should have a better concept of node identity than hostnames; basically the MCO/machineAPI would combine to own this, and the injected Ignition would include bits to control the hostname or so. That said, I am spinning up a GKE cluster right now to see what they do, out of curiosity.

> I don't think we should truncate; seems highly likely to cause problems with node identity, CSR signing etc.

Based on https://cloud.google.com/compute/docs/internal-dns#instance-fully-qualified-domain-names, the FQDN that RHCOS will receive for an instance on GCP is `[INSTANCE_NAME].[ZONE].c.[PROJECT_ID].internal`, so the parts other than the instance name will be of length 23 (longest zone name, northamerica-northeast1) + 1 (c) + 30 (max project ID) + 8 (internal) + 4 (dots) = 66, which is longer than the HOST_NAME_MAX of 64. So it looks like the hostname has to be truncated from the FQDN to the first DNS label for RHCOS to be supported on GCP.

> I don't think we should truncate; seems highly likely to cause problems with node identity, CSR signing etc.

Yes, we definitely need the kubelet to register with the FQDN of the instance, because the node name in the cluster currently needs to be resolvable inside the cluster,
but if we truncate, the kubelet on GCP will use `os.GetHostname` for the node name: https://github.com/kubernetes/kubernetes/blob/81684586dba9ee4d446c624e91d2a82346f022df/staging/src/k8s.io/legacy-cloud-providers/gce/gce_instances.go#L356-L360 So we might have to edit our kubelet service on GCP to use the `--hostname-override` flag so that the node name is registered as the FQDN of the instance.

I was curious to poke at the current state of GKE, so I just spun up a cluster there. (One side note: apparently `oc` can't work with GCE auth, and Fedora ships /usr/bin/kubectl -> oc, so I had to build upstream kubectl.)

I spun up a privileged pod and chrooted into the host, and the hostname is just:

gke-walters-test-default-pool-39fad701-fmbd

From what I can tell, that's from cloud-init, not DHCP.

Jul 25 16:05:49 gke-walters-test-default-pool-39fad701-fmbd cloud-init[1223]: [CLOUDINIT] url_helper.py[DEBUG]: [0/6] open 'http://metadata.google.internal/computeMetadata/v1/instance/hostname' with {'url': 'http://metadata.google.internal/computeMetadata/v1/instance/hostname', 'headers': {'X-Google-Metadata-Request': 'True'}, 'allow_redirects': True, 'method': 'GET'} configuration
Jul 25 16:05:49 gke-walters-test-default-pool-39fad701-fmbd cloud-init[1223]: [CLOUDINIT] url_helper.py[DEBUG]: Read from http://metadata.google.internal/computeMetadata/v1/instance/hostname (200, 74b) after 1 attempts

So, perhaps RHCOS should do the same? And here by "RHCOS" I really mean Afterburn https://github.com/coreos/afterburn/ which is doing a similar thing for Azure.

> So, perhaps RHCOS should do the same?

I'd rather not. The DHCP provides the hostname for the node, that's the authoritative source of truth. If we want to statically override the hostname, that's a legit customization, and Afterburn supports that (on GCP too, see `AFTERBURN_GCP_HOSTNAME` and `--hostname`).
However, by default we don't do that, as it would introduce a Two Generals' Problem regarding the source of truth for a machine's hostname, especially in case of failures or bugs in Afterburn or in the metadata service.

> which is doing a similar thing for Azure

It is not. On Azure, the DHCP does not provide the hostname for the node. As such, we are forced to hack around via Afterburn.

> It is not. On Azure, the DHCP does not provide the hostname for the node. As such, we are forced to hack around via Afterburn.
> ...
> I'd rather not. The DHCP provides the hostname for the node, that's the authoritative source of truth.

Right, fair enough!

RESOLVED => DHCP
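
For anyone reading this later, the length arithmetic and the truncation rule discussed in this thread can be sketched roughly as below. This is only an illustration, not systemd's or the kubelet's actual code: the helper names `gcp_fqdn` and `truncate_hostname` are hypothetical, the 30-character project ID is a made-up placeholder, and the zone string is the 23-character name quoted in the thread.

```python
# Illustrative sketch only; figures are the ones quoted in this bug.
HOST_NAME_MAX = 64  # Linux limit on hostname length


def gcp_fqdn(instance: str, zone: str, project: str) -> str:
    """Build an internal DNS name in the [INSTANCE_NAME].[ZONE].c.[PROJECT_ID].internal form."""
    return f"{instance}.{zone}.c.{project}.internal"


def truncate_hostname(name: str, limit: int = HOST_NAME_MAX) -> str:
    """Leave short names alone; cut overlong ones at the first dot or at `limit`,
    whichever comes first (the behaviour described for systemd-networkd above)."""
    if len(name) <= limit:
        return name
    first_label = name.split(".", 1)[0]
    return first_label[:limit]


zone = "northamerica-northeast1"  # 23 characters, as quoted in the thread
project = "p" * 30                # hypothetical maximum-length project ID
suffix = f".{zone}.c.{project}.internal"

# 23 (zone) + 1 (c) + 30 (project) + 8 (internal) + 4 (dots)
print(len(suffix))  # 66

fqdn = gcp_fqdn("gke-walters-test-default-pool-39fad701-fmbd", zone, project)
print(truncate_hostname(fqdn))  # gke-walters-test-default-pool-39fad701-fmbd
```

With these figures the suffix alone exceeds 64 characters, so no instance name can make the full FQDN fit, and truncating at the first dot leaves only the instance name as the hostname.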