Bug 1843722 - Kubelet complains when truncated node name ends in a period.
Summary: Kubelet complains when truncated node name ends in a period.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.7.0
Assignee: Harshal Patil
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-06-03 22:54 UTC by Jeremiah Stuever
Modified: 2021-01-06 07:24 UTC
CC List: 9 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-23 14:33:16 UTC
Target Upstream Version:
Embargoed:



Description Jeremiah Stuever 2020-06-03 22:54:18 UTC
Description of problem:

When a node's hostname has a period as its 65th character, the hostname gets truncated to 65 characters and therefore ends in a period. This causes kubelet.service to complain that spec.nodeName in some pod manifests is not a valid DNS-1123 subdomain.
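For illustration only (not the installer's or kubelet's actual code), here is a minimal Go sketch of how a blind 65-character cap on a hostname can leave a trailing period that fails DNS-1123 subdomain validation. The hostname and domain suffix below are made up for the example:

```
package main

import (
	"fmt"
	"strings"

	"k8s.io/apimachinery/pkg/util/validation"
)

func main() {
	// Hypothetical instance name constructed so that the 65th character of
	// the FQDN is the "." separating it from the domain suffix.
	instance := strings.Repeat("x", 64)
	fqdn := instance + ".c.some-project.internal"

	// A naive length cap keeps the trailing period.
	name := fqdn[:65]

	// DNS-1123 subdomains must end with an alphanumeric character, so the
	// truncated name fails validation, which is what kubelet reports.
	if errs := validation.IsDNS1123Subdomain(name); len(errs) > 0 {
		fmt.Printf("%q is invalid: %v\n", name, errs)
	}
}
```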

How reproducible:

Always, when the circumstances are met.

Steps to Reproduce:
1. Generate a cluster name that will cause the bootstrap node hostname to have a period at the 65th character. (e.g. 'js-tu-jstuever-123456' in project openshift-dev-installer).
2. Deploy a GCP IPI cluster using this cluster name.

Actual results:

INFO Waiting up to 20m0s for the Kubernetes API at https://api.js-tu-jstuever-123456.installer.gcp.devcluster.openshift.com:6443...                                                                                                           
ERROR Attempted to gather ClusterOperator status after installation failure: listing ClusterOperator objects: the server could not find the requested resource (get clusteroperators.config.openshift.io) 
INFO Pulling debug logs from the bootstrap machine 
INFO Bootstrap gather logs captured here "/home/jstuever/gcp_upi/assetsipi/log-bundle-20200603152251.tar.gz" 
FATAL Bootstrap failed to complete: failed waiting for Kubernetes API: the server could not find the requested resource


bootstrap/journals/kubelet.log:

Jun 03 22:03:21 js-tu-jstuever-123456-pss9n-bootstrap.c.openshift-dev-installer. hyperkube[2239]: E0603 22:03:21.999500    2239 file.go:108] Unable to process watch event: can't process config file "/etc/kubernetes/manifests/etcd-member-pod.yaml": invalid pod: [metadata.name: Invalid value: "etcd-bootstrap-member-js-tu-jstuever-123456-pss9n-bootstrap.c.openshift-dev-installer.": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), spec.nodeName: Invalid value: "js-tu-jstuever-123456-pss9n-bootstrap.c.openshift-dev-installer.": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')]


Expected results:

Cluster bootstrap should be successful.

Additional info:

This was previously masked by a more strict truncation of the cluster name, which was relaxed in https://github.com/openshift/installer/pull/3544.

Comment 2 Abhinav Dahiya 2020-06-04 00:11:04 UTC
For static pods, the kubelet creates the pod name from the static pod manifest's `metadata.name` plus the node name. Since the maximum length of pod names is 63 characters, it truncates the combined name to that length. The truncation logic should be fixed to keep the name valid and prevent this error:

```
metadata.name: Invalid value: "etcd-bootstrap-member-js-tu-jstuever-123456-pss9n-bootstrap.c.openshift-dev-installer.": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
```
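Not the actual kubelet change, just a minimal sketch of a safer truncation along the lines described above, under the assumption that trailing '.' and '-' are the only characters that can be left dangling by the cut:

```
package main

import (
	"fmt"
	"strings"
)

// truncateToValidName is a hypothetical helper showing one way to cap a
// generated pod name without leaving a trailing "." or "-" that would
// violate DNS-1123.
func truncateToValidName(name string, maxLen int) string {
	if len(name) > maxLen {
		name = name[:maxLen]
	}
	// DNS-1123 names must end with an alphanumeric character, so strip any
	// separators left dangling by the cut.
	return strings.TrimRight(name, ".-")
}

func main() {
	// Constructed so that a blunt cut at 63 characters lands on a ".".
	name := "etcd-member-" + strings.Repeat("a", 50) + ".c.my-project.internal"
	fmt.Println(truncateToValidName(name, 63)) // trailing "." trimmed away
}
```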

Comment 5 Harshal Patil 2020-06-17 10:40:32 UTC
I will look into it in coming sprint.

Comment 6 Harshal Patil 2020-07-01 06:45:06 UTC
Moving this to the installer team to make sure a valid cluster name is generated that doesn't cause kubelet validation to fail. It might be worth taking a closer look at the effects of https://github.com/openshift/installer/pull/3544.

Comment 7 Abhinav Dahiya 2020-07-01 07:11:04 UTC
The installer is generating valid cluster names. The kubelet is appending the node name to the pod name and truncating it incorrectly when creating mirror pods for static pods.

Moving back to Node
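For context, a rough reconstruction of the naming scheme described above (the "-" join is an assumption for illustration, not kubelet source), showing how both invalid values from the bootstrap log arise:

```
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/validation"
)

func main() {
	// Values taken from the kubelet log in the description.
	manifestName := "etcd-bootstrap-member"
	nodeName := "js-tu-jstuever-123456-pss9n-bootstrap.c.openshift-dev-installer."

	podName := manifestName + "-" + nodeName
	fmt.Println(validation.IsDNS1123Subdomain(podName))  // metadata.name error: trailing "."
	fmt.Println(validation.IsDNS1123Subdomain(nodeName)) // spec.nodeName error: trailing "."
}
```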

Comment 18 To Hung Sze 2020-09-18 21:14:39 UTC
I used openshift-install-linux-4.6.0-0.nightly-2020-09-18-071428 with cluster name tszegcp91820192char, and the cluster failed to complete the install.
Bootstrap finished, but the install failed with this:
INFO Cluster operator ingress Progressing is True with Reconciling: Not all ingress controllers are available.
ERROR Cluster operator ingress Degraded is True with IngressControllersDegraded: Some ingresscontrollers are degraded: default
INFO Cluster operator insights Disabled is False with AsExpected:
INFO Cluster operator kube-storage-version-migrator Available is False with _NoMigratorPod: Available: deployment/migrator.openshift-kube-storage-version-migrator: no replicas are available
INFO Cluster operator monitoring Progressing is True with RollOutInProgress: Rolling out the stack.
ERROR Cluster operator monitoring Degraded is True with UpdatingAlertmanagerFailed: Failed to rollout the stack. Error: running task Updating Alertmanager failed: waiting for Alertmanager Route to become ready failed: waiting for route openshift-monitoring/alertmanager-main: no status available
INFO Cluster operator monitoring Available is False with :
FATAL failed to initialize the cluster: Some cluster operators are still updating: authentication, console, image-registry, ingress, kube-storage-version-migrator, monitoring

Worker's journalctl -u kubelet showed:
Sep 18 20:30:05 tszegcp91820192char-psf6h-worker-c-kk5mg.c.openshift-qe.internal hyperkube[20508]: E0918 20:30:05.496933   20508 kubelet_node_status.go:92] Unable to register node "tszegcp91820192char-psf6h-worker-c-kk5mg.c.openshift-qe.internal" with API server: Node "tszegcp91820192char-psf6h-worker-c-kk5mg.c.openshift-qe.internal" is invalid: metadata.labels: Invalid value: "tszegcp91820192char-psf6h-worker-c-kk5mg.c.openshift-qe.internal": must be no more than 63 characters
Sep 18 20:30:05 tszegcp91820192char-psf6h-worker-c-kk5mg.c.openshift-qe.internal hyperkube[20508]: E0918 20:30:05.582516   20508 kubelet.go:2190] node "tszegcp91820192char-psf6h-worker-c-kk5mg.c.openshift-qe.internal" not found
Sep 18 20:30:05 tszegcp91820192char-psf6h-worker-c-kk5mg.c.openshift-qe.internal hyperkube[20508]: E0918 20:30:05.682685   20508 kubelet.go:2190] node "tszegcp91820192char-psf6h-worker-c-kk5mg.c.openshift-qe.internal" not found
Sep 18 20:30:05 tszegcp91820192char-psf6h-worker-c-kk5mg.c.openshift-qe.internal hyperkube[20508]: E0918 20:30:05.782873   20508 kubelet.go:2190] node "tszegcp91820192char-psf6h-worker-c-kk5mg.c.openshift-qe.internal" not found

https://bugzilla.redhat.com/show_bug.cgi?id=1844613 indicates that the name itself is not a problem.
Hence I am adding the info here.
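The registration failure above is a different limit from the pod-name issue: it is the 63-character cap on label values. A small sketch (assuming the node name is carried in a label value, for example kubernetes.io/hostname) of the check that produces the "must be no more than 63 characters" error:

```
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/validation"
)

func main() {
	// Node name from the log above; it is longer than 63 characters, so it
	// is rejected as a label value.
	nodeName := "tszegcp91820192char-psf6h-worker-c-kk5mg.c.openshift-qe.internal"
	fmt.Println(validation.IsValidLabelValue(nodeName)) // "must be no more than 63 characters"
}
```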

Comment 19 To Hung Sze 2020-09-28 15:50:00 UTC
There is another defect about long names, in case it's related to this one:
https://bugzilla.redhat.com/show_bug.cgi?id=1872885

