Description of problem:

During a recent cluster upgrade from v4.5.11 to v4.5.16, we hit a situation that appears very similar to BZ 1746968. We have also observed this at least once in the past on a separate cluster, again on v4.5. The upgrade got stuck because etcd could not be scheduled properly: multiple master nodes carried the label "kubernetes.io/hostname=localhost", which broke etcd's anti-affinity scheduling.

On further investigation of the affected nodes (multiple masters and workers), we discovered that /etc/hostname was a directory and that systemd-hostnamed was not running properly. As a remediation, we were able to follow the steps from BZ 1746968:

- rm /etc/hostname
- restart systemd-hostnamed
- restart the kubelet

We suspect that bug may be recurring, given the similarity of the symptoms.

Version-Release number of selected component (if applicable):
Upgrading from 4.5.11 to 4.5.16

How reproducible:
This problem has occurred in at least two separate cluster upgrades on v4.5.
Moving to the containers team, as the collector no longer mounts `/etc/hostname` per https://github.com/openshift/cluster-logging-operator/pull/239, associated with the bug in question. Additional confirmation of the mounts from the daemonset:

```
Mounts:
  /etc/docker from dockerdaemoncfg (ro)
  /etc/fluent/configs.d/secure-forward from secureforwardconfig (ro)
  /etc/fluent/configs.d/syslog from syslogconfig (ro)
  /etc/fluent/configs.d/user from config (ro)
  /etc/fluent/keys from certs (ro)
  /etc/fluent/metrics from collector-metrics (rw)
  /etc/localtime from localtime (ro)
  /etc/ocp-forward from secureforwardcerts (ro)
  /etc/ocp-syslog from syslogcerts (ro)
  /etc/pki/ca-trust/extracted/pem/ from fluentd-trusted-ca-bundle (ro)
  /etc/sysconfig/docker from dockercfg (ro)
  /opt/app-root/src/run.sh from entrypoint (ro,path="run.sh")
  /run/log/journal from runlogjournal (rw)
  /var/lib/docker from varlibdockercontainers (ro)
  /var/lib/fluentd from filebufferstorage (rw)
  /var/log from varlog (rw)
  /var/run/ocp-collector/secrets/fluentd from clo-default-output-es (rw)
  /var/run/secrets/kubernetes.io/serviceaccount from logcollector-token-k5vkk (ro)
```

and the volumes:

```
Volumes:
  runlogjournal:
    Type:          HostPath (bare host directory volume)
    Path:          /run/log/journal
    HostPathType:
  varlog:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log
    HostPathType:
  varlibdockercontainers:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/docker
    HostPathType:
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      fluentd
    Optional:  false
  secureforwardconfig:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      secure-forward
    Optional:  true
  secureforwardcerts:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  secure-forward
    Optional:    true
  syslogconfig:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      syslog
    Optional:  true
  syslogcerts:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  syslog
    Optional:    true
  entrypoint:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      fluentd
    Optional:  false
  certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  fluentd
    Optional:    false
  localtime:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/localtime
    HostPathType:
  dockercfg:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/sysconfig/docker
    HostPathType:
  dockerdaemoncfg:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/docker
    HostPathType:
  filebufferstorage:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/fluentd
    HostPathType:
  collector-metrics:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  fluentd-metrics
    Optional:    false
  clo-default-output-es:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  fluentd
    Optional:    false
  fluentd-trusted-ca-bundle:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      fluentd-trusted-ca-bundle
    Optional:  false
  logcollector-token-k5vkk:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  logcollector-token-k5vkk
    Optional:    false
```
oops wrong one
For docker compatibility, cri-o needs to create a directory at a volume's host path if it does not currently exist. I think the only way to fix this would be to have cri-o start after `systemd-hostnamed.service` (or to mandate that no container bind-mounts /etc/hostname, which is not as simple to do...)
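For illustration, that ordering could be expressed as a systemd drop-in for crio.service. This is only a sketch of the proposal, not a shipped change; the path and unit names are assumptions:

```
# /etc/systemd/system/crio.service.d/10-after-hostnamed.conf (hypothetical)
[Unit]
# Delay cri-o until systemd-hostnamed has had a chance to write /etc/hostname,
# so no container can bind-mount the path before the file exists.
After=systemd-hostnamed.service
Wants=systemd-hostnamed.service
```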
I grepped through the must-gather; 'etc/hostname' does not show up anywhere. I grepped through our pods and deployments from a recent AWS CI run; no occurrence there either.

I took a look at a master and a worker in 4.7. Neither of them even has /etc/hostname present, and everything is running fine. systemd-hostnamed was found not running; I restarted systemd-hostnamed, and it now stays running. Then I created /etc/hostname as an empty directory and restarted systemd-hostnamed, and that caused it to fail. I removed the directory, restarted, and it works again.

Unfortunately, I don't think ensuring systemd-hostnamed starts first will accomplish what we want. We need to ensure we put a file there so that a directory doesn't get created instead. I created an empty file; systemd-hostnamed didn't populate the file with anything, but it also didn't complain.
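The failure mode here can be demonstrated safely in a scratch directory: the "Is a directory" error systemd-hostnamed reports is just what any read of a directory produces (EISDIR). A minimal sketch, using a temp path instead of the real /etc/hostname:

```shell
#!/bin/sh
# Demonstrate the EISDIR failure mode in a scratch directory rather than
# against the real /etc/hostname (do NOT mkdir /etc/hostname on a live node).
scratch="$(mktemp -d)"
mkdir "$scratch/hostname"

# Reading a directory fails the same way systemd-hostnamed's read of
# /etc/hostname does when that path is a directory.
if cat "$scratch/hostname" >/dev/null 2>&1; then
    result="unexpected success"
else
    result="read failed (Is a directory)"
fi
echo "$result"

rmdir "$scratch/hostname"
rmdir "$scratch"
```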
This is what I see on master:

```
sh-4.4# cat /etc/hostname
ci-ln-c02zkjt-f76d1-nksxx-worker-b-bblgc
sh-4.4# ls -l /etc/hostname
-rw-r--r--. 1 root root 41 Mar  3 20:36 /etc/hostname
sh-4.4# systemctl status systemd-hostnamed
● systemd-hostnamed.service - Hostname Service
   Loaded: loaded (/usr/lib/systemd/system/systemd-hostnamed.service; static; vendor preset: disabled)
   Active: inactive (dead)
     Docs: man:systemd-hostnamed.service(8)
           man:hostname(5)
           man:machine-info(5)
           https://www.freedesktop.org/wiki/Software/systemd/hostnamed

Mar 03 20:36:43 ci-ln-c02zkjt-f76d1-nksxx-worker-b-bblgc.c.openshift-gce-devel-c systemd[1]: Starting Hostname Service...
Mar 03 20:36:43 ci-ln-c02zkjt-f76d1-nksxx-worker-b-bblgc.c.openshift-gce-devel-c systemd[1]: Started Hostname Service.
Mar 03 20:36:43 ci-ln-c02zkjt-f76d1-nksxx-worker-b-bblgc systemd-hostnamed[1310]: Changed static host name to 'ci-ln-c02zkjt-f76d1-nksxx-worker-b-bblgc'
Mar 03 20:37:13 ci-ln-c02zkjt-f76d1-nksxx-worker-b-bblgc systemd[1]: systemd-hostnamed.service: Succeeded.
Mar 03 20:37:13 ci-ln-c02zkjt-f76d1-nksxx-worker-b-bblgc systemd[1]: systemd-hostnamed.service: Consumed 270ms CPU time
sh-4.4# hostname
ci-ln-c02zkjt-f76d1-nksxx-worker-b-bblgc
sh-4.4# uname -a
Linux ci-ln-c02zkjt-f76d1-nksxx-worker-b-bblgc 4.18.0-240.15.1.el8_3.x86_64 #1 SMP Wed Feb 3 03:12:15 EST 2021 x86_64 x86_64 x86_64 GNU/Linux
sh-4.4# exit
exit
sh-4.4# exit
exit
```
Dug into the systemd code a bit; this is the function that is getting called: https://github.com/systemd/systemd/blob/v239/src/hostname/hostnamed.c#L443.
https://github.com/systemd/systemd/blob/v239/src/hostname/hostnamed.c#L488 writes to /etc/hostname
And we have a service here - https://github.com/openshift/machine-config-operator/blob/522f0fa36cc7b952c6e98e120c58c66e6d795544/templates/common/_base/files/usr-local-sbin-set-valid-hostname.yaml
https://github.com/openshift/machine-config-operator/blob/522f0fa36cc7b952c6e98e120c58c66e6d795544/templates/common/_base/files/usr-local-sbin-set-valid-hostname.yaml#L64 calls hostnamectl --static only if /etc/hostname exists. If instead, we always have it call hostnamectl --static in addition to the transient call then we will have a valid /etc/hostname. Anyone see any downsides to that?
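A dry-run sketch of that proposed change (hypothetical: the command collection harness and the `NAME` value are illustrative, and the real logic lives in the MCO's set-valid-hostname script linked above):

```shell
#!/bin/sh
# Sketch of the proposed dispatcher change: instead of calling
# `hostnamectl --static` only when /etc/hostname already exists, always call
# it alongside the transient call, so /etc/hostname is created as a regular
# file. Commands are collected rather than executed so this is safe to run.
NAME="worker-0"   # hypothetical hostname value
cmds=""
run() {
    cmds="${cmds}$*
"
}

run hostnamectl set-hostname --transient "$NAME"
# Proposed addition: unconditional static call, with no
# `[ -f /etc/hostname ]` guard in front of it.
run hostnamectl set-hostname --static "$NAME"

printf '%s' "$cmds"
```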
Did you chroot to /host before checking that file? It was present in the debug container, but not in the host's /etc/hostname. Here's my terminal from the time; I saw the same on a master and a worker:

```
$ ./node-debug.sh ip-10-0-152-85.us-west-2.compute.internal
Starting pod/ip-10-0-152-85us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.152.85
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# systemctl status systemd-hostnamed
● systemd-hostnamed.service - Hostname Service
   Loaded: loaded (/usr/lib/systemd/system/systemd-hostnamed.service; static; vendor preset: disabled)
   Active: inactive (dead)
     Docs: man:systemd-hostnamed.service(8)
           man:hostname(5)
           man:machine-info(5)
           https://www.freedesktop.org/wiki/Software/systemd/hostnamed

Mar 03 15:15:59 localhost systemd[1]: Starting Hostname Service...
Mar 03 15:15:59 localhost systemd[1]: Started Hostname Service.
Mar 03 15:15:59 ip-10-0-152-85 systemd-hostnamed[1283]: Changed host name to 'ip-10-0-152-85'
Mar 03 15:16:30 ip-10-0-152-85 systemd[1]: systemd-hostnamed.service: Succeeded.
Mar 03 15:16:30 ip-10-0-152-85 systemd[1]: systemd-hostnamed.service: Consumed 228ms CPU time
sh-4.4# ls /etc/host
host.conf  hosts      <---- this is tab complete output to terminal
sh-4.4# ls /etc/host
ls: cannot access '/etc/host': No such file or directory
sh-4.4# mount | grep /etc/host
tmpfs on /var/lib/containers/storage/overlay/bc2382db9bc5d480ba8c5596e0357a80235918339ea88378ee459d020e149db8/merged/etc/hostname type tmpfs (rw,nosuid,nodev,seclabel,mode=755)
# ^^^ this is the debug container
sh-4.4# mkdir /etc/hostname
sh-4.4# systemctl restart systemd-hostnamed
sh-4.4# systemctl status systemd-hostnamed
● systemd-hostnamed.service - Hostname Service
   Loaded: loaded (/usr/lib/systemd/system/systemd-hostnamed.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2021-03-03 21:52:59 UTC; 12s ago
     Docs: man:systemd-hostnamed.service(8)
           man:hostname(5)
           man:machine-info(5)
           https://www.freedesktop.org/wiki/Software/systemd/hostnamed
  Process: 264918 ExecStart=/usr/lib/systemd/systemd-hostnamed (code=exited, status=1/FAILURE)
 Main PID: 264918 (code=exited, status=1/FAILURE)
      CPU: 568ms

Mar 03 21:52:58 ip-10-0-152-85 systemd[1]: Starting Hostname Service...
Mar 03 21:52:59 ip-10-0-152-85 systemd-hostnamed[264918]: Failed to read hostname and machine information: Is a directory
Mar 03 21:52:59 ip-10-0-152-85 systemd[1]: Started Hostname Service.
Mar 03 21:52:59 ip-10-0-152-85 systemd[1]: systemd-hostnamed.service: Main process exited, code=exited, status=1/FAILURE
Mar 03 21:52:59 ip-10-0-152-85 systemd[1]: systemd-hostnamed.service: Failed with result 'exit-code'.
Mar 03 21:52:59 ip-10-0-152-85 systemd[1]: systemd-hostnamed.service: Consumed 568ms CPU time
sh-4.4# hostnamectl
Failed to query system properties: Message recipient disconnected from message bus without replying
sh-4.4# hostname
ip-10-0-152-85
sh-4.4# rmdir /etc/hostname
sh-4.4# systemctl restart systemd-hostnamed
sh-4.4# systemctl status systemd-hostnamed
● systemd-hostnamed.service - Hostname Service
   Loaded: loaded (/usr/lib/systemd/system/systemd-hostnamed.service; static; vendor preset: disabled)
   Active: active (running) since Wed 2021-03-03 21:55:27 UTC; 3s ago
     Docs: man:systemd-hostnamed.service(8)
           man:hostname(5)
           man:machine-info(5)
           https://www.freedesktop.org/wiki/Software/systemd/hostnamed
 Main PID: 266420 (systemd-hostnam)
    Tasks: 1 (limit: 100092)
   Memory: 2.4M
      CPU: 531ms
   CGroup: /system.slice/systemd-hostnamed.service
           └─266420 /usr/lib/systemd/systemd-hostnamed

Mar 03 21:55:27 ip-10-0-152-85 systemd[1]: Starting Hostname Service...
Mar 03 21:55:27 ip-10-0-152-85 systemd[1]: Started Hostname Service.
sh-4.4# ls /etc/hostname
ls: cannot access '/etc/hostname': No such file or directory
sh-4.4# touch /etc/hostname
sh-4.4# systemctl restart systemd-hostnamed
sh-4.4# systemctl status systemd-hostnamed
● systemd-hostnamed.service - Hostname Service
   Loaded: loaded (/usr/lib/systemd/system/systemd-hostnamed.service; static; vendor preset: disabled)
   Active: active (running) since Wed 2021-03-03 21:56:01 UTC; 2s ago
     Docs: man:systemd-hostnamed.service(8)
           man:hostname(5)
           man:machine-info(5)
           https://www.freedesktop.org/wiki/Software/systemd/hostnamed
 Main PID: 266792 (systemd-hostnam)
    Tasks: 1 (limit: 100092)
   Memory: 2.4M
      CPU: 589ms
   CGroup: /system.slice/systemd-hostnamed.service
           └─266792 /usr/lib/systemd/systemd-hostnamed

Mar 03 21:56:00 ip-10-0-152-85 systemd[1]: Starting Hostname Service...
Mar 03 21:56:01 ip-10-0-152-85 systemd[1]: Started Hostname Service.
sh-4.4# cat /etc/hostname
sh-4.4#
```
(In reply to Mrunal Patel from comment #13)
> https://github.com/openshift/machine-config-operator/blob/522f0fa36cc7b952c6e98e120c58c66e6d795544/templates/common/_base/files/usr-local-sbin-set-valid-hostname.yaml#L64
> calls hostnamectl --static only if /etc/hostname exists. If instead, we
> always have it call hostnamectl --static in addition to the transient call
> then we will have a valid /etc/hostname. Anyone see any downsides to that?

@behoward Since you did the implementation of this dispatcher script, could you weigh in on the suggestion?
Still working through a solution.
I need to check in with folks about the state of this. EOD Friday is probably not the best time to expect that to happen.
Basically, privileged containers shouldn't mount `/etc/hostname`. Instead, they should read `/proc/sys/kernel/hostname` to find the dynamic hostname value. I could imagine the container stack special-casing the /etc/hostname mount and not turning it into an empty directory, but host-side workarounds for this would involve *another* thing being in the hostname business, which we really don't want. So my recommendation is basically to close this WONTFIX.
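A container that needs the node's hostname can read the kernel's view directly instead of bind-mounting /etc/hostname. A minimal sketch:

```shell
#!/bin/sh
# Read the dynamic (kernel) hostname without touching /etc/hostname.
# /proc/sys/kernel/hostname is the utsname nodename, the same value
# `hostname` / `uname -n` report, and procfs is mounted in containers
# by default.
hostname_value="$(cat /proc/sys/kernel/hostname)"
echo "node hostname: $hostname_value"
```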
I talked with Mrunal about this, and we came up with a compromise. The problem is, we really shouldn't be bricking nodes if we can avoid it; however, we also don't want to add another entity bickering about the hostname. The solution is to special-case /etc/hostname in cri-o: if a container requests a bind mount of /etc/hostname and it does not exist on the host, we fail container/pod creation. That way, the user gets a helpful message explaining how they messed up, but it doesn't brick the node. They can either mount /proc/sys/kernel/hostname instead, or create /etc/hostname themselves to work around it.
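In shell terms, the check works roughly like this (an illustrative sketch only; the actual implementation is in cri-o's Go code, and the function name here is made up):

```shell
#!/bin/sh
# Illustrative sketch of the cri-o special case: for a configured list of
# mount sources, refuse container creation when the host path is absent,
# instead of silently creating a directory (the docker-compat behavior).
REJECT_SOURCES="/etc/hostname"

check_mount_source() {
    # $1 is the host path requested as a bind-mount source
    for rejected in $REJECT_SOURCES; do
        if [ "$1" = "$rejected" ] && [ ! -e "$1" ]; then
            echo "error: mount source $1 is absent on the host; refusing to create container" >&2
            return 1
        fi
    done
    return 0   # either not special-cased, or present on the host
}
```

The list ends up configurable in crio.conf as `absent_mount_sources_to_reject`, as the verification later in this bug shows.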
The attached PR, plus a PR in the MCO, will implement the above behavior.
4.8 version attached (will still need a MCO PR once this merges)
*** Bug 1960691 has been marked as a duplicate of this bug. ***
Anyone affected by this should be able to work around it with something like this: ``` $ for node in $(oc get nodes -o name); do (set -x; oc debug -q $node -- rmdir /host/etc/hostname); done ``` `rmdir` will (harmlessly and correctly) fail if it's not a directory, so only broken nodes should be changed.
Checked on 4.8.0-0.nightly-2021-06-10-224448. The crio conf now sets absent_mount_sources_to_reject = "/etc/hostname":

```
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-06-10-224448   True        False         3h27m   Cluster version is 4.8.0-0.nightly-2021-06-10-224448
```

```
sh-4.4# cat /etc/crio/crio.conf.d/00-default
[crio]
internal_wipe = true

[crio.api]
stream_address = ""
stream_port = "10010"

[crio.runtime]
conmon = "/usr/libexec/crio/conmon"
conmon_cgroup = "pod"
default_env = [
    "NSS_SDB_USE_CACHE=no",
]
log_level = "info"
cgroup_manager = "systemd"
default_sysctls = [
    "net.ipv4.ping_group_range=0 2147483647",
]
hooks_dir = [
    "/etc/containers/oci/hooks.d",
    "/run/containers/oci/hooks.d",
]
manage_ns_lifecycle = true
absent_mount_sources_to_reject = [
    "/etc/hostname",
]

[crio.image]
global_auth_file = "/var/lib/kubelet/config.json"
pause_image = "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d3dc6f2e6de660f92199ac81e9160f28518471fc04faffde6addc3f4998c8a88"
pause_image_auth_file = "/var/lib/kubelet/config.json"
pause_command = "/usr/bin/pod"

[crio.network]
network_dir = "/etc/kubernetes/cni/net.d/"
plugin_dirs = [
    "/var/lib/cni/bin",
    "/usr/libexec/cni",
]

[crio.metrics]
enable_metrics = true
metrics_port = 9537
```
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days