Bug 1837124 - OpenShift Cluster fails to initialize on 4.3.z install due to a node with a hostname of localhost
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.3.z
Hardware: x86_64
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.3.z
Assignee: Ben Howard
QA Contact: Michael Nguyen
URL:
Whiteboard:
Duplicates: 1837125
Depends On: 1837122
Blocks:
Reported: 2020-05-18 21:27 UTC by OpenShift BugZilla Robot
Modified: 2020-06-17 20:28 UTC
CC List: 15 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-06-17 20:28:32 UTC
Target Upstream Version:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-api-provider-gcp pull 92 | 0 | None | closed | Bug 1837124: Add the machine's name as a known NodeInternalDNS | 2020-09-04 22:39:40 UTC
Github | openshift machine-config-operator pull 1738 | 0 | None | closed | Bug 1837124: templates: add etc-networkmanager-dispatcher.d-90-long-hostname.yaml | 2020-09-04 22:39:41 UTC
Red Hat Product Errata | RHBA-2020:2436 | 0 | None | None | None | 2020-06-17 20:28:49 UTC

Internal Links: 1843565

Comment 1 Micah Abbott 2020-05-19 15:59:38 UTC
*** Bug 1837125 has been marked as a duplicate of this bug. ***

Comment 5 Micah Abbott 2020-06-01 17:59:22 UTC
As with BZ#1809345, I didn't have access to an environment where I could create an extremely long project name, so I verified the fix using a single RHCOS node in GCP with an extremely long hostname. Because the fixed template is only delivered when the MCO pod is running on the host, I crafted an Ignition snippet containing the dispatcher script to simulate the fixed template being used.

Ignition config:

```
$ jq . < ignition-bz1837122.json 
{
  "ignition": {
    "version": "2.2.0"
  },
  "passwd": {
    "users": [
      {
        "groups": [
          "sudo",
          "wheel",
          "adm",
          "systemd-journal"
        ],
        "name": "core",
        "passwordHash": "$6$xyz....",
        "sshAuthorizedKeys": [
          "ssh-rsa AAAAB3...."
        ]
      }
    ]
  },
  "storage": {
    "files": [
      {
        "filesystem": "root",
        "path": "/etc/NetworkManager/dispatcher.d/90-long-hostname",
        "mode": 493,
        "contents": {
          "source": "data:;base64,IyEvYmluL2Jhc2gKIwojIE9uIEdvb2dsZSBDb21wdXRlIFBsYXRmb3JtIChHQ1ApIHRoZSBob3N0bmFtZSBtYXkgYmUgdG9vIGxvbmcgKD42MyBjaGFycykuCiMgRHVyaW5nIGZpcnN0Ym9vdCB0aGUgaG9zdG5hbWUgaXMgc2V0IGluIHRoZSBpbml0cmFtZnMgYmVmb3JlIE5ldHdvcmtNYW5hZ2VyCiMgcnVuczsgb24gcmVib290IGFmZmVjdCBub2RlcyB1c2UgJ2xvY2FsaG9zdCcuIFRoaXMgaG9vayBpcyBhIHNpbXBsZSB3b3JrCiMgYXJvdW5kOiBpZiB0aGUgaG9zdCBuYW1lIGlzIGxvbmdlciB0aGFuIDYzIGNoYXJhY3RlcnMsIHRoZW4gdGhlIGhvc3RuYW1lCiMgaXMgdHJ1bmNhdGVkIGF0IHRoZSBfZmlyc3RfIGRvdC4KIwojIEFkZGl0aW9uYWxseSwgdGhpcyBob29rIGRvZXMgbm90IGJyZWFrIEROUyBvciBjbHVzdGVyIEROUyByZXNvbHV0aW9uLAojIHNpbmNlIE5ldHdvcmtNYW5hZ2VyIHNldHMgdGhlIGFwcHJvcHJpYXRlIC9ldGMvcmVzb2x2LmNvbmYgc2V0dGluZ3MuCklGPSQxClNUQVRVUz0kMgpsb2coKSB7IGxvZ2dlciAtLXRhZyAibmV0d29yay1tYW5hZ2VyLyQoYmFzZW5hbWUgJDApIiAiJHtAfSI7IH0KIyBjYXB0dXJlIGFsbCBlbGlnaWJsZSBob3N0bmFtZXMKaWYgW1sgISAiJCgvYmluL2hvc3RuYW1lKSIgPX4gKGxvY2FsaG9zdHxsb2NhbGhvc3QubG9jYWwpIF1dOyB0aGVuCiAgICBsb2cgImhvc3RuYW1lIGlzIGFscmVhZHkgc2V0IgogICAgZXhpdCAwCmZpCmlmIFtbICEgIiRTVEFUVVMiID1+ICh1cHxob3N0bmFtZXxkaGNwNC1jaGFuZ2V8ZGhjcDYtY2hhbmdlKSBdXTsgdGhlbgogICAgZXhpdCAwCmZpCmRlZmF1bHRfaG9zdD0iJHtESENQNF9IT1NUX05BTUU6LSRESENQNl9IT1NUX05BTUV9IgojIHRydW5jYXRlIHRoZSBob3N0bmFtZSB0byB0aGUgZmlyc3QgZG90IGFuZCB0aGFuIDY0IGNoYXJhY3RlcnMuCmhvc3Q9JChlY2hvICR7ZGVmYXVsdF9ob3N0fSB8IGN1dCAtZjEgLWQnLicgfCBjdXQgLWMgLTYzKQppZiBbICIkeyNkZWZhdWx0X2hvc3R9IiAtZ3QgNjMgXTsgdGhlbgogICAgbG9nICJkaXNjb3ZlcmVkIGhvc3RuYW1lIGlzIGxvbmdlciB0aGFuIHRoYW4gNjMgY2hhcmFjdGVycyIKICAgIGxvZyAidHJ1bmNhdGluZyAke2RlZmF1bHRfaG9zdH0gPT4gJHtob3N0fSIKICAgIC9iaW4vaG9zdG5hbWVjdGwgLS10cmFuc2llbnQgc2V0LWhvc3RuYW1lICIke2hvc3R9IgpmaQo="
        }
      }
    ]
  }
}
```
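
For anyone who wants to read the dispatcher script without booting a node, the `contents.source` value is a base64 data URL and can be decoded locally. This is a minimal sketch; `decode_data_url` is a hypothetical helper name (not part of Ignition or jq), and it assumes the URL always uses the `data:;base64,` prefix shown in the config above.

```shell
#!/bin/bash
# Decode a "data:;base64,..." URL, as used in the Ignition
# "contents.source" field, and print the payload to stdout.
# decode_data_url is a hypothetical helper, not part of Ignition or jq.
decode_data_url() {
    local url="$1"
    # Strip the fixed prefix; everything after it is the base64 payload.
    local payload="${url#"data:;base64,"}"
    printf '%s' "$payload" | base64 --decode
}

# Self-check using the first 16 base64 characters of the source above,
# which decode to the script's shebang line.
decode_data_url 'data:;base64,IyEvYmluL2Jhc2gK'
```

To decode the whole script, pass the full value, for example `decode_data_url "$(jq -r '.storage.files[0].contents.source' ignition-bz1837122.json)"`. Note also that Ignition takes the file mode in decimal: 493 is octal 0755, which matches the `mode: 0755` in the MCO template.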

Instructions for booting a single RHCOS node:

```
$ gsutil cp rhcos-4.3.8-x86_64-gcp.x86_64.tar.gz gs://rhcos-devel/devel/rhcos/
$ gcloud compute images create rhcos-4-3-8 --project=openshift-rhcos-devel  --source-uri=https://storage.googleapis.com/rhcos-devel/devel/rhcos/rhcos-4.3.8-x86_64-gcp.x86_64.tar.gz
$ longname="rhcos-4-3-8-long-long-long-long-long-long-long-long-longgggg"
$ echo $longname | wc -c
61
$ gcloud compute --project=openshift-rhcos-devel instances create $longname --zone=us-central1-a --machine-type=n1-standard-1 --subnet=default --network-tier=PREMIUM --maintenance-policy=MIGRATE --service-account=439519768758-compute@developer.gserviceaccount.com --scopes=https://www.googleapis.com/auth/devstorage.read_only,https://www.googleapis.com/auth/logging.write,https://www.googleapis.com/auth/monitoring.write,https://www.googleapis.com/auth/servicecontrol,https://www.googleapis.com/auth/service.management.readonly,https://www.googleapis.com/auth/trace.append --image=rhcos-4-3-8 --boot-disk-size=16GB --boot-disk-type=pd-standard --boot-disk-device-name=rhcos-4-3-8 --image-project=openshift-rhcos-devel --metadata-from-file=user-data=/var/home/miabbott/Downloads/ignition-bz1837122.json
```
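
A quick length check shows why this instance name triggers the problem. The sketch below assumes GCP hands out an internal FQDN of the form `<instance>.c.<project>.internal`; that form is an assumption on my part (it is consistent with the `.c.o` suffix in the truncated hostnames logged in this comment, but it is not stated in this bug). `fqdn_length` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: length of the internal FQDN for an instance,
# ASSUMING the form <instance>.c.<project>.internal.
fqdn_length() {
    local fqdn="$1.c.$2.internal"
    printf '%s\n' "${#fqdn}"
}

longname="rhcos-4-3-8-long-long-long-long-long-long-long-long-longgggg"
# ${#var} counts characters only; `echo | wc -c` also counts the newline.
printf 'instance name: %s chars\n' "${#longname}"
printf 'assumed FQDN:  %s chars\n' "$(fqdn_length "$longname" openshift-rhcos-devel)"
```

The instance name itself is 60 characters (the 61 reported by `wc -c` includes the trailing newline), so it fits within the 64-character Linux hostname limit, while the assumed FQDN comes to 93 characters and does not.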

The error is observed with the 4.3.8 image when the NM dispatcher script is not present. The hostname is correct on first boot, but after a reboot it reverts to localhost:

```
$ sshq -l core 34.70.30.2
Warning: Permanently added '34.70.30.2' (ECDSA) to the list of known hosts.
Red Hat Enterprise Linux CoreOS 43.81.202003191953.0
  Part of OpenShift 4.3, RHCOS is a Kubernetes native operating system
  managed by the Machine Config Operator (`clusteroperator/machine-config`).

WARNING: Direct SSH access to machines is not recommended; instead,
make configuration changes via `machineconfig` objects:
  https://docs.openshift.com/container-platform/4.3/architecture/architecture-rhcos.html

---
[core@rhcos-4-3-8-long-long-long-long-long-long-long-long-longgggg ~]$ hostname
rhcos-4-3-8-long-long-long-long-long-long-long-long-longgggg.c.o
[core@rhcos-4-3-8-long-long-long-long-long-long-long-long-longgggg ~]$ hostnamectl
   Static hostname: n/a
Transient hostname: rhcos-4-3-8-long-long-long-long-long-long-long-long-longgggg.c.o
         Icon name: computer-vm
           Chassis: vm
        Machine ID: d80d3e4ceb6aa4c389d1da421851a772
           Boot ID: 4f2f24307650450d9b1c5099023beb38
    Virtualization: kvm
  Operating System: Red Hat Enterprise Linux CoreOS 43.81.202003191953.0 (Ootpa)
            Kernel: Linux 4.18.0-147.5.1.el8_1.x86_64
      Architecture: x86-64
[core@rhcos-4-3-8-long-long-long-long-long-long-long-long-longgggg ~]$ ls /etc/NetworkManager/dispatcher.d/90-long-hostname
ls: cannot access '/etc/NetworkManager/dispatcher.d/90-long-hostname': No such file or directory
[core@rhcos-4-3-8-long-long-long-long-long-long-long-long-longgggg ~]$ sudo systemctl reboot
[core@rhcos-4-3-8-long-long-long-long-long-long-long-long-longgggg ~]$ Connection to 34.70.30.2 closed by remote host.
Connection to 34.70.30.2 closed.

$ sshq -l core 34.70.30.2
Warning: Permanently added '34.70.30.2' (ECDSA) to the list of known hosts.
Red Hat Enterprise Linux CoreOS 43.81.202003191953.0
  Part of OpenShift 4.3, RHCOS is a Kubernetes native operating system
  managed by the Machine Config Operator (`clusteroperator/machine-config`).

WARNING: Direct SSH access to machines is not recommended; instead,
make configuration changes via `machineconfig` objects:
  https://docs.openshift.com/container-platform/4.3/architecture/architecture-rhcos.html

---
Last login: Mon Jun  1 17:41:50 2020 from 108.49.19.84
[core@localhost ~]$ hostname
localhost
[core@localhost ~]$ hostnamectl
   Static hostname: n/a
Transient hostname: localhost
         Icon name: computer-vm
           Chassis: vm
        Machine ID: d80d3e4ceb6aa4c389d1da421851a772
           Boot ID: 8eb2fc06f49749dd8b3fc55a3eb2c10e
    Virtualization: kvm
  Operating System: Red Hat Enterprise Linux CoreOS 43.81.202003191953.0 (Ootpa)
            Kernel: Linux 4.18.0-147.5.1.el8_1.x86_64
      Architecture: x86-64
```

If the host is rebooted with the NM dispatcher script in place, the hostname is truncated and set correctly:

```
$ sshq -l core 34.70.30.2
Warning: Permanently added '34.70.30.2' (ECDSA) to the list of known hosts.
Red Hat Enterprise Linux CoreOS 43.81.202003191953.0
  Part of OpenShift 4.3, RHCOS is a Kubernetes native operating system
  managed by the Machine Config Operator (`clusteroperator/machine-config`).

WARNING: Direct SSH access to machines is not recommended; instead,
make configuration changes via `machineconfig` objects:
  https://docs.openshift.com/container-platform/4.3/architecture/architecture-rhcos.html

---
[core@rhcos-4-3-8-long-long-long-long-long-long-long-long-longgggg ~]$ hostname
rhcos-4-3-8-long-long-long-long-long-long-long-long-longgggg.c.o
[core@rhcos-4-3-8-long-long-long-long-long-long-long-long-longgggg ~]$ hostnamectl
   Static hostname: n/a
Transient hostname: rhcos-4-3-8-long-long-long-long-long-long-long-long-longgggg.c.o
         Icon name: computer-vm
           Chassis: vm
        Machine ID: d80d3e4ceb6aa4c389d1da421851a772
           Boot ID: 1db4e4993a9b454ca1f45f9c291ae869
    Virtualization: kvm
  Operating System: Red Hat Enterprise Linux CoreOS 43.81.202003191953.0 (Ootpa)
            Kernel: Linux 4.18.0-147.5.1.el8_1.x86_64
      Architecture: x86-64
[core@rhcos-4-3-8-long-long-long-long-long-long-long-long-longgggg ~]$ rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
● ostree://624bfc39e8091d69b3c48bb16e85683ff5166dc8b03ac0686753d8e555613b54
                   Version: 43.81.202003191953.0 (2020-03-19T19:59:17Z)
[core@rhcos-4-3-8-long-long-long-long-long-long-long-long-longgggg ~]$ ls /etc/NetworkManager/dispatcher.d/90-long-hostname
/etc/NetworkManager/dispatcher.d/90-long-hostname
[core@rhcos-4-3-8-long-long-long-long-long-long-long-long-longgggg ~]$ head /etc/NetworkManager/dispatcher.d/90-long-hostname
==> /etc/NetworkManager/dispatcher.d/90-long-hostname <==
#!/bin/bash
#
# On Google Compute Platform (GCP) the hostname may be too long (>63 chars).
# During firstboot the hostname is set in the initramfs before NetworkManager
# runs; on reboot affect nodes use 'localhost'. This hook is a simple work
# around: if the host name is longer than 63 characters, then the hostname
# is truncated at the _first_ dot.
#
# Additionally, this hook does not break DNS or cluster DNS resolution, 
# since NetworkManager sets the appropriate /etc/resolv.conf settings.
[core@rhcos-4-3-8-long-long-long-long-long-long-long-long-longgggg ~]$ sudo systemctl reboot
[core@rhcos-4-3-8-long-long-long-long-long-long-long-long-longgggg ~]$ Connection to 34.70.30.2 closed by remote host.
Connection to 34.70.30.2 closed.

$ sshq -l core 34.70.30.2
Warning: Permanently added '34.70.30.2' (ECDSA) to the list of known hosts.
Red Hat Enterprise Linux CoreOS 43.81.202003191953.0
  Part of OpenShift 4.3, RHCOS is a Kubernetes native operating system
  managed by the Machine Config Operator (`clusteroperator/machine-config`).

WARNING: Direct SSH access to machines is not recommended; instead,
make configuration changes via `machineconfig` objects:
  https://docs.openshift.com/container-platform/4.3/architecture/architecture-rhcos.html

---
Last login: Mon Jun  1 17:24:28 2020 from 108.49.19.84
[core@rhcos-4-3-8-long-long-long-long-long-long-long-long-longgggg ~]$ hostname
rhcos-4-3-8-long-long-long-long-long-long-long-long-longgggg
[core@rhcos-4-3-8-long-long-long-long-long-long-long-long-longgggg ~]$ hostnamectl
   Static hostname: n/a
Transient hostname: rhcos-4-3-8-long-long-long-long-long-long-long-long-longgggg
         Icon name: computer-vm
           Chassis: vm
        Machine ID: d80d3e4ceb6aa4c389d1da421851a772
           Boot ID: 23a71fb3813e4680b65abe7ad9f2c009
    Virtualization: kvm
  Operating System: Red Hat Enterprise Linux CoreOS 43.81.202003191953.0 (Ootpa)
            Kernel: Linux 4.18.0-147.5.1.el8_1.x86_64
      Architecture: x86-64
```
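
The truncation seen after the reboot can be reproduced offline. The following is a minimal sketch modeled on the `cut` pipeline in the dispatcher script as I read it from the base64 payload above. `truncate_hostname` is a hypothetical wrapper, the FQDN in the usage note is an assumed example, and the real script additionally checks the current hostname and the NetworkManager event before calling `hostnamectl --transient set-hostname`.

```shell
#!/bin/bash
# Hypothetical wrapper around the truncation step of 90-long-hostname.
# Prints the name the hook would set for a given DHCP-provided hostname.
truncate_hostname() {
    local default_host="$1"
    local host
    # Keep everything before the first dot, then cap at 63 characters.
    host=$(printf '%s\n' "$default_host" | cut -f1 -d'.' | cut -c -63)
    if [ "${#default_host}" -gt 63 ]; then
        printf '%s\n' "$host"
    else
        # The real script takes no action here; this sketch echoes the
        # input unchanged so the function always prints something.
        printf '%s\n' "$default_host"
    fi
}
```

For example, `truncate_hostname "rhcos-4-3-8-long-long-long-long-long-long-long-long-longgggg.c.openshift-rhcos-devel.internal"` prints the bare instance name, which matches the `hostname` output after the reboot above.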

Unpacking the MCO image for the 4.3 nightly shows the NM dispatcher script in the templates:

```
$ oc image info --output json $(oc adm release info -a ~/openshift-cluster-installs/all-the-pull-secrets.json --image-for=machine-config-operator registry.svc.ci.openshift.org/ocp/release:4.3.0-0.nightly-2020-06-01-043839) | jq .name
"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:91dfc55028ca96f2664f28a07a64218025f7cb991c6e5df3e0e3dc02c9fa6d0d"

$ sudo podman pull --authfile ~/openshift-cluster-installs/all-the-pull-secrets.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:91dfc55028ca96f2664f28a07a64218025f7cb991c6e5df3e0e3dc02c9fa6d0d
Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:91dfc55028ca96f2664f28a07a64218025f7cb991c6e5df3e0e3dc02c9fa6d0d...
Getting image source signatures
Copying blob 82a8f4ea76cb done  
Copying blob 0d3fd00c1dbb done  
Copying blob a3ac36470b00 done  
Copying blob 838ac44155c1 done  
Copying blob 9f7926a96396 done  
Copying config 53a7d6ab76 done  
Writing manifest to image destination
Storing signatures
53a7d6ab763ecf4d2ec02487614ebb0ccc516bdb2eb6234b99f0b9609aa86b20
[miabbott@mastershake ~/redhat-coreos ]$ ctr=$(sudo podman create 53a7d6ab763ecf4d2ec02487614ebb0ccc516bdb2eb6234b99f0b9609aa86b20)
[miabbott@mastershake ~/redhat-coreos ]$ mnt=$(sudo podman mount $ctr)
[miabbott@mastershake ~/redhat-coreos ]$ sudo ls $mnt/etc/mcc/templates/common/_base/files/
additional-trust-bundle.yaml  container-storage.yaml                                      etc-systemd-system-kubelet.service.d-10-default-env.conf.yaml                     kubelet-ca.yaml     root-ca.yaml              volume-plugins.yaml
cleanup-cni-conf.yaml         etc-networkmanager-dispatcher.d-90-long-hostname.yaml       etc-systemd-system-machine-config-daemon-host.service.d-10-default-env.conf.yaml  nm-ignore-sdn.yaml  sysctl-forward-conf.yaml
cloud-provider-ca.yaml        etc-systemd-system-crio.service.d-10-default-env.conf.yaml  etc-systemd-system-pivot.service.d-10-default-env.conf.yaml                       pull-secret.yaml    sysctl-inotify.conf.yaml
[miabbott@mastershake ~/redhat-coreos ]$ sudo head $mnt/etc/mcc/templates/common/_base/files/etc-networkmanager-dispatcher.d-90-long-hostname.yaml
filesystem: "root"
mode: 0755
path: "/etc/NetworkManager/dispatcher.d/90-long-hostname"
contents:
  inline: |
    #!/bin/bash
    #
    # On Google Compute Platform (GCP) the hostname may be too long (>63 chars).
    # During firstboot the hostname is set in the initramfs before NetworkManager
    # runs; on reboot affect nodes use 'localhost'. This hook is a simple work
```

Marking as verified with 4.3.0-0.nightly-2020-06-01-043839.

Comment 9 errata-xmlrpc 2020-06-17 20:28:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2436

