Bug 1837122 - OpenShift Cluster fails to initialize on 4.3.z install due to a node with a hostname of localhost
Summary: OpenShift Cluster fails to initialize on 4.3.z install due to a node with a h...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.3.z
Hardware: x86_64
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.4.z
Assignee: Ben Howard
QA Contact: Michael Nguyen
URL:
Whiteboard:
Duplicates: 1823882
Depends On: 1809345
Blocks: 1837124 1837125
Reported: 2020-05-18 21:26 UTC by OpenShift BugZilla Robot
Modified: 2020-06-10 16:44 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-06-02 11:18:33 UTC
Target Upstream Version:




Links
| System | ID | Priority | Status | Summary | Last Updated |
|---|---|---|---|---|---|
| Github | openshift cluster-api-provider-gcp pull 91 | None | closed | Bug 1837122: Add the machine's name as a known NodeInternalDNS | 2020-07-16 11:23:01 UTC |
| Github | openshift machine-config-operator pull 1737 | None | closed | Bug 1837122: templates: add etc-networkmanager-dispatcher.d-90-long-hostname.yaml | 2020-07-16 11:23:00 UTC |
| Red Hat Product Errata | RHBA-2020:2310 | None | None | None | 2020-06-02 11:19:09 UTC |

Comment 3 Micah Abbott 2020-05-28 18:22:58 UTC
Similar to BZ#1809345, I didn't have access to an environment where I was able to create an extremely long project name, so I verified the fix using a single RHCOS node in GCP with an extremely long hostname.  Additionally, since the fixed template is only delivered when the MCO pod is running on the host, I had to craft an Ignition snippet with the dispatcher script to simulate the fixed template being used.

Ignition config:

```
$ jq . < ignition-bz1837122.json 
{
  "ignition": {
    "version": "2.2.0"
  },
  "passwd": {
    "users": [
      {
        "groups": [
          "sudo",
          "wheel",
          "adm",
          "systemd-journal"
        ],
        "name": "core",
        "passwordHash": "$6$xyz....",
        "sshAuthorizedKeys": [
          "ssh-rsa AAAAB3...."
        ]
      }
    ]
  },
  "storage": {
    "files": [
      {
        "filesystem": "root",
        "path": "/etc/NetworkManager/dispatcher.d/90-long-hostname",
        "mode": 493,
        "contents": {
          "source": "data:;base64,IyEvYmluL2Jhc2gKIwojIE9uIEdvb2dsZSBDb21wdXRlIFBsYXRmb3JtIChHQ1ApIHRoZSBob3N0bmFtZSBtYXkgYmUgdG9vIGxvbmcgKD42MyBjaGFycykuCiMgRHVyaW5nIGZpcnN0Ym9vdCB0aGUgaG9zdG5hbWUgaXMgc2V0IGluIHRoZSBpbml0cmFtZnMgYmVmb3JlIE5ldHdvcmtNYW5hZ2VyCiMgcnVuczsgb24gcmVib290IGFmZmVjdCBub2RlcyB1c2UgJ2xvY2FsaG9zdCcuIFRoaXMgaG9vayBpcyBhIHNpbXBsZSB3b3JrCiMgYXJvdW5kOiBpZiB0aGUgaG9zdCBuYW1lIGlzIGxvbmdlciB0aGFuIDYzIGNoYXJhY3RlcnMsIHRoZW4gdGhlIGhvc3RuYW1lCiMgaXMgdHJ1bmNhdGVkIGF0IHRoZSBfZmlyc3RfIGRvdC4KIwojIEFkZGl0aW9uYWxseSwgdGhpcyBob29rIGRvZXMgbm90IGJyZWFrIEROUyBvciBjbHVzdGVyIEROUyByZXNvbHV0aW9uLAojIHNpbmNlIE5ldHdvcmtNYW5hZ2VyIHNldHMgdGhlIGFwcHJvcHJpYXRlIC9ldGMvcmVzb2x2LmNvbmYgc2V0dGluZ3MuCklGPSQxClNUQVRVUz0kMgpsb2coKSB7IGxvZ2dlciAtLXRhZyAibmV0d29yay1tYW5hZ2VyLyQoYmFzZW5hbWUgJDApIiAiJHtAfSI7IH0KIyBjYXB0dXJlIGFsbCBlbGlnaWJsZSBob3N0bmFtZXMKaWYgW1sgISAiJCgvYmluL2hvc3RuYW1lKSIgPX4gKGxvY2FsaG9zdHxsb2NhbGhvc3QubG9jYWwpIF1dOyB0aGVuCiAgICBsb2cgImhvc3RuYW1lIGlzIGFscmVhZHkgc2V0IgogICAgZXhpdCAwCmZpCmlmIFtbICEgIiRTVEFUVVMiID1+ICh1cHxob3N0bmFtZXxkaGNwNC1jaGFuZ2V8ZGhjcDYtY2hhbmdlKSBdXTsgdGhlbgogICAgZXhpdCAwCmZpCmRlZmF1bHRfaG9zdD0iJHtESENQNF9IT1NUX05BTUU6LSRESENQNl9IT1NUX05BTUV9IgojIHRydW5jYXRlIHRoZSBob3N0bmFtZSB0byB0aGUgZmlyc3QgZG90IGFuZCB0aGFuIDY0IGNoYXJhY3RlcnMuCmhvc3Q9JChlY2hvICR7ZGVmYXVsdF9ob3N0fSB8IGN1dCAtZjEgLWQnLicgfCBjdXQgLWMgLTYzKQppZiBbICIkeyNkZWZhdWx0X2hvc3R9IiAtZ3QgNjMgXTsgdGhlbgogICAgbG9nICJkaXNjb3ZlcmVkIGhvc3RuYW1lIGlzIGxvbmdlciB0aGFuIHRoYW4gNjMgY2hhcmFjdGVycyIKICAgIGxvZyAidHJ1bmNhdGluZyAke2RlZmF1bHRfaG9zdH0gPT4gJHtob3N0fSIKICAgIC9iaW4vaG9zdG5hbWVjdGwgLS10cmFuc2llbnQgc2V0LWhvc3RuYW1lICIke2hvc3R9IgpmaQo="
        }
      }
    ]
  }
}
```
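
Two fields in the config above are easy to misread: `mode` is written in decimal (493 is octal 0755, since JSON has no octal literals), and `source` is a base64 `data:` URL. The following is a minimal sketch of how such values can be produced and checked, assuming a `base64` that accepts `-d` (as GNU coreutils does). The helper names `to_data_url` and `from_data_url` are made up for illustration and are not part of Ignition or the MCO.

```shell
#!/bin/bash
# Hypothetical helpers, named here for illustration only.

# Encode stdin as a data URL of the form used in the "source" field.
to_data_url() {
    printf 'data:;base64,%s\n' "$(base64 | tr -d '\n')"
}

# Decode a data URL of that form back to its original bytes.
from_data_url() {
    printf '%s' "${1#data:;base64,}" | base64 -d
}

# Print the decimal mode from the config in octal.
printf '%o\n' 493

# Round-trip a tiny stand-in script (not the real dispatcher script).
url=$(printf '#!/bin/bash\necho hello\n' | to_data_url)
echo "$url"
from_data_url "$url"
```

Passing the `source` value from the config to `from_data_url` should print the dispatcher script in readable form.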

Instructions for booting a single RHCOS node:

```
$ gsutil cp rhcos-4.4.3-x86_64-gcp.x86_64.tar.gz gs://rhcos-devel/devel/rhcos/
$ gcloud compute images create rhcos-4-4-3 --project=openshift-rhcos-devel  --source-uri=https://storage.googleapis.com/rhcos-devel/devel/rhcos/rhcos-4.4.3-x86_64-gcp.x86_64.tar.gz
$ longname="rhcos-4-4-3-long-long-long-long-long-long-long-long-longgggg"
$ echo $longname | wc -c
61
$ gcloud compute --project=openshift-rhcos-devel instances create $longname --zone=us-central1-a --machine-type=n1-standard-1 --subnet=default --network-tier=PREMIUM --maintenance-policy=MIGRATE --service-account=439519768758-compute@developer.gserviceaccount.com --scopes=https://www.googleapis.com/auth/devstorage.read_only,https://www.googleapis.com/auth/logging.write,https://www.googleapis.com/auth/monitoring.write,https://www.googleapis.com/auth/servicecontrol,https://www.googleapis.com/auth/service.management.readonly,https://www.googleapis.com/auth/trace.append --image=rhcos-4-4-3 --boot-disk-size=16GB --boot-disk-type=pd-standard --boot-disk-device-name=rhcos-4-4-3-vm0527 --image-project=openshift-rhcos-devel --metadata-from-file=user-data=/var/home/miabbott/Downloads/ignition-bz1837122.json
```

The error is observed with the 4.4.3 image, which does not have the NM dispatcher script.  The hostname is correct on first boot, but after a reboot it reverts to localhost:

```
$ sshq -l core 35.239.130.112
Warning: Permanently added '35.239.130.112' (ECDSA) to the list of known hosts.
Red Hat Enterprise Linux CoreOS 44.81.202004260825-0
  Part of OpenShift 4.4, RHCOS is a Kubernetes native operating system
  managed by the Machine Config Operator (`clusteroperator/machine-config`).

WARNING: Direct SSH access to machines is not recommended; instead,
make configuration changes via `machineconfig` objects:
  https://docs.openshift.com/container-platform/4.4/architecture/architecture-rhcos.html

---

[core@rhcos-4-4-3-long-long-long-long-long-long-long-long-longgggg ~]$ hostname
rhcos-4-4-3-long-long-long-long-long-long-long-long-longgggg.c.o
[core@rhcos-4-4-3-long-long-long-long-long-long-long-long-longgggg ~]$ hostnamectl
   Static hostname: n/a
Transient hostname: rhcos-4-4-3-long-long-long-long-long-long-long-long-longgggg.c.o
         Icon name: computer-vm
           Chassis: vm
        Machine ID: 6c2835e2fe0b9e7c6f4aeb31433d56db
           Boot ID: 6c6112ec83be4602b15308a236383340
    Virtualization: kvm
  Operating System: Red Hat Enterprise Linux CoreOS 44.81.202004260825-0 (Ootpa)
            Kernel: Linux 4.18.0-147.8.1.el8_1.x86_64
      Architecture: x86-64
[core@rhcos-4-4-3-long-long-long-long-long-long-long-long-longgggg ~]$ ls /etc/NetworkManager/dispatcher.d/90-long-hostname 
ls: cannot access '/etc/NetworkManager/dispatcher.d/90-long-hostname': No such file or directory
[core@rhcos-4-4-3-long-long-long-long-long-long-long-long-longgggg ~]$ sudo systemctl reboot
[core@rhcos-4-4-3-long-long-long-long-long-long-long-long-longgggg ~]$ Connection to 35.239.130.112 closed by remote host.
Connection to 35.239.130.112 closed.

$ sshq -l core 35.239.130.112
Warning: Permanently added '35.239.130.112' (ECDSA) to the list of known hosts.
Red Hat Enterprise Linux CoreOS 44.81.202004260825-0
  Part of OpenShift 4.4, RHCOS is a Kubernetes native operating system
  managed by the Machine Config Operator (`clusteroperator/machine-config`).

WARNING: Direct SSH access to machines is not recommended; instead,
make configuration changes via `machineconfig` objects:
  https://docs.openshift.com/container-platform/4.4/architecture/architecture-rhcos.html

---
Last login: Thu May 28 17:43:11 2020 from 108.49.19.84
[core@localhost ~]$ hostname 
localhost
[core@localhost ~]$ hostnamectl
   Static hostname: n/a
Transient hostname: localhost
         Icon name: computer-vm
           Chassis: vm
        Machine ID: 6c2835e2fe0b9e7c6f4aeb31433d56db
           Boot ID: e6e65b5e45fe4792b7f03328a5b07ead
    Virtualization: kvm
  Operating System: Red Hat Enterprise Linux CoreOS 44.81.202004260825-0 (Ootpa)
            Kernel: Linux 4.18.0-147.8.1.el8_1.x86_64
      Architecture: x86-64
[core@localhost ~]$ 
```
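
The lengths involved help explain the first-boot hostname. `wc -c` counts the trailing newline that `echo` emits, so the instance name itself is 60 characters. The hostname reported before the reboot is exactly 64 characters, which matches the Linux hostname limit (`HOST_NAME_MAX`) and suggests the longer DHCP-provided name was cut at that limit. A quick check using shell parameter expansion, which counts characters without the newline:

```shell
#!/bin/bash
longname="rhcos-4-4-3-long-long-long-long-long-long-long-long-longgggg"
first_boot="${longname}.c.o"   # hostname reported before the reboot

echo "${#longname}"     # 60
echo "${#first_boot}"   # 64
```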

If the host is rebooted with the NM dispatcher script in place, the hostname is truncated and set correctly:

```
$ sshq -l core 35.239.130.112
Warning: Permanently added '35.239.130.112' (ECDSA) to the list of known hosts.
Red Hat Enterprise Linux CoreOS 44.81.202004260825-0
  Part of OpenShift 4.4, RHCOS is a Kubernetes native operating system
  managed by the Machine Config Operator (`clusteroperator/machine-config`).

WARNING: Direct SSH access to machines is not recommended; instead,
make configuration changes via `machineconfig` objects:
  https://docs.openshift.com/container-platform/4.4/architecture/architecture-rhcos.html

---
[core@rhcos-4-4-3-long-long-long-long-long-long-long-long-longgggg ~]$ ls /etc/NetworkManager/dispatcher.d/
04-iscsi          11-dhclient       20-chrony         90-long-hostname  no-wait.d/        pre-down.d/       pre-up.d/         
[core@rhcos-4-4-3-long-long-long-long-long-long-long-long-longgggg ~]$ ls /etc/NetworkManager/dispatcher.d/90-long-hostname 
/etc/NetworkManager/dispatcher.d/90-long-hostname
[core@rhcos-4-4-3-long-long-long-long-long-long-long-long-longgggg ~]$ hostname
rhcos-4-4-3-long-long-long-long-long-long-long-long-longgggg.c.o
[core@rhcos-4-4-3-long-long-long-long-long-long-long-long-longgggg ~]$ hostnamectl
   Static hostname: n/a
Transient hostname: rhcos-4-4-3-long-long-long-long-long-long-long-long-longgggg.c.o
         Icon name: computer-vm
           Chassis: vm
        Machine ID: 6c2835e2fe0b9e7c6f4aeb31433d56db
           Boot ID: d5b323e8a6b74983b202a8f6331b8162
    Virtualization: kvm
  Operating System: Red Hat Enterprise Linux CoreOS 44.81.202004260825-0 (Ootpa)
            Kernel: Linux 4.18.0-147.8.1.el8_1.x86_64
      Architecture: x86-64
[core@rhcos-4-4-3-long-long-long-long-long-long-long-long-longgggg ~]$ sudo systemctl reboot
[core@rhcos-4-4-3-long-long-long-long-long-long-long-long-longgggg ~]$ Connection to 35.239.130.112 closed by remote host.
Connection to 35.239.130.112 closed.

$ sshq -l core 35.239.130.112
Warning: Permanently added '35.239.130.112' (ECDSA) to the list of known hosts.
Red Hat Enterprise Linux CoreOS 44.81.202004260825-0
  Part of OpenShift 4.4, RHCOS is a Kubernetes native operating system
  managed by the Machine Config Operator (`clusteroperator/machine-config`).

WARNING: Direct SSH access to machines is not recommended; instead,
make configuration changes via `machineconfig` objects:
  https://docs.openshift.com/container-platform/4.4/architecture/architecture-rhcos.html

---
Last login: Thu May 28 18:10:46 2020 from 108.49.19.84
[core@rhcos-4-4-3-long-long-long-long-long-long-long-long-longgggg ~]$ hostname
rhcos-4-4-3-long-long-long-long-long-long-long-long-longgggg
[core@rhcos-4-4-3-long-long-long-long-long-long-long-long-longgggg ~]$ hostnamectl
   Static hostname: n/a
Transient hostname: rhcos-4-4-3-long-long-long-long-long-long-long-long-longgggg
         Icon name: computer-vm
           Chassis: vm
        Machine ID: 6c2835e2fe0b9e7c6f4aeb31433d56db
           Boot ID: 88cdd4dbc38b47159ec370a3f13f4bff
    Virtualization: kvm
  Operating System: Red Hat Enterprise Linux CoreOS 44.81.202004260825-0 (Ootpa)
            Kernel: Linux 4.18.0-147.8.1.el8_1.x86_64
      Architecture: x86-64
[core@rhcos-4-4-3-long-long-long-long-long-long-long-long-longgggg ~]$ 
```
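
The truncation step that the dispatcher script applies can be tried in isolation. The sketch below reuses the script's `cut` pipeline (keep the first DNS label, then at most 63 characters). The function name `truncate_hostname` and the `example-project` FQDN are assumptions made up for illustration, since the full DHCP-provided name is not shown in this report.

```shell
#!/bin/bash
# Illustrative helper; the name is not part of the MCO template.
truncate_hostname() {
    # Keep everything before the first dot, then cap at 63 characters.
    printf '%s\n' "$1" | cut -d'.' -f1 | cut -c -63
}

longname="rhcos-4-4-3-long-long-long-long-long-long-long-long-longgggg"
# Assumed FQDN shape; the project part is a placeholder.
fqdn="${longname}.c.example-project.internal"

if [ "${#fqdn}" -gt 63 ]; then
    echo "truncating ${fqdn} => $(truncate_hostname "$fqdn")"
fi
```

Unlike this sketch, the real script only acts when the current hostname is `localhost` and the dispatcher event is one of `up`, `hostname`, `dhcp4-change`, or `dhcp6-change`, and it applies the result with `hostnamectl --transient set-hostname`.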

Additionally, if the MCO image in 4.4.6 is unpacked, we can see the NM dispatcher script included:

```
$ oc image info --output json $(oc adm release info -a ~/openshift-cluster-installs/all-the-pull-secrets.json --image-for=machine-config-operator quay.io/openshift-release-dev/ocp-release:4.4.6-x86_64) | jq .name
"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5a2ea35bed93801b12180aa8ddceea441d5f052ec105b0955aa2a0075cba2109"

$ sudo podman pull --authfile ~/openshift-cluster-installs/all-the-pull-secrets.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5a2ea35bed93801b12180aa8ddceea441d5f052ec105b0955aa2a0075cba2109
Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5a2ea35bed93801b12180aa8ddceea441d5f052ec105b0955aa2a0075cba2109...
Getting image source signatures
Copying blob 7eb8d240ca4d done  
Copying blob a3ac36470b00 done  
Copying blob 51fd24847e78 done  
Copying blob e1a6856f83e7 done  
Copying blob 82a8f4ea76cb done  
Copying config db9815ebcd done  
Writing manifest to image destination
Storing signatures
db9815ebcd397d8a9c57dcac689ac26bf070b8c54f17965f7647a12b29e4c79e
$ ctr=$(sudo podman create quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5a2ea35bed93801b12180aa8ddceea441d5f052ec105b0955aa2a0075cba2109)
$ mnt=$(sudo podman mount $ctr)
$ sudo head $mnt/etc/mcc/templates/common/_base/files/etc-networkmanager-dispatcher.d-90-long-hostname.yaml
filesystem: "root"
mode: 0755
path: "/etc/NetworkManager/dispatcher.d/90-long-hostname"
contents:
  inline: |
    #!/bin/bash
    #
    # On Google Compute Platform (GCP) the hostname may be too long (>63 chars).
    # During firstboot the hostname is set in the initramfs before NetworkManager
    # runs; on reboot affect nodes use 'localhost'. This hook is a simple work
```

Marking as VERIFIED with 4.4.6

Comment 5 errata-xmlrpc 2020-06-02 11:18:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2310

Comment 6 Ben Howard 2020-06-10 16:44:40 UTC
*** Bug 1823882 has been marked as a duplicate of this bug. ***

