Bug 1899187
| Summary: | [OpenStack] node-valid-hostname.service fails during the first boot, leading to a 5 minute provisioning delay | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Robert Heinzmann <rheinzma> |
| Component: | Machine Config Operator | Assignee: | Mike Fedosin <mfedosin> |
| Status: | CLOSED ERRATA | QA Contact: | Michael Nguyen <mnguyen> |
| Severity: | low | Priority: | low |
| Version: | 4.6 | Target Release: | 4.8.0 |
| CC: | jerzhang, kgarriso, m.andre, mbooth, mfedosin, pprinett | Keywords: | Triaged |
| Hardware: | Unspecified | OS: | Unspecified |
| Doc Type: | Bug Fix | Type: | Bug |
| Last Closed: | 2021-07-27 22:34:24 UTC | | |

Doc Text:
- Cause: a race condition between setting the hostname and bringing up networking.
- Consequence: if the network started before the hostname was set, the node could not join the cluster and waited 5 minutes before trying again.
- Fix: explicitly set the hostname before the network comes up.
- Result: the node joins the cluster on the first attempt.
Description

Robert Heinzmann, 2020-11-18 17:00:13 UTC

Created attachment 1730612 [details]: Console Log Initial Boot (boot 0)
Created attachment 1730613 [details]: Journal Log Initial Boot (boot -1)
Created attachment 1730614 [details]: Journal Log Initial Boot (boot 0)
The reason for the timeout appears to be a problem on OpenStack with OVN: OVN does not hand out the "host-name" DHCP option [1], which can also be seen in the logs:
OSP16.1 with OVN (no dhcp option for hostname):
~~~
sh-4.4# journalctl -b -1 | grep dhcp4 | grep -e "option host_name" -e "option ip_address"
Nov 23 18:08:22 localhost NetworkManager[745]: <info> [1606154902.7909] dhcp4 (ens3): option ip_address => '192.168.150.167'
Nov 23 18:08:35 localhost NetworkManager[1651]: <info> [1606154915.1373] dhcp4 (ens3): option ip_address => '192.168.150.167'
sh-4.4# journalctl -b -1 | awk '{print $4}' | uniq -c
1 at
1776 localhost
362 ocp-99l7h-worker-0-g7hxm
~~~
QuickLab 4.5 (possibly not OVN):
~~~
sh-4.4# journalctl -b -1 | grep dhcp4 | grep -e "option host_name" -e "option ip_address"
Nov 22 06:50:00 worker-0.sharedocp4upi45.xxx.xxxx.xxx.xxx NetworkManager[1635]: <info> [1606027800.0301] dhcp4 (ens3): option host_name => 'host-10-0-91-158'
Nov 22 06:50:00 worker-0.sharedocp4upi45.xxx.xxxx.xxx.xxx NetworkManager[1635]: <info> [1606027800.0301] dhcp4 (ens3): option ip_address => '10.0.91.158'
sh-4.4# journalctl -b -1 | awk '{print $4}' | uniq -c
1 at
607 localhost
1253 host-10-0-91-158.openstacklocal ## <--- DHCP Host Name
777 worker-0.sharedocp4upi45.xxx.xxx.xxx.xxx
~~~
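For reference, the two manual checks above can be folded into one helper. This is an illustrative sketch, not part of the original report: the function name is hypothetical, and `journal.txt` stands in for a saved `journalctl -b -1` dump.

```shell
#!/bin/sh
# Sketch: given a saved journal dump (journalctl -b -1 > journal.txt),
# report whether the DHCP server supplied the host_name option, and
# count how often each hostname appears in field 4 of the log.
check_dhcp_hostname() {
    if grep -q 'option host_name' "$1"; then
        echo "host_name option present"
    else
        echo "host_name option missing"
    fi
    # Field 4 of a journal line is the hostname in effect at that moment;
    # skip "-- Logs begin at ..." marker lines, then aggregate counts.
    awk '!/^--/ { print $4 }' "$1" | sort | uniq -c
}
```

On an OVN-backed node the first line reads "host_name option missing" and the counts show a long run of `localhost` entries before the real node name appears.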
I then created the following MachineConfig, which uses nmcli to set the hostname after "afterburn-hostname.service" has run:
~~~
[stack@osp16 ocp-test1]$ cat 99-worker-fix-hostname.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-fix-hostname
spec:
  config:
    ignition:
      config: {}
      security:
        tls: {}
      timeouts: {}
      version: 2.2.0
    networkd: {}
    passwd: {}
    systemd:
      units:
      - contents: |
          [Unit]
          Description=Set Hostname via /etc/hostname
          Before=node-valid-hostname.service
          After=afterburn-hostname.service

          [Service]
          ExecStart=/bin/sh -c "nmcli general hostname $(cat /etc/hostname)"
          Type=oneshot

          [Install]
          WantedBy=network-online.target
        name: afterburn-hostname-set.service
        enabled: true
  fips: false
  kernelArguments: null
  osImageURL: ""
~~~
With this in place the timeout no longer occurs, and boot time drops from 13 minutes to 7, avoiding installer timeouts.
---
[1] https://docs.openstack.org/neutron/latest/ovn/dhcp_opts.html
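To put a number on the delay, one can compare the timestamp of the first DHCP lease with the first journal line logged under a real (non-localhost) hostname. A rough sketch, not part of the report; it assumes GNU date and a saved journal dump, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: seconds between the first DHCP lease and the first journal
# line carrying a real hostname. Assumes GNU date and a saved dump
# (journalctl -b -1 > journal.txt); syslog-style timestamps in fields 1-3.
delay_seconds() {
    first=$(awk '/option ip_address/ { print $1, $2, $3; exit }' "$1")
    named=$(awk '!/^--/ && $4 != "localhost" { print $1, $2, $3; exit }' "$1")
    echo $(( $(date -d "$named" +%s) - $(date -d "$first" +%s) ))
}
```

Run against the logs above, a result around 300 seconds would match the 5 minute retry delay described in this bug.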
Verified on 4.8.0-0.nightly-2021-04-26-151924. Scaled up on IPI OpenStack.

~~~
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-04-26-151924   True        False         109m    Cluster version is 4.8.0-0.nightly-2021-04-26-151924

$ oc get nodes
NAME                                  STATUS   ROLES    AGE    VERSION
mnguyen0427-10-j4gjr-master-0         Ready    master   124m   v1.21.0-rc.0+6143dea
mnguyen0427-10-j4gjr-master-1         Ready    master   123m   v1.21.0-rc.0+6143dea
mnguyen0427-10-j4gjr-master-2         Ready    master   116m   v1.21.0-rc.0+6143dea
mnguyen0427-10-j4gjr-worker-0-26s28   Ready    worker   106m   v1.21.0-rc.0+6143dea
mnguyen0427-10-j4gjr-worker-0-cwfjh   Ready    worker   105m   v1.21.0-rc.0+6143dea
mnguyen0427-10-j4gjr-worker-0-d8kxh   Ready    worker   93m    v1.21.0-rc.0+6143dea

$ oc debug node/mnguyen0427-10-j4gjr-worker-0-26s28
Starting pod/mnguyen0427-10-j4gjr-worker-0-26s28-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# journalctl -u afterburn-hostname
-- Logs begin at Tue 2021-04-27 14:45:24 UTC, end at Tue 2021-04-27 16:44:48 UTC. --
Apr 27 14:50:47 localhost systemd[1]: Starting Afterburn Hostname...
Apr 27 14:50:47 localhost afterburn[1684]: Apr 27 14:50:47.365 INFO Fetching http://169.254.169.254/latest/meta-data/hostname: Attempt >
Apr 27 14:50:49 localhost afterburn[1684]: Apr 27 14:50:49.026 INFO Fetch successful
Apr 27 14:50:49 localhost systemd[1]: afterburn-hostname.service: Succeeded.
Apr 27 14:50:49 localhost systemd[1]: Started Afterburn Hostname.
Apr 27 14:50:49 localhost systemd[1]: afterburn-hostname.service: Consumed 23ms CPU time
-- Reboot --
Apr 27 14:56:49 mnguyen0427-10-j4gjr-worker-0-26s28 systemd[1]: Starting Afterburn Hostname...
Apr 27 14:56:49 mnguyen0427-10-j4gjr-worker-0-26s28 afterburn[1268]: Apr 27 14:56:49.064 INFO Fetching http://169.254.169.254/latest/me>
Apr 27 14:56:49 mnguyen0427-10-j4gjr-worker-0-26s28 afterburn[1268]: Apr 27 14:56:49.781 INFO Fetch successful
Apr 27 14:56:49 mnguyen0427-10-j4gjr-worker-0-26s28 systemd[1]: afterburn-hostname.service: Succeeded.
Apr 27 14:56:49 mnguyen0427-10-j4gjr-worker-0-26s28 systemd[1]: Started Afterburn Hostname.
Apr 27 14:56:49 mnguyen0427-10-j4gjr-worker-0-26s28 systemd[1]: afterburn-hostname.service: Consumed 19ms CPU time

sh-4.4# systemctl cat afterburn-hostname
# /etc/systemd/system/afterburn-hostname.service
[Unit]
Description=Afterburn Hostname
# Block services relying on Networking being up.
Before=network-online.target
# Wait for NetworkManager to report its online
After=NetworkManager-wait-online.service
# Run before hostname checks
Before=node-valid-hostname.service

[Service]
ExecStart=/usr/bin/afterburn --provider openstack-metadata --hostname=/etc/hostname
Type=oneshot

[Install]
WantedBy=network-online.target
sh-4.4# exit
exit
sh-4.2# exit
exit
Removing debug pod ...

$ oc -n openshift-machine-api get machinesets
NAME                            DESIRED   CURRENT   READY   AVAILABLE   AGE
mnguyen0427-10-j4gjr-worker-0   3         3         3       3           129m

$ oc -n openshift-machine-api scale --replicas=4 machinesets/mnguyen0427-10-j4gjr-worker-0
machineset.machine.openshift.io/mnguyen0427-10-j4gjr-worker-0 scaled

$ oc -n openshift-machine-api get machinesets
NAME                            DESIRED   CURRENT   READY   AVAILABLE   AGE
mnguyen0427-10-j4gjr-worker-0   4         4         3       3           129m

$ oc -n openshift-machine-api get machines
NAME                                  PHASE          TYPE        REGION      ZONE   AGE
mnguyen0427-10-j4gjr-master-0         Running        m1.xlarge   regionOne   nova   131m
mnguyen0427-10-j4gjr-master-1         Running        m1.xlarge   regionOne   nova   131m
mnguyen0427-10-j4gjr-master-2         Running        m1.xlarge   regionOne   nova   131m
mnguyen0427-10-j4gjr-worker-0-26s28   Running        m1.large    regionOne   nova   129m
mnguyen0427-10-j4gjr-worker-0-9nntv   Provisioning                                  2m13s
mnguyen0427-10-j4gjr-worker-0-cwfjh   Running        m1.large    regionOne   nova   129m
mnguyen0427-10-j4gjr-worker-0-d8kxh   Running        m1.large    regionOne   nova   129m

$ oc -n openshift-machine-api get machines
NAME                                  PHASE          TYPE        REGION      ZONE   AGE
mnguyen0427-10-j4gjr-master-0         Running        m1.xlarge   regionOne   nova   135m
mnguyen0427-10-j4gjr-master-1         Running        m1.xlarge   regionOne   nova   135m
mnguyen0427-10-j4gjr-master-2         Running        m1.xlarge   regionOne   nova   135m
mnguyen0427-10-j4gjr-worker-0-26s28   Running        m1.large    regionOne   nova   133m
mnguyen0427-10-j4gjr-worker-0-9nntv   Provisioned    m1.large    regionOne   nova   6m4s
mnguyen0427-10-j4gjr-worker-0-cwfjh   Running        m1.large    regionOne   nova   133m
mnguyen0427-10-j4gjr-worker-0-d8kxh   Running        m1.large    regionOne   nova   133m

$ oc -n openshift-machine-api get machines
NAME                                  PHASE          TYPE        REGION      ZONE   AGE
mnguyen0427-10-j4gjr-master-0         Running        m1.xlarge   regionOne   nova   140m
mnguyen0427-10-j4gjr-master-1         Running        m1.xlarge   regionOne   nova   140m
mnguyen0427-10-j4gjr-master-2         Running        m1.xlarge   regionOne   nova   140m
mnguyen0427-10-j4gjr-worker-0-26s28   Running        m1.large    regionOne   nova   138m
mnguyen0427-10-j4gjr-worker-0-9nntv   Running        m1.large    regionOne   nova   11m
mnguyen0427-10-j4gjr-worker-0-cwfjh   Running        m1.large    regionOne   nova   138m
mnguyen0427-10-j4gjr-worker-0-d8kxh   Running        m1.large    regionOne   nova   138m

$ oc get nodes
NAME                                  STATUS     ROLES    AGE    VERSION
mnguyen0427-10-j4gjr-master-0         Ready      master   140m   v1.21.0-rc.0+6143dea
mnguyen0427-10-j4gjr-master-1         Ready      master   139m   v1.21.0-rc.0+6143dea
mnguyen0427-10-j4gjr-master-2         Ready      master   132m   v1.21.0-rc.0+6143dea
mnguyen0427-10-j4gjr-worker-0-26s28   Ready      worker   122m   v1.21.0-rc.0+6143dea
mnguyen0427-10-j4gjr-worker-0-9nntv   NotReady   worker   47s    v1.21.0-rc.0+6143dea
mnguyen0427-10-j4gjr-worker-0-cwfjh   Ready      worker   121m   v1.21.0-rc.0+6143dea
mnguyen0427-10-j4gjr-worker-0-d8kxh   Ready      worker   109m   v1.21.0-rc.0+6143dea

$ oc get nodes
NAME                                  STATUS   ROLES    AGE    VERSION
mnguyen0427-10-j4gjr-master-0         Ready    master   141m   v1.21.0-rc.0+6143dea
mnguyen0427-10-j4gjr-master-1         Ready    master   139m   v1.21.0-rc.0+6143dea
mnguyen0427-10-j4gjr-master-2         Ready    master   132m   v1.21.0-rc.0+6143dea
mnguyen0427-10-j4gjr-worker-0-26s28   Ready    worker   123m   v1.21.0-rc.0+6143dea
mnguyen0427-10-j4gjr-worker-0-9nntv   Ready    worker   79s    v1.21.0-rc.0+6143dea
mnguyen0427-10-j4gjr-worker-0-cwfjh   Ready    worker   122m   v1.21.0-rc.0+6143dea
mnguyen0427-10-j4gjr-worker-0-d8kxh   Ready    worker   109m   v1.21.0-rc.0+6143dea
~~~

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.