Bug 1802675 - [IPI][Baremetal] sometimes Mdns-publisher (infra pod) advertise node's name as 'localhost'
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.3.z
Assignee: Yossi Boaron
QA Contact: Nataf Sharabi
URL:
Whiteboard:
Duplicates: 1803429
Depends On: 1790823
Blocks:
Reported: 2020-02-13 17:11 UTC by Antoni Segura Puimedon
Modified: 2020-04-20 11:12 UTC (History)
CC List: 13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1790823
Environment:
Last Closed: 2020-03-24 23:43:53 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift baremetal-runtimecfg pull 45 | 0 | None | closed | [release-4.3] Bug 1802675: Consider mdns hostname file existence in utils.ShortHostname | 2021-01-18 03:51:26 UTC
Github | openshift machine-config-operator pull 1479 | 0 | None | closed | Bug 1802675: [release-4.3] baremetal: Verify that mdns start only after hostname was set | 2021-01-18 03:51:26 UTC

Comment 1 Antoni Segura Puimedon 2020-02-17 11:26:38 UTC
*** Bug 1803429 has been marked as a duplicate of this bug. ***

Comment 4 Nataf Sharabi 2020-03-23 11:16:57 UTC
Hi,

This bug was verified on OCP 4.4 under BZ-1790823.

The scenario doesn't work on 4.3.

After talking to Yossi and trying to understand how to reproduce it,
we've seen the following changes in the scenario:

When we try to make the DHCP server give the name localhost to one of the masters,
kubelet fails to start:

[root@localhost ~]# systemctl status kubelet.service 
● kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset:>
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-default-env.conf, 20-nodenet.conf
   Active: activating (auto-restart) (Result: exit-code) since Mon 2020-03-23 1>
  Process: 9547 ExecStart=/usr/bin/hyperkube kubelet --config=/etc/kubernetes/k>
  Process: 9545 ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state (cod>
  Process: 9543 ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests (co>
 Main PID: 9547 (code=exited, status=255)
      CPU: 278ms

Mar 23 11:02:54 localhost.localdomain systemd[1]: kubelet.service: Consumed 278>

--------------------------------------------------------------------------------------
We can see that the verify-hostname container has exited:

[root@localhost ~]# sudo crictl ps  -a | grep  verify  
53ad966386e63       fabb83d6707761415d7bc20744a8975a704c1fc61475890473483be31ad27b69                                                         About an hour ago   Exited              verify-hostname                         8                   3f0d2a756a482
d4e5bfc13c607       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c21b0b480483bb5f42aeec39a8c6a802d1c940bb15561ee84f9df3317fb24c37   2 days ago          Exited              verify-hostname                         0                   cad4d8986a2c2

--------------------------------------------------------------------------------------
[root@localhost ~]# crictl logs  53ad966386e63
function get_hostname()
{
  if [[ -s $RUNTIMECFG_HOSTNAME_PATH ]]; then
    cat $RUNTIMECFG_HOSTNAME_PATH
  else
    # if hostname wasn't updated by NM script, read hostname
    hostname
  fi
}
while [[ "$(get_hostname)" =~ ^localhost(.localdomain)?$ ]]; do
  echo "XXhostname is still set to a default value"
  sleep 1
done
++ get_hostname
++ [[ -s /etc/mdns/hostname ]]
++ cat /etc/mdns/hostname
+ [[ localhost.ocp-edge-cluster.qe.lab.redhat.com =~ ^localhost(.localdomain)?$ ]]
----------------------------------------------------------------------------------
We should get the following output from the above:
"hostname is still set to a default value"
----------------------------------------------------------------------------------
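
For reference, here is a minimal sketch of how the pattern in the loop above treats the hostname seen in the trace. The pattern is copied from the script; the helper name is_default_hostname and the list of test names are only illustrative and are not part of the verify-hostname container. It suggests that the fully qualified name returned from /etc/mdns/hostname is not considered a default value, so the loop would exit without printing the message:

```shell
#!/usr/bin/env bash
# Pattern copied from the verify-hostname loop shown above.
default_re='^localhost(.localdomain)?$'

# Hypothetical helper (not in the original script): returns 0 when the
# given name matches the default-hostname pattern the loop waits on.
is_default_hostname() {
  [[ "$1" =~ $default_re ]]
}

# The first two names are the defaults the pattern targets; the third is
# the value read from /etc/mdns/hostname in the trace above.
for name in localhost localhost.localdomain \
            localhost.ocp-edge-cluster.qe.lab.redhat.com; do
  if is_default_hostname "$name"; then
    echo "$name: treated as default, loop keeps waiting"
  else
    echo "$name: not treated as default, loop exits"
  fi
done
```

If this reading is right, a short name of localhost combined with a non-default domain passes the check, which would be consistent with the container exiting instead of looping.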

In addition:

Mar 23 01:00:04 localhost root[1367]: NM mdns-hostname triggered by hostname.
Mar 23 01:00:04 localhost nm-dispatcher[1351]: <13>Mar 23 01:00:04 root: NM mdn>
Mar 23 01:00:04 localhost root[1371]: Hostname changed: localhost
Mar 23 01:00:04 localhost nm-dispatcher[1351]: <13>Mar 23 01:00:04 root: Hostna>
Mar 23 01:00:04 localhost dhclient[1365]: DHCPDISCOVER on enp4s0 to 255.255.255>
Mar 23 01:00:04 localhost dhclient[1368]: DHCPDISCOVER on enp5s0 to 255.255.255>
Mar 23 01:00:07 localhost dhclient[1368]: DHCPDISCOVER on enp5s0 to 255.255.255>
Mar 23 01:00:08 localhost dhclient[1365]: DHCPDISCOVER on enp4s0 to 255.255.255>
Mar 23 01:00:10 localhost dhclient[1368]: DHCPDISCOVER on enp5s0 to 255.255.255>
Mar 23 01:00:14 localhost systemd[1]: NetworkManager-dispatcher.service: Consum>
Mar 23 01:00:16 localhost dhclient[1368]: DHCPDISCOVER on enp5s0 to 255.255.255>
Mar 23 01:00:18 localhost dhclient[1365]: DHCPDISCOVER on enp4s0 to 255.255.255>
Mar 23 01:00:23 localhost dhclient[1368]: DHCPDISCOVER on enp5s0 to 255.255.255>
Mar 23 01:00:25 localhost dhclient[1365]: DHCPDISCOVER on enp4s0 to 255.255.255>
Mar 23 01:00:33 localhost dhclient[1365]: DHCPDISCOVER on enp4s0 to 255.255.255>
Mar 23 01:00:34 localhost systemd[1]: NetworkManager-wait-online.service: Main >
Mar 23 01:00:34 localhost systemd[1]: NetworkManager-wait-online.service: Faile>
Mar 23 01:00:34 localhost systemd[1]: Failed to start Network Manager Wait Onli

NetworkManager-wait-online.service: Main process exited, code=exited, status=1/FA


Therefore I cannot verify this bug, since I am not able to reproduce the scenario.

