Bug 1790823 - [IPI][Baremetal] sometimes Mdns-publisher (infra pod) advertise node's name as 'localhost'
Summary: [IPI][Baremetal] sometimes Mdns-publisher (infra pod) advertise node's name a...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.4
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
: 4.4.0
Assignee: Yossi Boaron
QA Contact: Nataf Sharabi
URL:
Whiteboard:
Depends On:
Blocks: 1802675
 
Reported: 2020-01-14 10:31 UTC by Yossi Boaron
Modified: 2020-05-07 15:22 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1802675 (view as bug list)
Environment:
Last Closed: 2020-05-04 11:24:10 UTC
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Github openshift baremetal-runtimecfg pull 44 None closed Bug 1790823: Consider mdns hostname file existence in utils.ShortHostname 2020-07-17 04:32:51 UTC
Github openshift machine-config-operator pull 1388 None closed Bug 1790823: [IPI][BAREMETAL] Verify that mdns-publisher starts after hostname is set 2020-07-17 04:32:50 UTC
Github openshift machine-config-operator pull 1446 None closed Bug 1790823: [baremetal] Verify that MDNS doesn't advertise localhost 2020-07-17 04:32:50 UTC
Github openshift machine-config-operator pull 1455 None closed Bug 1790823: [baremetal] Verify that MDNS doesn't advertise localhost 2020-07-17 04:32:51 UTC
Red Hat Product Errata RHBA-2020:0581 None None None 2020-05-04 11:24:40 UTC

Description Yossi Boaron 2020-01-14 10:31:25 UTC
Description of problem:
The mdns-publisher infra pod reads the host's hostname and advertises it using the mDNS protocol.

Sometimes mdns-publisher wrongly advertises the hostname as 'localhost'. The likely cause is a race condition: mdns-publisher should start publishing only after the node's name is != 'localhost'.
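
The intended ordering can be sketched as a small polling gate: keep checking the hostname and let the publisher start only once the name is something other than 'localhost'. This is a minimal Python sketch, not the shipped fix; `wait_for_real_hostname`, its parameters, and the injected `get_hostname`/`sleep` callables are hypothetical names chosen for illustration.

```python
import socket
import time


def wait_for_real_hostname(get_hostname=socket.gethostname, sleep=time.sleep,
                           poll_interval=1.0, max_attempts=None):
    """Block until the short hostname is no longer 'localhost', then return it.

    get_hostname and sleep are injectable so the loop can be exercised
    without touching the real system; max_attempts=None polls forever.
    """
    attempts = 0
    while True:
        # Compare only the first DNS label so 'localhost.example.org' is
        # also treated as not yet configured.
        short = get_hostname().split(".")[0]
        if short != "localhost":
            return short
        attempts += 1
        if max_attempts is not None and attempts >= max_attempts:
            raise TimeoutError("hostname is still localhost")
        print("hostname is still localhost")
        sleep(poll_interval)
```

Running such a gate before the publisher starts means a late DHCP-assigned name delays publishing instead of producing a wrong advertisement.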

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Kirsten Garrison 2020-01-15 00:22:42 UTC
Assigning to baremetal team as this doesn't seem directly MCO related.

Comment 2 Kirsten Garrison 2020-01-15 00:27:23 UTC
Actually, I'm not even sure this should be filed under MCO at all..

Comment 3 Kirsten Garrison 2020-01-15 18:16:16 UTC
Talked to Brad and he suggested I move this to KNI-Deployment.

Comment 4 Yossi Boaron 2020-01-19 09:34:59 UTC
The reason mdns-publisher advertises the node's name as 'localhost.ostest.test.metalkube.org' is:

When the MCO re-configures the DHCP client, the hostname is not yet set on the first boot. baremetal-runtimecfg also runs on the first boot, so it configures mdns-publisher with the wrong hostname.
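
One way to avoid freezing a bad name at first boot is to validate the name before it is written into the publisher configuration. The sketch below is an illustration in Python under stated assumptions, not the code from the linked pull requests: `resolve_publish_name` and its `hostname_file` parameter are hypothetical, and the rule of preferring a recorded name in a file such as /etc/mdns/hostname over the live hostname is an assumption.

```python
import os
import socket


def resolve_publish_name(hostname_file="/etc/mdns/hostname",
                         get_hostname=socket.gethostname):
    """Return the short name to advertise, or None if it is not usable yet."""
    if os.path.exists(hostname_file):
        # Assumption: the file holds a single line with the node's hostname.
        with open(hostname_file) as f:
            name = f.read().strip()
    else:
        name = get_hostname()
    short = name.split(".")[0]
    # An empty or 'localhost' name means the real hostname is not known yet.
    if short in ("", "localhost"):
        return None
    return short
```

A caller would skip rendering the configuration while this returns None and retry later, so 'localhost.ostest.test.metalkube.org' never reaches the publisher.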

Comment 10 Nataf Sharabi 2020-02-24 15:36:00 UTC
To test the fix in a virtual environment, I did the following:

1.Install a fully functional environment
2.Configure root access to one of the master nodes:
  from baremetal -> ssh kni@provisionhost -> ssh core@master-0 -> sudo -s -> passwd 
3.Via virtual machine manager / virsh console master-0 (from baremetal), shut down ens4
4.Enter the master console and delete the following: /etc/mdns/hostname, /etc/mdns/config.hcl
5.Edit the baremetal XML (virtual machine manager / CLI) and configure the DHCP to give the master the name "localhost" [1]
6.Re-enable ens4
7.Restart the required master node
8.Via the console, 'cat /etc/mdns/hostname' -> should be 'localhost'
9.Run the following:
  sudo crictl ps -a | grep verify
  2f55e57daee90       891ab60d7c8933530ae270a93b54ea6bd7201638c99b528d73bcaedade9198e9   38 seconds ago      Running    verify-hostname   
  
  sudo crictl logs 2f55e57daee90                                                                                                                                            
  hostname is still localhost
  hostname is still localhost

  This shows that the fix is working: mdns won't work until the hostname is different from 'localhost'.

10.Revert the change from step 5 to the original name [2]
11.To reload the configuration on baremetal, run: 
   service libvirtd restart
12.To refresh the DHCP name and make mdns work, run on the master node:
   dhclient -r 
   dhclient
13.Wait approx. 5 minutes
14.From the master node:
   cat /etc/mdns/hostname -> should be master-X 
15.From the master node:
   cat /etc/mdns/config.hcl -> should look like [3]



[1] 
 <dhcp>
      <range start="192.168.123.100" end="192.168.123.150"/>
      <host mac="52:54:00:30:5f:ea" name="localhost" ip="192.168.123.146"/>
      <host mac="52:54:00:eb:0f:4b" name="master-1" ip="192.168.123.121"/>
      <host mac="52:54:00:fe:02:78" name="master-2" ip="192.168.123.128"/>
      <host mac="52:54:00:9b:b3:1e" name="worker-0" ip="192.168.123.140"/>
      <host mac="52:54:00:08:05:59" name="worker-1" ip="192.168.123.118"/>
      <host mac="52:54:00:6b:55:89" name="provisionhost-0" ip="192.168.123.141"/>
    </dhcp>
  </ip>

[2] 
 <dhcp>
      <range start="192.168.123.100" end="192.168.123.150"/>
      <host mac="52:54:00:30:5f:ea" name="master-0" ip="192.168.123.146"/>
      <host mac="52:54:00:eb:0f:4b" name="master-1" ip="192.168.123.121"/>
      <host mac="52:54:00:fe:02:78" name="master-2" ip="192.168.123.128"/>
      <host mac="52:54:00:9b:b3:1e" name="worker-0" ip="192.168.123.140"/>
      <host mac="52:54:00:08:05:59" name="worker-1" ip="192.168.123.118"/>
      <host mac="52:54:00:6b:55:89" name="provisionhost-0" ip="192.168.123.141"/>
    </dhcp>
  </ip>

[3]cat config.hcl 

bind_address = "192.168.123.121"
collision_avoidance = "hostname"

service {
    name = "ocp-edge-cluster Etcd"
    host_name = "etcd-1.local."
    type = "_etcd-server-ssl._tcp"
    domain = "local."
    port = 2380
    ttl = 3200
}

service {
    name = "ocp-edge-cluster Workstation"
    host_name = "master-1.local."
    type = "_workstation._tcp"
    domain = "local."
    port = 42424
    ttl = 3200
}

service {
    name = "ocp-edge-cluster EtcdWorkstation"
    host_name = "etcd-1.local."
    type = "_workstation._tcp"
    domain = "local."
    port = 42424
    ttl = 300
}
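
A check like step 15 can be automated by scanning the rendered file for `host_name` values. The following Python sketch is a plain regex scan rather than a full HCL parser, and `advertised_hosts` / `advertises_localhost` are hypothetical helper names. It assumes each `host_name` is a double-quoted string on a single line, as in [3].

```python
import re

# Matches lines such as:    host_name = "master-1.local."
HOST_NAME_RE = re.compile(r'^\s*host_name\s*=\s*"([^"]*)"', re.MULTILINE)


def advertised_hosts(config_text):
    """Return every host_name value found in the config text, in order."""
    return HOST_NAME_RE.findall(config_text)


def advertises_localhost(config_text):
    """True if the first label of any advertised host is 'localhost'."""
    return any(h.split(".")[0] == "localhost"
               for h in advertised_hosts(config_text))


# Trimmed sample in the shape of [3]; only host_name matters to the scan.
sample = '''
service {
    name = "ocp-edge-cluster Workstation"
    host_name = "master-1.local."
    type = "_workstation._tcp"
}
'''
print(advertised_hosts(sample))
print(advertises_localhost(sample))
```

On a node that still has the bug, the same scan over /etc/mdns/config.hcl would report a host whose first label is 'localhost'.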

Comment 12 errata-xmlrpc 2020-05-04 11:24:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581

