Bug 1790823

Summary: [IPI][Baremetal] sometimes Mdns-publisher (infra pod) advertise node's name as 'localhost'
Product: OpenShift Container Platform Reporter: Yossi Boaron <yboaron>
Component: Machine Config Operator Assignee: Yossi Boaron <yboaron>
Status: CLOSED ERRATA QA Contact: Nataf Sharabi <nsharabi>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 4.4 CC: acomabon, asegurap, bschmaus, kgarriso, kni-bugs, obockows, rgregory, rhhi-next-mgmt-qe, rsandu, scuppett, steven.barre, vvoronko, wsun
Target Milestone: ---   
Target Release: 4.4.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1802675 Environment:
Last Closed: 2020-05-04 11:24:10 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1802675    

Description Yossi Boaron 2020-01-14 10:31:25 UTC
Description of problem:
The mdns-publisher infra pod reads the host's hostname and advertises it using the mDNS protocol.

Sometimes mdns-publisher wrongly advertises the hostname as 'localhost'. The cause could be a race condition (mdns-publisher should start publishing only after the node's name is != 'localhost').
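
The condition suggested above (publish only once the node's name is no longer 'localhost') could be sketched roughly as follows. This is an illustration only, not the code of the actual fix: the function names is_placeholder_hostname and wait_for_real_hostname are hypothetical, and treating an empty name or 'localhost.<domain>' as "not ready" is an assumption.

```shell
#!/bin/sh
# Sketch only: both function names are hypothetical helpers.

# Returns 0 (true) when the name is empty, "localhost", or "localhost.<domain>".
is_placeholder_hostname() {
    case "$1" in
        ""|localhost|localhost.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Polls until the hostname is no longer a placeholder, then prints it on stdout.
# $1: command that prints the current hostname (default: hostname)
# $2: seconds to sleep between polls (default: 1)
wait_for_real_hostname() {
    get_name="${1:-hostname}"
    interval="${2:-1}"
    while :; do
        # Left unquoted on purpose so a command with arguments can be passed.
        name="$($get_name 2>/dev/null)"
        if ! is_placeholder_hostname "$name"; then
            echo "$name"
            return 0
        fi
        # Progress message goes to stderr so stdout carries only the final name.
        echo "hostname is still localhost" >&2
        sleep "$interval"
    done
}
```

A caller would run wait_for_real_hostname first and only then write the mdns-publisher configuration, so that a transient 'localhost' never reaches the published records.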

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Kirsten Garrison 2020-01-15 00:22:42 UTC
Assigning to baremetal team as this doesn't seem directly MCO related.

Comment 2 Kirsten Garrison 2020-01-15 00:27:23 UTC
Actually, I'm not even sure this should be filed under MCO at all..

Comment 3 Kirsten Garrison 2020-01-15 18:16:16 UTC
Talked to Brad and he suggested I move this to KNI-Deployment.

Comment 4 Yossi Boaron 2020-01-19 09:34:59 UTC
mdns-publisher advertises the node's name as 'localhost.ostest.test.metalkube.org' for the following reason:

When the MCO re-configures the DHCP client, the hostname is not yet set on the first boot. baremetal-runtimecfg also runs on the first boot, so it configures mdns-publisher with the wrong hostname.

Comment 10 Nataf Sharabi 2020-02-24 15:36:00 UTC
To test the fix in a virtual environment, I did the following:

1. Install a fully functional environment.
2. Configure root access to one of the master nodes:
  from baremetal -> ssh kni@provisionhost -> ssh core@master-0 -> sudo -s -> passwd
3. Via virtual machine manager / virsh console master-0 (from baremetal), shut down ens4.
4. Enter the master console and delete the following: /etc/mdns/hostname, /etc/mdns/config.hcl
5. Edit the baremetal XML (virtual machine manager/CLI) and configure DHCP to give the master the name "localhost" [1].
6. Re-enable ens4.
7. Restart the required master node.
8. Via the console, 'cat /etc/mdns/hostname' -> should be 'localhost'.
9. Run the following:
  sudo crictl ps -a | grep verify
  2f55e57daee90       891ab60d7c8933530ae270a93b54ea6bd7201638c99b528d73bcaedade9198e9   38 seconds ago      Running    verify-hostname   
  
  sudo crictl logs 2f55e57daee90                                                                                                                                            
  hostname is still localhost
  hostname is still localhost

  This shows the fix is working: mdns will not work until the hostname is something other than 'localhost'.

10. Change the entry from step 5 back to the original name [2].
11. To reload the configuration on baremetal, run:
   service libvirtd restart
12. To renew the DHCP name and make mdns work, run on the master node:
   dhclient -r
   dhclient
13. Wait approx. 5 minutes.
14. From the master node:
   cat /etc/mdns/hostname -> should be master-X
15. From the master node:
   cat /etc/mdns/config.hcl -> should be [3]



[1] 
 <dhcp>
      <range start="192.168.123.100" end="192.168.123.150"/>
      <host mac="52:54:00:30:5f:ea" name="localhost" ip="192.168.123.146"/>
      <host mac="52:54:00:eb:0f:4b" name="master-1" ip="192.168.123.121"/>
      <host mac="52:54:00:fe:02:78" name="master-2" ip="192.168.123.128"/>
      <host mac="52:54:00:9b:b3:1e" name="worker-0" ip="192.168.123.140"/>
      <host mac="52:54:00:08:05:59" name="worker-1" ip="192.168.123.118"/>
      <host mac="52:54:00:6b:55:89" name="provisionhost-0" ip="192.168.123.141"/>
    </dhcp>
  </ip>

[2] 
 <dhcp>
      <range start="192.168.123.100" end="192.168.123.150"/>
      <host mac="52:54:00:30:5f:ea" name="master-0" ip="192.168.123.146"/>
      <host mac="52:54:00:eb:0f:4b" name="master-1" ip="192.168.123.121"/>
      <host mac="52:54:00:fe:02:78" name="master-2" ip="192.168.123.128"/>
      <host mac="52:54:00:9b:b3:1e" name="worker-0" ip="192.168.123.140"/>
      <host mac="52:54:00:08:05:59" name="worker-1" ip="192.168.123.118"/>
      <host mac="52:54:00:6b:55:89" name="provisionhost-0" ip="192.168.123.141"/>
    </dhcp>
  </ip>

[3] cat config.hcl

bind_address = "192.168.123.121"
collision_avoidance = "hostname"

service {
    name = "ocp-edge-cluster Etcd"
    host_name = "etcd-1.local."
    type = "_etcd-server-ssl._tcp"
    domain = "local."
    port = 2380
    ttl = 3200
}

service {
    name = "ocp-edge-cluster Workstation"
    host_name = "master-1.local."
    type = "_workstation._tcp"
    domain = "local."
    port = 42424
    ttl = 3200
}

service {
    name = "ocp-edge-cluster EtcdWorkstation"
    host_name = "etcd-1.local."
    type = "_workstation._tcp"
    domain = "local."
    port = 42424
    ttl = 300
}
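
The check in step 15 could also be scripted. The sketch below assumes each service entry carries a host_name = "..." line, as in the config above. The function names list_host_names and config_has_real_host_names are hypothetical and not part of mdns-publisher.

```shell
#!/bin/sh
# Sketch only: both function names are hypothetical helpers.

# Prints every host_name value found in the given config file, one per line.
list_host_names() {
    sed -n 's/^[[:space:]]*host_name[[:space:]]*=[[:space:]]*"\([^"]*\)".*$/\1/p' "$1"
}

# Returns 0 when at least one host_name exists and none of them is
# "localhost" or "localhost.<domain>"; returns 1 otherwise.
config_has_real_host_names() {
    names="$(list_host_names "$1")"
    [ -n "$names" ] || return 1
    if printf '%s\n' "$names" | grep -Eq '^localhost(\.|$)'; then
        return 1
    fi
    return 0
}
```

On a master node this would be called as config_has_real_host_names /etc/mdns/config.hcl, with a non-zero exit status meaning the config still carries a 'localhost' name or has no host_name entries at all.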

Comment 12 errata-xmlrpc 2020-05-04 11:24:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581