Description of problem:
The mdns-publisher infra pod reads the host's hostname and advertises it using the mDNS protocol. Sometimes mdns-publisher wrongly advertises the hostname as 'localhost'. The cause could be a race condition: mdns-publisher should start publishing only once the node's name is something other than 'localhost'.
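The "wait until the hostname is not 'localhost'" behavior proposed above can be sketched as a small bash helper. This is a hypothetical illustration, not the actual mdns-publisher or baremetal-runtimecfg code: the function names and the HOSTNAME_CMD and POLL_INTERVAL variables are made up for the example. It only mimics the "hostname is still localhost" log message.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: poll until the node's hostname is no longer
# "localhost" before starting to publish. Not the real mdns-publisher code.

# Command used to read the hostname (illustrative; overridable for testing).
HOSTNAME_CMD="${HOSTNAME_CMD:-hostname}"
# Seconds to sleep between checks (illustrative default).
POLL_INTERVAL="${POLL_INTERVAL:-5}"

# Print the short hostname, dropping any domain suffix
# (e.g. "master-0.ostest.test.metalkube.org" -> "master-0").
current_hostname() {
    local name
    name="$("$HOSTNAME_CMD")"
    printf '%s\n' "${name%%.*}"
}

# Block until the short hostname is non-empty and not "localhost",
# logging to stderr while waiting; print the final name on stdout.
wait_for_hostname() {
    local name
    name="$(current_hostname)"
    while [ -z "$name" ] || [ "$name" = "localhost" ]; do
        echo "hostname is still ${name:-unset}" >&2
        sleep "$POLL_INTERVAL"
        name="$(current_hostname)"
    done
    printf '%s\n' "$name"
}
```

For example, an entrypoint could run `name="$(wait_for_hostname)"` and only then start the publisher. Re-checking in a loop, rather than reading the hostname once, is what avoids configuring mDNS with the first-boot 'localhost' name.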
Assigning to the baremetal team, as this doesn't seem directly related to the MCO.
Actually, I'm not even sure this should be filed under MCO at all.
Talked to Brad and he suggested I move this to KNI-Deployment.
mdns-publisher advertises the node's name as 'localhost.ostest.test.metalkube.org' when the MCO reconfigures the DHCP client: in that case the hostname is not yet set during the first boot, but baremetal-runtimecfg runs on that first boot and configures mdns-publisher with the wrong hostname.
To test the fix in a virtual environment, I did the following:
1. Install a fully functional environment.
2. Configure root access on one of the master nodes: from the baremetal host -> ssh kni@provisionhost -> ssh core@master-0 -> sudo -s -> passwd
3. Via virtual machine manager or 'virsh console master-0' (from the baremetal host), shut down ens4.
4. Log in to the master console and delete the following files: /etc/mdns/hostname, /etc/mdns/config.hcl
5. Edit the baremetal network XML (virtual machine manager or CLI) and configure DHCP to give the master the name "localhost" [1].
6. Enable ens4 again.
7. Restart the master node.
8. Via the console, run 'cat /etc/mdns/hostname' -> should be 'localhost'.
9. Run the following:
   $ sudo crictl ps -a | grep verify
   2f55e57daee90   891ab60d7c8933530ae270a93b54ea6bd7201638c99b528d73bcaedade9198e9   38 seconds ago   Running   verify-hostname
   $ sudo crictl logs 2f55e57daee90
   hostname is still localhost
   hostname is still localhost
   This shows the fix is working: mDNS won't work until the hostname is different from 'localhost'.
10. Revert the change from step 5 back to the original name [2].
11. To reload the configuration, run on the baremetal host: service libvirtd restart
12. To renew the DHCP name and make mDNS work, run on the master node: dhclient -r, then dhclient
13. Wait approx. 5 minutes.
14. From the master node: cat /etc/mdns/hostname -> should be master-X
15. From the master node: cat /etc/mdns/config.hcl -> should look like [3]

[1]
    <dhcp>
      <range start="192.168.123.100" end="192.168.123.150"/>
      <host mac="52:54:00:30:5f:ea" name="localhost" ip="192.168.123.146"/>
      <host mac="52:54:00:eb:0f:4b" name="master-1" ip="192.168.123.121"/>
      <host mac="52:54:00:fe:02:78" name="master-2" ip="192.168.123.128"/>
      <host mac="52:54:00:9b:b3:1e" name="worker-0" ip="192.168.123.140"/>
      <host mac="52:54:00:08:05:59" name="worker-1" ip="192.168.123.118"/>
      <host mac="52:54:00:6b:55:89" name="provisionhost-0" ip="192.168.123.141"/>
    </dhcp>
  </ip>

[2]
    <dhcp>
      <range start="192.168.123.100" end="192.168.123.150"/>
      <host mac="52:54:00:30:5f:ea" name="master-0" ip="192.168.123.146"/>
      <host mac="52:54:00:eb:0f:4b" name="master-1" ip="192.168.123.121"/>
      <host mac="52:54:00:fe:02:78" name="master-2" ip="192.168.123.128"/>
      <host mac="52:54:00:9b:b3:1e" name="worker-0" ip="192.168.123.140"/>
      <host mac="52:54:00:08:05:59" name="worker-1" ip="192.168.123.118"/>
      <host mac="52:54:00:6b:55:89" name="provisionhost-0" ip="192.168.123.141"/>
    </dhcp>
  </ip>

[3]
$ cat config.hcl
bind_address = "192.168.123.121"
collision_avoidance = "hostname"

service {
    name = "ocp-edge-cluster Etcd"
    host_name = "etcd-1.local."
    type = "_etcd-server-ssl._tcp"
    domain = "local."
    port = 2380
    ttl = 3200
}

service {
    name = "ocp-edge-cluster Workstation"
    host_name = "master-1.local."
    type = "_workstation._tcp"
    domain = "local."
    port = 42424
    ttl = 3200
}

service {
    name = "ocp-edge-cluster EtcdWorkstation"
    host_name = "etcd-1.local."
    type = "_workstation._tcp"
    domain = "local."
    port = 42424
    ttl = 300
}
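The checks in steps 14 and 15 can be scripted for the master node. This is a hypothetical sketch: the function name is made up, and it assumes config.hcl advertises the node as host_name = "<hostname>.local.", as in the example config.hcl shown in [3].

```shell
#!/usr/bin/env bash
# Hypothetical helper for steps 14-15: confirm /etc/mdns/hostname is no
# longer "localhost" and that config.hcl advertises "<hostname>.local.".

check_mdns_config() {
    # Directory holding the mdns-publisher files (defaults to /etc/mdns).
    local mdns_dir="${1:-/etc/mdns}"
    local name compact

    name="$(cat "${mdns_dir}/hostname")" || return 1
    if [ -z "$name" ] || [ "$name" = "localhost" ]; then
        echo "FAIL: ${mdns_dir}/hostname is '${name}'" >&2
        return 1
    fi

    # Strip all whitespace so spacing/alignment in the HCL file doesn't matter.
    compact="$(tr -d '[:space:]' < "${mdns_dir}/config.hcl")" || return 1
    case "$compact" in
        *"host_name=\"${name}.local.\""*)
            echo "OK: config.hcl advertises ${name}.local."
            ;;
        *)
            echo "FAIL: config.hcl does not advertise ${name}.local." >&2
            return 1
            ;;
    esac
}
```

Run `check_mdns_config` after step 13 (or pass another directory as the first argument). A non-zero exit status means the node is still configured with the wrong name.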
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0581