Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2080736

Summary: OpenShift Assisted Installer gets the hostname as localhost instead of actual hostname when using 802.3ad LACP NIC bonding
Product: OpenShift Container Platform
Reporter: Venkat B <venkatasubramanian.b>
Component: assisted-installer
assisted-installer sub component: discovery-agent
Assignee: Mat Kowalski <mko>
QA Contact: Yuri Obshansky <yobshans>
Status: CLOSED DUPLICATE
Severity: medium
Priority: unspecified
CC: aos-bugs, lalon, oourfali, yliu1
Version: 4.8
Hardware: x86_64   
OS: Linux   
Last Closed: 2022-05-06 16:41:07 UTC
Type: Bug
Attachments:
Description                                Flags
NetworkManager journalctl logs             none
systemd-hostnamed.service status output    none

Description Venkat B 2022-05-01 12:23:59 UTC
Created attachment 1876321 [details]
NetworkManager journalctl logs

Description of problem:
We configured the InfraEnv with static IPs and set the bonding mode to 4 (802.3ad). The LACP bundle is configured on our hardware, and the external switches are configured to match.

The following is an extract of the static network configuration (network_yaml) for one of the hosts.

    "static_network_config": [
    {
      "network_yaml": "dns-resolver:\r\n  config:\r\n    search:\r\n    - peparepe.net\r\n    server:\r\n    - 10.152.181.50\r\ninterfaces:\r\n- ipv4:\r\n    address:\r\n    - ip: 10.152.183.67\r\n      prefix-length: 26\r\n    dhcp: false\r\n    enabled: true\r\n  ipv6:\r\n    enabled: false\r\n  link-aggregation:\r\n    mode: 802.3ad\r\n    options:\r\n      lacp_rate: fast\r\n      xmit_hash_policy: layer2+3\r\n      miimon: \"100\"\r\n    slaves:\r\n    - ens3f0\r\n    - ens3f1\r\n  name: bond0\r\n  state: up\r\n  type: bond\r\n  mtu: 1500\r\nroutes:\r\n  config:\r\n  - destination: 0.0.0.0/0\r\n    next-hop-address: 10.152.183.1\r\n    next-hop-interface: bond0\r\n    table-id: 254",
      "mac_interface_map": [
        {
          "mac_address": "00:21:5A:C3:D3:2E",
          "logical_nic_name": "ens3f0"
        }
      ]
    }
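
Because network_yaml is carried as a JSON-escaped string, it is hard to read in the extract above. The following is a minimal Python sketch that decodes such a string back into plain YAML lines. The ESCAPED value is an abbreviated copy of the bond section from the extract (not the full document), and the helper name decode_network_yaml is made up for illustration; it is not part of the Assisted Installer.

```python
import json

# Abbreviated copy of the "network_yaml" value from the extract above
# (bond section only), kept as the JSON string literal it is sent as.
ESCAPED = (
    r'"  link-aggregation:\r\n    mode: 802.3ad\r\n    options:\r\n'
    r'      lacp_rate: fast\r\n      xmit_hash_policy: layer2+3\r\n'
    r'      miimon: \"100\"\r\n    slaves:\r\n    - ens3f0\r\n    - ens3f1\r\n'
    r'  name: bond0\r\n  state: up\r\n  type: bond\r\n  mtu: 1500"'
)


def decode_network_yaml(json_string_literal):
    """Decode a JSON string literal and return its lines without CRLF endings."""
    text = json.loads(json_string_literal)
    # splitlines() treats "\r\n" as a single line break.
    return text.splitlines()


lines = decode_network_yaml(ESCAPED)
print("\n".join(lines))
```

Printing the decoded lines makes the bond mode, the LACP options and the member interfaces easy to compare against the switch-side configuration.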


When the bond mode is set to 4 (mode=802.3ad), booting the OCP node with the Discovery ISO ends up with the node's hostname set to 'localhost'. We have an external DNS that can resolve the node's static IP to its hostname (reverse lookup).

From the journalctl logs of NetworkManager and systemd-hostnamed.service, we see that because the Ethernet interfaces are brought up in parallel with the hostname being set, the hostname lookup does not happen at all (we did not see any DNS query at our DNS server while the server booted the discovery image). As a result, the node's hostname is set to 'localhost'.

[root@localhost ~]# hostnamectl
   Static hostname: n/a
Transient hostname: localhost
         Icon name: computer-server
           Chassis: server
        Machine ID: cbe6f6732bdf4b138701d14236e0bdeb
           Boot ID: 1cb224b2cef74f059e4e141e1733376e
  Operating System: Red Hat Enterprise Linux CoreOS 48.84.202109241901-0 (Ootpa)
       CPE OS Name: cpe:/o:redhat:enterprise_linux:8::coreos
            Kernel: Linux 4.18.0-305.19.1.el8_4.x86_64
      Architecture: x86-64

With all nodes having 'localhost' as their hostname, we are unable to proceed with the Assisted Installer based installation.
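
To check many hosts for this condition, the hostnamectl output can be parsed rather than read by eye. This is a rough Python sketch, assuming the "Static hostname" / "Transient hostname" layout shown above; the function names are illustrative, and treating "n/a", "(unset)" and an empty value as "no static hostname" is an assumption.

```python
# Values treated as "hostname was never set" (assumption for this sketch).
PLACEHOLDER_HOSTNAMES = {"localhost", "localhost.localdomain"}
UNSET_STATIC_VALUES = {"", "n/a", "(unset)"}

SAMPLE = """\
   Static hostname: n/a
Transient hostname: localhost
         Icon name: computer-server
       CPE OS Name: cpe:/o:redhat:enterprise_linux:8::coreos
"""


def parse_hostnamectl(output):
    """Turn 'Key: value' lines of hostnamectl output into a dict."""
    fields = {}
    for line in output.splitlines():
        # Split on the first colon only, so values containing ':' survive.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def hostname_is_placeholder(fields):
    """Return True if the effective hostname is still 'localhost'."""
    static = fields.get("Static hostname", "")
    transient = fields.get("Transient hostname", "")
    effective = transient if static in UNSET_STATIC_VALUES else static
    return effective in PLACEHOLDER_HOSTNAMES


print(hostname_is_placeholder(parse_hostnamectl(SAMPLE)))
```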


Version-Release number of selected component (if applicable):
We are using the latest image tag of Assisted Services.
There are 3 Control Nodes and 5 Compute Nodes. It's a pure IPv4 installation. OCP version used is 4.8.29.

How reproducible:
Always reproducible.

Steps to Reproduce:
1. Set up the Assisted Installer services
2. Generate the Discovery Image ISO
3. Boot each server with that ISO
4. Each host boots to the prompt with its hostname set to 'localhost'

Actual results:
Each host boots to the prompt with its hostname set to 'localhost'

Expected results:
Each host must get its unique hostname correctly (as per what is configured in the DNS)
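
The expected result can be checked on a booted host by comparing its current hostname with the reverse DNS entry for its static IP. Below is a minimal Python sketch of that comparison using socket.gethostbyaddr. The resolver parameter exists only so the check can be exercised without a real DNS server, and the host name node1.peparepe.net in the usage comment is hypothetical (only the search domain and IP come from the configuration above).

```python
import socket


def reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return the primary name for ip, or None if the lookup fails."""
    try:
        name, _aliases, _addresses = resolver(ip)
    except OSError:  # covers socket.herror and socket.gaierror
        return None
    return name


def hostname_matches_dns(current, ip, resolver=socket.gethostbyaddr):
    """Return True if the current hostname agrees with the reverse DNS entry."""
    expected = reverse_lookup(ip, resolver)
    if expected is None:
        return False
    current = current.lower().rstrip(".")
    expected = expected.lower().rstrip(".")
    # Accept either the FQDN or its short (first-label) form.
    return current == expected or current == expected.split(".")[0]


# Example on a booted node (hypothetical DNS record node1.peparepe.net):
# hostname_matches_dns(socket.gethostname(), "10.152.183.67")
```

On an affected host this check would return False, since 'localhost' does not match whatever name the DNS holds for the static IP.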

Additional info:
We are investigating whether there are similar known issues. The following might be relevant:
https://access.redhat.com/solutions/5915331
https://bugzilla.redhat.com/show_bug.cgi?id=1938671#attach_1763320
https://bugzilla.redhat.com/show_bug.cgi?id=1944559

Workaround that makes it work correctly:

Two things were done once the node booted to the prompt with 'localhost' as its hostname (the nodes already had the correct static IPv4 addresses assigned to them):
- systemctl restart NetworkManager
- systemctl restart agent.service

Once that is done, the Assisted Service agent retries and supplies the correct hostname to the AI service, and from then on everything works.
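
For several nodes, the manual workaround above could be scripted. The following Python sketch only wraps the two systemctl restarts listed above and runs them when the hostname is still 'localhost'. It is an illustration of the workaround, not a fix; the function name and the injectable run parameter are assumptions added for this example, and whether both restarts are strictly required has not been established.

```python
import subprocess

PLACEHOLDER_HOSTNAMES = {"localhost", "localhost.localdomain"}

# The two restarts from the workaround, in the order they were performed.
WORKAROUND_COMMANDS = [
    ["systemctl", "restart", "NetworkManager"],
    ["systemctl", "restart", "agent.service"],
]


def apply_workaround(hostname, run=subprocess.run):
    """Restart the services only if the host still reports 'localhost'.

    Returns the list of commands that were executed.
    """
    if hostname not in PLACEHOLDER_HOSTNAMES:
        return []
    executed = []
    for command in WORKAROUND_COMMANDS:
        # check=True raises CalledProcessError if a restart fails.
        run(command, check=True)
        executed.append(command)
    return executed
```

On a node booted from the Discovery ISO this would be called as root with apply_workaround(socket.gethostname()), after importing socket.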

Is this a known issue? What is the right fix to this issue?

Comment 1 Venkat B 2022-05-01 12:26:14 UTC
Created attachment 1876322 [details]
systemd-hostnamed.service status output

Comment 2 Mat Kowalski 2022-05-03 15:49:15 UTC
Hi,

I understand the topology you are running, but it's not trivial for me to reproduce it locally. Is there any chance you could give me access to the affected system when it's booted from the Discovery ISO, so that I can gather some data live while we are hitting the issue? If not, I'd need at least a full journald log and probably dmesg output to have something to start with.

Thanks for providing the required info

Comment 4 Mat Kowalski 2022-05-05 11:24:23 UTC
Hi Venkat,

I'm cancelling the NEEDINFO because it seems that the issue you have encountered is being tracked as a high-priority bug in 2064339. I can see an open PR, https://github.com/openshift/machine-config-operator/pull/3041, against OpenShift 4.10 already. We will check internally and discuss the feasibility of backporting it to 4.8.

Comment 5 Mat Kowalski 2022-05-06 16:41:07 UTC

*** This bug has been marked as a duplicate of bug 2064339 ***