# Description of the problem:
While installing a 3-node cluster on Supermicro servers, my nodes are bouncing between "insufficient" and "ready". The nodes go to "insufficient" due to packet loss.

# Additional symptoms:
1. If 'agent.service' is stopped on the nodes, no packet loss occurs in a manual test.
2. If tests are done with 'arping', no packet loss occurs, but sometimes there are delays:

[core@cnfdf12 ~]$ arping 10.8.34.31 -I eno1
ARPING 10.8.34.31 from 10.8.34.30 eno1
Unicast reply from 10.8.34.31 [3C:EC:EF:5F:E0:D6] 0.543ms
Unicast reply from 10.8.34.31 [3C:EC:EF:5F:E0:D6] 641.860ms
Unicast reply from 10.8.34.31 [3C:EC:EF:5F:E0:D6] 0.549ms

The problem seems to be associated with ARP table garbage, for example:

[core@cnfdf12 ~]$ arp -a
cnfdf13.telco5gran.eng.rdu2.redhat.com (10.8.34.31) at 3c:ec:ef:5f:e0:d6 [ether] on eno1
hv6.telco5gran.eng.rdu2.redhat.com (10.8.34.25) at b8:ce:f6:44:19:5e [ether] on eno1
cnfdf14.telco5gran.eng.rdu2.redhat.com (10.8.34.32) at 3c:ec:ef:5f:5d:37 [ether] on eno2

(The third entry clearly points to the wrong interface: eno2 instead of eno1.)

# Workaround (on each host):
1. Manually switch off all the interfaces besides the relevant one (eno1):

[core@cnfdf13 ~]$ sudo ip link set eno2 down
[core@cnfdf13 ~]$ sudo ip link set ens2f0 down
[core@cnfdf13 ~]$ sudo ip link set eth3 down
[core@cnfdf13 ~]$ sudo ip link set eth4 down
[core@cnfdf13 ~]$ sudo ip link set ens2f1 down
[core@cnfdf13 ~]$ sudo ip link set ens1f1 down
[core@cnfdf13 ~]$ sudo ip link set ens1f0 down

2. Clean the ARP table:

[core@cnfdf13 ~]$ sudo ip -s -s neigh flush all

Versions:
Server Version: 4.9.21
Kubernetes Version: v1.22.3+fdba464
ACM version: 2.4.4

Hardware Info:
Manufacturer: Supermicro
Product Name: Super Server

Steps to reproduce:
1. Try installing a three-node cluster with the above servers. The servers must have several NICs connected to the same network.

Actual results:
Validation fails due to packet loss.

Expected results:
Installation proceeds.

Additional info:
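To make the "ARP table garbage" symptom easier to spot on other hosts, here is a rough sketch that scans `arp -a` output for entries learned on an interface other than the expected one. It is not part of the installer or the agent. The names `ArpEntry`, `parse_arp_table` and `entries_on_unexpected_interface` are made up for illustration, and the regular expression assumes the `arp -a` line format shown in this report. Lines in any other format, such as incomplete entries, are skipped.

```python
import re
from typing import List, NamedTuple


class ArpEntry(NamedTuple):
    """One resolved entry from `arp -a` (hypothetical helper type)."""
    host: str
    ip: str
    mac: str
    iface: str


# Assumed line format, taken from the output pasted in this report:
#   <host> (<ip>) at <mac> [ether] on <iface>
ARP_LINE = re.compile(
    r"^(?P<host>\S+) \((?P<ip>[0-9.]+)\) at (?P<mac>[0-9A-Fa-f:]{17}) "
    r"\[\w+\] on (?P<iface>\S+)$"
)


def parse_arp_table(arp_output: str) -> List[ArpEntry]:
    """Parse `arp -a` text, skipping lines that do not match the format."""
    entries = []
    for line in arp_output.splitlines():
        match = ARP_LINE.match(line.strip())
        if match:
            entries.append(ArpEntry(**match.groupdict()))
    return entries


def entries_on_unexpected_interface(
    arp_output: str, expected_iface: str
) -> List[ArpEntry]:
    """Return entries that were learned on any interface but the expected one."""
    return [e for e in parse_arp_table(arp_output) if e.iface != expected_iface]


# The ARP table from the report, used as sample input.
SAMPLE = """\
cnfdf13.telco5gran.eng.rdu2.redhat.com (10.8.34.31) at 3c:ec:ef:5f:e0:d6 [ether] on eno1
hv6.telco5gran.eng.rdu2.redhat.com (10.8.34.25) at b8:ce:f6:44:19:5e [ether] on eno1
cnfdf14.telco5gran.eng.rdu2.redhat.com (10.8.34.32) at 3c:ec:ef:5f:5d:37 [ether] on eno2
"""

if __name__ == "__main__":
    for entry in entries_on_unexpected_interface(SAMPLE, "eno1"):
        print(f"{entry.ip} ({entry.mac}) learned on {entry.iface}, expected eno1")
```

Run against the table above with `eno1` as the expected interface, this flags only the 10.8.34.32 entry on eno2. On a live host the input could come from the output of `arp -a` instead of the `SAMPLE` string.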
Depends on https://bugzilla.redhat.com/show_bug.cgi?id=2095173