Created attachment 1819203 [details]
Scenario 1 screenshot

Description of problem:
Cluster networking cannot be configured at all, regardless of whether "Allocate virtual IPs via DHCP server" is enabled or disabled; both cases fail.

Version-Release number of selected component (if applicable):
v1.0.25.2

How reproducible:

Scenario 1: "Allocate virtual IPs via DHCP server" - True
- Create a cluster, download the ISO, discover nodes, go to Networking
- Leave "Allocate virtual IPs via DHCP server" untouched

Result -> The Assisted Installer (AI) initially finds the API Virtual IP and Ingress Virtual IP, but after several seconds it fails with:
1. The DHCP server failed to allocate the IP
   The API virtual IP is undefined; IP allocation from the DHCP server timed out.
2. The DHCP server failed to allocate the IP
   The Ingress virtual IP is undefined; IP allocation from the DHCP server timed out.
3. Cluster is not ready yet
   The following requirements must be met:
   The API virtual IP is undefined; IP allocation from the DHCP server timed out.
   The Ingress virtual IP is undefined; IP allocation from the DHCP server timed out.

Scenario 2: "Allocate virtual IPs via DHCP server" - False
- Create a cluster, download the ISO, discover nodes, go to Networking
- Uncheck "Allocate virtual IPs via DHCP server"
- Enter correct VIPs:
  API Virtual IP * - 192.168.123.5
  Ingress Virtual IP * - 192.168.123.10

Result -> Failed to update the cluster
Setting Machine network CIDR is forbidden when cluster is not in vip-dhcp-allocation mode
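For context on the scenario 2 failure, here is a hypothetical sketch (not the actual assisted-service code) of the kind of guard that would produce that error: an update that disables DHCP VIP allocation but still carries machine_networks is rejected.

```
package main

import (
	"errors"
	"fmt"
)

// ClusterUpdate is a hypothetical, trimmed-down stand-in for the update
// parameters the UI sends; it is NOT the real assisted-service type.
type ClusterUpdate struct {
	VipDhcpAllocation *bool
	MachineNetworks   []string // machine network CIDRs
}

// validateMachineNetworks sketches the kind of guard that would produce the
// scenario 2 error: when DHCP VIP allocation is disabled, an update that
// still carries machine_networks is refused.
func validateMachineNetworks(u ClusterUpdate) error {
	dhcpEnabled := u.VipDhcpAllocation != nil && *u.VipDhcpAllocation
	if !dhcpEnabled && len(u.MachineNetworks) > 0 {
		return errors.New("Setting Machine network CIDR is forbidden when cluster is not in vip-dhcp-allocation mode")
	}
	return nil
}

func main() {
	off := false
	// Mirrors scenario 2: DHCP allocation unchecked, machine_networks still
	// present in the payload -> the update is rejected.
	err := validateMachineNetworks(ClusterUpdate{
		VipDhcpAllocation: &off,
		MachineNetworks:   []string{"192.168.123.0/24"},
	})
	fmt.Println(err)
}
```

This is consistent with the payloads in the next comment: the UI always includes "machine_networks", while a hand-edited payload without it is accepted.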
For scenario 2 I see the following payload sent by the UI:
```
{"api_vip":"192.168.127.52","ingress_vip":"192.168.127.51","ssh_public_key":"ssh-rsa AAAAB[...]W+b6wp5c=","vip_dhcp_allocation":false,"network_type":"OVNKubernetes","user_managed_networking":false,"cluster_networks":[{"cidr":"10.128.0.0/14","host_prefix":23}],"service_networks":[{"cidr":"172.30.0.0/16"}],"machine_networks":[{"cidr":"192.168.127.0/24"}]}
```
Modifying it and sending it manually via curl as a PATCH, with "machine_networks" removed, succeeds:
```
'{"api_vip":"192.168.127.52","ingress_vip":"192.168.127.51","ssh_public_key":"ssh-rsa AAA[...]wp5c=","vip_dhcp_allocation":false,"network_type":"OVNKubernetes","user_managed_networking":false,"cluster_networks":[{"cidr":"10.128.0.0/14","host_prefix":23}],"service_networks":[{"cidr":"172.30.0.0/16"}]}'
```
The succeeding payload can just as well contain an *empty* "machine_networks" list, i.e.
```
'{"api_vip":"192.168.127.52","ingress_vip":"192.168.127.51","ssh_public_key":"ssh-rsa AAAA[...]W+b6wp5c=","vip_dhcp_allocation":false,"network_type":"OVNKubernetes","user_managed_networking":false,"cluster_networks":[{"cidr":"10.128.0.0/14","host_prefix":23}],"service_networks":[{"cidr":"172.30.0.0/16"}],"machine_networks":[]}'
```
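For completeness, a minimal sketch of sending that trimmed payload programmatically; the endpoint and token are placeholders (the tests above used curl against the real cluster URL):

```
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	// Placeholder endpoint and token -- substitute the real cluster URL
	// (the `/api/.../clusters/UUID` path referenced below) and your credentials.
	url := "https://assisted.example.com/api/.../clusters/UUID"
	token := "REPLACE_ME"

	// The scenario 2 payload with "machine_networks" left out entirely
	// (ssh_public_key omitted here for brevity) -- the variant observed to
	// succeed.
	payload := []byte(`{"api_vip":"192.168.127.52","ingress_vip":"192.168.127.51","vip_dhcp_allocation":false,"network_type":"OVNKubernetes","user_managed_networking":false,"cluster_networks":[{"cidr":"10.128.0.0/14","host_prefix":23}],"service_networks":[{"cidr":"172.30.0.0/16"}]}`)

	req, err := http.NewRequest(http.MethodPatch, url, bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+token)

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```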
For scenario 1 the error message "IP allocation from the DHCP server timed out" comes from the following validator function:
```
func isDhcpLeaseAllocationTimedOut(c *clusterPreprocessContext) bool {
	return c.cluster.MachineNetworkCidrUpdatedAt.String() != "" &&
		time.Since(c.cluster.MachineNetworkCidrUpdatedAt) > DhcpLeaseTimeoutMinutes*time.Minute
}
```
The timestamp it checks is updated via:
```
func UpdateMachineCidr(db *gorm.DB, cluster *common.Cluster, machineCidr string) error {
	[...]
	return db.Model(&common.Cluster{}).Where("id = ?", cluster.ID.String()).Updates(map[string]interface{}{
		"machine_network_cidr":            machineCidr,
		"machine_network_cidr_updated_at": time.Now(),
	}).Error
}
```
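To make the timeout behaviour easier to follow, here is a self-contained restatement of that check; the timeout constant value is illustrative, not taken from the assisted-service source:

```
package main

import (
	"fmt"
	"time"
)

// Illustrative timeout window; the real DhcpLeaseTimeoutMinutes constant
// lives in assisted-service and may have a different value.
const DhcpLeaseTimeoutMinutes = 2

// isDhcpLeaseAllocationTimedOut restates the quoted validator on a plain
// time.Time: the lease is considered timed out once the machine network CIDR
// timestamp is set and older than the timeout window. The IsZero check
// stands in for the original's `MachineNetworkCidrUpdatedAt.String() != ""`.
func isDhcpLeaseAllocationTimedOut(updatedAt time.Time) bool {
	return !updatedAt.IsZero() &&
		time.Since(updatedAt) > DhcpLeaseTimeoutMinutes*time.Minute
}

func main() {
	// A timestamp refreshed just now (as UpdateMachineCidr does whenever the
	// machine network is written) is inside the window...
	fmt.Println(isDhcpLeaseAllocationTimedOut(time.Now())) // false
	// ...while one older than the window trips the "IP allocation from the
	// DHCP server timed out" validation.
	fmt.Println(isDhcpLeaseAllocationTimedOut(time.Now().Add(-3 * time.Minute))) // true
}
```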
++++++++++++++++++
+++ SCENARIO 1 +++
++++++++++++++++++

These are the payloads used to configure the respective options:

+++ Cluster-Managed Networking, disabling "Allocate virtual IPs via DHCP server", manually providing VIPs
`--data '{"api_vip":"192.168.127.201","ingress_vip":"192.168.127.202","vip_dhcp_allocation":false,"network_type":"OVNKubernetes","user_managed_networking":false,"cluster_networks":[{"cidr":"10.128.0.0/14","host_prefix":23}],"service_networks":[{"cidr":"172.30.0.0/16"}]}'`

+++ Cluster-Managed Networking, enabling "Allocate virtual IPs via DHCP server", machine network #1
`--data '{"vip_dhcp_allocation":true,"network_type":"OVNKubernetes","user_managed_networking":false,"cluster_networks":[{"cidr":"10.128.0.0/14","host_prefix":23}],"service_networks":[{"cidr":"172.30.0.0/16"}],"machine_networks":[{"cidr":"192.168.127.0/24"}]}'`

+++ Cluster-Managed Networking, enabling "Allocate virtual IPs via DHCP server", machine network #2
`--data '{"vip_dhcp_allocation":true,"network_type":"OVNKubernetes","user_managed_networking":false,"cluster_networks":[{"cidr":"10.128.0.0/14","host_prefix":23}],"service_networks":[{"cidr":"172.30.0.0/16"}],"machine_networks":[{"cidr":"192.168.145.0/24"}]}'`

------------------

Now the observations:
- disabling auto-allocation just works, no further comments
- enabling auto-allocation causes `GET /api/.../clusters/UUID | jq '.api_vip'` to return NULL for about 15 seconds after sending the request; after that, the IP is returned consistently
- sending any PATCH request containing "machine_networks" (even if the value does not change) causes GET to once again return NULL for about 15 seconds

From this I believe what happens is the following:
- whenever params.ClusterUpdateParams.MachineNetworks is not empty, we trigger a reallocation of the VIPs
- it does not matter whether params.ClusterUpdateParams.MachineNetworks is equal to cluster.MachineNetworks or not

What should happen:
- reallocation should be triggered only when params.ClusterUpdateParams.MachineNetworks != cluster.MachineNetworks (a minimal sketch of such a guard follows below)
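A minimal sketch of such a guard, assuming machine networks can be compared as plain CIDR strings (illustrative code, not the actual assisted-service change):

```
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// machineNetworksChanged sketches the proposed guard with illustrative types
// (plain CIDR strings, not the real assisted-service models): treat a PATCH
// as a machine-network change -- and therefore a reason to reallocate the
// VIPs -- only if the requested CIDR list actually differs from what the
// cluster already has.
func machineNetworksChanged(requested, current []string) bool {
	if len(requested) == 0 {
		// Nothing requested: keep the existing machine networks untouched.
		return false
	}
	req := append([]string(nil), requested...)
	cur := append([]string(nil), current...)
	sort.Strings(req)
	sort.Strings(cur)
	return !reflect.DeepEqual(req, cur)
}

func main() {
	current := []string{"192.168.127.0/24"}
	// Re-sending the same CIDR should not reset the already allocated VIPs...
	fmt.Println(machineNetworksChanged([]string{"192.168.127.0/24"}, current)) // false
	// ...while switching to a different machine network should trigger
	// reallocation.
	fmt.Println(machineNetworksChanged([]string{"192.168.145.0/24"}, current)) // true
}
```

With a guard along these lines, the recurring UI PATCH requests described in the next comment would no longer reset the DHCP allocation, since they re-send an unchanged machine network.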
An additional observation, not affecting the fundamental flaw in the reallocation logic described above, but something that makes the issue visible: after creating a cluster and opening the Networking tab in the UI, the browser keeps sending PATCH requests recurrently even though I am not touching the UI at all. I would expect a PATCH to be sent only when I explicitly change something in the UI that modifies the underlying value, yet I observe them being sent even when I don't touch the browser.
The current state is believed to be:
- fix for scenario (1) in https://github.com/openshift/assisted-service/pull/2527
- fix for scenario (2) in https://github.com/openshift-assisted/assisted-ui-lib/pull/767

https://github.com/openshift/assisted-service/pull/2512, as it does not directly affect this issue, should not be considered in scope for this BZ.
Verified both scenarios on Staging (UI 1.5.35, BE v1.0.25.3).
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759