Bug 1999297
| Summary: | [Assisted-4.8][SaaS] vip-dhcp-allocation mode broken, cannot set networking for cluster | | | | | | |
|---|---|---|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Yuri Obshansky <yobshans> | ||||
| Component: | assisted-installer | Assignee: | Mat Kowalski <mko> | ||||
| assisted-installer sub component: | assisted-service | QA Contact: | Yuri Obshansky <yobshans> | ||||
| Status: | CLOSED ERRATA | Docs Contact: | |||||
| Severity: | urgent | ||||||
| Priority: | urgent | CC: | aos-bugs, jkilzi, lgamliel, mko, mlammon, ohochman, sasha | ||||
| Version: | 4.8 | Keywords: | TestBlocker | ||||
| Target Milestone: | --- | ||||||
| Target Release: | 4.9.0 | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | OCP-Metal-V1.0.25.3 | Doc Type: | If docs needed, set a value | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2021-10-18 17:49:59 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: | | | | | | | |
For scenario 2, the UI sends the following payload:
```
{"api_vip":"192.168.127.52","ingress_vip":"192.168.127.51","ssh_public_key":"ssh-rsa AAAAB[...]W+b6wp5c=","vip_dhcp_allocation":false,"network_type":"OVNKubernetes","user_managed_networking":false,"cluster_networks":[{"cidr":"10.128.0.0/14","host_prefix":23}],"service_networks":[{"cidr":"172.30.0.0/16"}],"machine_networks":[{"cidr":"192.168.127.0/24"}]}
```
Modifying it and sending it manually via curl as a PATCH with machine_networks removed succeeds:
```
'{"api_vip":"192.168.127.52","ingress_vip":"192.168.127.51","ssh_public_key":"ssh-rsa AAA[...]wp5c=","vip_dhcp_allocation":false,"network_type":"OVNKubernetes","user_managed_networking":false,"cluster_networks":[{"cidr":"10.128.0.0/14","host_prefix":23}],"service_networks":[{"cidr":"172.30.0.0/16"}]}'
```
The successful payload may also contain an *empty* "machine_networks" list, i.e.
```
'{"api_vip":"192.168.127.52","ingress_vip":"192.168.127.51","ssh_public_key":"ssh-rsa AAAA[...]W+b6wp5c=","vip_dhcp_allocation":false,"network_type":"OVNKubernetes","user_managed_networking":false,"cluster_networks":[{"cidr":"10.128.0.0/14","host_prefix":23}],"service_networks":[{"cidr":"172.30.0.0/16"}],"machine_networks":[]}'
```
For scenario 1, the error message "IP allocation from the DHCP server timed out" comes from the following validator function:
```
func isDhcpLeaseAllocationTimedOut(c *clusterPreprocessContext) bool {
return c.cluster.MachineNetworkCidrUpdatedAt.String() != "" && time.Since(c.cluster.MachineNetworkCidrUpdatedAt) > DhcpLeaseTimeoutMinutes*time.Minute
}
```
The timestamp and CIDR are updated via:
```
func UpdateMachineCidr(db *gorm.DB, cluster *common.Cluster, machineCidr string) error {
	[...]
	return db.Model(&common.Cluster{}).Where("id = ?", cluster.ID.String()).Updates(map[string]interface{}{
		"machine_network_cidr":            machineCidr,
		"machine_network_cidr_updated_at": time.Now(),
	}).Error
}
```
++++++++++++++++++
+++ SCENARIO 1 +++
++++++++++++++++++
These are the payloads used to configure the respective options.
+++ Cluster-Managed Networking, disabling "Allocate virtual IPs via DHCP server", manually providing VIPs
`--data '{"api_vip":"192.168.127.201","ingress_vip":"192.168.127.202","vip_dhcp_allocation":false,"network_type":"OVNKubernetes","user_managed_networking":false,"cluster_networks":[{"cidr":"10.128.0.0/14","host_prefix":23}],"service_networks":[{"cidr":"172.30.0.0/16"}]}'`
+++ Cluster-Managed Networking, enabling "Allocate virtual IPs via DHCP server", machine network #1
`--data '{"vip_dhcp_allocation":true,"network_type":"OVNKubernetes","user_managed_networking":false,"cluster_networks":[{"cidr":"10.128.0.0/14","host_prefix":23}],"service_networks":[{"cidr":"172.30.0.0/16"}],"machine_networks":[{"cidr":"192.168.127.0/24"}]}'`
+++ Cluster-Managed Networking, enabling "Allocate virtual IPs via DHCP server", machine network #2
`--data '{"vip_dhcp_allocation":true,"network_type":"OVNKubernetes","user_managed_networking":false,"cluster_networks":[{"cidr":"10.128.0.0/14","host_prefix":23}],"service_networks":[{"cidr":"172.30.0.0/16"}],"machine_networks":[{"cidr":"192.168.145.0/24"}]}'`
------------------
Now the observations:
- disabling auto-allocation just works, no further comments
- enabling auto-allocation causes `GET /api/.../clusters/UUID | jq '.api_vip'` to return null for about 15 seconds after the request is sent; after that, the IP is returned consistently
- sending any PATCH request containing "machine_networks" (even if the value does not change) causes GET to once again return null for about 15 seconds
From this, I believe what happens is the following:
- whenever params.ClusterUpdateParams.MachineNetworks is not empty, we trigger a reallocation of the VIPs
- it does not matter whether params.ClusterUpdateParams.MachineNetworks is equal to cluster.MachineNetworks or not

What should happen:
- reallocation should be triggered only when params.ClusterUpdateParams.MachineNetworks != cluster.MachineNetworks
Additional observation, not affecting the fundamental flaw of the reallocation logic described above, but something that makes the issue visible: after creating a cluster and opening the Networking tab in the UI, the browser keeps sending PATCH requests recurrently even though I am not touching the UI at all. I would expect a PATCH to be sent only when I explicitly change a value in the UI, yet they are sent even when the browser is untouched.

The current state is believed to be:
- fix for scenario (1) in https://github.com/openshift/assisted-service/pull/2527
- fix for scenario (2) in https://github.com/openshift-assisted/assisted-ui-lib/pull/767

https://github.com/openshift/assisted-service/pull/2512, as it does not directly affect the issue, should not be considered in scope for this BZ.

Verified both scenarios on Staging, UI 1.5.35, BE v1.0.25.3.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759
Created attachment 1819203 [details] scenario 1 screenshot

Description of problem:
Impossible to set cluster Networking in any way while "Allocate virtual IPs via DHCP server" is either True or False. Failed in both cases.

Version-Release number of selected component (if applicable):
v1.0.25.2

How reproducible:

Scenario 1: Allocate virtual IPs via DHCP server - True
- Create cluster, download ISO, discover nodes, go to Networking
- Do nothing with "Allocate virtual IPs via DHCP server"

Result -> AI found API Virtual IP and Ingress Virtual IP, but after several seconds failed:
1. The DHCP server failed to allocate the IP: The API virtual IP is undefined; IP allocation from the DHCP server timed out.
2. The DHCP server failed to allocate the IP: The Ingress virtual IP is undefined; IP allocation from the DHCP server timed out.
3. Cluster is not ready yet. The following requirements must be met: The API virtual IP is undefined; IP allocation from the DHCP server timed out. The Ingress virtual IP is undefined; IP allocation from the DHCP server timed out.

Scenario 2: Allocate virtual IPs via DHCP server - False
- Create cluster, download ISO, discover nodes, go to Networking
- Uncheck "Allocate virtual IPs via DHCP server"
- Put correct API Virtual IP - 192.168.123.5 and Ingress Virtual IP - 192.168.123.10

Result -> Failed to update the cluster: "Setting Machine network CIDR is forbidden when cluster is not in vip-dhcp-allocation mode"