1974085 – [Assisted-4.8] [Staging][Network Latency] Worker host IP appear in master validation message

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	assisted-installer
Sub Component:
Version:	4.8
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	urgent
Target Milestone:	---
Target Release:	4.9.0
Assignee:	Jordi Gil
QA Contact:	Yuri Obshansky
Docs Contact:
URL:
Whiteboard:	AI-Team-Projects
Depends On:
Blocks:

Reported:	2021-06-20 13:21 UTC by Lital Alon
Modified:	2021-10-18 17:36 UTC
CC List:	4 users
Fixed In Version:	OCP-Metal-v1.0.23.1
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-10-18 17:35:50 UTC
Target Upstream Version:
Embargoed:

Attachments
example (187.72 KB, image/png) 2021-06-20 13:21 UTC, Lital Alon	no flags	Details
example 2 (166.94 KB, image/png) 2021-06-20 13:22 UTC, Lital Alon	no flags	Details

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift assisted-service pull 2098	0	None	open	[WIP] OCPBUGSM-31163: fixes network&latency host validations	2021-06-28 08:40:04 UTC
Red Hat Product Errata	RHSA-2021:3759	0	None	None	None	2021-10-18 17:36:09 UTC

Description Lital Alon 2021-06-20 13:21:02 UTC

Created attachment 1792517 [details]
example

Description of problem:
I booted 3 masters and 3 workers and simulated:
no latency on master-0 (192.168.127.10)
network latency on master-1 (192.168.127.11)
packet loss on master-2 (192.168.127.12)
no latency on the workers (192.168.127.13-192.168.127.15)

The validation output for master-1 is:
sufficient-network-latency-requirement-for-role: Error while attempting to validate network latency: host with IP 192.168.127.12 not found in inventory.
sufficient-packet-loss-requirement-for-role: Error while attempting to validate packet loss validation: host with IP 192.168.127.14 not found in inventory.

The issue is that 192.168.127.14 is a worker node IP.
In addition, worker-0 fails network validations even though the worker nodes have no latency, and its validation message contains a master IP:
sufficient-packet-loss-requirement-for-role: Error while attempting to validate packet loss validation: host with IP 192.168.127.11 not found in inventory.

Network latency is calculated per role, so I expect no validation failures between workers and masters.
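
The per-role expectation above can be illustrated with a minimal Go sketch. It is not the assisted-service implementation: the `Host` and `L3Report` types and the `sameRolePeers` helper are hypothetical names invented for this example, and the measurement values are made up. The sketch only shows the idea that a master's check should consider measurements to other masters and drop measurements to workers.

```go
package main

import "fmt"

// Host is a hypothetical, simplified stand-in for an inventory entry.
type Host struct {
	IP   string
	Role string // "master" or "worker"
}

// L3Report is a hypothetical per-peer connectivity measurement.
type L3Report struct {
	RemoteIP      string
	AvgRTTMs      float64
	PacketLossPct float64
}

// sameRolePeers keeps only the measurements whose remote IP belongs to a
// known host with the given role. Unknown IPs and other roles are dropped.
func sameRolePeers(role string, inventory map[string]Host, reports []L3Report) []L3Report {
	var out []L3Report
	for _, r := range reports {
		h, ok := inventory[r.RemoteIP]
		if !ok || h.Role != role {
			continue
		}
		out = append(out, r)
	}
	return out
}

func main() {
	inventory := map[string]Host{
		"192.168.127.11": {IP: "192.168.127.11", Role: "master"},
		"192.168.127.12": {IP: "192.168.127.12", Role: "master"},
		"192.168.127.14": {IP: "192.168.127.14", Role: "worker"},
	}
	// Illustrative measurements only, not taken from the bug report.
	reports := []L3Report{
		{RemoteIP: "192.168.127.12", AvgRTTMs: 0.4, PacketLossPct: 10},
		{RemoteIP: "192.168.127.14", AvgRTTMs: 0.5, PacketLossPct: 0},
	}
	for _, r := range sameRolePeers("master", inventory, reports) {
		fmt.Printf("%s rtt=%.1fms loss=%.0f%%\n", r.RemoteIP, r.AvgRTTMs, r.PacketLossPct)
	}
}
```

With this filtering, a worker IP such as 192.168.127.14 would never be a candidate for a master's validation message.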


Version-Release number of selected component (if applicable):
Staging v1.0.22.1

How reproducible:
100%

Steps to Reproduce:
1. Boot 3 masters and 3 workers; set roles and the API and ingress VIPs.
2. Set network latency on master-1 and packet loss on master-2:
sudo tc qdisc add dev ens3 root netem delay 150ms
sudo tc qdisc add dev ens3 root netem loss 10%

3. Wait for the network latency validation failures.

Actual results:
Masters with latency get the wrong worker IP in the validation message.
Workers with no latency get validation failures containing master IPs.

Expected results:
Masters with latency get the correct worker IP in the validation message.
Workers with no latency get no validation failures.

Additional info:

Comment 1 Lital Alon 2021-06-20 13:22:35 UTC

Created attachment 1792518 [details]
example 2

Comment 2 Jordi Gil 2021-06-21 14:43:15 UTC

The issue comes from this line of code (for packet validation):
https://github.com/openshift/assisted-service/blob/master/internal/host/validator.go#L768
It attempts to retrieve the hostname and role from the inventory based on the IP. The error that is shown in the UI is because the DB does not have the inventory for the host yet. 

To fix this, I will change the logic to record this as a warning in the logs and ignore this IP. Once the host inventory is reported in the DB, the validation will be able to report it.
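
A rough Go sketch of the approach described above, under stated assumptions: the names `hostInfo`, `lookupByIP` and `latencyViolations` are hypothetical and do not come from validator.go, and the 100 ms threshold in `main` is only an illustrative value. The point is that an IP missing from the inventory produces a log warning and is skipped, instead of turning into a validation error shown in the UI.

```go
package main

import (
	"fmt"
	"log"
	"sort"
)

// hostInfo is a hypothetical, simplified view of what the inventory
// provides for a host once it has been reported.
type hostInfo struct {
	Hostname string
	Role     string
}

// lookupByIP returns an error when no inventory exists yet for the IP.
func lookupByIP(inv map[string]hostInfo, ip string) (hostInfo, error) {
	h, ok := inv[ip]
	if !ok {
		return hostInfo{}, fmt.Errorf("host with IP %s not found in inventory", ip)
	}
	return h, nil
}

// latencyViolations returns the sorted hostnames of same-role peers whose
// RTT exceeds thresholdMs. Peers without inventory are logged and ignored.
func latencyViolations(role string, thresholdMs float64, inv map[string]hostInfo, rttMs map[string]float64) []string {
	var failing []string
	for ip, rtt := range rttMs {
		h, err := lookupByIP(inv, ip)
		if err != nil {
			// Warn and skip: the peer is picked up on a later run,
			// once its inventory has been stored.
			log.Printf("warning: unable to determine host's role and hostname for IP: %v", err)
			continue
		}
		if h.Role == role && rtt > thresholdMs {
			failing = append(failing, h.Hostname)
		}
	}
	sort.Strings(failing) // map iteration order is random; keep output stable
	return failing
}

func main() {
	inv := map[string]hostInfo{
		"192.168.127.10": {Hostname: "master-0", Role: "master"},
		"192.168.127.11": {Hostname: "master-1", Role: "master"},
	}
	rtts := map[string]float64{
		"192.168.127.11": 150.3, // delayed master
		"192.168.127.14": 0.5,   // inventory not reported yet
	}
	fmt.Println(latencyViolations("master", 100, inv, rtts))
}
```

The trade-off in this sketch is that a host with no inventory is silently excluded from the result until a later validation run sees it, which matches the intent stated in the comment.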

Comment 3 Jordi Gil 2021-06-21 21:36:49 UTC

Fixed in https://github.com/openshift/assisted-service/pull/2053

Comment 4 Lital Alon 2021-06-23 19:38:32 UTC

Couldn't get validation messages in the Integration environment; moving back to NEW for further investigation.
It looks like hosts are missing from the inventory. We noticed this message repeated in the logs:
level=warning msg="unable to determine host's role and hostname for IP: host with IP 192.168.127.10 not found in inventory" func="github.com/openshift/assisted-service/internal/host.(*validator).validateNetworkLatencyForRole" file="/go/src/github.com/openshift/origin/internal/host/validator.go:701" pkg=host-state
time="2021-06-23T17:07:42Z"

Comment 6 Lital Alon 2021-07-18 18:31:13 UTC

Verified on Integration
Covered by test: test_latency_master_and_workers

Comment 7 Lital Alon 2021-07-21 11:14:50 UTC

Verified on Staging OCP-Metal-v1.0.23.1

Comment 10 errata-xmlrpc 2021-10-18 17:35:50 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759
