Bug 1974085 - [Assisted-4.8] [Staging][Network Latency] Worker host IP appear in master validation message
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: assisted-installer
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: 4.9.0
Assignee: Jordi Gil
QA Contact: Yuri Obshansky
URL:
Whiteboard: AI-Team-Projects
Depends On:
Blocks:
Reported: 2021-06-20 13:21 UTC by Lital Alon
Modified: 2021-10-18 17:36 UTC (History)

Fixed In Version: OCP-Metal-v1.0.23.1
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-18 17:35:50 UTC
Target Upstream Version:
Embargoed:


Attachments
example (187.72 KB, image/png), 2021-06-20 13:21 UTC, Lital Alon
example 2 (166.94 KB, image/png), 2021-06-20 13:22 UTC, Lital Alon


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift assisted-service pull 2098 | 0 | None | open | [WIP] OCPBUGSM-31163: fixes network&latency host validations | 2021-06-28 08:40:04 UTC
Red Hat Product Errata | RHSA-2021:3759 | 0 | None | None | None | 2021-10-18 17:36:09 UTC

Description Lital Alon 2021-06-20 13:21:02 UTC
Created attachment 1792517
example

Description of problem:
I booted 3 masters and 3 workers and simulated:
no latency on master-0 (192.168.127.10)
network latency on master-1 (192.168.127.11)
packet loss on master-2 (192.168.127.12)

The workers (192.168.127.13-192.168.127.15) have no latency.

The validation messages for master-1 are:
sufficient-network-latency-requirement-for-role: Error while attempting to validate network latency: host with IP 192.168.127.12 not found in inventory.
sufficient-packet-loss-requirement-for-role: Error while attempting to validate packet loss validation: host with IP 192.168.127.14 not found in inventory.

The issue is that 192.168.127.14 is a worker node IP.
In addition, worker-0 fails network validations even though the worker nodes have no latency, and its validation message contains a master IP:
sufficient-packet-loss-requirement-for-role: Error while attempting to validate packet loss validation: host with IP 192.168.127.11 not found in inventory.

Network latency is calculated per role, so I expect no validation failures between workers and masters.


Version-Release number of selected component (if applicable):
Staging v1.0.22.1

How reproducible:
100%

Steps to Reproduce:
1. Boot 3 masters and 3 workers, then set the roles and the API and ingress VIPs.
2. Set network latency on master-1 and packet loss on master-2:
sudo tc qdisc add dev ens3 root netem delay 150ms   (on master-1)
sudo tc qdisc add dev ens3 root netem loss 10%   (on master-2)

3. Wait for the network latency validation failures.

Actual results:
Masters with latency get a wrong worker IP in the validation message.
Workers with no latency get validation failures containing master IPs.

Expected results:
Masters with latency get the correct worker IP in the validation message.
Workers with no latency get no validation failures.

Additional info:

Comment 1 Lital Alon 2021-06-20 13:22:35 UTC
Created attachment 1792518
example 2

Comment 2 Jordi Gil 2021-06-21 14:43:15 UTC
The issue comes from this line of code (for packet validation):
https://github.com/openshift/assisted-service/blob/master/internal/host/validator.go#L768
It attempts to retrieve the hostname and role from the inventory based on the IP. The error that is shown in the UI is because the DB does not have the inventory for the host yet. 

To fix this, I will change the logic to record this as a warning in the logs and ignore this IP. Once the host inventory is reported in the DB, the validation will be able to report it.

Comment 3 Jordi Gil 2021-06-21 21:36:49 UTC
Fixed in https://github.com/openshift/assisted-service/pull/2053

Comment 4 Lital Alon 2021-06-23 19:38:32 UTC
Couldn't get validation messages in the Integration environment; moving back to NEW for further investigation.
It looks like hosts are missing from the inventory. We noticed this message repeated in the logs:
level=warning msg="unable to determine host's role and hostname for IP: host with IP 192.168.127.10 not found in inventory" func="github.com/openshift/assisted-service/internal/host.(*validator).validateNetworkLatencyForRole" file="/go/src/github.com/openshift/origin/internal/host/validator.go:701" pkg=host-state
time="2021-06-23T17:07:42Z"

Comment 6 Lital Alon 2021-07-18 18:31:13 UTC
Verified on Integration
Covered by test: test_latency_master_and_workers

Comment 7 Lital Alon 2021-07-21 11:14:50 UTC
Verified on Staging OCP-Metal-v1.0.23.1

Comment 10 errata-xmlrpc 2021-10-18 17:35:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

