Bug 2056948

Summary: post 1.23 rebase: regression in service-load balancer reliability
Product: OpenShift Container Platform Reporter: Victor Pickard <vpickard>
Component: Networking Assignee: Riccardo Ravaioli <rravaiol>
Networking sub component: ovn-kubernetes QA Contact: Anurag saxena <anusaxen>
Status: CLOSED ERRATA Docs Contact:
Severity: urgent    
Priority: urgent CC: akashem, anusaxen, aojeagar, aos-bugs, calfonso, cdc, deads, dgoodwin, jerzhang, jluhrsen, mfojtik, mifiedle, pdiak, rravaiol, surya, vpickard, weliang, wking, wlewis, xxia
Version: 4.10   
Target Milestone: ---   
Target Release: 4.10.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: EmergencyConfirmed
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 2040715
: 2083554 2104901 Environment:
Last Closed: 2022-03-10 16:44:19 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2040715    
Bug Blocks: 2083554    

Comment 6 zhaozhanqi 2022-02-28 07:31:05 UTC
I saw that 4.10.0-rc.6, which includes the fix PR https://github.com/openshift/kubernetes/pull/1195, has been marked as 'Rejected':

https://amd64.ocp.releases.ci.openshift.org/releasestream/4-stable/release/4.10.0-rc.6

Comment 7 W. Trevor King 2022-03-01 06:41:58 UTC
(In reply to zhaozhanqi from comment #6)
> I saw that 4.10.0-rc.6, which includes the fix PR
> https://github.com/openshift/kubernetes/pull/1195, has been marked as
> 'Rejected':
> 
> https://amd64.ocp.releases.ci.openshift.org/releasestream/4-stable/release/4.10.0-rc.6

Rejection is temporary ;).  It flaked out on the 4.9.23 -> rc.6 update job, but a later replacement run of that job passed, and now the release is accepted and it's in candidate-4.10 [1].

[1]: https://github.com/openshift/cincinnati-graph-data/pull/1557

Comment 8 Mike Fiedler 2022-03-01 14:13:39 UTC
Verification testing is complete. @weliang will sync with engineering to confirm the results.

Comment 9 Weibin Liang 2022-03-01 15:01:27 UTC
Tested and verified in 4.10.0-rc.6-x86_64


During the cluster upgrade, SVC LB traffic was disrupted three times, once for each of the three worker nodes as it rebooted. In the upgrade test without the PR, each disruption lasted about 7.1 seconds. In the upgrade test with the PR, two disruptions lasted about 1.1 seconds each and one lasted about 3.0 seconds. So I can see some improvement with the fix PR in my test cluster.
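
For a rough sense of scale, the figures above can be totalled as in the sketch below. It assumes the run without the PR also saw exactly three disruptions of about 7.1 seconds each (one per worker reboot). The variable and function names are illustrative only and do not come from any test tooling.

```python
# Approximate disruption durations in seconds, taken from the figures above.
# Assumption: the run without the PR also had three disruptions, one per worker reboot.
without_fix = [7.1, 7.1, 7.1]
with_fix = [1.1, 1.1, 3.0]


def summarize(durations):
    """Return total, mean, and worst-case disruption for one upgrade run."""
    total = sum(durations)
    return {
        "total": round(total, 1),
        "mean": round(total / len(durations), 2),
        "worst": max(durations),
    }


before = summarize(without_fix)
after = summarize(with_fix)

# Relative reduction in total disruption time, as a percentage.
reduction_pct = round(100 * (before["total"] - after["total"]) / before["total"], 1)

print("without PR:", before)
print("with PR:   ", after)
print("reduction in total disruption (%):", reduction_pct)
```

These are single-run numbers from one test cluster, so the percentage is indicative at best.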

Comment 11 errata-xmlrpc 2022-03-10 16:44:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056