Bug 1954309

Summary: Handler locks not effective
Product: OpenShift Container Platform Reporter: Ben Nemec <bnemec>
Component: Networking Assignee: Ben Nemec <bnemec>
Networking sub component: kubernetes-nmstate-operator QA Contact: Oleg Sher <osher>
Status: CLOSED ERRATA Docs Contact: Olivia Payne <opayne>
Severity: medium    
Priority: medium CC: opayne, tsedovic
Version: 4.8 Keywords: Triaged
Target Milestone: ---   
Target Release: 4.9.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Previously, a bug in the lock implementation for the nmstate-handler pod allowed multiple nodes to hold the lock at the same time. This update fixes the lock implementation so that only one node holds the lock at a time.
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-11-10 21:01:31 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2013034, 2018557    
Bug Blocks:    

Description Ben Nemec 2021-04-27 20:43:43 UTC
Description of problem: As described in https://github.com/nmstate/kubernetes-nmstate/issues/729, multiple handlers may think they hold the lock.


Version-Release number of selected component (if applicable): 0.44.0


How reproducible: Always


Steps to Reproduce:
1. Deploy operator using CNAO
2. Create nmstate instance
3. Check handler logs

Actual results: Multiple handlers claim to have the lock.


Expected results: Only one handler takes the lock.
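The log check in step 3 can be sketched as a small script. This is only an illustration: the report does not quote the handler's log output, so the marker string and the handler names below are hypothetical placeholders, and the logs are assumed to have been collected already (for example, one string per handler pod).

```python
def handlers_claiming_lock(logs_by_handler, marker):
    """Return the sorted names of handlers whose log text contains `marker`.

    logs_by_handler: dict mapping a handler name to its full log text.
    marker: substring that indicates the handler believes it holds the lock.
    """
    return sorted(
        name for name, text in logs_by_handler.items() if marker in text
    )


# Hypothetical marker: replace with the message the handler really logs.
MARKER = "took exclusive lock"

# Hypothetical sample logs, not taken from a real cluster.
sample_logs = {
    "nmstate-handler-a": "starting handler\ntook exclusive lock\n",
    "nmstate-handler-b": "starting handler\ntook exclusive lock\n",
    "nmstate-handler-c": "starting handler\nwaiting for lock\n",
}

claimed = handlers_claiming_lock(sample_logs, MARKER)
print(claimed)
```

More than one name in the result corresponds to the actual results above; exactly one name corresponds to the expected results.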


Additional info: This fix was released in 0.46.0 upstream. We just need to rebase to that.

Comment 7 Tomas Sedovic 2021-07-20 16:28:32 UTC
The linked PR is merged, moving to ON_QA.

Comment 15 Ben Nemec 2021-08-31 17:50:03 UTC
It turns out I misunderstood the problem here when I opened the bug. The issue is actually that when you have two copies of the operator installed (say from OLM and CNAO or OLM and source) they don't properly coordinate locking. This means verification will require two copies of the operator installed. I've verified that this can be done with an install from OLM and from source, using the script in https://github.com/openshift/kubernetes-nmstate/pull/198 to deploy from source.

Note that this scenario would not normally happen as you would only have one copy of the operator, but it is important to allow migration of existing CNV installations to the standalone operator.
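As a generic illustration of what "coordinating locking" means, here is a minimal sketch using an exclusive, non-blocking file lock on a Unix system. It is not the kubernetes-nmstate implementation, and this bug does not state how the two operator copies failed to coordinate; the file names are hypothetical. The sketch only shows that two contenders exclude each other when they lock the same file, and that both succeed if they lock different files, which is one way a lock can end up ineffective.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Try to take an exclusive, non-blocking flock on `path`.

    Returns the open file object on success (keep it open to hold the lock),
    or None if another holder already has the lock.
    """
    f = open(path, "a")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()
        return None
    return f


def demo():
    """Return ((first, second) on the same file, (first, third) on different files)."""
    with tempfile.TemporaryDirectory() as d:
        shared = os.path.join(d, "handler_lock")  # hypothetical file name
        other = os.path.join(d, "other_lock")     # hypothetical file name

        first = try_lock(shared)
        second = try_lock(shared)  # contends with `first` on the same file
        third = try_lock(other)    # does not contend with `first` at all

        same_file = (first is not None, second is not None)
        different_files = (first is not None, third is not None)

        for f in (first, second, third):
            if f is not None:
                f.close()  # closing the file releases the lock
    return same_file, different_files


result = demo()
print(result)
```

In the first pair only one contender gets the lock, which is the expected behavior. In the second pair both "hold the lock" at once, because they never contend on the same file.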

Comment 28 errata-xmlrpc 2021-11-10 21:01:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.6 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4119