Bug 1954309 - Handler locks not effective
Summary: Handler locks not effective
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.9.0
Assignee: Ben Nemec
QA Contact: Oleg Sher
Docs Contact: Olivia Payne
URL:
Whiteboard:
Depends On: 2013034 2018557
Blocks:
 
Reported: 2021-04-27 20:43 UTC by Ben Nemec
Modified: 2021-11-12 14:30 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, a bug in the lock implementation for the nmstate-handler pod allowed multiple nodes to hold the lock at the same time. This update fixes the lock implementation so that only one node holds the lock at a time.
Clone Of:
Environment:
Last Closed: 2021-11-10 21:01:31 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Github openshift kubernetes-nmstate pull 191 0 None closed Bug 1954309: Rebase to upstream 0.47.0 2021-07-20 16:27:49 UTC
Red Hat Product Errata RHBA-2021:4119 0 None None None 2021-11-10 21:01:46 UTC

Description Ben Nemec 2021-04-27 20:43:43 UTC
Description of problem: As described in https://github.com/nmstate/kubernetes-nmstate/issues/729, multiple handlers may think they hold the lock.


Version-Release number of selected component (if applicable): 0.44.0


How reproducible: Always


Steps to Reproduce:
1. Deploy operator using CNAO
2. Create nmstate instance
3. Check handler logs

Actual results: Multiple handlers claim to have the lock.


Expected results: Only one handler takes the lock.


Additional info: This fix was released in 0.46.0 upstream. We just need to rebase to that.

Comment 7 Tomas Sedovic 2021-07-20 16:28:32 UTC
The linked PR is merged, moving to ON_QA.

Comment 15 Ben Nemec 2021-08-31 17:50:03 UTC
It turns out I misunderstood the problem when I opened this bug. The actual issue is that when two copies of the operator are installed (say, from OLM and CNAO, or from OLM and source), they don't properly coordinate locking. Verification therefore requires two copies of the operator to be installed. I've verified that this can be done with one install from OLM and one from source, using the script in https://github.com/openshift/kubernetes-nmstate/pull/198 to deploy from source.

Note that this scenario would not normally happen as you would only have one copy of the operator, but it is important to allow migration of existing CNV installations to the standalone operator.
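
To illustrate the failure mode in general terms, here is a minimal sketch of how two installs can each believe they hold "the" lock. It is not the kubernetes-nmstate implementation: the LockTable class, the handler names, and the lock keys are all hypothetical, and the assumption that each install locks a different resource is only one plausible way for coordination to fail.

```python
class LockTable:
    """Hypothetical in-memory stand-in for the resource the handlers lock."""

    def __init__(self):
        self._holders = {}  # lock key -> name of the handler holding it

    def try_acquire(self, key, holder):
        """Grant the lock only if nobody holds this exact key yet."""
        if key in self._holders:
            return False
        self._holders[key] = holder
        return True


def claimants(table, lock_keys):
    """Return the handlers that end up believing they hold the lock.

    lock_keys maps a handler name to the key that handler tries to lock.
    """
    return [name for name, key in lock_keys.items()
            if table.try_acquire(key, name)]


# Assumption: each install locks its own resource, so nothing is shared
# and both handlers succeed.
uncoordinated = claimants(LockTable(), {
    "handler-olm": "olm/handler_lock",
    "handler-source": "source/handler_lock",
})

# Both installs contend for the same resource, so only the first wins.
coordinated = claimants(LockTable(), {
    "handler-olm": "shared/handler_lock",
    "handler-source": "shared/handler_lock",
})

print("uncoordinated:", uncoordinated)
print("coordinated:", coordinated)
```

In the uncoordinated case both handlers report success, which matches the "multiple handlers claim to have the lock" symptom. In the coordinated case only one does.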

Comment 28 errata-xmlrpc 2021-11-10 21:01:31 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.6 bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4119

