1874647 – log errors if CNO leader election fails

Summary: log errors if CNO leader election fails

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	4.6.0
Assignee:	Dan Winship
QA Contact:	zhaozhanqi
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-09-01 20:34 UTC by Dan Winship
Modified:	2020-10-27 16:37 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-10-27 16:36:55 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-network-operator pull 771	0	None	closed	Bug 1874647: Initialize controller-runtime logging	2020-12-07 02:59:39 UTC
Red Hat Product Errata	RHBA-2020:4196	0	None	None	None	2020-10-27 16:37:10 UTC

Description Dan Winship 2020-09-01 20:34:13 UTC

CNO currently does not log anything if leader election fails or behaves unexpectedly. In particular, the IPv6 IPI job is currently failing in weird ways, and there is no info in the logs explaining why.

Comment 4 zhaozhanqi 2020-09-07 10:05:56 UTC

@Dan
Could you give any advice on how to make CNO leader election fail, so that I can verify this bug?

Comment 5 Dan Winship 2020-09-08 11:13:35 UTC

The CNO pod that becomes leader should now log "Became the leader", while other CNO pods will log "Not the leader. Waiting".
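
The check described above could be scripted roughly as follows. This is only a sketch: the function names `leader_state` and `check_cno_pods` are made up for illustration, and it assumes the two messages appear in the pod log exactly as quoted in this comment.

```shell
#!/bin/sh
# Hypothetical helper: read a CNO pod log on stdin and print one of
# "leader", "waiting" or "unknown", based on the two messages quoted above.
leader_state() {
    awk '
        /Became the leader/        { state = "leader" }
        /Not the leader\. Waiting/ { if (state != "leader") state = "waiting" }
        END { print (state == "" ? "unknown" : state) }
    '
}

# Hypothetical wrapper: classify every pod in the CNO namespace.
# Needs a logged-in `oc` session, so it is only defined here, not called.
check_cno_pods() {
    ns=openshift-network-operator
    for pod in $(oc get pods -n "$ns" -o name); do
        printf '%s: ' "$pod"
        oc logs -n "$ns" "$pod" | leader_state
    done
}
```

Calling `check_cno_pods` against a live cluster would print one line per pod. A pod reported as `unknown` has logged neither message.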

(The particular IPv6 jobs that I added this to debug are now failing with a different error, so it isn't helping there.)

Comment 6 zhaozhanqi 2020-09-09 07:10:02 UTC

Thanks, Dan Winship.

There should be only one pod for CNO:

$ oc get pod -n openshift-network-operator
NAME                                READY   STATUS    RESTARTS   AGE
network-operator-7d49f7f9d5-v54zr   1/1     Running   0          89m


> while other CNO pods will log "Not the leader. Waiting".

What does this mean?


I can see "Became the leader" in the log of the pod above:

$ oc logs network-operator-7d49f7f9d5-v54zr -n openshift-network-operator | grep -i "Became the leader"
2020/09/09 05:38:24 Became the leader.

Comment 7 Dan Winship 2020-09-09 10:55:47 UTC

> There should be only one pod for CNO. 

Ah, well, that's something else that was wrong with the other cluster, then.

> I can see "Became the leader" in the log of the pod above:
>
> $ oc logs network-operator-7d49f7f9d5-v54zr -n openshift-network-operator | grep -i "Became the leader"
> 2020/09/09 05:38:24 Became the leader.

Then the PR worked.

Comment 8 zhaozhanqi 2020-09-10 04:01:28 UTC

Thanks for the information, @Dan.

Moving this bug to verified.

Comment 10 errata-xmlrpc 2020-10-27 16:36:55 UTC

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images) and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196
