Bug 1874647 - log errors if CNO leader election fails
Summary: log errors if CNO leader election fails
Keywords:
Status: VERIFIED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Dan Winship
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-09-01 20:34 UTC by Dan Winship
Modified: 2020-09-10 04:01 UTC
CC List: 1 user

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:




Links
System: Github
ID: openshift cluster-network-operator pull 771
Priority: None
Status: closed
Summary: Bug 1874647: Initialize controller-runtime logging
Last Updated: 2020-09-07 01:14:28 UTC

Description Dan Winship 2020-09-01 20:34:13 UTC
CNO currently does not log anything if leader election fails or leadership is unexpectedly lost. In particular, the IPv6 IPI job is currently failing in weird ways, and there is nothing in the logs explaining why.

Comment 4 zhaozhanqi 2020-09-07 10:05:56 UTC
@Dan
Could you give any advice on how to make CNO leader election fail, so that I can verify this bug?

Comment 5 Dan Winship 2020-09-08 11:13:35 UTC
The CNO pod that becomes leader should now log "Became the leader", while other CNO pods will log "Not the leader. Waiting".

(The particular IPv6 jobs that I added this logging to debug are now failing with a different error, so it isn't helping there.)

Comment 6 zhaozhanqi 2020-09-09 07:10:02 UTC
Thanks, Dan Winship.

There should be only one pod for CNO. 

$ oc get pod -n openshift-network-operator
NAME                                READY   STATUS    RESTARTS   AGE
network-operator-7d49f7f9d5-v54zr   1/1     Running   0          89m


>> while other CNO pods will log "Not the leader. Waiting".

What does this mean?


I can see "Became the leader" in the pod above:

 #oc logs network-operator-7d49f7f9d5-v54zr -n openshift-network-operator | grep -i "Became the leader"
2020/09/09 05:38:24 Became the leader.

Comment 7 Dan Winship 2020-09-09 10:55:47 UTC
> There should be only one pod for CNO. 

Ah, well, that's something else that was wrong with the other cluster, then.

> I can see the "became the leader" in above pod
> 
>  #oc logs network-operator-7d49f7f9d5-v54zr -n openshift-network-operator |
> grep -i "Became the leader"
> 2020/09/09 05:38:24 Became the leader.

Then the PR worked.

Comment 8 zhaozhanqi 2020-09-10 04:01:28 UTC
Thanks for the information, @Dan.

Moving this bug to VERIFIED.

