Bug 1817657

Summary: ovnkube-master pods are in a continuous CrashLoopBackoff
Product: OpenShift Container Platform
Reporter: Jean-Francois Saucier <jsaucier>
Component: Networking
Assignee: Tim Rozet <trozet>
Networking sub component: ovn-kubernetes
QA Contact: Weibin Liang <weliang>
Status: CLOSED ERRATA
Severity: high
Priority: high
CC: anbhat, bbennett, dcbw, gwest, hongkliu, rkhan, trozet, zzhao
Version: 4.3.z
Target Release: 4.5.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2020-07-13 17:23:46 UTC
Type: Bug
Bug Depends On: 1718372
Bug Blocks: 1818860, 1818862, 1937118
Attachments: CrashLoopBackOff logs

Description Jean-Francois Saucier 2020-03-26 17:59:36 UTC
Description of problem:

The ovnkube-master-* pods do not come up and are continuously in CrashLoopBackOff.

 
Version-Release number of selected component (if applicable):

OCP 4.3.5

Comment 4 Aniket Bhat 2020-03-27 15:01:44 UTC
Also, the output of oc logs ovnkube-master-4hw75 -c ovnkube-master | less would be helpful.

Comment 6 Tim Rozet 2020-03-27 17:04:58 UTC
After a closer look, the issue is the same as:
https://github.com/ovn-org/ovn-kubernetes/issues/1124

Creating an SCTP service causes a crash in ovn-k8s. I have a fix for this in the SCTP support patch:

https://github.com/ovn-org/ovn-kubernetes/pull/1137

Comment 7 Dan Williams 2020-03-27 18:31:50 UTC
Tim, is it worth addressing the root cause of the crash immediately (even if SCTP services don't work quite yet) while we wait for the OVN LB fixes?

Comment 8 Tim Rozet 2020-03-27 18:34:57 UTC
I don't think so, as it will only happen if you create a service with a non-TCP/UDP protocol (right now, only SCTP).
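
For illustration only: the thread does not spell out the exact code path that crashes, so the following is a minimal Go sketch of the kind of guard being discussed, assuming the controller could reject any protocol other than TCP or UDP instead of crashing on it. The types and the functions checkProtocol and supportedPorts are hypothetical stand-ins, not ovn-kubernetes code and not the change made in the linked pull request.

```go
package main

import (
	"fmt"
)

// Protocol mirrors the string values Kubernetes uses for Service port protocols.
type Protocol string

const (
	ProtocolTCP  Protocol = "TCP"
	ProtocolUDP  Protocol = "UDP"
	ProtocolSCTP Protocol = "SCTP"
)

// ServicePort is a hypothetical, trimmed-down stand-in for a Service port entry.
type ServicePort struct {
	Name     string
	Protocol Protocol
	Port     int32
}

// checkProtocol returns an error for any protocol other than TCP or UDP,
// so a caller can skip the port instead of crashing on it.
func checkProtocol(p Protocol) error {
	switch p {
	case ProtocolTCP, ProtocolUDP:
		return nil
	default:
		return fmt.Errorf("unsupported service protocol %q", string(p))
	}
}

// supportedPorts splits ports into those that pass checkProtocol and a list
// of errors describing the ports that were skipped.
func supportedPorts(ports []ServicePort) ([]ServicePort, []error) {
	var ok []ServicePort
	var errs []error
	for _, sp := range ports {
		if err := checkProtocol(sp.Protocol); err != nil {
			errs = append(errs, fmt.Errorf("port %q: %w", sp.Name, err))
			continue
		}
		ok = append(ok, sp)
	}
	return ok, errs
}

func main() {
	// Example input: one TCP port and one SCTP port (values are made up).
	ports := []ServicePort{
		{Name: "web", Protocol: ProtocolTCP, Port: 80},
		{Name: "sctp", Protocol: ProtocolSCTP, Port: 30102},
	}
	ok, errs := supportedPorts(ports)
	fmt.Println("handled ports:", len(ok))
	for _, err := range errs {
		fmt.Println("skipped:", err)
	}
}
```

With a guard like this, an SCTP service would be logged and skipped rather than taking the master down, at the cost of the service not working until real SCTP support lands.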

Comment 9 Ben Bennett 2020-03-30 14:23:13 UTC
Setting the target to 4.5.  Will clone for the earlier releases.

Comment 10 Ben Bennett 2020-03-30 14:39:06 UTC
*** Bug 1818182 has been marked as a duplicate of this bug. ***

Comment 11 Weibin Liang 2020-03-30 18:25:02 UTC
Reproduced this bug in v4.3.5.

[weliang@weliang FILE]$ oc create -f https://raw.githubusercontent.com/weliang1/Openshift_Networking/master/Features/SCTP/sctp-svc-pod.yaml
service/sctp-service created
pod/sctp-server-pod9d5wh created
pod/sctp-client-pod4kk86 created
[weliang@weliang FILE]$ oc get pod  -n openshift-ovn-kubernetes
NAME                   READY   STATUS             RESTARTS   AGE
ovnkube-master-2jl75   3/4     CrashLoopBackOff   2          4h1m
ovnkube-master-5gjkt   4/4     Running            0          4h
ovnkube-master-mpf8l   4/4     Running            0          4h
ovnkube-node-2wxlx     3/3     Running            3          4h2m
ovnkube-node-4zg9b     3/3     Running            0          3h59m
ovnkube-node-8jz9k     3/3     Running            0          4h
ovnkube-node-gk2fc     3/3     Running            3          4h2m
ovnkube-node-q6rtc     3/3     Running            3          4h2m
ovnkube-node-s86k5     3/3     Running            0          4h1m

The whole CrashLoopBackOff log is attached.
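
As a convenience when repeating the reproduction, the pod listing above can be scanned for crash-looping pods. This is a minimal Go sketch that assumes the default oc get pod column layout (NAME, READY, STATUS, RESTARTS, AGE); the function name crashLooping is hypothetical, and the sample input is copied from the listing above.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// crashLooping returns the names of pods whose STATUS column reads
// CrashLoopBackOff in the plain-text table printed by `oc get pod`.
func crashLooping(table string) []string {
	var names []string
	sc := bufio.NewScanner(strings.NewReader(table))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Assumed columns: NAME READY STATUS RESTARTS AGE.
		// Skip the header row and any short or blank lines.
		if len(fields) < 3 || fields[0] == "NAME" {
			continue
		}
		if fields[2] == "CrashLoopBackOff" {
			names = append(names, fields[0])
		}
	}
	return names
}

func main() {
	// A few rows copied from the listing in this comment.
	out := `NAME                   READY   STATUS             RESTARTS   AGE
ovnkube-master-2jl75   3/4     CrashLoopBackOff   2          4h1m
ovnkube-master-5gjkt   4/4     Running            0          4h
ovnkube-node-2wxlx     3/3     Running            3          4h2m`
	for _, name := range crashLooping(out) {
		fmt.Println(name)
	}
}
```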

Comment 12 Weibin Liang 2020-03-30 18:26:10 UTC
Created attachment 1674781 [details]
CrashLoopBackOff logs

Comment 13 Tim Rozet 2020-04-06 14:50:27 UTC
Resolved by https://github.com/ovn-org/ovn-kubernetes/pull/1137

Comment 14 Tim Rozet 2020-04-14 13:57:10 UTC
Included in https://github.com/openshift/ovn-kubernetes/pull/134

Comment 17 zhaozhanqi 2020-04-15 06:50:02 UTC
Verified this bug on 4.5.0-0.nightly-2020-04-14-221451.

The ovn-master did not crash when an SCTP service was created.

Comment 18 Glenn West 2020-04-21 21:13:56 UTC
Verified this on 4.3.13, and ovn still crashes.

Comment 19 zhaozhanqi 2020-04-22 03:17:54 UTC
Hi Glenn,
This bug only tracks the 4.5 version; you can see that the 'Target Release' is 4.5.

For the 4.3 version, there is another bug, https://bugzilla.redhat.com/show_bug.cgi?id=1818862#c1, but per comment 1 there it was closed as not needed in 4.3.
Could you discuss this with Tim Rozet or add a comment to the 4.3 bug? If you think this still needs to be supported in 4.3, please reopen it.

Thanks

Comment 21 errata-xmlrpc 2020-07-13 17:23:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409