Bug 1871732 - sdn-controller runs into CrashLoopBackOff during egressIP testing
Summary: sdn-controller runs into CrashLoopBackOff during egressIP testing
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.6.0
Assignee: Daniel Mellado
QA Contact: huirwang
URL:
Whiteboard:
Depends On:
Blocks: 1859451
Reported: 2020-08-24 07:09 UTC by huirwang
Modified: 2020-10-27 16:31 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 16:31:19 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift sdn pull 175 | 0 | None | closed | Bug 1871732: Fix nodeInformer call in EgressIPManager. | 2020-09-17 11:30:40 UTC
Red Hat Product Errata | RHBA-2020:4196 | 0 | None | None | None | 2020-10-27 16:31:22 UTC

Description huirwang 2020-08-24 07:09:33 UTC
Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-08-23-185640 

How reproducible:
Always

Steps to Reproduce:
1. Patch one node with an egressCIDR:
oc patch hostsubnet compute-0  --type=merge -p  '{"egressCIDRs": ["136.144.52.192/26"]}'
2. Create a namespace with a pod in it, then patch the netnamespace with an egressIP:
oc patch netnamespace test1 --type=merge -p  '{"egressIPs": ["136.144.52.212"]}'
netnamespace.network.openshift.io/test1 patched
3. Shut down the egressIP node, compute-0.

Actual Result:
oc get pods -n openshift-sdn -o wide
NAME                   READY   STATUS             RESTARTS   AGE    IP               NODE              NOMINATED NODE   READINESS GATES
ovs-2k6zz              1/1     Running            0          149m   136.144.52.203   compute-1         <none>           <none>
ovs-4q565              1/1     Running            0          158m   136.144.52.196   control-plane-2   <none>           <none>
ovs-7w2nz              1/1     Running            0          158m   136.144.52.198   control-plane-0   <none>           <none>
ovs-g9wtr              1/1     Running            0          149m   136.144.52.204   compute-0         <none>           <none>
ovs-ps9dt              1/1     Running            0          158m   136.144.52.199   control-plane-1   <none>           <none>
sdn-9c4cz              1/1     Running            0          158m   136.144.52.198   control-plane-0   <none>           <none>
sdn-controller-24znf   0/1     CrashLoopBackOff   8          158m   136.144.52.198   control-plane-0   <none>           <none>
sdn-controller-fdnpj   0/1     CrashLoopBackOff   5          158m   136.144.52.196   control-plane-2   <none>           <none>
sdn-controller-w8w9k   1/1     Running            6          158m   136.144.52.199   control-plane-1   <none>           <none>
sdn-g6nd5              1/1     Running            0          158m   136.144.52.199   control-plane-1   <none>           <none>
sdn-k5skc              1/1     Running            0          149m   136.144.52.204   compute-0         <none>           <none>
sdn-metrics-cjxwk      1/1     Running            0          147m   136.144.52.203   compute-1         <none>           <none>
sdn-metrics-xbrvp      1/1     Running            0          147m   136.144.52.204   compute-0         <none>           <none>
sdn-s2tp5              1/1     Running            0          149m   136.144.52.203   compute-1         <none>           <none>
sdn-sxvjv              1/1     Running            0          158m   136.144.52.196   control-plane-2   <none>           <none>


oc logs sdn-controller-w8w9k  -n openshift-sdn
I0824 06:09:47.994332       1 leaderelection.go:242] attempting to acquire leader lease  openshift-sdn/openshift-network-controller...
E0824 06:09:48.018610       1 event.go:316] Could not construct reference to: '&v1.ConfigMap{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"openshift-network-controller", GenerateName:"", Namespace:"openshift-sdn", SelfLink:"/api/v1/namespaces/openshift-sdn/configmaps/openshift-network-controller", UID:"e964ce50-2c57-41c6-8a61-a19ce7363673", ResourceVersion:"130002", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63733837208, loc:(*time.Location)(0x27fb000)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string{"control-plane.alpha.kubernetes.io/leader":"{\"holderIdentity\":\"control-plane-1\",\"leaseDurationSeconds\":60,\"acquireTime\":\"2020-08-24T06:08:11Z\",\"renewTime\":\"2020-08-24T06:09:47Z\",\"leaderTransitions\":1}"}, OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry{v1.ManagedFieldsEntry{Manager:"openshift-sdn-controller", Operation:"Update", APIVersion:"v1", Time:(*v1.Time)(0xc000408660), FieldsType:"FieldsV1", FieldsV1:(*v1.FieldsV1)(0xc0004086a0)}}}, Immutable:(*bool)(nil), Data:map[string]string(nil), BinaryData:map[string][]uint8(nil)}' due to: 'no kind is registered for the type v1.ConfigMap in scheme "pkg/runtime/scheme.go:101"'. Will not report event: 'Normal' 'LeaderElection' 'control-plane-1 became leader'
I0824 06:09:48.018721       1 leaderelection.go:252] successfully acquired lease openshift-sdn/openshift-network-controller
I0824 06:09:48.022210       1 master.go:52] Initializing SDN master
I0824 06:09:48.034824       1 network_controller.go:61] Started OpenShift Network Controller
I0824 06:09:48.034909       1 reflector.go:175] Starting reflector *v1.NetNamespace (10m0s) from k8s.io/client-go.2/tools/cache/reflector.go:125
I0824 06:09:48.034937       1 reflector.go:175] Starting reflector *v1.Namespace (10m0s) from k8s.io/client-go.2/tools/cache/reflector.go:125
I0824 06:09:48.034956       1 reflector.go:175] Starting reflector *v1.HostSubnet (10m0s) from k8s.io/client-go.2/tools/cache/reflector.go:125
I0824 06:09:48.035100       1 reflector.go:175] Starting reflector *v1.Node (10m0s) from k8s.io/client-go.2/tools/cache/reflector.go:125
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x14f217e]

goroutine 127 [running]:
github.com/openshift/sdn/pkg/network/master.(*egressIPManager).check(0xc000898360, 0x48331c00, 0x27fb000, 0xc0000b87e0, 0x0)
    github.com/openshift/sdn/pkg/network/master/egressip.go:180 +0x10e
github.com/openshift/sdn/pkg/network/master.(*egressIPManager).poll(0xc000898360, 0xc0000b8f60)
    github.com/openshift/sdn/pkg/network/master/egressip.go:155 +0x78
created by github.com/openshift/sdn/pkg/network/master.(*egressIPManager).maybeDoUpdateEgressCIDRs
    github.com/openshift/sdn/pkg/network/master/egressip.go:127 +0x3b8

     
This appears to be caused by PR https://github.com/openshift/sdn/pull/171/files: in the stack trace above, lines 180 and 155 of egressip.go are new code introduced by that PR.


Expected Result:
All SDN pods stay in Running status and egressIP works correctly.

Comment 7 errata-xmlrpc 2020-10-27 16:31:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

