Bug 1871732

Summary: sdn-controller runs into CrashLoopBackOff during egressIP testing
Product: OpenShift Container Platform Reporter: huirwang
Component: Networking Assignee: Daniel Mellado <dmellado>
Networking sub component: openshift-sdn QA Contact: huirwang
Status: CLOSED ERRATA Docs Contact:
Severity: urgent    
Priority: urgent CC: ricarril
Version: 4.6 Keywords: TestBlocker
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2020-10-27 16:31:19 UTC Type: Bug
Bug Blocks: 1859451    

Description huirwang 2020-08-24 07:09:33 UTC
Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-08-23-185640 

How reproducible:
Always

Steps to Reproduce:
1. Patch the hostsubnet of one node with an egressCIDR
oc patch hostsubnet compute-0  --type=merge -p  '{"egressCIDRs": ["136.144.52.192/26"]}'
2. Create a namespace and a pod in it, then patch the netnamespace with an egressIP
oc patch netnamespace test1 --type=merge -p  '{"egressIPs": ["136.144.52.212"]}'
netnamespace.network.openshift.io/test1 patched
3. Shut down the egressIP node compute-0
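
As a sanity check on the values used in the steps above, the following minimal Go sketch confirms that the egress IP 136.144.52.212 lies inside the egressCIDR 136.144.52.192/26 and rebuilds the JSON merge-patch body passed to `oc patch`. The helper names `egressIPInCIDR` and `mergePatch` are hypothetical and exist only for this illustration; they are not part of openshift-sdn or `oc`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
)

// egressIPInCIDR (hypothetical helper) reports whether ip falls inside cidr.
func egressIPInCIDR(ip, cidr string) (bool, error) {
	parsed := net.ParseIP(ip)
	if parsed == nil {
		return false, fmt.Errorf("invalid IP %q", ip)
	}
	_, network, err := net.ParseCIDR(cidr)
	if err != nil {
		return false, err
	}
	return network.Contains(parsed), nil
}

// mergePatch (hypothetical helper) builds a JSON merge-patch body
// containing a single list-valued field, as used with `oc patch --type=merge`.
func mergePatch(field string, values []string) (string, error) {
	body, err := json.Marshal(map[string][]string{field: values})
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	// Values taken from the reproduction steps in this report.
	ok, err := egressIPInCIDR("136.144.52.212", "136.144.52.192/26")
	if err != nil {
		panic(err)
	}
	fmt.Println("egress IP inside egressCIDR:", ok)

	patch, err := mergePatch("egressIPs", []string{"136.144.52.212"})
	if err != nil {
		panic(err)
	}
	fmt.Println("netnamespace patch body:", patch)
}
```

If the IP were outside the CIDR, the egress IP would not be eligible for automatic assignment to compute-0, so this check helps rule out a misconfiguration as the cause of the crash.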

Actual Result:
oc get pods -n openshift-sdn -o wide
NAME                   READY   STATUS             RESTARTS   AGE    IP               NODE              NOMINATED NODE   READINESS GATES
ovs-2k6zz              1/1     Running            0          149m   136.144.52.203   compute-1         <none>           <none>
ovs-4q565              1/1     Running            0          158m   136.144.52.196   control-plane-2   <none>           <none>
ovs-7w2nz              1/1     Running            0          158m   136.144.52.198   control-plane-0   <none>           <none>
ovs-g9wtr              1/1     Running            0          149m   136.144.52.204   compute-0         <none>           <none>
ovs-ps9dt              1/1     Running            0          158m   136.144.52.199   control-plane-1   <none>           <none>
sdn-9c4cz              1/1     Running            0          158m   136.144.52.198   control-plane-0   <none>           <none>
sdn-controller-24znf   0/1     CrashLoopBackOff   8          158m   136.144.52.198   control-plane-0   <none>           <none>
sdn-controller-fdnpj   0/1     CrashLoopBackOff   5          158m   136.144.52.196   control-plane-2   <none>           <none>
sdn-controller-w8w9k   1/1     Running            6          158m   136.144.52.199   control-plane-1   <none>           <none>
sdn-g6nd5              1/1     Running            0          158m   136.144.52.199   control-plane-1   <none>           <none>
sdn-k5skc              1/1     Running            0          149m   136.144.52.204   compute-0         <none>           <none>
sdn-metrics-cjxwk      1/1     Running            0          147m   136.144.52.203   compute-1         <none>           <none>
sdn-metrics-xbrvp      1/1     Running            0          147m   136.144.52.204   compute-0         <none>           <none>
sdn-s2tp5              1/1     Running            0          149m   136.144.52.203   compute-1         <none>           <none>
sdn-sxvjv              1/1     Running            0          158m   136.144.52.196   control-plane-2   <none>           <none>


oc logs sdn-controller-w8w9k  -n openshift-sdn
I0824 06:09:47.994332       1 leaderelection.go:242] attempting to acquire leader lease  openshift-sdn/openshift-network-controller...
E0824 06:09:48.018610       1 event.go:316] Could not construct reference to: '&v1.ConfigMap{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"openshift-network-controller", GenerateName:"", Namespace:"openshift-sdn", SelfLink:"/api/v1/namespaces/openshift-sdn/configmaps/openshift-network-controller", UID:"e964ce50-2c57-41c6-8a61-a19ce7363673", ResourceVersion:"130002", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63733837208, loc:(*time.Location)(0x27fb000)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string{"control-plane.alpha.kubernetes.io/leader":"{\"holderIdentity\":\"control-plane-1\",\"leaseDurationSeconds\":60,\"acquireTime\":\"2020-08-24T06:08:11Z\",\"renewTime\":\"2020-08-24T06:09:47Z\",\"leaderTransitions\":1}"}, OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry{v1.ManagedFieldsEntry{Manager:"openshift-sdn-controller", Operation:"Update", APIVersion:"v1", Time:(*v1.Time)(0xc000408660), FieldsType:"FieldsV1", FieldsV1:(*v1.FieldsV1)(0xc0004086a0)}}}, Immutable:(*bool)(nil), Data:map[string]string(nil), BinaryData:map[string][]uint8(nil)}' due to: 'no kind is registered for the type v1.ConfigMap in scheme "pkg/runtime/scheme.go:101"'. Will not report event: 'Normal' 'LeaderElection' 'control-plane-1 became leader'
I0824 06:09:48.018721       1 leaderelection.go:252] successfully acquired lease openshift-sdn/openshift-network-controller
I0824 06:09:48.022210       1 master.go:52] Initializing SDN master
I0824 06:09:48.034824       1 network_controller.go:61] Started OpenShift Network Controller
I0824 06:09:48.034909       1 reflector.go:175] Starting reflector *v1.NetNamespace (10m0s) from k8s.io/client-go.2/tools/cache/reflector.go:125
I0824 06:09:48.034937       1 reflector.go:175] Starting reflector *v1.Namespace (10m0s) from k8s.io/client-go.2/tools/cache/reflector.go:125
I0824 06:09:48.034956       1 reflector.go:175] Starting reflector *v1.HostSubnet (10m0s) from k8s.io/client-go.2/tools/cache/reflector.go:125
I0824 06:09:48.035100       1 reflector.go:175] Starting reflector *v1.Node (10m0s) from k8s.io/client-go.2/tools/cache/reflector.go:125
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x14f217e]

goroutine 127 [running]:
github.com/openshift/sdn/pkg/network/master.(*egressIPManager).check(0xc000898360, 0x48331c00, 0x27fb000, 0xc0000b87e0, 0x0)
    github.com/openshift/sdn/pkg/network/master/egressip.go:180 +0x10e
github.com/openshift/sdn/pkg/network/master.(*egressIPManager).poll(0xc000898360, 0xc0000b8f60)
    github.com/openshift/sdn/pkg/network/master/egressip.go:155 +0x78
created by github.com/openshift/sdn/pkg/network/master.(*egressIPManager).maybeDoUpdateEgressCIDRs
    github.com/openshift/sdn/pkg/network/master/egressip.go:127 +0x3b8

     
This appears to be caused by PR https://github.com/openshift/sdn/pull/171/files: according to the stack trace above, lines 180 and 155 of egressip.go are new code introduced in that PR.
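
The failure mode in the trace (a nil pointer dereference inside a polling goroutine) can be illustrated with the sketch below. This is not the code from egressip.go or the fix that was shipped. The types `nodeState` and `monitor` and the functions `checkUnsafe`, `checkSafe`, and `didPanic` are hypothetical, and the assumption that the nil pointer comes from a missing per-node entry is a guess made only for illustration.

```go
package main

import "fmt"

// nodeState is a hypothetical stand-in for per-node bookkeeping;
// it is NOT the type used in openshift/sdn.
type nodeState struct {
	offline bool
}

// monitor (hypothetical) tracks nodes by IP. An entry may be missing
// when a poll runs, for example after the node has gone away.
type monitor struct {
	nodes map[string]*nodeState
}

// checkUnsafe dereferences the lookup result without a nil check.
// A missing key yields a nil *nodeState, so the field access panics.
func (m *monitor) checkUnsafe(ip string) bool {
	return m.nodes[ip].offline
}

// checkSafe guards the lookup and reports whether the node is known.
func (m *monitor) checkSafe(ip string) (offline, known bool) {
	n, ok := m.nodes[ip]
	if !ok || n == nil {
		return false, false
	}
	return n.offline, true
}

// didPanic runs f and reports whether it panicked.
func didPanic(f func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	m := &monitor{nodes: map[string]*nodeState{
		"136.144.52.203": {offline: false},
	}}

	// 136.144.52.204 (compute-0 in this report) is deliberately absent.
	fmt.Println("unsafe check panicked:",
		didPanic(func() { m.checkUnsafe("136.144.52.204") }))

	offline, known := m.checkSafe("136.144.52.204")
	fmt.Println("safe check: offline =", offline, "known =", known)
}
```

In a real controller the panic is not recovered: an unrecovered panic in any goroutine terminates the whole Go process, which is consistent with the container restarting repeatedly and ending up in CrashLoopBackOff.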


Expected Result:
All SDN pods are in Running status and egressIP works as expected.

Comment 7 errata-xmlrpc 2020-10-27 16:31:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196