Bug 1871732 - sdn-controller run into CrashLoopBackOff when do some egressip testing
Summary: sdn-controller run into CrashLoopBackOff when do some egressip testing
Keywords:
Status: VERIFIED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.6.0
Assignee: Daniel Mellado
QA Contact: huirwang
URL:
Whiteboard:
Depends On:
Blocks: 1859451
Reported: 2020-08-24 07:09 UTC by huirwang
Modified: 2020-09-01 02:40 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:


Links
System ID Priority Status Summary Last Updated
Github openshift sdn pull 175 None closed Bug 1871732: Fix nodeInformer call in EgressIPManager. 2020-09-17 11:30:40 UTC

Description huirwang 2020-08-24 07:09:33 UTC
Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-08-23-185640 

How reproducible:
Always

Steps to Reproduce:
1. Patch one node's hostsubnet with an egressCIDR:
oc patch hostsubnet compute-0  --type=merge -p  '{"egressCIDRs": ["136.144.52.192/26"]}'
2. Create a namespace and a pod in it, then patch the netnamespace with an egressIP:
oc patch netnamespace test1 --type=merge -p  '{"egressIPs": ["136.144.52.212"]}'
netnamespace.network.openshift.io/test1 patched
3. Shut down the egressIP node compute-0.

Actual Result:
oc get pods -n openshift-sdn -o wide
NAME                   READY   STATUS             RESTARTS   AGE    IP               NODE              NOMINATED NODE   READINESS GATES
ovs-2k6zz              1/1     Running            0          149m   136.144.52.203   compute-1         <none>           <none>
ovs-4q565              1/1     Running            0          158m   136.144.52.196   control-plane-2   <none>           <none>
ovs-7w2nz              1/1     Running            0          158m   136.144.52.198   control-plane-0   <none>           <none>
ovs-g9wtr              1/1     Running            0          149m   136.144.52.204   compute-0         <none>           <none>
ovs-ps9dt              1/1     Running            0          158m   136.144.52.199   control-plane-1   <none>           <none>
sdn-9c4cz              1/1     Running            0          158m   136.144.52.198   control-plane-0   <none>           <none>
sdn-controller-24znf   0/1     CrashLoopBackOff   8          158m   136.144.52.198   control-plane-0   <none>           <none>
sdn-controller-fdnpj   0/1     CrashLoopBackOff   5          158m   136.144.52.196   control-plane-2   <none>           <none>
sdn-controller-w8w9k   1/1     Running            6          158m   136.144.52.199   control-plane-1   <none>           <none>
sdn-g6nd5              1/1     Running            0          158m   136.144.52.199   control-plane-1   <none>           <none>
sdn-k5skc              1/1     Running            0          149m   136.144.52.204   compute-0         <none>           <none>
sdn-metrics-cjxwk      1/1     Running            0          147m   136.144.52.203   compute-1         <none>           <none>
sdn-metrics-xbrvp      1/1     Running            0          147m   136.144.52.204   compute-0         <none>           <none>
sdn-s2tp5              1/1     Running            0          149m   136.144.52.203   compute-1         <none>           <none>
sdn-sxvjv              1/1     Running            0          158m   136.144.52.196   control-plane-2   <none>           <none>


oc logs sdn-controller-w8w9k  -n openshift-sdn
I0824 06:09:47.994332       1 leaderelection.go:242] attempting to acquire leader lease  openshift-sdn/openshift-network-controller...
E0824 06:09:48.018610       1 event.go:316] Could not construct reference to: '&v1.ConfigMap{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"openshift-network-controller", GenerateName:"", Namespace:"openshift-sdn", SelfLink:"/api/v1/namespaces/openshift-sdn/configmaps/openshift-network-controller", UID:"e964ce50-2c57-41c6-8a61-a19ce7363673", ResourceVersion:"130002", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63733837208, loc:(*time.Location)(0x27fb000)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string{"control-plane.alpha.kubernetes.io/leader":"{\"holderIdentity\":\"control-plane-1\",\"leaseDurationSeconds\":60,\"acquireTime\":\"2020-08-24T06:08:11Z\",\"renewTime\":\"2020-08-24T06:09:47Z\",\"leaderTransitions\":1}"}, OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry{v1.ManagedFieldsEntry{Manager:"openshift-sdn-controller", Operation:"Update", APIVersion:"v1", Time:(*v1.Time)(0xc000408660), FieldsType:"FieldsV1", FieldsV1:(*v1.FieldsV1)(0xc0004086a0)}}}, Immutable:(*bool)(nil), Data:map[string]string(nil), BinaryData:map[string][]uint8(nil)}' due to: 'no kind is registered for the type v1.ConfigMap in scheme "pkg/runtime/scheme.go:101"'. Will not report event: 'Normal' 'LeaderElection' 'control-plane-1 became leader'
I0824 06:09:48.018721       1 leaderelection.go:252] successfully acquired lease openshift-sdn/openshift-network-controller
I0824 06:09:48.022210       1 master.go:52] Initializing SDN master
I0824 06:09:48.034824       1 network_controller.go:61] Started OpenShift Network Controller
I0824 06:09:48.034909       1 reflector.go:175] Starting reflector *v1.NetNamespace (10m0s) from k8s.io/client-go@v0.18.2/tools/cache/reflector.go:125
I0824 06:09:48.034937       1 reflector.go:175] Starting reflector *v1.Namespace (10m0s) from k8s.io/client-go@v0.18.2/tools/cache/reflector.go:125
I0824 06:09:48.034956       1 reflector.go:175] Starting reflector *v1.HostSubnet (10m0s) from k8s.io/client-go@v0.18.2/tools/cache/reflector.go:125
I0824 06:09:48.035100       1 reflector.go:175] Starting reflector *v1.Node (10m0s) from k8s.io/client-go@v0.18.2/tools/cache/reflector.go:125
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x14f217e]

goroutine 127 [running]:
github.com/openshift/sdn/pkg/network/master.(*egressIPManager).check(0xc000898360, 0x48331c00, 0x27fb000, 0xc0000b87e0, 0x0)
    github.com/openshift/sdn/pkg/network/master/egressip.go:180 +0x10e
github.com/openshift/sdn/pkg/network/master.(*egressIPManager).poll(0xc000898360, 0xc0000b8f60)
    github.com/openshift/sdn/pkg/network/master/egressip.go:155 +0x78
created by github.com/openshift/sdn/pkg/network/master.(*egressIPManager).maybeDoUpdateEgressCIDRs
    github.com/openshift/sdn/pkg/network/master/egressip.go:127 +0x3b8

     
This seems to be caused by PR https://github.com/openshift/sdn/pull/171/files: according to the stack trace above, lines 180 and 155 of egressip.go are new code introduced by that PR.
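
For readers unfamiliar with this class of failure, the following is a minimal sketch of how a polling check can hit "invalid memory address or nil pointer dereference" when a lister was never wired up, or when a lookup returns nil and the result is used without checking the error. It is not the code from egressip.go. The names node, nodeLister, mapLister, egressManager, checkUnsafe, checkSafe and panics are hypothetical and exist only for illustration. The actual root cause is whatever the linked pull request addresses.

```go
package main

import (
	"errors"
	"fmt"
)

// node is a hypothetical stand-in for a Kubernetes Node object.
type node struct {
	Name  string
	Ready bool
}

// nodeLister is a hypothetical stand-in for an informer-backed lister.
type nodeLister interface {
	Get(name string) (*node, error)
}

// mapLister serves nodes from an in-memory map, loosely like an informer cache.
type mapLister map[string]*node

func (m mapLister) Get(name string) (*node, error) {
	n, ok := m[name]
	if !ok {
		return nil, fmt.Errorf("node %q not found", name)
	}
	return n, nil
}

// egressManager is hypothetical; it is NOT the real egressIPManager.
type egressManager struct {
	lister nodeLister // stays nil if the caller never wires it up
}

// checkUnsafe ignores both failure modes: a nil lister and a nil node.
// Either one ends in a nil pointer dereference panic.
func (m *egressManager) checkUnsafe(name string) bool {
	n, _ := m.lister.Get(name)
	return n.Ready
}

// checkSafe returns an error instead of panicking in both cases.
func (m *egressManager) checkSafe(name string) (bool, error) {
	if m.lister == nil {
		return false, errors.New("node lister is not initialized")
	}
	n, err := m.lister.Get(name)
	if err != nil {
		return false, err
	}
	return n.Ready, nil
}

// panics reports whether calling f triggers a panic.
func panics(f func()) (p bool) {
	defer func() {
		if recover() != nil {
			p = true
		}
	}()
	f()
	return false
}
```

Calling checkUnsafe on an egressManager whose lister is nil panics, which in a controller process without a recover would terminate the pod and, on repeated restarts, show up as CrashLoopBackOff. checkSafe turns the same conditions into ordinary errors that the caller can log and retry.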


Expected Result:
All SDN pods are in Running status and egressIP works correctly.

