Bug 1702194

Summary: [sdn-254]Application pod should not be killed after ovs restart
Product: OpenShift Container Platform
Component: Networking
Version: 4.1.0
Target Release: 4.1.0
Hardware: All
OS: All
Priority: high
Severity: high
Status: CLOSED ERRATA
Reporter: zhaozhanqi <zzhao>
Assignee: Dan Winship <danw>
QA Contact: Meng Bo <bmeng>
CC: aos-bugs, bbennett
Target Milestone: ---
Doc Type: No Doc Update
Type: Bug
Regression: ---
Last Closed: 2019-06-04 10:47:50 UTC
Attachments:
- crio_kubelet_loc_logs_previous_sdn logs
- oc logs for ovs and sdn pod

Description zhaozhanqi 2019-04-23 08:05:45 UTC
Description of problem:
When the ovs pod is restarted, the application pod is reassigned an IP and its ovs ports are recreated.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create a new project and create the test pod
  oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/networking/list_for_pods.json
2. Check which node the test pod is located on
3. Delete the ovs pod on the node where the test pod is located
4. Check the test pod again after the ovs pod is recreated
5. Check the logs of the sdn pod
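The steps above can be sketched as a script. This is a minimal sketch, assuming a logged-in `oc` client against a 4.1 cluster; the `z1` project name and the `app=ovs` / `app=sdn` label selectors in `openshift-sdn` are assumptions, not taken from the report.

```shell
#!/bin/sh
# Reproduction sketch; skips the live steps when no cluster client is available.
RESULT="skipped"

if command -v oc >/dev/null 2>&1 && oc whoami >/dev/null 2>&1; then
    # 1. New project and test pod.
    oc new-project z1
    oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/networking/list_for_pods.json

    # 2. Node and current IP of the test pod.
    POD=$(oc get pods -o jsonpath='{.items[0].metadata.name}')
    NODE=$(oc get pod "$POD" -o jsonpath='{.spec.nodeName}')
    IP_BEFORE=$(oc get pod "$POD" -o jsonpath='{.status.podIP}')

    # 3. Delete the ovs pod on that node (label selector is an assumption).
    OVS=$(oc get pods -n openshift-sdn -l app=ovs \
          --field-selector "spec.nodeName=$NODE" -o jsonpath='{.items[0].metadata.name}')
    oc delete pod -n openshift-sdn "$OVS"

    # 4. Once the ovs pod is recreated, the test pod IP should be unchanged.
    oc wait --for=condition=Ready pod -n openshift-sdn -l app=ovs --timeout=120s
    IP_AFTER=$(oc get pod "$POD" -o jsonpath='{.status.podIP}')
    if [ "$IP_BEFORE" = "$IP_AFTER" ]; then RESULT="unchanged"; else RESULT="reassigned"; fi
    echo "pod IP $RESULT"

    # 5. Inspect the sdn pod logs.
    oc logs -n openshift-sdn -l app=sdn | grep node.go
else
    echo "no logged-in oc client; skipping live steps"
fi
```

With the fix, step 4 should report the IP unchanged; the buggy behavior is a reassigned IP.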

Actual results:
 The test pod IP is reassigned after the ovs pod is recreated.
 Step 5 logs:

  $ oc logs sdn-7xt8g -n openshift-sdn | grep node.go
I0423 06:27:00.453236   96643 node.go:148] Initializing SDN node of type "redhat/openshift-ovs-networkpolicy" with configured hostname "ip-172-31-143-62.eu-central-1.compute.internal" (IP ""), iptables sync period "30s"
I0423 06:27:00.464969   96643 node.go:267] Starting openshift-sdn network plugin
I0423 06:27:01.124169   96643 node.go:326] Starting openshift-sdn pod manager
I0423 06:27:01.128289   96643 node.go:334] OVS bridge has been recreated. Will reattach 12 existing pods...
I0423 06:27:01.137441   96643 node.go:389] Interface vethba581459 for pod 'openshift-ingress/router-default-bf9fcfb87-9zdfp' no longer exists
I0423 06:27:01.145792   96643 node.go:389] Interface veth5df0805e for pod 'openshift-monitoring/prometheus-k8s-0' no longer exists
I0423 06:27:01.152091   96643 node.go:389] Interface vethe86f2bf7 for pod 'openshift-monitoring/prometheus-adapter-54fd64868b-59xx2' no longer exists
I0423 06:27:01.160647   96643 node.go:389] Interface vetha5a4b482 for pod 'openshift-marketplace/redhat-operators-5cb954dc98-q2z27' no longer exists
I0423 06:27:01.165979   96643 node.go:389] Interface veth6150ab23 for pod 'openshift-marketplace/community-operators-544d88b5c4-qcr62' no longer exists
I0423 06:27:01.170942   96643 node.go:389] Interface veth3ca85a6d for pod 'openshift-monitoring/alertmanager-main-2' no longer exists
I0423 06:27:01.175796   96643 node.go:389] Interface veth037815a3 for pod 'openshift-dns/dns-default-78rsw' no longer exists
I0423 06:27:01.180541   96643 node.go:389] Interface veth490e9e59 for pod 'openshift-ssh-bastion/ssh-bastion-f4d5bbcbd-799lp' no longer exists
I0423 06:27:01.185259   96643 node.go:389] Interface vethfbf11a61 for pod 'openshift-image-registry/node-ca-k6qhv' no longer exists
I0423 06:27:01.190231   96643 node.go:389] Interface veth13354f47 for pod 'z1/test-rc-qfwjv' no longer exists
I0423 06:27:01.195061   96643 node.go:389] Interface veth69098196 for pod 'openshift-image-registry/image-registry-57b5d46fb-tjs66' no longer exists
I0423 06:27:01.200207   96643 node.go:389] Interface vethf811a976 for pod 'openshift-monitoring/alertmanager-main-0' no longer exists
I0423 06:27:01.200237   96643 node.go:350] openshift-sdn network plugin registering startup
I0423 06:27:01.200389   96643 node.go:368] openshift-sdn network plugin ready
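The node.go:334/389 messages above come from the SDN's recovery path: after the OVS bridge is recreated, it walks the pods it knows about and can only re-plug the ones whose host-side veth interface still exists; here every veth had vanished, so all 12 pods lost their attachments. A minimal sketch of that filtering pattern, with hypothetical types and names (not the actual openshift-sdn code):

```go
package main

import "fmt"

// pod records the host-side veth interface the SDN created for a running pod.
type pod struct {
	namespace, name, veth string
}

// reattach reports which pods can be re-plugged into a freshly recreated
// OVS bridge: only those whose host-side veth still exists. The rest
// correspond to the "Interface X for pod 'ns/name' no longer exists" lines.
func reattach(pods []pod, hostIfaces map[string]bool) (kept, lost []pod) {
	for _, p := range pods {
		if hostIfaces[p.veth] {
			kept = append(kept, p)
		} else {
			lost = append(lost, p)
		}
	}
	return kept, lost
}

func main() {
	pods := []pod{
		{"z1", "test-rc-qfwjv", "veth13354f47"},
		{"openshift-dns", "dns-default-78rsw", "veth037815a3"},
	}
	// Only the first veth survived the restart in this hypothetical run.
	host := map[string]bool{"veth13354f47": true}
	kept, lost := reattach(pods, host)
	fmt.Printf("reattached=%d lost=%d\n", len(kept), len(lost))
}
```

In the buggy run none of the veths survived, so nothing could be reattached and every pod had to be re-sandboxed with a new IP.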

Expected results:

test pod should be killed after the ovs pod is restarted.

Additional info:

Comment 1 Casey Callendrello 2019-04-23 08:57:29 UTC
Dan (Winship), can you take a look?

Comment 2 zhaozhanqi 2019-04-23 09:10:30 UTC
Sorry, there was a typo in the Expected results:

test pod should NOT be killed after the ovs pod is restarted.

Comment 3 Dan Winship 2019-04-23 12:08:21 UTC
Can you attach the kubelet and crio logs from that node?

Also, the "oc logs" and "oc logs --previous" for openvswitch on that node.

Actually, also the "oc logs --previous" for sdn on that node

Comment 4 zhaozhanqi 2019-04-24 02:47:42 UTC
Created attachment 1557938 [details]
crio_kubelet_loc_logs_previous_sdn logs

Comment 5 zhaozhanqi 2019-04-24 03:01:34 UTC
Created attachment 1557950 [details]
oc logs for ovs and sdn pod

Comment 6 zhaozhanqi 2019-04-24 03:04:58 UTC
Hi Dan Winship,

I added the following logs in the attachments:
oc logs --previous sdn-xxx
oc logs sdn-xxx
oc logs ovs-xxx
journalctl -u kubelet
journalctl -u crio

Please let me know if you need more.

Comment 8 zhaozhanqi 2019-04-25 06:05:17 UTC
Verified this bug on 4.1.0-0.nightly-2019-04-25-002910

Comment 10 errata-xmlrpc 2019-06-04 10:47:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.