Bug 1957600

Summary: SriovNetworkNodeState : Delay in getting the correct status of syncStatus field
Product: OpenShift Container Platform Reporter: Geetika Kapoor <gkapoor>
Component: NetworkingAssignee: zenghui.shi <zshi>
Networking sub component: SR-IOV QA Contact: zhaozhanqi <zzhao>
Status: CLOSED NOTABUG Docs Contact:
Severity: unspecified    
Priority: unspecified    
Version: 4.5   
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-06-01 03:35:54 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Geetika Kapoor 2021-05-06 06:32:26 UTC
Description of problem:

We observe some 2-3 minutes of delay in getting exact status of syncStatus field.

$ oc get pods -A -o wide  | grep sriov| grep daemon
openshift-sriov-network-operator                   sriov-network-config-daemon-6j2h2                                  1/1     Running       0          29h     ip_host1    host1          <none>           <none>
openshift-sriov-network-operator                   sriov-network-config-daemon-bjrfh                                  1/1     Running       0          29h     ip_host2    host2          <none>           <none>


$ oc get SriovNetworkNodeState host2 -n openshift-sriov-network-operator -o yaml | grep -i syncStatus
        f:syncStatus: {}
  syncStatus: InProgress

  
$ oc logs sriov-network-config-daemon-6j2h2 -n openshift-sriov-network-operator --tail=20
I0428 06:38:04.410174 3917451 daemon.go:312] Run(): period refresh
I0428 06:38:04.414306 3917451 daemon.go:208] Could not retrieve Nic Id ConfigMap: configmaps "unsupported-nic-ids" not found
I0428 06:38:04.414325 3917451 daemon.go:826] tryCreateUdevRule()
I0428 06:38:05.049029 3917451 daemon.go:339] get item: 43
I0428 06:38:05.049059 3917451 daemon.go:431] nodeStateSyncHandler(): new generation is 43
I0428 06:38:05.053746 3917451 daemon.go:441] nodeStateSyncHandler(): Interface not changed
I0428 06:38:05.053762 3917451 daemon.go:374] Successfully synced
I0428 06:38:05.053768 3917451 daemon.go:337] worker queue size: 0
I0428 06:38:11.983495 3917451 daemon.go:402] nodeUpdateHandler(): node host2 is draining
I0428 06:38:11.983580 3917451 daemon.go:402] nodeUpdateHandler(): node host2 is draining
I0428 06:38:11.983624 3917451 daemon.go:402] nodeUpdateHandler(): node host2 is draining
I0428 06:38:11.983668 3917451 daemon.go:402] nodeUpdateHandler(): node host2 is draining
I0428 06:38:11.983716 3917451 daemon.go:402] nodeUpdateHandler(): node host2 is draining
I0428 06:38:17.133638 3917451 daemon.go:402] nodeUpdateHandler(): node host2 is draining
I0428 06:38:17.279940 3917451 daemon.go:402] nodeUpdateHandler(): node host2 is draining
I0428 06:38:20.049285 3917451 daemon.go:339] get item: 43
I0428 06:38:20.049317 3917451 daemon.go:431] nodeStateSyncHandler(): new generation is 43
I0428 06:38:20.055015 3917451 daemon.go:441] nodeStateSyncHandler(): Interface not changed
I0428 06:38:20.055033 3917451 daemon.go:374] Successfully synced
I0428 06:38:20.055040 3917451 daemon.go:337] worker queue size: 0

$ oc get SriovNetworkNodeState host2 -n openshift-sriov-network-operator -o yaml | grep -i syncStatus
        f:syncStatus: {}
  syncStatus: InProgress


Version-Release number of selected component (if applicable):

$ oc version 
Client Version: 4.8.0-202103080232.p0-f749845
Server Version: 4.8.0-0.nightly-2021-03-15-144314
Kubernetes Version: v1.20.0+e1bc274

How reproducible:

always 

Steps to Reproduce:
1. Configure a SriovNetworkNodePolicy. We achieve all this with our automation framework
2.
3.

Actual results:
 Delay is too much . 2-3 minutes of delay is quite high

Expected results:
it needs to be faster

Additional info:

Comment 1 zenghui.shi 2021-05-18 14:01:07 UTC
> We observe some 2-3 minutes of delay in getting exact status of syncStatus
> field.

Looking at the code, "daemon.go:374] Successfully synced" is printed after
syncStatus becomes Succeeded. the delay should not happen.


> 
> $ oc get pods -A -o wide  | grep sriov| grep daemon
> openshift-sriov-network-operator                  
> sriov-network-config-daemon-6j2h2                                  1/1    
> Running       0          29h     ip_host1    host1          <none>          
> <none>
> openshift-sriov-network-operator                  
> sriov-network-config-daemon-bjrfh                                  1/1    
> Running       0          29h     ip_host2    host2          <none>          
> <none>
> 
> 
> $ oc get SriovNetworkNodeState host2 -n openshift-sriov-network-operator -o
> yaml | grep -i syncStatus
>         f:syncStatus: {}
>   syncStatus: InProgress
> 

This cmd is checking host2 sriovnetworknodestate.

>   
> $ oc logs sriov-network-config-daemon-6j2h2 -n
> openshift-sriov-network-operator --tail=20
> I0428 06:38:04.410174 3917451 daemon.go:312] Run(): period refresh
> I0428 06:38:04.414306 3917451 daemon.go:208] Could not retrieve Nic Id
> ConfigMap: configmaps "unsupported-nic-ids" not found
> I0428 06:38:04.414325 3917451 daemon.go:826] tryCreateUdevRule()
> I0428 06:38:05.049029 3917451 daemon.go:339] get item: 43
> I0428 06:38:05.049059 3917451 daemon.go:431] nodeStateSyncHandler(): new
> generation is 43
> I0428 06:38:05.053746 3917451 daemon.go:441] nodeStateSyncHandler():
> Interface not changed
> I0428 06:38:05.053762 3917451 daemon.go:374] Successfully synced
> I0428 06:38:05.053768 3917451 daemon.go:337] worker queue size: 0
> I0428 06:38:11.983495 3917451 daemon.go:402] nodeUpdateHandler(): node host2
> is draining
> I0428 06:38:11.983580 3917451 daemon.go:402] nodeUpdateHandler(): node host2
> is draining
> I0428 06:38:11.983624 3917451 daemon.go:402] nodeUpdateHandler(): node host2
> is draining
> I0428 06:38:11.983668 3917451 daemon.go:402] nodeUpdateHandler(): node host2
> is draining
> I0428 06:38:11.983716 3917451 daemon.go:402] nodeUpdateHandler(): node host2
> is draining
> I0428 06:38:17.133638 3917451 daemon.go:402] nodeUpdateHandler(): node host2
> is draining
> I0428 06:38:17.279940 3917451 daemon.go:402] nodeUpdateHandler(): node host2
> is draining
> I0428 06:38:20.049285 3917451 daemon.go:339] get item: 43
> I0428 06:38:20.049317 3917451 daemon.go:431] nodeStateSyncHandler(): new
> generation is 43
> I0428 06:38:20.055015 3917451 daemon.go:441] nodeStateSyncHandler():
> Interface not changed
> I0428 06:38:20.055033 3917451 daemon.go:374] Successfully synced
> I0428 06:38:20.055040 3917451 daemon.go:337] worker queue size: 0

This log is from sriov-config-daemon on host1. could you check the log from host2?

> 
> $ oc get SriovNetworkNodeState host2 -n openshift-sriov-network-operator -o
> yaml | grep -i syncStatus
>         f:syncStatus: {}
>   syncStatus: InProgress
> 

This cmd is checking host2 again.

Comment 2 zenghui.shi 2021-06-01 03:35:54 UTC
According to comment #1, the delay was caused by checking on different nodes rather than the same node, closing as notabug.