1742207 – enableUnidling flag under CNO doesn't work as expected on running cluster

Summary: enableUnidling flag under CNO doesn't work as expected on running cluster

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.2.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	4.2.0
Assignee:	Casey Callendrello
QA Contact:	Anurag saxena
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2019-08-16 16:29 UTC by Anurag saxena
Modified:	2019-10-16 06:36 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-10-16 06:36:19 UTC
Target Upstream Version:
Embargoed:

Attachments
sdn pod logs (95.20 KB, text/plain) 2019-08-20 17:53 UTC, Anurag saxena	no flags	Details
Please refer to this one. Ignore earlier one. (97.98 KB, text/plain) 2019-08-20 18:06 UTC, Anurag saxena	no flags	Details

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2019:2922	0	None	None	None	2019-10-16 06:36:30 UTC

Description Anurag saxena 2019-08-16 16:29:12 UTC

Description of problem: enableUnidling takes a boolean value, true or false. When set to false, it should prevent services from being un-idled. However, services are still un-idled after this flag is set to false on a running cluster. See Additional info for the CNO config on the running cluster.


Version-Release number of selected component (if applicable): 4.2.0-0.nightly-2019-08-15-142751

How reproducible: Always


Steps to Reproduce:

$ oc project test
 
$ oc get pods
NAME            READY   STATUS    RESTARTS   AGE
hello-pod       1/1     Running   0          21m
test-rc-bqx6c   1/1     Running   0          66s
test-rc-ns5hr   1/1     Running   0          66s
 
$ oc get svc
NAME           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)     AGE
test-service   ClusterIP   172.30.124.10   <none>        27017/TCP   23h
 
$ oc idle test-service
WARNING: idling when network policies are in place may cause connections to bypass network policy entirely
The service "test/test-service" has been marked as idled
The service will unidle ReplicationController "test/test-rc" to 2 replicas once it receives traffic
ReplicationController "test/test-rc" has been idled
 
$ oc get pods
NAME            READY   STATUS        RESTARTS   AGE
hello-pod       1/1     Running       0          32m
test-rc-4hvbc   0/1     Terminating   0          93s  <<<<<<<<<< Idling svc terminated these pods
test-rc-hshf5   0/1     Terminating   0          93s  <<<<<<<<<< Idling svc terminated these pods
 
 
$ oc rsh hello-pod
/ # curl 172.30.124.10:27017
Hello OpenShift!
/ # exit

$ oc get pods
NAME            READY   STATUS        RESTARTS   AGE
hello-pod       1/1     Running       0          32m
test-rc-4hvb6   1/1     Running       0          13s 
test-rc-hshf7   1/1     Running       0          13s  


Actual results: With enableUnidling: false, the service is still un-idled when it receives traffic.

Expected results: With enableUnidling: false, the service should not be un-idled.

Additional info:

CNO config
 
$ oc get networks.operator.openshift.io -oyaml
apiVersion: v1
items:
- apiVersion: operator.openshift.io/v1
  kind: Network
  metadata:
    creationTimestamp: "2019-08-14T20:17:48Z"
    generation: 8
    name: cluster
    resourceVersion: "380539"
    selfLink: /apis/operator.openshift.io/v1/networks/cluster
    uid: 95795691-bed0-11e9-97e5-02a7493b454c
  spec:
    clusterNetwork:
    - cidr: 10.128.0.0/14
      hostPrefix: 23
    defaultNetwork:
      type: OpenShiftSDN
    openshiftSDNConfig:
      enableUnidling: false  <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< false
    serviceNetwork:
    - 172.30.0.0/16
  status: {}
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Comment 1 Casey Callendrello 2019-08-19 09:12:14 UTC

Hi there,
Now that https://github.com/openshift/cluster-network-operator/pull/292 has merged, can you try again?

Comment 2 Anurag saxena 2019-08-19 20:51:46 UTC

(In reply to Casey Callendrello from comment #1)
> Hi there,
> Now that https://github.com/openshift/cluster-network-operator/pull/292 has
> merged, can you try again?

Hi Casey, I still see the issue on the latest nightly build, 4.2.0-0.nightly-2019-08-19-113631.

Comment 3 Anurag saxena 2019-08-19 22:33:55 UTC

Also, it seems that services are inaccessible when the network type is set to OVNKubernetes. I will look into this further and file a separate bug.

Comment 4 Casey Callendrello 2019-08-20 08:00:26 UTC

Anurag,
Can you post some logs from the sdn pod? I really can't reproduce this.

Comment 5 Anurag saxena 2019-08-20 17:52:42 UTC

(In reply to Casey Callendrello from comment #4)
> Anurag,
> Can you post some logs from the sdn pod? I really can't reproduce this.

Sure, Casey. Please find sdn_pod_logs.txt attached. I can also see "unidlingProxy.syncProxyRules complete" in those logs.

Thanks

Comment 6 Anurag saxena 2019-08-20 17:53:06 UTC

Created attachment 1606221 [details]
sdn pod logs

Comment 7 Anurag saxena 2019-08-20 18:06:06 UTC

Created attachment 1606222 [details]
Please refer to this one. Ignore earlier one.

Comment 8 Casey Callendrello 2019-08-26 17:52:13 UTC

Anurag,
Can you post the output of the network operator config?

Comment 9 Anurag saxena 2019-08-26 18:50:31 UTC

(In reply to Casey Callendrello from comment #8)
> Anurag,
> Can you post the output of the network operator config?

Casey, it's present under "Additional info" in the bug's description. Let me know if you are looking for something else. Thanks

Comment 10 Casey Callendrello 2019-08-28 13:12:14 UTC

Determined that this was a minor misconfiguration. No code changes needed (Though I did file a small PR to fix a little wart).

Back to MODIFIED.
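
The comment above does not spell out the misconfiguration. Comparing the config in the description (where `openshiftSDNConfig` sits directly under `spec`) with the working excerpt in comment 14 (where it sits under `spec.defaultNetwork`) suggests the nesting was at least part of it. Below is a minimal Python sketch of a check for that, assuming the Network object has been exported with `oc get networks.operator.openshift.io cluster -o json` and parsed into a dict. The function name `find_enable_unidling` and the two sample dicts are hypothetical and only mirror the excerpts in this report.

```python
def find_enable_unidling(network):
    """Return a dict mapping each location where enableUnidling is set to its value.

    `network` is a dict parsed from a single Network object
    (assumption: exported from the cluster as JSON).
    """
    spec = network.get("spec") or {}
    found = {}

    # Location used in the working excerpt from comment 14.
    nested = (spec.get("defaultNetwork") or {}).get("openshiftSDNConfig") or {}
    if "enableUnidling" in nested:
        found["spec.defaultNetwork.openshiftSDNConfig"] = nested["enableUnidling"]

    # Location used in the config posted in the description.
    top_level = spec.get("openshiftSDNConfig") or {}
    if "enableUnidling" in top_level:
        found["spec.openshiftSDNConfig"] = top_level["enableUnidling"]

    return found


# Hypothetical sample mirroring the layout of the description's config.
reported = {
    "spec": {
        "defaultNetwork": {"type": "OpenShiftSDN"},
        "openshiftSDNConfig": {"enableUnidling": False},
    }
}

# Hypothetical sample mirroring the layout of the excerpt in comment 14.
working = {
    "spec": {
        "defaultNetwork": {
            "type": "OpenShiftSDN",
            "openshiftSDNConfig": {"enableUnidling": False},
        }
    }
}

print(find_enable_unidling(reported))
print(find_enable_unidling(working))
```

A result keyed only by `spec.openshiftSDNConfig` would indicate the layout from the description rather than the one that was later verified to work.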

Comment 11 Anurag saxena 2019-08-28 13:26:10 UTC

(In reply to Casey Callendrello from comment #10)
> Determined that this was a minor misconfiguration. No code changes needed
> (Though I did file a small PR to fix a little wart).
> 
> Back to MODIFIED.

Thanks, Casey. Can you post the PR link for reference?

Comment 13 Casey Callendrello 2019-08-28 13:59:05 UTC

https://github.com/openshift/cluster-network-operator/pull/302 is the fix that removes "mode" from required.

Comment 14 Anurag saxena 2019-08-29 16:42:16 UTC

This works fine now on 4.2.0-0.nightly-2019-08-29-062233, and "mode:" no longer needs to be present in the config.

CNO config excerpt -

spec:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  defaultNetwork:
    openshiftSDNConfig:
      enableUnidling: false
    type: OpenShiftSDN
  serviceNetwork:
  - x.x.0.0/16

Verifying this based on the tests above. Thanks
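
One way to set this field on a running cluster is a JSON merge patch against the Network operator object. The Python sketch below only builds the patch body, with `enableUnidling` nested under `spec.defaultNetwork.openshiftSDNConfig` as in the excerpt above. The helper name `unidling_patch` is hypothetical, and this is an illustration of the config layout rather than the procedure used in this report.

```python
import json


def unidling_patch(enabled):
    """Build a JSON merge patch that sets enableUnidling under defaultNetwork."""
    patch = {
        "spec": {
            "defaultNetwork": {
                "openshiftSDNConfig": {"enableUnidling": enabled}
            }
        }
    }
    return json.dumps(patch)


print(unidling_patch(False))
# {"spec": {"defaultNetwork": {"openshiftSDNConfig": {"enableUnidling": false}}}}
```

The resulting string could then be passed to `oc patch networks.operator.openshift.io cluster --type=merge -p '<patch>'`, assuming the logged-in user is allowed to modify that object.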

Comment 15 errata-xmlrpc 2019-10-16 06:36:19 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922
