Bug 1793099 - Modifying ExternalIP policy in CNO take about 220 seconds to take effect
Summary: Modifying ExternalIP policy in CNO take about 220 seconds to take effect
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: aos-network-edge-staff
QA Contact: Hongan Li
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-01-20 16:36 UTC by Weibin Liang
Modified: 2024-10-01 16:27 UTC
CC List: 15 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-08-03 16:27:15 UTC
Target Upstream Version:
Embargoed:
weliang: needinfo-
weliang: needinfo-



Description Weibin Liang 2020-01-20 16:36:14 UTC
Description of problem:
Modifying ExternalIP policy in CNO from rejectedCIDRs to allowedCIDRs will take about 250 seconds to take effect.

Version-Release number of selected component (if applicable):
4.3.0-0.nightly-2020-01-14-043441

How reproducible:
Always

Steps to Reproduce:
1. First set ExternalIP policy rejectedCIDRs/22.2.2.0/25 in CNO
2. Then change ExternalIP policy to allowedCIDRs/22.2.2.0/25 in CNO
3. Deploy a svc with externalIP 22.2.2.10
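The Forbidden errors below come from the admission check on the service's `externalIPs`. The following minimal Python model of that decision is a sketch, not the admission plugin's code. It assumes rejectedCIDRs are evaluated first and take precedence over allowedCIDRs, and that an address must also fall inside an allowed block. That assumption is consistent with the first transcript below, where 22.2.2.10 is refused even though it lies inside the allowed 22.2.2.0/24. The function name `external_ip_allowed` is hypothetical.

```python
import ipaddress
from typing import Iterable


def external_ip_allowed(ip: str, allowed_cidrs: Iterable[str],
                        rejected_cidrs: Iterable[str]) -> bool:
    """Model of the ExternalIP policy decision (assumed semantics)."""
    addr = ipaddress.ip_address(ip)
    # Rejections are checked first: a match in rejectedCIDRs wins even if
    # the address is also inside an allowedCIDRs block.
    if any(addr in ipaddress.ip_network(c) for c in rejected_cidrs):
        return False
    # Otherwise the address must fall inside at least one allowed block;
    # in this model an empty allowed list rejects every address.
    return any(addr in ipaddress.ip_network(c) for c in allowed_cidrs)


# Policy from the first `oc get` output: allowed 22.2.2.0/24, rejected 22.2.2.0/25.
print(external_ip_allowed("22.2.2.10", ["22.2.2.0/24"], ["22.2.2.0/25"]))  # False
# Policy after rejectedCIDRs was removed.
print(external_ip_allowed("22.2.2.10", ["22.2.2.0/24"], []))  # True
```

The model only says what the final decision should be; it does not capture how long the kube-apiserver takes to start enforcing a changed policy, which is what this bug is about.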


[root@dhcp-41-193 verification-tests]# oc login -u kubeadmin -p uw3Vq-x69vi-2K6vo-r3Quz

[root@dhcp-41-193 verification-tests]# oc get networks.config.openshift.io/cluster -o yaml
apiVersion: config.openshift.io/v1
kind: Network
metadata:
  creationTimestamp: "2020-01-14T13:38:01Z"
  generation: 27
  name: cluster
  resourceVersion: "114200"
  selfLink: /apis/config.openshift.io/v1/networks/cluster
  uid: dfae356f-e3e2-4ffb-a83f-bf09515e0adc
spec:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  externalIP:
    policy:
      allowedCIDRs:
      - 22.2.2.0/24
      rejectedCIDRs:
      - 22.2.2.0/25
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
status:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  clusterNetworkMTU: 8951
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
[root@dhcp-41-193 verification-tests]# 
[root@dhcp-41-193 verification-tests]# oc login -u testuser-0 -p VooLn7KehL7I
Login successful.

You have one project on this server: "test"

Using project "test".

[root@dhcp-41-193 verification-tests]# curl -s https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/networking/externalip_service1.json | sed s/10.5.0.1/22.2.2.10/g | oc create -f - 
Error from server (Forbidden): error when creating "STDIN": services "service-unsecure" is forbidden: spec.externalIPs[0]: Forbidden: externalIP is not allowed
[root@dhcp-41-193 verification-tests]# oc get svc
No resources found.

#### Delete rejectedCIDRs from the policy
[root@dhcp-41-193 verification-tests]# oc login -u kubeadmin -p uw3Vq-x69vi-2K6vo-r3Quz
Login successful.

You have access to 55 projects, the list has been suppressed. You can list all projects with 'oc projects'

Using project "test".
[root@dhcp-41-193 verification-tests]# oc edit networks.config.openshift.io/cluster
network.config.openshift.io/cluster edited
[root@dhcp-41-193 verification-tests]# oc get networks.config.openshift.io/cluster -o yaml
apiVersion: config.openshift.io/v1
kind: Network
metadata:
  creationTimestamp: "2020-01-14T13:38:01Z"
  generation: 28
  name: cluster
  resourceVersion: "119069"
  selfLink: /apis/config.openshift.io/v1/networks/cluster
  uid: dfae356f-e3e2-4ffb-a83f-bf09515e0adc
spec:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  externalIP:
    policy:
      allowedCIDRs:
      - 22.2.2.0/24
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
status:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  clusterNetworkMTU: 8951
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
[root@dhcp-41-193 verification-tests]# oc login -u testuser-0 -p VooLn7KehL7I
Login successful.

You have one project on this server: "test"

Using project "test".
[root@dhcp-41-193 verification-tests]# for  i in {1..1000}; do date; curl -s https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/networking/externalip_service1.json | sed s/10.5.0.1/22.2.2.10/g | oc create -f - ;sleep 30; done
Tue Jan 14 14:44:07 EST 2020
Error from server (Forbidden): error when creating "STDIN": services "service-unsecure" is forbidden: spec.externalIPs[0]: Forbidden: externalIP is not allowed
Tue Jan 14 14:44:38 EST 2020
Error from server (Forbidden): error when creating "STDIN": services "service-unsecure" is forbidden: spec.externalIPs[0]: Forbidden: externalIP is not allowed
Tue Jan 14 14:45:09 EST 2020
Error from server (Forbidden): error when creating "STDIN": services "service-unsecure" is forbidden: spec.externalIPs[0]: Forbidden: externalIP is not allowed
Tue Jan 14 14:45:39 EST 2020
Error from server (Forbidden): error when creating "STDIN": services "service-unsecure" is forbidden: spec.externalIPs[0]: Forbidden: externalIP is not allowed
Tue Jan 14 14:46:10 EST 2020
Error from server (Forbidden): error when creating "STDIN": services "service-unsecure" is forbidden: spec.externalIPs[0]: Forbidden: externalIP is not allowed
Tue Jan 14 14:46:41 EST 2020
Error from server (Forbidden): error when creating "STDIN": services "service-unsecure" is forbidden: spec.externalIPs[0]: Forbidden: externalIP is not allowed
Tue Jan 14 14:47:11 EST 2020
Error from server (Forbidden): error when creating "STDIN": services "service-unsecure" is forbidden: spec.externalIPs[0]: Forbidden: externalIP is not allowed
Tue Jan 14 14:47:42 EST 2020
service/service-unsecure created

Actual results:
It takes about 220 seconds before service/service-unsecure can be created.
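The `date` stamps printed by the retry loop above can be used to check that figure: the first attempt ran at 14:44:07, the last Forbidden response at 14:47:11, and the first successful create at 14:47:42. A small Python calculation, assuming `date` printed in the C/English locale, gives the gaps. `elapsed_seconds` is a hypothetical helper name.

```python
from datetime import datetime

# Format of the `date` output in the loop (the "EST" zone name is matched literally).
DATE_FMT = "%a %b %d %H:%M:%S EST %Y"


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two `date`-formatted timestamps from the same zone."""
    t0 = datetime.strptime(start, DATE_FMT)
    t1 = datetime.strptime(end, DATE_FMT)
    return (t1 - t0).total_seconds()


first_attempt = "Tue Jan 14 14:44:07 EST 2020"  # first Forbidden
last_failure = "Tue Jan 14 14:47:11 EST 2020"   # last Forbidden
first_success = "Tue Jan 14 14:47:42 EST 2020"  # service created

print(elapsed_seconds(first_attempt, first_success))  # 215.0
print(elapsed_seconds(first_attempt, last_failure))   # 184.0
```

Because the loop polls every 30 seconds and the time of the `oc edit` is not shown, 215 seconds is only an approximation of the propagation delay: the new policy took effect somewhere between the 14:47:11 failure and the 14:47:42 success, and the edit itself happened some time before 14:44:07.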

Expected results:
service/service-unsecure can be created almost immediately after the policy change.

Additional info:

Comment 1 Ricardo Carrillo Cruz 2020-02-03 13:50:34 UTC
Is this on 4.4 or 4.3?

The bug has been filed against 4.4, but the version field shows a 4.3 nightly.

Anyway, can you please give me an environment to reproduce this?

Thanks

Comment 3 Ricardo Carrillo Cruz 2020-07-14 09:10:00 UTC
Unable to reproduce this on 4.6:

[ricky@localhost openshift-installer]$ for  i in {1..1000}; do date; curl -s https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/networking/externalip_service1.json | sed s/10.5.0.1/22.2.2.10/g | oc create -f - ;sleep 30; done
Tue 14 Jul 2020 11:05:10 AM CEST
service/service-unsecure created

The service gets created almost immediately.

Can you verify this on 4.6?

If it doesn't happen, I will check with Ben whether this is worth pursuing for 4.3, since 220 seconds may not be terrible and it's three versions back.

Comment 4 Weibin Liang 2020-07-14 14:53:36 UTC
The issue still happens in 4.6.0-0.nightly-2020-07-14-092216 when QE runs the script.

In the log below, creating the new service "service-unsecure" is forbidden 148 times, over a period of around 220 seconds.

And I wait up to 300 seconds for the steps to pass:                                             # features/step_definitions/meta_steps.rb:33
      [14:29:36] INFO> {"kind":"Service","apiVersion":"v1","metadata":{"name":"service-unsecure","labels":{"name":"service-unsecure"}},"spec":{"ports":[{"name":"http","protocol":"TCP","port":27017,"targetPort":8080}],"externalIPs":["10.0.129.76"],"selector":{"name":"caddy-docker"}}}
      [14:29:36] INFO> Shell Commands: oc create -f - --kubeconfig=/home/weliang/workdir/weliang-weliang/ocp4_testuser-0.kubeconfig
      
      STDERR:
      Error from server (Forbidden): error when creating "STDIN": services "service-unsecure" is forbidden: spec.externalIPs: Forbidden: externalIPs have been disabled
      
      [14:29:36] INFO> Exit Status: 1
      [14:33:03] INFO> last 4 messages repeated 148 times
      [14:33:04] INFO> {"kind":"Service","apiVersion":"v1","metadata":{"name":"service-unsecure","labels":{"name":"service-unsecure"}},"spec":{"ports":[{"name":"http","protocol":"TCP","port":27017,"targetPort":8080}],"externalIPs":["10.0.129.76"],"selector":{"name":"caddy-docker"}}}
      [14:33:04] INFO> Shell Commands: oc create -f - --kubeconfig=/home/weliang/workdir/weliang-weliang/ocp4_testuser-0.kubeconfig
      service/service-unsecure created

To reproduce this issue:
1. First set ExternalIP policy rejectedCIDRs/22.2.2.0/25 in CNO
2. Deploy a svc with externalIP 22.2.2.10
3. The svc creation is rejected
4. Remove the above svc
5. Then change ExternalIP policy to allowedCIDRs/22.2.2.0/25 in CNO
6. Deploy a svc with externalIP 22.2.2.10
7. The svc cannot be deployed until around 220 seconds have passed
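The QE harness waits up to 300 seconds, and the loop in the description retries every 30 seconds. A hedged Python sketch of the same retry-with-deadline pattern follows. `retry_until` and `oc_create` are hypothetical helper names; the only CLI call assumed is the `oc create -f` command already used in this report. The clock and sleep functions are injectable so the loop can be exercised without a cluster.

```python
import subprocess
import time
from typing import Callable


def oc_create(manifest_path: str) -> bool:
    """Hypothetical attempt: run `oc create -f` and report success by exit code."""
    result = subprocess.run(["oc", "create", "-f", manifest_path],
                            capture_output=True, text=True)
    return result.returncode == 0


def retry_until(attempt: Callable[[], bool], timeout: float = 300.0,
                interval: float = 30.0,
                clock: Callable[[], float] = time.monotonic,
                sleep: Callable[[float], None] = time.sleep) -> int:
    """Call attempt() every `interval` seconds until it returns True.

    Returns the number of attempts used, or raises TimeoutError once the
    next attempt would start after `timeout` seconds.
    """
    deadline = clock() + timeout
    attempts = 0
    while True:
        attempts += 1
        if attempt():
            return attempts
        if clock() + interval > deadline:
            raise TimeoutError(f"no success after {attempts} attempts")
        sleep(interval)


# Simulation with a fake clock: the new policy "takes effect" at t=215 s.
now = [0.0]


def fake_sleep(seconds: float) -> None:
    now[0] += seconds


used = retry_until(lambda: now[0] >= 215, clock=lambda: now[0], sleep=fake_sleep)
print(used)  # 9: attempts at t=0, 30, ..., 240
```

Against a real cluster, `attempt` would be something like `lambda: oc_create("service.json")`, where the file name is only an example.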

Comment 5 Juan Luis de Sousa-Valadas 2020-07-14 16:10:08 UTC
The error is creating the service, not making the network plumbing to make the plumbing work. This is forbidden because the kube-apiserver doesn't allow it.

I think 4 minutes is quite reasonable; IMO this is a reasonable limitation rather than a bug. Anyway, this is handled by the kube-apiserver operator [1], so I'll reassign it to them to fix if they consider it a bug:

1- https://github.com/openshift/cluster-kube-apiserver-operator/blob/ac2f94c3216ee2eede35a7357782e3a1f2617fbd/pkg/operator/configobservation/network/observe_network.go#L122

Comment 6 Juan Luis de Sousa-Valadas 2020-07-14 16:11:10 UTC
> The error is creating the service, not making the network plumbing to make
> the plumbing work.
s/not making the network plumbing to make the plumbing work./not making the network plumbing to make the networking work./

Comment 7 Stefan Schimanski 2020-08-03 09:58:48 UTC
Deploying new kube-apiserver configs takes O(250s) because the operator has to roll out through static pods and pass through a 70s+ graceful shutdown procedure per instance. This works as intended.
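The arithmetic behind that estimate can be sketched as follows. It assumes three kube-apiserver instances (one per control-plane node, the usual OpenShift layout) replaced one after another, each spending at least the 70 seconds of graceful shutdown mentioned above. `startup_s` is a hypothetical placeholder for the time a new instance needs to become ready; it is not a measured value.

```python
def estimate_rollout_seconds(instances: int = 3, shutdown_s: float = 70.0,
                             startup_s: float = 0.0) -> float:
    """Rough lower bound for a one-at-a-time static pod rollout."""
    # Instances are replaced sequentially, so per-instance costs add up
    # rather than overlapping.
    return instances * (shutdown_s + startup_s)


print(estimate_rollout_seconds())              # 210.0: shutdown time alone
print(estimate_rollout_seconds(startup_s=15))  # 255.0 with a hypothetical 15 s startup
```

Under these assumptions, shutdown time alone already accounts for roughly 210 of the ~220 seconds observed in this report, which fits the O(250s) figure.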

Comment 10 Shubhag Saxena 2020-10-30 11:34:26 UTC
I am reopening this bug because I performed a few more tests based on this issue, and it seems that for a project admin user "policy: null" sometimes works and sometimes does not (see below), even after waiting 300 seconds. This is not the case for a cluster-admin user, for whom everything works fine.

>> Working :
~~~~~~~~~~~~~~~~~~~~~~~~~~~
$ oc login -u system:admin
Logged into "https://api.sharedocp4upi44.lab.upshift.rdu2.redhat.com:6443" as "system:admin" using existing credentials.
[...]

$ oc patch network cluster --type merge -p '{ "spec": { "externalIP": { "policy": null }}}'
network.config.openshift.io/cluster patched

$ oc edit network.config
apiVersion: config.openshift.io/v1
kind: Network
metadata:
  creationTimestamp: "2020-10-25T06:40:46Z"
  generation: 14
  name: cluster
  resourceVersion: "3300609"
  selfLink: /apis/config.openshift.io/v1/networks/cluster
  uid: d1d36699-5fb8-4f2f-94ac-f301d057246e
spec:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  externalIP: {}
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
status:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  clusterNetworkMTU: 1450
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16

$ oc login -u newuser -p password
Login successful.
[...]

$ oc get po
NAME                     READY   STATUS      RESTARTS   AGE
httpd-example-1-build    0/1     Completed   0          12m
httpd-example-1-deploy   0/1     Completed   0          11m
httpd-example-1-kdlbz    1/1     Running     0          11m

$ oc get svc
NAME            TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
httpd-example   ClusterIP   172.30.60.171   <none>        8080/TCP   12m

$ oc edit svc  httpd-example    <-------- changed TYPE to LoadBalancer
service/httpd-example edited

$ oc get svc
NAME            TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE   
httpd-example   LoadBalancer   172.30.60.171   <pending>     8080:30472/TCP   13m

$ oc whoami
newuser

$ oc patch svc httpd-example -p '{"spec":{"externalIPs":["192.174.120.11"]}}'     // after waiting for 5 mins
service/httpd-example patched

$ oc get svc
NAME            TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)          AGE
httpd-example   LoadBalancer   172.30.60.171   192.174.120.11   8080:30472/TCP   16m

$ oc login -u system:admin
Logged into "https://api.sharedocp4upi44.lab.upshift.rdu2.redhat.com:6443" as "system:admin" using existing credentials.

You have access to 70 projects, the list has been suppressed. You can list all projects with 'oc projects'

Using project "test".
$ oc get networks.config cluster -o go-template='{{.spec.externalIP}}{{"\n"}}'
map[]
~~~~~~~~~~~~~~~~~~~~~~~~~

>> Not Working : 

Here I assigned the IP directly to the svc with "policy: null" set, and tried many times as the project admin user, but it did not work even after waiting a long time.
~~~~~~~~~~~~~~~~~~~~~
$ oc new-project newwww

$ oc new-app httpd-example
--> Deploying template "openshift/httpd-example" to project newwww
[...]

$ oc get svc
NAME            TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
httpd-example   LoadBalancer   172.30.237.62   <pending>     8080:30328/TCP   4m53s

$ oc patch svc httpd-example -p '{"spec":{"externalIPs":["192.174.120.15"]}}'
Error from server (Forbidden): services "httpd-example" is forbidden: spec.externalIPs: Forbidden: externalIPs have been disabled
$ oc patch svc httpd-example -p '{"spec":{"externalIPs":["192.174.120.15"]}}'
Error from server (Forbidden): services "httpd-example" is forbidden: spec.externalIPs: Forbidden: externalIPs have been disabled
~~~~~~~~~~~~~~~~~~~~~

So the only concern is the inconsistent behaviour when a project admin user manually sets an externalIP on a svc with "policy: null": sometimes the IP is assigned, and sometimes it is not or it takes too long.

Comment 19 Weibin Liang 2020-11-11 18:26:36 UTC
Hi Shubhag,

In my testing, setting policy=null from the default policy={} does not take effect at all.
My test did not pass even when I changed the TYPE of the existing svc to LoadBalancer.

A new bug was submitted for the failure in my testing:
Bug 1896880 - [ExternalIP] Setting policy=null from defualt policy={} not take effect

Comment 21 Shubhag Saxena 2020-11-11 18:58:40 UTC
Hi Weibin,

Thanks for the information. I will keep track of this bug's status and will update the customer too.

Comment 22 Stefan Schimanski 2020-11-16 14:38:54 UTC
The load balancer service resource is owned by the cloud-provider LB implementation. Moving to the routing team. The kube-apiserver is not filling the external IP field.

Comment 29 Shubhag Saxena 2021-03-17 08:27:23 UTC
Hi team, when are we going to fix this bug? May I know the current status of this bug?

Comment 30 Stephen Greene 2021-04-22 19:34:26 UTC
Sounds like we are waiting on https://bugzilla.redhat.com/show_bug.cgi?id=1896880 to be verified to see if 1896880 also resolves this bug.

Comment 31 Miciah Dashiel Butler Masters 2021-06-11 23:01:02 UTC
We'll raise this for discussion on the next network architecture call.

Comment 33 Weibin Liang 2021-06-21 15:51:09 UTC
Let me clarify the three externalIP bugs in more detail:

This bug, Bug 1793099 - Modifying ExternalIP policy in CNO take about 220 seconds to take effect, was closed once and then reopened because customers encountered the same issue.

During the discussion for bug-1793099, QE opened two more externalIP bugs:

1. Bug 1907505 - [ExternalIP] Only a user with cluster-admin privileges can create a policy object, which is a doc bug and is CLOSED CURRENTRELEASE

2. Bug 1896880 - [ExternalIP] Setting policy=null from default policy={} not take effect. Dev thinks the feature works as designed, so the doc needs to be updated to correct the statement.

Comment 35 Casey Callendrello 2021-07-07 11:44:55 UTC
This bug is about a 220-second rollout time. Unfortunately, there's nothing we can do about this, since changing ExternalIPPolicy means restarting all of the apiservers. I know this is kind of annoying, but these fields aren't changed very often by end-users.

Either we should close this, or add a documentation line that it may take up to 5 minutes for the changes to take effect.
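Instead of a fixed wait, a client could check whether the kube-apiserver rollout has finished before retrying. This is a sketch under stated assumptions: it assumes the policy change is delivered as a new kube-apiserver rollout (as comments 7 and 35 describe) and that the `kube-apiserver` ClusterOperator reports `Progressing=True` until every instance has been updated. The helper names are hypothetical, and the sample status is hand-written, not real cluster output. Because the operator may not switch to `Progressing=True` immediately after the edit, a caller should wait briefly before trusting a "settled" result.

```python
import json
import subprocess
from typing import Any, Dict


def condition_status(operator: Dict[str, Any], cond_type: str) -> str:
    """Return the status string ("True"/"False") of one ClusterOperator condition."""
    for cond in operator.get("status", {}).get("conditions", []):
        if cond.get("type") == cond_type:
            return cond.get("status", "Unknown")
    return "Unknown"


def rollout_settled(operator: Dict[str, Any]) -> bool:
    """True once the operator is Available and no longer Progressing."""
    return (condition_status(operator, "Available") == "True"
            and condition_status(operator, "Progressing") == "False")


def get_kube_apiserver_operator() -> Dict[str, Any]:
    """Fetch the kube-apiserver ClusterOperator as JSON via the oc CLI."""
    out = subprocess.run(
        ["oc", "get", "clusteroperator", "kube-apiserver", "-o", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)


# Hand-written example status (not real cluster output).
sample = {"status": {"conditions": [
    {"type": "Available", "status": "True"},
    {"type": "Progressing", "status": "True"},
]}}
print(rollout_settled(sample))  # False: new revision still rolling out
```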

Comment 36 Miciah Dashiel Butler Masters 2021-08-03 16:27:15 UTC
Closing based on comment 35.

