Bug 1596300 - Increase lbaas_activation_timeout to avoid kuryr-controller pod crash
Summary: Increase lbaas_activation_timeout to avoid kuryr-controller pod crash
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.10.z
Assignee: Luis Tomas Bolivar
QA Contact: Jon Uriarte
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-06-28 14:42 UTC by Luis Tomas Bolivar
Modified: 2019-01-30 15:13 UTC
CC List: 4 users

Fixed In Version:
Doc Type:
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-01-30 15:13:18 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift openshift-ansible pull 9015 0 None closed Increase lbaas_activation_timeout for kuryr-controller 2020-10-08 22:32:05 UTC
Github openshift openshift-ansible pull 9133 0 None closed [release-3.10] Increase lbaas_activation_timeout for kuryr-controller 2020-10-08 22:32:14 UTC
OpenStack gerrit 579559 0 None MERGED Fix fields translation on filtering 2020-10-08 22:32:05 UTC
OpenStack gerrit 579846 0 None MERGED Fix fields translation on filtering 2020-10-08 22:32:05 UTC
Red Hat Product Errata RHBA-2019:0206 0 None None None 2019-01-30 15:13:25 UTC

Description Luis Tomas Bolivar 2018-06-28 14:42:46 UTC
When using Octavia as LBaaS in slow (or busy) environments, provisioning the Amphora VM may take longer than the default 300 seconds that the kuryr-controller waits for it.

If this timeout is reached, Kuryr retries the action:

2018-06-28 11:34:44.529 1 ERROR kuryr_kubernetes.handlers.logging [-] Failed to handle event {u'object': {u'kind': u'Endpoints', u'subsets': [{u'addresses': [{u'ip': u'192.168.99.12', u'targetRef': {u'kind': u'Pod', u'resourceVersion': u'6149', u'namespace': u'default', u'name': u'router-1-fgh8f', u'uid': u'82a3d212-7ac6-11e8-b89e-fa163ec618b0'}, u'nodeName': u'infra-node-0.openshift.example.com'}], u'ports': [{u'protocol': u'TCP', u'name': u'1936-tcp', u'port': 1936}, {u'protocol': u'TCP', u'name': u'80-tcp', u'port': 80}, {u'protocol': u'TCP', u'name': u'443-tcp', u'port': 443}]}], u'apiVersion': u'v1', u'metadata': {u'name': u'router', u'labels': {u'router': u'router'}, u'namespace': u'default', u'resourceVersion': u'6150', u'creationTimestamp': u'2018-06-28T10:45:54Z', u'annotations': {u'openstack.org/kuryr-lbaas-spec': u'{"versioned_object.data": {"ip": "172.30.211.178", "lb_ip": null, "ports": [{"versioned_object.data": {"name": "80-tcp", "port": 80, "protocol": "TCP"}, "versioned_object.name": "LBaaSPortSpec", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version": "1.0"}, {"versioned_object.data": {"name": "443-tcp", "port": 443, "protocol": "TCP"}, "versioned_object.name": "LBaaSPortSpec", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version": "1.0"}, {"versioned_object.data": {"name": "1936-tcp", "port": 1936, "protocol": "TCP"}, "versioned_object.name": "LBaaSPortSpec", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version": "1.0"}], "project_id": "d85bdba083204fe2845349a86cb87d82", "security_groups_ids": ["1cd5ff23-545f-4af7-a79a-555b5b772b47"], "subnet_id": "e6d320d4-50ff-4c05-a4d8-ad0d2b7cc2ca", "type": "ClusterIP"}, "versioned_object.name": "LBaaSServiceSpec", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version": "1.0"}'}, u'selfLink': u'/api/v1/namespaces/default/endpoints/router', u'uid': u'6e97e23e-7ac0-11e8-b89e-fa163ec618b0'}}, u'type': u'MODIFIED'}: ResourceNotReady: Resource not ready: LBaaSLoadBalancer(id=7f3fc0e8-37bc-4b59-87eb-520a4bd625db,ip=172.30.211.178,name='default/router',port_id=c85e5e0b-2b98-4e77-9739-bdb05152fda4,project_id='d85bdba083204fe2845349a86cb87d82',provider='octavia',security_groups=[1cd5ff23-545f-4af7-a79a-555b5b772b47],subnet_id=e6d320d4-50ff-4c05-a4d8-ad0d2b7cc2ca)
2018-06-28 11:34:44.529 1 ERROR kuryr_kubernetes.handlers.logging Traceback (most recent call last):
2018-06-28 11:34:44.529 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/handlers/logging.py", line 37, in __call__
2018-06-28 11:34:44.529 1 ERROR kuryr_kubernetes.handlers.logging     self._handler(event)
2018-06-28 11:34:44.529 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/handlers/retry.py", line 63, in __call__
2018-06-28 11:34:44.529 1 ERROR kuryr_kubernetes.handlers.logging     self._handler.set_health_status(healthy=False)
2018-06-28 11:34:44.529 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-06-28 11:34:44.529 1 ERROR kuryr_kubernetes.handlers.logging     self.force_reraise()
2018-06-28 11:34:44.529 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-06-28 11:34:44.529 1 ERROR kuryr_kubernetes.handlers.logging     six.reraise(self.type_, self.value, self.tb)
2018-06-28 11:34:44.529 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/handlers/retry.py", line 55, in __call__
2018-06-28 11:34:44.529 1 ERROR kuryr_kubernetes.handlers.logging     self._handler(event)
2018-06-28 11:34:44.529 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/handlers/k8s_base.py", line 72, in __call__
2018-06-28 11:34:44.529 1 ERROR kuryr_kubernetes.handlers.logging     self.on_present(obj)
2018-06-28 11:34:44.529 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/handlers/lbaas.py", line 243, in on_present
2018-06-28 11:34:44.529 1 ERROR kuryr_kubernetes.handlers.logging     if self._sync_lbaas_members(endpoints, lbaas_state, lbaas_spec):
2018-06-28 11:34:44.529 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/handlers/lbaas.py", line 318, in _sync_lbaas_members
2018-06-28 11:34:44.529 1 ERROR kuryr_kubernetes.handlers.logging     if self._sync_lbaas_pools(endpoints, lbaas_state, lbaas_spec):
2018-06-28 11:34:44.529 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/handlers/lbaas.py", line 428, in _sync_lbaas_pools
2018-06-28 11:34:44.529 1 ERROR kuryr_kubernetes.handlers.logging     if self._sync_lbaas_listeners(endpoints, lbaas_state, lbaas_spec):
2018-06-28 11:34:44.529 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/handlers/lbaas.py", line 486, in _sync_lbaas_listeners
2018-06-28 11:34:44.529 1 ERROR kuryr_kubernetes.handlers.logging     if self._add_new_listeners(endpoints, lbaas_spec, lbaas_state):
2018-06-28 11:34:44.529 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/handlers/lbaas.py", line 504, in _add_new_listeners
2018-06-28 11:34:44.529 1 ERROR kuryr_kubernetes.handlers.logging     port=port)
2018-06-28 11:34:44.529 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/drivers/lbaasv2.py", line 164, in ensure_listener
2018-06-28 11:34:44.529 1 ERROR kuryr_kubernetes.handlers.logging     self._find_listener)
2018-06-28 11:34:44.529 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/drivers/lbaasv2.py", line 411, in _ensure_provisioned
2018-06-28 11:34:44.529 1 ERROR kuryr_kubernetes.handlers.logging     self._wait_for_provisioning(loadbalancer, remaining)
2018-06-28 11:34:44.529 1 ERROR kuryr_kubernetes.handlers.logging   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/drivers/lbaasv2.py", line 450, in _wait_for_provisioning
2018-06-28 11:34:44.529 1 ERROR kuryr_kubernetes.handlers.logging     raise k_exc.ResourceNotReady(loadbalancer)
2018-06-28 11:34:44.529 1 ERROR kuryr_kubernetes.handlers.logging ResourceNotReady: Resource not ready: LBaaSLoadBalancer(id=7f3fc0e8-37bc-4b59-87eb-520a4bd625db,ip=172.30.211.178,name='default/router',port_id=c85e5e0b-2b98-4e77-9739-bdb05152fda4,project_id='d85bdba083204fe2845349a86cb87d82',provider='octavia',security_groups=[1cd5ff23-545f-4af7-a79a-555b5b772b47],subnet_id=e6d320d4-50ff-4c05-a4d8-ad0d2b7cc2ca)
2018-06-28 11:34:44.529 1 ERROR kuryr_kubernetes.handlers.logging 


However, as the LBaaS creation was already triggered, and due to a bug in Octavia (https://storyboard.openstack.org/#!/story/2001944) when listing existing load balancers with filters, the kuryr-controller then fails to perform the remaining operations: it cannot find the existing load balancer and throws an exception:

2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry [-] Report handler unhealthy LoadBalancerHandler: InternalServerError: 500-{u'debuginfo': u'Traceback (most recent call last):\n\n  File "/usr/lib/python2.7/site-packages/wsmeext/pecan.py", line 85, in callfunction\n    result = f(self, *args, **kwargs)\n\n  File "/opt/stack/octavia/octavia/api/v2/controllers/load_balancer.py", line 83, in get_all\n    **query_filter)\n\n  File "/opt/stack/octavia/octavia/db/repositories.py", line 145, in get_all\n    query, self.model_class)\n\n  File "/opt/stack/octavia/octavia/api/common/pagination.py", line 232, in apply\n    query = model.apply_filter(query, model, self.filters)\n\n  File "/opt/stack/octavia/octavia/db/base_models.py", line 123, in apply_filter\n    query = query.filter_by(**translated_filters)\n\n  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 1632, in filter_by\n    for key, value in kwargs.items()]\n\n  File "/usr/lib64/python2.7/site-packages/sqlalchemy/sql/operators.py", line 344, in __eq__\n    return self.operate(eq, other)\n\n  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/attributes.py", line 180, in operate\n    return op(self.comparator, *other, **kwargs)\n\n  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/relationships.py", line 1039, in __eq__\n    other, adapt_source=self.adapter))\n\n  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/relationships.py", line 1372, in _optimized_compare\n    state = attributes.instance_state(state)\n\nAttributeError: \'dict\' object has no attribute \'_sa_instance_state\'\n', u'faultcode': u'Server', u'faultstring': u"'dict' object has no attribute '_sa_instance_state'"}
Neutron server returns request_ids: ['req-8b7c13c4-b52b-4025-8a65-f82120881f71']
2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry Traceback (most recent call last):
2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/handlers/retry.py", line 55, in __call__
2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry     self._handler(event)
2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/handlers/k8s_base.py", line 75, in __call__
2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry     self.on_present(obj)
2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/handlers/lbaas.py", line 243, in on_present
2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry     if self._sync_lbaas_members(endpoints, lbaas_state, lbaas_spec):
2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/handlers/lbaas.py", line 318, in _sync_lbaas_members
2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry     if self._sync_lbaas_pools(endpoints, lbaas_state, lbaas_spec):
2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/handlers/lbaas.py", line 428, in _sync_lbaas_pools
2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry     if self._sync_lbaas_listeners(endpoints, lbaas_state, lbaas_spec):
2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/handlers/lbaas.py", line 483, in _sync_lbaas_listeners
2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry     if self._sync_lbaas_loadbalancer(endpoints, lbaas_state, lbaas_spec):
2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/handlers/lbaas.py", line 573, in _sync_lbaas_loadbalancer
2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry     service_type=lbaas_spec.type)
2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/drivers/lbaasv2.py", line 60, in ensure_loadbalancer
2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry     self._find_loadbalancer)
2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/drivers/lbaasv2.py", line 404, in _ensure
2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry     result = find(obj)
2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/drivers/lbaasv2.py", line 271, in _find_loadbalancer
2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry     vip_subnet_id=loadbalancer.subnet_id)
2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 1124, in list_loadbalancers
2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry     retrieve_all, **_params)
2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 369, in list
2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry     for r in self._pagination(collection, path, **params):
2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 384, in _pagination
2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry     res = self.get(path, params=params)
2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 354, in get
2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry     headers=headers, params=params)
2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 331, in retry_request
2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry     headers=headers, params=params)
2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 294, in do_request
2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry     self._handle_fault_response(status_code, replybody, resp)
2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 269, in _handle_fault_response
2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry     exception_handler_v20(status_code, error_body)
2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 93, in exception_handler_v20
2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry     request_ids=request_ids)
2018-06-28 11:34:46.005 1 ERROR kuryr_kubernetes.handlers.retry InternalServerError: 500-{u'debuginfo': u'Traceback (most recent call last):\n\n  File "/usr/lib/python2.7/site-packages/wsmeext/pecan.py", line 85, in callfunction\n    result = f(self, *args, **kwargs)\n\n  File "/opt/stack/octavia/octavia/api/v2/controllers/load_balancer.py", line 83, in get_all\n    **query_filter)\n\n  File "/opt/stack/octavia/octavia/db/repositories.py", line 145, in get_all\n    query, self.model_class)\n\n  File "/opt/stack/octavia/octavia/api/common/pagination.py", line 232, in apply\n    query = model.apply_filter(query, model, self.filters)\n\n  File "/opt/stack/octavia/octavia/db/base_models.py", line 123, in apply_filter\n    query = query.filter_by(**translated_filters)\n\n  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 1632, in filter_by\n    for key, value in kwargs.items()]\n\n  File "/usr/lib64/python2.7/site-packages/sqlalchemy/sql/operators.py", line 344, in __eq__\n    return self.operate(eq, other)\n\n  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/attributes.py", line 180, in operate\n    return op(self.comparator, *other, **kwargs)\n\n  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/relationships.py", line 1039, in __eq__\n    other, adapt_source=self.adapter))\n\n  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/relationships.py", line 1372, in _optimized_compare\n    state = attributes.instance_state(state)\n\nAttributeError: \'dict\' object has no attribute \'_sa_instance_state\'\n', u'faultcode': u'Server', u'faultstring': u"'dict' object has no attribute '_sa_instance_state'"}
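
For reference, a minimal sketch of the kind of filtered load balancer listing that hits this Octavia bug. The vip_subnet_id filter comes straight from the traceback above; the other filter keywords, the credentials and the endpoint are illustrative assumptions:

from keystoneauth1 import identity, session
from neutronclient.v2_0 import client as neutron_client

# Credentials and endpoint are placeholders (assumed)
auth = identity.Password(auth_url='http://controller:5000/v3',
                         username='admin', password='secret', project_name='admin',
                         user_domain_name='Default', project_domain_name='Default')
neutron = neutron_client.Client(session=session.Session(auth=auth))

# With the Octavia filtering bug, this returns HTTP 500
# ("'dict' object has no attribute '_sa_instance_state'")
# instead of the already-created load balancer:
lbs = neutron.list_loadbalancers(
    name='default/router',
    project_id='d85bdba083204fe2845349a86cb87d82',
    vip_address='172.30.211.178',
    vip_subnet_id='e6d320d4-50ff-4c05-a4d8-ad0d2b7cc2ca')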

To avoid this, the time that Kuryr waits for the load balancer to be provisioned needs to be increased.
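
A minimal sketch of the resulting kuryr.conf change (the [neutron_defaults] section name is an assumption based on the kuryr-kubernetes option definitions; the 1200-second value matches the one verified in comment 5 below):

[neutron_defaults]
# Seconds the kuryr-controller waits for an Octavia load balancer
# to become ACTIVE before raising ResourceNotReady (the old default was 300)
lbaas_activation_timeout = 1200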

Comment 1 Luis Tomas Bolivar 2018-06-28 14:46:30 UTC
It seems this Octavia patch, https://review.openstack.org/#/c/559842/, may fix the https://storyboard.openstack.org/#!/story/2001944 bug regarding getting the existing load balancer with filters.

Comment 2 Luis Tomas Bolivar 2018-06-29 13:06:50 UTC
Until the Octavia issue is fixed, we need this on the kuryr-kubernetes (controller) side: https://review.openstack.org/#/c/579144/

Comment 3 Luis Tomas Bolivar 2018-07-06 12:24:09 UTC
https://review.openstack.org/#/c/579846/ has been merged, so https://review.openstack.org/#/c/579144/ is no longer needed.

Comment 4 Scott Dodson 2018-08-14 21:40:07 UTC
Should be in openshift-ansible-3.10.28-1

Comment 5 Jon Uriarte 2018-09-25 12:58:44 UTC
Verified in openshift-ansible-3.10.50-1.git.0.96a93c5.el7.noarch.

Verification steps:

1. Deploy OCP 3.10 on OSP 3.10, with kuryr enabled
2. Check lbaas_activation_timeout value in kuryr config:

$ oc -n openshift-infra get configmap -o yaml | grep lbaas
      lbaas_activation_timeout = 1200

3. Create a new project, deploy and scale an app:

$ oc new-project test
$ oc run --image kuryr/demo demo
$ oc scale dc/demo --replicas=2

$ oc get pods --all-namespaces -o wide
NAMESPACE         NAME                                                READY     STATUS    RESTARTS   AGE       IP              NODE
default           docker-registry-1-j9q8p                             1/1       Running   0          21h       10.11.0.11      infra-node-0.openshift.example.com
default           registry-console-1-hqrx4                            1/1       Running   0          21h       10.11.0.3       master-0.openshift.example.com
default           router-1-rpjg7                                      1/1       Running   0          21h       192.168.99.5    infra-node-0.openshift.example.com
kube-system       master-api-master-0.openshift.example.com           1/1       Running   0          21h       192.168.99.15   master-0.openshift.example.com
kube-system       master-controllers-master-0.openshift.example.com   1/1       Running   1          21h       192.168.99.15   master-0.openshift.example.com
kube-system       master-etcd-master-0.openshift.example.com          1/1       Running   1          21h       192.168.99.15   master-0.openshift.example.com
openshift-infra   kuryr-cni-ds-9xs42                                  2/2       Running   0          21h       192.168.99.5    infra-node-0.openshift.example.com
openshift-infra   kuryr-cni-ds-k9b6c                                  2/2       Running   0          21h       192.168.99.10   app-node-0.openshift.example.com
openshift-infra   kuryr-cni-ds-nw82s                                  2/2       Running   0          21h       192.168.99.15   master-0.openshift.example.com
openshift-infra   kuryr-cni-ds-znwrt                                  2/2       Running   0          21h       192.168.99.4    app-node-1.openshift.example.com
openshift-infra   kuryr-controller-59fc7f478b-zwfvb                   1/1       Running   0          33m       192.168.99.15   master-0.openshift.example.com
openshift-node    sync-fpmst                                          1/1       Running   0          21h       192.168.99.15   master-0.openshift.example.com
openshift-node    sync-qzzvp                                          1/1       Running   0          21h       192.168.99.5    infra-node-0.openshift.example.com
openshift-node    sync-s7xzt                                          1/1       Running   0          21h       192.168.99.4    app-node-1.openshift.example.com
openshift-node    sync-zmqbh                                          1/1       Running   0          21h       192.168.99.10   app-node-0.openshift.example.com
test              demo-1-7v9xf                                        1/1       Running   0          1m        10.11.0.17      app-node-0.openshift.example.com
test              demo-1-njzfx                                        1/1       Running   0          2m        10.11.0.28      app-node-1.openshift.example.com

$ curl 10.11.0.17:8080                                                                                                                                                                       
demo-1-7v9xf: HELLO! I AM ALIVE!!!

$ curl 10.11.0.28:8080                                                                                                                                                                       
demo-1-njzfx: HELLO! I AM ALIVE!!!

4. Create a service and kill the kuryr controller pod while the load balancer is creating:
$ oc expose dc/demo --port 80 --target-port 8080
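
The load balancer state in the tables below was presumably captured with the OpenStack client (the exact command is an assumption; it was not recorded in the transcript):

$ openstack loadbalancer list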

+--------------------------------------+------------------------------------------------+----------------------------------+---------------+---------------------+----------+
| id                                   | name                                           | project_id                       | vip_address   | provisioning_status | provider |
+--------------------------------------+------------------------------------------------+----------------------------------+---------------+---------------------+----------+
| 49bf0999-621c-4c6b-9540-5c2536c76102 | test/demo                                      | 4379cba1109242639a99af8ad04c7208 | 172.30.101.63 | PENDING_CREATE      | octavia  |
+--------------------------------------+------------------------------------------------+----------------------------------+---------------+---------------------+----------+

$ oc -n openshift-infra delete pod kuryr-controller-59fc7f478b-zwfvb

$ oc -n openshift-infra get pods -o wide
NAME                                READY     STATUS    RESTARTS   AGE       IP              NODE
kuryr-cni-ds-9xs42                  2/2       Running   0          21h       192.168.99.5    infra-node-0.openshift.example.com
kuryr-cni-ds-k9b6c                  2/2       Running   0          21h       192.168.99.10   app-node-0.openshift.example.com
kuryr-cni-ds-nw82s                  2/2       Running   0          21h       192.168.99.15   master-0.openshift.example.com
kuryr-cni-ds-znwrt                  2/2       Running   0          21h       192.168.99.4    app-node-1.openshift.example.com
kuryr-controller-59fc7f478b-skrpf   1/1       Running   0          38s       192.168.99.4    app-node-1.openshift.example.com

+--------------------------------------+------------------------------------------------+----------------------------------+---------------+---------------------+----------+
| id                                   | name                                           | project_id                       | vip_address   | provisioning_status | provider |
+--------------------------------------+------------------------------------------------+----------------------------------+---------------+---------------------+----------+
| 49bf0999-621c-4c6b-9540-5c2536c76102 | test/demo                                      | 4379cba1109242639a99af8ad04c7208 | 172.30.101.63 | ACTIVE              | octavia  |
+--------------------------------------+------------------------------------------------+----------------------------------+---------------+---------------------+----------+

$ oc get svc
NAME      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
demo      ClusterIP   172.30.101.63   <none>        80/TCP    2m

$ curl 172.30.101.63
demo-1-7v9xf: HELLO! I AM ALIVE!!!

$ curl 172.30.101.63
demo-1-njzfx: HELLO! I AM ALIVE!!!

5. Check that there are no restarts or errors in the kuryr controller.
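
For example (commands assumed, not part of the original transcript; the pod name is the one from step 4):

$ oc -n openshift-infra get pods    # the RESTARTS column should stay at 0
$ oc -n openshift-infra logs kuryr-controller-59fc7f478b-skrpf | grep -i error    # expect no output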

6. Delete the project:
$ oc delete project test

Another test:

1. Deploy OCP 3.10 on OSP 3.10, with kuryr enabled

2. Create a new project, deploy and expose an app:

$ oc new-project test-timeout
$ oc run --image kuryr/demo demo                                                                                                                                                             
$ oc get pods -o wide
NAME           READY     STATUS    RESTARTS   AGE       IP          NODE
demo-1-6nczk   1/1       Running   0          21s       10.11.0.7   app-node-0.openshift.example.com

$ curl 10.11.0.7:8080
demo-1-6nczk: HELLO! I AM ALIVE!!!

$ oc expose dc/demo --port 80 --target-port 8080 

$ oc get svc
NAME      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
demo      ClusterIP   172.30.176.122   <none>        80/TCP    4s

$ curl 172.30.176.122
demo-1-6nczk: HELLO! I AM ALIVE!!!

$ oc -n openshift-infra get pods                                                                                                                                                             
NAME                                READY     STATUS    RESTARTS   AGE
kuryr-cni-ds-9xs42                  2/2       Running   0          22h
kuryr-cni-ds-k9b6c                  2/2       Running   0          22h
kuryr-cni-ds-nw82s                  2/2       Running   0          22h
kuryr-cni-ds-znwrt                  2/2       Running   0          22h
kuryr-controller-59fc7f478b-tlfll   1/1       Running   0          10m

3. Delete the kuryr controller pod:

$ oc -n openshift-infra delete pod kuryr-controller-59fc7f478b-tlfll
pod "kuryr-controller-59fc7f478b-tlfll" deleted

$ oc -n openshift-infra get pods
NAME                                READY     STATUS    RESTARTS   AGE
kuryr-cni-ds-9xs42                  2/2       Running   0          22h
kuryr-cni-ds-k9b6c                  2/2       Running   0          22h
kuryr-cni-ds-nw82s                  2/2       Running   0          22h
kuryr-cni-ds-znwrt                  2/2       Running   0          22h
kuryr-controller-59fc7f478b-rh2f2   1/1       Running   0          15s

$ oc get pods -o wide
NAME           READY     STATUS    RESTARTS   AGE       IP          NODE
demo-1-6nczk   1/1       Running   0          3m        10.11.0.7   app-node-0.openshift.example.com

$ curl 172.30.176.122                                                                                                                                                                        
demo-1-6nczk: HELLO! I AM ALIVE!!!

4. Scale the app:

$ oc scale dc/demo --replicas=2

$ oc get pods -o wide
NAME           READY     STATUS    RESTARTS   AGE       IP           NODE
demo-1-6nczk   1/1       Running   0          4m        10.11.0.7    app-node-0.openshift.example.com
demo-1-ft659   1/1       Running   0          58s       10.11.0.25   app-node-1.openshift.example.com


5. Check that the new member was added correctly to the load balancer:

$ curl 172.30.176.122                                                                                                                                                                        
demo-1-ft659: HELLO! I AM ALIVE!!!

$ curl 172.30.176.122
demo-1-6nczk: HELLO! I AM ALIVE!!!

6. Delete the project:
$ oc delete project test-timeout

Comment 7 errata-xmlrpc 2019-01-30 15:13:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0206

