Bug 1875806

Summary: When creating a service of type "LoadBalancer" (Kuryr, OVN), communication through this load balancer fails after 2-5 minutes.
Product: OpenShift Container Platform
Reporter: Robert Heinzmann <rheinzma>
Component: Networking
Assignee: Luis Tomas Bolivar <ltomasbo>
Networking sub component: kuryr
QA Contact: GenadiC <gcheresh>
Status: CLOSED ERRATA
Docs Contact:
Severity: medium
Priority: medium
CC: ffernand, jaeichle, rlobillo, svmichel
Version: 4.5
Keywords: TestOnly
Target Milestone: ---
Target Release: 4.7.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 1879389 (view as bug list)
Environment:
Last Closed: 2021-02-24 15:17:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1879389
Bug Blocks:

Comment 9 Luis Tomas Bolivar 2020-09-24 12:52:57 UTC
This is the OVN fix for this problem: https://review.opendev.org/#/c/753833/

Comment 13 rlobillo 2020-11-20 13:02:31 UTC
Tested on OCP 4.7.0-0.nightly-2020-11-18-203317 over OSP 16.1 with OVN Octavia (compose: RHOS-16.1-RHEL-8-20201110.n.1).

Confirmed that RHOS-16.1-RHEL-8-20201110.n.1 includes the package with the fix:

http://download.eng.bos.redhat.com/rcm-guest/puddles/OpenStack/16.1-RHEL-8/RHOS-16.1-RHEL-8-20201110.n.1/compose/OpenStack/source/tree/Packages/python-networking-ovn-7.3.1-1.20200902233413.el8ost.src.rpm  

# Setting the environment using default ingress port:

oc new-project test
oc run --image kuryr/demo demo
oc expose service/demo

$ openstack port list | grep ingress
| 74d380f9-32c6-4161-a97a-212d1f5238d3 | ostest-9fn97-ingress-port                            | fa:16:3e:f0:7c:1a | ip_address='10.196.0.7', subnet_id='3ef39638-9aa4-4b26-8376-e7fdbca140ea' | DOWN   |
$ openstack floating ip set --port 74d380f9-32c6-4161-a97a-212d1f5238d3 10.46.22.104

(shiftstack) [stack@undercloud-0 ~]$ tail -1 /etc/hosts
10.46.22.104 demo-test.apps.ostest.shiftstack.com

(shiftstack) [stack@undercloud-0 ~]$ curl demo-test.apps.ostest.shiftstack.com
demo: HELLO! I AM ALIVE!!!


Now applying the 'Scaling for ingress traffic by using {rh-openstack} Octavia' procedure (ref: https://github.com/openshift/openshift-docs/pull/23467/files?short_path=6f43ec6#diff-6f43ec66e11c7071e703211dc4f7773b4c4c861924d39c24e6a7437cba323c2a):

$ cat external_router.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    ingresscontroller.operator.openshift.io/owning-ingresscontroller: default
  name: router-external-default
  namespace: openshift-ingress
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: http
  - name: https
    port: 443
    protocol: TCP
    targetPort: https
  - name: metrics
    port: 1936
    protocol: TCP
    targetPort: 1936
  selector:
    ingresscontroller.operator.openshift.io/deployment-ingresscontroller: default
  sessionAffinity: None
  type: LoadBalancer


$ oc -n openshift-ingress get svc
NAME                      TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                                     AGE
router-external-default   LoadBalancer   172.30.20.242   10.46.22.91   80:30620/TCP,443:31102/TCP,1936:30261/TCP   80m
router-internal-default   ClusterIP      172.30.16.228   <none>        80/TCP,443/TCP,1936/TCP                     24h
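
The EXTERNAL-IP shown in the listing above can also be read in a script rather than by eye. The following is a minimal sketch, not part of the original verification: the function name wait_for_lb_ip and its retry parameters are hypothetical, and it assumes `oc` is logged in to the cluster. It polls the service status until the load balancer address is populated.

```shell
#!/usr/bin/env bash
# wait_for_lb_ip NAMESPACE SERVICE [TRIES] [DELAY_SECONDS]
# Hypothetical helper: prints the first load balancer IP of the service
# once it is assigned; returns 1 if none appears within TRIES attempts.
wait_for_lb_ip() {
  local ns="$1" svc="$2" tries="${3:-30}" delay="${4:-10}" ip="" i=0
  while [ "$i" -lt "$tries" ]; do
    # An empty result means the external IP has not been assigned yet.
    ip=$(oc -n "$ns" get svc "$svc" \
           -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null) || ip=""
    if [ -n "$ip" ]; then
      printf '%s\n' "$ip"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

For the service above, this would be called as `wait_for_lb_ip openshift-ingress router-external-default`.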

# Changing the local resolution to use router-external-default svc:

$ tail -1 /etc/hosts
10.46.22.91 demo-test.apps.ostest.shiftstack.com

$ curl demo-test.apps.ostest.shiftstack.com -vvv
* Rebuilt URL to: demo-test.apps.ostest.shiftstack.com/
* Uses proxy env variable no_proxy == ',overcloud.ctlplane.redhat.local,overcloud.redhat.local'
*   Trying 10.46.22.91...
* TCP_NODELAY set
* Connected to demo-test.apps.ostest.shiftstack.com (10.46.22.91) port 80 (#0)
> GET / HTTP/1.1
> Host: demo-test.apps.ostest.shiftstack.com
> User-Agent: curl/7.61.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< date: Fri, 20 Nov 2020 12:46:28 GMT
< content-length: 27
< content-type: text/plain; charset=utf-8
< set-cookie: 368e81a3926bc2f60deaf1f00c88a746=aa04daf3e7f13f57765db9508e838ccb; path=/; HttpOnly
< cache-control: private
< 
demo: HELLO! I AM ALIVE!!!
* Connection #0 to host demo-test.apps.ostest.shiftstack.com left intact
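
As an alternative to editing /etc/hosts for each address under test, curl can be told which IP to use for a host name with its --resolve option. This is a sketch under that assumption only; the wrapper name curl_via_ip is hypothetical and was not used in the original test.

```shell
#!/usr/bin/env bash
# curl_via_ip HOST IP
# Hypothetical wrapper: requests http://HOST/ while forcing HOST to
# resolve to IP on port 80, so /etc/hosts does not need to change.
curl_via_ip() {
  local host="$1" ip="$2"
  curl --silent --show-error --fail --max-time 10 \
       --resolve "${host}:80:${ip}" "http://${host}/"
}
```

With the values from this comment, the call would be `curl_via_ip demo-test.apps.ostest.shiftstack.com 10.46.22.91`.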


# Repeating the request every 5 minutes for 10 minutes:

$ while(true); do date && curl demo-test.apps.ostest.shiftstack.com && sleep 300; done                                                                     
Fri Nov 20 12:47:43 UTC 2020
demo: HELLO! I AM ALIVE!!!
Fri Nov 20 12:52:43 UTC 2020
demo: HELLO! I AM ALIVE!!!
Fri Nov 20 12:57:43 UTC 2020
demo: HELLO! I AM ALIVE!!!
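
The loop above samples the endpoint once every 5 minutes. Since the reported failure appeared after 2-5 minutes, a denser probe that also counts failures may be useful when re-verifying. The sketch below is an assumption-laden illustration, not the tester's method: the function name probe and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# probe INTERVAL_SECONDS COUNT COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND COUNT times, INTERVAL_SECONDS apart,
# prints a UTC-timestamped OK/FAIL line per attempt and a final summary,
# and returns non-zero if any attempt failed.
probe() {
  local interval="$1" count="$2" failures=0 i=1
  shift 2
  while [ "$i" -le "$count" ]; do
    if "$@" >/dev/null 2>&1; then
      printf '%s OK\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    else
      printf '%s FAIL\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
      failures=$((failures + 1))
    fi
    # Do not sleep after the final attempt.
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
  printf 'failures=%d/%d\n' "$failures" "$count"
  [ "$failures" -eq 0 ]
}
```

For example, `probe 30 20 curl --fail --silent --max-time 5 http://demo-test.apps.ostest.shiftstack.com` would check every 30 seconds for roughly 10 minutes. Unlike the `&&` chain in the original loop, a failed request here is recorded and the loop still waits before the next attempt.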

Comment 19 errata-xmlrpc 2021-02-24 15:17:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633