Bug 1730778 - Route with 2 named ports sends traffic to the wrong one when the correct one is down
Summary: Route with 2 named ports sends traffic to the wrong one when the correct one is down
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.11.0
Hardware: All
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 3.11.z
Assignee: Dan Mace
QA Contact: Hongan Li
URL:
Whiteboard:
Depends On: 1756166
Blocks: 1752521
 
Reported: 2019-07-17 15:17 UTC by Matthew Robson
Modified: 2022-08-04 22:24 UTC
5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1752521
Environment:
Last Closed: 2019-08-13 14:09:19 UTC
Target Upstream Version:
Embargoed:




Links:
GitHub openshift/origin pull 23441 (closed): Bug 1730778: Respect route targetPort in dynamic config (last updated 2019-11-13 23:11:05 UTC)
Red Hat Product Errata RHBA-2019:2352 (last updated 2019-08-13 14:09:30 UTC)

Description Matthew Robson 2019-07-17 15:17:17 UTC
Description of problem:

A project has two routes pointing to a single service with two named ports.

One route is for a health check and the other is for application traffic.

During startup/deployment, when the health check is up but the application is still starting, traffic on the application route is sent to the health check container, resulting in 404s.

The pod has 2 containers: the application, and a sidecar for add-on services that includes the health check.

Requests to project-lb.public.com target the app port, which points to the application container. While the application container is still starting, a number of requests return 404 errors.

From what we can see, once the application comes up, the router stops sending traffic to the health check container.

With debug logging enabled on the health check container, we can see traffic for project-lb.public.com incorrectly arriving at the health check container and returning 404s.
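
For reference, a minimal sketch of the intended mapping, using the hostnames from the resources below (the /url/info path appears in the logs further down; the health check path and the curl flags are illustrative):

    # Application traffic: edge TLS route -> service port "app" (8080) -> application container :8080
    curl -k https://project-lb.public.com/url/info

    # Health check traffic: plain HTTP route -> service port "hm" (8081) -> sidecar container :9011
    curl http://healthcheck.paas.com/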

Health Check Route:

apiVersion: v1
items:
- apiVersion: route.openshift.io/v1
  kind: Route
  metadata:
    creationTimestamp: 2019-03-29T19:31:23Z
    name: healthcheck
    namespace: project
    resourceVersion: "65987865"
    selfLink: /apis/route.openshift.io/v1/namespaces/project/routes/healthcheck
    uid: 3c9c3612-5259-11e9-a732-005056b97b10
  spec:
    host: healthcheck.paas.com
    port:
      targetPort: hm
    to:
      kind: Service
      name: svc-act
      weight: 100
    wildcardPolicy: None
  status:
    ingress:
    - conditions:
      - lastTransitionTime: 2019-03-29T19:31:23Z
        status: "True"
        type: Admitted
      routerName: router
      wildcardPolicy: None

Application Route:


- apiVersion: route.openshift.io/v1
  kind: Route
  metadata:
    creationTimestamp: 2019-03-29T19:31:05Z
    name: application
    namespace: project
    resourceVersion: "65987845"
    selfLink: /apis/route.openshift.io/v1/namespaces/project/routes/application
    uid: 31a7d081-5259-11e9-a732-005056b97b10
  spec:
    host: project-lb.public.com
    port:
      targetPort: app
    tls:
      termination: edge
    to:
      kind: Service
      name: svc-act
      weight: 100
    wildcardPolicy: None
  status:
    ingress:
    - conditions:
      - lastTransitionTime: 2019-03-29T19:31:05Z
        status: "True"
        type: Admitted
      routerName: router
      wildcardPolicy: None


Service:

apiVersion: v1
items:
- apiVersion: v1
  kind: Service
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"svc-act","namespace":"project"},"spec":{"ports":[{"name":"app","port":8080,"targetPort":8080},{"name":"hm","port":8081,"targetPort":9011}],"selector":{"name":"project-b"}}}
    creationTimestamp: 2019-03-29T19:31:09Z
    name: svc-act
    namespace: project
    resourceVersion: "80400351"
    selfLink: /api/v1/namespaces/project/services/svc-act
    uid: 3483c483-5259-11e9-a366-005056b97ad5
  spec:
    clusterIP: 172.29.82.203
    ports:
    - name: app
      port: 8080
      protocol: TCP
      targetPort: 8080
    - name: hm
      port: 8081
      protocol: TCP
      targetPort: 9011
    selector:
      name: project-b
    sessionAffinity: None
    type: ClusterIP
  status:
    loadBalancer: {}
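
A quick way to confirm which container port each named service port resolves to is to inspect the endpoints object; this is the mapping the router is expected to honor ("app" -> 8080, "hm" -> 9011):

    oc get -n project endpoints svc-act -o yaml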


Health check log showing traffic for project-lb.public.com arriving on port 9011, which is the health check port 'hm':

2019-07-16 14:50:27.450 DEBUG 8 --- [http-nio-9011-Acceptor-0] o.apache.tomcat.util.threads.LimitLatch  : Counting up[http-nio-9011-Acceptor-0] latch=1
2019-07-16 14:50:27.450 DEBUG 8 --- [http-nio-9011-exec-5] o.a.tomcat.util.net.SocketWrapperBase    : Socket: [org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper@4054f4a1:org.apache.tomcat.util.net.NioChannel@699d524f:java.nio.channels.SocketChannel[connected local=/10.1.1.1:9011 remote=/10.59.28.1:37190]], Read from buffer: [0]
2019-07-16 14:50:27.451 DEBUG 8 --- [http-nio-9011-exec-5] o.a.coyote.http11.Http11InputBuffer      : Received [GET /url/info HTTP/1.1
User-Agent: curl/7.29.0
Accept: */*
X-Forwarded-For: 10.1.1.1
Host: project-lb.public.com
X-Forwarded-Host: project-lb.public.com
X-Forwarded-Port: 443
X-Forwarded-Proto: https
Forwarded: for=10.1.1.1;host=fproject-lb.public.com;proto=https

]
2019-07-16 14:50:27.451 DEBUG 8 --- [http-nio-9011-exec-5] o.a.c.authenticator.AuthenticatorBase    : Security checking request GET /url/info
2019-07-16 14:50:27.451 DEBUG 8 --- [http-nio-9011-exec-5] org.apache.catalina.realm.RealmBase      :   No applicable constraints defined
2019-07-16 14:50:27.451 DEBUG 8 --- [http-nio-9011-exec-5] o.a.c.authenticator.AuthenticatorBase    :  Not subject to any constraint
2019-07-16 14:50:27.451 DEBUG 8 --- [http-nio-9011-exec-5] o.s.b.w.f.OrderedRequestContextFilter    : Bound request context to thread: org.apache.catalina.connector.RequestFacade@12f8d3c1
2019-07-16 14:50:27.451 DEBUG 8 --- [http-nio-9011-exec-5] o.s.web.servlet.DispatcherServlet        : DispatcherServlet with name 'dispatcherServlet' processing GET request for [/url/info]
2019-07-16 14:50:27.451 DEBUG 8 --- [http-nio-9011-exec-5] s.w.s.m.m.a.RequestMappingHandlerMapping : Looking up handler method for path /url/info
2019-07-16 14:50:27.451 DEBUG 8 --- [http-nio-9011-exec-5] s.w.s.m.m.a.RequestMappingHandlerMapping : Did not find handler method for [/url/info]
2019-07-16 14:50:27.451 DEBUG 8 --- [http-nio-9011-exec-5] o.s.w.s.handler.SimpleUrlHandlerMapping  : Matching patterns for request [/url/info] are [/**]
2019-07-16 14:50:27.452 DEBUG 8 --- [http-nio-9011-exec-5] o.s.w.s.handler.SimpleUrlHandlerMapping  : URI Template variables for request [/url/info] are {}
2019-07-16 14:50:27.452 DEBUG 8 --- [http-nio-9011-exec-5] o.s.w.s.handler.SimpleUrlHandlerMapping  : Mapping [/url/info] to HandlerExecutionChain with handler [ResourceHttpRequestHandler [locations=[ServletContext resource [/], class path resource [META-INF/resources/], class path resource [resources/], class path resource [static/], class path resource [public/]], resolvers=[org.springframework.web.servlet.resource.PathResourceResolver@4b5a5ed1]]] and 1 interceptor
2019-07-16 14:50:27.452 DEBUG 8 --- [http-nio-9011-exec-5] o.s.web.servlet.DispatcherServlet        : Last-Modified value for [/url/info] is: -1
2019-07-16 14:50:27.452 DEBUG 8 --- [http-nio-9011-exec-5] o.s.web.servlet.DispatcherServlet        : Null ModelAndView returned to DispatcherServlet with name 'dispatcherServlet': assuming HandlerAdapter completed request handling
2019-07-16 14:50:27.452 DEBUG 8 --- [http-nio-9011-exec-5] o.s.web.servlet.DispatcherServlet        : Successfully completed request
2019-07-16 14:50:27.452 DEBUG 8 --- [http-nio-9011-exec-5] o.s.b.w.f.OrderedRequestContextFilter    : Cleared thread-bound request context: org.apache.catalina.connector.RequestFacade@12f8d3c1
2019-07-16 14:50:27.452 DEBUG 8 --- [http-nio-9011-exec-5] o.a.c.c.C.[Tomcat].[localhost]           : Processing ErrorPage[errorCode=0, location=/error]
2019-07-16 14:50:27.452 DEBUG 8 --- [http-nio-9011-exec-5] o.s.web.servlet.DispatcherServlet        : DispatcherServlet with name 'dispatcherServlet' processing GET request for [/error]
2019-07-16 14:50:27.453 DEBUG 8 --- [http-nio-9011-exec-5] s.w.s.m.m.a.RequestMappingHandlerMapping : Looking up handler method for path /error
2019-07-16 14:50:27.453 DEBUG 8 --- [http-nio-9011-exec-5] s.w.s.m.m.a.RequestMappingHandlerMapping : Returning handler method [public org.springframework.http.ResponseEntity<java.util.Map<java.lang.String, java.lang.Object>> org.springframework.boot.autoconfigure.web.BasicErrorController.error(javax.servlet.http.HttpServletRequest)]
2019-07-16 14:50:27.453 DEBUG 8 --- [http-nio-9011-exec-5] o.s.b.f.s.DefaultListableBeanFactory     : Returning cached instance of singleton bean 'basicErrorController'
2019-07-16 14:50:27.453 DEBUG 8 --- [http-nio-9011-exec-5] o.s.web.servlet.DispatcherServlet        : Last-Modified value for [/error] is: -1
2019-07-16 14:50:27.454 DEBUG 8 --- [http-nio-9011-exec-5] o.s.w.s.m.m.a.HttpEntityMethodProcessor  : Written [{timestamp=Tue Jul 16 14:50:27 EDT 2019, status=404, error=Not Found, message=No message available, path=/url/info}] as ""application/json"" using [org.springframework.http.converter.json.MappingJackson2HttpMessageConverter@37e547da]
2019-07-16 14:50:27.454 DEBUG 8 --- [http-nio-9011-exec-5] o.s.web.servlet.DispatcherServlet        : Null ModelAndView returned to DispatcherServlet with name 'dispatcherServlet': assuming HandlerAdapter completed request handling
2019-07-16 14:50:27.454 DEBUG 8 --- [http-nio-9011-exec-5] o.s.web.servlet.DispatcherServlet        : Successfully completed request
2019-07-16 14:50:27.454 DEBUG 8 --- [http-nio-9011-exec-5] o.a.c.c.C.[.[.[/].[dispatcherServlet]    :  Disabling the response for further output
...
...


Version-Release number of selected component (if applicable):
3.11

How reproducible:
Always for this application

Steps to Reproduce:
1. Customer environment

Actual results:
Traffic is routed to the wrong container.


Expected results:
Traffic should not route to the health check container.

Additional info:

Comment 1 Dan Mace 2019-07-18 19:42:03 UTC
Still working on reproducing this. In the meantime, if you are able to reproduce, could you also please provide the following after observing the problematic behavior?

1. Related resources:

    oc get -n project routes -o yaml
    oc get -n project services -o yaml
    oc get -n project endpoints -o yaml
    oc get -n project pods -o yaml

2. The haproxy.config file from a router pod's filesystem (substituting the router namespace and pod name):

   oc exec -n router_namespace router_pod_name cat haproxy.config
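
If the full file is unwieldy, the backends for the two routes can be pulled out directly; backend names follow the router's be_<type>_http:<namespace>:<route> convention (the namespace and pod name are placeholders, as above):

   oc exec -n router_namespace router_pod_name -- grep -A 10 'be_edge_http:project:application' haproxy.config
   oc exec -n router_namespace router_pod_name -- grep -A 10 'be_http:project:healthcheck' haproxy.config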

Thanks!

Comment 3 Dan Mace 2019-07-18 21:14:17 UTC
In addition to the haproxy config, can you please also get the yaml for the router deployment itself?
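
(In 3.11 the default router is typically a DeploymentConfig named "router" in the default namespace, so, adjusted to your environment, something like:)

    oc get -n default dc/router -o yaml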

Comment 15 Hongan Li 2019-08-06 08:43:04 UTC
Verified with atomic-openshift-3.11.135-1.git.0.b309f70; the issue has been fixed.

Test steps:
1. Create a service with two ports:
# oc get svc -n hongli
NAME               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)               AGE
mysvc              ClusterIP   172.30.28.216    <none>        27017/TCP,27443/TCP   38m

2. Create a route to the service using one of the named ports:
# oc get route -n hongli
NAME      HOST/PORT                                   PATH      SERVICES   PORT      TERMINATION   WILDCARD
mysvc     mysvc-hongli.apps.0806-xqi.qe.rhcloud.com             mysvc      8080                    None

3. Enable dynamic configuration:
# oc set env dc/router ROUTER_HAPROXY_CONFIG_MANAGER=true

4. Check the server state (or curl the route):
sh-4.2$ echo "show servers state be_http:hongli:mysvc" | socat - UNIX-CONNECT:/var/lib/haproxy/run/haproxy.sock
1
# be_id be_name srv_id srv_name srv_addr srv_op_state srv_admin_state srv_uweight srv_iweight srv_time_since_last_change srv_check_status srv_check_result srv_check_health srv_check_state srv_agent_state bk_f_forced_id srv_f_forced_id srv_fqdn srv_port srvrecord
41 be_http:hongli:mysvc 1 pod:caddy-rc-d6jxz:mysvc:10.129.0.11:8080 10.129.0.11 2 0 256 256 342 6 3 4 6 0 0 0 - 8080 -
41 be_http:hongli:mysvc 2 _dynamic-pod-1 172.4.0.4 0 5 1 1 342 1 0 0 14 0 0 0 - 8765 -
41 be_http:hongli:mysvc 3 _dynamic-pod-2 172.4.0.4 0 5 1 1 342 1 0 0 14 0 0 0 - 8765 -
41 be_http:hongli:mysvc 4 _dynamic-pod-3 172.4.0.4 0 5 1 1 342 1 0 0 14 0 0 0 - 8765 -
41 be_http:hongli:mysvc 5 _dynamic-pod-4 172.4.0.4 0 5 1 1 342 1 0 0 14 0 0 0 - 8765 -
41 be_http:hongli:mysvc 6 _dynamic-pod-5 10.129.0.13 2 4 256 1 342 6 3 4 6 0 0 0 - 8080 -


Tested with the old version, the server state may show other ports, e.g.:
41 be_http:hongli:mysvc 5 _dynamic-pod-4 10.129.0.13 2 4 256 1 342 6 3 4 6 0 0 0 - 8443 -
41 be_http:hongli:mysvc 6 _dynamic-pod-5 10.129.0.13 2 4 256 1 342 6 3 4 6 0 0 0 - 8080 -
and curling the route randomly returns errors like:
# curl http://mysvc-hongli.apps.0806-xqi.qe.rhcloud.com
Hello-OpenShift-1 http-8080
# curl http://mysvc-hongli.apps.0806-xqi.qe.rhcloud.com
<html><body><h1>502 Bad Gateway</h1>
The server returned an invalid or incomplete response.
</body></html>
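
A simple end-to-end check for the fix is to curl the route repeatedly and confirm every response comes from the intended 8080 backend (the loop count is arbitrary):

    for i in $(seq 1 20); do
      curl -s -o /dev/null -w '%{http_code}\n' http://mysvc-hongli.apps.0806-xqi.qe.rhcloud.com/
    done | sort | uniq -c
    # expect only 200s; before the fix, intermittent 502/404 responses from the wrong port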

Comment 20 errata-xmlrpc 2019-08-13 14:09:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2352

