Bug 1903206
| Summary: | Ingress controller incorrectly routes traffic to non-ready pods/backends. | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | emahoney |
| Component: | Networking | Assignee: | Andrew McDermott <amcdermo> |
| Networking sub component: | router | QA Contact: | Arvind iyengar <aiyengar> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | urgent | | |
| Priority: | urgent | CC: | aiyengar, amcdermo, aos-bugs, aos-network-edge-staff, arthur.barr, chris.wilkinson, cmarches, hongli, mmasters, openshift-bugs-escalate, rcarrier |
| Version: | 4.6 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.7.0 | | |
| Hardware: | All | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-02-24 15:37:08 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1904010 | | |
Hello Team,

Could you please prioritize this Bugzilla? The customer has shown us its high impact, and this case also has high visibility with management on both sides. Thanks in advance for your efforts and support.

Kind regards,
Roberto Carrieri
Escalation Manager, Customer Experience & Engagement
Mobile: +420.702.269.469

Will look into this immediately.

Is there any update on this issue, please? Have you managed to reproduce it? Very happy to perform additional diagnostics, but hopefully you can re-create it based on the above.

I'm a little concerned that the "Target Release" has been set to 4.7.0, as we really need to see a fix on OCP 4.6.x, since this appears to be regressed/changed behaviour.

I can reproduce this. This was broken by the switch to endpointslices (https://github.com/openshift/router/pull/154), which happened in 4.6. Investigating a fix and will then backport to 4.6.

(In reply to Arthur Barr from comment #3)
> Is there any update on this issue, please? Have you managed to reproduce
> it? Very happy to perform additional diagnostics, but hopefully you can
> re-create based on the above.
>
> I'm a little concerned that the "Target Release" has been set to 4.7.0, as
> we really need to see a fix on OCP 4.6.x, as this appears to be
> regressed/changed behaviour.

The procedure is that we first make the fix in 4.7 and then backport it to 4.6. I plan to have a fix up for review for 4.7 today.

Thanks very much for the update. Assuming this fix is accepted for 4.7, can you give any indication of a timeline for a fix on 4.6? Any information would be appreciated.

(In reply to Arthur Barr from comment #6)
> Thanks very much for the update. Assuming this fix is accepted for 4.7, can
> you give any indication of a timeline for a fix on 4.6? Any information
> would be appreciated.

I just POSTed the PR: https://github.com/openshift/router/pull/229

If it gets reviewed and merged into 4.7 today, then I can start the cherry-pick for 4.6. Once picked for 4.6, that needs approval for a 4.6.z stream, which may happen tomorrow; failing that, it would be the end of next week. Once it is merged into 4.7 I can give a better estimate.

Moving this back to POST as https://github.com/openshift/router/pull/231 needs to be part of the overall change. I was adding tests to origin/e2e to verify the change, but that is overkill; PR #231 adds unit tests to the router.

The PR merge made it into the "4.7.0-0.nightly-2020-12-03-141554" release version. With this payload, the fix effectively resolves the problem: when the pods are in a "Not Ready" state, the HAProxy configuration has an empty backend pool and a curl to the external route fails, as expected. When one or more pods become available and ready, the HAProxy backend pool is populated only with entries for the ready pods, and external route traffic is sent only to those pods.
* With no pods in the "ready" state:
-----
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.7.0-0.nightly-2020-12-03-141554 True False 64m Cluster version is 4.7.0-0.nightly-2020-12-03-141554
$ oc create -f nginx-demoshell-deployment.yaml
deployment.apps/nginxdemoshello-deployment created
$ oc create -f nginx-demoshell-service.yaml
service/nginxdemoshello-service created
$ oc create -f nginx-demoshell-route.yaml
route.route.openshift.io/nginxdemoshello-route created
$ oc get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginxdemoshello-deployment-5b46f96478-gwlc7 0/1 Running 0 41s 10.131.0.49 ip-10-0-164-195.us-east-2.compute.internal <none> <none>
nginxdemoshello-deployment-5b46f96478-n6wlm 0/1 Running 0 41s 10.129.2.32 ip-10-0-199-23.us-east-2.compute.internal <none> <none>
The HAProxy configuration contains no backend server entries: non-ready pods are no longer added to the pool, so a curl to the route fails while no backend pods are in the "ready" state:
backend be_http:test1:nginxdemoshello-route
mode http
option redispatch
option forwardfor
balance leastconn
timeout check 5000ms
http-request add-header X-Forwarded-Host %[req.hdr(host)]
http-request add-header X-Forwarded-Port %[dst_port]
http-request add-header X-Forwarded-Proto http if !{ ssl_fc }
http-request add-header X-Forwarded-Proto https if { ssl_fc }
http-request add-header X-Forwarded-Proto-Version h2 if { ssl_fc_alpn -i h2 }
http-request add-header Forwarded for=%[src];host=%[req.hdr(host)];proto=%[req.hdr(X-Forwarded-Proto)]
cookie 1384d216b7b1811db4625b94ff95ea56 insert indirect nocache httponly
The route is unreachable at this time:
$ curl nginxdemoshello-drb-test1.apps.aiyengar-oc47-1903206.qe.devcluster.openshift.com -I
HTTP/1.0 503 Service Unavailable
Pragma: no-cache
Cache-Control: private, max-age=0, no-cache, no-store
Connection: close
Content-Type: text/html
-----
* With one pod set to ready state:
------
$ oc exec nginxdemoshello-deployment-5b46f96478-gwlc7 -- touch /tmp/ready
$ oc get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginxdemoshello-deployment-5b46f96478-gwlc7 1/1 Running 0 16m 10.131.0.49 ip-10-0-164-195.us-east-2.compute.internal <none> <none>
nginxdemoshello-deployment-5b46f96478-n6wlm 0/1 Running 0 16m 10.129.2.32 ip-10-0-199-23.us-east-2.compute.internal <none> <none>
An entry is added to the HAProxy backend config:
$ oc -n openshift-ingress exec router-default-6458cc5549-hfh6z -- grep -i "nginxdemoshello-route" haproxy.config -A15
backend be_http:test1:nginxdemoshello-route
mode http
option redispatch
option forwardfor
balance leastconn
timeout check 5000ms
http-request add-header X-Forwarded-Host %[req.hdr(host)]
http-request add-header X-Forwarded-Port %[dst_port]
http-request add-header X-Forwarded-Proto http if !{ ssl_fc }
http-request add-header X-Forwarded-Proto https if { ssl_fc }
http-request add-header X-Forwarded-Proto-Version h2 if { ssl_fc_alpn -i h2 }
http-request add-header Forwarded for=%[src];host=%[req.hdr(host)];proto=%[req.hdr(X-Forwarded-Proto)]
cookie 1384d216b7b1811db4625b94ff95ea56 insert indirect nocache httponly
server pod:nginxdemoshello-deployment-5b46f96478-gwlc7:nginxdemoshello-service::10.131.0.49:8080 10.131.0.49:8080 cookie 1ecb3172ae59bdbf44242ac3d7873732 weight 256
Curl traffic now reaches only the pod in the "Ready" state:
$ curl nginxdemoshello-drb-test1.apps.aiyengar-oc47-1903206.qe.devcluster.openshift.com -I
HTTP/1.1 200 OK
Server: nginx/1.16.1
Date: Fri, 04 Dec 2020 08:47:00 GMT
Content-Type: text/plain
Content-Length: 175
Expires: Fri, 04 Dec 2020 08:46:59 GMT
Cache-Control: no-cache
Set-Cookie: 1384d216b7b1811db4625b94ff95ea56=1ecb3172ae59bdbf44242ac3d7873732; path=/; HttpOnly
$ curl nginxdemoshello-drb-test1.apps.aiyengar-oc47-1903206.qe.devcluster.openshift.com
Server address: 10.131.0.49:8080
Server name: nginxdemoshello-deployment-5b46f96478-gwlc7
Date: 04/Dec/2020:08:47:02 +0000
URI: /
Request ID: f05bee9b4cf1bab32facfb9937e9e602
$ curl nginxdemoshello-drb-test1.apps.aiyengar-oc47-1903206.qe.devcluster.openshift.com
Server address: 10.131.0.49:8080
Server name: nginxdemoshello-deployment-5b46f96478-gwlc7
Date: 04/Dec/2020:08:47:04 +0000
URI: /
Request ID: 5c6816d9f46ae794e8ea5017a122843e
------
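Not part of the original verification, but a quick way to cross-check the same behaviour from the API side is to compare the endpoints Kubernetes marks ready with the servers the router actually renders. The sketch below assumes the demo service name used above, the standard kubernetes.io/service-name EndpointSlice label, and the default router deployment in openshift-ingress (names may differ on other clusters):
```
# Show each endpoint of the test service together with its "ready" condition.
oc get endpointslices -l kubernetes.io/service-name=nginxdemoshello-service \
  -o jsonpath='{range .items[*].endpoints[*]}{.addresses[0]} ready={.conditions.ready}{"\n"}{end}'

# Only addresses reported as ready=true should appear as "server pod:" lines
# in the route's backend of the router's generated configuration.
oc -n openshift-ingress exec deploy/router-default -- grep 'server pod:' haproxy.config
```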
Now that this is merged to 4.7, are you able to indicate which 4.6.x update this fix will be targeted at?

(In reply to chris.wilkinson.com from comment #12)
> Now that this is merged to 4.7, are you able to indicate which 4.6.x update
> this fix will be targeted at?

4.6.8. Currently waiting for the following PRs to merge in the 4.6 release branch:

- https://github.com/openshift/router/pull/230
- https://github.com/openshift/router/pull/232

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633
Description of problem:

We are using a StatefulSet or Deployment with two replicas, with a ClusterIP service, and a Route. One of the Pods is alive and ready. The other Pod is alive but not ready. Before OCP 4.6, network traffic was correctly routed only to the ready Pod. In OCP 4.6, it appears that either Pod can receive traffic through the Route. Traffic from within the cluster which uses the ClusterIP service seems to be handled correctly, leading us to think this is an issue with the Router (i.e. HAProxy). We have seen the incorrect behaviour on OCP 4.6.1, 4.6.3 and 4.6.4. We have seen the correct behaviour on OCP 4.5.17.

Version-Release number of selected component (if applicable): OCP 4.6.1, 4.6.3 and 4.6.4

How reproducible:

Steps to Reproduce:

## Re-create steps

Create the following deployment, service, and route (on 4.6 to recreate the problem, on 4.5 to show it works as expected):

### Deployment

Note that the readiness probe here will not pass.

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginxdemoshello-deployment
  labels:
    app: nginxdemoshello
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginxdemoshello
  template:
    metadata:
      labels:
        app: nginxdemoshello
    spec:
      containers:
      - name: liveness
        image: nginxdemos/nginx-hello:plain-text
        readinessProbe:
          exec:
            command:
            - cat
            - /tmp/ready
          initialDelaySeconds: 5
          periodSeconds: 5
```

### Service

```
apiVersion: v1
kind: Service
metadata:
  name: nginxdemoshello-service
spec:
  selector:
    app: nginxdemoshello
  ports:
  - protocol: TCP
    port: 8080
    targetPort: 8080
```

### Route

NOTE: Update the host with the cluster-specific domain:

```
kind: Route
apiVersion: route.openshift.io/v1
metadata:
  name: nginxdemoshello-route
spec:
  host: nginxdemoshello-drb-recreate.apps.<cluster specific domain>
  to:
    kind: Service
    name: nginxdemoshello-service
    weight: 100
  port:
    targetPort: 8080
  wildcardPolicy: None
```

You will now have 2 pods running but not in a ready state, 1 service serving them, and a route pointing at the service.

### Steps

Create a basic pod and exec in:

```
oc run -it --rm --restart=Never ubi --image=ubi8/ubi sh
```

Within this pod run:

```
curl <pod1ip>:8080
curl <pod2ip>:8080
curl <service cluster ip>:8080
```

Note that, as expected, the 2 pods return successfully (they include their own host IP in the response) and the service correctly fails, as no pods are ready.

From outside OCP (i.e. your laptop terminal) run:

```
curl <route host>
```

In some runs, we have seen the issue recreated at this point: despite the fact that the 2 pods are not ready, and the service is correctly not directing to either pod, the route may be directing to a pod, and you get back the response. If you have recreated the issue, you can note that the haproxy.conf file in the haproxy pods in the openshift-ingress namespace contains both pods in the app-level backends section, when it should be empty. If the route correctly failed, you will have got an 'Application is not available' page response, which is correct, as there are no ready pods.

Confirm the 2 pods are still not ready:

```
oc get pods -o wide
```

Now run the following (updating the pod name to match the first of your pods) to make one of the pods ready:

```
oc exec nginxdemoshello-deployment-<pod1specific> -- touch /tmp/ready
```

Confirm that shortly afterwards the changed pod has become ready:

```
oc get pods -o wide
```

Repeat the curls against the 2 pods and the service (from the ubi pod), and the route (from your laptop).

On 4.6 you may now find that the wrong pod is being serviced by the route - it is the not-ready pod that responds. If so, you have recreated the issue. Optionally, you can view the haproxy.conf file as above to see that it contains the not-ready pod.

If the correct ready pod responded, run the following commands (updating the pod names as specified, noting they run against different pods) to switch which pod is ready:

```
oc exec nginxdemoshello-deployment-<pod1specific> -- rm /tmp/ready
oc exec nginxdemoshello-deployment-<pod2specific> -- touch /tmp/ready
```

Repeat the curls. Hopefully now you have recreated the issue on 4.6. On 4.5 the route will behave as expected in all cases.

Alternatively, if this still did not recreate the issue (both pods are in the haproxy.conf, so it is feasible that the correct pod always replies, and I have seen neither pod respond after step 1, though it is possible I did not give it long enough), you can confirm there is a problem by viewing the /var/lib/haproxy/conf/haproxy.conf file on the pods in openshift-ingress; a command sketch for this check appears after the additional info below. You will note that despite the fact that 1 pod is not ready, it lists both pods under the app-level backends. On 4.5 you can note that only 1 will be shown.

Actual results: Traffic is directed to both ready and non-ready pods.

Expected results: Traffic is directed only to ready pods.

Additional info: OCP 4.6 on AWS and private cloud.
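A sketch of the commands for the alternative check mentioned above, using the object names from these steps. `<router-pod-name>` is a placeholder and will differ per cluster; inside the router container the rendered file is named haproxy.config, as shown in the verification output earlier in this bug:
```
# Find a router pod, then dump the route's backend section from the rendered
# HAProxy configuration. A "server pod:" line for a pod that is not ready
# indicates the problem.
oc -n openshift-ingress get pods
oc -n openshift-ingress exec <router-pod-name> -- grep -A15 nginxdemoshello-route haproxy.config
```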