Bug 1419086

Summary: Docker pull fails when accessing exposed registry through Load Balancer
Product: OpenShift Container Platform
Reporter: Vladislav Walek <vwalek>
Component: Networking
Assignee: Ben Bennett <bbennett>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Meng Bo <bmeng>
Severity: medium
Docs Contact:
Priority: high
Version: 3.3.0
CC: aos-bugs, bbennett, rromerom, vwalek
Target Milestone: ---
Target Release: 3.3.1
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-07-26 13:20:16 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Vladislav Walek 2017-02-03 15:21:46 UTC
Description of problem:

The customer recently upgraded the environment from 3.2 to 3.3 and is trying to pull an image from the exposed registry. The registry runs on OpenShift, and its service is securely exposed using a re-encrypt route. Between the client and the OpenShift router there is a load balancer provided by the cluster provider, where some encryption is done. The customer can't reconfigure the load balancer.
When they try to pull an image with docker on an external client, using a token from a service account, it fails:

docker pull registry.com:443/namespace/image:latest

This gives a truncated response and then docker fails. If they send the same request with curl, the response is complete:

curl -i -s -k  -X $'GET' -H $'User-Agent: docker/1.10.3 go/go1.6.3 git-commit/3999ccb-unsupported kernel/3.10.0-514.2.2.el7.x86_64 os/linux arch/amd64' -H $'Authorization: Bearer <long_token_here>' $'https://registry.com/v2/namespace/image/manifests/latest'

When they bypassed the load balancer and used one of the routers directly, the pull worked normally, even though the service is re-encrypt.
Something done on the load balancer breaks docker pull (docker push also doesn't work).
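
To make "truncated" measurable, the manifest request above could be repeated through both paths while comparing the Content-Length header with the number of bytes actually received. The following is a minimal sketch under assumptions, not something that was run for this bug: the function names are hypothetical, certificate verification is disabled to mirror curl -k, and the host, path and token must be supplied by whoever runs it.

```python
import http.client
import ssl


def classify_body(declared_length, received_length, interrupted=False):
    """Compare the declared Content-Length with the bytes actually read."""
    if interrupted:
        return "truncated"   # connection ended before the body was complete
    if declared_length is None:
        return "unknown"     # no Content-Length header, nothing to compare
    if received_length < declared_length:
        return "truncated"
    return "complete"


def fetch_manifest(host, path, token, port=443):
    """GET the manifest; return (declared_length, received_length, interrupted)."""
    context = ssl.create_default_context()
    context.check_hostname = False        # mirrors curl -k, for testing only
    context.verify_mode = ssl.CERT_NONE
    conn = http.client.HTTPSConnection(host, port, context=context, timeout=30)
    try:
        conn.request("GET", path, headers={
            "Authorization": "Bearer " + token,
            "User-Agent": "docker/1.10.3 go/go1.6.3",
        })
        resp = conn.getresponse()
        header = resp.getheader("Content-Length")
        declared = int(header) if header is not None else None
        interrupted = False
        try:
            body = resp.read()
        except http.client.IncompleteRead as exc:
            body = exc.partial            # bytes that arrived before the close
            interrupted = True
        return declared, len(body), interrupted
    finally:
        conn.close()


def check_path(host, path, token):
    """Fetch once and return a short verdict string for this host."""
    declared, received, interrupted = fetch_manifest(host, path, token)
    verdict = classify_body(declared, received, interrupted)
    return "%s: declared=%s received=%d -> %s" % (host, declared, received, verdict)
```

Calling check_path("registry.com", "/v2/namespace/image/manifests/latest", token) once with the name resolving to the load balancer and once resolving to a router would show whether the body is cut short only on the load balancer path.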

Version-Release number of selected component (if applicable):

OpenShift 3.3

How reproducible:

Comment 7 Ben Bennett 2017-02-06 15:34:18 UTC
And, if worst comes to worst, we may need to get some wireshark traces from various points to see if one end is tearing down the connection abruptly.

Comment 10 Ruben Romero Montes 2017-02-06 15:49:47 UTC
I have requested more details about the load balancer and some tcpdumps between client -> load balancer and load balancer -> node.

Comment 11 Vladislav Walek 2017-02-08 12:28:04 UTC
Hello, I have a reply from the customer.

The test with encrypted traffic from outside to the LB and unencrypted traffic from the LB to the router works fine too.

Also, here is the response from Noris about the load balancer:
> What kind of balancer is used?

A10 hardware appliance.

> How does it encrypt the traffic, and what steps are done?

TLS from the client-side is terminated on the load balancer.
Towards the servers a new TLS connection is opened.

> Which SNI is used on the load balancer?

I don't understand this question.
What exactly do they want to know?
SNI is supported on the Load balancer.
However there's only one certificate in use for the virtual-server.
So SNI is not needed.
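
One way to make the SNI question concrete: because the load balancer opens a new TLS connection towards the servers, what matters is which server name, if any, it sends on that second connection. A rough sketch, assuming direct network access to the endpoint being checked, is to compare the certificate an endpoint presents with and without SNI. The function names here are hypothetical and verification is disabled because this only inspects what is presented.

```python
import hashlib
import socket
import ssl


def fingerprint(der_bytes):
    """Return the SHA-256 fingerprint of a DER-encoded certificate."""
    return hashlib.sha256(der_bytes).hexdigest()


def presented_cert(host, port=443, sni=None):
    """Connect and return the fingerprint of the certificate presented.

    With sni=None no server name is sent in the TLS handshake.
    """
    context = ssl.create_default_context()
    context.check_hostname = False        # inspection only, like curl -k
    context.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=10) as raw:
        with context.wrap_socket(raw, server_hostname=sni) as tls:
            return fingerprint(tls.getpeercert(binary_form=True))


def compare_sni(host, sni, port=443):
    """Report whether the endpoint answers differently with and without SNI."""
    with_sni = presented_cert(host, port, sni=sni)
    without_sni = presented_cert(host, port, sni=None)
    return {
        "with_sni": with_sni,
        "without_sni": without_sni,
        "same": with_sni == without_sni,
    }
```

Running compare_sni against a router address with sni="registry.com" would show whether the router presents a different certificate when the server name is missing, which is the situation a load balancer that omits SNI would create.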

Unfortunately, the customer can't provide the tcpdumps, because the load balancer is held by the provider and they can't decrypt the tcpdumps.

Comment 20 Ben Bennett 2017-07-26 13:20:16 UTC
Closing due to insufficient data.  Everything points to the external load balancer as the problem, because if they hit the OpenShift router directly, it works.  If more information arises, please re-open.