Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1623632

Summary: Incorrect http request method to Router stats port results in connection reset.
Product: OpenShift Container Platform
Reporter: Ryan Howe <rhowe>
Component: Networking
Assignee: Ram Ranganathan <ramr>
Networking sub component: router
QA Contact: zhaozhanqi <zzhao>
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
CC: aos-bugs, bbennett, dmace, hongli, piqin, xtian
Version: 3.10.0
Target Milestone: ---
Target Release: 3.11.0
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Story Points: ---
Last Closed: 2018-10-11 07:25:55 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---

Description Ryan Howe 2018-08-29 18:57:24 UTC
Description of problem:
Sending an HTTP request with an incorrectly cased method (for example, "Get" instead of "GET") to the router stats port results in a connection reset.

Version-Release number of selected component (if applicable):
3.7+ 

How reproducible:
100% 

Steps to Reproduce:
#  telnet 10.10.94.150  1936
Trying 10.10.94.150...
Connected to 10.10.94.150.
Escape character is '^]'.
Get /healthz HTTP/1.1
host:apps.ocp.example.com
connection:close

Connection closed by foreign host.

Actual results:

  Connection reset. 

    1   0.000000 10.10.94.149 → 10.10.94.150 TCP 74 54824 → 1936 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=819120536 TSecr=0 WS=128
    2   0.000085 10.10.94.150 → 10.10.94.149 TCP 74 1936 → 54824 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=819121880 TSecr=819120536 WS=128
    3   0.000259 10.10.94.149 → 10.10.94.150 TCP 66 54824 → 1936 [ACK] Seq=1 Ack=1 Win=29312 Len=0 TSval=819120537 TSecr=819121880
    4   0.448752 10.10.94.149 → 10.10.94.150 TCP 89 54824 → 1936 [PSH, ACK] Seq=1 Ack=1 Win=29312 Len=23 TSval=819120985 TSecr=819121880
    5   0.448811 10.10.94.150 → 10.10.94.149 TCP 66 1936 → 54824 [ACK] Seq=1 Ack=24 Win=29056 Len=0 TSval=819122329 TSecr=819120985
    6   0.448925 10.10.94.150 → 10.10.94.149 TCP 66 1936 → 54824 [RST, ACK] Seq=1 Ack=24 Win=29056 Len=0 TSval=0 TSecr=819120985
    7   0.449097 10.10.94.149 → 10.10.94.150 TCP 119 54824 → 1936 [PSH, ACK] Seq=24 Ack=1 Win=29312 Len=53 TSval=819120985 TSecr=819122329
    8   0.449122 10.10.94.150 → 10.10.94.149 TCP 54 1936 → 54824 [RST] Seq=1 Win=0 Len=0


Expected results:

Either respond with a 501 or accept "Get", even though HTTP methods are case-sensitive.

https://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5.1.1
   "... An origin server SHOULD return the status code 405 (Method Not Allowed) if the method is known by the origin server but not allowed for the requested resource, and 501 (Not Implemented) if the method is unrecognized or not implemented by the origin server. . . ."
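
As a minimal sketch of the status selection quoted above, assuming a plain Go net/http handler: the names healthz, knownMethods and statusFor are hypothetical and are not taken from the router code. It only illustrates the 405 versus 501 choice once a request actually reaches an HTTP handler.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// knownMethods lists the case-sensitive method tokens this sketch recognizes.
// Hypothetical table for illustration only.
var knownMethods = map[string]bool{
	http.MethodGet:     true,
	http.MethodHead:    true,
	http.MethodPost:    true,
	http.MethodPut:     true,
	http.MethodDelete:  true,
	http.MethodOptions: true,
	http.MethodPatch:   true,
	http.MethodConnect: true,
	http.MethodTrace:   true,
}

// healthz is a hypothetical health handler, not the router's actual one.
// It allows GET and HEAD, answers 405 for other known methods, and 501
// for any method token it does not recognize (such as "Get").
func healthz(w http.ResponseWriter, r *http.Request) {
	switch {
	case r.Method == http.MethodGet || r.Method == http.MethodHead:
		w.WriteHeader(http.StatusOK)
		fmt.Fprintln(w, "ok")
	case knownMethods[r.Method]:
		w.Header().Set("Allow", "GET, HEAD")
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
	default:
		http.Error(w, "method not implemented", http.StatusNotImplemented)
	}
}

// statusFor runs the handler in-process and returns the response status code.
func statusFor(method string) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(method, "/healthz", nil)
	healthz(rec, req)
	return rec.Code
}

func main() {
	for _, m := range []string{"GET", "POST", "Get"} {
		fmt.Printf("%s -> %d\n", m, statusFor(m))
	}
}
```

Because the comparison is on the exact method token, "Get" falls through to the 501 branch rather than being treated as GET.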


Additional info:

   https://github.com/openshift/origin/blob/release-3.7/pkg/router/metrics/metrics.go

A "Get" request worked in 3.6, but since the following commit the connection is reset when one is sent:

   https://github.com/openshift/origin/commit/feefce602f5c2abb54c9b493fc4831d6af6867f8

Comment 1 Dan Mace 2018-08-30 13:07:58 UTC
Ram, want to fix up the handler to return a 501 for unsupported methods?

Comment 2 Ram Ranganathan 2018-09-05 00:57:31 UTC
@Dan will do. It looks like this is happening because the method matching inside the cockroachdb cmux code looks for exact matches on the method name when called from:
https://github.com/openshift/origin/blob/master/pkg/router/metrics/metrics.go#L135

And because of that it doesn't match HTTP1Fast and as a result defaults to assuming it is a TLS request at:  
https://github.com/openshift/origin/blob/master/pkg/router/metrics/metrics.go#L148

Since it is not a TLS request, the router logs an error, and it looks like the code just closes the connection.
I will see if we can use the slow HTTP route and whether that works. Alternatively, we could return an error. I don't know what hooks are available via the cockroachdb code, so the actual HTTP status we can return will depend on that.

Comment 3 Ram Ranganathan 2018-09-06 00:16:56 UTC
OK, so it looks like if we just use the HTTP1 protocol matcher (instead of HTTP1Fast) in the cockroachdb cmux code, it does a case-insensitive check.

Associated PR: https://github.com/openshift/origin/pull/20873

Some rudimentary tests at: https://gist.github.com/ramr/ced20f285c07b26942f90ae5a961b249
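
The difference between the two matching behaviours described in Comments 2 and 3 can be sketched as follows. This is a standalone illustration using only the Go standard library; it is not the cmux implementation, and the names matchExact, matchFold and classify are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// methods lists the standard HTTP/1.1 method tokens used by this sketch.
var methods = []string{
	"GET", "HEAD", "POST", "PUT", "DELETE",
	"OPTIONS", "PATCH", "CONNECT", "TRACE",
}

// firstToken returns everything before the first space of the request line.
func firstToken(head string) string {
	if i := strings.IndexByte(head, ' '); i >= 0 {
		return head[:i]
	}
	return head
}

// matchExact reports whether the request line starts with a method token
// that matches byte for byte (case-sensitive).
func matchExact(head string) bool {
	tok := firstToken(head)
	for _, m := range methods {
		if tok == m {
			return true
		}
	}
	return false
}

// matchFold reports whether the method token matches while ignoring case.
func matchFold(head string) bool {
	tok := firstToken(head)
	for _, m := range methods {
		if strings.EqualFold(tok, m) {
			return true
		}
	}
	return false
}

// classify mimics the fallthrough described in Comment 2: anything the
// HTTP matcher rejects is assumed to be TLS.
func classify(head string, isHTTP func(string) bool) string {
	if isHTTP(head) {
		return "http"
	}
	return "tls"
}

func main() {
	line := "Get /healthz HTTP/1.1"
	fmt.Println("exact:", classify(line, matchExact))
	fmt.Println("fold: ", classify(line, matchFold))
}
```

With the exact matcher, the "Get" request line is classified as TLS, which corresponds to the path where the connection is closed; with the case-insensitive matcher it is handled as HTTP.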

Comment 5 zhaozhanqi 2018-09-14 09:48:54 UTC
I checked: https://github.com/openshift/origin/pull/20873 was merged only into the master branch, not the 3.11 branch, so I am updating the status to 'MODIFIED'.

Comment 7 zhaozhanqi 2018-09-19 06:04:52 UTC
I just checked again. https://github.com/openshift/origin/pull/20873 was merged into branch 'enterprise-3.11-backup-eparis' but not 'enterprise-3.11', and the latest release, 3.11.8-1, is built from branch 'enterprise-3.11'. So this PR needs to be rebased onto 'enterprise-3.11'. Please correct me if I'm wrong.

Comment 12 zhaozhanqi 2018-09-21 05:41:21 UTC
Verified this bug on v3.11.12

The connection was not reset even when using the mixed-case 'Get'; see:
# telnet 127.0.0.1 1936
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
Get /healthz HTTP/1.1
host:apps.ocp.example.com

Comment 14 errata-xmlrpc 2018-10-11 07:25:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652