Bug 1663268

Summary: Enabling router metrics browser page causes "Readiness probe failed: HTTP probe failed with statuscode: 401"
Product: OpenShift Container Platform Reporter: Alan Chan <alchan>
Component: Networking Assignee: Dan Mace <dmace>
Networking sub component: router QA Contact: Hongan Li <hongli>
Status: CLOSED ERRATA Docs Contact:
Severity: unspecified    
Priority: unspecified CC: aos-bugs, tmanor
Version: 3.11.0   
Target Milestone: ---   
Target Release: 3.11.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-03-14 02:17:59 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Alan Chan 2019-01-03 15:34:15 UTC
Description of problem:

Following the doc at https://docs.openshift.com/container-platform/3.11/install_config/router/default_haproxy_router.html#exposing-the-router-metrics to enable the router metrics page for browser access causes the router pod to fail to start up with "Readiness probe failed: HTTP probe failed with statuscode: 401".


Version-Release number of selected component (if applicable):

3.11.51


How reproducible:

- Repeatedly.


Steps to Reproduce:

1. A working cluster with router pods

2. oc set env dc router ROUTER_METRICS_TYPE- ROUTER_LISTEN_ADDR-

3. oc -n default get pod -w
  - shows pod stuck at deploy and eventually fails deploy

4. oc -n default get events
  - shows readiness probe failure
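
To confirm the symptom from steps 3 and 4, something like the following could be used. This is only a sketch: the helper names filter_readiness_failures and show_readiness_path and the RUN_CHECK switch are hypothetical, and it assumes the router is the first container in the "router" dc in the "default" project.

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical, not part of oc.

# Keep only the event lines that mention a failed readiness probe (reads stdin).
filter_readiness_failures() {
    grep -F 'Readiness probe failed' || true
}

# Print the readiness probe path configured on the first container of a dc.
# Arguments: project (default "default"), dc name (default "router").
show_readiness_path() {
    oc -n "${1:-default}" get dc "${2:-router}" \
        -o jsonpath='{.spec.template.spec.containers[0].readinessProbe.httpGet.path}'
}

# Only talk to the cluster when oc is installed and RUN_CHECK=1 is set.
if [ "${RUN_CHECK:-0}" = "1" ] && command -v oc >/dev/null 2>&1; then
    oc -n default get events | filter_readiness_failures
    show_readiness_path default router
fi
```

Run with RUN_CHECK=1 against an affected cluster; the first command lists the probe failures and the second shows which path the readiness probe is currently hitting.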


Actual results:

- router pod fails to deploy

Expected results:

- router pod deploys successfully and the metrics page is enabled for browser access on port 1936


Additional info:

Comment 1 Alan Chan 2019-01-07 20:31:05 UTC
Workaround based on what Luke found in the case 02284589 from 3.9:

- oc patch dc router -p '"spec": {"template": {"spec": {"containers": [{"name": "router","readinessProbe": {"httpGet": {"path": "healthz"}}}]}}}'

- oc set env dc router-dmz ROUTER_METRICS_TYPE-

Comment 2 Alan Chan 2019-01-07 20:33:52 UTC
(In reply to Alan C from comment #1)
> Workaround based on what Luke found in the case 02284589 from 3.9:
> 
> - oc patch dc router -p '"spec": {"template": {"spec": {"containers":
> [{"name": "router","readinessProbe": {"httpGet": {"path": "healthz"}}}]}}}'
> 
> - oc set env dc router-dmz ROUTER_METRICS_TYPE-

Sorry, typo, should be:

- oc set env dc router ROUTER_METRICS_TYPE-

...to match the patch command.

Comment 3 Ram Ranganathan 2019-02-13 00:09:06 UTC
Yeah, the issue is that we changed the probe paths to be different in order to distinguish between liveness and readiness checks.
But when you disable the listener on the stats port in the openshift-router and enable it in haproxy, there is only one
"unauthenticated" endpoint available to use (haproxy only allows one monitor-uri).

So we do have to point both probes at the same endpoint, "/healthz", when we remove ROUTER_LISTEN_ADDR. I will create a PR for
the docs team so that section also mentions that the readiness probe needs to be updated. Thanks.
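
As a rough illustration of matching both probes to one endpoint, the sketch below builds a single patch that sets the liveness and readiness paths together. The helper name build_probe_patch and the APPLY switch are hypothetical; the dc name "router", the container name "router" and the "default" project are taken from the commands earlier in this bug and may differ on other clusters.

```shell
#!/bin/sh
# Sketch only: build_probe_patch is a hypothetical helper, not part of oc.
# It prints a JSON patch that points both probes of one container at the
# same HTTP path, leaving the other probe settings (port, timings) alone.
build_probe_patch() {
    container="$1"   # container name inside the deployment config
    path="$2"        # probe path, for example /healthz
    printf '{"spec":{"template":{"spec":{"containers":[{"name":"%s","livenessProbe":{"httpGet":{"path":"%s"}},"readinessProbe":{"httpGet":{"path":"%s"}}}]}}}}' \
        "$container" "$path" "$path"
}

patch=$(build_probe_patch router /healthz)
echo "$patch"

# Apply only when explicitly requested (APPLY=1) and when oc is installed.
if [ "${APPLY:-0}" = "1" ] && command -v oc >/dev/null 2>&1; then
    oc -n default patch dc router -p "$patch"
fi
```

Without APPLY=1 the script only prints the patch, so it can be reviewed before anything on the cluster changes.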

Comment 4 Ram Ranganathan 2019-02-13 01:44:34 UTC
Docs PR: https://github.com/openshift/openshift-docs/pull/13608

Comment 5 Dan Mace 2019-02-18 21:15:25 UTC
Moving to MODIFIED as the docs update merged.

Comment 7 Ram Ranganathan 2019-02-25 06:23:05 UTC
Docs PR against master: https://github.com/openshift/openshift-docs/pull/13622 

The 3.11 PR (https://github.com/openshift/openshift-docs/pull/13608) was closed by Vikram, so I am not certain whether this is
ready for QA, unless I missed another docs PR.

Comment 8 Hongan Li 2019-02-26 08:25:13 UTC
The docs PR looks good, thanks.

Comment 10 errata-xmlrpc 2019-03-14 02:17:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0407