Bug 1569311 - Master API static pod is killed and restarted repeatedly because its Readiness and Liveness probe port is hardcoded to "8443" when openshift_master_api_port is set to 443.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 3.10.0
Assignee: Michael Gugino
QA Contact: Weihua Meng
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-04-19 03:32 UTC by Johnny Liu
Modified: 2018-07-30 19:13 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-07-30 19:13:21 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2018:1816 | 0 | None | None | None | 2018-07-30 19:13:51 UTC

Description Johnny Liu 2018-04-19 03:32:19 UTC
Description of problem:
When openshift_master_api_port=443 is set in the inventory file, the API listens on port 443, but the static pod's Readiness and Liveness probes still target port "8443", which is hardcoded in roles/openshift_control_plane/files/apiserver.yaml.
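
The mismatch described above can be illustrated with a small check. This is only a sketch: the manifest dictionary below is a hypothetical stand-in modelled on the report, not the actual contents of apiserver.yaml, and the helper name mismatched_probe_ports is invented for illustration. It assumes the standard pod spec layout (livenessProbe / readinessProbe with an httpGet port).

```python
def mismatched_probe_ports(pod_manifest, api_port):
    """Return (container, probe, port) tuples whose httpGet port differs from api_port."""
    mismatches = []
    for container in pod_manifest.get("spec", {}).get("containers", []):
        for probe_name in ("livenessProbe", "readinessProbe"):
            http_get = container.get(probe_name, {}).get("httpGet")
            if http_get is None:
                continue  # probe missing or not an HTTP probe
            port = http_get.get("port")
            if port != api_port:
                mismatches.append((container.get("name"), probe_name, port))
    return mismatches


# Hypothetical manifest (assumption): probes pinned to 8443 as in the report.
manifest = {
    "spec": {
        "containers": [
            {
                "name": "api",
                "livenessProbe": {
                    "httpGet": {"scheme": "HTTPS", "port": 8443, "path": "healthz"}
                },
                "readinessProbe": {
                    "httpGet": {"scheme": "HTTPS", "port": 8443, "path": "healthz/ready"}
                },
            }
        ]
    }
}

# With openshift_master_api_port=443, both probes are reported as mismatched.
print(mismatched_probe_ports(manifest, 443))
```

Passing 8443 as api_port returns an empty list, which matches the default case where the bug does not show up.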

Version-Release number of the following components:
openshift-ansible-3.10.0-0.22.0

How reproducible:
Always

Steps to Reproduce:
1. Set openshift_master_api_port=443 in the inventory file

Actual results:
The master API static pod is killed and restarted repeatedly because its Readiness and Liveness probes fail.

node logs:
Apr 18 23:02:56 ip-172-18-7-137.ec2.internal atomic-openshift-node[19594]: I0418 23:02:56.394155   19594 prober.go:111] Liveness probe for "master-api-ip-172-18-7-137.ec2.internal_kube-system(c841c6034a69c9ebc7a2f4b67b059785):api" failed (failure): Get https://172.18.7.137:8443/healthz: dial tcp 172.18.7.137:8443: getsockopt: connection refused
Apr 18 23:02:56 ip-172-18-7-137.ec2.internal atomic-openshift-node[19594]: I0418 23:02:56.394300   19594 server.go:428] Event(v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"master-api-ip-172-18-7-137.ec2.internal", UID:"c841c6034a69c9ebc7a2f4b67b059785", APIVersion:"v1", ResourceVersion:"", FieldPath:"spec.containers{api}"}): type: 'Warning' reason: 'Unhealthy' Liveness probe failed: Get https://172.18.7.137:8443/healthz: dial tcp 172.18.7.137:8443: getsockopt: connection refused
Apr 18 23:03:00 ip-172-18-7-137.ec2.internal atomic-openshift-node[19594]: I0418 23:03:00.952844   19594 prober.go:111] Readiness probe for "master-api-ip-172-18-7-137.ec2.internal_kube-system(c841c6034a69c9ebc7a2f4b67b059785):api" failed (failure): Get https://172.18.7.137:8443/healthz/ready: dial tcp 172.18.7.137:8443: getsockopt: connection refused
Apr 18 23:03:00 ip-172-18-7-137.ec2.internal atomic-openshift-node[19594]: I0418 23:03:00.953316   19594 server.go:428] Event(v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"master-api-ip-172-18-7-137.ec2.internal", UID:"c841c6034a69c9ebc7a2f4b67b059785", APIVersion:"v1", ResourceVersion:"", FieldPath:"spec.containers{api}"}): type: 'Warning' reason: 'Unhealthy' Readiness probe failed: Get https://172.18.7.137:8443/healthz/ready: dial tcp 172.18.7.137:8443: getsockopt: connection refused
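
To confirm which port the failing probes are hitting, the prober lines in the node log above can be parsed. The following is a rough sketch, assuming the log format matches the lines quoted in this report; the regular expression and the helper name failed_probes are illustrative only and not part of any OpenShift tooling.

```python
import re

# Matches the kubelet "prober.go" failure lines quoted in the report (assumed format).
PROBE_FAILURE = re.compile(
    r'(?P<kind>Liveness|Readiness) probe for "(?P<pod>[^"]+)" failed \(failure\): '
    r'Get https?://(?P<host>[^:/\s]+):(?P<port>\d+)(?P<path>/[^\s:]*):'
)


def failed_probes(log_text):
    """Return (probe kind, port, path) for every probe failure line in log_text."""
    return [
        (m.group("kind"), int(m.group("port")), m.group("path"))
        for m in PROBE_FAILURE.finditer(log_text)
    ]


# Two prober lines taken from the node log in the report (syslog prefix omitted).
sample = (
    'I0418 23:02:56.394155   19594 prober.go:111] Liveness probe for '
    '"master-api-ip-172-18-7-137.ec2.internal_kube-system(c841c6034a69c9ebc7a2f4b67b059785):api" '
    'failed (failure): Get https://172.18.7.137:8443/healthz: '
    'dial tcp 172.18.7.137:8443: getsockopt: connection refused\n'
    'I0418 23:03:00.952844   19594 prober.go:111] Readiness probe for '
    '"master-api-ip-172-18-7-137.ec2.internal_kube-system(c841c6034a69c9ebc7a2f4b67b059785):api" '
    'failed (failure): Get https://172.18.7.137:8443/healthz/ready: '
    'dial tcp 172.18.7.137:8443: getsockopt: connection refused\n'
)

for kind, port, path in failed_probes(sample):
    print(kind, port, path)
```

Both probes show port 8443 even though the API was configured for 443, which is consistent with the "connection refused" errors.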



Expected results:
The Readiness and Liveness probe port should not be hardcoded in roles/openshift_control_plane/files/apiserver.yaml; it should follow openshift_master_api_port.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 Michael Gugino 2018-04-19 19:52:09 UTC
PR Created: https://github.com/openshift/openshift-ansible/pull/8052

Comment 2 Scott Dodson 2018-04-23 20:16:47 UTC
In openshift-ansible-3.10.0-0.27.0

Comment 3 Weihua Meng 2018-04-24 09:40:30 UTC
Fixed in openshift-ansible-3.10.0-0.27.0.

With openshift_master_api_port=443 set, installation is successful.

# oc describe pod/master-api-ip-172-18-0-210.ec2.internal

    Ready:          True
    Restart Count:  0
    Liveness:       http-get https://:443/healthz delay=45s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get https://:443/healthz/ready delay=10s timeout=1s period=10s #success=1 #failure=3


  Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo)
       CPE OS Name: cpe:/o:redhat:enterprise_linux:7.5:GA:server
            Kernel: Linux 3.10.0-862.el7.x86_64
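
The verification above can also be scripted. As a minimal sketch, assuming the `oc describe` probe lines keep the format shown in this comment, the snippet below extracts the probe ports and compares them with the expected API port; the helper name probe_ports and the regular expression are illustrative assumptions.

```python
import re

# Matches probe lines such as:
#   Liveness:  http-get https://:443/healthz delay=45s ...
PROBE_LINE = re.compile(
    r'^[ \t]*(?P<kind>Liveness|Readiness):\s+http-get\s+'
    r'https?://(?P<host>[^:/\s]*):(?P<port>\d+)(?P<path>/\S*)',
    re.MULTILINE,
)


def probe_ports(describe_output):
    """Map each probe kind found in the describe output to its port number."""
    return {
        m.group("kind"): int(m.group("port"))
        for m in PROBE_LINE.finditer(describe_output)
    }


# Excerpt of the describe output quoted in this comment.
describe_output = """\
    Ready:          True
    Restart Count:  0
    Liveness:       http-get https://:443/healthz delay=45s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get https://:443/healthz/ready delay=10s timeout=1s period=10s #success=1 #failure=3
"""

expected_port = 443  # value of openshift_master_api_port in the test
ports = probe_ports(describe_output)
print(ports)
print(all(port == expected_port for port in ports.values()))
```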

Comment 5 errata-xmlrpc 2018-07-30 19:13:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816

