Bug 1300471

Summary: OpenShift node installer fails to connect to master, but master is running
Product: OKD Reporter: Jay Vyas <jvyas>
Component: Installer    Assignee: Jason DeTiberus <jdetiber>
Status: CLOSED WONTFIX QA Contact: Ma xiaoqiang <xiama>
Severity: medium
Priority: medium
Version: 3.x    CC: abhgupta, aos-bugs, bleanhar, ccoleman, erich, ffranz, libra-bugs, mmccomas, xtian
Doc Type: Bug Fix
Clone Of: 1000581
Last Closed: 2016-02-23 17:00:43 UTC Type: Bug

Description Jay Vyas 2016-01-20 23:45:58 UTC
Description: 

- After running 

sh <(curl -s https://install.openshift.com/ose/)  

on a 2-node, 1-master cluster in EC2, the Ansible task that starts the node service fails.

- Looking deeper, we can extract the error:

[root@openshift ec2-user]# systemctl status openshift-node.service -o verbose

- The relevant line of output is:

MESSAGE=E0120 18:30:21.606933    5868 reflector.go:209] pkg/kubelet/kubelet.go:182: Failed to watch *api.Service: Get https://ip-172-18-14-218.ec2.internal:8443/api/v1/watch/services?resourceVersion=1912: dial tcp 172.18.14.218:8443: connection refused

As a sanity check, I confirmed that this address is reachable from the node:

[root@openshift ec2-user]#  wget https://ip-172-18-14-218.ec2.internal:8443/api/v1/watch/services?resourceVersion=1912 --no-check-certificate
--2016-01-20 18:44:25--  https://ip-172-18-14-218.ec2.internal:8443/api/v1/watch/services?resourceVersion=1912
Resolving ip-172-18-14-218.ec2.internal (ip-172-18-14-218.ec2.internal)... 172.18.14.218
Connecting to ip-172-18-14-218.ec2.internal (ip-172-18-14-218.ec2.internal)|172.18.14.218|:8443... connected.
WARNING: cannot verify ip-172-18-14-218.ec2.internal's certificate, issued by ‘/CN=openshift-signer@1453327451’:
  Self-signed certificate encountered.
HTTP request sent, awaiting response... 403 Forbidden
2016-01-20 18:44:25 ERROR 403: Forbidden.
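A `connection refused` error is produced at the TCP level, before any TLS handshake or certificate exchange takes place, so it can help to separate "nothing is listening on 8443" from "the master rejected the request". The sketch below is a hypothetical helper (`tcp_probe` is not part of the installer); it assumes bash built with `/dev/tcp` support and coreutils `timeout`, and only reports whether a TCP connection to the given host and port succeeds:

```bash
# Hypothetical helper (not part of the installer): report whether anything
# accepts TCP connections on HOST:PORT. Assumes bash /dev/tcp and coreutils `timeout`.
tcp_probe() {
  local host=$1 port=$2 rc
  # The child bash opens fd 3 to host:port; the exec fails if the connect fails.
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
  rc=$?
  case $rc in
    0) echo open ;;                       # TCP handshake completed
    124) echo timeout ;;                  # `timeout` killed the attempt (e.g. filtered port)
    *) echo "refused or unreachable" ;;   # connect failed (RST, no route, DNS, ...)
  esac
}
```

For example, run `tcp_probe ip-172-18-14-218.ec2.internal 8443` on the node. An `open` result only shows that something accepted the connection; it says nothing about whether the node's certificates are accepted.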


So I assume that it's related to certificates not being propagated properly to the node that is trying to connect to the API server on the master.

DESIRED BEHAVIOUR

The installer could check, before starting the nodes, that each node can use its certificates to connect to the master.
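A minimal sketch of such a check, assuming the node's CA certificate, client certificate and key are available as PEM files (the paths are placeholders; on a real node they would be whatever the node's kubeconfig references) and that `curl` is installed. `preflight` and `classify_status` are hypothetical names, and `/api/v1/services` is used only because that is the resource the kubelet failed to watch in the log above:

```bash
# Hypothetical pre-flight check: can this node authenticate to the master API
# with its own client certificate? File paths and the API path are assumptions.

# Map an HTTP status code from the master to a verdict.
classify_status() {
  case $1 in
    2??) echo ok ;;
    401|403) echo "credentials rejected" ;;
    *) echo "unexpected status $1" ;;
  esac
}

# Usage: preflight HOST PORT CA_FILE CERT_FILE KEY_FILE
preflight() {
  local host=$1 port=$2 ca=$3 cert=$4 key=$5 code rc
  # -w prints only the HTTP status; the response body is discarded.
  code=$(curl -sS -o /dev/null -w '%{http_code}' --max-time 5 \
    --cacert "$ca" --cert "$cert" --key "$key" \
    "https://${host}:${port}/api/v1/services")
  rc=$?
  case $rc in
    0) classify_status "$code" ;;
    6) echo "could not resolve host" ;;
    7) echo "connection failed (nothing listening or unreachable)" ;;
    28) echo "timed out" ;;
    35|58|60|77) echo "TLS/certificate error (curl exit $rc)" ;;
    *) echo "curl error $rc" ;;
  esac
}
```

For example: `preflight ip-172-18-14-218.ec2.internal 8443 /path/to/ca.crt /path/to/node.crt /path/to/node.key`. Unlike the `wget --no-check-certificate` test above, this presents a client certificate, so a 403 here would point at the node's credentials; an anonymous request without a client certificate would likely be rejected with 403 even when the certificates are fine.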

Comment 2 Brenton Leanhardt 2016-02-23 17:00:43 UTC
If this is still a problem, please let us know. If this was an Origin v3 install, you should not be using install.openshift.com/ose; instead, follow https://docs.openshift.org/latest/install_config/install/index.html