Bug 1733656 - 4.2 AWS IPI deployment fails waiting for Kubernetes API: context deadline exceeded
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Abhinav Dahiya
QA Contact: Johnny Liu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-07-27 01:12 UTC by Coady LaCroix
Modified: 2019-07-29 17:13 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-07-27 01:57:56 UTC
Target Upstream Version:


Attachments
log-bundle (989.42 KB, application/gzip)
2019-07-27 01:12 UTC, Coady LaCroix
bootkube-service-log (6.99 KB, text/plain)
2019-07-27 01:14 UTC, Coady LaCroix
openshift-installer log (134.42 KB, text/plain)
2019-07-27 01:15 UTC, Coady LaCroix

Description Coady LaCroix 2019-07-27 01:12:34 UTC
Created attachment 1593815 [details]
log-bundle

Description of problem: After generating an install-config with openshift-installer, I attempted to deploy a cluster to AWS. The deployment eventually fails with "waiting for Kubernetes API: context deadline exceeded". I was able to SSH to the bootstrap node in AWS and retrieve the bootkube.service log (see attachment).


openshift-installer log snippet:

time="2019-07-26T16:09:41-07:00" level=debug msg="Still waiting for the Kubernetes API: Get https://api.ocs-ci-clacroix.qe.rh-ocs.com:6443/version?timeout=32s: dial tcp 18.218.245.145:6443: connect: connection refused"
time="2019-07-26T16:09:54-07:00" level=debug msg="Fetching \"Install Config\"..."
time="2019-07-26T16:09:54-07:00" level=debug msg="Loading \"Install Config\"..."
time="2019-07-26T16:09:54-07:00" level=debug msg="  Loading \"SSH Key\"..."
time="2019-07-26T16:09:54-07:00" level=debug msg="  Loading \"Base Domain\"..."
time="2019-07-26T16:09:54-07:00" level=debug msg="    Loading \"Platform\"..."
time="2019-07-26T16:09:54-07:00" level=debug msg="  Loading \"Cluster Name\"..."
time="2019-07-26T16:09:54-07:00" level=debug msg="    Loading \"Base Domain\"..."
time="2019-07-26T16:09:54-07:00" level=debug msg="  Loading \"Pull Secret\"..."
time="2019-07-26T16:09:54-07:00" level=debug msg="  Loading \"Platform\"..."
time="2019-07-26T16:09:54-07:00" level=debug msg="Using \"Install Config\" loaded from state file"
time="2019-07-26T16:09:54-07:00" level=debug msg="Reusing previously-fetched \"Install Config\""
time="2019-07-26T16:09:54-07:00" level=info msg="Pulling debug logs from the bootstrap machine"
time="2019-07-26T16:09:55-07:00" level=debug msg="Gathering bootstrap journals ..."
time="2019-07-26T16:09:56-07:00" level=debug msg="Gathering bootstrap containers ..."
time="2019-07-26T16:09:59-07:00" level=debug msg="time=\"2019-07-26T23:09:59Z\" level=fatal msg=\"failed to connect: failed to connect: context deadline exceeded\""
time="2019-07-26T16:10:00-07:00" level=debug msg="Gathering rendered assets..."
time="2019-07-26T16:10:00-07:00" level=debug msg="Gathering cluster resources ..."
time="2019-07-26T16:10:01-07:00" level=debug msg="Waiting for logs ..."
time="2019-07-26T16:10:02-07:00" level=debug msg="The connection to the server api.ocs-ci-clacroix.qe.rh-ocs.com:6443 was refused - did you specify the right host or port?"

...

time="2019-07-26T16:10:04-07:00" level=debug msg="The connection to the server api.ocs-ci-clacroix.qe.rh-ocs.com:6443 was refused - did you specify the right host or port?"
time="2019-07-26T16:10:04-07:00" level=debug msg="Gather remote logs"
time="2019-07-26T16:10:04-07:00" level=debug msg="Collecting info from 10.0.132.64"
time="2019-07-26T16:10:04-07:00" level=debug msg="lost connection"
time="2019-07-26T16:10:04-07:00" level=debug msg="ssh: connect to host 10.0.132.64 port 22: Connection refused\r"
time="2019-07-26T16:10:04-07:00" level=debug msg="Collecting info from 10.0.147.0"
time="2019-07-26T16:10:04-07:00" level=debug msg="lost connection"
time="2019-07-26T16:10:04-07:00" level=debug msg="ssh: connect to host 10.0.147.0 port 22: Connection refused\r"
time="2019-07-26T16:10:04-07:00" level=debug msg="Collecting info from 10.0.170.19"
time="2019-07-26T16:10:04-07:00" level=debug msg="lost connection"
time="2019-07-26T16:10:04-07:00" level=debug msg="ssh: connect to host 10.0.170.19 port 22: Connection refused\r"
time="2019-07-26T16:10:04-07:00" level=debug msg="Log bundle written to ~/log-bundle.tar.gz"
time="2019-07-26T16:10:05-07:00" level=info msg="Bootstrap gather logs captured here \"/Users/clacroix/clusters/4.2.0-deploy-16/log-bundle-20190726161004.tar.gz\""
time="2019-07-26T16:10:05-07:00" level=fatal msg="waiting for Kubernetes API: context deadline exceeded"
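
To pick the failure lines out of a long installer log like the one above, a small filter can help. This is a minimal sketch, assuming the `time=... level=... msg=...` layout shown in the snippet; `parse_line` and `failures` are hypothetical helper names and are not part of openshift-installer.

```python
import re

# Matches one installer log line: time="..." level=... msg="...".
# The msg value may contain backslash-escaped quotes.
LINE_RE = re.compile(
    r'^time="(?P<time>[^"]+)" level=(?P<level>\w+) '
    r'msg="(?P<msg>(?:[^"\\]|\\.)*)"\s*$'
)


def parse_line(line):
    """Return (time, level, msg) for a log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    msg = m.group("msg").replace('\\"', '"')  # unescape embedded quotes
    return m.group("time"), m.group("level"), msg


def failures(lines):
    """Yield entries logged at error/fatal level, plus entries whose
    message wraps a nested fatal line from the bootstrap machine."""
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        _, level, msg = parsed
        if level in ("error", "fatal") or "level=fatal" in msg:
            yield parsed


# Three lines copied from the log snippet in this report.
sample = [
    r'time="2019-07-26T16:09:59-07:00" level=debug msg="time=\"2019-07-26T23:09:59Z\" level=fatal msg=\"failed to connect: failed to connect: context deadline exceeded\""',
    r'time="2019-07-26T16:10:01-07:00" level=debug msg="Waiting for logs ..."',
    r'time="2019-07-26T16:10:05-07:00" level=fatal msg="waiting for Kubernetes API: context deadline exceeded"',
]

for when, level, msg in failures(sample):
    print(when, level, msg)
```

Lines that do not follow the assumed layout are skipped rather than treated as errors, so the filter can be run over a whole log file.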



Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-07-26-152831

This is the latest version I have tried. I have also tried various other versions (both ci and nightly) that were accepted in the past few days, with exactly the same results.


How reproducible: 100% of attempts to deploy using recent versions of 4.2.


Steps to Reproduce:
1. Generate install-config using openshift-installer
2. Attempt deployment of cluster to AWS

Actual results: Deployment failure - waiting for Kubernetes API: context deadline exceeded. 


Expected results: Successful deployment


Additional info:

Comment 1 Coady LaCroix 2019-07-27 01:14:21 UTC
Created attachment 1593816 [details]
bootkube-service-log

Comment 2 Coady LaCroix 2019-07-27 01:15:42 UTC
Created attachment 1593817 [details]
openshift-installer log

Comment 3 Abhinav Dahiya 2019-07-27 01:57:56 UTC
Jul 26 22:39:47 ip-10-0-7-58 bootkube.sh[1539]: time="2019-07-26T22:39:47Z" level=error msg="Error pulling image ref //registry.svc.ci.openshift.org/ocp/release@sha256:6ccb990f8616a6efca05b411af56c79f5c4502f05fcd1f5cfce143858a8d0986: Error initializing source docker://registry.svc.ci.openshift.org/ocp/release@sha256:6ccb990f8616a6efca05b411af56c79f5c4502f05fcd1f5cfce143858a8d0986: Error reading manifest sha256:6ccb990f8616a6efca05b411af56c79f5c4502f05fcd1f5cfce143858a8d0986 in registry.svc.ci.openshift.org/ocp/release: unauthorized: authentication required"

Please use the correct pull secret when using an installer+release-image built by CI.
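
One way to catch this before deploying is to confirm that the pull secret has an entry for the registry named in the error above. This is a minimal sketch, assuming the pull secret is the usual JSON document with a top-level "auths" map keyed by registry host; `missing_registries` is a hypothetical helper, and the credentials in the example are placeholders.

```python
import json


def missing_registries(pull_secret_json, required_hosts):
    """Return the hosts from required_hosts that have no entry under "auths"."""
    auths = json.loads(pull_secret_json).get("auths", {})
    return [host for host in required_hosts if host not in auths]


# Illustrative secret only: placeholder credentials, and no entry for the CI registry.
example_secret = json.dumps({
    "auths": {
        "quay.io": {"auth": "cGxhY2Vob2xkZXI="},
        "registry.redhat.io": {"auth": "cGxhY2Vob2xkZXI="},
    }
})

print(missing_registries(
    example_secret,
    ["quay.io", "registry.svc.ci.openshift.org"],
))
```

An empty result only shows that an entry exists for each host. It does not prove that the registry will accept the credentials.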

Comment 4 Coady LaCroix 2019-07-29 15:46:48 UTC
(In reply to Abhinav Dahiya from comment #3)

> Please use the correct pull secret when using installer+release-image built
> by CI.

Can you point me to where I can generate the correct pull secret for these builds or how to construct it myself? I've been using the pull secret I downloaded from openshift.com's install steps.

Comment 5 Raz Tamir 2019-07-29 16:31:38 UTC
Hi,

Could you please help with comment #4?

