Bug 1708307 - Gather bootstrap doesn't have perms to gather
Summary: Gather bootstrap doesn't have perms to gather
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.1.0
Assignee: Jeremiah Stuever
QA Contact: sheng.lao
URL:
Whiteboard:
Duplicates: 1706750
Depends On:
Blocks:
Reported: 2019-05-09 14:55 UTC by Chris Callegari
Modified: 2019-06-04 10:48 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-04 10:48:39 UTC
Target Upstream Version:
Embargoed:


Attachments
log-bundle.tar.gz (20.07 KB, application/gzip), 2019-05-09 14:56 UTC, Chris Callegari
openshift_install_dir.tar.gz (1.24 MB, application/gzip), 2019-05-09 14:59 UTC, Chris Callegari


Links
Red Hat Product Errata RHBA-2019:0758 (last updated 2019-06-04 10:48:47 UTC)

Description Chris Callegari 2019-05-09 14:55:13 UTC
Description of problem:
Unable to deploy a nightly build using the IPI method.

The bootstrap node never becomes healthy in any of the three AWS NLBs, which causes the deployment to fail.

Version-Release number of the following components:
$ ~/bin/openshift-install version
/home/ccallega/bin/openshift-install v4.1.0-201905081711-dirty
built from commit 218340f12450cae2961abfaab5985c0677dd63b5
release image registry.svc.ci.openshift.org/ocp/release@sha256:1cb302f7f7508582c5150ee908279e4a52614e801b9785a11b74b1ae7834f501

Bootstrap EC2 AMI: mycluster-x9lxw-master (ami-0dd265e79a060be94)

How reproducible:
Always

Steps to Reproduce:
1. ~/bin/openshift-install --dir=/tmp/openshift/mycluster create install-config
2. ~/bin/openshift-install --dir=/tmp/openshift/mycluster create cluster

Actual results:
(earlier output trimmed)
DEBUG Apply complete! Resources: 117 added, 0 changed, 0 destroyed.
DEBUG
DEBUG The state of your infrastructure has been saved to the path
DEBUG below. This state is required to modify and destroy your
DEBUG infrastructure, so keep it safe. To inspect the complete state
DEBUG use the `terraform show` command.
DEBUG
DEBUG State path: /tmp/openshift-install-269239872/terraform.tfstate
DEBUG OpenShift Installer v4.1.0-201905081711-dirty
DEBUG Built from commit 218340f12450cae2961abfaab5985c0677dd63b5
INFO Waiting up to 30m0s for the Kubernetes API at https://api.mycluster.ccallegar-aws.sysdeseng.com:6443...
DEBUG Still waiting for the Kubernetes API: Get https://api.mycluster.ccallegar-aws.sysdeseng.com:6443/version?timeout=32s: dial tcp 3.14.206.133:6443: connect: connection refused
DEBUG Still waiting for the Kubernetes API: Get https://api.mycluster.ccallegar-aws.sysdeseng.com:6443/version?timeout=32s: dial tcp 3.14.206.133:6443: connect: connection refused
DEBUG Still waiting for the Kubernetes API: Get https://api.mycluster.ccallegar-aws.sysdeseng.com:6443/version?timeout=32s: dial tcp 3.14.206.133:6443: connect: connection refused
DEBUG Still waiting for the Kubernetes API: Get https://api.mycluster.ccallegar-aws.sysdeseng.com:6443/version?timeout=32s: dial tcp 3.14.206.133:6443: connect: connection refused
DEBUG Still waiting for the Kubernetes API: Get https://api.mycluster.ccallegar-aws.sysdeseng.com:6443/version?timeout=32s: dial tcp 3.14.206.133:6443: connect: connection refused
(the same message repeats until the installer reports FATAL)

Expected results:
A complete OpenShift installation

Additional info:
$ ssh -A core@18.223.107.194 '/usr/local/bin/installer-gather.sh 10.0.136.54 10.0.156.35 10.0.166.221'
The authenticity of host '18.223.107.194 (18.223.107.194)' can't be established.
ECDSA key fingerprint is SHA256:MMEg3BaYWgMQgGO618QPLM+ZKbvCFWfuHLYBW1whSsw.
ECDSA key fingerprint is MD5:45:f3:19:55:15:4f:56:93:d6:a5:51:d3:d4:94:54:47.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '18.223.107.194' (ECDSA) to the list of known hosts.
Gathering bootstrap journals ...
Gathering bootstrap containers ...
Gathering rendered assets...
cp: cannot open '/var/opt/openshift/auth/kubeconfig' for reading: Permission denied
cp: cannot open '/var/opt/openshift/auth/kubeconfig-kubelet' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/admin-kubeconfig-ca-bundle.crt' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/aggregator-ca.key' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/aggregator-ca.crt' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/aggregator-ca-bundle.crt' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/aggregator-client.key' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/aggregator-client.crt' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/aggregator-signer.key' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/aggregator-signer.crt' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/apiserver-proxy.key' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/apiserver-proxy.crt' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/etcd-ca-bundle.crt' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/etcd-metric-ca-bundle.crt' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/etcd-metric-signer.key' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/etcd-metric-signer.crt' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/etcd-metric-signer-client.key' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/etcd-metric-signer-client.crt' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/etcd-signer.key' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/etcd-signer.crt' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/etcd-client.key' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/etcd-client.crt' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/kube-apiserver-lb-ca-bundle.crt' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/kube-apiserver-lb-server.key' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/kube-apiserver-lb-server.crt' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/kube-apiserver-internal-lb-server.key' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/kube-apiserver-internal-lb-server.crt' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/kube-apiserver-lb-signer.key' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/kube-apiserver-lb-signer.crt' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/kube-apiserver-localhost-ca-bundle.crt' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/kube-apiserver-localhost-server.key' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/kube-apiserver-localhost-server.crt' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/kube-apiserver-localhost-signer.key' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/kube-apiserver-localhost-signer.crt' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/kube-apiserver-service-network-ca-bundle.crt' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/kube-apiserver-service-network-server.key' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/kube-apiserver-service-network-server.crt' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/kube-apiserver-service-network-signer.key' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/kube-apiserver-service-network-signer.crt' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/kube-apiserver-complete-server-ca-bundle.crt' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/kube-apiserver-complete-client-ca-bundle.crt' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/kube-apiserver-to-kubelet-ca-bundle.crt' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/kube-apiserver-to-kubelet-client.key' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/kube-apiserver-to-kubelet-client.crt' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/kube-apiserver-to-kubelet-signer.key' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/kube-apiserver-to-kubelet-signer.crt' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/kube-control-plane-ca-bundle.crt' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/kube-control-plane-kube-controller-manager-client.key' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/kube-control-plane-kube-controller-manager-client.crt' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/kube-control-plane-kube-scheduler-client.key' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/kube-control-plane-kube-scheduler-client.crt' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/kube-control-plane-signer.key' for reading: Permission denied
Gathering cluster resources ...
cp: cannot open '/var/opt/openshift/tls/kube-control-plane-signer.crt' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/kubelet-bootstrap-kubeconfig-ca-bundle.crt' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/kubelet-client-ca-bundle.crt' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/kubelet-client.key' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/kubelet-client.crt' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/kubelet-signer.key' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/kubelet-signer.crt' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/kubelet-serving-ca-bundle.crt' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/machine-config-server.key' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/machine-config-server.crt' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/service-account.key' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/service-account.pub' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/journal-gatewayd.key' for reading: Permission denied
cp: cannot open '/var/opt/openshift/tls/journal-gatewayd.crt' for reading: Permission denied
rm: missing operand
Try 'rm --help' for more information.
rm: missing operand
Try 'rm --help' for more information.
Waiting for logs ...
The connection to the server api.mycluster.ccallegar-aws.sysdeseng.com:6443 was refused - did you specify the right host or port?
The connection to the server api.mycluster.ccallegar-aws.sysdeseng.com:6443 was refused - did you specify the right host or port?
The connection to the server api.mycluster.ccallegar-aws.sysdeseng.com:6443 was refused - did you specify the right host or port?
The connection to the server api.mycluster.ccallegar-aws.sysdeseng.com:6443 was refused - did you specify the right host or port?
The connection to the server api.mycluster.ccallegar-aws.sysdeseng.com:6443 was refused - did you specify the right host or port?
The connection to the server api.mycluster.ccallegar-aws.sysdeseng.com:6443 was refused - did you specify the right host or port?
The connection to the server api.mycluster.ccallegar-aws.sysdeseng.com:6443 was refused - did you specify the right host or port?
The connection to the server api.mycluster.ccallegar-aws.sysdeseng.com:6443 was refused - did you specify the right host or port?
The connection to the server api.mycluster.ccallegar-aws.sysdeseng.com:6443 was refused - did you specify the right host or port?
The connection to the server api.mycluster.ccallegar-aws.sysdeseng.com:6443 was refused - did you specify the right host or port?
The connection to the server api.mycluster.ccallegar-aws.sysdeseng.com:6443 was refused - did you specify the right host or port?
The connection to the server api.mycluster.ccallegar-aws.sysdeseng.com:6443 was refused - did you specify the right host or port?
The connection to the server api.mycluster.ccallegar-aws.sysdeseng.com:6443 was refused - did you specify the right host or port?
The connection to the server api.mycluster.ccallegar-aws.sysdeseng.com:6443 was refused - did you specify the right host or port?
The connection to the server api.mycluster.ccallegar-aws.sysdeseng.com:6443 was refused - did you specify the right host or port?
The connection to the server api.mycluster.ccallegar-aws.sysdeseng.com:6443 was refused - did you specify the right host or port?
The connection to the server api.mycluster.ccallegar-aws.sysdeseng.com:6443 was refused - did you specify the right host or port?
The connection to the server api.mycluster.ccallegar-aws.sysdeseng.com:6443 was refused - did you specify the right host or port?
The connection to the server api.mycluster.ccallegar-aws.sysdeseng.com:6443 was refused - did you specify the right host or port?
The connection to the server api.mycluster.ccallegar-aws.sysdeseng.com:6443 was refused - did you specify the right host or port?
The connection to the server api.mycluster.ccallegar-aws.sysdeseng.com:6443 was refused - did you specify the right host or port?
The connection to the server api.mycluster.ccallegar-aws.sysdeseng.com:6443 was refused - did you specify the right host or port?
The connection to the server api.mycluster.ccallegar-aws.sysdeseng.com:6443 was refused - did you specify the right host or port?
Gather remote logs
Log bundle written to ~/log-bundle.tar.gz

log-bundle.tar.gz is attached
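The two `rm: missing operand` messages look like a side effect of the failed copies rather than a separate bug. A plausible mechanism, assuming the script prunes sensitive files with a `find ... -print0 | xargs -0 rm` pipeline (the script source is not quoted in this report): when `find` matches nothing, GNU `xargs` still runs `rm` once with no arguments. The sketch below reproduces that on Linux with GNU findutils and coreutils, and shows that `xargs -r` (`--no-run-if-empty`) suppresses the empty invocation. `prune_unsafe` and `prune_safe` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical reproduction of the "rm: missing operand" messages.
# Assumption: the gather script deletes files by pattern via find | xargs rm.

# Unsafe variant: with GNU xargs, an empty match list still runs `rm` once
# with no operands, which prints "rm: missing operand" and fails.
prune_unsafe() {
  local dir="$1" pattern="$2"
  find "$dir" -name "$pattern" -print0 | xargs -0 rm
}

# Safer variant: -r (--no-run-if-empty) skips `rm` when find matched nothing.
prune_safe() {
  local dir="$1" pattern="$2"
  find "$dir" -name "$pattern" -print0 | xargs -0 -r rm --
}
```

Running `prune_unsafe "$(mktemp -d)" '*.key'` against an empty directory, as left behind when every `cp` was denied, prints the same two-line `rm` error seen above; `prune_safe` prints nothing.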

Comment 1 Chris Callegari 2019-05-09 14:56:06 UTC
Created attachment 1566187
log-bundle.tar.gz

Comment 2 Chris Callegari 2019-05-09 14:59:40 UTC
Created attachment 1566188
openshift_install_dir.tar.gz

Comment 3 Chris Callegari 2019-05-09 15:00:44 UTC
I can leave mycluster infrastructure up and running for today (05/09/2019) but I will have to destroy it at end of business day.

Comment 4 Chris Callegari 2019-05-09 15:02:25 UTC
I see this same failure behavior when trying to deploy via the UPI method as well

Comment 5 Abhinav Dahiya 2019-05-09 17:05:42 UTC
Looking at the gathered logs:

less bootstrap/journals/bootkube.log
```
May 09 14:45:10 ip-10-0-13-157 systemd[1]: Started Bootstrap a Kubernetes cluster.
May 09 14:45:13 ip-10-0-13-157 bootkube.sh[1381]: Pulling release image...
May 09 14:45:14 ip-10-0-13-157 bootkube.sh[1381]: error pulling image "registry.svc.ci.openshift.org/ocp/release@sha256:1cb302f7f7508582c5150ee908279e4a52614e801b9785a11b74b1ae7834f501": unable to pull registry.svc.ci.openshift.org/ocp/release@sha256:1cb302f7f7508582c5150ee908279e4a52614e801b9785a11b74b1ae7834f501: unable to pull image: Error determining manifest MIME type for docker://registry.svc.ci.openshift.org/ocp/release@sha256:1cb302f7f7508582c5150ee908279e4a52614e801b9785a11b74b1ae7834f501: Error reading manifest sha256:1cb302f7f7508582c5150ee908279e4a52614e801b9785a11b74b1ae7834f501 in registry.svc.ci.openshift.org/ocp/release: unauthorized: authentication required
```

Looks like your pull secret is not valid. Please use the correct pull secret.
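The diagnosis in comment 5 came from reading `bootkube.log` inside the log bundle. A small sketch of the same check, assuming a bundle laid out like the one attached here (`bootstrap/journals/bootkube.log`, possibly under a top-level directory) and GNU tar and grep. `find_pull_errors` is a hypothetical helper, not part of the installer, and the search strings are taken from the log excerpt above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print image-pull failures recorded anywhere in a
# bootstrap log bundle. Exit status 0 means at least one failure was found.
find_pull_errors() {
  local bundle="$1" workdir rc
  workdir="$(mktemp -d)"
  tar -xzf "$bundle" -C "$workdir" || { rm -rf "$workdir"; return 2; }
  # -r: the journal may sit at any depth; -h: omit file names from the output.
  grep -rh -e 'error pulling image' -e 'unauthorized: authentication required' "$workdir"
  rc=$?
  rm -rf "$workdir"
  return "$rc"
}
```

Usage: `find_pull_errors log-bundle.tar.gz`. Because it only inspects the bundle, it cannot distinguish a missing credential from an expired one.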

Comment 6 Chris Callegari 2019-05-09 17:10:05 UTC
ARG!

Comment 7 Chris Callegari 2019-05-09 17:19:28 UTC
My pull secret has gone invalid more than once in the past 6 months.  Is there a way to test it?

Comment 8 Eric Paris 2019-05-09 17:22:31 UTC
I'm reopening. The failure to install is irrelevant here. I want this bug to track the gather script complaining that it cannot read all of those files.

Comment 9 Scott Dodson 2019-05-09 17:38:05 UTC
Jeremiah,

Can you look at the installer-gather.sh output above? I think you mentioned that changes made after the script was originally introduced caused problems. We need to fix those.
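The `Permission denied` errors are consistent with the script running as the unprivileged `core` user (the SSH login in the description) while the files under `/var/opt/openshift` are readable only by root. A minimal sketch of one way to collect them, assuming passwordless `sudo` for the invoking user: copy with elevated privileges, hand ownership back so a later `tar` step can read the copies, and drop private keys and kubeconfigs before bundling. This is not the change made in the linked pull request; `gather_rendered_assets` and `SUDO` are hypothetical, and setting `SUDO=""` runs the sketch without root.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: copy root-only rendered assets into a gather directory.
# SUDO defaults to "sudo"; set SUDO="" to exercise the logic without root.
SUDO="${SUDO-sudo}"

gather_rendered_assets() {
  local src="$1" dest="$2"
  mkdir -p "$dest"
  # Read root-owned files (e.g. mode 0600) with elevated privileges.
  $SUDO cp -r "$src/." "$dest/" || return 1
  # Give the copies back to the invoking user so later steps can read them.
  $SUDO chown -R "$(id -u):$(id -g)" "$dest" || return 1
  chmod -R u+rwX "$dest"
  # Assumption: private keys and kubeconfigs should not end up in a bundle that
  # may be attached to a public bug. -r keeps xargs from running rm with no
  # operands when nothing matches.
  find "$dest" -type f \( -name '*.key' -o -name 'kubeconfig*' \) -print0 \
    | xargs -0 -r rm -f --
}
```

On a bootstrap node the call would look like `gather_rendered_assets /var/opt/openshift "${ARTIFACTS}/rendered-assets"`, where `ARTIFACTS` is a hypothetical output directory.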

Comment 12 W. Trevor King 2019-05-09 17:58:15 UTC
> My pull secret has gone invalid more than once in the past 6 months.  Is there a way to test it?

The keys from [1] are good forever, although new authorities may be added to them.  So a key from there will always successfully install a given release image, but you may need to re-fetch keys to install newer release images.

Nightlies and other CI images (e.g. from [2]) require an additional authority to fetch from the CI registry.  That key expires each month, and requires you to be in the OpenShift GitHub org, etc.  This key going stale caused your failure above.

There is an open ticket for preflighting these creds: bug 1662106

[1]: https://cloud.redhat.com/openshift/install
[2]: https://openshift-release.svc.ci.openshift.org/
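Until the preflight in bug 1662106 exists, the question in comment 7 can be approximated by hand. The sketch below assumes the pull secret uses the Docker `config.json` layout (`{"auths": {"<registry>": {"auth": "..."}}}`), that the release image pullspec is fully qualified, and that `skopeo` is installed; none of these helpers come from the installer. The presence check is coarse and cannot detect the monthly expiry described above; the `skopeo inspect` call actually authenticates against the registry.

```shell
#!/usr/bin/env bash
# Hypothetical pull-secret checks against a release image pullspec.

# Registry host of a fully qualified pullspec: everything before the first "/".
registry_of() {
  local image="${1#docker://}"
  printf '%s\n' "${image%%/*}"
}

# Coarse check: does the secret mention that registry at all?
# It cannot tell whether the token has expired.
has_auth_entry() {
  local secret="$1" image="$2"
  grep -qF "\"$(registry_of "$image")\"" "$secret"
}

# Real check: fetch the manifest with the secret. A stale or missing credential
# fails with the same "unauthorized: authentication required" error.
can_pull_manifest() {
  local secret="$1" image="$2"
  skopeo inspect --raw --authfile "$secret" "docker://${image#docker://}" >/dev/null
}
```

For the image in this report, `can_pull_manifest pull-secret.json registry.svc.ci.openshift.org/ocp/release@sha256:1cb302f7f7508582c5150ee908279e4a52614e801b9785a11b74b1ae7834f501` should fail fast instead of leaving the bootstrap node waiting, where `pull-secret.json` is a hypothetical local copy of the secret.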

Comment 13 Jeremiah Stuever 2019-05-09 19:34:59 UTC
https://github.com/openshift/installer/pull/1735

Comment 15 W. Trevor King 2019-05-09 22:54:58 UTC
I dunno how the errata tool decides to push things into ON_QA, but this isn't in the most-recent nightly yet:

$ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.1.0-0.nightly-2019-05-09-223828 | grep ' installer '
  installer                                     https://github.com/openshift/installer                                     403a93d1f683384800597ac38e9c2fc0180b3a5d
$ git log --first-parent --format='%ad %h %d %s' --date=iso 403a93d1f68..origin/master
2019-05-09 23:22:59 +0200 59e927d2b  (HEAD -> master, origin/release-4.2, origin/release-4.1, origin/master, origin/HEAD) Merge pull request #1735 from jstuever/bz1708307
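The check in comment 15 (is the PR's merge commit already in the installer commit a nightly was built from?) can be scripted with `git merge-base --is-ancestor`, run inside an installer checkout. `installer_commit` and `fix_in_build` are hypothetical helpers; the `awk` field positions assume the `oc adm release info --commits` line layout shown above (component, repository URL, commit).

```shell
#!/usr/bin/env bash
# Hypothetical helpers for "is this fix in that nightly?"

# Read `oc adm release info --commits ...` output on stdin and print the
# installer commit (third whitespace-separated field of the installer line).
installer_commit() {
  awk '$1 == "installer" { print $3 }'
}

# Exit status 0 if fix commit $1 is an ancestor of (or equal to) built commit $2.
fix_in_build() {
  local fix="$1" built="$2"
  git merge-base --is-ancestor "$fix" "$built"
}
```

With the values above, `fix_in_build 59e927d2b 403a93d1f68` exits non-zero, matching the `git log` output: the merge of PR #1735 is not reachable from the nightly's installer commit.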

Comment 18 sheng.lao 2019-05-13 06:01:21 UTC
*** Bug 1706750 has been marked as a duplicate of this bug. ***

Comment 21 errata-xmlrpc 2019-06-04 10:48:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

