This bug was initially created as a copy of Bug #1766066

I am copying this bug because:

This is meant to track the re-introduction of the reverted commit and the proper fix for report-progress.sh. This bug also reproduced in 4.2.0-0.nightly-2019-10-26-055649 when fixing BZ#1758663.

+++ This bug was initially created as a clone of Bug #1762618 +++

Description of problem:

Version-Release number of the following components:
4.3.0-0.nightly-2019-10-16-010826

How reproducible:
Always

Steps to Reproduce:
1. Drop the internet gateway for the private subnets in the VPC to create a disconnected env.
2. Set up a proxy in the public subnets; the proxy can reach both the external and internal networks.
3. In the proxy, use a whitelist to control which traffic can get through, NOT adding the api URL to the list, such as:

acl whitelist dstdomain ec2.us-east-2.amazonaws.com iam.amazonaws.com .s3.us-east-2.amazonaws.com .apps.jialiu-42dis8.qe.devcluster.openshift.com ec2-18-191-189-164.us-east-2.compute.amazonaws.com .github.com .rubygems.org
http_access allow whitelist

4. Enable the proxy setting in install-config.yaml (a sketch of this stanza follows the expected results below).
5. Trigger a UPI install on AWS.

Actual results:

$ ./openshift-install wait-for bootstrap-complete --dir '/home/installer2/workspace/Launch Environment Flexy/workdir/install-dir'
level=info msg="Waiting up to 30m0s for the Kubernetes API at https://api.jialiu-42dis8.qe.devcluster.openshift.com:6443..."
level=info msg="API v1.16.0-beta.2+453eff1 up"
level=info msg="Waiting up to 30m0s for bootstrapping to complete..."
level=info msg="Use the following commands to gather logs from the cluster"
level=info msg="openshift-install gather bootstrap --help"
level=fatal msg="failed to wait for bootstrapping to complete: timed out waiting for the condition"

Expected results:

Installation passes.
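For reference, the proxy stanza from step 4 would look roughly like the following. This is a minimal sketch, not the actual install-config.yaml used in the reproduction; the endpoint and the test.no-proxy.com entry are taken from the env output shown later in this report:

apiVersion: v1
proxy:
  httpProxy: http://ec2-18-191-189-164.us-east-2.compute.amazonaws.com:3128
  httpsProxy: http://ec2-18-191-189-164.us-east-2.compute.amazonaws.com:3128
  noProxy: test.no-proxy.com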
Additional info:

Log into the bootstrap node; the bootkube service completed successfully.

$ journalctl -b -f -u bootkube.service
-- Logs begin at Wed 2019-10-16 09:26:30 UTC. --
Oct 16 09:38:35 ip-10-0-61-231 bootkube.sh[1610]: Skipped "secret-control-plane-client-signer.yaml" secrets.v1./kube-control-plane-signer -n openshift-kube-apiserver-operator as it already exists
Oct 16 09:38:35 ip-10-0-61-231 bootkube.sh[1610]: Skipped "secret-csr-signer-signer.yaml" secrets.v1./csr-signer-signer -n openshift-kube-controller-manager-operator as it already exists
Oct 16 09:38:36 ip-10-0-61-231 bootkube.sh[1610]: Skipped "secret-initial-kube-controller-manager-service-account-private-key.yaml" secrets.v1./initial-service-account-private-key -n openshift-config as it already exists
Oct 16 09:38:36 ip-10-0-61-231 bootkube.sh[1610]: Skipped "secret-kube-apiserver-to-kubelet-signer.yaml" secrets.v1./kube-apiserver-to-kubelet-signer -n openshift-kube-apiserver-operator as it already exists
Oct 16 09:38:37 ip-10-0-61-231 bootkube.sh[1610]: Skipped "secret-loadbalancer-serving-signer.yaml" secrets.v1./loadbalancer-serving-signer -n openshift-kube-apiserver-operator as it already exists
Oct 16 09:38:37 ip-10-0-61-231 bootkube.sh[1610]: Skipped "secret-localhost-serving-signer.yaml" secrets.v1./localhost-serving-signer -n openshift-kube-apiserver-operator as it already exists
Oct 16 09:38:37 ip-10-0-61-231 bootkube.sh[1610]: Skipped "secret-service-network-serving-signer.yaml" secrets.v1./service-network-serving-signer -n openshift-kube-apiserver-operator as it already exists
Oct 16 09:38:38 ip-10-0-61-231 bootkube.sh[1610]: Skipped "user-ca-bundle-config.yaml" configmaps.v1./user-ca-bundle -n openshift-config as it already exists
Oct 16 09:38:38 ip-10-0-61-231 bootkube.sh[1610]: Tearing down temporary bootstrap control plane...
Oct 16 09:38:38 ip-10-0-61-231 bootkube.sh[1610]: bootkube.service complete

But report-progress.sh is reporting errors:

Oct 16 11:37:12 ip-10-0-61-231 report-progress.sh[1611]: error: unable to recognize "STDIN": Get https://api.jialiu-42dis8.qe.devcluster.openshift.com:6443/api?timeout=32s: Forbidden
Oct 16 11:37:18 ip-10-0-61-231 report-progress.sh[1611]: error: unable to recognize "STDIN": Get https://api.jialiu-42dis8.qe.devcluster.openshift.com:6443/api?timeout=32s: Forbidden
Oct 16 11:37:23 ip-10-0-61-231 report-progress.sh[1611]: error: unable to recognize "STDIN": Get https://api.jialiu-42dis8.qe.devcluster.openshift.com:6443/api?timeout=32s: Forbidden
Oct 16 11:37:28 ip-10-0-61-231 report-progress.sh[1611]: error: unable to recognize "STDIN": Get https://api.jialiu-42dis8.qe.devcluster.openshift.com:6443/api?timeout=32s: Forbidden

# env|grep -i proxy
HTTP_PROXY=http://ec2-18-191-189-164.us-east-2.compute.amazonaws.com:3128
NO_PROXY=.cluster.local,.svc,.us-east-2.compute.internal,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.jialiu-42dis8.qe.devcluster.openshift.com,etcd-0.jialiu-42dis8.qe.devcluster.openshift.com,etcd-1.jialiu-42dis8.qe.devcluster.openshift.com,etcd-2.jialiu-42dis8.qe.devcluster.openshift.com,localhost,test.no-proxy.com
HTTPS_PROXY=http://ec2-18-191-189-164.us-east-2.compute.amazonaws.com:3128
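Those Forbidden errors arrive via the proxy. An illustrative way to confirm the mismatch (these commands are assumptions, not from the original report; the kubeconfig path is the one inspected later in this bug):

# Which server does oc talk to? (bootstrap kubeconfig path as checked in the verification below)
oc --config=/opt/openshift/auth/kubeconfig config view -o jsonpath='{.clusters[0].cluster.server}'
# => https://api.jialiu-42dis8.qe.devcluster.openshift.com:6443

# Is that host covered by NO_PROXY?
echo "$NO_PROXY" | tr ',' '\n' | grep 'api\.jialiu-42dis8'
# => no output: only api-int.* is listed, so the request goes through HTTPS_PROXY,
#    and squid returns Forbidden because the API URL is not in the whitelist.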
oc --config="$KUBECONFIG" create -f - <<-EOF apiVersion: v1 kind: ConfigMap metadata: name: bootstrap namespace: kube-system data: status: complete EOF do sleep 5 done The script is calling oc command against external api server to create some resource. But the api server is not in NO_PROXY list. This issue is a regression issue, which is caused by https://github.com/openshift/installer/pull/2425 --- Additional comment from Scott Dodson on 2019-10-21 18:56:29 UTC --- That PR was merged in order to resolve https://bugzilla.redhat.com/show_bug.cgi?id=1762618 My opinion is that customer proxy configuration should include external api in its whitelist. --- Additional comment from Daneyon Hansen on 2019-10-21 19:10:46 UTC --- What is the reason for report-progress.sh to use the api-server's external name instead of internal name? --- Additional comment from Johnny Liu on 2019-10-22 01:07:08 UTC --- > My opinion is that customer proxy configuration should include external api in its whitelist. I think including external api in its whitelist is some kind of workaround. The potential reasonable fix is run oc command against internal api in cluster itself instead of external api. > What is the reason for report-progress.sh to use the api-server's external > name instead of internal name? The default kubeconfig for oc command is using the api-server's external name.
Verified this bug with 4.2.0-0.nightly-2019-11-24-111327, and PASS.

On bootstrap:

# env|grep -i proxy |grep api
NO_PROXY=.cluster.local,.svc,.us-east-2.compute.internal,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.jialiu421.qe.devcluster.openshift.com,etcd-0.jialiu421.qe.devcluster.openshift.com,etcd-1.jialiu421.qe.devcluster.openshift.com,etcd-2.jialiu421.qe.devcluster.openshift.com,localhost,test.no-proxy.com
# cat /opt/openshift/auth/kubeconfig|grep api
    server: https://api-int.jialiu421.qe.devcluster.openshift.com:6443
*** Bug 1776767 has been marked as a duplicate of this bug. ***
During testing, found another issue - bz#1776767. In that bug, I found proxies.config.openshift.io still has the api listed in NO_PROXY.

# oc get proxy cluster -o jsonpath='{.status.noProxy}'
.cluster.local,.svc,.us-east-2.compute.internal,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.jialiu425.qe.devcluster.openshift.com,api.jialiu425.qe.devcluster.openshift.com,etcd-0.jialiu425.qe.devcluster.openshift.com,etcd-1.jialiu425.qe.devcluster.openshift.com,etcd-2.jialiu425.qe.devcluster.openshift.com,localhost,test.no-proxy.com

That makes me remember https://bugzilla.redhat.com/show_bug.cgi?id=1758663#c8. It means that to fix this bug, the PRs for https://bugzilla.redhat.com/show_bug.cgi?id=1758663 and https://bugzilla.redhat.com/show_bug.cgi?id=1758656 also need to be merged together; only when all three PRs are merged may this bug be fixed.
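An illustrative check for this leftover entry (a one-liner assumption on my part, not from the original comment; expect no output once all three fixes have landed):

# List any external api.* entries still present in status.noProxy.
oc get proxy cluster -o jsonpath='{.status.noProxy}' | tr ',' '\n' | grep '^api\.'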
Verified this bug with 4.2.0-0.nightly-2019-11-27-004055, and PASS.

On bootstrap:

# env|grep api
NO_PROXY=.cluster.local,.svc,.us-east-2.compute.internal,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.jialiu42.qe.devcluster.openshift.com,etcd-0.jialiu42.qe.devcluster.openshift.com,etcd-1.jialiu42.qe.devcluster.openshift.com,etcd-2.jialiu42.qe.devcluster.openshift.com,localhost,test.no-proxy.com

In cluster:

# oc get proxy cluster -o jsonpath='{.status.noProxy}'|grep api
.cluster.local,.svc,.us-east-2.compute.internal,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.jialiu42.qe.devcluster.openshift.com,etcd-0.jialiu42.qe.devcluster.openshift.com,etcd-1.jialiu42.qe.devcluster.openshift.com,etcd-2.jialiu42.qe.devcluster.openshift.com,localhost,test.no-proxy.com

But 4.2.0-0.nightly-2019-11-27-004055 is marked as `Rejected` in https://openshift-release.svc.ci.openshift.org/; once another acceptable build shows up, I will re-test.
Re-tested this bug with 4.2.0-0.nightly-2019-11-27-102509, and PASS with the same result as comment 5.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:3953