Version:

$ openshift-install version
openshift-install 4.6.48
built from commit 1cfb1b32f5aaf0dfe0fb2ea9da41c710da9b2c76
release image quay.io/openshift-release-dev/ocp-release@sha256:6f03d6ced979d6f6fd10b6a54529c186e3f83c0ecf3e2b910d01505d2f59037a

(from https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable-4.6/openshift-install-linux.tar.gz)

Platform: GCP

Please specify:
* IPI

install-config.yaml:
```
apiVersion: v1
baseDomain: ${BASE_DOMAIN}
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform:
    gcp:
      type: ${WORKER_NODE_TYPE}
  replicas: ${WORKER_NODE_COUNT}
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform:
    gcp:
      type: ${MASTER_NODE_TYPE}
  replicas: ${MASTER_NODE_COUNT}
metadata:
  creationTimestamp: null
  name: ${CLUSTER_NAME}
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
platform:
  gcp:
    projectID: ${PROJECT}
    region: ${REGION}
publish: External
pullSecret: |
  ${PULL_SECRET}
sshKey: |
  ${SSH_KEY}
```

What happened?

The bootstrap and master nodes come up, but no worker nodes do. The ingress component never becomes ready. Neither the installer binary nor the way we invoke it has changed, so we suspect a change in the GCP API.

# Always at least include the `.openshift_install.log`

What did you expect to happen?

Cluster provisioning to complete on GCP. This was working but recently started failing.

How to reproduce it (as minimally and precisely as possible)?

This is part of ACS testing:

https://github.com/stackrox/automation-flavors/blob/master/openshift-4/entrypoint.sh
https://github.com/stackrox/automation-flavors/blob/master/openshift-4/install-config.yaml

Those are probably private, but this is essentially what we do:

```
create() {
    if [ -n "${USER_PULL_SECRET-}" ]; then
        echo "The pull secret was overriden with a user supplied value."
        PULL_SECRET="${USER_PULL_SECRET}"
        export PULL_SECRET
    fi

    MASTER_NODE_COUNT="${MASTER_NODE_COUNT:-3}"
    MASTER_NODE_TYPE="${MASTER_NODE_TYPE:-3}"
    WORKER_NODE_COUNT="${WORKER_NODE_COUNT:-3}"
    WORKER_NODE_TYPE="${WORKER_NODE_TYPE:-3}"
    REGION="${REGION:-us-east1}"

    echo ">>> Generating an SSH key pair."
    yes | ssh-keygen -t rsa -f /data/id_rsa -C '' -N ''
    chmod 0600 /data/id_rsa /data/id_rsa.pub
    read -r SSH_KEY < /data/id_rsa.pub
    export SSH_KEY

    echo ">>> Creating cluster install config."
    envsubst < /cluster-create/install-config.yaml > /data/install-config.yaml

    echo ">>> Creating the cluster."
    cd /data
    if ! openshift-install create cluster --log-level=debug > /dev/null 2>&1; then
        destroy
        echo ">>> ERROR: The create failed."
        echo "/data/.openshift_install.log:"
        sed '/Login to the console with user/s/password.*/password from file/' < /data/.openshift_install.log
        exit 1
    fi
    echo ">>> Cluster created."

    export KUBECONFIG=/data/auth/kubeconfig

    local OPENSHIFT_CONSOLE_URL OPENSHIFT_CONSOLE_LOGIN_STR OPENSHIFT_CONSOLE_USERNAME OPENSHIFT_CONSOLE_PASSWORD
    OPENSHIFT_CONSOLE_URL="https://console-openshift-console.apps.${CLUSTER_NAME}.${BASE_DOMAIN}"
    OPENSHIFT_CONSOLE_LOGIN_STR=$(grep -Eo 'Login to the console with user.*"' .openshift_install.log \
        | tail -1 | sed -e 's/\\"/"/g' | sed -e 's/"$//')
    OPENSHIFT_CONSOLE_USERNAME=$(perl -lne '/user: "(\w+)"/ and print $1' <<<"$OPENSHIFT_CONSOLE_LOGIN_STR")
    OPENSHIFT_CONSOLE_PASSWORD=$(perl -lne '/password: "([\w-]+)"/ and print $1' <<<"$OPENSHIFT_CONSOLE_LOGIN_STR")

    echo "$OPENSHIFT_CONSOLE_URL" > /data/url

    cat > /data/dotenv <<EOF
CLUSTER_NAME="$CLUSTER_NAME"
REGION="$REGION"
OPENSHIFT_VERSION="$OPENSHIFT_VERSION"
OPENSHIFT_CONSOLE_URL="$OPENSHIFT_CONSOLE_URL"
OPENSHIFT_CONSOLE_USERNAME="$OPENSHIFT_CONSOLE_USERNAME"
OPENSHIFT_CONSOLE_PASSWORD="$OPENSHIFT_CONSOLE_PASSWORD"
EOF

    echo ">>> Test cluster & kubeconfig"
    oc get nodes -o wide

    echo ">>> Deploy a bastion pod for SSH access"
    curl https://raw.githubusercontent.com/eparis/ssh-bastion/master/deploy/deploy.sh | bash

    echo ">>> Give the user some SSH help"
    cluster_name_prefix=$(cut -b1-21 <<<"$CLUSTER_NAME")
    gcp_instances_filter="name~${cluster_name_prefix}-.*"
    instances_table=$(gcloud compute instances list --project="$PROJECT" --filter="$gcp_instances_filter" | sort)
    ssh_commands=$(gcloud compute instances list --project="$PROJECT" --filter="$gcp_instances_filter" \
        --format json | jq -r '.[].name' | awk '{ printf "./data/ssh.sh %s\n", $1 }' | sort -k2)
    export instances_table ssh_commands PROJECT gcp_instances_filter
    envsubst < /cluster-create/SSH_ACCESS.md > /data/SSH_ACCESS.md
    cp /usr/bin/ssh-via-bastion.sh /data/ssh.sh
}
```

Anything else we need to know?

We are tracking this on the ACS side, from initial investigation through resolution, here: https://issues.redhat.com/browse/ROX-9228

This is an important platform for our existing ACS customers (i.e. OpenShift ocp/stable-4.6 running on GCP).
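For anyone triaging, here is a minimal sketch of how the symptom described above (masters present, no workers) could be confirmed once `KUBECONFIG` points at `/data/auth/kubeconfig`. The helper name `count_nodes_with_role` is hypothetical and is not part of the entrypoint.sh quoted above; it assumes the default column layout of `oc get nodes --no-headers` (NAME, STATUS, ROLES, AGE, VERSION).

```shell
# Hypothetical helper, not part of the original entrypoint.sh.
# Reads `oc get nodes --no-headers` output on stdin and prints how many
# nodes have the given role in their ROLES column (third field).
# Assumption: default `oc get nodes` column order NAME STATUS ROLES AGE VERSION.
count_nodes_with_role() {
    awk -v role="$1" '$3 ~ role { n++ } END { print n + 0 }'
}
```

It would be used as `oc get nodes --no-headers | count_nodes_with_role worker`; a result of 0 matches the failure reported here. In that case `oc get machines -n openshift-machine-api -o wide` and `oc get clusteroperators` are the usual next places to look for why worker machines were not provisioned.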
Created attachment 1861364 [details] .openshift_install.log
Created attachment 1861368 [details] Full artifacts dump from oc adm must-gather
*** This bug has been marked as a duplicate of bug 2054914 ***