Bug 2054916 - OpenShift 4.6 provision failures on GCP with ocp/stable-4.6
Summary: OpenShift 4.6 provision failures on GCP with ocp/stable-4.6
Keywords:
Status: CLOSED DUPLICATE of bug 2054914
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.6
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: aos-install
QA Contact: Gaoyun Pei
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-02-16 01:48 UTC by Shane Bostick
Modified: 2022-02-16 03:30 UTC (History)
CC List: 0 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-02-16 03:30:35 UTC
Target Upstream Version:
Embargoed:


Attachments
.openshift_install.log (175.66 KB, text/plain)
2022-02-16 01:49 UTC, Shane Bostick
Full artifacts dump from oc adm must-gather (15.67 MB, application/gzip)
2022-02-16 02:08 UTC, Shane Bostick

Description Shane Bostick 2022-02-16 01:48:32 UTC
Version:

$ openshift-install version

openshift-install 4.6.48
built from commit 1cfb1b32f5aaf0dfe0fb2ea9da41c710da9b2c76
release image quay.io/openshift-release-dev/ocp-release@sha256:6f03d6ced979d6f6fd10b6a54529c186e3f83c0ecf3e2b910d01505d2f59037a

(from https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable-4.6/openshift-install-linux.tar.gz)

Platform:

GCP

Please specify:
* IPI

install-config.yaml:
```
apiVersion: v1
baseDomain: ${BASE_DOMAIN}
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform:
    gcp:
      type: ${WORKER_NODE_TYPE}
  replicas: ${WORKER_NODE_COUNT}
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform:
    gcp:
      type: ${MASTER_NODE_TYPE}
  replicas: ${MASTER_NODE_COUNT}
metadata:
  creationTimestamp: null
  name: ${CLUSTER_NAME}
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
platform:
  gcp:
    projectID: ${PROJECT}
    region: ${REGION}
publish: External
pullSecret: |
  ${PULL_SECRET}
sshKey: |
  ${SSH_KEY}
```

What happened?

Bootstrap and master nodes come up but no worker nodes.
Ingress component fails to reach ready.
The installer binary has not changed.
Neither has the way we are invoking it.
Suspect possible change to GCP API.

# Always at least include the `.openshift_install.log`

What did you expect to happen?

Cluster provisioning to complete on GCP.
This was working but recently started failing.

How to reproduce it (as minimally and precisely as possible)?

This is part of ACS testing...
https://github.com/stackrox/automation-flavors/blob/master/openshift-4/entrypoint.sh
https://github.com/stackrox/automation-flavors/blob/master/openshift-4/install-config.yaml
Those are probably private but this is essentially what we do:
```
create() {
    if [ -n "${USER_PULL_SECRET-}" ]; then
        echo "The pull secret was overridden with a user supplied value."
        PULL_SECRET="${USER_PULL_SECRET}"
        export PULL_SECRET
    fi
    MASTER_NODE_COUNT="${MASTER_NODE_COUNT:-3}"
    MASTER_NODE_TYPE="${MASTER_NODE_TYPE:-3}"
    WORKER_NODE_COUNT="${WORKER_NODE_COUNT:-3}"
    WORKER_NODE_TYPE="${WORKER_NODE_TYPE:-3}"
    REGION="${REGION:-us-east1}"

    echo ">>> Generating an SSH key pair."
    yes | ssh-keygen -t rsa -f /data/id_rsa -C '' -N ''
    chmod 0600 /data/id_rsa /data/id_rsa.pub
    read -r SSH_KEY < /data/id_rsa.pub
    export SSH_KEY

    echo ">>> Creating cluster install config."
    envsubst < /cluster-create/install-config.yaml > /data/install-config.yaml

    echo ">>> Creating the cluster."
    cd /data
    if ! openshift-install create cluster --log-level=debug > /dev/null 2>&1; then
        destroy
        echo ">>> ERROR: The create failed."
        echo "/data/.openshift_install.log:"
        sed '/Login to the console with user/s/password.*/password from file/' < /data/.openshift_install.log
        exit 1
    fi
    echo ">>> Cluster created."
    export KUBECONFIG=/data/auth/kubeconfig

    local OPENSHIFT_CONSOLE_URL OPENSHIFT_CONSOLE_LOGIN_STR OPENSHIFT_CONSOLE_USERNAME OPENSHIFT_CONSOLE_PASSWORD
    OPENSHIFT_CONSOLE_URL="https://console-openshift-console.apps.${CLUSTER_NAME}.${BASE_DOMAIN}"
    OPENSHIFT_CONSOLE_LOGIN_STR=$(grep -Eo 'Login to the console with user.*"' .openshift_install.log \
                | tail -1 | sed -e 's/\\"/"/g' | sed -e 's/"$//')
    OPENSHIFT_CONSOLE_USERNAME=$(perl -lne '/user: "(\w+)"/ and print $1' <<<"$OPENSHIFT_CONSOLE_LOGIN_STR")
    OPENSHIFT_CONSOLE_PASSWORD=$(perl -lne '/password: "([\w-]+)"/ and print $1' <<<"$OPENSHIFT_CONSOLE_LOGIN_STR")

    echo "$OPENSHIFT_CONSOLE_URL" > /data/url
    cat > /data/dotenv <<EOF
CLUSTER_NAME="$CLUSTER_NAME"
REGION="$REGION"
OPENSHIFT_VERSION="$OPENSHIFT_VERSION"
OPENSHIFT_CONSOLE_URL="$OPENSHIFT_CONSOLE_URL"
OPENSHIFT_CONSOLE_USERNAME="$OPENSHIFT_CONSOLE_USERNAME"
OPENSHIFT_CONSOLE_PASSWORD="$OPENSHIFT_CONSOLE_PASSWORD"
EOF

    echo ">>> Test cluster & kubeconfig"
    oc get nodes -o wide

    echo ">>> Deploy a bastion pod for SSH access"
    curl https://raw.githubusercontent.com/eparis/ssh-bastion/master/deploy/deploy.sh | bash

    echo ">>> Give the user some SSH help"
    cluster_name_prefix=$(cut -b1-21 <<<"$CLUSTER_NAME")
    gcp_instances_filter="name~${cluster_name_prefix}-.*"
    instances_table=$(gcloud compute instances list --project="$PROJECT" --filter="$gcp_instances_filter" | sort)
    ssh_commands=$(gcloud compute instances list --project="$PROJECT" --filter="$gcp_instances_filter" \
        --format json | jq -r '.[].name' | awk '{ printf "./data/ssh.sh %s\n", $1 }' | sort -k2)
    export instances_table ssh_commands PROJECT gcp_instances_filter
    envsubst < /cluster-create/SSH_ACCESS.md > /data/SSH_ACCESS.md
    cp /usr/bin/ssh-via-bastion.sh /data/ssh.sh
}
```
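
The console-credential extraction inside `create()` can be exercised on its own, without a cluster. The sketch below is a minimal illustration, not part of the original script: the sample log line is an assumption modeled on the pattern the script greps for, the password is a placeholder, and the two `perl` one-liners are replaced with `sed` equivalents so that only `grep`, `tail`, and `sed` are needed.

```shell
#!/usr/bin/env bash
# Assumed sample line from .openshift_install.log; the password is a placeholder.
sample_log='time="2022-02-16T01:40:00Z" level=info msg="Login to the console with user: \"kubeadmin\", and password: \"aaaaa-bbbbb-ccccc-ddddd\""'

# Same pipeline as in create(): keep the matching tail of the line,
# unescape \" to ", then drop the trailing quote that closed msg="...".
login_str=$(printf '%s\n' "$sample_log" \
    | grep -Eo 'Login to the console with user.*"' \
    | tail -1 | sed -e 's/\\"/"/g' | sed -e 's/"$//')

# sed stand-ins for the perl one-liners (\w becomes [[:alnum:]_]).
username=$(sed -nE 's/.*user: "([[:alnum:]_]+)".*/\1/p' <<<"$login_str")
password=$(sed -nE 's/.*password: "([[:alnum:]_-]+)".*/\1/p' <<<"$login_str")

printf 'user=%s password=%s\n' "$username" "$password"
```

If the installer fails before it logs the login line, `grep` matches nothing and both variables end up empty, so the `dotenv` file would be written with blank credentials.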

Anything else we need to know?

Tracking initial investigation through resolution on the ACS side here:
https://issues.redhat.com/browse/ROX-9228

This is an important platform for our existing ACS customers.
(i.e. openshift ocp/stable-4.6 running in GCP)

Comment 1 Shane Bostick 2022-02-16 01:49:50 UTC
Created attachment 1861364
.openshift_install.log

Comment 2 Shane Bostick 2022-02-16 02:08:05 UTC
Created attachment 1861368
Full artifacts dump from oc adm must-gather

Comment 3 Scott Dodson 2022-02-16 03:30:35 UTC

*** This bug has been marked as a duplicate of bug 2054914 ***

