Bug 1752380 - [ci][gcp] Creating Image is timeout
Summary: [ci][gcp] Creating Image is timeout
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.3.0
Assignee: Abhinav Dahiya
QA Contact: Johnny Liu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-09-16 08:40 UTC by sheng.lao
Modified: 2019-09-30 23:18 UTC (History)
0 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-09-30 23:18:40 UTC
Target Upstream Version:
Embargoed:



Description sheng.lao 2019-09-16 08:40:47 UTC
Description of problem:
https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-gcp-4.2/279

level=error
level=error msg="Error: Error waiting to create Image: Error waiting for Creating Image: timeout while waiting for state to become 'DONE' (last state: 'RUNNING', timeout: 4m0s)"
level=error
level=error msg=" on ../tmp/openshift-install-893060457/main.tf line 83, in resource \"google_compute_image\" \"cluster\":"
level=error msg=" 83: resource \"google_compute_image\" \"cluster\" {"
level=error
level=error
level=fatal msg="failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: failed to apply using Terraform"


Version-Release number of the following components:
rpm -q openshift-ansible
rpm -q ansible
ansible --version

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
Please include the entire output from the last TASK line through the end of output if an error is generated

Expected results:

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 Abhinav Dahiya 2019-09-16 16:12:16 UTC
I did a rudimentary check of how long image creation takes on GCP, using the installer logs from CI runs 250 through 278.

for i in {250..278}; do curl -s "https://storage.googleapis.com/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-gcp-4.2/$i/artifacts/e2e-gcp/installer/.openshift_install.log" | rg "google_compute_image.cluster: Creation complete after"; done
time="2019-09-12T15:54:30Z" level=debug msg="google_compute_image.cluster: Creation complete after 1m18s [id=ci-op--khdsx-rhcos-image]"
time="2019-09-12T16:08:05Z" level=debug msg="google_compute_image.cluster: Creation complete after 1m18s [id=ci-op--89ctl-rhcos-image]"
time="2019-09-12T16:35:09Z" level=debug msg="google_compute_image.cluster: Creation complete after 1m8s [id=ci-op--l7wmk-rhcos-image]"
time="2019-09-12T19:37:12Z" level=debug msg="google_compute_image.cluster: Creation complete after 1m7s [id=ci-op--76swx-rhcos-image]"
time="2019-09-12T23:33:50Z" level=debug msg="google_compute_image.cluster: Creation complete after 1m18s [id=ci-op--9sgtg-rhcos-image]"
time="2019-09-13T01:13:05Z" level=debug msg="google_compute_image.cluster: Creation complete after 1m17s [id=ci-op--zskrz-rhcos-image]"
time="2019-09-13T03:32:42Z" level=debug msg="google_compute_image.cluster: Creation complete after 1m7s [id=ci-op--tgftn-rhcos-image]"
time="2019-09-13T04:13:52Z" level=debug msg="google_compute_image.cluster: Creation complete after 1m18s [id=ci-op--txc29-rhcos-image]"
time="2019-09-13T07:51:34Z" level=debug msg="google_compute_image.cluster: Creation complete after 1m13s [id=ci-op--5hmsk-rhcos-image]"
time="2019-09-13T11:51:48Z" level=debug msg="google_compute_image.cluster: Creation complete after 1m7s [id=ci-op--p8tkc-rhcos-image]"
time="2019-09-13T15:27:46Z" level=debug msg="google_compute_image.cluster: Creation complete after 1m17s [id=ci-op--wd9jl-rhcos-image]"
time="2019-09-13T19:26:07Z" level=debug msg="google_compute_image.cluster: Creation complete after 1m18s [id=ci-op--jkpnq-rhcos-image]"
time="2019-09-13T20:11:01Z" level=debug msg="google_compute_image.cluster: Creation complete after 1m8s [id=ci-op--26wgh-rhcos-image]"
time="2019-09-13T22:34:48Z" level=debug msg="google_compute_image.cluster: Creation complete after 1m18s [id=ci-op--cs5rl-rhcos-image]"
time="2019-09-14T04:29:12Z" level=debug msg="google_compute_image.cluster: Creation complete after 1m18s [id=ci-op--f4j7b-rhcos-image]"
time="2019-09-14T06:42:08Z" level=debug msg="google_compute_image.cluster: Creation complete after 1m18s [id=ci-op--b8fv8-rhcos-image]"
time="2019-09-14T10:00:46Z" level=debug msg="google_compute_image.cluster: Creation complete after 1m7s [id=ci-op--rtkww-rhcos-image]"
time="2019-09-14T13:59:45Z" level=debug msg="google_compute_image.cluster: Creation complete after 1m17s [id=ci-op--xdsx5-rhcos-image]"
time="2019-09-14T17:18:15Z" level=debug msg="google_compute_image.cluster: Creation complete after 1m18s [id=ci-op--5d9gj-rhcos-image]"
time="2019-09-14T21:17:59Z" level=debug msg="google_compute_image.cluster: Creation complete after 1m8s [id=ci-op--7kn4k-rhcos-image]"
time="2019-09-15T01:18:56Z" level=debug msg="google_compute_image.cluster: Creation complete after 1m18s [id=ci-op--wbghn-rhcos-image]"
time="2019-09-15T05:19:53Z" level=debug msg="google_compute_image.cluster: Creation complete after 1m7s [id=ci-op--c964g-rhcos-image]"
time="2019-09-15T05:33:29Z" level=debug msg="google_compute_image.cluster: Creation complete after 1m8s [id=ci-op--74njr-rhcos-image]"
time="2019-09-15T09:28:45Z" level=debug msg="google_compute_image.cluster: Creation complete after 1m8s [id=ci-op--5ctld-rhcos-image]"
time="2019-09-15T13:29:53Z" level=debug msg="google_compute_image.cluster: Creation complete after 1m8s [id=ci-op--r4l8r-rhcos-image]"
time="2019-09-15T17:29:58Z" level=debug msg="google_compute_image.cluster: Creation complete after 1m7s [id=ci-op--64s7c-rhcos-image]"
time="2019-09-15T20:32:36Z" level=debug msg="google_compute_image.cluster: Creation complete after 1m18s [id=ci-op--ndbbw-rhcos-image]"
time="2019-09-15T22:20:43Z" level=debug msg="google_compute_image.cluster: Creation complete after 1m8s [id=ci-op--42zp2-rhcos-image]"


In every run listed above the image was created in roughly 1 minute (between 1m7s and 1m18s), so a 4-minute upper bound should be enough.
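
To turn the listing above into numbers, a small helper along these lines could be piped onto the end of the loop. This is only a sketch: the function name summarize_durations is made up for illustration, and it assumes every duration is printed as "XmYs" or "Ys", as in the log lines above.

```shell
# Hypothetical helper: summarize "Creation complete after <duration>" lines from stdin.
# Assumes durations look like "1m18s" or "58s"; anything else is ignored.
summarize_durations() {
  grep -oE 'Creation complete after ([0-9]+m)?[0-9]+s' |
    awk '{
      d = $NF; m = 0                      # d is the duration token, e.g. "1m18s"
      if (d ~ /m/) { split(d, p, "m"); m = p[1]; d = p[2] }
      sub(/s$/, "", d)                    # strip the trailing "s"
      s = m * 60 + d                      # duration in seconds
      if (NR == 1 || s < min) min = s
      if (NR == 1 || s > max) max = s
      sum += s
    }
    END {
      if (NR) printf "runs=%d min=%ds max=%ds mean=%.1fs\n", NR, min, max, sum / NR
    }'
}
```

Usage would be to append `| summarize_durations` after the `done` of the loop, so the minimum, maximum, and mean can be compared directly against the 4m0s timeout.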

We have knobs to raise the timeout above 4 minutes, but I would like to see this failure occur at a more consistent rate before bumping it.
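
One way to estimate that failure rate, sketched under assumptions: the function name image_timeout_rate is hypothetical, the log files are assumed to have been downloaded locally first (one file per CI run), and the error text from the description is assumed to appear in those files.

```shell
# Hypothetical helper: count how many of the given log files contain the
# image-creation timeout error quoted in the description.
# The body runs in a subshell so its variables do not leak into the caller.
image_timeout_rate() (
  total=0
  failed=0
  for f in "$@"; do
    total=$((total + 1))
    if grep -q "Error waiting for Creating Image: timeout" "$f"; then
      failed=$((failed + 1))
    fi
  done
  echo "timeouts=$failed/$total"
)
```

For example, `image_timeout_rate logs/*.log` would print something like `timeouts=N/M`, where the logs/ directory and its contents are placeholders for wherever the run logs were saved.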

