Bug 1421040

Summary: OCP HA fails at 90%: Timeout when creating Hello World project
Product: Red Hat Quickstart Cloud Installer
Reporter: Antonin Pagac <apagac>
Component: Installation - OpenShift
Assignee: John Matthews <jmatthew>
Status: NEW ---
QA Contact: Sudhir Mallamprabhakara <smallamp>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 1.1
CC: bthurber, qci-bugzillas
Target Milestone: ---
Keywords: Triaged
Target Release: 1.2
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Antonin Pagac 2017-02-10 08:31:09 UTC
Description of problem:
This is a resumable error; I'm opening a bug to have it logged.

OCP HA fails at 90%, from ansible.log:

'2017-02-09 19:53:03,716 p=28674 u=foreman |  TASK [Create Hello World project] **********************************************
2017-02-09 19:53:34,091 p=28674 u=foreman |  fatal: [ocpha-ocp-master1.example.com]: FAILED! => {"changed": true, "cmd": ["oc", "new-project", "helloworld"], "delta": "0:00:30.125950", "end": "2017-02-09 19:53:33.407272", "failed": true, "rc": 1, "start": "2017-02-09 19:53:03.281322", "stderr": "Error from server: Timeout: request did not complete within allowed duration", "stdout": "", "stdout_lines": [], "warnings": []}'

I resumed the task and deployment of OCP HA finished successfully.

Version-Release number of selected component (if applicable):
QCI-1.1-RHEL-7-20170209.t.0

How reproducible:
Happened to me once

Steps to Reproduce:
1. Install QCI from an ISO
2. Kick off OCP HA as a first deployment
3. Error appears at 90%

Actual results:
Error appears at 90%

Expected results:
No error appears

Additional info:
Bare-metal environment; it might help to increase the timeout period a bit.

Comment 1 John Matthews 2017-02-10 16:32:11 UTC
We believe this problem is related to the environment. We have previously seen issues on this test setup where s2i took longer than expected, apparently because of networking problems.

The timeout is coming from the OCP server process. We aren't aware of a way to increase the timeout for that call, but we might be able to add logic to retry the operation.
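
For illustration only, the retry idea could look roughly like the Python sketch below. This is not the installer's actual code: the helper name run_with_retry, the attempt count, and the delay are all assumptions made up for this example, and the installer itself drives the step through Ansible rather than a script like this.

```python
import subprocess
import time


def run_with_retry(cmd, attempts=3, delay_seconds=10.0,
                   runner=subprocess.run, sleep=time.sleep):
    """Run cmd until it exits 0 or the attempts are used up.

    Hypothetical helper: the name and defaults are illustrative only.
    Returns the result object of the last attempt.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    result = None
    for attempt in range(1, attempts + 1):
        # Capture output so the caller can log stderr, as ansible.log does.
        result = runner(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            break
        # Wait before the next try, but not after the final failure.
        if attempt < attempts:
            sleep(delay_seconds)
    return result
```

Calling run_with_retry(["oc", "new-project", "helloworld"]) would rerun the failing command up to three times. One caveat with any retry here: if the first request timed out on the client side but still completed on the server, a later attempt may fail because the project already exists, so a real fix would probably need to check for that case as well.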