Bug 1420819 - OCP HA fails at 90%: "Flag --credentials has been deprecated" and "Timeout: request did not complete within allowed duration"
Alias: None
Product: Red Hat Quickstart Cloud Installer
Classification: Red Hat
Component: Installation - OpenShift
Version: 1.1
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Dylan Murray
QA Contact: Sudhir Mallamprabhakara
Depends On:
Reported: 2017-02-09 15:03 UTC by Antonin Pagac
Modified: 2020-01-08 16:38 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Last Closed:
Target Upstream Version:

Attachments
ansible.log (8.23 MB, text/plain)
2017-02-09 15:03 UTC, Antonin Pagac

Description Antonin Pagac 2017-02-09 15:03:23 UTC
Created attachment 1248833

Description of problem:
OCP HA failed at 90% with:

'2017-02-09 09:38:07,256 p=29949 u=foreman |  TASK [Create router for CFME to access data] ***********************************
2017-02-09 09:38:39,357 p=29949 u=foreman |  fatal: [ocpha-ocp-master1.example.com]: FAILED! => {"changed": true, "cmd": ["oadm", "router", "management-metrics", "-n", "default", "--credentials=/etc/origin/master/openshift-router.kubeconfig", "--service-account=router", "--ports=443:5000", "--selector=kubernetes.io/hostname=ocpha-ocp-master1.example.com", "--stats-port=1937", "--host-network=false", "--images=sat62fusor.example.com:5000/default_organization-fusor-ose-haproxy-router"], "delta": "0:00:31.849752", "end": "2017-02-09 09:38:38.698170", "failed": true, "rc": 1, "start": "2017-02-09 09:38:06.848418", "stderr": "Flag --credentials has been deprecated, use --service-account to specify the service account the router will use to make API calls\ninfo: password for stats user admin has been set to zqjOndtqsQ\n    error: Timeout: request did not complete within allowed duration", "stdout": "--> Creating router management-metrics ...\n    deploymentconfig \"management-metrics\" created\n--> Failed", "stdout_lines": ["--> Creating router management-metrics ...", "    deploymentconfig \"management-metrics\" created", "--> Failed"], "warnings": []}'

This was a clean install of Satellite and the first deployment. Also note that the stats user password appears in plaintext in the log output.
When resumed, there is another error:

'2017-02-09 09:59:34,454 p=31585 u=foreman |  TASK [Create registry serviceaccount] ******************************************
2017-02-09 09:59:35,872 p=31585 u=foreman |  fatal: [ocpha-ocp-master1.example.com]: FAILED! => {"changed": true, "cmd": ["oc", "create", "serviceaccount", "registry"], "delta": "0:00:01.122821", "end": "2017-02-09 09:59:35.217439", "failed": true, "rc": 1, "start": "2017-02-09 09:59:34.094618", "stderr": "Error from server: serviceaccounts \"registry\" already exists", "stdout": "", "stdout_lines": [], "warnings": []}'

Attaching ansible.log.
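
Regarding the plaintext stats password in the first failure above: a minimal sketch of masking it before a log excerpt is shared. This is only an illustration, not part of the installer; the function name `redact_stats_password` and the mask string are hypothetical, and the pattern assumes the message wording seen in the stderr of that failure.

```python
import re

# Matches the message seen in the router task's stderr and captures everything
# up to (but not including) the password. The password ends at whitespace, a
# backslash (the JSON-escaped "\n" in ansible.log), or a double quote.
_STATS_PASSWORD = re.compile(
    r'(password for stats user \S+ has been set to )[^\s\\"]+'
)


def redact_stats_password(line: str) -> str:
    """Return the line with the stats password replaced by a fixed mask."""
    return _STATS_PASSWORD.sub(r"\1********", line)
```

Lines that do not contain the message are returned unchanged, so the function can be applied to every line of a log.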

Version-Release number of selected component (if applicable):

How reproducible:
Happened to me once

Steps to Reproduce:
1. Deploy OCP HA on baremetal machines

Actual results:
OCP HA failed at 90%

Expected results:
OCP HA successful

Additional info:

Comment 1 Dylan Murray 2017-02-09 16:56:26 UTC

Added logic so that the task can be resumed in this scenario. The images are also no longer pulled from the CDN, which should prevent the network timeouts.
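
The actual change is not shown in this report. As a rough sketch of one way a create step can be made safe to resume, the following treats the "already exists" error from the description as a non-fatal outcome. The function names and the injectable `run` parameter are assumptions made for illustration; only the `oc create serviceaccount registry` command and its error text come from the log above.

```python
import subprocess


def is_already_exists(stderr: str) -> bool:
    """Return True when the server rejected a create because the object exists."""
    return "already exists" in stderr


def create_service_account(name: str, run=subprocess.run) -> bool:
    """Create a service account.

    Returns True if it was created and False if it already existed.
    Any other failure raises RuntimeError. `run` is injectable (hypothetical
    design choice) so the logic can be exercised without a cluster.
    """
    result = run(
        ["oc", "create", "serviceaccount", name],
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:
        return True
    if is_already_exists(result.stderr):
        # An earlier, interrupted run already created it: safe to continue.
        return False
    raise RuntimeError(
        f"oc create serviceaccount {name} failed: {result.stderr.strip()}"
    )
```

With this shape, a resumed run that hits the second error from the description would continue instead of failing, while an unrelated error such as a timeout would still stop the deployment.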

Comment 2 Dylan Murray 2017-02-09 18:24:28 UTC
This PR made it into QCI-1.1-RHEL-7-20170209.1.0

Comment 3 Antonin Pagac 2017-02-10 13:08:35 UTC
Verified in QCI-1.1-RHEL-7-20170209.t.0.
