Created attachment 1248833 [details] ansible.log Description of problem: OCP HA failed at 90% with: '2017-02-09 09:38:07,256 p=29949 u=foreman | TASK [Create router for CFME to access data] *********************************** 2017-02-09 09:38:39,357 p=29949 u=foreman | fatal: [ocpha-ocp-master1.example.com]: FAILED! => {"changed": true, "cmd": ["oadm", "router", "management-metrics", "-n", "default", "--credentials=/etc/origin/master/openshift-router.kubeconfig", "--service-account=router", "--ports=443:5000", "--selector=kubernetes.io/hostname=ocpha-ocp-master1.example.com", "--stats-port=1937", "--host-network=false", "--images=sat62fusor.example.com:5000/default_organization-fusor-ose-haproxy-router"], "delta": "0:00:31.849752", "end": "2017-02-09 09:38:38.698170", "failed": true, "rc": 1, "start": "2017-02-09 09:38:06.848418", "stderr": "Flag --credentials has been deprecated, use --service-account to specify the service account the router will use to make API calls\ninfo: password for stats user admin has been set to zqjOndtqsQ\n error: Timeout: request did not complete within allowed duration", "stdout": "--> Creating router management-metrics ...\n deploymentconfig \"management-metrics\" created\n--> Failed", "stdout_lines": ["--> Creating router management-metrics ...", " deploymentconfig \"management-metrics\" created", "--> Failed"], "warnings": []}' Clean install of Satellite, first deployment. Also notice the password in plaintext. When resumed, there is another error: '2017-02-09 09:59:34,454 p=31585 u=foreman | TASK [Create registry serviceaccount] ****************************************** 2017-02-09 09:59:35,872 p=31585 u=foreman | fatal: [ocpha-ocp-master1.example.com]: FAILED! => {"changed": true, "cmd": ["oc", "create", "serviceaccount", "registry"], "delta": "0:00:01.122821", "end": "2017-02-09 09:59:35.217439", "failed": true, "rc": 1, "start": "2017-02-09 09:59:34.094618", "stderr": "Error from server: serviceaccounts \"registry\" already exists", "stdout": "", "stdout_lines": [], "warnings": []}' Attaching ansible.log. Version-Release number of selected component (if applicable): QCI-1.1-RHEL-7-20170208.t.0 How reproducible: Happened to me once Steps to Reproduce: 1. Deploy OCP HA on baremetal machines 2. 3. Actual results: OCP HA failed at 90% Expected results: OCP HA successful Additional info:
https://github.com/fusor/ansible-ocp/pull/20 Added some logic so that you are able to resume the task in this scenario again and now you won't have to go to the CDN for those images which should prevent network timeouts.
This PR made it into QCI-1.1-RHEL-7-20170209.1.0
Verified in QCI-1.1-RHEL-7-20170209.t.0.