Description of problem:
RHV engine + 1 hypervisor, OSE master + 2 nodes; bare metal deployment on Supermicro machines. In the past, this setup successfully completed OSE deployments with 1 node.

From ansible.log:
'fatal: [rhev-ose-ose-master1.example.com]: FAILED! => {"changed": false, "failed": true, "msg": "Unable to start service atomic-openshift-node: Job for atomic-openshift-node.service failed because the control process exited with error code. See \"systemctl status atomic-openshift-node.service\" and \"journalctl -xe\" for details.\n"}'

RHV is online and I can ssh to the OSE master node. However, I can't see the reason for the failure. Attaching logs.

Version-Release number of selected component (if applicable):
QCI-1.0-RHEL-7-20160830.t.0

How reproducible:
Happened to me once

Steps to Reproduce:
1. Start a deployment of OSE with two nodes on top of RHV with one hypervisor
2. OSE deployment fails at 85%

Actual results:
Deployment of OSE failed

Expected results:
Deployment of OSE successful

Additional info:
Created attachment 1196395: deployment.log
Created attachment 1196396: ansible.log
After resuming the failing task in Satellite, it fails again:
'failed: [rhev-ose-ose-master1.example.com] => {"changed": true, "cmd": "atomic-openshift-installer -u -c /tmp/atomic-openshift-installer.answers.cfg.yml install", "delta": "0:00:22.073518", "end": "2016-08-31 14:06:55.753432", "rc": 1, "start": "2016-08-31 14:06:33.679914", "warnings": []}'

The atomic-openshift-node.service service is running on the master.
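Both failure lines quoted so far have the same shape: a status word, the host in square brackets, and a JSON result. When looking through ansible.log for the reason of a failure, it can help to pull those fields out mechanically. The following is only a minimal sketch, not part of the installer or of Satellite: the function name parse_failure is hypothetical, and it assumes each result sits on a single line with a valid JSON payload, as in the lines quoted in this report.

```python
import json
import re

# Matches result lines of the two forms quoted in this report:
#   fatal: [host]: FAILED! => {json}
#   failed: [host] => {json}
# Assumption: the whole result is on one line and the payload is valid JSON.
RESULT_LINE = re.compile(
    r"(?:fatal|failed): \[(?P<host>[^\]]+)\][^{]*(?P<payload>\{.*\})"
)


def parse_failure(line):
    """Return (host, result_dict) for a failure line, or None if it does not match."""
    match = RESULT_LINE.search(line)
    if match is None:
        return None
    try:
        payload = json.loads(match.group("payload"))
    except json.JSONDecodeError:
        # The line looked like a failure, but the payload was not parseable JSON.
        return None
    return match.group("host"), payload


if __name__ == "__main__":
    # The line quoted in the comment above, used as sample input.
    sample = (
        'failed: [rhev-ose-ose-master1.example.com] => {"changed": true, '
        '"cmd": "atomic-openshift-installer -u -c '
        '/tmp/atomic-openshift-installer.answers.cfg.yml install", '
        '"delta": "0:00:22.073518", "end": "2016-08-31 14:06:55.753432", '
        '"rc": 1, "start": "2016-08-31 14:06:33.679914", "warnings": []}'
    )
    host, result = parse_failure(sample)
    # Print the host, the return code, and whichever of "msg" or "cmd" is present.
    print(host, result.get("rc"), result.get("msg") or result.get("cmd"))
```

Note that the second failure line carries only the command and its return code, with no "msg" field, so the underlying error still has to come from the installer's own output or from the service journal on the master.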
https://github.com/fusor/fusor/pull/1200

This PR addresses the issue where attempting to resume the task failed. The failed installation itself is an issue with atomic-openshift-installer and not a bug on our side.
The fix is in the compose as of 8/31.
Verified in QCI-1.0-RHEL-7-20160908.1. I'm now able to resume the task in Satellite.