| Summary: | OSE deployment failed at 85%, ansible: "Unable to start service atomic-openshift-node" | | |
|---|---|---|---|
| Product: | Red Hat Quickstart Cloud Installer | Reporter: | Antonin Pagac <apagac> |
| Component: | Installation - OpenShift | Assignee: | Dylan Murray <dymurray> |
| Status: | VERIFIED --- | QA Contact: | Sudhir Mallamprabhakara <smallamp> |
| Severity: | unspecified | Docs Contact: | Derek <dcadzow> |
| Priority: | unspecified | | |
| Version: | 1.0 | CC: | bthurber, dymurray, tsanders |
| Target Milestone: | ga | Keywords: | Triaged |
| Target Release: | 1.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Attachments: |
Created attachment 1196395 [details]
deployment.log
Created attachment 1196396 [details]
ansible.log
After resuming the failing task in Satellite it fails again:
'failed: [rhev-ose-ose-master1.example.com] => {"changed": true, "cmd": "atomic-openshift-installer -u -c /tmp/atomic-openshift-installer.answers.cfg.yml install", "delta": "0:00:22.073518", "end": "2016-08-31 14:06:55.753432", "rc": 1, "start": "2016-08-31 14:06:33.679914", "warnings": []}'
Service atomic-openshift-node.service is running on master.
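
For triage, the result line quoted above can be unpacked with a short script. This is a minimal sketch, assuming the log line has the `status: [host] => {json}` shape shown in this comment; `RESULT_RE` and `parse_result_line` are hypothetical names introduced here and are not part of Ansible or fusor.

```python
import json
import re
from datetime import datetime

# Assumed shapes of an Ansible result line (taken from the lines quoted in this report):
#   failed: [host] => {...}
#   fatal: [host]: FAILED! => {...}
RESULT_RE = re.compile(
    r"^(?P<status>\w+): \[(?P<host>[^\]]+)\](?:: FAILED!)? => (?P<body>\{.*\})$"
)


def parse_result_line(line):
    """Split a result line into status, host, and the decoded JSON body."""
    match = RESULT_RE.match(line.strip().strip("'"))
    if match is None:
        raise ValueError("not an Ansible result line")
    return match.group("status"), match.group("host"), json.loads(match.group("body"))


# The line reported after resuming the task in Satellite.
line = (
    'failed: [rhev-ose-ose-master1.example.com] => {"changed": true, '
    '"cmd": "atomic-openshift-installer -u -c '
    '/tmp/atomic-openshift-installer.answers.cfg.yml install", '
    '"delta": "0:00:22.073518", "end": "2016-08-31 14:06:55.753432", "rc": 1, '
    '"start": "2016-08-31 14:06:33.679914", "warnings": []}'
)

status, host, result = parse_result_line(line)

# Recompute the run time from the start/end timestamps as a sanity check on "delta".
fmt = "%Y-%m-%d %H:%M:%S.%f"
elapsed = datetime.strptime(result["end"], fmt) - datetime.strptime(result["start"], fmt)

print(status, host, "rc =", result["rc"], "elapsed =", elapsed)
```

The non-zero `rc` and the roughly 22-second run time show that the installer command exited quickly with an error. The line carries no error text, so the cause still has to come from the attached logs.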
https://github.com/fusor/fusor/pull/1200
This PR addresses the issue where attempting to resume the task failed. The failed installation is an issue with the atomic-openshift-installer itself and not a bug for us.
In the compose as of 8/31
Verified in QCI-1.0-RHEL-7-20160908.1. I'm now able to resume the task in Satellite.
Description of problem:
RHV engine + 1 hypervisor, OSE master + 2 nodes; bare metal deployment with Supermicro machines. In the past, this setup completed deployments of OSE with 1 node successfully.
From ansible.log:
'fatal: [rhev-ose-ose-master1.example.com]: FAILED! => {"changed": false, "failed": true, "msg": "Unable to start service atomic-openshift-node: Job for atomic-openshift-node.service failed because the control process exited with error code. See \"systemctl status atomic-openshift-node.service\" and \"journalctl -xe\" for details.\n"}'
RHV is online and I can SSH to the OSE master node. However, I can't see the reason for the failure. Attaching logs.

Version-Release number of selected component (if applicable):
QCI-1.0-RHEL-7-20160830.t.0

How reproducible:
Happened to me once

Steps to Reproduce:
1. Start deployment of OSE with two nodes on top of RHV with one hypervisor
2. OSE fails at 85%
3.

Actual results:
Deployment of OSE failed

Expected results:
Deployment of OSE successful

Additional info: