Bug 1371955 - OSE deployment failed at 85%, ansible: "Unable to start service atomic-openshift-node"
Keywords:
Status: VERIFIED
Alias: None
Product: Red Hat Quickstart Cloud Installer
Classification: Red Hat
Component: Installation - OpenShift
Version: 1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ga
Target Release: 1.0
Assignee: Dylan Murray
QA Contact: Sudhir Mallamprabhakara
Docs Contact: Derek
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-08-31 14:02 UTC by Antonin Pagac
Modified: 2020-01-08 16:37 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:


Attachments
deployment.log (27.71 KB, text/plain)
2016-08-31 14:03 UTC, Antonin Pagac
ansible.log (335.23 KB, text/plain)
2016-08-31 14:03 UTC, Antonin Pagac

Description Antonin Pagac 2016-08-31 14:02:35 UTC
Description of problem:
RHV engine + 1 hypervisor, OSE master + 2 nodes; bare-metal deployment on Supermicro machines. In the past, this setup has successfully completed OSE deployments with 1 node.

From ansible.log:

'fatal: [rhev-ose-ose-master1.example.com]: FAILED! => {"changed": false, "failed": true, "msg": "Unable to start service atomic-openshift-node: Job for atomic-openshift-node.service failed because the control process exited with error code. See \"systemctl status atomic-openshift-node.service\" and \"journalctl -xe\" for details.\n"}'

RHV is online and I can SSH to the OSE master node. However, I can't see the reason for the failure. Attaching logs.

Version-Release number of selected component (if applicable):
QCI-1.0-RHEL-7-20160830.t.0

How reproducible:
Happened to me once

Steps to Reproduce:
1. Start an OSE deployment with two nodes on top of RHV with one hypervisor.
2. The OSE deployment fails at 85%.

Actual results:
Deployment of OSE failed

Expected results:
Deployment of OSE successful

Additional info:

Comment 1 Antonin Pagac 2016-08-31 14:03:14 UTC
Created attachment 1196395
deployment.log

Comment 2 Antonin Pagac 2016-08-31 14:03:41 UTC
Created attachment 1196396
ansible.log

Comment 3 Antonin Pagac 2016-08-31 14:12:07 UTC
After resuming the failed task in Satellite, it fails again:

'failed: [rhev-ose-ose-master1.example.com] => {"changed": true, "cmd": "atomic-openshift-installer -u -c /tmp/atomic-openshift-installer.answers.cfg.yml install", "delta": "0:00:22.073518", "end": "2016-08-31 14:06:55.753432", "rc": 1, "start": "2016-08-31 14:06:33.679914", "warnings": []}'

The atomic-openshift-node.service service is running on the master.

Comment 4 Dylan Murray 2016-08-31 15:38:47 UTC
https://github.com/fusor/fusor/pull/1200

This PR fixes the failure that occurred when attempting to resume the task. The failed installation itself is an issue with atomic-openshift-installer, not a bug in our code.

Comment 6 Dylan Murray 2016-09-01 15:48:15 UTC
Included in the compose as of 8/31.

Comment 7 Antonin Pagac 2016-09-11 15:45:30 UTC
Verified in QCI-1.0-RHEL-7-20160908.1.

I'm now able to resume the task in Satellite.

