Bug 1414908
Summary: | OCP fails to install due to timeout running command | ||
---|---|---|---|
Product: | Red Hat Quickstart Cloud Installer | Reporter: | James Olin Oden <joden> |
Component: | Installation - OpenShift | Assignee: | jkim |
Status: | CLOSED NOTABUG | QA Contact: | Sudhir Mallamprabhakara <smallamp> |
Severity: | high | Docs Contact: | Derek <dcadzow> |
Priority: | unspecified | ||
Version: | 1.1 | CC: | arubin, bthurber, jkim, jmatthew, qci-bugzillas |
Target Milestone: | --- | Keywords: | Triaged |
Target Release: | 1.1 | ||
Hardware: | Unspecified | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2017-01-26 20:02:36 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
James Olin Oden
2017-01-19 17:19:24 UTC
I did this same deployment again, and it did succeed, but if you looked at the openstack VMs they were both running at 100% or near there. This was with compose: QCI-1.1-RHEL-7-20170118.t.1

I was not able to reproduce this bug. The following are the details of my deployment setup:

- RHV & OCP (1 engine & 1 hypervisor)
- Engine: 4GB RAM, 30GB disk, 16 vCPU
- Hypervisor: 16GB RAM, 70GB disk, 16 vCPU
- Satellite VM: 16GB RAM, 250GB disk, 16 vCPU
- OCP master node: 8GB RAM, 30GB disk, 2 vCPU (30GB for docker)
- OCP worker node: 8GB RAM, 15GB disk, 1 vCPU (30GB for docker)

NOTE: The host machine only has 16 cores, so the actual vCPU available during the deployment may have varied depending on the VMs' workload and the libvirt scheduling.

Also, upon quick review of James Oden's setup, his VMs were set to a much higher RAM setting (50GB RAM each for Engine and Hypervisor).

Saw this again; this time the failure was on a different command:

2017-01-24 20:50:42,305 p=21157 u=foreman | fatal: [dirty-ocp-master1.b.b]: FAILED! => { "changed": true, "cmd": ["systemctl","restart", "atomic-openshift-master.service"], "delta": "0:01:31.783867", "end": "2017-01-24 20:50:39.756826", "failed": true, "rc": 1, "start": "2017-01-24 20:49:07.972959", "stderr": "Job for atomic-openshift-master.service failed because a timeout was exceeded. See \"systemctl status atomic-openshift-master.service\" and \"journalctl -xe\" for details.", "stdout": "", "stdout_lines": [], "warnings": [] }

The OCP deploy subtask was at 90% at this point. However, the CPU utilization of the nodes is fluctuating between 80% and 100%. When I run top on one of the OCP nodes, I see the openshift and etcd processes taking up most of the CPU. OpenShift ranges from about 25% to sometimes 70% utilization. I don't know what this looks like on a normal system, though.

We believe this issue is due to low available hardware resources.
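
For anyone triaging a similar failure, the timestamps in the Ansible result above can be checked against the service start timeout. The following is a minimal sketch, not part of the installer: it parses the `start` and `end` fields from the log (the `stderr` field is omitted for brevity) and compares the elapsed time with a 90-second limit. The 90-second value is an assumption based on systemd's default start timeout; the actual `TimeoutStartSec` of `atomic-openshift-master.service` on the affected host was not recorded in this bug. The names `elapsed_seconds` and `ASSUMED_TIMEOUT_S` are illustrative only.

```python
import json
from datetime import datetime

# Fields copied from the Ansible failure in this bug (stderr left out for brevity).
payload = json.loads("""
{
  "changed": true,
  "cmd": ["systemctl", "restart", "atomic-openshift-master.service"],
  "delta": "0:01:31.783867",
  "end": "2017-01-24 20:50:39.756826",
  "failed": true,
  "rc": 1,
  "start": "2017-01-24 20:49:07.972959"
}
""")

# Timestamp layout used by the Ansible command module in the log above.
FMT = "%Y-%m-%d %H:%M:%S.%f"

# Assumption: systemd's default start timeout; the unit may override it.
ASSUMED_TIMEOUT_S = 90.0


def elapsed_seconds(result):
    """Return the wall-clock duration of the command in seconds."""
    start = datetime.strptime(result["start"], FMT)
    end = datetime.strptime(result["end"], FMT)
    return (end - start).total_seconds()


elapsed = elapsed_seconds(payload)
over = elapsed - ASSUMED_TIMEOUT_S
print(f"command ran for {elapsed:.3f}s, {over:.3f}s past the assumed timeout")
```

If the elapsed time sits just past the timeout, as it does here, the restart most likely did not crash outright but was cut off while still starting, which is consistent with the CPU starvation described above.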