Bug 1505537 - Installer hangs at "Wait for master controller service to start on first master"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.7.0
Assignee: Russell Teague
QA Contact: Vikas Laad
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-10-23 19:54 UTC by Vikas Laad
Modified: 2017-11-28 22:18 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
A bug in Ansible was causing the pause module to hang when running playbooks in a background process. The pause was not necessary as the master controller service does not need to be stagger started. The tasks were refactored to remove the pause and start all master controller services at the same time.
Clone Of:
Environment:
Last Closed: 2017-11-28 22:18:47 UTC
Target Upstream Version:
Embargoed:


Attachments


Links:
  System ID: Red Hat Product Errata RHSA-2017:3188
  Priority: normal
  Status: SHIPPED_LIVE
  Summary: Moderate: Red Hat OpenShift Container Platform 3.7 security, bug, and enhancement update
  Last Updated: 2017-11-29 02:34:54 UTC

Description Vikas Laad 2017-10-23 19:54:57 UTC
Description of problem:
The installer hangs at the following task:
TASK [openshift_master : Wait for master controller service to start on first master] ****************************************
task path: /root/openshift-ansible/roles/openshift_master/tasks/main.yml:337
Pausing for 15 seconds
(ctrl+C then 'C' = continue early, ctrl+C then 'A' = abort)

When I ssh to the master I can see the controller service is running.

Version-Release number of the following components:
rpm -q openshift-ansible
I am running the playbook from an openshift-ansible git checkout;
HEAD is be7d536260d562e3fff6d6ebf762f6bc2e4e9879

rpm -q ansible
ansible-2.4.0.0-3.el7.noarch

ansible --version
ansible 2.4.0.0
  config file = /root/openshift-ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, May  3 2017, 07:55:04) [GCC 4.8.5 20150623 (Red Hat 4.8.5-14)]


Steps to Reproduce:
1. Create AWS instances
2. Run the byo/config playbook from openshift-ansible

Actual results:
TASK [openshift_master : Wait for master controller service to start on first master] ****************************************
task path: /root/openshift-ansible/roles/openshift_master/tasks/main.yml:337
Pausing for 15 seconds
(ctrl+C then 'C' = continue early, ctrl+C then 'A' = abort)

Expected results:
The playbook should finish.

Additional info:
Attaching logs and inventory file.

Comment 3 Vikas Laad 2017-10-23 19:59:21 UTC
root@ip-172-31-14-171: ~ # systemctl status atomic-openshift-master-controllers.service 
● atomic-openshift-master-controllers.service
   Loaded: loaded (/etc/systemd/system/atomic-openshift-master-controllers.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2017-10-23 18:11:02 UTC; 1h 47min ago
 Main PID: 32327 (runc)
   Memory: 3.4M
   CGroup: /system.slice/atomic-openshift-master-controllers.service
           └─32327 /bin/runc --systemd-cgroup run atomic-openshift-master-controllers

Oct 23 19:56:31 ip-172-31-14-171.us-west-2.compute.internal atomic-openshift-master-controllers[32327]: W1023 19:56:31.237717   32340 reflector.go:343] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/informers/informers_g...en compacted
Oct 23 19:56:37 ip-172-31-14-171.us-west-2.compute.internal atomic-openshift-master-controllers[32327]: W1023 19:56:37.247487   32340 reflector.go:343] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/informers/informers_g...en compacted
Oct 23 19:56:37 ip-172-31-14-171.us-west-2.compute.internal atomic-openshift-master-controllers[32327]: W1023 19:56:37.247545   32340 reflector.go:343] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/informers/informers_g...en compacted
Oct 23 19:56:53 ip-172-31-14-171.us-west-2.compute.internal atomic-openshift-master-controllers[32327]: W1023 19:56:53.174642   32340 reflector.go:343] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/informers/informers_g...en compacted
Oct 23 19:57:06 ip-172-31-14-171.us-west-2.compute.internal atomic-openshift-master-controllers[32327]: W1023 19:57:06.195799   32340 reflector.go:343] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/informers/informers_g...en compacted
Oct 23 19:57:33 ip-172-31-14-171.us-west-2.compute.internal atomic-openshift-master-controllers[32327]: W1023 19:57:33.240900   32340 reflector.go:343] github.com/openshift/origin/pkg/build/generated/informers/internalversion/factory.go:...en compacted
Oct 23 19:58:07 ip-172-31-14-171.us-west-2.compute.internal atomic-openshift-master-controllers[32327]: W1023 19:58:07.097670   32340 reflector.go:343] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/informers/informers_g...en compacted
Oct 23 19:58:07 ip-172-31-14-171.us-west-2.compute.internal atomic-openshift-master-controllers[32327]: W1023 19:58:07.097682   32340 reflector.go:343] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/informers/informers_g...en compacted
Oct 23 19:58:15 ip-172-31-14-171.us-west-2.compute.internal atomic-openshift-master-controllers[32327]: W1023 19:58:15.110434   32340 reflector.go:343] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/informers/informers_g...en compacted
Oct 23 19:58:34 ip-172-31-14-171.us-west-2.compute.internal atomic-openshift-master-controllers[32327]: W1023 19:58:34.141808   32340 reflector.go:343] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/informers/informers_g...en compacted
Hint: Some lines were ellipsized, use -l to show in full.

Comment 4 Scott Dodson 2017-10-23 20:25:05 UTC
The task you've referenced is an unconditional 15-second pause, and I don't see from your logs that the run aborted in any way. Did you terminate it before the 15-second pause period elapsed?

Comment 5 Vikas Laad 2017-10-23 20:48:45 UTC
The playbook is still stuck at that task after almost 2 hours:

root      12662   1497  0 18:55 pts/1    00:00:38 /usr/bin/python2 /usr/bin/ansible-playbook -i inv openshift-ansible/playbooks/byo/config.yml
root      18567  12662  0 18:59 pts/1    00:00:00 /usr/bin/python2 /usr/bin/ansible-playbook -i inv openshift-ansible/playbooks/byo/config.yml

This is the current time on that machine:

root@ip-172-31-0-206: ~/openshift-ansible # date
Mon Oct 23 20:47:31 UTC 2017

Comment 6 Russell Teague 2017-10-24 14:08:03 UTC
Proposed: https://github.com/openshift/openshift-ansible/pull/5861

Comment 7 Russell Teague 2017-10-24 14:13:18 UTC
The attached logs appear to be truncated. This issue may be related to Ansible 2.4's handling of 'pause' in non-interactive shells. We are removing the pauses since they may no longer be necessary. The task noted above is the first use of 'pause' recorded in the log, so we may see this issue again at the next pause. Continuing to investigate and attempting to reproduce.
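
For illustration, a minimal sketch of the shape of that change (task layout and variable names here are placeholders, not the literal diff from the proposed PR): the old pattern started the controller service and then paused to stagger the first master, while the refactored pattern drops the pause and starts the service on all masters at the same time.

# Before (illustrative): start the service, then stagger with a pause.
# Ansible 2.4's pause module could hang here when ansible-playbook ran
# as a background process.
- name: Start master controllers service
  systemd:
    name: "{{ service_prefix }}-master-controllers"   # placeholder variable
    state: started

- name: Wait for master controller service to start on first master
  pause:
    seconds: 15

# After (illustrative): one unconditional start task, no pause, run on
# every master in the play at the same time.
- name: Start master controllers service
  systemd:
    name: "{{ service_prefix }}-master-controllers"   # placeholder variable
    state: started
    enabled: yes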

Comment 8 Vikas Laad 2017-10-24 14:46:04 UTC
(In reply to Russell Teague from comment #7)
You are right, I was running the playbook in the background:

ansible-playbook -i inv openshift-ansible/playbooks/byo/config.yml -vvv > ansible.log &
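
For anyone trying to confirm the behavior outside of the installer, a minimal reproducer along these lines (hypothetical, not taken from this report) is to run a single pause task from a backgrounded shell; on an affected Ansible 2.4 control host it never returns, while on a fixed version it completes after roughly five seconds.

# Hypothetical minimal reproducer: one lone pause task run in the background.
ansible localhost -m pause -a "seconds=5" > pause.log 2>&1 &
wait $!
# Affected: wait never returns. Fixed: returns after ~5 seconds.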

Comment 9 Russell Teague 2017-10-25 15:12:14 UTC
Upstream Ansible issue: https://github.com/ansible/ansible/issues/32142

Comment 10 Russell Teague 2017-10-26 12:23:24 UTC
$ git tag --contains a07eba1e05cf3f52bf247afbec514d0af6629953
openshift-ansible-3.7.0-0.179.0
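
To check whether an installed copy already carries the fix (a quick check, not part of the original comment), compare the installed package version against that tag:

rpm -q openshift-ansible
# Fixed if the reported version is openshift-ansible-3.7.0-0.179.0 or later.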

Comment 13 Vikas Laad 2017-10-30 14:22:09 UTC
Verified in the following version; the playbook completed without any problem when run in the background.

openshift v3.7.0-0.184.0

Comment 16 errata-xmlrpc 2017-11-28 22:18:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188

