Bug 1295911 - errors raised starting controllers during ha install
Summary: errors raised starting controllers during ha install
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Andrew Butcher
QA Contact: Ma xiaoqiang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-01-05 18:43 UTC by Matthew Farrellee
Modified: 2016-07-04 00:47 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-01-27 19:43:59 UTC
Target Upstream Version:
Embargoed:


Links
System: Red Hat Product Errata
ID: RHBA-2016:0075
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat OpenShift Enterprise atomic-openshift-utils bug fix update
Last Updated: 2016-01-28 00:42:22 UTC

Description Matthew Farrellee 2016-01-05 18:43:44 UTC
Description of problem:

Non-fatal errors are raised during installation of HA masters.

The errors delay the install and make it look as though it failed.


Version-Release number of selected component (if applicable):

3.0.20-1.git.0.3703f1b.el7aos.noarch


How reproducible:

100%


Steps to Reproduce:
1. ha master inventory, e.g.
[masters]
192.1.0.[3:5] openshift_hostname="{{ ansible_default_ipv4.address}}" openshift_public_hostname="{{ ansible_default_ipv4.address }}" openshift_public_ip="{{ ansible_default_ipv4.address }}"

2. ansible-playbook -i inventory /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml

3. there is no 3
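
For reference, the `[3:5]` range in the step 1 inventory line covers three masters, 192.1.0.3 through 192.1.0.5 (Ansible host ranges are inclusive). A minimal Python sketch of that expansion, assuming a single plain numeric range with no stride or zero padding; the helper name `expand_host_range` is made up for illustration and is not part of Ansible:

```python
import re

# Ansible-style numeric range inside a host pattern, e.g. "[3:5]".
RANGE_RE = re.compile(r"\[(\d+):(\d+)\]")


def expand_host_range(pattern):
    """Expand the first numeric [start:end] range in a host pattern (inclusive)."""
    match = RANGE_RE.search(pattern)
    if match is None:
        # No range present: the pattern is already a single host.
        return [pattern]
    start, end = int(match.group(1)), int(match.group(2))
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    return [f"{prefix}{n}{suffix}" for n in range(start, end + 1)]


print(expand_host_range("192.1.0.[3:5]"))
# → ['192.1.0.3', '192.1.0.4', '192.1.0.5']
```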


Actual results:

TASK: [openshift_master | Start and enable master controller] *****************
failed: [192.1.0.3] => {"failed": true}
msg: Job for atomic-openshift-master-controllers.service failed because the control process exited with error code. See "systemctl status atomic-openshift-master-controllers.service" and "journalctl -xe" for details.
...ignoring

Note: the restart attempt on 192.1.0.3 succeeds, and I can confirm via systemctl status that the master is running.

TASK: [openshift_master | Start and enable master controller] *****************
failed: [192.1.0.4] => {"failed": true}
msg: Job for atomic-openshift-master-controllers.service failed because the control process exited with error code. See "systemctl status atomic-openshift-master-controllers.service" and "journalctl -xe" for details.
...ignoring

NOTIFIED: [openshift_master | restart master controllers] *********************
failed: [192.1.0.4] => {"failed": true}
msg: Job for atomic-openshift-master-controllers.service failed because the control process exited with error code. See "systemctl status atomic-openshift-master-controllers.service" and "journalctl -xe" for details.
...ignoring

TASK: [openshift_master | Start and enable master controller] *****************
failed: [192.1.0.5] => {"failed": true}
msg: Job for atomic-openshift-master-controllers.service failed because the control process exited with error code. See "systemctl status atomic-openshift-master-controllers.service" and "journalctl -xe" for details.
...ignoring

NOTIFIED: [openshift_master | restart master controllers] *********************
failed: [192.1.0.5] => {"failed": true}
msg: Job for atomic-openshift-master-controllers.service failed because the control process exited with error code. See "systemctl status atomic-openshift-master-controllers.service" and "journalctl -xe" for details.
...ignoring



Expected results:

no errors


Additional info:

The master controllers are in fact running; they are just waiting on a lease:

ssh 192.1.0.4 systemctl status atomic-openshift-master-controllers.service
● atomic-openshift-master-controllers.service - Atomic OpenShift Master Controllers
   Loaded: loaded (/usr/lib/systemd/system/atomic-openshift-master-controllers.service; enabled; vendor preset: disabled)
   Active: activating (start) since Sun 2016-01-03 22:40:31 EST; 13s ago
     Docs: https://github.com/openshift/origin
 Main PID: 21016 (openshift)
   CGroup: /system.slice/atomic-openshift-master-controllers.service
           └─21016 /usr/bin/openshift start master controllers --config=/etc/origin/master/master-config.yaml --loglevel=2 --listen=https://0.0.0.0:8444
Jan 03 22:40:31 192.1.0.4 systemd[1]: Starting Atomic OpenShift Master Controllers...
Jan 03 22:40:32 192.1.0.4 atomic-openshift-master-controllers[21016]: I0103 22:40:32.525344   21016 plugins.go:71] No cloud provider specified.
Jan 03 22:40:32 192.1.0.4 atomic-openshift-master-controllers[21016]: I0103 22:40:32.682340   21016 start_master.go:410] Starting controllers on 0.0.0.0:8444 (v3.1.0.4-16-g112fcc4)
Jan 03 22:40:32 192.1.0.4 atomic-openshift-master-controllers[21016]: I0103 22:40:32.682995   21016 start_master.go:414] Using images from "openshift3/ose-<component>:v3.1.0.4"
Jan 03 22:40:32 192.1.0.4 atomic-openshift-master-controllers[21016]: I0103 22:40:32.683346   21016 master.go:232] Started health checks at 0.0.0.0:8444
Jan 03 22:40:32 192.1.0.4 atomic-openshift-master-controllers[21016]: I0103 22:40:32.683770   21016 master_config.go:250] Attempting to acquire controller lease as master-t3uwlee7, renewing every 30 seconds

ssh 192.1.0.5 systemctl status atomic-openshift-master-controllers.service
● atomic-openshift-master-controllers.service - Atomic OpenShift Master Controllers
   Loaded: loaded (/usr/lib/systemd/system/atomic-openshift-master-controllers.service; enabled; vendor preset: disabled)
   Active: activating (start) since Sun 2016-01-03 22:44:22 EST; 36s ago
     Docs: https://github.com/openshift/origin
 Main PID: 21032 (openshift)
   CGroup: /system.slice/atomic-openshift-master-controllers.service
           └─21032 /usr/bin/openshift start master controllers --config=/etc/origin/master/master-config.yaml --loglevel=2 --listen=https://0.0.0.0:8444
Jan 03 22:44:22 192.1.0.5 systemd[1]: Starting Atomic OpenShift Master Controllers...
Jan 03 22:44:24 192.1.0.5 atomic-openshift-master-controllers[21032]: I0103 22:44:24.432969   21032 plugins.go:71] No cloud provider specified.
Jan 03 22:44:24 192.1.0.5 atomic-openshift-master-controllers[21032]: I0103 22:44:24.792731   21032 start_master.go:410] Starting controllers on 0.0.0.0:8444 (v3.1.0.4-16-g112fcc4)
Jan 03 22:44:24 192.1.0.5 atomic-openshift-master-controllers[21032]: I0103 22:44:24.792809   21032 start_master.go:414] Using images from "openshift3/ose-<component>:v3.1.0.4"
Jan 03 22:44:24 192.1.0.5 atomic-openshift-master-controllers[21032]: I0103 22:44:24.793514   21032 master.go:232] Started health checks at 0.0.0.0:8444
Jan 03 22:44:24 192.1.0.5 atomic-openshift-master-controllers[21032]: I0103 22:44:24.794837   21032 master_config.go:250] Attempting to acquire controller lease as master-256rr3ye, renewing every 30 seconds
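
Both status dumps above show the same pattern: the unit is `activating (start)` and its last log line is an attempt to acquire the controller lease. A minimal Python sketch that pulls those two facts out of `systemctl status` text, assuming the output format shown above; `summarize_status` and the `waiting_on_lease` rule are illustrative assumptions, not part of OpenShift or systemd:

```python
import re

# Matches the "Active:" line of `systemctl status` output, e.g.
#   Active: activating (start) since Sun 2016-01-03 22:40:31 EST; 13s ago
ACTIVE_RE = re.compile(r"^\s*Active:\s+(\w+)(?:\s+\(([^)]*)\))?", re.MULTILINE)

# Matches the controller log line quoted in the status output.
LEASE_RE = re.compile(
    r"Attempting to acquire controller lease as ([^,\s]+), "
    r"renewing every (\d+) seconds"
)


def summarize_status(status_text):
    """Return the unit state and any pending lease attempt found in the text."""
    active = ACTIVE_RE.search(status_text)
    lease = LEASE_RE.search(status_text)
    state = active.group(1) if active else None
    return {
        "state": state,
        "sub_state": active.group(2) if active else None,
        "lease_id": lease.group(1) if lease else None,
        "renew_seconds": int(lease.group(2)) if lease else None,
        # Assumed rule: still "activating" while asking for the lease
        # means a standby controller, not a failed one.
        "waiting_on_lease": state == "activating" and lease is not None,
    }


SAMPLE = """\
   Active: activating (start) since Sun 2016-01-03 22:40:31 EST; 13s ago
Jan 03 22:40:32 192.1.0.4 atomic-openshift-master-controllers[21016]: I0103 22:40:32.683770   21016 master_config.go:250] Attempting to acquire controller lease as master-t3uwlee7, renewing every 30 seconds
"""

print(summarize_status(SAMPLE))
```

A check like this could tell "still waiting for the lease" apart from a real startup failure before the installer reports an error.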


This may be related to https://github.com/ansible/ansible-modules-core/issues/2265

Comment 1 Scott Dodson 2016-01-05 19:08:32 UTC
I believe this is actually already worked around via https://github.com/openshift/openshift-ansible/pull/1094

With a real fix in https://github.com/openshift/origin/pull/6275

Comment 3 Gaoyun Pei 2016-01-08 06:00:22 UTC
Tested with openshift-ansible-3.0.24-1.git.0.42b0745.el7aos.noarch.

During installation, no errors were shown for atomic-openshift-master-controllers.service.

[root@openshift-v3 tmp]# grep -e "Start and enable master controller" -e "restart master controllers" -A 1 ansible.log 
2016-01-08 13:40:56,745 p=22797 u=root |  TASK: [openshift_master | Start and enable master controller] ***************** 
2016-01-08 13:41:00,618 p=22797 u=root |  changed: [10.x.x.1]
--
2016-01-08 13:41:07,164 p=22797 u=root |  NOTIFIED: [openshift_master | restart master controllers] ********************* 
2016-01-08 13:41:07,286 p=22797 u=root |  skipping: [10.x.x.1]
--
2016-01-08 13:43:03,062 p=22797 u=root |  TASK: [openshift_master | Start and enable master controller] ***************** 
2016-01-08 13:43:07,522 p=22797 u=root |  changed: [10.x.x.2]
--
2016-01-08 13:43:15,218 p=22797 u=root |  NOTIFIED: [openshift_master | restart master controllers] ********************* 
2016-01-08 13:43:15,359 p=22797 u=root |  skipping: [10.x.x.2]
--
2016-01-08 13:44:31,675 p=22797 u=root |  TASK: [openshift_master | Start and enable master controller] ***************** 
2016-01-08 13:44:34,108 p=22797 u=root |  changed: [10.x.x.3]
--
2016-01-08 13:44:39,177 p=22797 u=root |  NOTIFIED: [openshift_master | restart master controllers] ********************* 
2016-01-08 13:44:39,278 p=22797 u=root |  skipping: [10.x.x.3]

The master controller service is running on all 3 masters: 1 is starting and the other 2 are waiting for a lease.
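
The grep output above can also be tallied per host. A minimal Python sketch, assuming the ansible.log line layout shown in this comment (timestamp, `p=`, `u=`, then `|` and the message); `results_by_host` and both regexes are illustrative and not part of Ansible:

```python
import re
from collections import defaultdict

# A task or handler header line, e.g. "... |  TASK: [role | name] ****".
HEADER_RE = re.compile(r"\|\s+(?:TASK|NOTIFIED): \[([^\]]+)\]")
# A per-host result line, e.g. "... |  changed: [10.x.x.1]".
RESULT_RE = re.compile(r"\|\s+(changed|ok|failed|skipping|fatal): \[([^\]]+)\]")


def results_by_host(log_text):
    """Map each host to a list of (task name, result) pairs, in log order."""
    results = defaultdict(list)
    current_task = None
    for line in log_text.splitlines():
        header = HEADER_RE.search(line)
        if header:
            current_task = header.group(1)
            continue
        result = RESULT_RE.search(line)
        if result and current_task is not None:
            results[result.group(2)].append((current_task, result.group(1)))
    return dict(results)


SAMPLE = """\
2016-01-08 13:40:56,745 p=22797 u=root |  TASK: [openshift_master | Start and enable master controller] *****************
2016-01-08 13:41:00,618 p=22797 u=root |  changed: [10.x.x.1]
--
2016-01-08 13:41:07,164 p=22797 u=root |  NOTIFIED: [openshift_master | restart master controllers] *********************
2016-01-08 13:41:07,286 p=22797 u=root |  skipping: [10.x.x.1]
"""

for host, outcomes in results_by_host(SAMPLE).items():
    print(host, outcomes)
```

Any host whose list contains `failed` or `fatal` for these two tasks would indicate that the original problem is still present.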

Comment 5 errata-xmlrpc 2016-01-27 19:43:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:0075

