Bug 1500897

Summary: openshift-master/restart_services.yml fails with new 3.6 master installs
Product: OpenShift Container Platform Reporter: Ryan Howe <rhowe>
Component: Installer Assignee: Michael Gugino <mgugino>
Status: CLOSED ERRATA QA Contact: Gaoyun Pei <gpei>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.6.0 CC: aos-bugs, jokerman, mmccomas, rhowe
Target Milestone: ---   
Target Release: 3.9.0   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2018-03-28 14:07:14 UTC Type: Bug

Description Ryan Howe 2017-10-11 17:02:20 UTC
Description of problem:

With a single-master install in 3.6, the master services are now split into api and controllers services, as in an HA install. With this change, atomic-openshift-master.service is masked and cannot be restarted, so the playbook fails to restart the master services.


Version-Release number of the following components:

└──> rpm -q openshift-ansible
openshift-ansible-3.6.173.0.21-2.git.0.44a4038.el7.noarch

└──> rpm -q ansible
ansible-2.3.2.0-2.el7.noarch

└──> ansible --version
ansible 2.3.2.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = Default w/o overrides
  python version = 2.7.5 (default, May  3 2017, 07:55:04) [GCC 4.8.5 20150623 (Red Hat 4.8.5-14)]


https://github.com/openshift/openshift-ansible/blob/release-3.6/playbooks/common/openshift-master/restart.yml#L7
https://github.com/openshift/openshift-ansible/blob/release-3.6/playbooks/common/openshift-master/restart_services.yml


How reproducible:
100%

Steps to Reproduce:
1. Fresh 3.6 install
2. Run the redeploy certificates playbook after the install:

/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/redeploy-master-certificates

Actual results:

TASK [Restart master] ***********************************************************************************************************************************************************
task path: /usr/share/ansible/openshift-ansible/playbooks/common/openshift-master/restart_services.yml:2

fatal: [master.openshift.com]: FAILED! => {
    "changed": false, 
    "failed": true, 
    "invocation": {
        "module_args": {
            "daemon_reload": false, 
            "enabled": null, 
            "masked": null, 
            "name": "atomic-openshift-master", 
            "no_block": false, 
            "state": "restarted", 
            "user": false
        }
    }
}

MSG:

Unable to start service atomic-openshift-master: Failed to start atomic-openshift-master.service: Unit is masked.
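
The "Unit is masked" failure above can be confirmed before any restart is attempted. The following is a minimal sketch, not part of openshift-ansible: the task names and the registered variable master_unit_state are hypothetical. It relies only on `systemctl is-enabled`, which prints "masked" and exits non-zero for a masked unit.

```yaml
---
# Sketch only: task names and master_unit_state are illustrative assumptions.
- name: Check whether the combined master unit is masked
  command: systemctl is-enabled atomic-openshift-master
  register: master_unit_state
  # is-enabled exits non-zero for masked or disabled units, so never fail here.
  failed_when: false
  changed_when: false

- name: Report that the combined master unit cannot be restarted
  debug:
    msg: "atomic-openshift-master is masked; restart the -master-api and -master-controllers units instead"
  when: master_unit_state.stdout == "masked"
```

On a host where the services are split, the second task should fire and point at the two units that actually need restarting.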


Expected results:

The master services should be restarted.

Comment 3 Ryan Howe 2017-10-11 17:34:41 UTC
Correction: splitting the services on a single-master install is not the default in 3.6, but it is the default on the master branch. I assume this will be in 3.7.

This is still a bug for clusters that are only running a single master but split the services to api and controller.

Comment 4 Jan Chaloupka 2017-11-02 10:27:53 UTC
> This is still a bug for clusters that are only running a single master but split the services to api and controller.

So it affects only clusters that were deployed with openshift-ansible 3.6 (or lower) and where the master service was split into master-api and master-controllers manually (without running openshift-ansible)? If it was not done manually, what playbook (or approach) was used to split the services?

Comment 5 Ryan Howe 2017-11-13 22:01:11 UTC
I am unsure how they ended up with split services on a single-master install. However, they did have the following services after an install:

atomic-openshift-master-api.service                                                 
atomic-openshift-master-controllers.service                                         
atomic-openshift-node.service                                         


The issue still stands: if only one master is listed and the services are split, the installer will never restart the services correctly.

Comment 7 Michael Gugino 2018-01-25 18:30:38 UTC
PR Created: https://github.com/openshift/openshift-ansible/pull/6876

This still appears to be the case in master. If openshift_master_ha != True, the services are not restarted.

Since single masters now use the same service names as HA masters, single masters could not have their services restarted by this play.

One could question the necessity of a play to restart services on a single host, but since we provide the play, it might as well be useful.
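
As a rough illustration of the condition described in this comment, the sketch below shows restart tasks that no longer depend on openshift_master_ha. It is an assumption about the shape of such a change, not the contents of the PR. The variable names (openshift.common.service_type, openshift.common.hostname, openshift.master.api_port) are taken from the 3.7 restart_services.yml quoted in Comment 12.

```yaml
---
# Sketch only: the same tasks as the 3.7 restart_services.yml, with the
# "when: openshift_master_ha | bool" guard dropped so that single masters
# using the split service names are restarted too.
- name: Restart master API
  service:
    name: "{{ openshift.common.service_type }}-master-api"
    state: restarted

- name: Wait for master API to come back online
  wait_for:
    host: "{{ openshift.common.hostname }}"
    state: started
    delay: 10
    port: "{{ openshift.master.api_port }}"
    timeout: 600

- name: Restart master controllers
  service:
    name: "{{ openshift.common.service_type }}-master-controllers"
    state: restarted
```

Without the guard, the tasks target the -master-api and -master-controllers units whether one master or several are listed in the inventory.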

Comment 9 Michael Gugino 2018-01-25 18:38:43 UTC
3.7 Backport created: https://github.com/openshift/openshift-ansible/pull/6877

Comment 10 Scott Dodson 2018-02-01 15:31:35 UTC
Mike,

We need to clone this for 3.7 once QE verifies it. I know you've already got a PR, but we need one bug per release.

Comment 12 Johnny Liu 2018-02-06 07:19:35 UTC
QE does not know how to reproduce this bug. In a 3.6 install with a single master, the master service is never split into api and controllers services like an HA install (that is a new change in 3.7).

I also checked the 3.7 openshift-ansible code; there is no restart *master* task in playbooks/common/openshift-master/restart_services.yml:
$ git describe
openshift-ansible-3.7.9-1

$ cat playbooks/common/openshift-master/restart_services.yml
---
- name: Restart master API
  service:
    name: "{{ openshift.common.service_type }}-master-api"
    state: restarted
  when: openshift_master_ha | bool
- name: Wait for master API to come back online
  wait_for:
    host: "{{ openshift.common.hostname }}"
    state: started
    delay: 10
    port: "{{ openshift.master.api_port }}"
    timeout: 600
  when: openshift_master_ha | bool
- name: Restart master controllers
  service:
    name: "{{ openshift.common.service_type }}-master-controllers"
    state: restarted
  # Ignore errrors since it is possible that type != simple for
  # pre-3.1.1 installations.
  ignore_errors: true
  when: openshift_master_ha | bool


I think this is actually an invalid test case and should be closed as NOTABUG.


Based on the PR in comment 7, dev made some enhancements to the master service restart in 3.9; QE will verify that the change takes effect in the 3.9 openshift-ansible installer.

Comment 13 Gaoyun Pei 2018-02-07 07:34:45 UTC
Tried the same usage scenario on the latest 3.9: openshift v3.9.0-0.39.0, openshift-ansible-3.9.0-0.39.0.git.0.fea6997.el7.noarch.

Ran the master certs redeployment playbook after installation; it restarts the master services correctly.

ansible-playbook -i host /usr/share/ansible/openshift-ansible/playbooks/openshift-master/redeploy-certificates.yml -v

PLAY [Restart masters] ******************************************************************************************************************************************************

TASK [Gathering Facts] ******************************************************************************************************************************************************
ok: [ec2-54-236-111-207.compute-1.amazonaws.com]

TASK [include_tasks] ********************************************************************************************************************************************************
skipping: [ec2-54-236-111-207.compute-1.amazonaws.com] => {"changed": false, "skip_reason": "Conditional result was False"}

TASK [openshift_master : Restart master API] ********************************************************************************************************************************
changed: [ec2-54-236-111-207.compute-1.amazonaws.com] => {"changed": true, "name": "atomic-openshift-master-api", "state": "started", "status": {"ActiveEnterTimestamp": "Tue 2018-02-06 21:28:52 EST", ...
"WorkingDirectory": "/var/lib/origin"}}

TASK [openshift_master : Wait for master API to come back online] ***********************************************************************************************************
ok: [ec2-54-236-111-207.compute-1.amazonaws.com] => {"changed": false, "elapsed": 10, "path": null, "port": 8443, "search_regex": null, "state": "started"}

TASK [openshift_master : restart master controllers] ************************************************************************************************************************
changed: [ec2-54-236-111-207.compute-1.amazonaws.com] => {"attempts": 1, "changed": true, "cmd": ["systemctl", "restart", "atomic-openshift-master-controllers"], "delta": "0:00:01.795976", "end": "2018-02-07 02:26:09.370788", "rc": 0, "start": "2018-02-07 02:26:07.574812", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
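
The controllers output above shows a `systemctl restart` command and an "attempts" counter, which Ansible reports for tasks that use an until/retries loop. A task of roughly the following shape would produce such output. This is a sketch under assumptions: the registered variable name and the retries and delay values are hypothetical and not taken from the 3.9 role.

```yaml
---
# Sketch only: restart_result, retries and delay are illustrative assumptions.
- name: restart master controllers
  command: systemctl restart atomic-openshift-master-controllers
  register: restart_result
  # Retry until the restart command exits successfully.
  until: restart_result.rc == 0
  retries: 3
  delay: 5
```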



Moving this bug to verified per comment 12.

Comment 16 errata-xmlrpc 2018-03-28 14:07:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489