Description of problem:
With a single master install in 3.6, the master services are now split into api and controllers services like an HA install. With this change the atomic-openshift-master.service is masked and cannot be restarted, causing the playbook to fail to restart the master services.

Version-Release number of the following components:
└──> rpm -q openshift-ansible
openshift-ansible-3.6.173.0.21-2.git.0.44a4038.el7.noarch
└──> rpm -q ansible
ansible-2.3.2.0-2.el7.noarch
└──> ansible --version
ansible 2.3.2.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = Default w/o overrides
  python version = 2.7.5 (default, May 3 2017, 07:55:04) [GCC 4.8.5 20150623 (Red Hat 4.8.5-14)]

https://github.com/openshift/openshift-ansible/blob/release-3.6/playbooks/common/openshift-master/restart.yml#L7
https://github.com/openshift/openshift-ansible/blob/release-3.6/playbooks/common/openshift-master/restart_services.yml

How reproducible:
100%

Steps to Reproduce:
1. Fresh 3.6 install
2. Run the redeploy certs playbook after install:
/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/redeploy-master-certificates

Actual results:
TASK [Restart master] **********************************************************
task path: /usr/share/ansible/openshift-ansible/playbooks/common/openshift-master/restart_services.yml:2
fatal: [master.openshift.com]: FAILED! => {
    "changed": false,
    "failed": true,
    "invocation": {
        "module_args": {
            "daemon_reload": false,
            "enabled": null,
            "masked": null,
            "name": "atomic-openshift-master",
            "no_block": false,
            "state": "restarted",
            "user": false
        }
    }
}

MSG:
Unable to start service atomic-openshift-master: Failed to start atomic-openshift-master.service: Unit is masked.

Expected results:
For the services to be restarted.
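For context, the failing task in the 3.6 restart play boils down to something like the following. This is an illustrative sketch, not the literal upstream task; the `when` guard is an assumption based on the behavior described above (the combined service name is only used on non-HA masters):

```yaml
# Sketch: the 3.6 restart play targets the combined master unit on
# single-master (non-HA) hosts, but after the api/controllers split
# that unit (atomic-openshift-master.service) is masked, so systemd
# refuses to start it.
- name: Restart master
  service:
    name: "{{ openshift.common.service_type }}-master"  # resolves to atomic-openshift-master
    state: restarted
  when: not openshift_master_ha | bool  # assumed guard for single masters
```

Because the unit is masked rather than merely stopped, `state: restarted` can never succeed; the play would have to target the split api/controllers units instead.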
Correction: a single master install is not the default in 3.6, but it is the default in the master branch, so I would assume this will be in 3.7. This is still a bug for clusters that are only running a single master but have split the services into api and controllers.
> This is still a bug for clusters that are only running a single master but split the services to api and controller.

So it affects only clusters that were deployed with openshift-ansible 3.6 (and lower) and where the master service split into master-api and master-controllers was done manually (without running openshift-ansible)? If it was not done manually, what playbook (or approach) was used to split the services?
Unsure how they ended up with split services on a single master install; they did, however, end up with the following services after an install:

atomic-openshift-master-api.service
atomic-openshift-master-controllers.service
atomic-openshift-node.service

The issue still stands: if only one master is listed and the services are split, the installer will never restart the services correctly.
PR created: https://github.com/openshift/openshift-ansible/pull/6876

This appears to still be the case in master. If openshift_master_ha != True, the services are not restarted. Since single masters now use the same service names as HA masters, this results in a condition where single masters cannot have their services restarted by this play. One could argue the necessity of a play that restarts services on a single host, but since we provide the play it might as well be useful.
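The direction described above can be sketched as follows. This is an illustrative fragment, not the literal diff from the PR: since single masters now run the same master-api and master-controllers units as HA masters, the restart tasks can target those units without the HA guard.

```yaml
# Sketch of the approach, assuming the openshift_master_ha condition is
# dropped because single and HA masters now share the same unit names.
- name: Restart master API
  service:
    name: "{{ openshift.common.service_type }}-master-api"
    state: restarted

- name: Restart master controllers
  service:
    name: "{{ openshift.common.service_type }}-master-controllers"
    state: restarted
```

With the guard removed, the play restarts the split units on every master, regardless of how many masters the inventory lists.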
3.7 Backport created: https://github.com/openshift/openshift-ansible/pull/6877
Mike, we need to clone this for 3.7 once QE verifies it. I know you've got a PR already, but we need one bug per release.
QE does not know how to reproduce this bug: in a 3.6 install with one single master, the master service is never split into api and controllers services like an HA install (that is a new change in 3.7). I also checked the 3.7 openshift-ansible code; there is no task that restarts the combined *master* service in playbooks/common/openshift-master/restart_services.yml.

$ git describe
openshift-ansible-3.7.9-1
$ cat playbooks/common/openshift-master/restart_services.yml
---
- name: Restart master API
  service:
    name: "{{ openshift.common.service_type }}-master-api"
    state: restarted
  when: openshift_master_ha | bool
- name: Wait for master API to come back online
  wait_for:
    host: "{{ openshift.common.hostname }}"
    state: started
    delay: 10
    port: "{{ openshift.master.api_port }}"
    timeout: 600
  when: openshift_master_ha | bool
- name: Restart master controllers
  service:
    name: "{{ openshift.common.service_type }}-master-controllers"
    state: restarted
  # Ignore errors since it is possible that type != simple for
  # pre-3.1.1 installations.
  ignore_errors: true
  when: openshift_master_ha | bool

Actually I think this is an invalid test case and it should be closed as NOTABUG. Based on the PR in comment 7, dev made some enhancements to the restart-master-services part in 3.9; QE will verify that the change takes effect in the 3.9 openshift-ansible installer.
Tried the same usage scenario on latest 3.9: openshift v3.9.0-0.39.0, openshift-ansible-3.9.0-0.39.0.git.0.fea6997.el7.noarch. Ran the master certs redeployment playbook after installation; it restarts the master services correctly.

ansible-playbook -i host /usr/share/ansible/openshift-ansible/playbooks/openshift-master/redeploy-certificates.yml -v

PLAY [Restart masters] *********************************************************

TASK [Gathering Facts] *********************************************************
ok: [ec2-54-236-111-207.compute-1.amazonaws.com]

TASK [include_tasks] ***********************************************************
skipping: [ec2-54-236-111-207.compute-1.amazonaws.com] => {"changed": false, "skip_reason": "Conditional result was False"}

TASK [openshift_master : Restart master API] ***********************************
changed: [ec2-54-236-111-207.compute-1.amazonaws.com] => {"changed": true, "name": "atomic-openshift-master-api", "state": "started", "status": {"ActiveEnterTimestamp": "Tue 2018-02-06 21:28:52 EST", ...
"WorkingDirectory": "/var/lib/origin"}}

TASK [openshift_master : Wait for master API to come back online] **************
ok: [ec2-54-236-111-207.compute-1.amazonaws.com] => {"changed": false, "elapsed": 10, "path": null, "port": 8443, "search_regex": null, "state": "started"}

TASK [openshift_master : restart master controllers] ***************************
changed: [ec2-54-236-111-207.compute-1.amazonaws.com] => {"attempts": 1, "changed": true, "cmd": ["systemctl", "restart", "atomic-openshift-master-controllers"], "delta": "0:00:01.795976", "end": "2018-02-07 02:26:09.370788", "rc": 0, "start": "2018-02-07 02:26:07.574812", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

Moving this bug to verified according to comment 12.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0489