Bug 1500897
| Summary: | openshift-master/restart_services.yml fails with new 3.6 master installs | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Ryan Howe <rhowe> |
| Component: | Installer | Assignee: | Michael Gugino <mgugino> |
| Status: | CLOSED ERRATA | QA Contact: | Gaoyun Pei <gpei> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 3.6.0 | CC: | aos-bugs, jokerman, mmccomas, rhowe |
| Target Milestone: | --- | ||
| Target Release: | 3.9.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2018-03-28 14:07:14 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Ryan Howe
2017-10-11 17:02:20 UTC
Correction: a single master install is not the default in 3.6 but is the default in the master branch. I would assume this will be in 3.7. This is still a bug for clusters that are only running a single master but split the services into api and controllers.

> This is still a bug for clusters that are only running a single master but split the services to api and controller.
So it affects only clusters that have been deployed with openshift-ansible 3.6 (and lower) and where the master service split into master-api and master-controllers has been done manually (without running the openshift-ansible)? If not done manually, what playbook (or approach) has been used to split the services?
Unsure how they ended up with the following services with a single master install. They did, however, get the following services after an install:

atomic-openshift-master-api.service
atomic-openshift-master-controllers.service
atomic-openshift-node.service

The issue still stands that if only one master is listed and the services are split, the installer will never restart the services correctly.

PR Created: https://github.com/openshift/openshift-ansible/pull/6876

This appears to still be the case in master. If openshift_master_ha != True, the services are not restarted. Since single masters now use the same service names as HA masters, this resulted in a condition where single masters could not have their services restarted by this play. One could argue the necessity of a play to restart services on a single host, but since we provide the play it might as well be useful.

3.7 Backport created: https://github.com/openshift/openshift-ansible/pull/6877

Mike, we need to clone this for 3.7 once QE verifies this. I know you've got a PR already, but we need one bug per release.

QE does not know how to reproduce this bug: in a 3.6 install with a single master, the master service is never split into api and controllers services like an HA install (that is a new change in 3.7).
I also checked the 3.7 openshift-ansible code; there is no restart *master* task in playbooks/common/openshift-master/restart_services.yml:
$ git describe
openshift-ansible-3.7.9-1
$ cat playbooks/common/openshift-master/restart_services.yml
---
- name: Restart master API
  service:
    name: "{{ openshift.common.service_type }}-master-api"
    state: restarted
  when: openshift_master_ha | bool
- name: Wait for master API to come back online
  wait_for:
    host: "{{ openshift.common.hostname }}"
    state: started
    delay: 10
    port: "{{ openshift.master.api_port }}"
    timeout: 600
  when: openshift_master_ha | bool
- name: Restart master controllers
  service:
    name: "{{ openshift.common.service_type }}-master-controllers"
    state: restarted
  # Ignore errrors since it is possible that type != simple for
  # pre-3.1.1 installations.
  ignore_errors: true
  when: openshift_master_ha | bool
Actually, I think this is an invalid test case and should be closed as NOTABUG.
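For illustration: every task in the listing above carries the guard `when: openshift_master_ha | bool`, so none of them runs when that variable evaluates to false. A minimal sketch of the same tasks with the guard dropped might look like the following. This is an assumption about the general shape of a fix, not the content of the PR linked above; the variable names are copied from the listing.

```yaml
---
# Sketch only: the restart tasks from the listing with the HA guard removed,
# so a single master with split api/controllers services is restarted too.
# This is NOT the actual change from the linked PR.
- name: Restart master API
  service:
    name: "{{ openshift.common.service_type }}-master-api"
    state: restarted

- name: Wait for master API to come back online
  wait_for:
    host: "{{ openshift.common.hostname }}"
    state: started
    delay: 10
    port: "{{ openshift.master.api_port }}"
    timeout: 600

- name: Restart master controllers
  service:
    name: "{{ openshift.common.service_type }}-master-controllers"
    state: restarted
  # Kept from the original listing: a failure here does not abort the play.
  ignore_errors: true
```

Dropping the guard only makes sense where a single master uses the split service names, which the thread says is the case from 3.7 onward; on an install with a combined master service these tasks would target units that do not exist.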
Based on the PR in comment 7, dev made some enhancements to the master service restart in 3.9; QE will verify that the change takes effect in the 3.9 openshift-ansible installer.
Tried the same usage scenario on latest 3.9, openshift v3.9.0-0.39.0, openshift-ansible-3.9.0-0.39.0.git.0.fea6997.el7.noarch.
Ran the master certs redeployment playbook after installation; it restarts the master services correctly.
ansible-playbook -i host /usr/share/ansible/openshift-ansible/playbooks/openshift-master/redeploy-certificates.yml -v
PLAY [Restart masters] ******************************************************************************************************************************************************
TASK [Gathering Facts] ******************************************************************************************************************************************************
ok: [ec2-54-236-111-207.compute-1.amazonaws.com]
TASK [include_tasks] ********************************************************************************************************************************************************
skipping: [ec2-54-236-111-207.compute-1.amazonaws.com] => {"changed": false, "skip_reason": "Conditional result was False"}
TASK [openshift_master : Restart master API] ********************************************************************************************************************************
changed: [ec2-54-236-111-207.compute-1.amazonaws.com] => {"changed": true, "name": "atomic-openshift-master-api", "state": "started", "status": {"ActiveEnterTimestamp": "Tue 2018-02-06 21:28:52 EST", ...
"WorkingDirectory": "/var/lib/origin"}}
TASK [openshift_master : Wait for master API to come back online] ***********************************************************************************************************
ok: [ec2-54-236-111-207.compute-1.amazonaws.com] => {"changed": false, "elapsed": 10, "path": null, "port": 8443, "search_regex": null, "state": "started"}
TASK [openshift_master : restart master controllers] ************************************************************************************************************************
changed: [ec2-54-236-111-207.compute-1.amazonaws.com] => {"attempts": 1, "changed": true, "cmd": ["systemctl", "restart", "atomic-openshift-master-controllers"], "delta": "0:00:01.795976", "end": "2018-02-07 02:26:09.370788", "rc": 0, "start": "2018-02-07 02:26:07.574812", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
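The `cmd` and `attempts` keys in the controllers task output above suggest that the 3.9 task invokes `systemctl restart` directly and is wrapped in a retry loop. A hedged sketch of such a task follows; the `retries` and `delay` values and the `result` variable name are hypothetical and are not taken from the installer.

```yaml
---
# Sketch only: restart the controllers through systemctl and retry on failure.
# The retry count, delay, and registered variable name are assumptions.
- name: restart master controllers
  command: systemctl restart atomic-openshift-master-controllers
  register: result
  retries: 3
  delay: 5
  until: result.rc == 0
```

With `until` set, Ansible adds an `attempts` field to the task result, which would be consistent with the `"attempts": 1` shown in the output.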
Moving this bug to VERIFIED according to comment 12.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0489 |