Description of problem:
Running an upgrade against a non-ha containerized env fails because the installer tries to find the atomic-openshift-master-api service.

```
TASK [Ensure HA Master is running] *********************************************
fatal: [qe-master]: FAILED! => {
    "changed": false,
    "failed": true
}

MSG:

Could not find the requested service atomic-openshift-master-api: host
```

On the master host:

```
# ls -la /etc/sysconfig/ | grep atomic
-rw-r--r--. 1 root root 304 Sep 25 03:39 atomic-openshift-master
-rw-r--r--. 1 root root  95 Sep 25 03:46 atomic-openshift-node
-rw-r--r--. 1 root root 141 Sep 25 03:47 atomic-openshift-node-dep
```

```
PLAY RECAP *********************************************************************
localhost  : ok=11  changed=0   unreachable=0  failed=0
qe-etcd    : ok=41  changed=4   unreachable=0  failed=0
qe-master  : ok=90  changed=11  unreachable=0  failed=1
qe-node    : ok=83  changed=10  unreachable=0  failed=0

Failure summary:

1. Hosts:   qe-master
   Play:    Verify master processes
   Task:    Ensure HA Master is running
   Message: Could not find the requested service atomic-openshift-master-api: host
```

Version-Release number of the following components:
ansible-2.3.2.0-2.el7.noarch
openshift-ansible-3.7.0-0.127.0.git.0.b9941e4.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Install a non-ha containerized env
2. Run the upgrade

Actual results:
Upgrade failed.

Expected results:
Upgrade succeeds.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag
I am not able to reproduce it. My inventory:

```ini
[OSEv3:children]
masters
nodes
etcd

[OSEv3:vars]
ansible_ssh_user = root
deployment_type = openshift-enterprise
openshift_deployment_type = openshift-enterprise
osm_use_cockpit = false
openshift_release = v3.7
openshift_docker_insecure_registries=brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888
openshift_docker_additional_registries="brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888,registry.ops.openshift.com"
containerized=True
openshift_pkg_version=-3.6.173.0.37
openshift_version=3.6.173.0.39

[masters]
10.8.174.18 ansible_ssh_host=10.8.174.18

[nodes]
10.8.174.183 ansible_ssh_host=10.8.174.183

[etcd]
10.8.174.18 ansible_ssh_host=10.8.174.18
```

Can you share your inventory file?
Both openshift_pkg_version and openshift_version are supposed to be commented out; they were only used to deploy the v3.6 cluster.
Please see https://github.com/openshift/openshift-ansible/pull/4832. Citing Clayton:

"Native clustering is the default configuration mode, even when only one master is configured" [1]
"We don't support upgrade from non-HA to HA" [2]

All the changes are for OCP 3.7+, so the error message is expected. The only item left to complete is to document this case.

[1] https://github.com/openshift/openshift-ansible/pull/4832#issue-244862534
[2] https://github.com/openshift/openshift-ansible/pull/4832#discussion_r130642101
Reproduced always with v3.7.0-0.127.0.

```
# rpm -qa | grep openshift
openshift-ansible-filter-plugins-3.7.0-0.127.0.git.0.b9941e4.el7.noarch
openshift-ansible-playbooks-3.7.0-0.127.0.git.0.b9941e4.el7.noarch
atomic-openshift-clients-3.7.0-0.127.0.git.0.459b70b.el7.x86_64
openshift-ansible-docs-3.7.0-0.127.0.git.0.b9941e4.el7.noarch
openshift-ansible-lookup-plugins-3.7.0-0.127.0.git.0.b9941e4.el7.noarch
openshift-ansible-roles-3.7.0-0.127.0.git.0.b9941e4.el7.noarch
atomic-openshift-utils-3.7.0-0.127.0.git.0.b9941e4.el7.noarch
openshift-ansible-3.7.0-0.127.0.git.0.b9941e4.el7.noarch
openshift-ansible-callback-plugins-3.7.0-0.127.0.git.0.b9941e4.el7.noarch
```

Inventory and upgrade.log in attachment.
Until now there has been no explicit statement that the installer will not support a non-ha containerized OCP upgrade from v3.6 to v3.7. What QE was told is only that the single master service will be split into master-api.service and master-controllers.service in 3.7, so the upgrade process may need not only detection of the old layout but also a migration to complete the split, as in point 2 of [1]. Documenting this case seems to be only a compromise rather than the best solution for this issue; it should still be tracked as a bug until a final conclusion is reached.

[1] https://github.com/openshift/openshift-ansible/issues/4979
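For illustration, a minimal sketch of what the detection half could look like as Ansible tasks; the task names and the `needs_master_split` fact are hypothetical, not shipped openshift-ansible code:

```yaml
# Probe for the monolithic master unit before deciding which services
# the upgrade should manage. systemctl show prints "LoadState=loaded"
# when the unit file exists and "LoadState=not-found" otherwise.
- name: Check whether the monolithic master unit is still installed
  command: systemctl show -p LoadState atomic-openshift-master.service
  register: monolithic_master
  changed_when: false
  failed_when: false

- name: Record whether a non-HA to HA conversion is needed (hypothetical fact)
  set_fact:
    needs_master_split: "{{ 'LoadState=loaded' in monolithic_master.stdout }}"
```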
Clayton, can you elaborate more on the issue and on comment #7?
I would expect the upgrade to re-run the openshift_master systemd_units.yml task on each master node, which would convert the monolithic master process into api and controller units.
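A minimal sketch of the conversion such a task would perform, assuming the unit names shown elsewhere in this bug; this is illustrative, not the actual systemd_units.yml content:

```yaml
# Replace the monolithic master unit with the split API and
# controllers units on each master node.
- name: Stop and disable the monolithic master service
  systemd:
    name: atomic-openshift-master
    state: stopped
    enabled: no
  failed_when: false  # the unit may already be gone on converted hosts

- name: Enable and start the split API and controllers services
  systemd:
    name: "{{ item }}"
    state: started
    enabled: yes
    daemon_reload: yes
  with_items:
    - atomic-openshift-master-api
    - atomic-openshift-master-controllers
```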
Once the control plane check passes, the non-ha master is upgraded to ha without any problems. So only the "Ensure HA Master is running" tasks need to be modified so that they check the non-ha service when the HA units are not available; a sketch follows.
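A hedged sketch of such a fallback check (illustrative only; the actual fix is in the upstream PR below):

```yaml
# List the master units present on the host, then verify whichever
# set of services this host actually runs.
- name: Determine which master units exist on this host
  command: systemctl list-unit-files --no-legend 'atomic-openshift-master*'
  register: master_units
  changed_when: false

- name: Ensure HA Master is running
  service:
    name: "{{ item }}"
    state: started
  with_items:
    - atomic-openshift-master-api
    - atomic-openshift-master-controllers
  when: "'atomic-openshift-master-api' in master_units.stdout"

- name: Ensure non-HA Master is running
  service:
    name: atomic-openshift-master
    state: started
  when: "'atomic-openshift-master-api' not in master_units.stdout"
```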
Upstream PR: https://github.com/openshift/openshift-ansible/pull/5845
Version: openshift-ansible-docs-3.7.0-0.178.0.git.0.27a1039.el7.noarch

Steps:
1. Container install OCP v3.6 (one master+etcd, one node)
2. Upgrade OCP to latest v3.7

Upgrade succeeded, with the atomic-openshift-master-api and atomic-openshift-master-controllers services running.

```
# docker ps | grep master
01116881e2bf  openshift3/ose:v3.7.0         "/usr/bin/openshift s"   9 minutes ago   Up 9 minutes    atomic-openshift-master-controllers
897c7f49f878  openshift3/ose:v3.7.0         "/usr/bin/openshift s"   10 minutes ago  Up 10 minutes   atomic-openshift-master-api
77d439489612  openshift3/ose:v3.6.173.0.59  "/usr/bin/openshift s"   12 minutes ago  Up 12 minutes   atomic-openshift-master
```

It is strange that the original atomic-openshift-master service is kept alongside the api and controllers services after the upgrade; this will be tracked in a new bug if it causes problems. As for this bug, the upgrade works well against a non-ha containerized OCP.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:3188