Bug 1685951
| Summary: | [RFE] HC prerequisites are not carried out before cluster upgrade | | |
|---|---|---|---|
| Product: | [oVirt] ovirt-ansible-collection | Reporter: | SATHEESARAN <sasundar> |
| Component: | cluster-upgrade | Assignee: | Ritesh Chikatwar <rchikatw> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | SATHEESARAN <sasundar> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | | |
| Version: | unspecified | CC: | bugs, godas, guillaume.pavese, lleistne, lsvaty, mperina, omachace, rchikatw, rcyriac, rhs-bugs, sabose, sasundar |
| Target Milestone: | ovirt-4.4.1 | Keywords: | FutureFeature, Reopened |
| Target Release: | --- | Flags: | sasundar: ovirt-4.4? |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | rhv-4.4.0-29 | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1500728 | Environment: | |
| Last Closed: | 2020-08-05 06:25:28 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Gluster | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1689838 | | |
| Bug Blocks: | 1500728 | | |
Description — SATHEESARAN, 2019-03-06 11:45:30 UTC
The HC pre-requisites include:

1. Stop any geo-replication session that is in progress.
2. Check for self-heal progress; if self-heal is in progress, fail the upgrade.
3. Check that brick quorum is met for the volume.
4. Stop the glusterfs processes and the glusterd service.

---

Is it easy to fix, or does it need more time?

---

Hi sas,

This looks like a problem with the HE FQDN. In the log I can see:

```
Error: Failed to read response: [(<pycurl.Curl object at 0x7fa4270569d8>, 6, 'Could not resolve host: hostedenginesm3.lab.eng.blr.********.com; Unknown error')]
```

So I think the API call failed because the HE FQDN could not be resolved. Here is the full error:

```
2019-02-26 12:52:46,070 p=29986 u=root | TASK [ovirt.cluster-upgrade : Get hosts] ***************************
2019-02-26 12:52:46,070 p=29986 u=root | task path: /usr/share/ansible/roles/ovirt.cluster-upgrade/tasks/main.yml:24
2019-02-26 12:52:46,276 p=29986 u=root | Using module file /usr/lib/python2.7/site-packages/ansible/modules/cloud/ovirt/ovirt_host_facts.py
2019-02-26 12:52:46,517 p=29986 u=root | The full traceback is:
Traceback (most recent call last):
  File "/tmp/ansible_ovirt_host_facts_payload_N4_GxY/__main__.py", line 88, in main
    all_content=module.params['all_content'],
  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py", line 11714, in list
    return self._internal_get(headers, query, wait)
  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 211, in _internal_get
    return future.wait() if wait else future
  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 54, in wait
    response = self._connection.wait(self._context)
  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/__init__.py", line 496, in wait
    return self.__wait(context, failed_auth)
  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/__init__.py", line 510, in __wait
    raise Error("Failed to read response: {}".format(err_list))
Error: Failed to read response: [(<pycurl.Curl object at 0x7fa4270569d8>, 6, 'Could not resolve host: hostedenginesm3.lab.eng.blr.********.com; Unknown error')]
2019-02-26 12:52:46,518 p=29986 u=root | fatal: [localhost]: FAILED! => {
    "changed": false,
    "invocation": {
        "module_args": {
            "all_content": false,
            "fetch_nested": false,
            "nested_attributes": [],
            "pattern": "cluster=Default name=* status=up"
        }
    },
    "msg": "Failed to read response: [(<pycurl.Curl object at 0x7fa4270569d8>, 6, 'Could not resolve host: hostedenginesm3.lab.eng.blr.********.com; Unknown error')]"
```

---

Ondra Machacek (comment #4):

Not that easy; it needs more time. Should the RHV infra team work on this, or do you (the Gluster team) work on it?

Your issue is that you are using a password which is also contained in the hostname, so it is obfuscated. For more info see: https://github.com/ansible/ansible/issues/19278

---

Martin Perina (comment #5):

(In reply to Ondra Machacek from comment #4)
> Your issue is that you are using password which is also contained in
> hostname. So it's obfuscated for more info see:
> https://github.com/ansible/ansible/issues/19278

This is an already known issue in Ansible's no_log implementation. I don't think we should do anything about it in the cluster-upgrade role; it needs to be fixed in Ansible itself:

https://github.com/ansible/ansible/issues/19278

My recommendation is to use safe passwords instead of well-known strings which can be part of FQDNs, domains, etc.

---

SATHEESARAN:

(In reply to Martin Perina from comment #5)
> My recommendation is to use safe passwords instead of well-known strings
> which can be part of FQDNS, domains, ...

Thanks Martin & Ondra. Yes, initially the password was part of the hostname used, but that is not the problem here. RHHI-V needs a set of prerequisites to be carried out, and that has been taken care of while testing with this cluster-upgrade role. Please check comment 1 for the set of prerequisites. As the Gluster team is aware of these prerequisites, this cluster-upgrade role should be updated for the HC environment.

---

While testing the upgrade, I see an exception error while the host goes for a reboot, but once the host comes up, it is updated to the latest image and all the services are running.

Error:

```
2019-03-14 14:14:56,893 p=61390 u=root | ok: [localhost]
2019-03-14 14:14:56,955 p=61390 u=root | TASK [ovirt.cluster-upgrade : Upgrade host] ***************************
2019-03-14 14:28:24,163 p=61390 u=root | An exception occurred during task execution. To see the full traceback, use -vvv. The error was: Exception: Error while waiting on result state of the entity.
2019-03-14 14:28:24,163 p=61390 u=root | fatal: [localhost]: FAILED! => {"changed": false, "msg": "Error while waiting on result state of the entity."}
2019-03-14 14:28:24,225 p=61390 u=root | TASK [ovirt.cluster-upgrade : Log event about cluster upgrade failed] ***************************
2019-03-14 14:28:24,654 p=61390 u=root | changed: [localhost]
2019-03-14 14:28:24,716 p=61390 u=root | TASK [ovirt.cluster-upgrade : Set original cluster policy] ***************************
2019-03-14 14:28:25,224 p=61390 u=root | changed: [localhost]
2019-03-14 14:28:25,287 p=61390 u=root | TASK [ovirt.cluster-upgrade : Start again stopped VMs] ***************************
2019-03-14 14:28:25,363 p=61390 u=root | TASK [ovirt.cluster-upgrade : Start again pin to host VMs] ***************************
2019-03-14 14:28:25,442 p=61390 u=root | TASK [ovirt.cluster-upgrade : Logout from oVirt] ***************************
2019-03-14 14:28:25,457 p=61390 u=root | skipping: [localhost]
2019-03-14 14:28:25,458 p=61390 u=root | PLAY RECAP ***************************
2019-03-14 14:28:25,459 p=61390 u=root | localhost : ok=22 changed=5 unreachable=0 failed=1
```

Ondra, could you please take a look? Attaching the required files.

---

Gobinda, assigning to you to take a look.

---

I am clearing the needinfo here, as the issue was reported in bug 1689838.

---

This bug is targeted to 4.4.3 and is in MODIFIED state. Can we re-target it to 4.4.0 and move it to QA?

---

Yes, it can be targeted. The HC pre-requisites include:

1. The engine FQDN should not contain the host password.
2. Stop any geo-replication session that is in progress.
3. Check that brick quorum is met for the volume.

---

Clearing the needinfo as it is already targeted to 4.4.0.

---

Moving to 4.4.1 since 4.4.0 has already been released.

---

SATHEESARAN:

Tested with ovirt-ansible-cluster-upgrade-1.2.3 and RHV Manager 4.4.1. The feature works well: it updates the cluster and proceeds to upgrade all the hosts in the cluster. As no real upgrade image was available, all testing was done with interim-build RHVH images. All the prerequisites are handled well.

---

This bugzilla is included in the oVirt 4.4.1 release, published on July 8th 2020. Since the problem described in this bug report should be resolved in the oVirt 4.4.1 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.
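The prerequisite list in the description translates naturally into a pre-upgrade gate. The following is a minimal sketch, not the cluster-upgrade role's actual implementation: the volume name `vol1` is a placeholder, and the `grep`/`awk` patterns are simplified readings of the Gluster CLI output. Step 4 is printed rather than executed so the sketch has no side effects.

```shell
#!/bin/sh
# Hypothetical HC pre-upgrade check sketch (not the real role code).
VOL=${1:-vol1}    # placeholder volume name
RESULT=failed

# 1. Any active geo-replication session must be stopped first.
if gluster volume geo-replication status 2>/dev/null | grep -q 'Active'; then
    echo "geo-rep session active; stop it before upgrading" >&2
    exit 1
fi

# 2. Fail the upgrade if self-heal is still in progress
#    (sum the per-brick "Number of entries:" counters).
pending=$(gluster volume heal "$VOL" info 2>/dev/null \
          | awk '/Number of entries:/ {sum += $NF} END {print sum + 0}')
if [ "${pending:-0}" -gt 0 ]; then
    echo "self-heal in progress ($pending entries); aborting upgrade" >&2
    exit 1
fi

# 3. Brick quorum: treat any offline brick as a quorum risk.
offline=$(gluster volume status "$VOL" detail 2>/dev/null \
          | grep -c '^Online.*N')
if [ "${offline:-0}" -gt 0 ]; then
    echo "$offline brick(s) offline; brick quorum may not be met" >&2
    exit 1
fi

RESULT=passed
# 4. Only now is it safe to stop gluster processes and glusterd.
echo "checks passed: stop glusterfs processes and glusterd, then upgrade"
```

On a host without gluster installed the probes simply find nothing to object to, so the script is safe to dry-run anywhere.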
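On the no_log point raised by Ondra and Martin: Ansible masks occurrences of a value marked secret in its output, so a password that happens to be a substring of the engine FQDN gets the hostname obfuscated in logs. A simplified, self-contained illustration of that substring masking follows; the password `blrlab` and the hostname are made up, and this is not Ansible's actual code.

```shell
# Simplified illustration of no_log-style substring masking.
secret='blrlab'                                   # hypothetical host password
fqdn="hostedengine.lab.eng.${secret}.example.com" # password is a substring
msg="Could not resolve host: ${fqdn}"

# Mask every occurrence of the secret, exactly as happened to the FQDN above.
masked=$(printf '%s\n' "$msg" | sed "s/${secret}/********/g")
echo "$masked"
# -> Could not resolve host: hostedengine.lab.eng.********.example.com
```

This is why Martin's recommendation matters: a password that is not a well-known string cannot accidentally collide with FQDNs or domain names in the output.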
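The "Error while waiting on result state of the entity" failure occurred while a host was mid-reboot and resolved itself once the host came back. A generic retry-until-up wrapper, shown here only as a sketch of how an operator might tolerate such a transient window; the `probe` function is a made-up stand-in for a real health check, not anything from the role.

```shell
# Generic retry loop: run a probe until it succeeds or attempts run out.
wait_for_host() {
    attempts=$1; shift
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Fake probe: succeeds on the 3rd call, simulating a host that becomes
# reachable again partway through its reboot.
TRIES_FILE=$(mktemp)
echo 0 > "$TRIES_FILE"
probe() {
    n=$(($(cat "$TRIES_FILE") + 1))
    echo "$n" > "$TRIES_FILE"
    [ "$n" -ge 3 ]
}

wait_for_host 5 probe && STATUS="host is back up"
echo "$STATUS"
# -> host is back up
```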