Bug 1642440

Summary: HE restore code must base upon vds_unique_id table instead of hw_uuid
Product: [oVirt] ovirt-hosted-engine-setup
Reporter: Polina <pagranat>
Component: General
Assignee: Simone Tiraboschi <stirabos>
Status: CLOSED CURRENTRELEASE
QA Contact: Polina <pagranat>
Severity: high
Docs Contact:
Priority: urgent
Version: 2.2.24
CC: bugs, stirabos
Target Milestone: ovirt-4.2.7
Flags: rule-engine: ovirt-4.2+
rule-engine: blocker+
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: ovirt-hosted-engine-setup-2.2.30-1.el7ev
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-11-02 14:28:37 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Integration
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1568841, 1641603    
Attachments:
ovirt-hosted-engine-setup (flags: none)

Description Polina 2018-10-24 12:24:13 UTC
Created attachment 1497012 [details]
ovirt-hosted-engine-setup

Description of problem:
In some specific cases an environment can end up with hosts that share the same uuid in the hw_uuid column, while the values in vds_unique_id remain unique, because the problematic hosts are given a fresh id before installation with the workaround 'uuidgen > /etc/vdsm/vdsm.id'. An example of such an environment is compute-ge-he-4.scl.lab.tlv.redhat.com, using three hosts: cougar03.scl.lab.tlv.redhat.com, cougar04.scl.lab.tlv.redhat.com and cougar05.scl.lab.tlv.redhat.com.
On the first install the hosts register in the engine successfully after running the 'uuidgen > /etc/vdsm/vdsm.id' workaround, but the backup/restore procedure then fails.

Version-Release number of selected component (if applicable): found in 4.2.7, though relevant to all versions.


How reproducible: 100%


Steps to Reproduce:
1. on all 3 hosts:
yum install http://download.eng.bos.redhat.com/brewroot/packages/ovirt-hosted-engine-setup/2.2.29/1.el7ev/noarch/ovirt-hosted-engine-setup-2.2.29-1.el7ev.noarch.rpm

2. 
[root@compute-ge-he-4 ~]# engine-backup --mode=backup --file=backup_compute-he-4 --log=log_compute-he-4_backup4.2
Backing up:
Notifying engine
- Files
- Engine database 'engine'
- DWH database 'ovirt_engine_history'
Packing into file 'backup_compute-he-4'
Notifying engine
Done.

3. Copy aside
4. Put the environment into global maintenance.

5. Clean up the HE NFS storage domain:
rm -Rf /Compute_NFS/pagranat/compute-ge-he-4 (on yellow-vdsb.qa.lab.tlv.redhat.com)

6. Reprovision the HE host: copy the repos to /etc/yum.repos.d/, run yum update, then run <yum install http://download.eng.bos.redhat.com/brewroot/packages/ovirt-hosted-engine-setup/2.2.29/1.el7ev/noarch/ovirt-hosted-engine-setup-2.2.29-1.el7ev.noarch.rpm>

7. hosted-engine --deploy --restore-from-file=backup_compute-he-4

Actual results:

[ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_hosts": []}, "attempts": 120, "changed": false, "deprecations": [{"msg": "The 'ovirt_hosts_facts' module is being renamed 'ovirt_host_facts'", "version": 2.8}]}
[ INFO  ] TASK [Fetch logs from the engine VM]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Set destination directory path]
[ INFO  ] ok: [localhost]

[ INFO  ] TASK [Create destination directory]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [include_tasks]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Find the local appliance image]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Set local_vm_disk_path]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Give the vm time to flush dirty buffers]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Copy engine logs]
[ INFO  ] TASK [include_tasks]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Remove local vm dir]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Remove temporary entry in /etc/hosts for the local VM]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Notify the user about a failure]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ INFO  ] Stage


Expected results: restore succeeds


Additional info: 
engine=# SELECT vds_name, hw_uuid, vds_id, vds_unique_id, host_name FROM vds;
The reason is that the vds_unique_id values differ between the hosts while their hw_uuid is exactly the same; the restore code should therefore key on vds_unique_id rather than hw_uuid (see Summary).
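A diagnostic in the same vein as the query above can surface the colliding hosts directly (column names are taken from the SELECT above; the aggregate form is an illustrative assumption, to be run in the engine database):

```sql
-- List hw_uuid values shared by more than one host. vds_unique_id stays
-- unique thanks to the vdsm.id workaround, so it is the safe restore key.
SELECT hw_uuid, count(*) AS n_hosts, array_agg(vds_name) AS hosts
  FROM vds
 GROUP BY hw_uuid
HAVING count(*) > 1;
```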

Comment 1 Polina 2018-10-28 13:25:07 UTC
Verified on http://download.eng.bos.redhat.com/brewroot/packages/ovirt-hosted-engine-setup/2.2.30/1.el7ev/noarch/ovirt-hosted-engine-setup-2.2.30-1.el7ev.noarch.rpm
The environment where two hosts share the same uuid: http://compute-ge-he-4.scl.lab.tlv.redhat.com (hosts cougar03.scl.lab.tlv.redhat.com and cougar04.scl.lab.tlv.redhat.com).

Comment 2 Sandro Bonazzola 2018-11-02 14:28:37 UTC
This bugzilla is included in the oVirt 4.2.7 release, published on November 2nd 2018.

Since the problem described in this bug report should be
resolved in oVirt 4.2.7 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.