Bug 1642440 - HE restore code must base upon vds_unique_id table instead of hw_uuid
Summary: HE restore code must base upon vds_unique_id table instead of hw_uuid
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-hosted-engine-setup
Classification: oVirt
Component: General
Version: 2.2.24
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: high
Target Milestone: ovirt-4.2.7
Assignee: Simone Tiraboschi
QA Contact: Polina
URL:
Whiteboard:
Depends On:
Blocks: 1568841 1641603
 
Reported: 2018-10-24 12:24 UTC by Polina
Modified: 2018-11-02 14:28 UTC (History)
2 users

Fixed In Version: ovirt-hosted-engine-setup-2.2.30-1.el7ev
Clone Of:
Environment:
Last Closed: 2018-11-02 14:28:37 UTC
oVirt Team: Integration
Embargoed:
rule-engine: ovirt-4.2+
rule-engine: blocker+


Attachments
ovirt-hosted-engine-setup (531.77 KB, application/x-gzip)
2018-10-24 12:24 UTC, Polina
no flags Details


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 95108 0 ovirt-hosted-engine-setup-2.2 MERGED restore: support custom vds_unique_id via /etc/vdsm/vdsm.id 2018-10-24 15:28:26 UTC
oVirt gerrit 95114 0 ovirt-hosted-engine-setup-2.2 MERGED restore: support custom vds_unique_id via /etc/vdsm/vdsm.id 2018-10-24 15:38:00 UTC

Description Polina 2018-10-24 12:24:13 UTC
Created attachment 1497012 [details]
ovirt-hosted-engine-setup

Description of problem:
In some specific cases, an environment can contain hosts that share the same UUID in the hw_uuid column. In vds_unique_id the IDs are unique, because for the problematic hosts we apply a workaround before installation: 'uuidgen > /etc/vdsm/vdsm.id'. An example of such an environment is compute-ge-he-4.scl.lab.tlv.redhat.com, which uses three hosts: cougar03.scl.lab.tlv.redhat.com, cougar04.scl.lab.tlv.redhat.com, and cougar05.scl.lab.tlv.redhat.com.
On the first install the hosts are registered in the engine successfully after running the 'uuidgen > /etc/vdsm/vdsm.id' workaround, but the backup/restore procedure then fails.
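The workaround mentioned above can be sketched as follows. This is a minimal, hedged illustration: the real target path is /etc/vdsm/vdsm.id, but the sketch writes to a temporary file (the VDSM_ID variable is illustrative) so it is safe to run outside a host being provisioned:

```shell
# Sketch of the per-host UUID workaround applied before installation.
# On a real host this would write to /etc/vdsm/vdsm.id; here we use a
# temporary path so the snippet does not touch system configuration.
VDSM_ID="${VDSM_ID:-/tmp/vdsm.id.example}"

# Generate a fresh random UUID for this host; fall back to the kernel's
# UUID source if uuidgen is not installed.
uuidgen > "$VDSM_ID" 2>/dev/null || cat /proc/sys/kernel/random/uuid > "$VDSM_ID"

# A valid vdsm.id is a single line holding one canonical UUID.
if grep -Eq '^[0-9a-fA-F]{8}-([0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}$' "$VDSM_ID"; then
    echo "vdsm.id OK: $(cat "$VDSM_ID")"
fi
```

VDSM then reports this value as the host's unique ID, which is why vds_unique_id stays distinct across hosts even when the hardware reports identical UUIDs.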

Version-Release number of selected component (if applicable): found in 4.2.7, though relevant to all versions.


How reproducible: 100%


Steps to Reproduce:
1. on all 3 hosts:
yum install http://download.eng.bos.redhat.com/brewroot/packages/ovirt-hosted-engine-setup/2.2.29/1.el7ev/noarch/ovirt-hosted-engine-setup-2.2.29-1.el7ev.noarch.rpm

2. 
[root@compute-ge-he-4 ~]# engine-backup --mode=backup --file=backup_compute-he-4 --log=log_compute-he-4_backup4.2
Backing up:
Notifying engine
- Files
- Engine database 'engine'
- DWH database 'ovirt_engine_history'
Packing into file 'backup_compute-he-4'
Notifying engine
Done.

3. Copy aside
4. Put the environment into global maintenance.

5. Cleanup HE Storage NFS Domain.
rm -Rf /Compute_NFS/pagranat/compute-ge-he-4 on yellow-vdsb.qa.lab.tlv.redhat.com

6. Reprovision the HE host: copy the repos to /etc/yum.repos.d/, run 'yum update', and run 'yum install http://download.eng.bos.redhat.com/brewroot/packages/ovirt-hosted-engine-setup/2.2.29/1.el7ev/noarch/ovirt-hosted-engine-setup-2.2.29-1.el7ev.noarch.rpm'.

7. hosted-engine --deploy --restore-from-file=backup_compute-he-4

Actual results: the deploy fails with the following output:

[ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_hosts": []}, "attempts": 120, "changed": false, "deprecations": [{"msg": "The 'ovirt_hosts_facts' module is being renamed 'ovirt_host_facts'", "version": 2.8}]}
[ INFO  ] TASK [Fetch logs from the engine VM]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Set destination directory path]
[ INFO  ] ok: [localhost]

[ INFO  ] TASK [Create destination directory]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [include_tasks]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Find the local appliance image]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Set local_vm_disk_path]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Give the vm time to flush dirty buffers]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Copy engine logs]
[ INFO  ] TASK [include_tasks]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Remove local vm dir]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Remove temporary entry in /etc/hosts for the local VM]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Notify the user about a failure]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ INFO  ] Stage


Expected results: restore succeeds


Additional info: 
engine=# SELECT vds_name, hw_uuid, vds_id, vds_unique_id, host_name FROM vds;
The reason is that the restore code relies on hw_uuid: the vds_unique_id values differ between the hosts, while the hw_uuid values are exactly the same.
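Building on the SELECT above, duplicate hardware UUIDs can be spotted with a query sketch like the following (assuming direct psql access to the engine database; column names are taken from the query above, and array_agg is standard PostgreSQL):

```sql
-- Hosts that share the same hardware UUID (hw_uuid).
-- Any row returned indicates an environment affected by this bug
-- during restore, since vds_unique_id stays distinct per host while
-- hw_uuid collides.
SELECT hw_uuid, count(*) AS host_count, array_agg(vds_name) AS host_names
FROM vds
GROUP BY hw_uuid
HAVING count(*) > 1;
```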

Comment 1 Polina 2018-10-28 13:25:07 UTC
Verified on http://download.eng.bos.redhat.com/brewroot/packages/ovirt-hosted-engine-setup/2.2.30/1.el7ev/noarch/ovirt-hosted-engine-setup-2.2.30-1.el7ev.noarch.rpm
The environment where two hosts share the same UUID: http://compute-ge-he-4.scl.lab.tlv.redhat.com (hosts cougar03.scl.lab.tlv.redhat.com and cougar04.scl.lab.tlv.redhat.com).

Comment 2 Sandro Bonazzola 2018-11-02 14:28:37 UTC
This bugzilla is included in the oVirt 4.2.7 release, published on November 2nd 2018.

Since the problem described in this bug report should be
resolved in oVirt 4.2.7 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

