Bug 2087735 - Failure while 4.3 -> 4.5 restore
Summary: Failure while 4.3 -> 4.5 restore
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: ovirt-hosted-engine-setup
Classification: oVirt
Component: General
Version: 2.6.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Asaf Rachmani
QA Contact: meital avital
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-05-18 11:34 UTC by Polina
Modified: 2022-05-19 16:51 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-05-19 13:19:42 UTC
oVirt Team: ---
Embargoed:
nsednev: needinfo-


Attachments
ovirt-hosted-engine-setup-20220517234200-cdzlvl.log.gz (48.47 KB, application/gzip)
2022-05-18 11:34 UTC, Polina
no flags


Links
Red Hat Issue Tracker RHV-46091 (last updated 2022-05-18 11:37:29 UTC)

Description Polina 2022-05-18 11:34:51 UTC
Created attachment 1880802 [details]
ovirt-hosted-engine-setup-20220517234200-cdzlvl.log.gz

Description of problem: Restore from 4.3 to 4.5 fails


Version-Release number of selected component (if applicable):
ovirt-hosted-engine-ha-2.5.0-1.el8ev.noarch
ovirt-hosted-engine-setup-2.6.3-1.el8ev.noarch

How reproducible: Reproduced during the manual testing described below. Repeating the test now to check whether it is consistent.

Steps to Reproduce:

0. Install a 4.3 env with 3 hosts (done in Jenkins).
The HE SD is NFS, though there are other SDs in the setup: 3 iSCSI SDs and 3 Gluster SDs.

1. Migrate the HE VM to host_mixed_1 (the first host in the setup).

2. Set global maintenance.

3. Run engine-backup --mode=backup --file=backup_ge-8 --log=log_ge-8_backup and put the backup file aside.

4. Reprovision host_mixed_1 to RHEL 8.6: https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/infra_reprovision_job/3257/

5. Fix the initiator name in /etc/iscsi/initiatorname.iscsi after reprovisioning.

6. On the host, enable the latest 4.5 repos and run 'yum update -y', then 'yum install ovirt-hosted-engine-setup'.

7. Copy the backup file to the host and run 'hosted-engine --deploy --restore-from-file=backup_ge-8' (see the consolidated command sketch below).
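
For reference, a consolidated shell sketch of the backup/restore commands from steps 3, 6, and 7 above (file names are the ones used in this run; adjust to your environment):

    # on the 4.3 engine VM, with global maintenance already set
    engine-backup --mode=backup --file=backup_ge-8 --log=log_ge-8_backup

    # on the reprovisioned RHEL 8.6 host, after enabling the 4.5 repos
    yum update -y
    yum install ovirt-hosted-engine-setup
    hosted-engine --deploy --restore-from-file=backup_ge-8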

Actual results:
 During deployment I set the DC and cluster to the same names as in the backed-up env.
 It failed with the error:
 "2022-05-18 00:09:54,923+0300 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:115 TASK [ovirt.ovirt.hosted_engine_setup : Generate the error message from the engine events]
2022-05-18 00:09:55,627+0300 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 {'msg': "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'id'\n\nThe error appears to be in '/usr/share/ansible/collections/ansible_collections/ovirt/ovirt/roles/hosted_engine_setup/tasks/bootstrap_local_vm/05_add_host.yml': line 233, column 9, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n      - name: Generate the error message from the engine events\n        ^ here\n", '_ansible_no_log': False}
2022-05-18 00:09:55,728+0300 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 ignored: [localhost]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'id'\n\nThe error appears to be in '/usr/share/ansible/collections/ansible_collections/ovirt/ovirt/roles/hosted_engine_setup/tasks/bootstrap_local_vm/05_add_host.yml': line 233, column 9, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n      - name: Generate the error message from the engine events\n        ^ here\n"}
2022-05-18 00:09:56,430+0300 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:115 TASK [ovirt.ovirt.hosted_engine_setup : Fail with error description]
2022-05-18 00:09:57,133+0300 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:115 skipping: [localhost]
2022-05-18 00:09:58,037+0300 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:115 TASK [ovirt.ovirt.hosted_engine_setup : Fail with generic error]
2022-05-18 00:09:58,841+0300 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 {'msg': 'The host has been set in non_operational status, please check engine logs, more info can be found in the engine logs, fix accordingly and re-deploy.', '_ansible_no_log': False, 'changed': False}
2022-05-18 00:09:58,942+0300 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:113 fatal: [localhost]: FAILED! => {"changed": false, "msg": "The host has been set in non_operational status, please check engine logs, more info can be found in the engine logs, fix accordingly and re-deploy."}
"         
          
I'm repeating the test again and will provide more logs.


Expected results:
restored with no errors

Additional info:

Comment 1 Polina 2022-05-18 11:40:00 UTC
Hi Didi,
I'm repeating the test again; this time the 4.3 setup is deployed with no iSCSI or Gluster at all.
Could you please tell me which logs should be added besides ovirt-hosted-engine-setup-20220517234200-cdzlvl.log.gz and vdsm.log?

Comment 2 Michal Skrivanek 2022-05-18 12:40:55 UTC
The whole /var/log directory on the host. That should include the HE's engine.log inside /var/log/ovirt-hosted-engine-setup/engine-logs-*/.
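
For reference, one way to collect that as a single archive (the archive name is just an example):

    tar czf /tmp/host-var-log.tar.gz /var/log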

Comment 15 Michal Skrivanek 2022-05-19 12:18:35 UTC
HE is being deployed with IPv6, but the hostname resolution is dual stack. Java prefers IPv4, so it tries to connect over IPv4, but the HE VM doesn't have a route for that; it only has IPv6.

Either:
- explicitly force IPv4 (as I suppose that's what you want to use anyway),
- fix DNS so it returns IPv6-only records, or
- change the Java DNS resolution preference to java.net.preferIPv6Addresses=true in /etc/ovirt-engine/engine.conf.d/ (a sketch follows this list).
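
A minimal sketch of the third alternative, assuming the standard ENGINE_PROPERTIES override mechanism in engine.conf.d (the file name is illustrative):

    # /etc/ovirt-engine/engine.conf.d/99-prefer-ipv6.conf  (illustrative file name)
    # append the Java system property to the engine's startup properties
    ENGINE_PROPERTIES="${ENGINE_PROPERTIES} java.net.preferIPv6Addresses=true"

    # restart the engine inside the HE VM for the change to take effect
    systemctl restart ovirt-engine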

Comment 16 Michal Skrivanek 2022-05-19 13:19:42 UTC
Closing for now. Please reopen if this reappears after applying one of the three alternatives in comment #15.

Comment 17 Nikolai Sednev 2022-05-19 16:50:51 UTC
Successfully upgraded from ovirt-engine-setup-4.3.11.4-0.1.el7.noarch to ovirt-engine-4.5.0.6-0.7.el8ev.noarch, NFS to NFS, using hosted-engine --deploy --4 --restore-from-file=/root/nsednev_from_serval14_SPM_rhevm_4_3.

[ INFO  ] Hosted Engine successfully deployed
[ INFO  ] Other hosted-engine hosts have to be reinstalled in order to update their storage configuration. From the engine, host by host, please set maintenance mode and then click on reinstall button ensuring you choose DEPLOY in hosted engine tab.
[ INFO  ] Please note that the engine VM ssh keys have changed. Please remove the engine VM entry in ssh known_hosts on your clients.
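
For the last note above, one way to drop the stale engine VM host key on a client (the FQDN here is a placeholder):

    ssh-keygen -R engine.example.com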


4.3 components on hosts:
ansible-2.9.13-1.el7ae.noarch
ovirt-ansible-repositories-1.1.6-1.el7ev.noarch
ovirt-ansible-hosted-engine-setup-1.0.38-1.el7ev.noarch
ovirt-ansible-engine-setup-1.1.9-1.el7ev.noarch
ovirt-hosted-engine-ha-2.3.6-1.el7ev.noarch
ovirt-hosted-engine-setup-2.3.13-2.el7ev.noarch

Engine 4.3:
ovirt-ansible-engine-setup-1.1.9-1.el7ev.noarch  
ovirt-ansible-hosted-engine-setup-1.0.38-1.el7ev.noarch
ovirt-engine-setup-4.3.11.4-0.1.el7.noarch

4.5 components on hosts:
ovirt-hosted-engine-setup-2.6.3-1.el8ev.noarch
ovirt-hosted-engine-ha-2.5.0-1.el8ev.noarch
ansible-collection-ansible-utils-2.3.0-2.2.el8ev.noarch
ansible-collection-ansible-posix-1.3.0-1.2.el8ev.noarch
ansible-core-2.12.2-3.1.el8.x86_64
ovirt-ansible-collection-2.0.3-1.el8ev.noarch
ansible-collection-ansible-netcommon-2.2.0-3.2.el8ev.noarch

Engine 4.5:
ovirt-engine-4.5.0.6-0.7.el8ev.noarch
ansible-collection-ansible-netcommon-2.2.0-3.2.el8ev.noarch
ansible-runner-2.1.3-1.el8ev.noarch
ansible-collection-ansible-utils-2.3.0-2.2.el8ev.noarch
python38-ansible-runner-2.1.3-1.el8ev.noarch
ansible-core-2.12.2-3.1.el8.x86_64
ovirt-ansible-collection-2.0.3-1.el8ev.noarch
ansible-collection-ansible-posix-1.3.0-1.2.el8ev.noarch



I've created bug https://bugzilla.redhat.com/show_bug.cgi?id=2088466 to have a warning added to the deployment, to avoid such issues in the future.

