Bug 1316143 - 3.6 hosted-engine hosts can't be added properly to 3.6 host cluster that was started with 3.4.
Keywords:
Status: CLOSED DUPLICATE of bug 1306825
Alias: None
Product: ovirt-hosted-engine-ha
Classification: oVirt
Component: Agent
Version: 1.3.4.3
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ovirt-3.6.4
Target Release: 1.3.5
Assignee: Simone Tiraboschi
QA Contact: Nikolai Sednev
URL:
Whiteboard: integration
Depends On:
Blocks: 1317895
Reported: 2016-03-09 14:06 UTC by Nikolai Sednev
Modified: 2016-03-22 15:43 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-03-21 12:23:58 UTC
oVirt Team: Integration
Embargoed:
rule-engine: ovirt-3.6.z+
ylavi: planning_ack+
sbonazzo: devel_ack+
gklein: testing_ack+


Attachments
engine sosreport (16.96 MB, application/x-xz)
2016-03-09 14:10 UTC, Nikolai Sednev
additional host's sosreport (7.61 MB, application/x-xz)
2016-03-20 17:23 UTC, Nikolai Sednev


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 1306825 | 0 | urgent | CLOSED | hosted-engine upgrade fails after upgrade hosts from el6 to el7 | 2021-02-22 00:41:40 UTC
Red Hat Bugzilla | 1319595 | 0 | high | CLOSED | ha-agent not starting when upgrading the hosted engine from 6.x to 7.x | 2021-02-22 00:41:40 UTC
oVirt gerrit | 54557 | 0 | master | MERGED | storage: avoid assuming lockspace volume always exists | 2016-03-15 08:55:59 UTC
oVirt gerrit | 54736 | 0 | ovirt-hosted-engine-ha-1.3 | MERGED | storage: avoid assuming lockspace volume always exists | 2016-03-15 08:56:07 UTC

Internal Links: 1306825 1319595

Description Nikolai Sednev 2016-03-09 14:06:47 UTC
Description of problem:
3.6 hosts can't be added properly to a 3.6 host cluster that was originally started with 3.4.

Version-Release number of selected component (if applicable):
Host:
qemu-kvm-rhev-2.3.0-31.el7_2.8.x86_64
sanlock-3.2.4-2.el7_2.x86_64
vdsm-4.17.23-0.el7ev.noarch
libvirt-client-1.2.17-13.el7_2.4.x86_64
mom-0.5.2-1.el7ev.noarch
Linux version 3.10.0-327.13.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Mon Feb 29 13:22:02 EST 2016

Engine:
ovirt-vmconsole-proxy-1.0.0-1.el6ev.noarch
ovirt-engine-extension-aaa-ldap-1.1.2-1.el6ev.noarch
ovirt-host-deploy-1.4.1-1.el6ev.noarch
ovirt-host-deploy-java-1.4.1-1.el6ev.noarch
rhevm-3.6.3.4-0.1.el6.noarch
ovirt-setup-lib-1.0.1-1.el6ev.noarch
ovirt-vmconsole-1.0.0-1.el6ev.noarch
ovirt-engine-extension-aaa-jdbc-1.0.6-1.el6ev.noarch


How reproducible:
100%

Steps to Reproduce:
1. Add a 3.6 host to a 3.6 engine that is running on a host that came from a 3.4 host cluster.

Actual results:
The answer file from 3.4 can't be consumed properly by 3.6, so the host is not added correctly; see https://gerrit.ovirt.org/#/c/54307/1

Expected results:
The host should be added normally regardless of whether the answer file was inherited from 3.4, 3.5, or 3.6.

Additional info:

Comment 1 Nikolai Sednev 2016-03-09 14:10:02 UTC
Created attachment 1134536 [details]
engine sosreport

Comment 2 Nikolai Sednev 2016-03-09 15:01:53 UTC
The sosreport from the host is available here:
https://drive.google.com/a/redhat.com/file/d/0B85BEaDBcF88NGhvNXBlMVgxakU/view?usp=sharing

Comment 4 Nikolai Sednev 2016-03-17 06:38:38 UTC
Hi Sandro,
QA has not received 3.6.z yet; could you please move this bug to a state other than ON_QA?

Comment 5 Nikolai Sednev 2016-03-20 17:09:15 UTC
The host was successfully redeployed and added as an additional HE host, but it can't function normally with the HE storage:
[ INFO  ] Stage: Setup validation
          The Host ID is already known. Is this a re-deployment on an additional host that was previously set up (Yes, No)[Yes]? 
          It seems like your existing HE infrastructure was deployed with version 3.5 (or before) and never upgraded to current release.
          Mixing hosts with HE from 3.5 (or before) and current release is not supported.
          Please upgrade the existing HE hosts to current release before adding this host.
          Please check the log file for more details.
          Replying "No" will abort Setup.
          Continue? (Yes, No) [No]: yes

Host:
libvirt-client-1.2.17-13.el7_2.4.x86_64
ovirt-hosted-engine-setup-1.3.4.0-1.el7ev.noarch
sanlock-3.2.4-2.el7_2.x86_64
ovirt-vmconsole-host-1.0.0-1.el7ev.noarch
ovirt-host-deploy-1.4.1-1.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.10.x86_64
mom-0.5.2-1.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
vdsm-4.17.23.1-0.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5-1.el7ev.noarch
ovirt-vmconsole-1.0.0-1.el7ev.noarch
Red Hat Enterprise Linux Server release 7.2 (Maipo)
Linux seal08.qa.lab.tlv.redhat.com 3.10.0-327.13.1.el7.x86_64 #1 SMP Mon Feb 29 13:22:02 EST 2016 x86_64 x86_64 x86_64 GNU/Linux

systemctl status ovirt-ha-agent -l
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Sun 2016-03-20 19:00:08 IST; 29s ago
 Main PID: 20805 (ovirt-ha-agent)
   CGroup: /system.slice/ovirt-ha-agent.service
           └─20805 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon

Mar 20 19:00:31 seal08.qa.lab.tlv.redhat.com ovirt-ha-agent[20805]: INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Refreshing the storage domain
Mar 20 19:00:31 seal08.qa.lab.tlv.redhat.com ovirt-ha-agent[20805]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Preparing images
Mar 20 19:00:31 seal08.qa.lab.tlv.redhat.com ovirt-ha-agent[20805]: INFO:ovirt_hosted_engine_ha.lib.image.Image:Preparing images
Mar 20 19:00:32 seal08.qa.lab.tlv.redhat.com ovirt-ha-agent[20805]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Reloading vm.conf from the shared storage domain
Mar 20 19:00:32 seal08.qa.lab.tlv.redhat.com ovirt-ha-agent[20805]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Trying to get a fresher copy of vm configuration from the OVF_STORE
Mar 20 19:00:33 seal08.qa.lab.tlv.redhat.com ovirt-ha-agent[20805]: WARNING:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Unable to find OVF_STORE
Mar 20 19:00:33 seal08.qa.lab.tlv.redhat.com ovirt-ha-agent[20805]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config ERROR Unable to get vm.conf from OVF_STORE, falling back to initial vm.conf
Mar 20 19:00:33 seal08.qa.lab.tlv.redhat.com ovirt-ha-agent[20805]: ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Unable to get vm.conf from OVF_STORE, falling back to initial vm.conf
Mar 20 19:00:33 seal08.qa.lab.tlv.redhat.com ovirt-ha-agent[20805]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Error: 'Path to volume None not found in /rhev/data-center/mnt' - trying to restart agent
Mar 20 19:00:33 seal08.qa.lab.tlv.redhat.com ovirt-ha-agent[20805]: ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Error: 'Path to volume None not found in /rhev/data-center/mnt' - trying to restart agent
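
The last error above suggests that a volume UUID of None reached a path lookup under /rhev/data-center/mnt. The sketch below only illustrates the kind of guard implied by the title of the linked gerrit change ("storage: avoid assuming lockspace volume always exists"); it is not the actual patch, and the function and parameter names are hypothetical:

```python
import os


def find_volume_path(mnt_root, volume_uuid):
    """Return the path of an entry named volume_uuid under mnt_root.

    Hypothetical helper: returns None when the UUID is unknown (None)
    or cannot be found, instead of raising.
    """
    if volume_uuid is None:
        # Nothing to look for, e.g. the volume was never created.
        return None
    for dirpath, dirnames, filenames in os.walk(mnt_root):
        if volume_uuid in dirnames or volume_uuid in filenames:
            return os.path.join(dirpath, volume_uuid)
    return None
```

With a guard like this, the caller can decide how to handle a missing volume rather than failing on the lookup itself.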


--== Host 2 status ==--

Status up-to-date                  : False
Hostname                           : seal09.qa.lab.tlv.redhat.com
Host ID                            : 2
Engine status                      : unknown stale-data
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 10787441
Host timestamp                     : 1745


--== Host 3 status ==--

Status up-to-date                  : False
Hostname                           : seal08.qa.lab.tlv.redhat.com
Host ID                            : 3
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : True
Local maintenance                  : False
crc32                              : 038d7a8c
Host timestamp                     : 60602


Engine:
rhevm-3.6.4-0.1.el6.noarch

Comment 6 Nikolai Sednev 2016-03-20 17:23:30 UTC
Created attachment 1138331 [details]
additional host's sosreport

Comment 7 Nikolai Sednev 2016-03-21 11:51:49 UTC
It works for me after manually changing line 102 of "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/image.py" to "if status['status']['code'] != 0:".
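
A minimal sketch of what that condition tests, assuming the response is a dict shaped like {'status': {'code': ..., 'message': ...}}. The function name and the sample responses are hypothetical and are not taken from image.py:

```python
def call_failed(status):
    """Return True when a response dict reports a non-zero status code."""
    return status['status']['code'] != 0


# Hypothetical sample responses, for illustration only.
ok = {'status': {'code': 0, 'message': 'Done'}}
err = {'status': {'code': 1, 'message': 'example failure'}}

print(call_failed(ok), call_failed(err))
```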

Now I see both hosts OK:
--== Host 2 status ==--

Status up-to-date                  : True
Hostname                           : seal09.qa.lab.tlv.redhat.com
Host ID                            : 2
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 1f404603
Host timestamp                     : 68734


--== Host 3 status ==--

Status up-to-date                  : True
Hostname                           : seal08.qa.lab.tlv.redhat.com
Host ID                            : 3
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 2400
stopped                            : False
Local maintenance                  : False
crc32                              : 97293c0b
Host timestamp                     : 68877
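
To check status output like the above from a script, the "key : value" blocks can be parsed into dictionaries. This is a rough sketch: parse_status is a hypothetical helper, not part of ovirt-hosted-engine-ha, and the sample text is a shortened copy of the output above:

```python
import re

HEADER = re.compile(r'^--== Host (\d+) status ==--$')


def parse_status(text):
    """Parse '--== Host N status ==--' blocks into {host_id: {field: value}}."""
    hosts = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        match = HEADER.match(line)
        if match:
            current = hosts.setdefault(int(match.group(1)), {})
        elif current is not None and ':' in line:
            # Split on the first colon only, so JSON values keep their colons.
            key, value = line.split(':', 1)
            current[key.strip()] = value.strip()
    return hosts


sample = """--== Host 2 status ==--

Status up-to-date                  : True
Hostname                           : seal09.qa.lab.tlv.redhat.com
Score                              : 3400

--== Host 3 status ==--

Status up-to-date                  : True
Hostname                           : seal08.qa.lab.tlv.redhat.com
Score                              : 2400
"""

hosts = parse_status(sample)
all_fresh = all(h['Status up-to-date'] == 'True' for h in hosts.values())
```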

According to tiraboschi, it should be fixed here: https://gerrit.ovirt.org/#/c/54982
Also the same problem was documented here:
https://bugzilla.redhat.com/show_bug.cgi?id=1306825#c10

Please consider closing this bug as duplicate of 1306825.

Comment 8 Simone Tiraboschi 2016-03-21 12:23:58 UTC

*** This bug has been marked as a duplicate of bug 1306825 ***

