Bug 1276650 - ovirt-ha-agent will hang during 3.5 -> 3.6 upgrade on NFS ('list index out of range' from getImagesList)
Summary: ovirt-ha-agent will hang during 3.5 -> 3.6 upgrade on NFS ('list index out of...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-hosted-engine-ha
Classification: oVirt
Component: Agent
Version: 1.3.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ovirt-3.6.0-rc3
Target Release: 1.3.2
Assignee: Simone Tiraboschi
QA Contact: Artyom
URL:
Whiteboard: integration
Duplicates: 1277013
Depends On: 1278130
Blocks: 1234906 RHEV3.6Upgrade
Reported: 2015-10-30 11:20 UTC by Simone Tiraboschi
Modified: 2015-12-22 13:30 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
VDSM getImagesList raises a 'list index out of range' exception if called on a storage domain which is not connected to any storage pool (SP). Avoid using it directly.
Clone Of:
Environment:
Last Closed: 2015-12-22 13:30:41 UTC
oVirt Team: Integration
Embargoed:
rule-engine: ovirt-3.6.0+
rule-engine: blocker+
ylavi: planning_ack+
sbonazzo: devel_ack+
mavital: testing_ack+



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 1274622 | 0 | high | CLOSED | getImagesList fails if called on a file based storageDomain which is not connected to any storage pool | 2021-02-22 00:41:40 UTC
oVirt gerrit | 47889 | 0 | master | MERGED | upgrade: reimplementing getImagesList cause vdsm one is broken | Never
oVirt gerrit | 47952 | 0 | ovirt-hosted-engine-ha-1.3 | MERGED | upgrade: reimplementing getImagesList cause vdsm one is broken | Never

Internal Links: 1274622

Description Simone Tiraboschi 2015-10-30 11:20:13 UTC
Description of problem:
On NFS storage only, ovirt-ha-agent hangs during the 3.5 -> 3.6 upgrade because it uses vdscli.getImagesList, which is broken and returns {'status': {'message': 'list index out of range', 'code': 100}}

MainThread::INFO::2015-10-30 11:14:38,919::upgrade::125::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(_is_conf_volume_there) Looking for conf volume
MainThread::DEBUG::2015-10-30 11:14:38,926::upgrade::131::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(_is_conf_volume_there) {'status': {'message': 'list index out of range', 'code': 100}}
MainThread::ERROR::2015-10-30 11:14:38,927::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: 'list index out of range' - trying to restart agent

[root@c71het20151028 ~]# vdsClient -s 0 getImagesList acfcfc14-c2ff-404d-9dbd-89b1743ce10f
list index out of range
[root@c71het20151028 ~]# echo $?
1
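
The failure above arrives as an ordinary VDSM-style response dict whose status code is non-zero. A minimal sketch in Python of checking that code before touching the payload follows. The helper name check_vdsm_response and the VdsmCallError class are hypothetical and not part of ovirt-hosted-engine-ha or VDSM; treating code 0 as success is an assumption based on the response shape shown in this report.

```python
class VdsmCallError(Exception):
    """Hypothetical error type for a failed VDSM-style call."""

    def __init__(self, verb, code, message):
        super().__init__("%s failed (code %s): %s" % (verb, code, message))
        self.verb = verb
        self.code = code
        self.message = message


def check_vdsm_response(verb, response):
    """Return the response unchanged if its status code is 0, else raise."""
    status = response.get("status", {})
    code = status.get("code")
    if code != 0:
        # Assumption: any non-zero (or missing) code means the call failed.
        raise VdsmCallError(verb, code, status.get("message", "unknown error"))
    return response


# The response logged by the agent in this report.
failed = {"status": {"message": "list index out of range", "code": 100}}
try:
    check_vdsm_response("getImagesList", failed)
except VdsmCallError as err:
    print(err)
```

Raising a dedicated error keeps the verb name and the VDSM message together, which makes a log entry easier to trace than a bare 'list index out of range'.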

See also:
https://bugzilla.redhat.com/1274622

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-ha.noarch 1.3.1-1.el7.centos

How reproducible:
100% (NFS only)

Steps to Reproduce:
1. deploy hosted-engine from 3.5 on NFS
2. upgrade to 3.6

Actual results:
It hangs with:
MainThread::ERROR::2015-10-30 11:14:38,927::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: 'list index out of range' - trying to restart agent
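
To spot this restart loop in a longer agent.log, the quoted lines can be split on their '::' separators. This is a rough sketch: the field layout is inferred only from the log lines in this report, and the function names are hypothetical.

```python
import re

# Field layout inferred from the log lines quoted in this report:
# thread::level::timestamp::module::line::logger::(function) message
_TAIL = re.compile(r"^\((?P<function>[^)]*)\)\s*(?P<message>.*)$")


def parse_agent_log_line(line):
    """Split one agent log line into a dict, or return None if it does not match."""
    parts = line.rstrip("\n").split("::", 6)
    if len(parts) != 7:
        return None
    thread, level, timestamp, module, lineno, logger, tail = parts
    match = _TAIL.match(tail)
    if match is None:
        return None
    return {
        "thread": thread,
        "level": level,
        "timestamp": timestamp,
        "module": module,
        "lineno": lineno,
        "logger": logger,
        "function": match.group("function"),
        "message": match.group("message"),
    }


def count_restart_errors(lines):
    """Count ERROR records whose message mentions an agent restart."""
    count = 0
    for line in lines:
        record = parse_agent_log_line(line)
        if record and record["level"] == "ERROR" and "restart agent" in record["message"]:
            count += 1
    return count
```

A count that keeps growing while the upgrade never completes would match the hang described here.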

Expected results:
The upgrade completes successfully

Additional info:
NFS only; on iSCSI it works.

Comment 1 Sandro Bonazzola 2015-10-30 13:27:58 UTC
Yaniv, maybe we should respin just ovirt-hosted-engine-ha including this fix in 3.6.0 GA. What do you think?

Comment 2 Red Hat Bugzilla Rules Engine 2015-10-30 13:28:04 UTC
This bug is marked for z-stream, yet the milestone is for a major version, therefore the milestone has been reset.
Please set the correct milestone or drop the z stream flag.

Comment 3 Yaniv Lavi 2015-11-01 11:24:58 UTC
Should work in upgrades from 3.5 to 3.6. If you lose HE, you lose the env.

Comment 4 Red Hat Bugzilla Rules Engine 2015-11-01 11:25:02 UTC
This bug is marked for z-stream, yet the milestone is for a major version, therefore the milestone has been reset.
Please set the correct milestone or drop the z stream flag.

Comment 5 Simone Tiraboschi 2015-11-02 08:31:35 UTC
*** Bug 1277013 has been marked as a duplicate of this bug. ***

Comment 6 Sandro Bonazzola 2015-11-02 10:41:17 UTC
Dropping the dependency on bug #1274622, since we can work around it with External Bug ID: oVirt gerrit 47889. Moving it to See also.
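
Gerrit change 47889 is known here only by its title, so the following is not that patch. It is a minimal sketch of one way to list images on a file-based storage domain without calling getImagesList. It assumes that each image is a directory named by its UUID under <mount point>/<storage domain UUID>/images. The function name, its arguments and that directory layout are all assumptions.

```python
import os
import uuid


def list_images_on_file_domain(mount_point, sd_uuid):
    """Return the sorted image UUIDs found under <mount_point>/<sd_uuid>/images."""
    images_dir = os.path.join(mount_point, sd_uuid, "images")
    if not os.path.isdir(images_dir):
        # Domain not mounted, or layout differs from the assumed one.
        return []
    found = []
    for name in sorted(os.listdir(images_dir)):
        if not os.path.isdir(os.path.join(images_dir, name)):
            continue
        try:
            # Skip entries whose name does not parse as a UUID.
            uuid.UUID(name)
        except ValueError:
            continue
        found.append(name)
    return found
```

Because it only reads the mounted file system, a scan like this does not depend on the storage domain being connected to a storage pool, which is the condition that triggers the VDSM failure.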

Comment 7 Artyom 2015-12-01 16:23:44 UTC
Verified on ovirt-hosted-engine-ha-1.3.3-1.el7ev.noarch
1) Deploy hosted-engine 3.5 on two hosts, on NFS storage
2) Put the first host into maintenance via webadmin
3) Upgrade packages and restart the host (the restart is a workaround for bug https://bugzilla.redhat.com/show_bug.cgi?id=1282187)
4) Wait for the correct status via hosted-engine --vm-status (can take around 5-7 minutes)
5) Activate the host via webadmin
6) Put the second host into maintenance (wait until all VMs have migrated and the HE VM has migrated to the first host)
7) Upgrade packages and restart the second host
8) Wait for the correct status via hosted-engine --vm-status (can take around 5-7 minutes)
9) Activate the second host via webadmin
10) Put the environment into global maintenance
11) Update the rhevm-setup.noarch package on the engine
12) Run engine-setup on the VM and finish the upgrade process
13) Disable global maintenance via webadmin
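
The waits in steps 4 and 8 can be scripted as a bounded poll. A minimal sketch, assuming hosted-engine --vm-status is run on the host and its text output is inspected by a caller-supplied check; the function names, the default timeout and the polling interval are hypothetical, and what counts as the "correct status" is left to the caller because this report does not define it.

```python
import subprocess
import time


def wait_until(check, timeout=600.0, interval=15.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() every `interval` seconds until it is true or `timeout` expires."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def vm_status_output():
    """Run `hosted-engine --vm-status`; return its text output, or "" on failure."""
    try:
        result = subprocess.run(
            ["hosted-engine", "--vm-status"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
    except OSError:
        # The command is not installed or could not be started.
        return ""
    return result.stdout if result.returncode == 0 else ""
```

A caller would pass something like wait_until(lambda: expected_text in vm_status_output()), where expected_text is whatever string the tester treats as the correct status. The default timeout of 600 seconds leaves some margin over the 5-7 minutes mentioned in the steps.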

Comment 8 Sandro Bonazzola 2015-12-22 13:30:41 UTC
oVirt 3.6.0 has been released and the bz verified, moving to closed current release.

