Bug 1276650 - ovirt-ha-agent will hang during 3.5 -> 3.6 upgrade on NFS ('list index out of range' from getImagesList)
Status: CLOSED CURRENTRELEASE
Product: ovirt-hosted-engine-ha
Classification: oVirt
Component: Agent
Version: 1.3.0
Hardware: Unspecified OS: Unspecified
Priority: urgent Severity: urgent
Target Milestone: ovirt-3.6.0-rc3
Target Release: 1.3.2
Assigned To: Simone Tiraboschi
QA Contact: Artyom
Whiteboard: integration
Keywords: Triaged
Duplicates: 1277013
Depends On: 1278130
Blocks: 1234906 RHEV3.6Upgrade
Reported: 2015-10-30 07:20 EDT by Simone Tiraboschi
Modified: 2015-12-22 08:30 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
VDSM getImagesList raises a 'list index out of range' exception if called on a storage domain which is not connected to any storage pool (SP). Avoid using it directly.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-12-22 08:30:41 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Integration
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt-3.6.0+
rule-engine: blocker+
ylavi: planning_ack+
sbonazzo: devel_ack+
mavital: testing_ack+




External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 47889 master MERGED upgrade: reimplementing getImagesList cause vdsm one is broken Never
oVirt gerrit 47952 ovirt-hosted-engine-ha-1.3 MERGED upgrade: reimplementing getImagesList cause vdsm one is broken Never

Description Simone Tiraboschi 2015-10-30 07:20:13 EDT
Description of problem:
On NFS storage only, ovirt-ha-agent hangs during the 3.5 -> 3.6 upgrade because it uses vdscli.getImagesList, which is broken and returns {'status': {'message': 'list index out of range', 'code': 100}}

MainThread::INFO::2015-10-30 11:14:38,919::upgrade::125::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(_is_conf_volume_there) Looking for conf volume
MainThread::DEBUG::2015-10-30 11:14:38,926::upgrade::131::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(_is_conf_volume_there) {'status': {'message': 'list index out of range', 'code': 100}}
MainThread::ERROR::2015-10-30 11:14:38,927::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: 'list index out of range' - trying to restart agent

[root@c71het20151028 ~]# vdsClient -s 0 getImagesList acfcfc14-c2ff-404d-9dbd-89b1743ce10f
list index out of range
[root@c71het20151028 ~]# echo $?
1
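
The failure above is reported inside the response dictionary rather than as a Python exception, so a caller has to inspect 'status' before touching the payload. A minimal sketch of such a check follows; it assumes only the response shape shown in the log. The names unwrap_vdsm_response and VdsmCallError are hypothetical, and the 'imageslist' payload key in the usage note is an assumption, not something confirmed by this report.

```python
class VdsmCallError(RuntimeError):
    """Raised when a VDSM-style response reports a non-zero status code."""


def unwrap_vdsm_response(response, key):
    """Return response[key] if the call succeeded, else raise VdsmCallError.

    `response` is expected to look like the dictionary in the log above:
    {'status': {'message': ..., 'code': ...}, <payload keys>}.
    Hypothetical helper, not part of ovirt-hosted-engine-ha or VDSM.
    """
    status = response.get('status', {})
    code = status.get('code')
    if code != 0:
        # Surface the VDSM message instead of failing later on a missing key.
        raise VdsmCallError(
            'VDSM call failed (code %s): %s' % (code, status.get('message'))
        )
    return response[key]
```

With the response from this report, unwrap_vdsm_response(resp, 'imageslist') would raise VdsmCallError carrying the 'list index out of range' message, which gives the caller a clear point to fall back to another lookup instead of restarting the agent in a loop.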

See also:
https://bugzilla.redhat.com/1274622

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-ha.noarch 1.3.1-1.el7.centos

How reproducible:
100% (NFS only)

Steps to Reproduce:
1. deploy hosted-engine from 3.5 on NFS
2. upgrade to 3.6

Actual results:
It hangs with:
MainThread::ERROR::2015-10-30 11:14:38,927::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: 'list index out of range' - trying to restart agent

Expected results:
The upgrade completes successfully.

Additional info:
Affects NFS only; on iSCSI the upgrade works.
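
Since the problem is limited to file-based (NFS) storage, one way to avoid calling getImagesList is to read the image list straight from the mounted storage domain. The sketch below illustrates that idea only and is not necessarily how the merged patch does it. It assumes the domain is already mounted and keeps one directory per image, named by the image UUID, under an 'images' subdirectory; list_image_uuids and domain_path are hypothetical names.

```python
import os
import uuid


def list_image_uuids(domain_path):
    """List image UUIDs found under <domain_path>/images.

    Assumption: on a file-based storage domain every image lives in a
    directory named after its UUID. Raises OSError if the 'images'
    directory is missing, for example when the domain is not mounted.
    """
    images_dir = os.path.join(domain_path, 'images')
    found = []
    for name in sorted(os.listdir(images_dir)):
        if not os.path.isdir(os.path.join(images_dir, name)):
            continue  # skip plain files
        try:
            uuid.UUID(name)
        except ValueError:
            continue  # skip directories whose name does not parse as a UUID
        found.append(name)
    return found
```

This reads only the directory layout and never talks to VDSM, so it does not depend on the storage domain being connected to a storage pool.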
Comment 1 Sandro Bonazzola 2015-10-30 09:27:58 EDT
Yaniv, maybe we should respin just ovirt-hosted-engine-ha including this fix in 3.6.0 GA. What do you think?
Comment 2 Red Hat Bugzilla Rules Engine 2015-10-30 09:28:04 EDT
This bug is marked for z-stream, yet the milestone is for a major version, therefore the milestone has been reset.
Please set the correct milestone or drop the z stream flag.
Comment 3 Yaniv Lavi 2015-11-01 06:24:58 EST
Should work in upgrades from 3.5 to 3.6. If you lose HE, you lose the env.
Comment 4 Red Hat Bugzilla Rules Engine 2015-11-01 06:25:02 EST
This bug is marked for z-stream, yet the milestone is for a major version, therefore the milestone has been reset.
Please set the correct milestone or drop the z stream flag.
Comment 5 Simone Tiraboschi 2015-11-02 03:31:35 EST
*** Bug 1277013 has been marked as a duplicate of this bug. ***
Comment 6 Sandro Bonazzola 2015-11-02 05:41:17 EST
Dropping the dependency on bug #1274622 since we can work around it with External Bug ID: oVirt gerrit 47889. Moving it to See also.
Comment 7 Artyom 2015-12-01 11:23:44 EST
Verified on ovirt-hosted-engine-ha-1.3.3-1.el7ev.noarch
1) Deploy hosted-engine 3.5 on two hosts with NFS storage
2) Put the first host into maintenance via webadmin
3) Upgrade packages and restart the host (the restart is a workaround for bug https://bugzilla.redhat.com/show_bug.cgi?id=1282187)
4) Wait for the correct status via hosted-engine --vm-status (can take around 5-7 minutes)
5) Activate the host via webadmin
6) Put the second host into maintenance (wait until all VMs have migrated and the HE VM has migrated to the first host)
7) Upgrade packages and restart the second host
8) Wait for the correct status via hosted-engine --vm-status (can take around 5-7 minutes)
9) Activate the second host via webadmin
10) Put the environment into global maintenance
11) Update the rhevm-setup.noarch package on the engine
12) Run engine-setup on the VM and finish the upgrade process
13) Disable global maintenance via webadmin
Comment 8 Sandro Bonazzola 2015-12-22 08:30:41 EST
oVirt 3.6.0 has been released and the bz verified, moving to closed current release.
