Bug 1297457 - 3.5 -> 3.6 upgrade fails on additional (not first) iSCSI hosts if the host rebooted just after yum update
Summary: 3.5 -> 3.6 upgrade fails on additional (not first) iSCSI hosts if the host re...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-hosted-engine-ha
Classification: oVirt
Component: General
Version: 1.3.3.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ovirt-3.6.3
Target Release: 1.3.4.3
Assignee: Simone Tiraboschi
QA Contact: Artyom
URL:
Whiteboard:
Depends On:
Blocks: ovirt-hosted-engine-ha-1.3.4.3
Reported: 2016-01-11 15:02 UTC by Simone Tiraboschi
Modified: 2016-03-11 07:22 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
During the 3.5 -> 3.6 upgrade we relied on images prepared by the 3.5 code. If the user upgraded the RPMs and rebooted right away, the upgrade procedure failed because the images were not prepared. The upgrade procedure now explicitly prepares the images itself to ensure that they are really ready.
Clone Of:
Environment:
Last Closed: 2016-03-11 07:22:27 UTC
oVirt Team: Integration
Embargoed:
rule-engine: ovirt-3.6.z+
rule-engine: blocker+
ylavi: planning_ack+
sbonazzo: devel_ack+
mavital: testing_ack+




Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 51658 0 master MERGED upgrade: fixing for iSCSI additional hosts 2016-01-12 13:21:52 UTC
oVirt gerrit 51722 0 ovirt-hosted-engine-ha-1.3 MERGED upgrade: fixing for iSCSI additional hosts 2016-01-12 13:22:31 UTC
oVirt gerrit 51724 0 master MERGED upgrade: adding a docstring for _is_conf_volume_there 2016-01-21 12:53:53 UTC
oVirt gerrit 52502 0 ovirt-hosted-engine-ha-1.3 MERGED upgrade: adding a docstring for _is_conf_volume_there 2016-01-20 14:50:57 UTC
oVirt gerrit 53927 0 master MERGED storage: always pass a blank SP UUID in image.py 2016-02-23 16:01:56 UTC
oVirt gerrit 53928 0 ovirt-hosted-engine-ha-1.3 MERGED storage: always pass a blank SP UUID in image.py 2016-02-23 16:31:41 UTC
oVirt gerrit 53935 0 master MERGED storage: fixing a double call of get_images_list 2016-02-24 07:26:37 UTC
oVirt gerrit 53937 0 ovirt-hosted-engine-ha-1.3 MERGED storage: fixing a double call of get_images_list 2016-02-24 07:27:14 UTC

Description Simone Tiraboschi 2016-01-11 15:02:27 UTC
Description of problem:
During the upgrade procedure, each host checks whether the shared configuration volume exists; to do that, we scan all the volumes on the hosted-engine storage domain.

The issue is that, after a reboot, ovirt-ha-agent calls prepareImage for the metadata, lockspace and (if present) configuration images, but not for the engine image.
As a result, getVolumesList reports an error on iSCSI storage domains:

[root@master-vds10 ~]# vdsClient -s 0 getVolumesList 09bb1168-4c09-4523-a5dd-35e5329b0736 00000000-0000-0000-0000-000000000000
ERROR: b7b859df-498f-434f-af48-18702dce341c : {'status': {'message': "Image path does not exist or cannot be accessed/created: ('/rhev/data-center/mnt/blockSD/09bb1168-4c09-4523-a5dd-35e5329b0736/images/2399b2ac-ec64-401e-b162-96106a46bab4',)", 'code': 254}}
1de06201-53c3-4b3b-a2a5-478d95dc5494 : hosted-engine.metadata. 
5fe71750-0111-4989-976b-8575ca63750a : hosted-engine.lockspace. 
51e6b26c-9707-4ac7-b8e2-c38fc36e4e39 : HostedEngineConfigurationImage. 

The upgrade procedure is quite picky about that: it stops the scan on that error, so it doesn't find the configuration volume, and it fails later on when it tries to create another one.

It doesn't happen if the user doesn't reboot before restarting ovirt-ha-agent, and it doesn't happen on NFS because getVolumesList doesn't fail there after the reboot.
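
A minimal sketch of a more tolerant scan, for illustration only. This is not the code from the merged patches, and find_conf_volume and SAMPLE_OUTPUT are hypothetical names. It parses getVolumesList output like the one above, skips ERROR entries instead of stopping at them, and matches the configuration volume by its description:

```python
CONF_IMAGE_DESC = "HostedEngineConfigurationImage"

# Output copied from the vdsClient call in this report.
SAMPLE_OUTPUT = """\
ERROR: b7b859df-498f-434f-af48-18702dce341c : {'status': {'message': "Image path does not exist or cannot be accessed/created: ('/rhev/data-center/mnt/blockSD/09bb1168-4c09-4523-a5dd-35e5329b0736/images/2399b2ac-ec64-401e-b162-96106a46bab4',)", 'code': 254}}
1de06201-53c3-4b3b-a2a5-478d95dc5494 : hosted-engine.metadata.
5fe71750-0111-4989-976b-8575ca63750a : hosted-engine.lockspace.
51e6b26c-9707-4ac7-b8e2-c38fc36e4e39 : HostedEngineConfigurationImage.
"""


def find_conf_volume(output):
    """Return the UUID of the configuration volume, or None if not listed."""
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("ERROR:"):
            # An unprepared image (here, the engine image after a reboot)
            # is skipped instead of aborting the whole scan.
            continue
        uuid, sep, desc = line.partition(" : ")
        # Descriptions are printed with a trailing dot; drop it to compare.
        if sep and desc.strip().rstrip(".") == CONF_IMAGE_DESC:
            return uuid.strip()
    return None


if __name__ == "__main__":
    print(find_conf_volume(SAMPLE_OUTPUT))
```

With the output shown above, the function still finds the configuration volume although the engine image entry is in error. According to the Doc Text, the merged fix takes a different route: the upgrade procedure prepares the images explicitly before relying on them.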

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-ha-1.3.3.6

How reproducible:
100%

Steps to Reproduce:
1. deploy hosted-engine 3.5 on at least two hosts using iSCSI
2. upgrade the first host to 3.6
3. stop and disable ovirt-ha-agent on the second host
4. upgrade all the rpms of the second host to 3.6 and avoid restarting ovirt-ha-agent
5. reboot the second host
6. enable and restart ovirt-ha-agent to trigger the upgrade

Actual results:
The agent logs
MainThread::DEBUG::2016-01-11 11:31:51,456::image::86::ovirt_hosted_engine_ha.lib.image.Image::(prepare_images) Configuration image doesn't exist
even though the configuration image is already on the storage domain.

Expected results:
MainThread::INFO::2016-01-11 15:51:07,559::upgrade::960::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(upgrade) Successfully upgraded

Additional info:

Comment 1 Simone Tiraboschi 2016-01-11 15:04:52 UTC
Workaround: manually call vdsClient -s 0 prepareImage for the engine image, then restart ovirt-ha-agent.
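
A hedged sketch of how the workaround command could be assembled. The helper name prepare_image_cmd is hypothetical, and the argument order (storage pool, storage domain, image, volume) is an assumption about vdsClient's prepareImage verb that should be checked against the vdsClient help on the host. The blank storage pool UUID and the other UUIDs are the ones appearing in the description of this bug:

```python
BLANK_UUID = "00000000-0000-0000-0000-000000000000"


def prepare_image_cmd(sd_uuid, img_uuid, vol_uuid, sp_uuid=BLANK_UUID):
    """Build the argv for a manual prepareImage call (assumed argument order)."""
    return [
        "vdsClient", "-s", "0", "prepareImage",
        sp_uuid, sd_uuid, img_uuid, vol_uuid,
    ]


if __name__ == "__main__":
    cmd = prepare_image_cmd(
        sd_uuid="09bb1168-4c09-4523-a5dd-35e5329b0736",   # storage domain
        img_uuid="2399b2ac-ec64-401e-b162-96106a46bab4",  # image from the error path
        vol_uuid="b7b859df-498f-434f-af48-18702dce341c",  # volume from the ERROR line
    )
    # Print only; run the command on the affected host yourself.
    print(" ".join(cmd))
```

After the command succeeds on the affected host, restart ovirt-ha-agent so that the upgrade is retried.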

Comment 2 Sandro Bonazzola 2016-01-20 14:00:05 UTC
Simone, this bug is on MODIFIED but still has a patch on NEW. Can you check?

Comment 3 Simone Tiraboschi 2016-01-20 14:22:04 UTC
It's just a docstring but the code is really the same

Comment 4 Yaniv Lavi 2016-01-20 14:25:26 UTC
(In reply to Simone Tiraboschi from comment #3)
> It's just a docstring but the code is really the same

This is targeted to 3.6.2 and has patches on NEW that are less important, plus patches that unblock the flow, which are critical. Please move it to QE if it is released in the latest 3.6.2, and create a new bug for the other fix, targeted to a later 3.6.z.

Comment 5 Red Hat Bugzilla Rules Engine 2016-01-20 14:44:44 UTC
Bug tickets that are moved to testing must have target release set to make sure tester knows what to test. Please set the correct target release before moving to ON_QA.

Comment 6 Artyom 2016-02-18 12:42:36 UTC
Waiting for release 1.3.3.6.

Comment 7 Yaniv Lavi 2016-02-21 13:37:42 UTC
What do you mean by this comment?

Comment 8 Artyom 2016-02-21 15:23:09 UTC
I mean that the latest package version I have is ovirt-hosted-engine-setup-1.3.3.3-1.el7ev.noarch, but the target release is 1.3.3.6, so I need the package ovirt-hosted-engine-setup-1.3.3.6*.el7ev.noarch to verify this bug.

Comment 9 Artyom 2016-02-21 15:24:28 UTC
My mistake, I confused the setup package with the ha package.

Comment 10 Artyom 2016-02-23 15:36:09 UTC
Checked on ovirt-hosted-engine-ha-1.3.4.1-1.el7ev.noarch

Comment 11 Red Hat Bugzilla Rules Engine 2016-02-23 15:36:15 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Comment 12 Simone Tiraboschi 2016-02-23 17:14:39 UTC
Now it fails due to https://bugzilla.redhat.com/show_bug.cgi?id=1274622#c13

Comment 13 Artyom 2016-02-26 12:29:18 UTC
Verified on ovirt-hosted-engine-ha-1.3.4.3-1.el7ev.noarch

1) Deploy 3.5 HE on two hosts
2) Upgrade engine to 3.6
3) Upgrade first host to 3.6
4) Stop and mask ovirt-ha-agent on second host
5) Update packages on second host
6) Unmask ovirt-ha-agent on second host
7) Reboot second host
8) Check that the upgrade passes on second host
PASS

