Bug 1297457 - 3.5 -> 3.6 upgrade fails on additional (not first) iSCSI hosts if the host rebooted just after yum update
Product: ovirt-hosted-engine-ha
Classification: oVirt
Component: General
Unspecified Unspecified
unspecified Severity medium
: ovirt-3.6.3
Assigned To: Simone Tiraboschi
: Triaged
Depends On:
Blocks: ovirt-hosted-engine-ha-
Reported: 2016-01-11 10:02 EST by Simone Tiraboschi
Modified: 2016-03-11 02:22 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
During the 3.5 -> 3.6 upgrade we relied on images prepared by the 3.5 code. If the user upgraded the RPMs and then immediately rebooted, the upgrade procedure failed because the images had not been prepared. The upgrade procedure now explicitly prepares the images itself to ensure that they are really ready.
Story Points: ---
Clone Of:
Last Closed: 2016-03-11 02:22:27 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: Integration
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt‑3.6.z+
rule-engine: blocker+
ylavi: planning_ack+
sbonazzo: devel_ack+
mavital: testing_ack+


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 51658 master MERGED upgrade: fixing for iSCSI additional hosts 2016-01-12 08:21 EST
oVirt gerrit 51722 ovirt-hosted-engine-ha-1.3 MERGED upgrade: fixing for iSCSI additional hosts 2016-01-12 08:22 EST
oVirt gerrit 51724 master MERGED upgrade: adding a docstring for _is_conf_volume_there 2016-01-21 07:53 EST
oVirt gerrit 52502 ovirt-hosted-engine-ha-1.3 MERGED upgrade: adding a docstring for _is_conf_volume_there 2016-01-20 09:50 EST
oVirt gerrit 53927 master MERGED storage: always pass a blank SP UUID in image.py 2016-02-23 11:01 EST
oVirt gerrit 53928 ovirt-hosted-engine-ha-1.3 MERGED storage: always pass a blank SP UUID in image.py 2016-02-23 11:31 EST
oVirt gerrit 53935 master MERGED storage: fixing a double call of get_images_list 2016-02-24 02:26 EST
oVirt gerrit 53937 ovirt-hosted-engine-ha-1.3 MERGED storage: fixing a double call of get_images_list 2016-02-24 02:27 EST

Description Simone Tiraboschi 2016-01-11 10:02:27 EST
Description of problem:
During the upgrade procedure, each host checks whether the shared configuration volume exists; to do that, it scans all the volumes on the hosted-engine storage domain.

The issue is that after a reboot, ovirt-ha-agent calls prepareImage for the metadata, lockspace and (if present) configuration images, but not for the engine image.
As a result, getVolumesList reports an error on iSCSI storage domains:

[root@master-vds10 ~]# vdsClient -s 0 getVolumesList 09bb1168-4c09-4523-a5dd-35e5329b0736 00000000-0000-0000-0000-000000000000
ERROR: b7b859df-498f-434f-af48-18702dce341c : {'status': {'message': "Image path does not exist or cannot be accessed/created: ('/rhev/data-center/mnt/blockSD/09bb1168-4c09-4523-a5dd-35e5329b0736/images/2399b2ac-ec64-401e-b162-96106a46bab4',)", 'code': 254}}
1de06201-53c3-4b3b-a2a5-478d95dc5494 : hosted-engine.metadata. 
5fe71750-0111-4989-976b-8575ca63750a : hosted-engine.lockspace. 
51e6b26c-9707-4ac7-b8e2-c38fc36e4e39 : HostedEngineConfigurationImage. 

The upgrade procedure is quite strict about this: it stops the scan at the error, so it does not find the configuration volume, and it fails later on when it tries to create another one.

It doesn't happen if the user doesn't reboot before restarting ovirt-ha-agent, and it doesn't happen on NFS because there getVolumesList doesn't fail after the reboot.
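
To illustrate the failure mode, here is a minimal Python sketch of a scan that records the error entry and keeps going, so that the configuration volume is still found. It only parses text in the shape of the vdsClient output quoted above; the function name find_conf_volume and the regular expression are hypothetical and are not taken from the merged patches.

```python
import re

# Matches the non-error lines quoted above, e.g.
# "51e6b26c-... : HostedEngineConfigurationImage. "
VOLUME_LINE = re.compile(r"^(?P<uuid>[0-9a-f-]{36}) : (?P<desc>.*?)\.?\s*$")
CONF_DESCRIPTION = "HostedEngineConfigurationImage"


def find_conf_volume(output):
    """Return (conf_volume_uuid, failed_uuids) from getVolumesList text.

    Hypothetical helper: an ERROR entry is recorded and skipped instead of
    aborting the whole scan.
    """
    conf_uuid = None
    failed = []
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("ERROR:"):
            # Keep the UUID that failed and continue with the next entry.
            parts = line[len("ERROR:"):].split(" : ", 1)
            failed.append(parts[0].strip())
            continue
        match = VOLUME_LINE.match(line)
        if match and match.group("desc") == CONF_DESCRIPTION:
            conf_uuid = match.group("uuid")
    return conf_uuid, failed
```

With the output quoted above, such a scan would return the UUID of the HostedEngineConfigurationImage entry together with the one UUID that reported an error, rather than concluding that the configuration volume is missing.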

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. deploy hosted-engine 3.5 on at least two hosts using iSCSI
2. upgrade the first host to 3.6
3. stop and disable ovirt-ha-agent on the second host
4. upgrade all the rpms of the second host to 3.6 and avoid restarting ovirt-ha-agent
5. reboot the second host
6. enable and restart ovirt-ha-agent to trigger the upgrade

Actual results:
The agent logs
MainThread::DEBUG::2016-01-11 11:31:51,456::image::86::ovirt_hosted_engine_ha.lib.image.Image::(prepare_images) Configuration image doesn't exist
even though the configuration image is already on the storage domain.

Expected results:
MainThread::INFO::2016-01-11 15:51:07,559::upgrade::960::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(upgrade) Successfully upgraded
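
When reproducing this, the two log lines quoted in this report can be told apart programmatically. The following is a rough Python sketch, assuming the agent log keeps the "::"-separated layout of those two lines; the function name upgrade_outcome and its return values are hypothetical.

```python
import re

# Layout inferred from the two log lines quoted in this report:
# Thread::LEVEL::timestamp::module::line::logger::(function) message
LOG_LINE = re.compile(
    r"^(?P<thread>[^:]+)::(?P<level>[A-Z]+)::(?P<ts>[^:]+:[^:]+:[^:]+)::"
    r"(?P<module>\w+)::(?P<lineno>\d+)::(?P<logger>[\w.]+)::"
    r"\((?P<func>\w+)\) (?P<message>.*)$"
)


def upgrade_outcome(lines):
    """Return 'upgraded', 'conf-image-missing' or 'unknown' for agent log lines."""
    outcome = "unknown"
    for raw in lines:
        match = LOG_LINE.match(raw.strip())
        if not match:
            continue
        message = match.group("message")
        if message == "Successfully upgraded":
            return "upgraded"
        if message == "Configuration image doesn't exist":
            # Remember the symptom, but keep reading in case a later
            # line reports a successful upgrade.
            outcome = "conf-image-missing"
    return outcome
```

Passing the lines of the agent log to upgrade_outcome gives a quick indication of whether a host hit the symptom described under "Actual results" or reached the expected message.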

Additional info:
Comment 1 Simone Tiraboschi 2016-01-11 10:04:52 EST
Workaround: manually call vdsClient -s 0 prepareImage for the engine image, then restart ovirt-ha-agent
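
The workaround could be scripted roughly as follows. This is only a sketch: the argument order passed to prepareImage (storage pool UUID, storage domain UUID, image UUID, optional volume UUID) is an assumption and should be checked against the vdsClient usage text on the host, and the helper names are hypothetical. The blank storage pool UUID is the one used in the getVolumesList call in the description.

```python
import subprocess

BLANK_UUID = "00000000-0000-0000-0000-000000000000"


def prepare_image_command(sd_uuid, img_uuid, vol_uuid=None, sp_uuid=BLANK_UUID):
    """Build the argv for a manual prepareImage call.

    ASSUMPTION: the positional order <spUUID> <sdUUID> <imgUUID> [<volUUID>]
    is not stated in this report; verify it before use.
    """
    cmd = ["vdsClient", "-s", "0", "prepareImage", sp_uuid, sd_uuid, img_uuid]
    if vol_uuid is not None:
        cmd.append(vol_uuid)
    return cmd


def apply_workaround(sd_uuid, img_uuid, vol_uuid=None, run=subprocess.check_call):
    """Prepare the engine image, then restart the HA agent."""
    run(prepare_image_command(sd_uuid, img_uuid, vol_uuid))
    run(["systemctl", "restart", "ovirt-ha-agent"])
```

The run parameter exists so the command lists can be inspected (for example with run=print) before anything is executed on a real host.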
Comment 2 Sandro Bonazzola 2016-01-20 09:00:05 EST
Simone, this bug is on MODIFIED but still has a patch on NEW; can you check?
Comment 3 Simone Tiraboschi 2016-01-20 09:22:04 EST
It's just a docstring but the code is really the same
Comment 4 Yaniv Lavi (Dary) 2016-01-20 09:25:26 EST
(In reply to Simone Tiraboschi from comment #3)
> It's just a docstring but the code is really the same

This is on 3.6.2 and has patches on NEW that are less important, as well as patches that unblock the flow, which are critical. Please move it to QE if it is released in the latest 3.6.2, and create a new bug for the other fix, targeted to a later 3.6.z.
Comment 5 Red Hat Bugzilla Rules Engine 2016-01-20 09:44:44 EST
Bug tickets that are moved to testing must have target release set to make sure tester knows what to test. Please set the correct target release before moving to ON_QA.
Comment 6 Artyom 2016-02-18 07:42:36 EST
wait for release
Comment 7 Yaniv Lavi (Dary) 2016-02-21 08:37:42 EST
What do you mean by this comment?
Comment 8 Artyom 2016-02-21 10:23:09 EST
I mean that the version of the last package that I have is ovirt-hosted-engine-setup-, but the target release is, so I need the package ovirt-hosted-engine-setup-*.el7ev.noarch to verify this bug
Comment 9 Artyom 2016-02-21 10:24:28 EST
my mistake, I confused setup package with ha package
Comment 10 Artyom 2016-02-23 10:36:09 EST
Checked on ovirt-hosted-engine-ha-
Comment 11 Red Hat Bugzilla Rules Engine 2016-02-23 10:36:15 EST
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.
Comment 12 Simone Tiraboschi 2016-02-23 12:14:39 EST
Now it fails due to https://bugzilla.redhat.com/show_bug.cgi?id=1274622#c13
Comment 13 Artyom 2016-02-26 07:29:18 EST
Verified on ovirt-hosted-engine-ha-

1) Deploy 3.5 HE on two hosts
2) Upgrade the engine to 3.6
3) Upgrade the first host to 3.6
4) Stop and mask ovirt-ha-agent on the second host
5) Update the packages on the second host
6) Unmask ovirt-ha-agent on the second host
7) Reboot the second host
8) Check that the upgrade passes on the second host
