Bug 1610439 - [downstream clone - 4.2.5] After upgrade to RHV 4.2.3, hosts can no longer be set into maintenance mode.
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.2.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.2.6
Target Release: 4.2.6
Assigned To: Daniel Erez
QA Contact: Kevin Alon Goldblatt
Keywords: ZStream
Depends On: 1586126
Blocks:
 
Reported: 2018-07-31 11:27 EDT by RHV Bugzilla Automation and Verification Bot
Modified: 2018-09-04 09:42 EDT
CC List: 13 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1586126
Environment:
Last Closed: 2018-09-04 09:41:42 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---




External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 3551351 None None None 2018-08-03 02:55 EDT
oVirt gerrit 93029 master MERGED core: add foreign key to image_transfers 2018-07-31 11:28 EDT
oVirt gerrit 93059 ovirt-engine-4.2 MERGED core: add foreign key to image_transfers 2018-07-31 11:28 EDT
Red Hat Product Errata RHBA-2018:2623 None None None 2018-09-04 09:42 EDT

Description RHV Bugzilla Automation and Verification Bot 2018-07-31 11:27:17 EDT
+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1586126 +++
======================================================================

Description of problem:
After upgrading a RHV installation from RHV 4.1 to 4.2.3, I can no longer set either of my two hosts to maintenance mode. The error I receive is:

"Error while executing action: Cannot switch Host rhelh01.bit63.net to Maintenance mode. Image transfer is in progress for the following (1) disks: 

821a160f-da54-4559-b145-79fe97c6d7ef 

Please wait for the operations to complete and try again."

I have searched for a disk with that ID and it doesn't appear to exist. There are no VMs running, and this worked fine in RHV 4.1.
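
A minimal psql sketch for confirming the stale state described above (the image_transfers columns match the dump in comment 8; the base_disks lookup is an assumption about the engine schema, not something confirmed in this bug):

engine=# -- list transfer records and their state
engine=# select disk_id, phase, active, last_updated from image_transfers;
engine=# -- check whether the reported disk still exists (base_disks is assumed here)
engine=# select disk_id from base_disks where disk_id = '821a160f-da54-4559-b145-79fe97c6d7ef';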

Version-Release number of selected component (if applicable):
4.2.3.8-0.1.el7

How reproducible:
Every time

Steps to Reproduce:
1. Upgrade RHV to 4.2.3
2. Attempt to set a host to maintenance mode

Actual results:
The error message above is displayed and the host does not enter maintenance mode

Expected results:
The host should go into maintenance mode


Additional info:

(Originally by Peter McGowan)
Comment 1 RHV Bugzilla Automation and Verification Bot 2018-07-31 11:27:25 EDT
Can you please attach logs?

(Originally by Yaniv Kaul)
Comment 3 RHV Bugzilla Automation and Verification Bot 2018-07-31 11:27:30 EDT
Created attachment 1448342 [details]
server.log

(Originally by Peter McGowan)
Comment 4 RHV Bugzilla Automation and Verification Bot 2018-07-31 11:27:36 EDT
Created attachment 1448343 [details]
engine.log

(Originally by Peter McGowan)
Comment 6 RHV Bugzilla Automation and Verification Bot 2018-07-31 11:27:47 EDT
I had the same issue. After checking the DB, I found an entry in the engine database going back months; I'm not sure how that happened.

As the issue (for me) occurred in our test area, I decided to remove the entry from the database, and that seemed to resolve the problem. Of course, I cannot confirm whether this is a wise choice for production environments :)

steps:
1) create a backup of the engine DB
2) log into postgres and connect to the engine DB
3) list the transfer records:
engine=# select disk_id from image_transfers;
4) delete the stale record (the disk_id is from my environment):
engine=# delete from image_transfers where disk_id='170fca12-0d26-4845-96af-f20970be5c06';
5) commit:
engine=# commit;

Cheers
Rod

(Originally by rodney_vdw)
Comment 7 RHV Bugzilla Automation and Verification Bot 2018-07-31 11:27:52 EDT
The engine prevents moving a host to maintenance when there are any running image transfers on it (i.e. transfers that are not in a paused state), so the proper solution is to pause or cancel the transfers from the API/UI. However, in the described scenario there was a stale image transfer for a missing disk, so the safe workaround is to alter the image_transfers record and set 'phase' to 4 (we can create a tool for that if the issue persists on existing environments). I couldn't reproduce this scenario on a local env, though.
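
A minimal sketch of that workaround, assuming the stale record's disk_id is known and the engine DB has been backed up first (e.g. with engine-backup); the value 4 is taken from this comment, not from the engine source:

engine=# update image_transfers set phase = 4 where disk_id = '821a160f-da54-4559-b145-79fe97c6d7ef';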

@Peter/Rod - did you keep the image_transfers records or a db dump by any chance?

(Originally by Daniel Erez)
Comment 8 RHV Bugzilla Automation and Verification Bot 2018-07-31 11:27:57 EDT
I've just changed the phase to 4 on each of the records and I can confirm that I can now set my hosts to maintenance mode.

The dumps of the 2 records are as follows:

engine=# select * from image_transfers ;
-[ RECORD 1 ]-------------+----------------------------------------------
command_id                | 38215664-e58d-4d04-8de0-b4c52436cc03
command_type              | 1024
phase                     | 4
last_updated              | 2018-03-07 14:32:34.042+00
message                   |
vds_id                    | 2b2e458e-6573-4843-8020-9e7e2bfbb8aa
disk_id                   | 821a160f-da54-4559-b145-79fe97c6d7ef
imaged_ticket_id          |
proxy_uri                 | https://localhost:54323/images
signed_ticket             | eyJzY... redacted
bytes_sent                | 0
bytes_total               | 1065680896
type                      | 0
active                    | f
daemon_uri                |
client_inactivity_timeout |
-[ RECORD 2 ]-------------+----------------------------------------------
command_id                | 3e519ef2-17f4-475e-8e6c-01a6f950c218
command_type              | 1024
phase                     | 4
last_updated              | 2018-03-06 14:04:08.748+00
message                   |
vds_id                    | 448abfa2-a5ff-416e-9d4a-8291fafbcd34
disk_id                   | fad7d49b-3c3d-40c7-a49a-313217e0dcb8
imaged_ticket_id          |
proxy_uri                 | https://localhost:54323/images
signed_ticket             | eyJzY... redacted
bytes_sent                | 0
bytes_total               | 1065680896
type                      | 0
active                    | f
daemon_uri                |
client_inactivity_timeout |

(Originally by Peter McGowan)
Comment 9 RHV Bugzilla Automation and Verification Bot 2018-07-31 11:28:04 EDT
Perhaps I should add that the records were both on phase 6.

(Originally by Peter McGowan)
Comment 10 RHV Bugzilla Automation and Verification Bot 2018-07-31 11:28:09 EDT
(In reply to Peter McGowan from comment #8)
> Perhaps I should add that the records were both on phase 6

Can you please also attach a dump of the associated command_entities records? Or had it already been cleared from the db?

(Originally by Daniel Erez)
Comment 11 RHV Bugzilla Automation and Verification Bot 2018-07-31 11:28:15 EDT
Created attachment 1454613 [details]
Dump of command_entities table

(Originally by Peter McGowan)
Comment 12 RHV Bugzilla Automation and Verification Bot 2018-07-31 11:28:20 EDT
(In reply to Peter McGowan from comment #10)
> Created attachment 1454613 [details]
> Dump of command_entities table

I see the command entities of the image transfers are missing. Did you use "clean zombie tasks" on upgrade, perhaps? Or did you execute taskcleaner manually? Also, IIUC, the relevant disks are missing from the db?

(Originally by Daniel Erez)
Comment 13 RHV Bugzilla Automation and Verification Bot 2018-07-31 11:28:25 EDT
I think I executed taskcleaner manually, which probably explains why they are missing.

The disk images were never uploaded. I think I hadn't installed the CA cert into my browser, so the image upload failed (https://access.redhat.com/solutions/2592941)

(Originally by Peter McGowan)
Comment 14 RHV Bugzilla Automation and Verification Bot 2018-07-31 11:28:30 EDT
Added a foreign key to image_transfers so that removing command_entities records won't leave stale transfer records behind (which would prevent moving the host to maintenance).
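
A minimal sketch of what such a constraint could look like; the constraint name and the ON DELETE behavior below are assumptions for illustration, the actual change is in the gerrit patches linked above:

engine=# alter table image_transfers
engine-#   add constraint fk_image_transfers_command foreign key (command_id)
engine-#   references command_entities(command_id) on delete cascade;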

(Originally by Daniel Erez)
Comment 15 RHV Bugzilla Automation and Verification Bot 2018-07-31 11:28:36 EDT
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found non-acked flags: '{'rhevm-4.2.z': '?'}', ]

For more info please contact: rhv-devops@redhat.com

(Originally by rhv-bugzilla-bot)
Comment 16 Kevin Alon Goldblatt 2018-08-12 10:41:18 EDT
Verified with the following code:
----------------------------------------
ovirt-engine-4.2.5.3-0.1.el7ev.noarch
vdsm-4.20.35-1.el7ev.x86_64


Verified with the following scenario:
----------------------------------------
Steps to Reproduce:
1. Upgrade RHV to 4.2.3
2. Attempt to set a host to maintenance mode >>>>> host is set to maintenance
3. Set a second host to maintenance >>>>> host is set to maintenance.



Moving to VERIFIED!
Comment 18 errata-xmlrpc 2018-09-04 09:41:42 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2623
