Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1638540

Summary: LSM encountered WELD-000049 exception and never issued live merge
Product: Red Hat Enterprise Virtualization Manager
Reporter: Gordon Watson <gwatson>
Component: ovirt-engine
Assignee: Benny Zlotnik <bzlotnik>
Status: CLOSED ERRATA
QA Contact: Shir Fishbain <sfishbai>
Severity: high
Docs Contact:
Priority: high
Version: 4.2.6
CC: bzlotnik, ebenahar, eedri, gwatson, kshukla, mkalinin, mtessun, Rhev-m-bugs, rmcswain, tnisan
Target Milestone: ovirt-4.3.0
Keywords: Reopened, ZStream
Target Release: ---
Flags: lsvaty: testing_plan_complete-
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ovirt-engine-4.2.8.1
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1660529 (view as bug list)
Environment:
Last Closed: 2019-05-08 12:38:43 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1660529

Description Gordon Watson 2018-10-11 20:55:03 UTC
Description of problem:

Live Storage Migration (LSM) completed the disk move steps, but never issued the live merge sequence. Right after VmReplicateDiskFinishVDSCommand completed, the following error occurred:

2018-09-21 03:01:58,772+02 ERROR [org.ovirt.engine.core.bll.CommandsFactory] (EE-ManagedThreadFactory-engineScheduled-Thread-79) [6c0649c7-d37d-408a-a6af-0c50ccf8353d] An exception has occurred while trying to create a command object for command 'LiveMigrateDisk' with parameters 'LiveMigrateDiskParameters:{commandId='bfaabf40-cac3-4771-a569-6baf768aca23', user='admin', commandType='LiveMigrateDisk'}': WELD-000049: Unable to invoke protected final void org.ovirt.engine.core.bll.CommandBase.postConstruct() on org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand@4a6d91c8


The end result was that the disk had been moved, but the disk images remained locked.


Version-Release number of selected component (if applicable):

RHV 4.2.6
RHEL 7.5 hosts w/vdsm-4.20.35-1


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 10 Elad 2018-10-31 10:54:17 UTC
I don't have anything to add here; if you have a specific question, please raise it.
I know Benny Zlotnik (bzlotnik) is investigating this right now.

Comment 11 Benny Zlotnik 2018-10-31 14:11:14 UTC
I was able to reproduce the issue. It seems there was high load on the LSM code path, which caused commands to wait more than 10 minutes for the lock on the VM (needed to perform the snapshot operation); by the time the lock was released, the transaction had already been killed by the transaction reaper.
This probably affected live merge as well, since it uses the same lock.
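The race described above can be sketched in miniature: one command holds the per-VM lock for longer than the transaction timeout, so a second command that has already started a transaction finds it reaped by the time the lock is granted. This is an illustrative model only, not engine code; the names, the string returned on failure, and the scaled-down timeouts (the real engine default is 10 minutes) are all assumptions.

```python
import threading
import time

# Hypothetical, scaled-down constants for illustration only.
TX_TIMEOUT = 0.1   # transaction reaper deadline (really ~10 minutes)
LOCK_HOLD = 0.3    # how long another LSM command holds the VM lock

vm_lock = threading.Lock()
lock_held = threading.Event()

def run_command_in_transaction():
    """Start a 'transaction', then wait for the VM lock.

    If acquiring the lock takes longer than the transaction timeout,
    the reaper has already rolled the transaction back, so the command
    fails even though it eventually obtained the lock.
    """
    tx_start = time.monotonic()
    with vm_lock:                       # blocks while another command holds it
        waited = time.monotonic() - tx_start
        if waited > TX_TIMEOUT:
            return "transaction already reaped"   # analogous to WELD-000049
        return "command executed"

def long_running_lsm():
    """A slow operation (e.g. snapshot under load) holding the same lock."""
    with vm_lock:
        lock_held.set()                 # signal that the lock is taken
        time.sleep(LOCK_HOLD)

holder = threading.Thread(target=long_running_lsm)
holder.start()
lock_held.wait()                        # ensure the holder owns the lock first
result = run_command_in_transaction()   # waits ~0.3s > 0.1s timeout
holder.join()
print(result)
```

Because the lock wait (~0.3s) exceeds the timeout (0.1s), the second command always comes back too late, mirroring the engine failing to create the LiveMigrateDisk command object after the reaper fired.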

Comment 14 RHV bug bot 2018-11-28 14:37:37 UTC
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found non-acked flags: '{'rhevm-4.2.z': '?'}', ]

For more info please contact: rhv-devops

Comment 15 Eyal Edri 2018-12-13 16:00:47 UTC
This bug needs cloning before it can move to ON_QA.

Comment 19 Shir Fishbain 2019-01-30 13:14:05 UTC
Verified 

ovirt-engine-4.3.0-0.8.rc2.el7.noarch
vdsm-4.30.6-1.el7ev.x86_64

The bug was reproduced by the following steps:
1. Create a VM with a disk
2. Run the VM
3. Connect to the host the VM runs on
4. Reduce the transaction timeout to 60 seconds (can be done via jboss-cli): /subsystem=transactions:write-attribute(name=default-timeout,value=60)
5. Add a sleep on the host (in the "def snapshot" method, add `import time` and `time.sleep(90)`)
6. Apply the edit from step 5 and restart the host
7. Run LSM on the disk
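The delay injected in step 5 amounts to something like the sketch below: a simplified stand-in for vdsm's snapshot verb, not the actual vdsm source. The function name matches the "def snapshot" mentioned above, but the signature, the `delay` parameter, and the return value are assumptions added for illustration; the point is only that the snapshot must take longer than the reduced 60-second engine transaction timeout.

```python
# Illustrative stand-in for vdsm's snapshot verb -- NOT the real vdsm code.
# Holding the per-VM lock for 90s exceeds the 60s transaction timeout set
# in step 4, so the engine-side transaction is reaped before completion.
import time

def snapshot(vm, snap_drives, delay=90):
    """Create a snapshot after an artificial delay (step 5's injection)."""
    time.sleep(delay)          # injected sleep from the reproduction steps
    # ... the real implementation would create the snapshot here ...
    return {"status": "done", "drives": len(snap_drives)}
```

With the default `delay=90`, any engine command waiting on this operation outlives the 60-second transaction timeout, reproducing the WELD-000049 failure path.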

Comment 21 errata-xmlrpc 2019-05-08 12:38:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:1085

Comment 22 Daniel Gur 2019-08-28 13:11:52 UTC
sync2jira
