Bug 1684537

Summary: VM crash during migration with "qemu-kvm: Failed to lock byte 100"
Product: Red Hat Enterprise Virtualization Manager
Reporter: Marian Jankular <mjankula>
Component: vdsm
Assignee: Milan Zamazal <mzamazal>
Status: CLOSED ERRATA
QA Contact: Polina <pagranat>
Severity: high
Docs Contact:
Priority: urgent
Version: 4.1.8
CC: ahadas, dfediuck, dgilbert, fjin, hreitz, lsurette, michal.skrivanek, mjankula, mtessun, mzamazal, rdlugyhe, smaudet, srevivo, ycui
Target Milestone: ovirt-4.4.1
Keywords: TestOnly
Target Release: 4.3.0
Flags: lsvaty: testing_plan_complete-
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, a virtual machine could crash with the message "qemu-kvm: Failed to lock byte 100" during a live migration with storage problems. The current release fixes this issue in the underlying platform so the issue no longer happens.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-08-04 13:26:25 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Comment 2 Michal Skrivanek 2019-03-02 05:45:58 UTC
Could you please also attach qemu.log from both sides for (at least some of) the VMs?

Comment 12 Hanna Czenczek 2019-03-29 15:09:56 UTC
I rather think it’s the same as BZ 1652572 (which may or may not be the same as BZ 1603104).  The problem (I believe) is that we try to drop locks we have on a file, which fails (because something's just wrong with the file handle itself), but the block layer expects dropping locks to always work.

One way to fix it would be to just ignore the fact that we weren’t able to drop the locks; that would at least rid us of the failed assertion.  (And the qemu instance itself doesn’t really care anyway whether it can drop locks or not.)
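
Editorial illustration of the idea in comment 12, not code from QEMU: the sketch below takes a byte-range lock and then drops it on a best-effort basis, so that a failed unlock is reported rather than treated as fatal. QEMU's real implementation is in C and is not shown in this report. The helper names (`lock_byte`, `unlock_byte_best_effort`, `demo`) are hypothetical, the offset 100 is simply the byte named in the error message, and Python's `fcntl.lockf` is assumed as a stand-in for whatever locking call QEMU uses.

```python
import fcntl
import os
import tempfile

LOCK_BYTE = 100  # offset taken from the error message; illustrative only


def lock_byte(fd, offset):
    """Take a non-blocking exclusive lock on one byte; raises OSError on failure."""
    fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, offset, os.SEEK_SET)


def unlock_byte_best_effort(fd, offset):
    """Drop the lock on one byte; return False instead of raising if that fails."""
    try:
        fcntl.lockf(fd, fcntl.LOCK_UN, 1, offset, os.SEEK_SET)
        return True
    except OSError:
        # The caller is giving the lock up anyway, so a failure here
        # (for example a broken file handle) is reported, not fatal.
        return False


def demo():
    """Unlock once on a healthy descriptor and once on a closed one."""
    fd, path = tempfile.mkstemp()
    try:
        lock_byte(fd, LOCK_BYTE)
        healthy = unlock_byte_best_effort(fd, LOCK_BYTE)
    finally:
        os.close(fd)
        os.unlink(path)
    # The descriptor is closed now, which stands in for "something is
    # wrong with the file handle itself".
    broken = unlock_byte_best_effort(fd, LOCK_BYTE)
    return healthy, broken
```

In `demo()` the second unlock fails because the descriptor is closed, and the caller continues instead of aborting, which is the behaviour the comment proposes for the block layer.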

Comment 13 Hanna Czenczek 2019-03-29 16:18:13 UTC
Just noticed that this has just been fixed upstream (well, it still needs to go into master): http://lists.nongnu.org/archive/html/qemu-block/2019-03/msg00974.html

(It’s always nice to come out of PTO and see someone fixed your bugs.)

Max

Comment 14 Hanna Czenczek 2019-03-29 17:50:38 UTC
Some discussion later, I’m not sure whether that patch would really fix the issue here (well, it wouldn’t hurt).  However, Kevin just pointed me to the fact that upstream 2996ffad3acabe890fbb4f84a069cdc325a68108 might have been the actual fix.  This was included in qemu-kvm-rhev-2.12.0-23.el7 for BZ 1551486.

This BZ here was reported against qemu-kvm-rhev-2.12.0-18.el7 (i.e. RHV 7.6).  So maybe the fix is already in for RHV 7.7?

Max
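
Editorial illustration, not part of the original comment: a simplified way to check whether a given build is at or past the one named above as containing the fix. It assumes build strings of the exact shape `name-version-release.elN` and compares only numeric version and release fields; real RPM version comparison has more rules than this. The names `parse_build` and `has_fix` are hypothetical.

```python
import re

FIXED_BUILD = "qemu-kvm-rhev-2.12.0-23.el7"  # build named in comment 14


def parse_build(nvr):
    """Split 'name-version-release.elN' into (name, version tuple, release, dist)."""
    m = re.fullmatch(r"(.+)-(\d+(?:\.\d+)*)-(\d+)\.(el\d+)", nvr)
    if m is None:
        raise ValueError(f"unrecognized build string: {nvr!r}")
    name, version, release, dist = m.groups()
    return name, tuple(int(p) for p in version.split(".")), int(release), dist


def has_fix(nvr, fixed=FIXED_BUILD):
    """Return True if nvr is the same package and dist, at or after the fixed build."""
    name, version, release, dist = parse_build(nvr)
    f_name, f_version, f_release, f_dist = parse_build(fixed)
    if (name, dist) != (f_name, f_dist):
        raise ValueError("builds are not comparable")
    return (version, release) >= (f_version, f_release)
```

Under these assumptions, `has_fix("qemu-kvm-rhev-2.12.0-18.el7")` is false for the build this bug was reported against, and `has_fix("qemu-kvm-rhev-2.12.0-23.el7")` is true.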

Comment 15 Hanna Czenczek 2019-04-03 17:51:18 UTC
696aaaed579ac5bf5fa336216909b46d3d8f07a8 (the patch I linked to in comment 13, which is in upstream’s master by now) is required to fix a related crash, but I’m not sure it is relevant here.  I’ll backport it for BZ 1603104.

Max

Comment 16 Doron Fediuck 2019-04-04 08:16:30 UTC
(In reply to Max Reitz from comment #15)
> 696aaaed579ac5bf5fa336216909b46d3d8f07a8 (the patch I linked to in comment
> 13, which is in upstream’s master by now) is required to fix a related
> crash, but I’m not sure it is relevant here.  I’ll backport it for BZ
> 1603104.
> 
> Max

So it seems that this bz is blocked by BZ 1603104?

Comment 17 Hanna Czenczek 2019-04-05 15:26:13 UTC
Hi Doron,

It depends.  I think the main fix should already be in qemu-kvm-rhev-2.12.0-23.el7 (as the fix for BZ 1551486).  However, there is another related crash that is tracked with BZ 1603104, yes.

Max

Comment 18 Daniel Gur 2019-08-28 13:13:23 UTC
sync2jira

Comment 20 RHV bug bot 2019-10-22 17:25:33 UTC
WARN: Bug status (ON_QA) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 24 Polina 2019-11-17 13:49:08 UTC
Verified on ovirt-engine-4.4.0-0.4.master.el7.noarch and vdsm-4.40.0-141.gitb9d2120.el8ev.x86_64.

Many iterations of simultaneous migrations of 8 VMs did not cause a qemu crash. Also tried while there were storage problems on the destination host: no crash.

Comment 25 RHV bug bot 2019-11-19 11:52:39 UTC
WARN: Bug status (VERIFIED) wasn't changed but the following should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 37 errata-xmlrpc 2020-08-04 13:26:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHV RHEL Host (ovirt-host) 4.4), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:3246