Bug 1456456

Summary: qemu crashes on job completion during drain
Product: Red Hat Enterprise Linux 7
Reporter: Kevin Wolf <kwolf>
Component: qemu-kvm-rhev
Assignee: Kevin Wolf <kwolf>
Status: CLOSED ERRATA
QA Contact: Qianqian Zhu <qizhu>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 7.4
CC: chayang, juzhang, kwolf, michen, mrezanin, qizhu, virt-maint, xfu
Target Milestone: rc
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.9.0-8.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-02 04:41:00 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Kevin Wolf 2017-05-29 12:38:51 UTC
If an active commit block job completes in a blk_drain() that is called from
mirror_drain(), write permissions to the base node are kept for too long and
removing the mirror_top filter node for job completion fails. This operation is
not supposed to fail, so it results in abort().

This bug was found with qemu-iotests 129, but it only seems to fail on Yash's
CI test host. Several engineers tried to reproduce it locally on their systems,
but couldn't, because the failure involves a race condition. So, QE may or may
not be able to reproduce with qemu-iotests 129.

The problem is fixed with a patch I sent to upstream: "mirror: Drop permissions
on s->target on completion". This has not been merged into master yet.

Comment 1 Qianqian Zhu 2017-06-02 02:59:05 UTC
QE has run 1000 rounds of qemu-iotests 129 and has not reproduced the issue so far.

Comment 2 Miroslav Rezanina 2017-06-06 08:55:07 UTC
Fix included in qemu-kvm-rhev-2.9.0-8.el7

Comment 4 Qianqian Zhu 2017-06-09 03:32:36 UTC
Hi Kevin,

Since QE cannot reproduce this bz, we may only be able to verify it through block mirror regression testing. As we are close to our deadline, could you help indicate the test scope? Can we use qemu-iotests for it, or is full testing needed? If we use qemu-iotests, which cases should be covered?

Thanks,
Qianqian

Comment 5 Kevin Wolf 2017-06-09 08:40:32 UTC
qemu-iotests should be quick enough that a full run can be done, but test cases
that could involve mirror are: 030 040 041 094 109 118 129 132 139 141 144 152
155 156

The change is unlikely to make any difference in manual testing, so I agree that
qemu-iotests could be enough.
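
For anyone repeating this verification, the selection above can be scripted. The following is only a minimal sketch: it assumes a built QEMU source tree with the `check` script in tests/qemu-iotests, and it assumes the qcow2 image format, which this report does not specify. The helper names `build_check_command` and `run_rounds` are hypothetical and are not part of qemu-iotests.

```python
import subprocess

# Test numbers copied from the comment above (cases that could involve mirror).
MIRROR_TESTS = "030 040 041 094 109 118 129 132 139 141 144 152 155 156".split()


def build_check_command(tests, image_format="qcow2"):
    """Build the argv for the qemu-iotests ./check script.

    The image format is an assumption; adjust it to the setup under test.
    """
    return ["./check", "-" + image_format] + list(tests)


def run_rounds(tests, rounds=1, iotests_dir="tests/qemu-iotests"):
    """Run the selected tests up to `rounds` times.

    Returns the number of the first failing round, or None if every
    round passed. The directory is an assumed location relative to
    the QEMU source tree.
    """
    cmd = build_check_command(tests)
    for round_no in range(1, rounds + 1):
        result = subprocess.run(cmd, cwd=iotests_dir)
        if result.returncode != 0:
            return round_no
    return None
```

Calling `run_rounds(MIRROR_TESTS)` would cover the mirror-related cases once, while `run_rounds(["129"], rounds=1000)` would approximate the repeated run of test 129 described in comment 1. Because the original failure depends on a race, a clean result from such a loop does not prove the bug is absent.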

Comment 7 Qianqian Zhu 2017-06-14 02:31:21 UTC
All qemu-iotests passed except 139, which failed with the log below:

....F...F
======================================================================
FAIL: testDeviceModel (__main__.TestBlockdevDel)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "139", line 255, in testDeviceModel
    self.addDeviceModel('device0', 'node0')
  File "139", line 93, in addDeviceModel
    self.assert_qmp(result, 'return', {})
  File "/root/qemu-kvm/tests/qemu-iotests/iotests.py", line 262, in assert_qmp
    result = self.dictpath(d, path)
  File "/root/qemu-kvm/tests/qemu-iotests/iotests.py", line 221, in dictpath
    self.fail('failed path traversal for "%s" in "%s"' % (path, str(d)))
AssertionError: failed path traversal for "return" in "{u'error': {u'class': u'GenericError', u'desc': u"Bus 'pcie.0' does not support hotplugging"}}"

======================================================================
FAIL: testSnapshotSync (__main__.TestBlockdevDel)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "139", line 280, in testSnapshotSync
    self.addDeviceModel('device0', 'node0')
  File "139", line 93, in addDeviceModel
    self.assert_qmp(result, 'return', {})
  File "/root/qemu-kvm/tests/qemu-iotests/iotests.py", line 262, in assert_qmp
    result = self.dictpath(d, path)
  File "/root/qemu-kvm/tests/qemu-iotests/iotests.py", line 221, in dictpath
    self.fail('failed path traversal for "%s" in "%s"' % (path, str(d)))
AssertionError: failed path traversal for "return" in "{u'error': {u'class': u'GenericError', u'desc': u"Bus 'pcie.0' does not support hotplugging"}}"

----------------------------------------------------------------------
Ran 9 tests

FAILED (failures=2)

Kevin,
I think the above failure is not related to block mirror, could you help confirm it? Thanks.

Comment 8 Kevin Wolf 2017-06-14 08:24:50 UTC
(In reply to Qianqian Zhu from comment #7)
> Kevin,
> I think the above failure is not related to block mirror, could you help
> confirm it? Thanks.

This is correct, the failing tests don't involve mirror at all.

The same test passes for me locally. Can you please verify that on your setup
this failed even on the old version?

Comment 9 Qianqian Zhu 2017-06-15 01:47:21 UTC
It also failed with qemu-kvm-rhev-2.6.0-28.el7, so I adjusted my compile options, and after that all test cases passed. I think we can ignore this error.

So moving to VERIFIED.

Comment 11 errata-xmlrpc 2017-08-02 04:41:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392