Red Hat Bugzilla – Bug 1456456
qemu crashes on job completion during drain
Last modified: 2017-08-02 00:41:00 EDT
If an active commit block job completes in a blk_drain() that is called from mirror_drain(), write permissions to the base node are kept for too long and removing the mirror_top filter node for job completion fails. This operation is not supposed to fail, so it results in abort().

This bug was found with qemu-iotests 129, but it only seems to fail on Yash's CI test host. Several engineers tried to reproduce it locally on their systems, but couldn't, because the failure involves a race condition. So, QE may or may not be able to reproduce with qemu-iotests 129.

The problem is fixed with a patch I sent upstream: "mirror: Drop permissions on s->target on completion". This has not been merged into master yet.
QE has run 1000 rounds of qemu-iotests 129 and has not reproduced the issue so far.
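A repeated run like this one can be scripted. The following is a minimal sketch, not the harness QE actually used: `run_rounds` is a hypothetical helper, and the command in the trailing comment assumes the `./check` runner in `tests/qemu-iotests` with the `-qcow2` format flag and a non-zero exit status on failure.

```python
import subprocess


def run_rounds(cmd, rounds, cwd=None):
    """Run cmd up to `rounds` times.

    Returns the 1-based number of the first round that exits with a
    non-zero status (a crash via abort() also counts), or None if
    every round passed.
    """
    for i in range(1, rounds + 1):
        result = subprocess.run(cmd, cwd=cwd,
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        if result.returncode != 0:
            return i
    return None


# Assumed invocation; adjust the path and image format to the local setup:
# run_rounds(["./check", "-qcow2", "129"], rounds=1000,
#            cwd="tests/qemu-iotests")
```

Because the failure depends on a race, a clean result from such a loop does not show that the bug is absent on that host.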
Fix included in qemu-kvm-rhev-2.9.0-8.el7
Hi Kevin,

Since QE cannot reproduce this bug, we can probably only verify it with block mirror regression testing. As we are close to our deadline, could you help indicate the test scope? Can we use qemu-iotests for it, or is full testing needed? If qemu-iotests is enough, which cases should be covered?

Thanks,
Qianqian
qemu-iotests should be quick enough that a full run can be done, but test cases that could involve mirror are:

030 040 041 094 109 118 129 132 139 141 144 152 155 156

The change is unlikely to make any difference in manual testing, so I agree that qemu-iotests could be enough.
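For a targeted run, the mirror-related cases listed above can be passed to the test runner in a single invocation. This is a minimal sketch, assuming the `./check` script in `tests/qemu-iotests` accepts a format flag such as `-qcow2` followed by test numbers. `build_check_command` is a hypothetical helper, and the image format is an assumption rather than something stated in this report.

```python
# Test numbers copied from the comment above.
MIRROR_TESTS = "030 040 041 094 109 118 129 132 139 141 144 152 155 156".split()


def build_check_command(tests, image_format="qcow2"):
    """Build the argument list for one ./check invocation (assumed CLI shape)."""
    return ["./check", "-" + image_format] + list(tests)


# Print the command line so it can be reviewed before running it.
print(" ".join(build_check_command(MIRROR_TESTS)))
```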
qemu-iotests all passed except 139, which failed with the log below:

....F...F
======================================================================
FAIL: testDeviceModel (__main__.TestBlockdevDel)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "139", line 255, in testDeviceModel
    self.addDeviceModel('device0', 'node0')
  File "139", line 93, in addDeviceModel
    self.assert_qmp(result, 'return', {})
  File "/root/qemu-kvm/tests/qemu-iotests/iotests.py", line 262, in assert_qmp
    result = self.dictpath(d, path)
  File "/root/qemu-kvm/tests/qemu-iotests/iotests.py", line 221, in dictpath
    self.fail('failed path traversal for "%s" in "%s"' % (path, str(d)))
AssertionError: failed path traversal for "return" in "{u'error': {u'class': u'GenericError', u'desc': u"Bus 'pcie.0' does not support hotplugging"}}"

======================================================================
FAIL: testSnapshotSync (__main__.TestBlockdevDel)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "139", line 280, in testSnapshotSync
    self.addDeviceModel('device0', 'node0')
  File "139", line 93, in addDeviceModel
    self.assert_qmp(result, 'return', {})
  File "/root/qemu-kvm/tests/qemu-iotests/iotests.py", line 262, in assert_qmp
    result = self.dictpath(d, path)
  File "/root/qemu-kvm/tests/qemu-iotests/iotests.py", line 221, in dictpath
    self.fail('failed path traversal for "%s" in "%s"' % (path, str(d)))
AssertionError: failed path traversal for "return" in "{u'error': {u'class': u'GenericError', u'desc': u"Bus 'pcie.0' does not support hotplugging"}}"

----------------------------------------------------------------------
Ran 9 tests

FAILED (failures=2)

Kevin,
I think the above failure is not related to block mirror, could you help confirm it? Thanks.
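The AssertionError in the log comes from the test helper looking up the key `return` in a QMP reply that contains only `error`. The sketch below is a simplified stand-in for that lookup, not the actual `iotests.py` code: `dictpath` is written here as a free function that handles only slash-separated dictionary keys, and the reply is copied from the log.

```python
def dictpath(d, path):
    """Walk a '/'-separated path through nested dicts (simplified sketch)."""
    for component in path.split("/"):
        if not isinstance(d, dict) or component not in d:
            raise AssertionError(
                'failed path traversal for "%s" in "%s"' % (path, str(d)))
        d = d[component]
    return d


# Reply taken from the failure log: the hotplug request was rejected.
reply = {"error": {"class": "GenericError",
                   "desc": "Bus 'pcie.0' does not support hotplugging"}}

# A key that exists can be reached.
print(dictpath(reply, "error/class"))

# The tests expect a 'return' key, which an error reply does not have.
try:
    dictpath(reply, "return")
except AssertionError as exc:
    print("lookup failed:", exc)
```

The failure therefore reflects the rejected device hotplug on `pcie.0`, which happens before any block job is involved.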
(In reply to Qianqian Zhu from comment #7)
> Kevin,
> I think the above failure is not related to block mirror, could you help
> confirm it? Thanks.

This is correct, the failing tests don't involve mirror at all. The same test passes for me locally. Can you please verify that on your setup this failed even on the old version?
It also failed with qemu-kvm-rhev-2.6.0-28.el7, so I adjusted my compile options, and after that all test cases passed. I think we can ignore this error, so moving to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:2392