Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1405042

Summary: live merge fails (possibly on image tear down - LV deactivation fails)
Product: [oVirt] vdsm Reporter: Yaniv Kaul <ykaul>
Component: CoreAssignee: Ala Hino <ahino>
Status: CLOSED WORKSFORME QA Contact: Raz Tamir <ratamir>
Severity: high Docs Contact:
Priority: high    
Version: 4.18.30CC: bugs, nsoffer, tnisan, ykaul
Target Milestone: ovirt-4.1.0-betaKeywords: AutomationBlocker, Regression
Target Release: ---Flags: rule-engine: ovirt-4.1+
rule-engine: blocker+
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-01-08 16:14:26 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
vdsm log
none
engine log none

Description Yaniv Kaul 2016-12-15 12:46:05 UTC
Created attachment 1232138 [details]
vdsm log

Description of problem:
While running ovirt-system-tests tests with live merge, it fails, apparentl on tear down of the image.

Version-Release number of selected component (if applicable):
vdsm-4.18.999-1162.gite95442e.el7.centos.x86_64.rpm

How reproducible:
Sometimes - I do not think it fails all the time.

I see a deactivation failure:

2016-12-15 06:29:31,528 INFO  (jsonrpc/0) [dispatcher] Run and protect: teardownImage(sdUUID=u'43c129a6-2a33-48c0-bceb-a9585b78a1fd', spUUID=u'34f579e9-89b5-49fd-9c61-6cbcd67f672e', imgUUID=u'11629df7-653d-4679-
8094-651e2c92dd39', volUUID=None) (logUtils:49)
2016-12-15 06:29:31,559 INFO  (jsonrpc/0) [storage.LVM] Deactivating lvs: vg=43c129a6-2a33-48c0-bceb-a9585b78a1fd lvs=['f99c7c30-6613-4c36-a115-1c877d1a1d2a', '7d630987-5be2-4cfa-bca8-a5409a64216b'] (lvm:1306)
2016-12-15 06:29:33,668 INFO  (periodic/3) [dispatcher] Run and protect: getVolumeSize(sdUUID=u'43c129a6-2a33-48c0-bceb-a9585b78a1fd', spUUID=u'34f579e9-89b5-49fd-9c61-6cbcd67f672e', imgUUID=u'11629df7-653d-4679
-8094-651e2c92dd39', volUUID=u'f99c7c30-6613-4c36-a115-1c877d1a1d2a', options=None) (logUtils:49)
2016-12-15 06:29:33,669 INFO  (periodic/3) [dispatcher] Run and protect: getVolumeSize, Return response: {'truesize': '1073741824', 'apparentsize': '1073741824'} (logUtils:52)
2016-12-15 06:29:33,670 INFO  (periodic/2) [dispatcher] Run and protect: repoStats(options=None) (logUtils:49)
2016-12-15 06:29:33,670 INFO  (periodic/2) [dispatcher] Run and protect: repoStats, Return response: {u'894d6b50-b70c-4438-a2c7-f71ac10cc8b6': {'code': 0, 'actual': True, 'version': 0, 'acquired': True, 'delay':
 '0.000484075', 'lastCheck': '3.4', 'valid': True}, u'4ca13f73-ff87-4b31-99d8-43b5c83e4caf': {'code': 0, 'actual': True, 'version': 0, 'acquired': True, 'delay': '0.0104277', 'lastCheck': '6.3', 'valid': True}, 
u'43c129a6-2a33-48c0-bceb-a9585b78a1fd': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.00360224', 'lastCheck': '3.9', 'valid': True}, u'ef9c603b-eb82-4b70-b3a4-edf7c2286239': {'code': 0
, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.00021639', 'lastCheck': '8.0', 'valid': True}} (logUtils:52)
2016-12-15 06:29:39,229 INFO  (jsonrpc/2) [dispatcher] Run and protect: repoStats(options=None) (logUtils:49)
2016-12-15 06:29:39,229 INFO  (jsonrpc/2) [dispatcher] Run and protect: repoStats, Return response: {u'894d6b50-b70c-4438-a2c7-f71ac10cc8b6': {'code': 0, 'actual': True, 'version': 0, 'acquired': True, 'delay': 
'0.000484075', 'lastCheck': '8.9', 'valid': True}, u'4ca13f73-ff87-4b31-99d8-43b5c83e4caf': {'code': 0, 'actual': True, 'version': 0, 'acquired': True, 'delay': '0.000249655', 'lastCheck': '1.9', 'valid': True},
 u'43c129a6-2a33-48c0-bceb-a9585b78a1fd': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.00925802', 'lastCheck': '0.4', 'valid': True}, u'ef9c603b-eb82-4b70-b3a4-edf7c2286239': {'code': 
0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.000233896', 'lastCheck': '3.6', 'valid': True}} (logUtils:52)
2016-12-15 06:29:39,234 INFO  (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC call Host.getStats succeeded in 0.01 seconds (__init__:515)
2016-12-15 06:29:39,727 INFO  (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in 0.00 seconds (__init__:515)
2016-12-15 06:29:39,739 INFO  (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmIoTunePolicies succeeded in 0.00 seconds (__init__:515)
2016-12-15 06:29:48,673 INFO  (periodic/0) [dispatcher] Run and protect: repoStats(options=None) (logUtils:49)
2016-12-15 06:29:48,674 INFO  (periodic/0) [dispatcher] Run and protect: repoStats, Return response: {u'894d6b50-b70c-4438-a2c7-f71ac10cc8b6': {'code': 0, 'actual': True, 'version': 0, 'acquired': True, 'delay':
 '0.000246189', 'lastCheck': '8.4', 'valid': True}, u'4ca13f73-ff87-4b31-99d8-43b5c83e4caf': {'code': 0, 'actual': True, 'version': 0, 'acquired': True, 'delay': '0.000229334', 'lastCheck': '1.3', 'valid': True}
, u'43c129a6-2a33-48c0-bceb-a9585b78a1fd': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.00925802', 'lastCheck': '8.9', 'valid': True}, u'ef9c603b-eb82-4b70-b3a4-edf7c2286239': {'code':
 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.000272641', 'lastCheck': '3.0', 'valid': True}} (logUtils:52)
2016-12-15 06:29:50,837 ERROR (jsonrpc/0) [storage.TaskManager.Task] (Task='7db61c74-eb49-40f6-a4a8-9ab0b616d2ab') Unexpected error (task:870)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 877, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 50, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 3187, in teardownImage
    dom.deactivateImage(imgUUID)
  File "/usr/share/vdsm/storage/blockSD.py", line 1245, in deactivateImage
    lvm.deactivateLVs(self.sdUUID, volUUIDs)
  File "/usr/share/vdsm/storage/lvm.py", line 1307, in deactivateLVs
    _setLVAvailability(vgName, toDeactivate, "n")
  File "/usr/share/vdsm/storage/lvm.py", line 846, in _setLVAvailability
    raise error(str(e))
CannotDeactivateLogicalVolume: Cannot deactivate Logical Volume: ('General Storage Exception: ("5 [] [\'  WARNING: Not using lvmetad because config setting use_lvmetad=0.\', \'  WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).\', \'  Logical volume 43c129a6-2a33-48c0-bceb-a9585b78a1fd/7d630987-5be2-4cfa-bca8-a5409a64216b in use.\', \'  Logical volume 43c129a6-2a33-48c0-bceb-a9585b78a1fd/f99c7c30-6613-4c36-a115-1c877d1a1d2a in use.\']\\n43c129a6-2a33-48c0-bceb-a9585b78a1fd/[\'f99c7c30-6613-4c36-a115-1c877d1a1d2a\', \'7d630987-5be2-4cfa-bca8-a5409a64216b\']",)',)
2016-12-15 06:29:50,842 INFO  (jsonrpc/0) [storage.TaskManager.Task] (Task='7db61c74-eb49-40f6-a4a8-9ab0b616d2ab') aborting: Task is aborted: 'Cannot deactivate Logical Volume' - code 552 (task:1175)
2016-12-15 06:29:50,842 ERROR (jsonrpc/0) [storage.Dispatcher] {'status': {'message': 'Cannot deactivate Logical Volume: (\'General Storage Exception: ("5 [] [\\\'  WARNING: Not using lvmetad because config setting use_lvmetad=0.\\\', \\\'  WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).\\\', \\\'  Logical volume 43c129a6-2a33-48c0-bceb-a9585b78a1fd/7d630987-5be2-4cfa-bca8-a5409a64216b in use.\\\', \\\'  Logical volume 43c129a6-2a33-48c0-bceb-a9585b78a1fd/f99c7c30-6613-4c36-a115-1c877d1a1d2a in use.\\\']\\\\n43c129a6-2a33-48c0-bceb-a9585b78a1fd/[\\\'f99c7c30-6613-4c36-a115-1c877d1a1d2a\\\', \\\'7d630987-5be2-4cfa-bca8-a5409a64216b\\\']",)\',)', 'code': 552}} (dispatcher:77)
2016-12-15 06:29:50,842 INFO  (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call Image.teardown failed (error 552) in 19.32 seconds (__init__:515)

Comment 1 Yaniv Kaul 2016-12-15 12:50:44 UTC
Created attachment 1232151 [details]
engine log

Comment 2 Nir Soffer 2016-12-15 15:40:57 UTC
This fail with:

Logical volume 43c129a6-2a33-48c0-bceb-a9585b78a1fd/7d630987-5be2-4cfa-bca8-a5409a64216b in use.

Failing on teardown mean that live merge was completed, but we cannot deactivate
the volume. This was error ignored silently in the past, but now we fail the flow.

Leaving active lv is dangerous, we had image corruption bug caused by this.

Is this raw volume? do we have guest lvs on this volume?

Comment 3 Yaniv Kaul 2016-12-16 09:34:39 UTC
(In reply to Nir Soffer from comment #2)
> This fail with:
> 
> Logical volume
> 43c129a6-2a33-48c0-bceb-a9585b78a1fd/7d630987-5be2-4cfa-bca8-a5409a64216b in
> use.
> 
> Failing on teardown mean that live merge was completed, but we cannot
> deactivate
> the volume. This was error ignored silently in the past, but now we fail the
> flow.
> 
> Leaving active lv is dangerous, we had image corruption bug caused by this.
> 
> Is this raw volume? do we have guest lvs on this volume?

I'm now seeing failures in cold merge as well, in similar scenario. However, I've also noticed that there's an even earlier WARNING (and not ERROR, which is why I've missed it in the first place) that deactivation fails. Example:

2016-12-16 04:09:30,505 INFO  (tasks/1) [storage.VolumeManifest] Tearing down volume 954c1706-3ed9-4b7f-be73-5d5d4cca209e/b790b46c-3304-4934-bab7-01aca8dcae50 justme False (blockVolume:390)
2016-12-16 04:09:30,506 INFO  (tasks/1) [storage.LVM] Deactivating lvs: vg=954c1706-3ed9-4b7f-be73-5d5d4cca209e lvs=[u'b790b46c-3304-4934-bab7-01aca8dcae50'] (lvm:1306)
2016-12-16 04:09:35,354 WARN  (tasks/1) [storage.ResourcesFactories] Failure deactivate LV 954c1706-3ed9-4b7f-be73-5d5d4cca209e/b790b46c-3304-4934-bab7-01aca8dcae50 (Cannot deactivate Logical Volume: ('General Storage Exception: (u"5 [] [\'  WARNING: Not using lvmetad because config setting use_lvmetad=0.\', \'  WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).\', \'  Logical volume 954c1706-3ed9-4b7f-be73-5d5d4cca209e/b790b46c-3304-4934-bab7-01aca8dcae50 in use.\']\\n954c1706-3ed9-4b7f-be73-5d5d4cca209e/[u\'b790b46c-3304-4934-bab7-01aca8dcae50\']",)',)) (resourceFactories:58)
2016-12-16 04:09:52,181 INFO  (jsonrpc/5) [vdsm.api] START snapshot args=(<virt.vm.Vm object at 0x2b01c90>, [{u'baseVolumeID': u'b790b46c-3304-4934-bab7-01aca8dcae50', u'domainID': u'954c1706-3ed9-4b7f-be73-5d5d4cca209e', u'volumeID': u'85d3cc1e-0bd6-49a6-afa5-d45db60e112d', u'imageID': u'2daa88c9-7646-4e1c-a697-08222c54ecef'}], {'dst': {'device': 'disk', 'imageID': u'78658374-e538-4b9a-8b28-3896883ff19c', 'domainID': u'954c1706-3ed9-4b7f-be73-5d5d4cca209e', 'volumeID': u'13ee4f0f-1efa-491f-b995-abac8e3205b4', 'poolID': u'9fd6b273-aa60-4031-992a-09fb232c8a47'}, 'dstparams': {'device': 'disk', 'imageID': u'8051c36a-b871-418a-8043-74001386dc95', 'domainID': u'954c1706-3ed9-4b7f-be73-5d5d4cca209e', 'volumeID': u'2e74c4f6-5aa5-4009-9c1f-f8deef4ca38a', 'poolID': u'9fd6b273-aa60-4031-992a-09fb232c8a47'}}, False) kwargs={} (api:37)
2016-12-16 04:09:52,251 INFO  (jsonrpc/5) [storage.LVM] Refreshing lvs: vg=954c1706-3ed9-4b7f-be73-5d5d4cca209e lvs=['b790b46c-3304-4934-bab7-01aca8dcae50'] (lvm:1294)

Comment 4 Red Hat Bugzilla Rules Engine 2016-12-27 16:39:47 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.