Bug 902838

Summary: spmStart throws a MetaDataKeyNotFoundError exception while upgrading a domain from v2 to v3 due to a partially removed volume (lv present, metadata blanked).
Product: Red Hat Enterprise Virtualization Manager Reporter: Lee Yarwood <lyarwood>
Component: vdsmAssignee: Federico Simoncelli <fsimonce>
Status: CLOSED CURRENTRELEASE QA Contact: Dafna Ron <dron>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 3.1.1CC: abaron, amureini, bazulay, fsimonce, hateya, iheim, italkohe, knesenko, lpeer, lyarwood, mkalinin, pablo.iranzo, sgrinber, sputhenp, ykaul
Target Milestone: ---Keywords: ZStream
Target Release: 3.2.0   
Hardware: All   
OS: Other   
Whiteboard: storage
Fixed In Version: vdsm-4.10.2-6.0.el6ev.x86_64.rpm Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 905924 (view as bug list) Environment:
Last Closed: 2013-06-11 09:19:11 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 905924, 915537    

Description Lee Yarwood 2013-01-22 14:04:33 UTC
Description of problem:
spmStart throws a MetaDataKeyNotFoundError exception while upgrading a domain from v2 to v3 due to a partially removed volume (lv present, metadata blanked).

This is due to the following try catch block missing the MetaDataKeyNotFoundError exception from img.prepare on line 251 :

vdsm/vdsm/storage/imageRepository/formatConverter.py

~~~
250             try:
251                 for vol in img.prepare(domain.sdUUID, imgUUID):
252                     try:
253                         v3ResetMetaVolSize(vol)  # BZ#811880
254                     except qemuImg.QImgError:
255                         log.error("It is not possible to read the volume %s "
256                                   "using qemu-img, the content looks damaged",
257                                   vol.volUUID, exc_info=True)
258 
259             except se.VolumeDoesNotExist:
260                 log.error("It is not possible to prepare the image %s, the "
261                           "volume chain looks damaged", imgUUID, exc_info=True)
262 
263             finally:
264                 try:
265                     img.teardown(domain.sdUUID, imgUUID)
266                 except:
267                     log.debug("Unable to teardown the image %s, this error is "
268                               "not critical since the volume might be in use",
269                               imgUUID, exc_info=True)
~~~

Suggested fix would be to simply catch the MetaDataKeyNotFoundError and continue as we do in finally.

Example vdsm.log attached with ~12 examples of spmStart failing due to this.

I've included an example snippet below :
~~~

39985 Thread-13913::DEBUG::2013-01-22 12:53:58,448::BindingXMLRPC::171::vds::(wrapper) [10.141.149.254]
39986 Thread-13913::DEBUG::2013-01-22 12:53:58,448::task::588::TaskManager.Task::(_updateState) Task=`13697099-e662-4d6a-bd4b-0268d46c100a`::moving from state init -> state preparing
39987 Thread-13913::INFO::2013-01-22 12:53:58,449::logUtils::37::dispatcher::(wrapper) Run and protect: spmStart(spUUID='7b2ded83-73c1-412a-b7d9-765e21fb1e42', prevID=-1, prevLVER='114', recoveryMode=None, scsiFencing='false', maxHostID=      250, domVersion='3', options=None)
[..]
42691 13697099-e662-4d6a-bd4b-0268d46c100a::DEBUG::2013-01-22 12:55:36,688::formatConverter::269::Storage.v3DomainConverter::(v3DomainConverter) Unable to teardown the image 69b5106e-5d4a-4642-8be7-d5d39cfbaede, this error is not critica      l since the volume might be in use
42692 Traceback (most recent call last):
42693   File "/usr/share/vdsm/storage/imageRepository/formatConverter.py", line 265, in v3DomainConverter
42694   File "/usr/share/vdsm/storage/image.py", line 379, in teardown
42695   File "/usr/share/vdsm/storage/image.py", line 319, in getChain
42696   File "/usr/share/vdsm/storage/volume.py", line 660, in isShared
42697   File "/usr/share/vdsm/storage/volume.py", line 621, in getVolType
42698   File "/usr/share/vdsm/storage/volume.py", line 865, in getMetaParam
42699 MetaDataKeyNotFoundError: Meta Data key not found error: ("{'NONE': '##################################################################################################################################################################      #######################################################################################################################################################################################################################################      #############################################################################################################'}:VOLTYPE",)
[..]
42776 13697099-e662-4d6a-bd4b-0268d46c100a::ERROR::2013-01-22 12:55:38,929::sp::316::Storage.StoragePool::(startSpm) Unexpected error
42777 Traceback (most recent call last):
42778   File "/usr/share/vdsm/storage/sp.py", line 274, in startSpm
42779   File "/usr/share/vdsm/storage/securable.py", line 63, in wrapper
42780   File "/usr/share/vdsm/storage/sp.py", line 414, in _upgradePool
42781   File "/usr/share/vdsm/storage/sp.py", line 1033, in _convertDomain
42782   File "/usr/share/vdsm/storage/imageRepository/formatConverter.py", line 342, in convert
42783   File "/usr/share/vdsm/storage/imageRepository/formatConverter.py", line 251, in v3DomainConverter
42784   File "/usr/share/vdsm/storage/image.py", line 364, in prepare
42785   File "/usr/share/vdsm/storage/image.py", line 319, in getChain
42786   File "/usr/share/vdsm/storage/volume.py", line 660, in isShared
42787   File "/usr/share/vdsm/storage/volume.py", line 621, in getVolType
42788   File "/usr/share/vdsm/storage/volume.py", line 865, in getMetaParam
42789 MetaDataKeyNotFoundError: Meta Data key not found error: ("{'NONE': '##################################################################################################################################################################      #######################################################################################################################################################################################################################################      #############################################################################################################'}:VOLTYPE",)
42790 13697099-e662-4d6a-bd4b-0268d46c100a::ERROR::2013-01-22 12:55:38,929::sp::317::Storage.StoragePool::(startSpm) failed: Meta Data key not found error: ("{'NONE': '#####################################################################      #######################################################################################################################################################################################################################################      ##########################################################################################################################################################################################################'}:VOLTYPE",)
42791 13697099-e662-4d6a-bd4b-0268d46c100a::DEBUG::2013-01-22 12:55:38,929::sp::323::Storage.StoragePool::(_shutDownUpgrade) Shutting down upgrade process
~~~

A simple workaround to this situation is to lvrename the volume LVs that form the image and also lvchange --deltag any LV tags that they might have.

Version-Release number of selected component (if applicable):
4.9.6-44.2

How reproducible:
Always where a volume's LV is still present but it's metadata is blanked out.

Steps to Reproduce:
1. Remove a volume from engine.
2. Blank out the volumes MD block.
3. Attempt to upgrade the domain from V2 to V3.
  
Actual results:
Failure to upgrade due to the uncaught MetaDataKeyNotFoundError exception.


Expected results:
Upgrade catches the MetaDataKeyNotFoundError exception and completes successfully.

Additional info:
For the original customer this all started with a  failure in 3.0 to cleanly remove this image/volume, something I will look into shortly with data from older cases.

Comment 2 Lee Yarwood 2013-01-22 14:20:21 UTC
Suggested patch : http://gerrit.ovirt.org/11281

Comment 11 Haim 2013-03-18 19:07:20 UTC
verified in sf10.1.

[root@puma32 ~]# hexdump /dev/c9b67373-61df-4bd8-adc5-18763468f09f/metadata 
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*

[root@puma32 ~]# dd if=/dev/zero of=/dev/c9b67373-61df-4bd8-adc5-18763468f09f/metadata bs=1M count=512

issued ypgrade from v1>>v2>>v3

2013-03-18 09:02:25,246 INFO  [org.ovirt.engine.core.bll.storage.UpdateStoragePoolCommand] (pool-3-thread-46) [1dcdde66] Updating storage domain c9b67373-61df-4bd8-adc5-18763468f09f (type Master) to format V3
2013-03-18 09:02:25,250 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.UpgradeStoragePoolVDSCommand] (pool-3-thread-46) [1dcdde66] START, UpgradeStoragePoolVDSCommand( storagePoolId = 407c0f8b-67fb-4cec-b873-1a01092630a3, poolVersion = 3), log id: 1a68c870
2013-03-18 09:02:25,250 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.UpgradeStoragePoolVDSCommand] (pool-3-thread-46) [1dcdde66] Upgrading storage pool 407c0f8b-67fb-4cec-b873-1a01092630a3 to version 3
2013-03-18 09:02:29,561 INFO  [org.ovirt.engine.core.bll.LoginUserCommand] (ajp-/127.0.0.1:8702-3) Running command: LoginUserCommand internal: false.
2013-03-18 09:02:29,571 WARN  [org.ovirt.engine.core.bll.GetConfigurationValueQuery] (ajp-/127.0.0.1:8702-3) calling GetConfigurationValueQuery (ApplicationMode) with null version, using default general for version
2013-03-18 09:02:29,571 WARN  [org.ovirt.engine.core.bll.GetConfigurationValueQuery] (ajp-/127.0.0.1:8702-3) calling GetConfigurationValueQuery (VdcVersion) with null version, using default general for version
2013-03-18 09:02:47,728 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.UpgradeStoragePoolVDSCommand] (pool-3-thread-46) [1dcdde66] FINISH, UpgradeStoragePoolVDSCommand, log id: 1a68c870
2013-03-18 09:03:11,994 INFO  [org.ovirt.engine.core.bll.storage.UpdateStoragePoolCommand] (pool-3-thread-50) [1de44d] Running command: UpdateStoragePoolCommand internal: false. Entities affected :  ID: 407c0f8b-67fb-4cec-b873-1a01092630a3 Type: StoragePool
2013-03-18 09:03:12,001 INFO  [org.ovirt.engine.core.bll.storage.UpdateStoragePoolCommand] (pool-3-thread-50) [1de44d] Updating storage domain c9b67373-61df-4bd8-adc5-18763468f09f (type Master) to format V3
2013-03-18 09:03:12,004 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.UpgradeStoragePoolVDSCommand] (pool-3-thread-50) [1de44d] START, UpgradeStoragePoolVDSCommand( storagePoolId = 407c0f8b-67fb-4cec-b873-1a01092630a3, poolVersion = 3), log id: 6f35e18b
2013-03-18 09:03:12,004 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.UpgradeStoragePoolVDSCommand] (pool-3-thread-50) [1de44d] Upgrading storage pool 407c0f8b-67fb-4cec-b873-1a01092630a3 to version 3
2013-03-18 09:03:12,109 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.UpgradeStoragePoolVDSCommand] (pool-3-thread-50) [1de44d] FINISH, UpgradeStoragePoolVDSCommand, log id: 6f35e18b
2013-03-18 09:03:29,574 INFO  [org.ovirt.engine.core.bll.LoginUserCommand] (ajp-/127.0.0.1:8702-8) Running command: LoginUserCommand internal: false.

Comment 12 Itamar Heim 2013-06-11 09:19:11 UTC
3.2 has been released

Comment 13 Itamar Heim 2013-06-11 09:42:46 UTC
3.2 has been released