Bug 902838 - spmStart throws a MetaDataKeyNotFoundError exception while upgrading a domain from v2 to v3 due to a partially removed volume (lv present, metadata blanked).
spmStart throws a MetaDataKeyNotFoundError exception while upgrading a domain...
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm (Show other bugs)
3.1.1
All Other
urgent Severity urgent
: ---
: 3.2.0
Assigned To: Federico Simoncelli
Dafna Ron
storage
: ZStream
Depends On:
Blocks: 905924 915537
  Show dependency treegraph
 
Reported: 2013-01-22 09:04 EST by Lee Yarwood
Modified: 2016-02-10 15:26 EST (History)
15 users (show)

See Also:
Fixed In Version: vdsm-4.10.2-6.0.el6ev.x86_64.rpm
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 905924 (view as bug list)
Environment:
Last Closed: 2013-06-11 05:19:11 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 297213 None None None Never
oVirt gerrit 11281 None None None Never

  None (edit)
Description Lee Yarwood 2013-01-22 09:04:33 EST
Description of problem:
spmStart throws a MetaDataKeyNotFoundError exception while upgrading a domain from v2 to v3 due to a partially removed volume (lv present, metadata blanked).

This is due to the following try catch block missing the MetaDataKeyNotFoundError exception from img.prepare on line 251 :

vdsm/vdsm/storage/imageRepository/formatConverter.py

~~~
250             try:
251                 for vol in img.prepare(domain.sdUUID, imgUUID):
252                     try:
253                         v3ResetMetaVolSize(vol)  # BZ#811880
254                     except qemuImg.QImgError:
255                         log.error("It is not possible to read the volume %s "
256                                   "using qemu-img, the content looks damaged",
257                                   vol.volUUID, exc_info=True)
258 
259             except se.VolumeDoesNotExist:
260                 log.error("It is not possible to prepare the image %s, the "
261                           "volume chain looks damaged", imgUUID, exc_info=True)
262 
263             finally:
264                 try:
265                     img.teardown(domain.sdUUID, imgUUID)
266                 except:
267                     log.debug("Unable to teardown the image %s, this error is "
268                               "not critical since the volume might be in use",
269                               imgUUID, exc_info=True)
~~~

Suggested fix would be to simply catch the MetaDataKeyNotFoundError and continue as we do in finally.

Example vdsm.log attached with ~12 examples of spmStart failing due to this.

I've included an example snippet below :
~~~

39985 Thread-13913::DEBUG::2013-01-22 12:53:58,448::BindingXMLRPC::171::vds::(wrapper) [10.141.149.254]
39986 Thread-13913::DEBUG::2013-01-22 12:53:58,448::task::588::TaskManager.Task::(_updateState) Task=`13697099-e662-4d6a-bd4b-0268d46c100a`::moving from state init -> state preparing
39987 Thread-13913::INFO::2013-01-22 12:53:58,449::logUtils::37::dispatcher::(wrapper) Run and protect: spmStart(spUUID='7b2ded83-73c1-412a-b7d9-765e21fb1e42', prevID=-1, prevLVER='114', recoveryMode=None, scsiFencing='false', maxHostID=      250, domVersion='3', options=None)
[..]
42691 13697099-e662-4d6a-bd4b-0268d46c100a::DEBUG::2013-01-22 12:55:36,688::formatConverter::269::Storage.v3DomainConverter::(v3DomainConverter) Unable to teardown the image 69b5106e-5d4a-4642-8be7-d5d39cfbaede, this error is not critica      l since the volume might be in use
42692 Traceback (most recent call last):
42693   File "/usr/share/vdsm/storage/imageRepository/formatConverter.py", line 265, in v3DomainConverter
42694   File "/usr/share/vdsm/storage/image.py", line 379, in teardown
42695   File "/usr/share/vdsm/storage/image.py", line 319, in getChain
42696   File "/usr/share/vdsm/storage/volume.py", line 660, in isShared
42697   File "/usr/share/vdsm/storage/volume.py", line 621, in getVolType
42698   File "/usr/share/vdsm/storage/volume.py", line 865, in getMetaParam
42699 MetaDataKeyNotFoundError: Meta Data key not found error: ("{'NONE': '##################################################################################################################################################################      #######################################################################################################################################################################################################################################      #############################################################################################################'}:VOLTYPE",)
[..]
42776 13697099-e662-4d6a-bd4b-0268d46c100a::ERROR::2013-01-22 12:55:38,929::sp::316::Storage.StoragePool::(startSpm) Unexpected error
42777 Traceback (most recent call last):
42778   File "/usr/share/vdsm/storage/sp.py", line 274, in startSpm
42779   File "/usr/share/vdsm/storage/securable.py", line 63, in wrapper
42780   File "/usr/share/vdsm/storage/sp.py", line 414, in _upgradePool
42781   File "/usr/share/vdsm/storage/sp.py", line 1033, in _convertDomain
42782   File "/usr/share/vdsm/storage/imageRepository/formatConverter.py", line 342, in convert
42783   File "/usr/share/vdsm/storage/imageRepository/formatConverter.py", line 251, in v3DomainConverter
42784   File "/usr/share/vdsm/storage/image.py", line 364, in prepare
42785   File "/usr/share/vdsm/storage/image.py", line 319, in getChain
42786   File "/usr/share/vdsm/storage/volume.py", line 660, in isShared
42787   File "/usr/share/vdsm/storage/volume.py", line 621, in getVolType
42788   File "/usr/share/vdsm/storage/volume.py", line 865, in getMetaParam
42789 MetaDataKeyNotFoundError: Meta Data key not found error: ("{'NONE': '##################################################################################################################################################################      #######################################################################################################################################################################################################################################      #############################################################################################################'}:VOLTYPE",)
42790 13697099-e662-4d6a-bd4b-0268d46c100a::ERROR::2013-01-22 12:55:38,929::sp::317::Storage.StoragePool::(startSpm) failed: Meta Data key not found error: ("{'NONE': '#####################################################################      #######################################################################################################################################################################################################################################      ##########################################################################################################################################################################################################'}:VOLTYPE",)
42791 13697099-e662-4d6a-bd4b-0268d46c100a::DEBUG::2013-01-22 12:55:38,929::sp::323::Storage.StoragePool::(_shutDownUpgrade) Shutting down upgrade process
~~~

A simple workaround to this situation is to lvrename the volume LVs that form the image and also lvchange --deltag any LV tags that they might have.

Version-Release number of selected component (if applicable):
4.9.6-44.2

How reproducible:
Always where a volume's LV is still present but it's metadata is blanked out.

Steps to Reproduce:
1. Remove a volume from engine.
2. Blank out the volumes MD block.
3. Attempt to upgrade the domain from V2 to V3.
  
Actual results:
Failure to upgrade due to the uncaught MetaDataKeyNotFoundError exception.


Expected results:
Upgrade catches the MetaDataKeyNotFoundError exception and completes successfully.

Additional info:
For the original customer this all started with a  failure in 3.0 to cleanly remove this image/volume, something I will look into shortly with data from older cases.
Comment 2 Lee Yarwood 2013-01-22 09:20:21 EST
Suggested patch : http://gerrit.ovirt.org/11281
Comment 11 Haim 2013-03-18 15:07:20 EDT
verified in sf10.1.

[root@puma32 ~]# hexdump /dev/c9b67373-61df-4bd8-adc5-18763468f09f/metadata 
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*

[root@puma32 ~]# dd if=/dev/zero of=/dev/c9b67373-61df-4bd8-adc5-18763468f09f/metadata bs=1M count=512

issued ypgrade from v1>>v2>>v3

2013-03-18 09:02:25,246 INFO  [org.ovirt.engine.core.bll.storage.UpdateStoragePoolCommand] (pool-3-thread-46) [1dcdde66] Updating storage domain c9b67373-61df-4bd8-adc5-18763468f09f (type Master) to format V3
2013-03-18 09:02:25,250 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.UpgradeStoragePoolVDSCommand] (pool-3-thread-46) [1dcdde66] START, UpgradeStoragePoolVDSCommand( storagePoolId = 407c0f8b-67fb-4cec-b873-1a01092630a3, poolVersion = 3), log id: 1a68c870
2013-03-18 09:02:25,250 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.UpgradeStoragePoolVDSCommand] (pool-3-thread-46) [1dcdde66] Upgrading storage pool 407c0f8b-67fb-4cec-b873-1a01092630a3 to version 3
2013-03-18 09:02:29,561 INFO  [org.ovirt.engine.core.bll.LoginUserCommand] (ajp-/127.0.0.1:8702-3) Running command: LoginUserCommand internal: false.
2013-03-18 09:02:29,571 WARN  [org.ovirt.engine.core.bll.GetConfigurationValueQuery] (ajp-/127.0.0.1:8702-3) calling GetConfigurationValueQuery (ApplicationMode) with null version, using default general for version
2013-03-18 09:02:29,571 WARN  [org.ovirt.engine.core.bll.GetConfigurationValueQuery] (ajp-/127.0.0.1:8702-3) calling GetConfigurationValueQuery (VdcVersion) with null version, using default general for version
2013-03-18 09:02:47,728 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.UpgradeStoragePoolVDSCommand] (pool-3-thread-46) [1dcdde66] FINISH, UpgradeStoragePoolVDSCommand, log id: 1a68c870
2013-03-18 09:03:11,994 INFO  [org.ovirt.engine.core.bll.storage.UpdateStoragePoolCommand] (pool-3-thread-50) [1de44d] Running command: UpdateStoragePoolCommand internal: false. Entities affected :  ID: 407c0f8b-67fb-4cec-b873-1a01092630a3 Type: StoragePool
2013-03-18 09:03:12,001 INFO  [org.ovirt.engine.core.bll.storage.UpdateStoragePoolCommand] (pool-3-thread-50) [1de44d] Updating storage domain c9b67373-61df-4bd8-adc5-18763468f09f (type Master) to format V3
2013-03-18 09:03:12,004 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.UpgradeStoragePoolVDSCommand] (pool-3-thread-50) [1de44d] START, UpgradeStoragePoolVDSCommand( storagePoolId = 407c0f8b-67fb-4cec-b873-1a01092630a3, poolVersion = 3), log id: 6f35e18b
2013-03-18 09:03:12,004 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.UpgradeStoragePoolVDSCommand] (pool-3-thread-50) [1de44d] Upgrading storage pool 407c0f8b-67fb-4cec-b873-1a01092630a3 to version 3
2013-03-18 09:03:12,109 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.UpgradeStoragePoolVDSCommand] (pool-3-thread-50) [1de44d] FINISH, UpgradeStoragePoolVDSCommand, log id: 6f35e18b
2013-03-18 09:03:29,574 INFO  [org.ovirt.engine.core.bll.LoginUserCommand] (ajp-/127.0.0.1:8702-8) Running command: LoginUserCommand internal: false.
Comment 12 Itamar Heim 2013-06-11 05:19:11 EDT
3.2 has been released
Comment 13 Itamar Heim 2013-06-11 05:42:46 EDT
3.2 has been released

Note You need to log in before you can comment on or make changes to this bug.