Bug 843432

Summary: 3.1 - beta1 - Create export storage domain with V1 options fails: KeyError: 'SDUUID
Product: Red Hat Enterprise Linux 6 Reporter: Haim <hateya>
Component: vdsmAssignee: Federico Simoncelli <fsimonce>
Status: CLOSED WORKSFORME QA Contact: Haim <hateya>
Severity: high Docs Contact:
Priority: urgent    
Version: 6.3CC: abaron, bazulay, cpelland, iheim, lpeer, yeylon, ykaul
Target Milestone: betaKeywords: Regression
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard: storage
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-08-02 08:49:58 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
vdsm log none

Description Haim 2012-07-26 10:11:13 UTC
Description of problem:

try to create a V1 export NFS storage domain, command fails on the following:

Thread-5010::DEBUG::2012-07-26 16:06:55,612::fileSD::105::Storage.StorageDomain::(__init__) Reading domain in path /rhev/data-center/mnt/orion.qa.lab.tlv.redhat.com:_export_hateya_hateya3/35931093-c0fc-4150-abc4-19e7dd8ea059
Thread-5010::DEBUG::2012-07-26 16:06:55,612::persistentDict::185::Storage.PersistentDict::(__init__) Created a persistant dict with FileMetadataRW backend
Thread-5010::DEBUG::2012-07-26 16:06:55,613::persistentDict::226::Storage.PersistentDict::(refresh) read lines (FileMetadataRW)=[]
Thread-5010::WARNING::2012-07-26 16:06:55,613::persistentDict::248::Storage.PersistentDict::(refresh) data has no embedded checksum - trust it as it is
Thread-5010::ERROR::2012-07-26 16:06:55,614::task::853::TaskManager.Task::(_setError) Task=`0e958538-bfa9-4a76-84c8-facdc3d1883e`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2143, in createStorageDomain
    typeSpecificArg, storageType, domVersion)
  File "/usr/share/vdsm/storage/nfsSD.py", line 90, in create
    fsd = cls(os.path.join(mntPoint, sdUUID))
  File "/usr/share/vdsm/storage/fileSD.py", line 111, in __init__
    sdUUID = metadata[sd.DMDK_SDUUID]
  File "/usr/share/vdsm/storage/persistentDict.py", line 85, in __getitem__
    return dec(self._dict[key])
  File "/usr/share/vdsm/storage/persistentDict.py", line 195, in __getitem__
    raise KeyError(key)
KeyError: 'SDUUID'
Thread-5010::DEBUG::2012-07-26 16:06:55,615::task::872::TaskManager.Task::(_run) Task=`0e958538-bfa9-4a76-84c8-facdc3d1883e`::Task._run: 0e958538-bfa9-4a76-84c8-facdc3d1883e (1, '35931093-c0fc-4150-abc4-19e7dd8ea059', 'kaka', 'orion.qa.lab.tlv.redhat.com:/export/hateya/hateya3', 1, '0') {} failed - stopping task
Thread-5010::DEBUG::2012-07-26 16:06:55,615::task::1199::TaskManager.Task::(stop) Task=`0e958538-bfa9-4a76-84c8-facdc3d1883e`::stopping in state preparing (force False)
Thread-5010::DEBUG::2012-07-26 16:06:55,616::task::978::TaskManager.Task::(_decref) Task=`0e958538-bfa9-4a76-84c8-facdc3d1883e`::ref 1 aborting True
Thread-5010::INFO::2012-07-26 16:06:55,616::task::1157::TaskManager.Task::(prepare) Task=`0e958538-bfa9-4a76-84c8-facdc3d1883e`::aborting: Task is aborted: u"'SDUUID'" - code 100
Thread-5010::DEBUG::2012-07-26 16:06:55,617::task::1162::TaskManager.Task::(prepare) Task=`0e958538-bfa9-4a76-84c8-facdc3d1883e`::Prepare: aborted: 'SDUUID'
Thread-5010::DEBUG::2012-07-26 16:06:55,617::task::978::TaskManager.Task::(_decref) Task=`0e958538-bfa9-4a76-84c8-facdc3d1883e`::ref 0 aborting True
Thread-5010::DEBUG::2012-07-26 16:06:55,618::task::913::TaskManager.Task::(_doAbort) Task=`0e958538-bfa9-4a76-84c8-facdc3d1883e`::Task._doAbort: force False
Thread-5010::DEBUG::2012-07-26 16:06:55,618::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-5010::DEBUG::2012-07-26 16:06:55,619::task::588::TaskManager.Task::(_updateState) Task=`0e958538-bfa9-4a76-84c8-facdc3d1883e`::moving from state preparing -> state aborting
Thread-5010::DEBUG::2012-07-26 16:06:55,619::task::537::TaskManager.Task::(__state_aborting) Task=`0e958538-bfa9-4a76-84c8-facdc3d1883e`::_aborting: recover policy none
Thread-5010::DEBUG::2012-07-26 16:06:55,619::task::588::TaskManager.Task::(_updateState) Task=`0e958538-bfa9-4a76-84c8-facdc3d1883e`::moving from state aborting -> state failed
Thread-5010::DEBUG::2012-07-26 16:06:55,620::resourceManager::809::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-5010::DEBUG::2012-07-26 16:06:55,620::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-5010::ERROR::2012-07-26 16:06:55,621::dispatcher::69::Storage.Dispatcher.Protect::(run) 'SDUUID'
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/dispatcher.py", line 61, in run
    result = ctask.prepare(self.func, *args, **kwargs)
  File "/usr/share/vdsm/storage/task.py", line 1164, in prepare
    raise self.error
KeyError: 'SDUUID'


domain structure was created, and looks as follows (snip from metadata):

CLASS=Data
DESCRIPTION=kaka
IOOPTIMEOUTSEC=1
LEASERETRIES=3
LEASETIMESEC=5
LOCKPOLICY=
LOCKRENEWALINTERVALSEC=5
POOL_UUID=
REMOTE_PATH=orion.qa.lab.tlv.redhat.com:/export/hateya/hateya3
ROLE=Regular
SDUUID=35931093-c0fc-4150-abc4-19e7dd8ea059
TYPE=NFS
VERSION=0
_SHA_CKSUM=2abf147566178bd7cfe0ffffe20e4f1676112d20

Thread logs:

Comment 1 Haim 2012-07-26 10:13:41 UTC
Created attachment 600474 [details]
vdsm log

Comment 5 Ayal Baron 2012-08-01 21:26:01 UTC
Haim, looks to me like your NFS is messed up.
Seems like we wrote to disk but when reading the data is not there (see below).
It also seems to me like the same problems you're having with BZ#843511

Thread-5010::DEBUG::2012-07-26 16:06:55,571::persistentDict::168::Storage.PersistentDict::(transaction) Flushing changes
Thread-5010::DEBUG::2012-07-26 16:06:55,571::persistentDict::287::Storage.PersistentDict::(flush) about to write lines (FileMetadataRW)=['CLASS=Data', 'DESCR
IPTION=kaka', 'IOOPTIMEOUTSEC=1', 'LEASERETRIES=3', 'LEASETIMESEC=5', 'LOCKPOLICY=', 'LOCKRENEWALINTERVALSEC=5', 'POOL_UUID=', 'REMOTE_PATH=orion.qa.lab.tlv.
redhat.com:/export/hateya/hateya3', 'ROLE=Regular', 'SDUUID=35931093-c0fc-4150-abc4-19e7dd8ea059', 'TYPE=NFS', 'VERSION=0', '_SHA_CKSUM=2abf147566178bd7cfe0f
fffe20e4f1676112d20']
Thread-5010::DEBUG::2012-07-26 16:06:55,610::persistentDict::170::Storage.PersistentDict::(transaction) Finished transaction
Thread-5010::DEBUG::2012-07-26 16:06:55,612::fileSD::105::Storage.StorageDomain::(__init__) Reading domain in path /rhev/data-center/mnt/orion.qa.lab.tlv.red
hat.com:_export_hateya_hateya3/35931093-c0fc-4150-abc4-19e7dd8ea059
Thread-5010::DEBUG::2012-07-26 16:06:55,612::persistentDict::185::Storage.PersistentDict::(__init__) Created a persistant dict with FileMetadataRW backend
Thread-5010::DEBUG::2012-07-26 16:06:55,613::persistentDict::226::Storage.PersistentDict::(refresh) read lines (FileMetadataRW)=[]

Comment 6 Haim 2012-08-02 08:49:58 UTC
(In reply to comment #5)
> Haim, looks to me like your NFS is messed up.
> Seems like we wrote to disk but when reading the data is not there (see
> below).
> It also seems to me like the same problems you're having with BZ#843511
> 
> Thread-5010::DEBUG::2012-07-26
> 16:06:55,571::persistentDict::168::Storage.PersistentDict::(transaction)
> Flushing changes
> Thread-5010::DEBUG::2012-07-26
> 16:06:55,571::persistentDict::287::Storage.PersistentDict::(flush) about to
> write lines (FileMetadataRW)=['CLASS=Data', 'DESCR
> IPTION=kaka', 'IOOPTIMEOUTSEC=1', 'LEASERETRIES=3', 'LEASETIMESEC=5',
> 'LOCKPOLICY=', 'LOCKRENEWALINTERVALSEC=5', 'POOL_UUID=',
> 'REMOTE_PATH=orion.qa.lab.tlv.
> redhat.com:/export/hateya/hateya3', 'ROLE=Regular',
> 'SDUUID=35931093-c0fc-4150-abc4-19e7dd8ea059', 'TYPE=NFS', 'VERSION=0',
> '_SHA_CKSUM=2abf147566178bd7cfe0f
> fffe20e4f1676112d20']
> Thread-5010::DEBUG::2012-07-26
> 16:06:55,610::persistentDict::170::Storage.PersistentDict::(transaction)
> Finished transaction
> Thread-5010::DEBUG::2012-07-26
> 16:06:55,612::fileSD::105::Storage.StorageDomain::(__init__) Reading domain
> in path /rhev/data-center/mnt/orion.qa.lab.tlv.red
> hat.com:_export_hateya_hateya3/35931093-c0fc-4150-abc4-19e7dd8ea059
> Thread-5010::DEBUG::2012-07-26
> 16:06:55,612::persistentDict::185::Storage.PersistentDict::(__init__)
> Created a persistant dict with FileMetadataRW backend
> Thread-5010::DEBUG::2012-07-26
> 16:06:55,613::persistentDict::226::Storage.PersistentDict::(refresh) read
> lines (FileMetadataRW)=[]

looks to me like sanlock issue which is no more reproducible on SI12 with vdsm-4.9.6-24.0.el6_3.x86_64 and sanlock-2.3-3.el6_3.x86_64