Bug 722699

Summary: Failed to attach Storage due to an error on the Data Center master Storage Domain.
Product: Red Hat Enterprise Linux 6
Reporter: Jaroslav Henner <jhenner>
Component: vdsm
Assignee: Federico Simoncelli <fsimonce>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: yeylon <yeylon>
Severity: high
Docs Contact:
Priority: unspecified
Version: 6.1
CC: abaron, bazulay, danken, iheim, oramraz, srevivo, ykaul
Target Milestone: rc
Keywords: Regression
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-07-22 16:54:24 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Attachments:
Description                              Flags
logs. The error happened around 19:18    none
vdsm.log.5.xz                            none
spm-lock.log.xz                          none

Description Jaroslav Henner 2011-07-16 18:19:47 UTC
Created attachment 513494 [details]
logs. The error happened around 19:18

Description of problem:
Sometimes, when the automated REST API tests try to attach the export domain, I get the error stated in the subject and the export domain fails to attach. After some time, I can attach it in the Admin Portal with no problems.

Version-Release number of selected component (if applicable):
rhevm - ic130
vdsm-4.9-81.el6.x86_64

How reproducible:
40%

Steps to Reproduce:
1. Have two hosts, one data center, and one NFS data storage domain.
2. Import the NFS export domain (hosted on a different server than the data domain).
3. Optionally wait here. I tried making the test wait for more than one minute, which made no difference.
4. Attach it to the DC.
  
Actual results:
The error message stated in the subject; the export domain is not attached.

Expected results:
Export domain attached.

Additional info:
May be related to #722649.
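
Because the attach fails intermittently and succeeds later, a test-side retry could help tell a transient failure from a persistent one. The sketch below is only an illustration: `attach` is a hypothetical callable standing in for whatever REST API request the test suite issues, and the attempt count and delay are arbitrary assumptions.

```python
import time


def attach_with_retry(attach, attempts=5, delay_s=30.0, sleep=time.sleep):
    """Call attach() until it succeeds or the attempts run out.

    attach is a hypothetical zero-argument callable supplied by the test
    suite; it is assumed to raise an exception when the attach fails.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return attach()
        except Exception as exc:  # a real test would catch a narrower type
            last_error = exc
            if attempt < attempts:
                sleep(delay_s)
    raise last_error
```

If every attempt fails, the last exception is re-raised, so a persistent failure still shows up in the test result.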

Comment 3 Dan Kenigsberg 2011-07-17 16:32:03 UTC
Please provide more complete logs (I would not object to seeing everything since the creation of the pool) and more information on your setup (how many hosts? was there SPM contention?). The error

670487f2-a9f9-4fd8-a6f4-06d2c73fb047::DEBUG::2011-07-16 19:19:22,928::safelease::61::Storage.Misc.excCmd::(acquire) FAILED: <err> = ''; <rc> = 1

reminds me too much of bug 718483, so this might be a dup.
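
For anyone filtering the attached logs, the quoted line can be split into its parts with a few lines of Python. This is a minimal sketch: the field names (`thread`, `level`, `module`, and so on) are assumptions inferred from the layout of this one line, not taken from vdsm documentation.

```python
import re
from datetime import datetime


def parse_vdsm_line(line):
    """Split one vdsm log line on '::' into named fields (names assumed)."""
    thread, level, stamp, module, lineno, logger, rest = line.split("::", 6)
    func, _, message = rest.partition(" ")
    return {
        "thread": thread,
        "level": level,
        "time": datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f"),
        "module": module,
        "lineno": int(lineno),
        "logger": logger,
        "func": func.strip("()"),
        "message": message,
    }


line = (
    "670487f2-a9f9-4fd8-a6f4-06d2c73fb047::DEBUG::2011-07-16 19:19:22,928"
    "::safelease::61::Storage.Misc.excCmd::(acquire) FAILED: <err> = ''; <rc> = 1"
)
entry = parse_vdsm_line(line)
# Pull the return code out of the "<rc> = N" part of the message.
rc = int(re.search(r"<rc> = (\d+)", entry["message"]).group(1))
```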

Comment 4 Federico Simoncelli 2011-07-18 09:11:51 UTC
Created attachment 513583 [details]
vdsm.log.5.xz

Dan, it's not related to bug 718483; the sdUUID is correct in the metadata:


670487f2-a9f9-4fd8-a6f4-06d2c73fb047::DEBUG::2011-07-16 19:16:08,927::persistentDict::204::Storage.PersistentDict::(refresh) read lines (FileMetadataRW)=['CLASS=Data', 'DESCRIPTION=Migrations_DD', 'IOOPTIMEOUTSEC=10', 'LEASERETRIES=3', 'LEASETIMESEC=60', 'LOCKPOLICY=', 'LOCKRENEWALINTERVALSEC=5', 'MASTER_VERSION=1', 'POOL_DESCRIPTION=Migrations_DC', 'POOL_DOMAINS=48dd9530-0fa4-41f0-9d93-34fa503e4554:Active', 'POOL_SPM_ID=-1', 'POOL_SPM_LVER=0', 'POOL_UUID=e8369b94-71a7-4c64-b430-4e1c8b7081a1', 'REMOTE_PATH=10.34.63.204:/mnt/export/nfs/10/nfs02', 'ROLE=Master', 'SDUUID=48dd9530-0fa4-41f0-9d93-34fa503e4554', 'TYPE=NFS', 'VERSION=0', '_SHA_CKSUM=918203fcfe3f66d4358963add586763899489d0f']
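
The check described here can be repeated on any such metadata dump. The sketch below uses a subset of the `KEY=value` entries from the line above; treating `POOL_DOMAINS` as a comma-separated list of `uuid:status` pairs is an assumption, since only a single entry appears in this log.

```python
def parse_metadata(lines):
    """Turn ['KEY=value', ...] into a dict, splitting on the first '='."""
    return dict(line.split("=", 1) for line in lines)


lines = [
    "CLASS=Data",
    "IOOPTIMEOUTSEC=10",
    "LEASERETRIES=3",
    "LEASETIMESEC=60",
    "LOCKPOLICY=",
    "LOCKRENEWALINTERVALSEC=5",
    "POOL_DOMAINS=48dd9530-0fa4-41f0-9d93-34fa503e4554:Active",
    "ROLE=Master",
    "SDUUID=48dd9530-0fa4-41f0-9d93-34fa503e4554",
    "TYPE=NFS",
]
md = parse_metadata(lines)

# Assumed format: comma-separated "uuid:status" pairs.
pool_domains = dict(item.split(":", 1) for item in md["POOL_DOMAINS"].split(","))

# The domain's own UUID should be listed among the pool's domains.
sduuid_listed = md["SDUUID"] in pool_domains
```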

670487f2-a9f9-4fd8-a6f4-06d2c73fb047::DEBUG::2011-07-16 19:16:08,929::safelease::61::Storage.Misc.excCmd::(acquire) '/usr/bin/sudo -n /usr/bin/setsid /usr/bin/ionice -c1 -n0 /bin/su vdsm -s /bin/sh -c "/usr/libexec/vdsm/spmprotect.sh start 48dd9530-0fa4-41f0-9d93-34fa503e4554 1 5 /rhev/data-center/mnt/10.34.63.204:_mnt_export_nfs_10_nfs02/48dd9530-0fa4-41f0-9d93-34fa503e4554/dom_md/leases 60000 10000 3"' (cwd /usr/libexec/vdsm/)
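
The arguments passed to `spmprotect.sh` line up with the metadata values. The sketch below rebuilds the argument list from the metadata and compares it with the logged command; the positional meaning of each argument is inferred from the matching values in this log, so treat the mapping as an assumption rather than a description of the script's interface.

```python
metadata = {
    "SDUUID": "48dd9530-0fa4-41f0-9d93-34fa503e4554",
    "LOCKRENEWALINTERVALSEC": "5",
    "LEASETIMESEC": "60",
    "IOOPTIMEOUTSEC": "10",
    "LEASERETRIES": "3",
}
lease_path = (
    "/rhev/data-center/mnt/10.34.63.204:_mnt_export_nfs_10_nfs02/"
    "48dd9530-0fa4-41f0-9d93-34fa503e4554/dom_md/leases"
)


def expected_args(md, host_id, path):
    """Build the argument list as inferred from the log (assumed ordering)."""
    return [
        "start",
        md["SDUUID"],
        str(host_id),
        md["LOCKRENEWALINTERVALSEC"],
        path,
        str(int(md["LEASETIMESEC"]) * 1000),    # seconds -> milliseconds
        str(int(md["IOOPTIMEOUTSEC"]) * 1000),  # seconds -> milliseconds
        md["LEASERETRIES"],
    ]


# Arguments copied from the logged command line.
logged = (
    "start 48dd9530-0fa4-41f0-9d93-34fa503e4554 1 5 " + lease_path + " 60000 10000 3"
).split()

args_match = expected_args(metadata, 1, lease_path) == logged
```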

Comment 5 Federico Simoncelli 2011-07-18 09:15:03 UTC
Created attachment 513584 [details]
spm-lock.log.xz

The SPM lock log might contain useful information:

[2011-07-16 19:14:14] Trying to acquire lease - spUUID=48dd9530-0fa4-41f0-9d93-34fa503e4554 lease_file=/rhev/data-center/mnt/10.34.63.204:_mnt_export_nfs_10_nfs02/48dd9530-0fa4-41f0-9d93-34fa503e4554/dom_md/leases id=1000 lease_time_ms=60000 io_op_to_ms=10000
[...]
[2011-07-16 19:14:34] Stopping lease for pool: 48dd9530-0fa4-41f0-9d93-34fa503e4554 pgrps: -2981
[...]
[2011-07-16 19:19:22] Acquire failed for spUUID=48dd9530-0fa4-41f0-9d93-34fa503e4554 id=1 lease_path=/rhev/data-center/mnt/10.34.63.204:_mnt_export_nfs_10_nfs02/48dd9530-0fa4-41f0-9d93-34fa503e4554/dom_md/leases
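
To line these entries up with the vdsm log, the timestamps and `key=value` fields can be extracted as sketched below. The line layout is assumed from the three excerpts above only. Note that the first and last lines carry different `id` values (1000 and 1), so they may belong to different acquire attempts.

```python
import re
from datetime import datetime

LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] (?P<msg>.*)$")
FIELD_RE = re.compile(r"(\w+)=(\S+)")


def parse_spm_line(line):
    """Return (timestamp, fields) for one spm-lock log line."""
    match = LINE_RE.match(line)
    stamp = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
    return stamp, dict(FIELD_RE.findall(match.group("msg")))


base = (
    "/rhev/data-center/mnt/10.34.63.204:_mnt_export_nfs_10_nfs02/"
    "48dd9530-0fa4-41f0-9d93-34fa503e4554/dom_md/leases"
)
trying = (
    "[2011-07-16 19:14:14] Trying to acquire lease - "
    "spUUID=48dd9530-0fa4-41f0-9d93-34fa503e4554 lease_file=" + base +
    " id=1000 lease_time_ms=60000 io_op_to_ms=10000"
)
failed = (
    "[2011-07-16 19:19:22] Acquire failed for "
    "spUUID=48dd9530-0fa4-41f0-9d93-34fa503e4554 id=1 lease_path=" + base
)

t_try, f_try = parse_spm_line(trying)
t_fail, f_fail = parse_spm_line(failed)
gap_seconds = (t_fail - t_try).total_seconds()
```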

Comment 7 Jaroslav Henner 2011-07-18 17:34:57 UTC
Added #722417 to See Also because it may be related.

Comment 8 Federico Simoncelli 2011-07-22 16:54:24 UTC
The bug did not reproduce again, and according to what I found in the logs the issue might have been temporary, e.g. a storage issue.
Please re-open when you have more information.