Bug 924834 - Export domain attached to the wrong pool.
Summary: Export domain attached to the wrong pool.
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.2.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 3.2.0
Assignee: Maor
QA Contact: Haim
URL:
Whiteboard: storage
Duplicates: 924835 (view as bug list)
Depends On:
Blocks:
 
Reported: 2013-03-22 15:24 UTC by Jiri Belka
Modified: 2016-02-10 20:23 UTC
CC List: 11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-03-29 20:59:15 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments
vdsm.log (1.21 MB, application/x-tar)
2013-03-22 15:24 UTC, Jiri Belka
no flags Details
engine.log (343.10 KB, application/x-xz)
2013-03-25 08:13 UTC, Jiri Belka
no flags Details
All engine logs (3.30 MB, application/x-gzip)
2013-03-25 12:13 UTC, Eduardo Warszawski
no flags Details

Description Jiri Belka 2013-03-22 15:24:17 UTC
Created attachment 714607 [details]
vdsm.log

Description of problem:

Cannot export a VM with SF11. In vdsm.log there is a reference to a path which does not exist:

5ef86a24-283b-4436-9396-7c511e20a2b2::ERROR::2013-03-22 16:07:01,178::image::560::Storage.Image::(_createTargetImage) Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/image.py", line 542, in _createTargetImage
    srcVolUUID=volParams['parent'])
  File "/usr/share/vdsm/storage/fileSD.py", line 285, in createVolume
    volUUID, desc, srcImgUUID, srcVolUUID)
  File "/usr/share/vdsm/storage/volume.py", line 415, in create
    imgPath = image.Image(repoPath).create(sdUUID, imgUUID)
  File "/usr/share/vdsm/storage/image.py", line 124, in create
    os.mkdir(imageDir)
OSError: [Errno 2] No such file or directory: '/rhev/data-center/ffec9aa4-692c-11e2-9e91-001a4a013f3a/131d564c-52d1-4bba-8d60-39e889a8bc08/images/6aff0cf7-c11e-4ce6-863b-dcf2fa5fb387'

See the path '/rhev/data-center/ffec9aa4-692c-11e2-9e91-001a4a013f3a/131d564c-52d1-4bba-8d60-39e889a8bc08/images/6aff0cf7-c11e-4ce6-863b-dcf2fa5fb387':

The UUID 'ffec9aa4-692c-11e2-9e91-001a4a013f3a' is not the UUID of my DC at all; that position in the path is the pool UUID.

# zcat /tmp/vdsm.log.gz | grep  =ffec9aa4-692c-11e2-9e91-001a4a013f3a
Thread-7570::DEBUG::2013-03-22 16:07:00,929::persistentDict::234::Storage.PersistentDict::(refresh) read lines (FileMetadataRW)=['CLASS=Backup', 'DESCRIPTION=str02-nfs-export', 'IOOPTIMEOUTSEC=1', 'LEASERETRIES=3', 'LEASETIMESEC=5', 'LOCKPOLICY=', 'LOCKRENEWALINTERVALSEC=5', 'MASTER_VERSION=0', 'POOL_UUID=ffec9aa4-692c-11e2-9e91-001a4a013f3a', 'REMOTE_PATH=10.34.63.204:/mnt/export/nfs/export', 'ROLE=Regular', 'SDUUID=131d564c-52d1-4bba-8d60-39e889a8bc08', 'TYPE=NFS', 'VERSION=0', '_SHA_CKSUM=30fbc3acc41aa41401319d58fd79115755a81d95']
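
The same POOL_UUID can also be read straight from the export domain metadata on the NFS mount (assuming the standard file storage domain layout, where the metadata lives under <domain UUID>/dom_md/metadata; the mount point is the one listed in 'Additional info' below):

# cat /rhev/data-center/mnt/10.34.63.204:_mnt_export_nfs_export/131d564c-52d1-4bba-8d60-39e889a8bc08/dom_md/metadata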

Version-Release number of selected component (if applicable):
sf11

vdsm-xmlrpc-4.10.2-12.0.el6ev.noarch
vdsm-4.10.2-12.0.el6ev.x86_64
vdsm-cli-4.10.2-12.0.el6ev.noarch
vdsm-python-4.10.2-12.0.el6ev.x86_64

Red Hat Enterprise Linux Server release 6.4 (Santiago)


How reproducible:
100%

Steps to Reproduce:
1. have a vm
2. try to export
3.
  
Actual results:
export fails

Expected results:
should work

Additional info:
export is nfs

10.34.63.199:/jb01 on /rhev/data-center/mnt/10.34.63.199:_jb01 type nfs (rw,soft,nosharecache,timeo=600,retrans=6,nfsvers=3,addr=10.34.63.199)
10.34.63.204:/mnt/export/nfs/export on /rhev/data-center/mnt/10.34.63.204:_mnt_export_nfs_export type nfs (rw,soft,nosharecache,timeo=600,retrans=6,nfsvers=3,addr=10.34.63.204)
10.34.63.204:/home/iso/shared on /rhev/data-center/mnt/10.34.63.204:_home_iso_shared type nfs (rw,soft,nosharecache,timeo=600,retrans=6,nfsvers=3,addr=10.34.63.204)

Comment 1 Haim 2013-03-22 16:32:39 UTC
Please attach the engine log as well.

Comment 2 Haim 2013-03-22 16:40:27 UTC
Jiri, 

I suspect something is wrong with the export path. Can you please execute the following commands on the hypervisor:

tree /rhev/data-center/ffec9aa4-692c-11e2-9e91-001a4a013f3a

ls -l /rhev/data-center/ffec9aa4-692c-11e2-9e91-001a4a013f3a/131d564c-52d1-4bba-8d60-39e889a8bc08/images/62787b06-d836-4e14-afc7-023e08aeee96

Comment 4 Jiri Belka 2013-03-25 08:13:05 UTC
Created attachment 715897 [details]
engine.log

Comment 5 Jiri Belka 2013-03-25 08:16:51 UTC
[root@dell-r210ii-03 ~]# tree /rhev/data-center/ffec9aa4-692c-11e2-9e91-001a4a013f3a
/rhev/data-center/ffec9aa4-692c-11e2-9e91-001a4a013f3a [error opening dir]

0 directories, 0 files
[root@dell-r210ii-03 ~]# ls -l /rhev/data-center/ffec9aa4-692c-11e2-9e91-001a4a013f3a/131d564c-52d1-4bba-8d60-39e889a8bc08/images/62787b06-d836-4e14-afc7-023e08aeee96
ls: cannot access /rhev/data-center/ffec9aa4-692c-11e2-9e91-001a4a013f3a/131d564c-52d1-4bba-8d60-39e889a8bc08/images/62787b06-d836-4e14-afc7-023e08aeee96: No such file or directory

Comment 6 Eduardo Warszawski 2013-03-25 09:38:20 UTC
*** Bug 924835 has been marked as a duplicate of this bug. ***

Comment 7 Eduardo Warszawski 2013-03-25 10:02:32 UTC
According to its own metadata, the export domain 131d564c-52d1-4bba-8d60-39e889a8bc08 belongs to pool ffec9aa4-692c-11e2-9e91-001a4a013f3a.

Despite this, the metadata of pool a05c6f22-2a40-4f39-a2a8-aa91b539b217 marks this export domain as part of that pool.

After an extensive search of _all_ the vdsm logs on the host, it cannot be determined which failed operation left the export domain in such a state.

########################################################################
Thread-155232::INFO::2013-03-25 08:47:49,369::fileSD::302::Storage.StorageDomain::(validate) sdUUID=131d564c-52d1-4bba-8d60-39e889a8bc08
Thread-155232::DEBUG::2013-03-25 08:47:49,371::persistentDict::234::Storage.PersistentDict::(refresh) read lines (FileMetadataRW)=['CLASS=Backup', 'DESCRIPTION=str02-nfs-export', 'IOOPTIMEOUTSEC=1', 'LEASERETRIES=3', 'LEASETIMESEC=5', 'LOCKPOLICY=', 'LOCKRENEWALINTERVALSEC=5', 'MASTER_VERSION=0', 'POOL_UUID=ffec9aa4-692c-11e2-9e91-001a4a013f3a', 'REMOTE_PATH=10.34.63.204:/mnt/export/nfs/export', 'ROLE=Regular', 'SDUUID=131d564c-52d1-4bba-8d60-39e889a8bc08', 'TYPE=NFS', 'VERSION=0', '_SHA_CKSUM=30fbc3acc41aa41401319d58fd79115755a81d95']

Pool a05c6f22-2a40-4f39-a2a8-aa91b539b217 metadata:
CLASS=Data
DESCRIPTION=str03-jb01-data
IOOPTIMEOUTSEC=10
LEASERETRIES=3
LEASETIMESEC=60
LOCKPOLICY=
LOCKRENEWALINTERVALSEC=5
MASTER_VERSION=1
POOL_DESCRIPTION=Default
POOL_DOMAINS=131d564c-52d1-4bba-8d60-39e889a8bc08:Active,a7e5f59c-2877-475b-8afc-f760ba63defb:Active,cc4d884d-15d9-4e35-b869-4330245c1b94:Active
POOL_SPM_ID=2
POOL_SPM_LVER=37
POOL_UUID=a05c6f22-2a40-4f39-a2a8-aa91b539b217
REMOTE_PATH=10.34.63.199:/jb01
ROLE=Master
SDUUID=cc4d884d-15d9-4e35-b869-4330245c1b94
TYPE=NFS
VERSION=3
_SHA_CKSUM=2d71c0a6c4450b8476bbbd557a7f90df2bba21a3
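
Both sides of the inconsistency can be checked directly on the host (a sketch, assuming the standard dom_md/metadata layout and the mounts listed in comment 0):

# grep ^POOL_UUID= /rhev/data-center/mnt/10.34.63.204:_mnt_export_nfs_export/131d564c-52d1-4bba-8d60-39e889a8bc08/dom_md/metadata
(the export domain says it belongs to pool ffec9aa4-692c-11e2-9e91-001a4a013f3a)

# grep ^POOL_DOMAINS= /rhev/data-center/mnt/10.34.63.199:_jb01/cc4d884d-15d9-4e35-b869-4330245c1b94/dom_md/metadata
(the master domain of pool a05c6f22-2a40-4f39-a2a8-aa91b539b217 lists 131d564c-52d1-4bba-8d60-39e889a8bc08 as Active)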

Comment 8 Eduardo Warszawski 2013-03-25 12:13:26 UTC
Created attachment 715965 [details]
All engine logs

Comment 9 Eduardo Warszawski 2013-03-25 12:17:47 UTC
In the engine logs, failed attempts to attach the export domain were found.
In addition, recurrent attempts to activate this export domain were found; they failed as well.
The fact that the sdUUID of the export domain is part of the pool metadata, despite the failed attaches, is probably a consequence of a wrong reconstruct master.

Comment 10 Eduardo Warszawski 2013-03-25 12:23:22 UTC
Pool and domain metadata should be manually corrected in order to use this export.

Comment 11 Ayal Baron 2013-03-27 10:12:07 UTC
(In reply to comment #10)
> Pool and domain metadata should be manually corrected in order to use this
> export.

Domain was probably force detached from the pool.
This is a known design issue and will be taken care of once we get rid of the export domain entirely.

Comment 12 Haim 2013-03-27 10:39:59 UTC
(In reply to comment #11)
> (In reply to comment #10)
> > Pool and domain metadata should be manually corrected in order to use this
> > export.
> 
> Domain was probably force detached from the pool.
> This is a known design issue and will be taken care of once we get rid of
> the export domain entirely.

I don't think so. We need to understand how the system got into a state where the export domain metadata contains the wrong pool while the export domain status is up and active, so the export process starts and then fails.
Moving back to ASSIGNED; scrubbing is needed on the engine side.

Comment 13 Ayal Baron 2013-03-27 10:43:25 UTC
(In reply to comment #12)
> (In reply to comment #11)
> > (In reply to comment #10)
> > > Pool and domain metadata should be manually corrected in order to use this
> > > export.
> > 
> > Domain was probably force detached from the pool.
> > This is a known design issue and will be taken care of once we get rid of
> > the export domain entirely.
> 
> I don't think so. We need to understand how the system got into a state
> where the export domain metadata contains the wrong pool while the export
> domain status is up and active, so the export process starts and then fails.
> Moving back to ASSIGNED; scrubbing is needed on the engine side.

Ack, I missed the part where it became up on the engine.

Comment 14 Jiri Belka 2013-03-28 10:49:13 UTC
Comment #10 is correct: the POOL_UUID in the export domain metadata was not my storage pool id. I don't know exactly how it happened (comment #11 + comment #12), but the truth is that this export domain is "shared" by our team. I have now force-detached the export domain (as I could not remove it), manually removed the id from POOL_UUID and _SHA_CKSUM, and reimported the export domain. I can import from the domain now.
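
For the record, the manual edit amounted to something along these lines (a sketch of what I did on the export mount, not an official procedure; back up the file first):

# cd /rhev/data-center/mnt/10.34.63.204:_mnt_export_nfs_export/131d564c-52d1-4bba-8d60-39e889a8bc08/dom_md
# cp metadata metadata.bak
# sed -i -e 's/^POOL_UUID=.*/POOL_UUID=/' -e '/^_SHA_CKSUM=/d' metadata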

If you think this was caused by a PEBKAC issue, feel free to close the BZ.

Comment 15 Haim 2013-03-29 20:59:15 UTC
Closing for now (smells like manual intervention and modification of the MD took place).
Feel free to re-open if the issue reproduces.

