Bug 836149

Summary: vdsm: move of 20-30 disks will cause image corruption
Product: [Retired] oVirt
Component: vdsm
Reporter: Dafna Ron <dron>
Assignee: Ayal Baron <abaron>
Status: CLOSED WONTFIX
Severity: high
Priority: unspecified
Version: unspecified
CC: abaron, acathrow, amureini, bazulay, dyasny, iheim, mgoldboi, rvaknin, ykaul
Target Milestone: ---
Target Release: 3.3.4
Hardware: x86_64
OS: Linux
Whiteboard: storage
Doc Type: Bug Fix
Last Closed: 2013-01-30 22:51:13 UTC
Type: Bug
oVirt Team: Storage
Attachments: log (flags: none)

Description Dafna Ron 2012-06-28 08:45:33 UTC
Description of problem:

Moving 20-30 disks from one storage domain to another caused image corruption.
Some of the images became illegal, and others were disconnected from the pool (we cannot even delete them, because we get an error that the image is not associated with the pool).

Version-Release number of selected component (if applicable):

vdsm-4.10.0-0.92.gitd2067b5.el6.x86_64 

How reproducible:

100%

Steps to Reproduce:
1. Create several VMs with disks in a pool that has multiple storage domains.
2. Move some of the disks from one domain to a second domain, all at once.
  
Actual results:

Images are corrupted.

Expected results:

We should be able to move that many disks at once without corrupting any image.

Additional info: full vdsm log attached; relevant excerpts:


Thread-15579::ERROR::2012-06-27 18:09:52,683::task::853::TaskManager.Task::(_setError) Task=`86d2ebe1-5ec6-4cbe-9b30-0d0278656895`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1289, in deleteImage
    self._spmSchedule(spUUID, "deleteImage", lambda : True)
  File "/usr/share/vdsm/storage/hsm.py", line 670, in _spmSchedule
    self.taskMng.scheduleJob("spm", pool.tasksDir, vars.task, name, func, *args)
  File "/usr/share/vdsm/storage/taskManager.py", line 62, in scheduleJob
    task.setPersistence(store, cleanPolicy=TaskCleanType.manual)
  File "/usr/share/vdsm/storage/task.py", line 1099, in setPersistence
    self.persist()
  File "/usr/share/vdsm/storage/task.py", line 1124, in persist
    self._save(self.store)
  File "/usr/share/vdsm/storage/task.py", line 738, in _save
    raise se.TaskDirError("_save: no such task dir '%s'" % origTaskDir)
TaskDirError: can't find/access task dir: ("_save: no such task dir '/rhev/data-center/c66c52b4-373d-493e-b644-2722d63ad6a7/mastersd/master/tasks/86d2ebe1-5ec6-4cbe-9b30-0d0278656895'",)
Thread-15579::DEBUG::2012-06-27 18:09:52,684::task::872::TaskManager.Task::(_run) Task=`86d2ebe1-5ec6-4cbe-9b30-0d0278656895`::Task._run: 86d2ebe1-5ec6-4cbe-9b30-0d0278656895 ('bb4250b5-df22-4a03-856a-0cca6ec54331', 'c66c52b4-373d-493e-b644-2722d63ad6a7', 'be55e43c-5b20-4348-8800-8149ca13ba5a', 'false', 'false') {} failed - stopping task


Thread-4656::ERROR::2012-06-27 14:26:34,321::task::853::TaskManager.Task::(_setError) Task=`40e8b344-73a0-410d-a05a-57ae6cf042ef`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 957, in attachStorageDomain
    pool.attachSD(sdUUID)
  File "/usr/share/vdsm/storage/securable.py", line 68, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 941, in attachSD
    dom.attach(self.spUUID)
  File "/usr/share/vdsm/storage/sd.py", line 485, in attach
    raise se.StorageDomainAlreadyAttached(pools[0], self.sdUUID)
StorageDomainAlreadyAttached: Storage domain already attached to pool: 'domain=1c4b81c0-bc2b-4b57-9341-5179ee848abd, pool=bedca9a2-89fb-11e1-9eb2-472ab56fc6f0'
Thread-4656::DEBUG::2012-06-27 14:26:34,351::task::872::TaskManager.Task::(_run) Task=`40e8b344-73a0-410d-a05a-57ae6cf042ef`::Task._run: 40e8b344-73a0-410d-a05a-57ae6cf042ef ('1c4b81c0-bc2b-4b57-9341-5179ee848abd', '7a383d50-4960-4a5b-ae39-0ff6160a8ef1') {} failed - stopping task
Thread-4656::DEBUG::2012-06-27 14:26:34,352::task::1199::TaskManager.Task::(stop) Task=`40e8b344-73a0-410d-a05a-57ae6cf042ef`::stopping in state preparing (force False)
Thread-4656::DEBUG::2012-06-27 14:26:34,352::task::978::TaskManager.Task::(_decref) Task=`40e8b344-73a0-410d-a05a-57ae6cf042ef`::ref 1 aborting True
Thread-4656::INFO::2012-06-27 14:26:34,352::task::1157::TaskManager.Task::(prepare) Task=`40e8b344-73a0-410d-a05a-57ae6cf042ef`::aborting: Task is aborted: 'Storage domain already attached to pool' - code 380
Thread-4656::DEBUG::2012-06-27 14:26:34,353::task::1162::TaskManager.Task::(prepare) Task=`40e8b344-73a0-410d-a05a-57ae6cf042ef`::Prepare: aborted: Storage domain already attached to pool
Thread-4656::DEBUG::2012-06-27 14:26:34,353::task::978::TaskManager.Task::(_decref) Task=`40e8b344-73a0-410d-a05a-57ae6cf042ef`::ref 0 aborting True
Thread-4656::DEBUG::2012-06-27 14:26:34,353::task::913::TaskManager.Task::(_doAbort) Task=`40e8b344-73a0-410d-a05a-57ae6cf042ef`::Task._doAbort: force False
Thread-4656::DEBUG::2012-06-27 14:26:34,353::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
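
When triaging a large vdsm log like the one excerpted above, it can help to pull out only the ERROR records and the task IDs they belong to. The following is a minimal sketch, not part of vdsm: the field layout is inferred solely from the lines quoted in this report (thread, level, timestamp, module, line number, logger, function, message), and the names `LOG_LINE`, `TASK_ID`, `parse_line`, and `error_tasks` are hypothetical helpers introduced for illustration.

```python
import re

# Layout assumed from the excerpts in this report:
# thread::LEVEL::timestamp::module::lineno::logger::(func) message
LOG_LINE = re.compile(
    r"^(?P<thread>[^:]+)::(?P<level>[A-Z]+)::"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
    r"(?P<module>[^:]+)::(?P<lineno>\d+)::(?P<logger>[^:]+)::"
    r"\((?P<func>[^)]+)\) (?P<message>.*)$"
)

# Task IDs appear in messages as Task=`<uuid>`.
TASK_ID = re.compile(r"Task=`(?P<task>[0-9a-f-]{36})`")


def parse_line(line):
    """Return a dict of fields for a log record, or None for other lines
    (for example traceback lines, which do not follow the record layout)."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    task = TASK_ID.search(record["message"])
    record["task"] = task.group("task") if task else None
    return record


def error_tasks(lines):
    """Return the task IDs of ERROR records, in order of first appearance."""
    seen = []
    for line in lines:
        record = parse_line(line)
        if record is None or record["level"] != "ERROR":
            continue
        if record["task"] and record["task"] not in seen:
            seen.append(record["task"])
    return seen
```

Passing the lines of a log file to `error_tasks` gives a short list of task IDs to search for, which makes it easier to follow each failed task (such as the `deleteImage` and `attachStorageDomain` failures above) through the surrounding DEBUG records.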

Comment 1 Dafna Ron 2012-06-28 08:50:34 UTC
Created attachment 594973: log

Comment 2 Itamar Heim 2013-01-30 22:51:13 UTC
Closing old bugs. If this issue is still relevant/important in the current version, please re-open the bug.