Bug 878886

Summary: [BLOCKED] vdsm: [Scalability] when performing several storage related tasks, creation of live snapshots takes over 5 minutes in vdsm and dies in engine because of timeout
Product: [oVirt] vdsm Reporter: Dafna Ron <dron>
Component: General Assignee: Liron Aravot <laravot>
Status: CLOSED EOL QA Contact: Eldad Marciano <emarcian>
Severity: high Docs Contact:
Priority: high    
Version: --- CC: acanan, amureini, bazulay, bsettle, bugs, dron, fsimonce, gklein, lpeer, mgoldboi, rbalakri, sbonazzo, scohen, yeylon
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard: storage
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-10-02 11:02:13 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Docs RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1080372, 1185830    
Bug Blocks: 882647    
Attachments:
Description Flags
logs none

Description Dafna Ron 2012-11-21 13:25:32 UTC
Created attachment 649220 [details]
logs

Description of problem:

I ran 20 live snapshots plus 2 deleteImage operations with wipe after delete, and we are getting a broken pipe error.

Version-Release number of selected component (if applicable):

vdsm-4.9.6-44.0.el6_3.x86_64
libvirt-0.9.10-21.el6_3.6.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Create 20 thin-provisioned VMs and run them (make sure they are writing).
2. Remove a 20GB and a 10GB preallocated disk with wipe after delete.
3. Create a live snapshot for each of the VMs.
  
Actual results:

Some of the snapshots fail to be created because vdsm takes more than 5 minutes to create the snapshot and the engine times out the tasks.

Expected results:

The user should be able to run multiple actions without a timeout.
PLEASE NOTE: although the create snapshot operation fails, the UI shows the snapshot as created. This might cause problems for the user in the chain.

Additional info: logs

Comment 2 Allon Mureinik 2013-02-13 10:18:58 UTC
Trying to understand the threshold that kills the snapshot.
If we do not perform the deletes and wipe-after-delete, will the snapshot still fail?

Comment 3 Allon Mureinik 2013-02-13 10:19:24 UTC
Dafna, see question in comment #2.

Comment 4 Dafna Ron 2013-02-13 15:36:13 UTC
(In reply to comment #2)
> Trying to understand the threshold that kills the snapshot.
> If we do not perform the deletes and wipe-after-delete, will the snapshot
> still fail?

It might, but not with a timeout error.
The problem is that wipe after delete takes a long time.
Since we have a queue for tasks, a long-running task causes other tasks that are started after it to time out.
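
The queueing effect described in this comment can be illustrated with a small simulation. This is only a sketch under stated assumptions, not vdsm code: the function names, the single-worker FIFO model, and the task durations (a 600-second wipe, 20-second snapshots) are hypothetical. Only the 5-minute (300-second) limit comes from this report.

```python
def completion_times(tasks, workers=1):
    """Simulate FIFO dispatch of (name, duration) tasks to `workers` workers.

    All tasks are assumed to be submitted at time 0. Each task starts as soon
    as a worker is free and the returned time is when it finishes.
    """
    free_at = [0.0] * workers  # time at which each worker becomes idle
    finished = []
    for name, duration in tasks:
        # Pick the worker that becomes free first.
        i = min(range(workers), key=free_at.__getitem__)
        free_at[i] += duration
        finished.append((name, free_at[i]))
    return finished


def timed_out(tasks, timeout, workers=1):
    """Return the names of tasks that finish after the caller's timeout."""
    return [name for name, done in completion_times(tasks, workers)
            if done > timeout]


# Hypothetical durations in seconds; only the 300 s limit is from the report.
TIMEOUT = 300
with_wipe = [("wipe", 600), ("snap1", 20), ("snap2", 20)]
without_wipe = [("snap1", 20), ("snap2", 20)]

print(timed_out(with_wipe, TIMEOUT))     # snapshots queued behind the wipe
print(timed_out(without_wipe, TIMEOUT))  # same snapshots, no long task ahead
```

In this model the snapshots themselves are fast, yet they exceed the limit whenever they wait behind the long wipe. Whether the real task queue behaves this way depends on how vdsm schedules storage tasks, which this report does not specify.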

Comment 8 Allon Mureinik 2014-06-24 09:08:06 UTC
Should be re-evaluated after the flow is changed in bug 1080372.

Comment 9 Sandro Bonazzola 2015-09-04 09:00:28 UTC
This is an automated message.
This Bugzilla report has been opened on a version which is not maintained anymore.
Please check if this bug is still relevant in oVirt 3.5.4.
If it's not relevant anymore, please close it (you may use EOL or CURRENT RELEASE resolution)
If it's an RFE please update the version to 4.0 if still relevant.

Comment 10 Sandro Bonazzola 2015-10-02 11:02:13 UTC
This is an automated message.
This Bugzilla report has been opened on a version which is not maintained
anymore.
Please check if this bug is still relevant in oVirt 3.5.4 and reopen if still
an issue.