Bug 1372743
| Summary: | Database transaction cancellations result in failed moveImage operations | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Julio Entrena Perez <jentrena> | |
| Component: | ovirt-engine | Assignee: | Liron Aravot <laravot> | |
| Status: | CLOSED ERRATA | QA Contact: | Avihai <aefrat> | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 3.5.7 | CC: | amureini, gklein, jentrena, laravot, lsurette, mperina, oourfali, ratamir, rbalakri, Rhev-m-bugs, srevivo, tnisan, ykaul | |
| Target Milestone: | ovirt-4.1.0-alpha | Keywords: | Reopened | |
| Target Release: | --- | |||
| Hardware: | All | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1390923 1390934 1390936 | Environment: | |
| Last Closed: | 2017-04-20 12:42:32 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
Comment 1
Julio Entrena Perez
2016-09-02 14:29:36 UTC
Allon, could you please check whether the related storage parts are using transactions with compensation logic correctly?

Liron, please take a look. AFAIK, the relevant code no longer exists in master.

Do we have clear steps to reproduce?

There is no easy reproducer for this one. We can cause the removeImage() call to take a long time (more than 10 minutes) to execute.

Hi Liron, same request as Raz: please provide a clear scenario for reproducing this issue.

Hi Avihai,
The only relatively easy way to reproduce this is to cause the removeImage() call on vdsm to take a long time, as specified in comment #4 (by modifying the vdsm code), check that we have failures/retries on the engine side, and then remove the code modification and check that the flow completes successfully.

(In reply to Liron Aravot from comment #27)
> Hi Avihai,
> The only relative easy way to reproduce is to cause the removeImage() call
> on vdsm to take a long time as specified in comment #4 (by modifying the
> vdsm code), check that we have failures/retries on the engine side and then
> to remove the code modification and check that the flow completes
> successfully.

OK, is modifying the vdsm code something I can do? If so, how?
Searching /usr/share/vdsm/storage/*, I did not find a removeImage() method.
All I found was the removeImageLinks() method (see below). Is this what I need?
Where (at what line) should I insert the time.sleep(600)?
grep results when searching for the removeImage method:
> grep -i removeImage /usr/share/vdsm/storage/*
/usr/share/vdsm/storage/blockSD.py: self.removeImageLinks(imgUUID)
Hi Avihai,
Please move the SPM to maintenance, edit /usr/share/vdsm/API.py and replace the Image.delete() code with sleep().

Bug 1415407 blocks this bug's scenario:
Steps to Reproduce:
1. Shut down some VMs -> this step fails due to bug 1415407
2. Move the vDisks of those VMs to another storage domain.

In the current build (engine 4.1.0.3-0.1.el7) this bug rarely reproduces (2 failures out of 20).

Trying to verify.

Verified.
Engine: 4.1.0.3-0.1.el7
VDSM: 4.19.2-2
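
For anyone repeating the reproduction, the delay injection discussed in this thread (making the delete call stall for about 10 minutes) could look roughly like the sketch below. This is only an illustration and not the actual vdsm code: the SlowImage class, its constructor arguments and the delete_calls counter are hypothetical stand-ins, and the 600 second value is taken from the time.sleep(600) mentioned earlier in the thread.

```python
import time

# 600 seconds matches the time.sleep(600) mentioned in the thread
# ("more than 10 minutes" may need a slightly larger value).
DELAY_SECONDS = 600


class SlowImage(object):
    """Hypothetical stand-in for the object whose delete() is slowed down.

    This is NOT the vdsm Image class; it only illustrates replacing the
    body of a delete() method with a sleep.
    """

    def __init__(self, delay=DELAY_SECONDS, sleep=time.sleep):
        # The sleep function is injectable so the class can be exercised
        # without really waiting (assumption made for this sketch only).
        self._delay = delay
        self._sleep = sleep
        self.delete_calls = 0

    def delete(self):
        # Stall instead of removing anything, so the caller observes a
        # long-running call.
        self.delete_calls += 1
        self._sleep(self._delay)
```

With the defaults, SlowImage().delete() blocks for 600 seconds. Passing a different sleep callable (for example a list's append method) records the requested delay instead of waiting, which is handy for a quick sanity check before touching a real host.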