Bug 2110186
| Summary: | Restart of ovirt-engine while LSM is running causes LSM to get stuck | | |
|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | Evelina Shames <eshames> |
| Component: | BLL.Storage | Assignee: | Mark Kemel <mkemel> |
| Status: | CLOSED NEXTRELEASE | QA Contact: | Ilia Markelov <imarkelo> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.5.2 | CC: | ahadas, bugs, bzlotnik, dfodor, pbar, sfishbai |
| Target Milestone: | ovirt-4.5.3 | Flags: | pm-rhel: ovirt-4.5? |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | ovirt-engine-4.5.3.1 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-10-03 19:00:53 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

Description
Evelina Shames
2022-07-24 10:10:04 UTC
It looks like the root cause is that the lock is reacquired after restart by LiveMigrateDiskCommand. This can probably be resolved either by overriding reacquireLocks and not locking again if snapshot creation has already started, or by removing the command locks entirely, since their acquisition is handled in MoveDiskCommand.

We suspect this is not a new issue, but the timing of restarting the engine during create-snapshot, which is a fairly quick operation compared to copying the disk, made us miss this before.

I also suggest updating the log message. The current "Failed to acquire VM lock, will retry on the next polling cycle" is a little confusing. For example, in this case the actual failure is a disk lock acquisition (exclusive lock), not a VM one (shared lock). I suggest logging the exact failure, which can be retrieved from the "LockingResult" returned by the "acquireLock()" call in the "LiveDiskMigrateStage.LIVE_MIGRATE_DISK_EXEC_COMPLETED" phase. Just a thought :)

This bug has low overall severity and is not going to be further verified by QE. If you believe special care is required, feel free to properly align relevant severity, flags and keywords to raise PM_Score or use one of the Bumps ('PrioBumpField', 'PrioBumpGSS', 'PrioBumpPM', 'PrioBumpQA') in Keywords to raise its PM_Score above the verification threshold (1000).

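To make the log-message suggestion concrete, here is a minimal sketch of a message built from the lock that actually failed rather than a fixed "VM lock" text. Everything in it is hypothetical: `LockLogSketch`, `LockAttempt`, and `LockKind` are illustrative stand-ins, not the real ovirt-engine `LockingResult` or its fields, and the wording of the message is only one possibility.

```java
import java.util.Locale;

// Hypothetical stand-ins for illustration; these are not the real ovirt-engine types.
class LockLogSketch {
    enum LockKind { SHARED, EXCLUSIVE }

    /** Simplified stand-in for the result of one lock attempt. */
    static final class LockAttempt {
        final boolean acquired;
        final String entityType; // e.g. "VM" or "DISK"
        final String entityId;
        final LockKind kind;

        LockAttempt(boolean acquired, String entityType, String entityId, LockKind kind) {
            this.acquired = acquired;
            this.entityType = entityType;
            this.entityId = entityId;
            this.kind = kind;
        }
    }

    /** Builds a log line that names the lock that actually failed. */
    static String describe(LockAttempt attempt) {
        if (attempt.acquired) {
            return "Lock acquired";
        }
        return String.format(
                "Failed to acquire %s lock on %s '%s', will retry on the next polling cycle",
                attempt.kind.name().toLowerCase(Locale.ROOT),
                attempt.entityType,
                attempt.entityId);
    }

    public static void main(String[] args) {
        LockAttempt failed = new LockAttempt(false, "DISK", "disk-1", LockKind.EXCLUSIVE);
        // prints: Failed to acquire exclusive lock on DISK 'disk-1', will retry on the next polling cycle
        System.out.println(describe(failed));
    }
}
```

With a message like this, a failed exclusive disk lock would no longer be reported as a VM lock failure.
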
(In reply to Benny Zlotnik from comment #1)
> It looks like the root cause is the lock is reacquired after restart by
> LiveMigrateDiskCommand, this can probably be resolved by either overriding
> reacquireLocks and not locking again if snapshot create has already started.
> Or by removing the command locks entirely since their acquisition is handled
> in MoveDiskCommand

Right, we chose to go with the latter.

This bug has low overall severity and passed an automated regression suite, and is not going to be further verified by QE. If you believe special care is required, feel free to re-open to ON_QA status.
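
The chosen fix (dropping the child command's own locks) can be illustrated with a toy model. This is a sketch under stated assumptions, not the engine's code: it assumes locks live only in memory and are reacquired per in-flight command after a restart, and that the conflict is between two commands declaring the same exclusive disk lock. `ReacquireSketch`, `LockTable`, `Command`, and `reacquireAfterRestart` are hypothetical names. The command names are reused only as labels.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only; none of these classes exist in ovirt-engine.
class ReacquireSketch {
    /** In-memory table of exclusive locks, keyed by entity id. */
    static final class LockTable {
        private final Set<String> exclusive = new HashSet<>();

        /** Returns false if the key is already locked. */
        boolean tryAcquire(String key) {
            return exclusive.add(key);
        }
    }

    /** A command label plus the exclusive lock keys it declares. */
    static final class Command {
        final String name;
        final List<String> lockKeys;

        Command(String name, List<String> lockKeys) {
            this.name = name;
            this.lockKeys = lockKeys;
        }
    }

    /** Simulates the post-restart pass and returns the commands that failed to reacquire. */
    static List<String> reacquireAfterRestart(List<Command> inFlight) {
        LockTable table = new LockTable(); // assumption: in-memory locks are lost on restart
        List<String> stuck = new ArrayList<>();
        for (Command command : inFlight) {
            for (String key : command.lockKeys) {
                if (!table.tryAcquire(key)) {
                    stuck.add(command.name);
                    break;
                }
            }
        }
        return stuck;
    }

    public static void main(String[] args) {
        Command move = new Command("MoveDisk", Arrays.asList("disk-1"));
        Command lsmWithLock = new Command("LiveMigrateDisk", Arrays.asList("disk-1"));
        Command lsmWithoutLock = new Command("LiveMigrateDisk", Collections.<String>emptyList());

        // Child declares the same exclusive lock as the parent: it cannot reacquire.
        System.out.println(reacquireAfterRestart(Arrays.asList(move, lsmWithLock)));    // [LiveMigrateDisk]
        // Child declares no locks and relies on the parent: nothing gets stuck.
        System.out.println(reacquireAfterRestart(Arrays.asList(move, lsmWithoutLock))); // []
    }
}
```

In this model the child command with no locks of its own never enters the retry path, which is the effect the second option in comment #1 aims for.
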