Bug 1390072
| Field | Value |
| --- | --- |
| Summary | Stopping a stateless VM does not erase state snapshot |
| Product | [oVirt] ovirt-engine |
| Component | BLL.Storage |
| Status | CLOSED CURRENTRELEASE |
| Severity | medium |
| Priority | low |
| Version | 4.0.4.4 |
| Target Milestone | ovirt-4.1.0-alpha |
| Target Release | 4.1.0.2 |
| Hardware | x86_64 |
| OS | Linux |
| Reporter | Barak Korren <bkorren> |
| Assignee | Allon Mureinik <amureini> |
| QA Contact | Avihai <aefrat> |
| CC | ahadas, bkorren, bugs, gklein, mgoldboi, pzhukov, ratamir, tjelinek |
| Flags | rule-engine: ovirt-4.1+, rule-engine: planning_ack+, amureini: devel_ack+, ratamir: testing_ack+ |
| oVirt Team | Storage |
| Type | Bug |
| Last Closed | 2017-02-01 14:33:41 UTC |
Description
Barak Korren
2016-10-31 07:47:07 UTC
Arik, can you please describe the current behavior? When exactly is it removed? Only on the next VM start?

(In reply to Michal Skrivanek from comment #1)
> Arik, can you please describe the current behavior? When exactly is it
> removed? Only on the next VM start?

We try to remove the snapshot when we handle a stateless VM that went down. As a fallback, we also try to remove it when we see that a VM has a stateless snapshot while it is being started; that should almost never happen. We do this because the storage pool might be down when the VM goes down, in which case we cannot remove the snapshot on the first attempt.

Barak, can you please provide the engine log?
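To make the two cleanup paths concrete, below is a minimal Java sketch of the flow described above. All class and method names are illustrative assumptions, not the actual ovirt-engine code:

```java
/**
 * Minimal sketch of the stateless-snapshot cleanup flow described above.
 * All names here are illustrative, not the actual ovirt-engine classes.
 */
public class StatelessSnapshotFlow {

    interface Vm {
        boolean isStateless();
        boolean hasStatelessSnapshot();
    }

    /** Primary path: remove the stateless snapshot when the VM goes down. */
    void onVmDown(Vm vm) {
        if (vm.isStateless() && vm.hasStatelessSnapshot()) {
            // This attempt can fail, e.g. when the storage pool is down.
            tryRemoveStatelessSnapshot(vm);
        }
    }

    /** Fallback path: catch a leftover snapshot on the next VM start. */
    void onVmStart(Vm vm) {
        if (vm.hasStatelessSnapshot()) {
            tryRemoveStatelessSnapshot(vm); // should almost never happen
        }
    }

    void tryRemoveStatelessSnapshot(Vm vm) {
        // In the engine this is performed via the RestoreAllSnapshots flow.
    }
}
```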
Created attachment 1218012 [details]
engine.log.xz

Added the engine.log from a reproducing system (running on Lago, by the way). One can see the 'fill-pool-4' VM filling up the storage until it gets paused; then, when it is shut down, the snapshot stays there and no further stateless VMs can be started because the storage is full.

What happens is this:

- The storage is filled up completely, so there is no space left on it.
- When the stateless snapshot should be deleted, a validation checks whether the domain's available space is below a critical size, which is configured when creating the storage domain as "Critical Space Action Blocker (GB)" and defaults to 5 GB.
- If the available space is smaller than this 5 GB, the storage action is aborted. This is what happens to you: Validation of action 'RestoreAllSnapshots' failed for user SYSTEM. Reasons: VAR__ACTION__REVERT_TO,VAR__TYPE__SNAPSHOT,ACTION_TYPE_FAILED_DISK_SPACE_LOW_ON_STORAGE_DOMAIN,$storageName iscsi_small
- You should be able to fix this by going to the storage tab in webadmin, clicking "Manage Domain", then, under "Advanced", setting "Critical Space Action Blocker" to 0 and hoping for the best. Keep in mind that setting it to 0 is normally not a good idea.

Could you please confirm that these steps help?
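A minimal sketch of the kind of free-space gate described in the previous comment, assuming a per-domain threshold in gigabytes; the names are hypothetical, not the actual engine validation code:

```java
/**
 * Hedged sketch of a critical-space gate like the one described above.
 * Names and structure are hypothetical, not the actual ovirt-engine code.
 */
public class CriticalSpaceGate {

    /** Per-domain "Critical Space Action Blocker (GB)"; defaults to 5. */
    private final long criticalSpaceBlockerGb;

    public CriticalSpaceGate(long criticalSpaceBlockerGb) {
        this.criticalSpaceBlockerGb = criticalSpaceBlockerGb;
    }

    /**
     * Storage actions are refused once the domain's available space has
     * dropped below the critical threshold. On a completely full domain
     * this also blocks RestoreAllSnapshots, which is why the stateless
     * snapshot cannot be removed here.
     */
    public boolean validateStorageAction(long availableGb) {
        return availableGb >= criticalSpaceBlockerGb;
    }
}
```

This is also why lowering the blocker in "Manage Domain" can unblock the action: the gate simply compares available space against the configured value.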
(In reply to Tomas Jelinek from comment #4)

I wonder if that validation of the remaining space in the storage domain (allDomainsWithinThresholds) is really needed when restoring the active snapshot of a stateless VM. At the end of the process we will surely have more free space than we had before, but I don't know whether the snapshot removal itself relies on having more free space than the defined threshold (this validation was added as part of I4cfc89, which seems to be mostly a refactoring, so I am not sure it is intentional). Therefore, moving it to storage.

Created attachment 1218887 [details]
new engine.log

(In reply to Tomas Jelinek from comment #4)
> You should be able to fix this by going to the storage tab in webadmin,
> clicking "Manage Domain", then, under "Advanced", setting "Critical Space
> Action Blocker" to 0 and hoping for the best. Keep in mind that setting
> it to 0 is normally not a good idea.

Hi, I had already reduced it to 2% because the storage was just too small for the engine to react at 5%. Reducing it further to 0% now does not seem to help any more; the storage is simply too full at this point. Nevertheless, removing a snapshot should not require any more space, so I should be able to do it. I am attaching an updated log that includes the change to 0% and what happens after it.

The fix for this issue should be included in oVirt 4.1.0 beta 1, released on December 1st. If it is not included, please move the bug back to MODIFIED.

Verified at build 4.1.0-0.2.master.20161210231201.git26a385e.el7.centos.

Scenario:
1) Small storage domain (3 GB available storage) with Critical Space Action Blocker (GB) = 0.
2) Created a stateless VM from a template (1 GB size) with a thin-provisioned disk.
3) Wrote 5 GB to the disk until the VM paused.
4) Shut down the VM.

The stateless snapshot was restored successfully. From the engine log:

2016-12-13 16:53:14,447+02 INFO [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler2) [d4bcc7b] Command 'RestoreAllSnapshots' id: 'ac83bc83-6228-4fc8-83d6-b0288c2bfd7a' child commands '[bc15788f-f237-4feb-8c4d-d6bd892df2b4]' executions were completed, status 'SUCCEEDED'
2016-12-13 16:53:15,467+02 INFO [org.ovirt.engine.core.bll.snapshots.RestoreAllSnapshotsCommand] (DefaultQuartzScheduler4) [d4bcc7b] Ending command 'org.ovirt.engine.core.bll.snapshots.RestoreAllSnapshotsCommand' successfully.
2016-12-13 16:53:15,491+02 INFO [org.ovirt.engine.core.bll.snapshots.RestoreFromSnapshotCommand] (DefaultQuartzScheduler4) [d4bcc7b] Ending command 'org.ovirt.engine.core.bll.snapshots.RestoreFromSnapshotCommand' successfully.
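For reference, the fix direction raised earlier in this bug (dropping the allDomainsWithinThresholds space check when restoring a stateless VM's active snapshot) can be pictured as exempting that flow from the low-space gate, since removing the snapshot only frees space. This is a hedged sketch with illustrative names, not the actual engine patch:

```java
/**
 * Hedged sketch of the fix direction discussed in this bug: skip the
 * low-space gate when the action restores a stateless snapshot, since
 * that operation only frees space. Illustrative, not the actual patch.
 */
public class RestoreSnapshotsValidation {

    public boolean validate(boolean restoringStatelessSnapshot,
                            long availableGb,
                            long criticalSpaceBlockerGb) {
        if (restoringStatelessSnapshot) {
            // Removing the stateless snapshot cannot consume space,
            // so the gate is unnecessary for this flow.
            return true;
        }
        return availableGb >= criticalSpaceBlockerGb;
    }
}
```

Gating only the flows that can consume space keeps the Critical Space Action Blocker meaningful without trapping cleanup operations behind it.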