Bug 1913429 - Engine allows exceeding critical_space_action_blocker
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: ---
Hardware: x86_64
OS: Linux
Priority: low
Severity: high
Target Milestone: ovirt-4.5.1
Target Release: ---
Assignee: Benny Zlotnik
QA Contact: Ilia Markelov
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-01-06 18:27 UTC by Donald Berry
Modified: 2022-06-23 05:54 UTC
CC List: 12 users

Fixed In Version: ovirt-engine-4.5.1.2
Clone Of:
Environment:
Last Closed: 2022-06-23 05:54:58 UTC
oVirt Team: Storage
Embargoed:
pm-rhel: ovirt-4.5?
pm-rhel: devel_ack+


Attachments
screen sharing after steps has performed (16.98 MB, video/x-msvideo)
2021-06-10 07:56 UTC, Ritesh Chikatwar


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | oVirt ovirt-engine pull 63 | 0 | None | open | core: fix space validation for MoveOrCopyDiskCommand | 2022-02-09 12:40:26 UTC

Description Donald Berry 2021-01-06 18:27:02 UTC
Description of problem:
Users migrated VM disks to new storage domain vmstore2, causing it to become full. The storage domain was then deactivated by RHV:

[root@rhvm ovirt-engine]# zgrep -i vmstore2.*deactivated engine.log*
engine.log-20201218.gz:2020-12-17 08:47:30,387-05 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-500789) [4159363a] EVENT_ID: SYSTEM_DEACTIVATED_STORAGE_DOMAIN(970), Storage Domain vmstore2 (Data Center Default) was deactivated by system because it's not visible by any of the hosts.

To recover, we had to overwrite one of the VM disks to free up some space:

[root@dell-r640-01 ~]# echo 1 > /rhev/data-center/mnt/glusterSD/dell-r640-01.gluster.tamlab.rdu2.redhat.com:_vmstore2/04ad3ba4-4459-4f56-a73f-c07fcaa1617e/images/eedf329f-1f84-4845-928b-9284fbfb363c/862784b8-0512-4531-a3de-8562f23c8535

We also had to override the critical_space_action_blocker:

[root@rhvm ~]# /usr/share/ovirt-engine/dbscripts/engine-psql.sh -c "update storage_domain_static set critical_space_action_blocker = '1' where storage_name = 'vmstore2';"
UPDATE 1

We were then able to activate the SD (storage/domains/vmstore2/data center/activate) and migrate VM disks from it to another SD.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 3 Vojtech Juranek 2021-01-07 22:23:08 UTC
What is the actual issue you want to report? Deactivating the SD when no disk space is available is expected, correct behavior. Recovery of the SD once free disk space is available should happen automatically. I wasn't able to reproduce a situation where I had to remove anything from the DB (of course, some disk space has to be freed first). Are you able to reproduce it?

Comment 6 Eyal Shenitzky 2021-01-11 15:34:17 UTC
Donald, 

Can you please add some more information on this case?
What kind of environment are you using (hyper-converged, hosted engine, etc.)?
Do you have the steps to reproduce this case?
How is the storage domain configured?

And if you have engine and VDSM logs please add those to the bug.

Comment 7 Donald Berry 2021-01-11 15:57:43 UTC
Hi Eyal, this is RHHI-V 1.7 (RHV 4.3), HE. You can log in to this internal env using the info in comment 1.

Vojtech says he can reproduce it, and I think to reproduce it you would just make a SD full.
The logs have been overwritten (see c2, c4).

I am saying we should not deactivate the SD when it becomes full, because then we can't migrate VM disks off of it.
I think there were some warning messages in the GUI that it was getting full, but it was not clear from those that the SD would be deactivated. Are email notifications an option?

Don

Comment 8 Marina Kalinin 2021-01-20 18:18:14 UTC
Raising the insights rule flag to add an insights rule to detect storage domains that have reached the low space indicator.
Some more details about this parameter can be found in bz#1667783.

Comment 20 Ritesh Chikatwar 2021-06-10 07:56:34 UTC
Created attachment 1789766 [details]
screen sharing after steps has performed

Comment 23 Ritesh Chikatwar 2021-10-18 07:54:38 UTC
Closing this bug, as I tried to reproduce it but couldn't. Please try this once with the latest version, and if you encounter the same issue again, feel free to re-open this case.

Comment 34 Arik 2022-05-24 13:55:56 UTC
"Specifically, there was 13gb left in a block-based SD. I created a preallocated 12gb disk and it created it without a problem and without warning" - that's the flow we are going to address

Comment 35 Shir Fishbain 2022-05-30 19:58:51 UTC
QE doesn't have the capacity to verify this bug during 4.5.1.

Comment 36 Benny Zlotnik 2022-06-13 16:00:59 UTC
For QE: The space validation for copying disks was fixed to avoid exceeding available size

As for preventing new disks from pushing the SD into the critical space blocker zone, that does not seem to be the originally intended behavior: the critical space blocker is meant to block operations when the SD is already in it, not before.
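
A minimal sketch of the two checks described in this comment, not the engine's actual code. The class, function, and field names are hypothetical, the threshold is assumed to be expressed in GiB, and the strictness of the comparisons is an assumption. A copy is rejected if the domain is already inside its critical space zone or if the disk does not fit in the free space, but a copy that merely leaves the domain inside the zone afterwards is allowed:

```python
from dataclasses import dataclass
from typing import Tuple

GIB = 1024 ** 3


@dataclass
class StorageDomain:
    """Hypothetical model of a storage domain; not the engine's real class."""
    name: str
    available_bytes: int
    critical_space_blocker_gib: int  # stands in for critical_space_action_blocker


def validate_copy(domain: StorageDomain, required_bytes: int) -> Tuple[bool, str]:
    """Return (allowed, reason) for copying a disk of required_bytes to domain."""
    threshold_bytes = domain.critical_space_blocker_gib * GIB

    # Check 1: the domain is already inside its critical space zone.
    if domain.available_bytes < threshold_bytes:
        return False, "domain is below its critical space threshold"

    # Check 2: the copy itself must fit into what is currently free.
    if required_bytes > domain.available_bytes:
        return False, "not enough free space for the requested disk"

    # Ending up inside the critical zone after the copy is not blocked.
    return True, "ok"


# The flow from comment 34, with a hypothetical 5 GiB threshold:
sd = StorageDomain("vmstore2", available_bytes=13 * GIB, critical_space_blocker_gib=5)
print(validate_copy(sd, 12 * GIB))  # fits, so it is allowed
print(validate_copy(sd, 20 * GIB))  # does not fit, so it is blocked
```

Under these assumptions, the 12 GB disk on a domain with 13 GB free still passes, which matches the statement that the blocker applies only once the SD is already in the critical zone.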

Comment 38 Benny Zlotnik 2022-06-22 08:33:31 UTC
1. Try to copy a disk to a storage domain that doesn't have enough space for it - should be blocked

Comment 39 Ilia Markelov 2022-06-22 12:50:07 UTC
Verified.

Copying the disk to a storage domain that doesn't have enough space finished with an error, and the operation was cancelled. The SD works well afterwards.

Versions:
engine-4.5.1.2-0.11.el8ev
vdsm-4.50.1.3-1.el8ev.x86_64

Comment 40 Sandro Bonazzola 2022-06-23 05:54:58 UTC
This bugzilla is included in oVirt 4.5.1 release, published on June 22nd 2022.
Since the problem described in this bug report should be resolved in oVirt 4.5.1 release, it has been closed with a resolution of CURRENT RELEASE.
If the solution does not work for you, please open a new bug report.

