Bug 1913429

Summary: Engine allows exceeding critical_space_action_blocker
Product: [oVirt] ovirt-engine Reporter: Donald Berry <dberry>
Component: BLL.Storage Assignee: Benny Zlotnik <bzlotnik>
Status: CLOSED CURRENTRELEASE QA Contact: Ilia Markelov <imarkelo>
Severity: high Docs Contact:
Priority: low    
Version: --- CC: aefrat, ahadas, bugs, bzlotnik, dfodor, eshames, godas, mkalinin, rchikatw, sfishbai, tnisan, vjuranek
Target Milestone: ovirt-4.5.1 Keywords: Reopened
Target Release: --- Flags: pm-rhel: ovirt-4.5?
pm-rhel: devel_ack+
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: ovirt-engine-4.5.1.2 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-06-23 05:54:58 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description: screen sharing after steps has performed; Flags: none

Description Donald Berry 2021-01-06 18:27:02 UTC
Description of problem:
Users migrated VM disks to the new storage domain vmstore2, which filled it up. RHV then deactivated the storage domain:

[root@rhvm ovirt-engine]# zgrep -i vmstore2.*deactivated engine.log*
engine.log-20201218.gz:2020-12-17 08:47:30,387-05 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-500789) [4159363a] EVENT_ID: SYSTEM_DEACTIVATED_STORAGE_DOMAIN(970), Storage Domain vmstore2 (Data Center Default) was deactivated by system because it's not visible by any of the hosts.

To recover, we had to overwrite one of the VM disks to free up some space:

[root@dell-r640-01 ~]# echo 1 > /rhev/data-center/mnt/glusterSD/dell-r640-01.gluster.tamlab.rdu2.redhat.com:_vmstore2/04ad3ba4-4459-4f56-a73f-c07fcaa1617e/images/eedf329f-1f84-4845-928b-9284fbfb363c/862784b8-0512-4531-a3de-8562f23c8535

We also had to override critical_space_action_blocker for the domain directly in the engine database:

[root@rhvm ~]# /usr/share/ovirt-engine/dbscripts/engine-psql.sh -c "update storage_domain_static set critical_space_action_blocker = '1' where storage_name = 'vmstore2';"
UPDATE 1

We were then able to activate the SD (Storage > Domains > vmstore2 > Data Center > Activate) and migrate the VM disks from it to another SD.
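One plausible reading of why the override helped is sketched below. This is a minimal illustration, not engine code: it assumes the engine refuses operations on a domain whose free space (in GB) is below critical_space_action_blocker (also assumed to be in GB). The function name and the 2 GB free / 5 GB original threshold values are hypothetical; only the value 1 comes from the UPDATE statement above.

```python
def is_blocked_by_critical_space(free_gb: float, critical_space_action_blocker_gb: float) -> bool:
    """Hypothetical check: refuse operations while free space is below the blocker."""
    return free_gb < critical_space_action_blocker_gb


# Hypothetical values: a nearly full domain with 2 GB free.
free_gb = 2
# Assumed original threshold (5) versus the value set by the UPDATE above (1).
for threshold_gb in (5, 1):
    blocked = is_blocked_by_critical_space(free_gb, threshold_gb)
    print(f"critical_space_action_blocker={threshold_gb} -> blocked={blocked}")
```

With these values the first line reports blocked=True and the second blocked=False, i.e. lowering the threshold below the remaining free space lifts the block without freeing any additional space.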

Comment 3 Vojtech Juranek 2021-01-07 22:23:08 UTC
What exactly is the issue you want to report? Deactivating the SD when no disk space is available is the expected, correct behavior. Recovery of the SD should happen automatically once free disk space is available; I wasn't able to reproduce a situation in which I had to remove anything from the DB (though of course some disk space had to be freed). Are you able to reproduce it?

Comment 6 Eyal Shenitzky 2021-01-11 15:34:17 UTC
Donald, 

Can you please add some more information about this case?
What kind of environment are you using (hyper-converged, hosted engine, etc.)?
Do you have steps to reproduce this case?
How is the storage domain configured (create_checkpoint_xml, create_checkpoint_xml, create_checkpoint_xml)?

If you have engine and VDSM logs, please attach them to the bug.

Comment 7 Donald Berry 2021-01-11 15:57:43 UTC
Hi Eyal, this is RHHI-V 1.7 (RHV 4.3) with a hosted engine (HE). You can log in to this internal environment using the information in comment 1.

Vojtech says he can reproduce it, and I think reproducing it only requires filling up an SD.
The logs have been overwritten (see comments 2 and 4).

I am saying we should not deactivate the SD when it becomes full, because then we can't migrate VM disks off it.
I think there were some warning messages in the GUI that it was getting full, but it was not clear from them that the SD would be deactivated. Are email notifications an option?

Don

Comment 8 Marina Kalinin 2021-01-20 18:18:14 UTC
Raising the Insights rule flag to add an Insights rule to detect storage domains that have reached the low space indicator.
Some more details about this parameter can be found in bz#1667783.
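A rule of this kind could classify each domain by comparing its free space against both thresholds. The sketch below is hypothetical and is not the Insights rule itself: the StorageDomain record, the classify function, and the sample domains are invented for illustration, and it assumes warning_low_space_indicator is a percentage of total capacity while critical_space_action_blocker is in GB.

```python
from dataclasses import dataclass


@dataclass
class StorageDomain:
    name: str
    total_gb: float
    free_gb: float
    warning_low_space_indicator: float  # assumed: percent of total capacity
    critical_space_action_blocker: float  # assumed: GB


def classify(sd: StorageDomain) -> str:
    """Return 'critical', 'warning', or 'ok' for a storage domain (hypothetical rule)."""
    if sd.free_gb < sd.critical_space_action_blocker:
        return "critical"
    free_percent = 100.0 * sd.free_gb / sd.total_gb
    if free_percent < sd.warning_low_space_indicator:
        return "warning"
    return "ok"


# Hypothetical sample domains, not data from this bug.
domains = [
    StorageDomain("sd-a", total_gb=1000, free_gb=400, warning_low_space_indicator=10, critical_space_action_blocker=5),
    StorageDomain("sd-b", total_gb=1000, free_gb=50, warning_low_space_indicator=10, critical_space_action_blocker=5),
    StorageDomain("sd-c", total_gb=1000, free_gb=3, warning_low_space_indicator=10, critical_space_action_blocker=5),
]
for sd in domains:
    print(sd.name, classify(sd))
```

Checking the critical threshold first means a domain that is about to start blocking operations is never reported as a mere warning.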

Comment 20 Ritesh Chikatwar 2021-06-10 07:56:34 UTC
Created attachment 1789766
screen sharing after steps has performed

Comment 23 Ritesh Chikatwar 2021-10-18 07:54:38 UTC
Closing this bug because I tried to reproduce it but couldn't. Please try again with the latest version, and if you encounter the same issue, feel free to reopen this bug.

Comment 34 Arik 2022-05-24 13:55:56 UTC
"Specifically, there was 13gb left in a block-based SD. I created a preallocated 12gb disk and it created it without a problem and without warning" - that's the flow we are going to address

Comment 35 Shir Fishbain 2022-05-30 19:58:51 UTC
QE doesn't have the capacity to verify this bug during 4.5.1.

Comment 36 Benny Zlotnik 2022-06-13 16:00:59 UTC
For QE: the space validation for copying disks was fixed so that a copy can no longer exceed the available space.

As for preventing new disks from pushing an SD into the critical space blocker range: that does not appear to be the originally intended behavior. The critical space blocker is meant to block operations once the SD is already in that range, not to stop an operation from bringing it there.
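The distinction between the two checks can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the engine's implementation: validate_space_for_disk and its parameters are hypothetical, sizes are in GB, and the blocker is assumed to look only at the domain's current free space. The 13 GB / 12 GB figures come from comment 34; the 5 GB blocker value is assumed.

```python
from typing import Tuple


def validate_space_for_disk(free_gb: float, required_gb: float,
                            critical_space_action_blocker_gb: float) -> Tuple[bool, str]:
    """Hypothetical space validation for placing a disk on a target storage domain."""
    # Blocker semantics: refuse only if the domain is *already* below the threshold.
    if free_gb < critical_space_action_blocker_gb:
        return False, "target domain is below critical_space_action_blocker"
    # Size check: the disk must fit in the space that is actually available.
    if required_gb > free_gb:
        return False, "not enough free space on target domain"
    return True, "ok"


# Comment 34 scenario: 13 GB free, 12 GB preallocated disk, assumed 5 GB blocker.
print(validate_space_for_disk(13, 12, 5))  # (True, 'ok'): 1 GB left, inside the critical range
# A disk larger than the free space is rejected by the size check.
print(validate_space_for_disk(13, 20, 5))  # (False, 'not enough free space on target domain')
```

In this sketch the comment 34 flow is allowed because neither check fails before the operation, and only subsequent operations are blocked; the size check is what stops a copy from exceeding the available space.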

Comment 38 Benny Zlotnik 2022-06-22 08:33:31 UTC
1. Try to copy a disk to a storage domain that doesn't have enough space for it - should be blocked

Comment 39 Ilia Markelov 2022-06-22 12:50:07 UTC
Verified.

Copying a disk to a storage domain without enough free space failed with an error, and the operation was cancelled. The SD kept working normally afterwards.

Versions:
engine-4.5.1.2-0.11.el8ev
vdsm-4.50.1.3-1.el8ev.x86_64

Comment 40 Sandro Bonazzola 2022-06-23 05:54:58 UTC
This bugzilla is included in oVirt 4.5.1 release, published on June 22nd 2022.
Since the problem described in this bug report should be resolved in oVirt 4.5.1 release, it has been closed with a resolution of CURRENT RELEASE.
If the solution does not work for you, please open a new bug report.