Bug 1083476
| Summary: | On mixed storage pools the thin provisioned volumes on block domains are not extended | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Federico Simoncelli <fsimonce> |
| Component: | vdsm | Assignee: | Federico Simoncelli <fsimonce> |
| Status: | CLOSED ERRATA | QA Contact: | Aharon Canan <acanan> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.4.0 | CC: | amureini, bazulay, danken, fsimonce, gklein, iheim, knesenko, lpeer, scohen, tpoitras, yeylon |
| Target Milestone: | --- | | |
| Target Release: | 3.4.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | storage | | |
| Fixed In Version: | vdsm-4.14.7-0.1.beta3.el6ev | Doc Type: | Bug Fix |
| Doc Text: | Previously, on block domains, when a thinly provisioned disk neared its limit, the host running it requested that the SPM extend the volume. This was done by writing a message to a pre-defined volume on the master storage domain called the mailbox. The SPM then monitored this mailbox and handled the extend requests.<br>RHEVM 3.4 introduced mixed storage domains, allowing master file domains to be in charge of block domains.<br>Also, RHEVM 3.4 creates the mailbox on the master storage domain, regardless of its type. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-06-09 13:30:08 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1052318, 1069730 | | |
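The Doc Text above describes the mailbox-based extend protocol only in prose. Below is a minimal, illustrative sketch of that flow, assuming one fixed-size slot per host; the path, slot size, message format, and function names are hypothetical and are not VDSM's actual mailbox implementation.

```python
import os

# Illustrative constants -- not VDSM's real on-disk layout.
MAILBOX_PATH = "/path/to/master_domain/inbox"  # hypothetical location on the master domain
SLOT_SIZE = 4096                               # one fixed-size slot per host (assumed)


def request_extension(host_id, sd_uuid, vol_uuid, new_size):
    """HSM side: write an extend request into this host's slot of the mailbox."""
    msg = ("EXTEND %s %s %d" % (sd_uuid, vol_uuid, new_size)).encode("ascii")
    with open(MAILBOX_PATH, "rb+") as mailbox:
        mailbox.seek(host_id * SLOT_SIZE)
        mailbox.write(msg.ljust(SLOT_SIZE, b"\0"))
        mailbox.flush()
        os.fsync(mailbox.fileno())


def spm_poll_inbox(max_hosts):
    """SPM side: scan every host slot and yield any pending extend requests."""
    with open(MAILBOX_PATH, "rb") as mailbox:
        for host_id in range(max_hosts):
            mailbox.seek(host_id * SLOT_SIZE)
            slot = mailbox.read(SLOT_SIZE).rstrip(b"\0")
            if slot:
                yield host_id, slot.decode("ascii")
```

As the summary and the comments below indicate, the fix is to create and monitor this mailbox on the master storage domain even when it is a file domain, so that extend requests for block-domain volumes in a mixed pool are still served.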
Description
Federico Simoncelli
2014-04-02 10:23:13 UTC
With regard to the implementation we have two options:

1. Enable the mailbox on file master domains always (possible performance hit due to the additional I/O for monitoring the mailbox).
2. Enable the mailbox on file master domains only when there is a block domain in the pool (enabling and disabling the mailbox dynamically is error-prone).

If we go with option 2 we will have to:

- on connectStoragePool, iterate over all the domains to check if at least one is a block domain and, if so, activate the mailbox
- when a domain's connectivity is restored, check if it is a block domain and, if so, activate the mailbox (we might have missed it on connectStoragePool)
- when a domain is activated, check if it is a block domain and, if so, activate the mailbox
- when a domain is deactivated, check if it is a block domain and, if so, deactivate the mailbox (if no other block domains are left)

At the moment the patch posted upstream implements option 1. We need to decide whether it is worth investing time in developing and stabilizing option 2.

---

When there are no active extension requests, the SPM I/O required for monitoring the inbox is a 1 MB read every 2 seconds (max hosts = 250).

When there are active requests the I/O increases (also from the HSM), but that case is not interesting because it means the mailbox was indeed needed (there is a block domain in the pool).

---

(In reply to Federico Simoncelli from comment #2)
> When there are no active extension requests, the SPM I/O required for
> monitoring the inbox is a 1 MB read every 2 seconds (max hosts = 250).
>
> When there are active requests the I/O increases (also from the HSM), but
> that case is not interesting because it means the mailbox was indeed
> needed (there is a block domain in the pool).

Considering the low performance impact measured on a single host, and where we are in the release cycle (approaching RC), we can go ahead with the first approach and always enable the mailbox on file master domains. This will also allow QA to test it on multiple hosts and provide further indication.

Sean

---

2 comments:

1 - I would go with #1 above for simplicity (at this point), and try to reduce the I/O by simply checking (on file mailboxes) whether the file was changed in the last X seconds (better for the default use case).

2 - 250 hosts is the mailbox limitation. It may hit us on currently deployed scaled environments on upgrade to 3.4, as this actually activates the mailbox on NFS (IIUC more commonly used in scaled environments).

E.g. in our current scale lab we are running 500 fake hosts on an NFS pool.
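A rough sketch of the optimization suggested in point 1 above: skip the full 1 MB inbox read when the file has not been modified within the last X seconds. The function name, the `read_inbox` callback, and the 10-second window are illustrative assumptions, not the actual patch:

```python
import os
import time

CHANGE_WINDOW = 10  # the "X seconds" from point 1 above -- illustrative value


def poll_inbox_if_changed(inbox_path, read_inbox):
    """Do the full inbox read only if the file changed in the last CHANGE_WINDOW seconds."""
    mtime = os.stat(inbox_path).st_mtime
    if time.time() - mtime > CHANGE_WINDOW:
        return None         # no recent writes: skip the 1 MB read this cycle
    return read_inbox()     # recently modified: perform the real read and parse the slots
```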
(In reply to Barak from comment #4)
> 2 comments:
>
> 1 - I would go with #1 above for simplicity (at this point), and try to
> reduce the I/O by simply checking (on file mailboxes) whether the file was
> changed in the last X seconds (better for the default use case).
>
> 2 - 250 hosts is the mailbox limitation.
> It may hit us on currently deployed scaled environments on upgrade to 3.4,
> as this actually activates the mailbox on NFS (IIUC more commonly used in
> scaled environments).

I do not fully know what the implications of increasing it are; it may be a simple configuration change. Fede?

> E.g. in our current scale lab we are running 500 fake hosts on an NFS pool.

---

(In reply to Barak from comment #4)
> 2 comments:
>
> 1 - I would go with #1 above for simplicity (at this point), and try to
> reduce the I/O by simply checking (on file mailboxes) whether the file was
> changed in the last X seconds (better for the default use case).

I'm totally on board with this. IIUC, the impact is adding an additional constant 1 MB read every two seconds - nothing to be alarmed about.

If anything, the future optimization should be to completely remove the mailbox when the SPM is removed, and have every host serve its own extends.

---

(In reply to Barak from comment #5)
> (In reply to Barak from comment #4)
> > 2 comments:
> >
> > 1 - I would go with #1 above for simplicity (at this point), and try to
> > reduce the I/O by simply checking (on file mailboxes) whether the file
> > was changed in the last X seconds (better for the default use case).

Good idea to check the file modification timestamp; we could try to evaluate the risk of that change using a different patch on top of #1.

> > 2 - 250 hosts is the mailbox limitation.
> > It may hit us on currently deployed scaled environments on upgrade to 3.4,
> > as this actually activates the mailbox on NFS (IIUC more commonly used in
> > scaled environments).
>
> I do not fully know what the implications of increasing it are; it may be a
> simple configuration change.

Currently maxHostID is sent from the engine to VDSM using the value configured in MaxNumberOfHostsInStoragePool:

`fn_db_add_config_value('MaxNumberOfHostsInStoragePool','250','general');`

Polling size = MaxNumberOfHostsInStoragePool * 4K = 1 MB (by default)

In the release notes about the mailbox on file domains we should mention that a higher value of MaxNumberOfHostsInStoragePool will trigger larger reads.

---

Verified using av7.

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0504.html
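For reference, the polling-size formula quoted in Federico's comment above, worked for the default configuration and for the 500-host scale lab mentioned earlier; the 4K-per-host slot size comes from that formula, and the 500-host figure is only an example:

```python
SLOT_SIZE = 4 * 1024  # 4K per host, per "Polling size = MaxNumberOfHostsInStoragePool * 4K"


def inbox_read_size(max_hosts_in_pool):
    """Bytes the SPM reads from the inbox on each 2-second polling cycle."""
    return max_hosts_in_pool * SLOT_SIZE


print(inbox_read_size(250))  # default MaxNumberOfHostsInStoragePool -> 1024000 bytes (~1 MB)
print(inbox_read_size(500))  # e.g. a 500-host pool -> 2048000 bytes (~2 MB)
```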