Bug 1083476 - On mixed storage pools the thin provisioned volumes on block domains are not extended
Summary: On mixed storage pools the thin provisioned volumes on block domains are not extended
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.4.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 3.4.0
Assignee: Federico Simoncelli
QA Contact: Aharon Canan
URL:
Whiteboard: storage
Depends On:
Blocks: 1052318 1069730
 
Reported: 2014-04-02 10:23 UTC by Federico Simoncelli
Modified: 2016-02-10 19:18 UTC
CC List: 11 users

Fixed In Version: vdsm-4.14.7-0.1.beta3.el6ev
Doc Type: Bug Fix
Doc Text:
Previously, on block domains, when a thinly provisioned disk neared its limit, the host running it requested that the SPM extend the volume. This was done by writing a message to a pre-defined volume on the master storage domain, called the mailbox; the SPM monitored this mailbox and handled the extend requests. RHEVM 3.4 introduced mixed storage pools, in which a file domain can be the master of a pool that also contains block domains, but the mailbox was only created on block master domains, so the extend requests were never delivered. With this update, RHEVM 3.4 creates the mailbox on the master storage domain regardless of its type, and thinly provisioned volumes on block domains are extended as expected.
Clone Of:
Environment:
Last Closed: 2014-06-09 13:30:08 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2014:0504 0 normal SHIPPED_LIVE vdsm 3.4.0 bug fix and enhancement update 2014-06-09 17:21:35 UTC
oVirt gerrit 26414 0 None None None Never
oVirt gerrit 26989 0 None None None Never

Description Federico Simoncelli 2014-04-02 10:23:13 UTC
Description of problem:
In mixed storage pools (containing block domains) where the master is a file domain, the volume extension requests are not sent/received, because the requests are delivered through the mailbox on the master domain and the mailbox is not enabled on file domains.

Version-Release number of selected component (if applicable):
vdsm-4.13.2-0.13.el6ev

How reproducible:
100%

Steps to Reproduce:
1. create a mixed storage pool (at least 1 block domain and 1 file domain as master)
2. prepare a vm with a thin-provisioned disk on the block domain
3. start filling the disk

Actual results:
The thin-provisioned disk volume is not extended.

Expected results:
The thin-provisioned disk volume must be extended.

Additional info:
We probably need to enable the mailbox on file domains as well.
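
For context, a minimal sketch of how the extend requests travel through the mailbox (the names below are hypothetical placeholders, not the actual vdsm API; only the one-4KB-slot-per-host layout and the SPM polling come from this report):

    # Sketch only: hypothetical names, not the real vdsm implementation.
    SLOT_SIZE = 4 * 1024  # one 4KB slot per host in the mailbox volume

    def extend_lv(vol_uuid, new_size):
        # placeholder for the real LV extension performed by the SPM
        print("extending %s to %d bytes" % (vol_uuid, new_size))

    def host_request_extend(mailbox_path, host_id, vol_uuid, new_size):
        """HSM side: write an extend request into this host's slot."""
        msg = ("EXTEND %s %d" % (vol_uuid, new_size)).encode()
        with open(mailbox_path, "r+b") as f:
            f.seek(host_id * SLOT_SIZE)
            f.write(msg.ljust(SLOT_SIZE, b"\0"))

    def spm_poll_inbox(mailbox_path, max_hosts):
        """SPM side: read all the slots and handle any pending requests."""
        with open(mailbox_path, "rb") as f:
            data = f.read(max_hosts * SLOT_SIZE)  # 250 hosts -> ~1MB per poll
        for host_id in range(max_hosts):
            slot = data[host_id * SLOT_SIZE:(host_id + 1) * SLOT_SIZE].rstrip(b"\0")
            if slot.startswith(b"EXTEND"):
                _, vol_uuid, new_size = slot.decode().split()
                extend_lv(vol_uuid, int(new_size))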

Comment 1 Federico Simoncelli 2014-04-09 08:39:00 UTC
With regard to the implementation we have two options:

1. always enable the mailbox on file master domains (possible performance hit due to the additional IO for monitoring the mailbox)
2. enable the mailbox on file master domains only when there is a block domain in the pool (enabling and disabling the mailbox dynamically is error-prone)

If we go with option 2 we'll have to:

- on connectStoragePool, iterate over all the domains to check if at least one is a block domain and, if so, activate the mailbox
- when a domain's connectivity is restored, check if it's a block domain and, if so, activate the mailbox (we might have missed it on connectStoragePool)
- when a domain is activated, check if it's a block domain and, if so, activate the mailbox
- when a domain is deactivated, check if it's a block domain and, if so, deactivate the mailbox (if no other block domains are left)


At the moment the patch posted upstream implements option 1. We need to decide whether it's worth investing time in developing and stabilizing option 2.
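
For illustration, a rough sketch of the check option 2 would hinge on (the pool/domain objects and method names are hypothetical, not the actual vdsm classes):

    # Sketch of option 2 only: pool/domain/mailbox methods are hypothetical.
    def update_mailbox_state(pool):
        # Enable the mailbox iff at least one active domain is a block domain.
        has_block = any(dom.is_block() for dom in pool.active_domains())
        if has_block and not pool.mailbox_running():
            # reached from connectStoragePool, domain activation or
            # restored connectivity
            pool.start_mailbox()
        elif not has_block and pool.mailbox_running():
            # reached from domain deactivation when no block domain is left
            pool.stop_mailbox()

The number of call sites that would have to drive this check (connectStoragePool, restored connectivity, activation, deactivation) is exactly what makes option 2 error-prone compared to option 1.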

Comment 2 Federico Simoncelli 2014-04-09 09:15:59 UTC
When there are no active extension requests, the SPM IO required for monitoring the inbox is a 1MB read every 2 seconds (max hosts = 250).

When there are active requests the IO increases (also from the HSM side), but this case is not a concern because it means the mailbox was indeed needed (there is a block domain in the pool).

Comment 3 Sean Cohen 2014-04-09 10:56:32 UTC
(In reply to Federico Simoncelli from comment #2)
> When there are no active extension requests, the SPM IO required for
> monitoring the inbox is a 1MB read every 2 seconds (max hosts = 250).
> 
> When there are active requests the IO increases (also from the HSM side),
> but this case is not a concern because it means the mailbox was indeed
> needed (there is a block domain in the pool).

Considering the low performance impact tested on a single host, and where we are in the release cycle (approaching RC), we can go ahead with the first approach and always enable the mailbox on file master domains.

This will also allow QA to test it out on multiple hosts and provide further indication.

Sean

Comment 4 Barak 2014-04-09 11:22:13 UTC
2 comments:

1 - I would go with #1 above for simplicity (at this point), and try to reduce the IO by simply checking (on file mailboxes) whether the file was changed in the last X seconds (better for the default use case).

2 - 250 hosts is the mailbox limit.
It may hit us on currently deployed scaled environments upon upgrade to 3.4, as this actually activates the mailbox on NFS (which, IIUC, is more commonly used in scaled environments).


E.g. in our current scale lab we are running 500 fake hosts on an NFS pool.

Comment 5 Barak 2014-04-09 12:04:43 UTC
(In reply to Barak from comment #4)
> 2 comments:
> 
> 1 - I would go with #1 above for simplicity (at this point), and try to reduce
> the IO by simply checking (on file mailboxes) whether the file was changed
> in the last X seconds (better for the default use case).
> 
> 2 - 250 hosts is the mailbox limit.
> It may hit us on currently deployed scaled environments upon upgrade to 3.4,
> as this actually activates the mailbox on NFS (which, IIUC, is more commonly
> used in scaled environments).

I do not fully know what the implications of increasing it are;
it might be a simple configuration change.

Fede ?

> 
> 
> E.G. in our currently scale lab we are running 500 fake hosts on NFS pool

Comment 7 Allon Mureinik 2014-04-09 14:09:18 UTC
(In reply to Barak from comment #4)
> 2 comments:
> 
> 1 - I would go with #1 above for simplicity (at this point), and try to reduce
> the IO by simply checking (on file mailboxes) whether the file was changed
> in the last X seconds (better for the default use case).

I'm totally on board with this.
IIUC, the impact is an additional constant 1MB read every two seconds - nothing to be alarmed about.

If anything, the future optimization should be to completely remove the mailbox when the SPM is removed, and have every host handle its own extend requests.

Comment 8 Federico Simoncelli 2014-04-09 21:38:41 UTC
(In reply to Barak from comment #5)
> (In reply to Barak from comment #4)
> > 2 comments:
> > 
> > 1 - I would go with #1 above for simplicity (at this point), and try to reduce
> > the IO by simply checking (on file mailboxes) whether the file was changed
> > in the last X seconds (better for the default use case).

Good idea to check the file modification timestamp; we could evaluate the risk of that change with a separate patch on top of #1.
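
A minimal sketch of that optimization, assuming the inbox is a regular file on the file master domain (the class and its integration point are hypothetical; only os.stat() is standard library):

    import os

    class InboxPoller(object):
        """Sketch: skip the full inbox read when the file has not changed."""

        def __init__(self, inbox_path, read_size):
            self.inbox_path = inbox_path
            self.read_size = read_size  # e.g. MaxNumberOfHostsInStoragePool * 4KB
            self._last_mtime = None

        def poll(self):
            mtime = os.stat(self.inbox_path).st_mtime
            if mtime == self._last_mtime:
                return None  # nothing new since the last poll, avoid the 1MB read
            self._last_mtime = mtime
            with open(self.inbox_path, "rb") as f:
                return f.read(self.read_size)

One thing to evaluate with that separate patch is whether mtime is refreshed reliably enough over NFS (attribute caching) for the check to be safe.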

> > 2 - 250 hosts is the mailbox limit.
> > It may hit us on currently deployed scaled environments upon upgrade to 3.4,
> > as this actually activates the mailbox on NFS (which, IIUC, is more commonly
> > used in scaled environments).
> 
> I do not fully know what the implications of increasing it are;
> it might be a simple configuration change.

Currently maxHostID is sent from engine to vdsm using the value configured in MaxNumberOfHostsInStoragePool:

fn_db_add_config_value('MaxNumberOfHostsInStoragePool','250','general');

Polling size = MaxNumberOfHostsInStoragePool * 4KB = 1MB (by default)

In the release notes about the mailbox on file domains we should mention that a higher value of MaxNumberOfHostsInStoragePool will trigger larger reads.
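
As a rough illustration of how the read size scales with that value (the 4KB-per-host slot size is the figure used above; the rest is plain arithmetic):

    SLOT_SIZE = 4 * 1024  # 4KB mailbox slot per host

    def polling_size_kb(max_hosts_in_pool):
        # MaxNumberOfHostsInStoragePool * 4KB, expressed in KB
        return max_hosts_in_pool * SLOT_SIZE // 1024

    print(polling_size_kb(250))  # 1000 KB, i.e. ~1MB with the default value
    print(polling_size_kb(500))  # 2000 KB, e.g. the 500-host scale lab from comment 4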

Comment 9 Aharon Canan 2014-04-27 11:00:41 UTC
Verified using av7.

Comment 10 errata-xmlrpc 2014-06-09 13:30:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0504.html

