Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1433228

Summary: [3.6 only] Storage monitoring get serialized in operational mutex
Product: Red Hat Enterprise Virtualization Manager Reporter: Roman Hodain <rhodain>
Component: vdsmAssignee: Nir Soffer <nsoffer>
Status: CLOSED ERRATA QA Contact: Natalie Gavrielov <ngavrilo>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.6.10CC: amureini, aperotti, bazulay, gveitmic, lsurette, mkalinin, nsoffer, rhodain, srevivo, tnisan, ycui, ylavi
Target Milestone: ovirt-3.6.11Keywords: ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-05-09 17:04:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
repoplot from vdsm log before applying the patch
none
repoplot from vdsm log after applying the patch
none
vdsm logs from verification
none
repoplot from verification logs none

Description Roman Hodain 2017-03-17 06:26:44 UTC
Description of problem:
The storage monitoring threads get serialized when acquiring operational mutex for the same operation instead of of being running parallel after the first operation release the mutex.

Version-Release number of selected component (if applicable):
vdsm-4.17.33-1

How reproducible:
100% in certain environments

Steps to Reproduce:
1.Create high number of storage domains (20+)
2.make the storage overloaded so the threads get delayed and ahs wait for each other 

Actual results:
Thread-55::DEBUG::2017-02-26 21:12:34,899::lvm::370::Storage.OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex
Thread-55::DEBUG::2017-02-26 21:12:34,899::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-7 /usr/bin/sudo -n /usr/sbin/lvm vgs ... a87bea91-fafc-4601-86bb-986bd3f823a6 (cwd None)
Thread-63::DEBUG::2017-02-26 21:12:34,912::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-38::DEBUG::2017-02-26 21:12:34,912::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-41::DEBUG::2017-02-26 21:12:34,912::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-59::DEBUG::2017-02-26 21:12:34,913::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
...
Thread-55::DEBUG::2017-02-26 21:12:53,192::lvm::415::Storage.OperationMutex::(_reloadvgs) Operation 'lvm reload operation' released the operation mutex
...
Thread-63::DEBUG::2017-02-26 21:12:53,206::lvm::370::Storage.OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex
...
Thread-38::DEBUG::2017-02-26 21:12:53,219::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-41::DEBUG::2017-02-26 21:12:53,219::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-59::DEBUG::2017-02-26 21:12:53,219::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-61::DEBUG::2017-02-26 21:12:53,219::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-40::DEBUG::2017-02-26 21:12:53,219::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-75::DEBUG::2017-02-26 21:12:53,219::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...

Expected results:
All threads are expected to get the mutex as soon as Thread-55 releases the mutex instead just Thread-63

Comment 3 Nir Soffer 2017-03-18 03:33:13 UTC
Created attachment 1264340 [details]
repoplot from vdsm log before applying the patch

This show how the all threads run one after another. Because lvm operations are
slow (~15s), and there are 40 monitors, lastCheck values are higher than 300 
seconds, which causes the host to become non-operational.

Comment 4 Nir Soffer 2017-03-18 03:38:07 UTC
Created attachment 1264341 [details]
repoplot from vdsm log after applying the patch

With the patch we can see that lvm commands run concurrently, and even with some
extremely slow lvm commands (90s), lastCheck values are less than 150 seconds, and
the host remain operational.

We need to understand why lvm operations are slow slow, this is not expected with 
good storage.

Comment 5 Nir Soffer 2017-03-18 03:45:26 UTC
This is the same issue as bug 1081962, that was not fixed in 3.6.

The attached patch fixes the unwanted serialization of lvm commands, causing high
lastCheck values that cause the host to become non-operational.

This patch does not fix other issues that may cause monitoring to be blocked, and 
were fixed in 4.0.

Roman, we should understand why lvm commands are so slow (e.g. 90s). What happened
on this host at 23:07 (in the log with the fix), and at 21:01 (in the log without
the fix)?

Maybe open another bug about the slow lvm commands?  I would like to see the lvm
sos_commands from a host on this environment, and sar logs from same timeframe.

Comment 6 Nir Soffer 2017-03-19 09:32:14 UTC
Tal, this issue does not exist in 4.0 and 4.1 (there is no operation mutex). This
is 3.6.z bug.

Comment 9 Nir Soffer 2017-03-19 23:16:24 UTC
How to test:

1. Setup a system with many iSCSI/FC storage domains (50 is a good number)
2. Create 10 big raw disks on one domain. The disk should be big enough so
   copying all of them to another domain takes about 10 minutes.
3. Move all disks to another domain
4. Attach vdsm logs from both runs to the bug

Result without this fix:
- Lvm operations will become slow
- The operation is likely to fail after a while
- The host will become non-operational.

Result with this fix:
- Lvm operations will become slow
- The operation is likely to complete successfully
- host should remain in UP state

For more info see bug 1081962.

Comment 10 Roman Hodain 2017-03-20 06:54:05 UTC
(In reply to Nir Soffer from comment #5)
> This is the same issue as bug 1081962, that was not fixed in 3.6.
> 
> The attached patch fixes the unwanted serialization of lvm commands, causing
> high
> lastCheck values that cause the host to become non-operational.
> 
> This patch does not fix other issues that may cause monitoring to be
> blocked, and 
> were fixed in 4.0.
> 

Understood

> Roman, we should understand why lvm commands are so slow (e.g. 90s). What
> happened
> on this host at 23:07 (in the log with the fix), and at 21:01 (in the log
> without
> the fix)?
> 

Not clear at the moment, but can be caused by high storage load.

> Maybe open another bug about the slow lvm commands?  I would like to see the
> lvm
> sos_commands from a host on this environment, and sar logs from same
> timeframe.

I am not sure if it is worth to crate a bug for this. There is nothing in the logs and the sar stats only suggest high load on storage generated by a source which is not part on the investigated hypervisors or a storage issue. I assume that as the await on some of the LUNs goes from average 15 to 308 at that time. The data written / read to/from the LUNs are in 10ts of MB per sec. Also vdsm did not performed any outstanding operation at that time.

Comment 17 Raz Tamir 2017-03-27 19:34:49 UTC
(In reply to Nir Soffer from comment #9)
> How to test:
> 
> 1. Setup a system with many iSCSI/FC storage domains (50 is a good number)
> 2. Create 10 big raw disks on one domain. The disk should be big enough so
>    copying all of them to another domain takes about 10 minutes.
> 3. Move all disks to another domain
> 4. Attach vdsm logs from both runs to the bug
> 
> Result without this fix:
> - Lvm operations will become slow
> - The operation is likely to fail after a while
> - The host will become non-operational.
> 
> Result with this fix:
> - Lvm operations will become slow
> - The operation is likely to complete successfully
> - host should remain in UP state
> 
> For more info see bug 1081962.

Thanks Nir

Comment 18 Natalie Gavrielov 2017-04-03 12:46:13 UTC
Created attachment 1268381 [details]
vdsm logs from verification

Used scenario described in comment 9, and additional data supplied in bug 1081962.
10 raw disks 100gb each were moved from one iscsi storage domain to another, and later on were copied to a storage domain.
Operations took ~23-24 minutes each (moving then copying for all disks).
Used:
vdsm-4.17.39-1.el7ev.noarch
rhevm-3.6.11-0.1.el6.noarch

Comment 19 Natalie Gavrielov 2017-04-03 12:47:12 UTC
Moving to verified.

Comment 20 Nir Soffer 2017-04-03 22:42:15 UTC
Created attachment 1268486 [details]
repoplot from verification logs

Thanks Natalie!

In the plot we can see that even under huge io load, domain monitors were slowed
down but the system can operate successfully under these conditions.

Comment 22 errata-xmlrpc 2017-05-09 17:04:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1211