Bug 1903358 - Speed up activation with large number of storage domains
Summary: Speed up activation with large number of storage domains
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.40.38
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.4.4
Target Release: 4.40.40
Assignee: Nir Soffer
QA Contact: Tzahi Ashkenazi
URL:
Whiteboard:
Depends On: 1902468 1912923
Blocks:
 
Reported: 2020-12-01 21:43 UTC by Nir Soffer
Modified: 2021-11-04 19:28 UTC
CC: 3 users

Fixed In Version: vdsm-4.40.40
Doc Type: Bug Fix
Doc Text:
Cause: Sanlock was not configured correctly for a large number of storage domains.
Consequence: Running an HA VM with a storage lease, or performing a storage operation using a storage lease, was not possible for a long time after activating a host.
Fix: Optimize the sanlock configuration (max_worker_threads = 50).
Result: The host is ready for using storage leases up to 7 times faster after activation.
Clone Of:
Environment:
Last Closed: 2021-01-12 16:23:52 UTC
oVirt Team: Storage
Embargoed:
mlehrer: needinfo-
pm-rhel: ovirt-4.4+




Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 112498 0 master MERGED tool: sanlock: Speed up add_lockspace 2021-01-10 11:16:19 UTC

Description Nir Soffer 2020-12-01 21:43:52 UTC
Description of problem:

When using a large number of storage domains, acquiring the host id
(add_lockspace) in all storage domains takes a lot of time.
The issue is worse when using a larger io timeout in sanlock.

Sanlock bug 1902468 adds a max_worker_threads configuration option. Based on my
tests, 50 workers is a good default, speeding up host activation and
deactivation.

Here are some examples using a system with 40 storage domains.

|------------------------|----------------|---------------|----------------|
| operation              | io_timeout (s) | 8 workers (s) | 50 workers (s) |
|------------------------|----------------|---------------|----------------|
| activate host          |             10 |           123 |             47 |
|                        |             20 |           224 |             71 |
| deactivate host        |             10 |            16 |              8 |
|                        |             20 |            16 |              5 |
|------------------------|----------------|---------------|----------------|


Vdsm should configure max_worker_threads in /etc/sanlock/sanlock.conf
when configuring sanlock (lib/vdsm/tool/configurators/sanlock.py).
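
For illustration, here is a minimal sketch of what the configurator could do.
This is a hypothetical example, not the actual vdsm code; the option name comes
from sanlock bug 1902468 and the default of 50 workers from the tests above:

    # Hypothetical sketch, not the vdsm configurator itself: set
    # max_worker_threads in /etc/sanlock/sanlock.conf, dropping any
    # previous value and keeping all other options.

    SANLOCK_CONF = "/etc/sanlock/sanlock.conf"

    def set_max_worker_threads(value=50):
        try:
            with open(SANLOCK_CONF) as f:
                lines = f.readlines()
        except FileNotFoundError:
            lines = []

        # Remove any existing max_worker_threads line, then append the new one.
        lines = [l for l in lines
                 if not l.strip().startswith("max_worker_threads")]
        lines.append("max_worker_threads = %d\n" % value)

        with open(SANLOCK_CONF, "w") as f:
            f.writelines(lines)

    if __name__ == "__main__":
        set_max_worker_threads()

Sanlock reads its configuration on startup, so the service must be restarted
(with the host in maintenance) for the new value to take effect.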

How to test

Clean shutdown:

1. Add 40 storage domains
2. Deactivate host in the DC
3. Wait until all lockspaces are removed
4. Activate host
5. Measure the time until all lockspaces are added.

Unclean shutdown:

1. While the host is active, perform a hard poweroff. If the host is a VM,
   kill the VM.
2. Activate host
3. Measure the time until all lockspaces are added.

When the number of workers is higher than the number of storage domains,
activation and deactivation are 2-3 times faster.

The measurements above were taken by setting the value manually.

To check the lockspace status, run this as root:

    watch sanlock client status

When activating a host, you will see all lockspaces with the ADD marker. Wait until
no lockspace has the ADD marker.

When deactivating a host, you will see all lockspaces with the REM marker. Wait until
the lockspaces are removed.
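
To avoid timing this by hand, a small helper like the following can poll
sanlock and report how long the markers take to clear. This is a hypothetical
script, not part of vdsm; run it as root once the lockspaces start showing the
marker:

    # Hypothetical helper: poll "sanlock client status" and report how long
    # it takes until no lockspace line shows the given marker anymore.

    import subprocess
    import time

    def wait_for_marker(marker="ADD", interval=1.0):
        start = time.monotonic()
        while True:
            out = subprocess.run(
                ["sanlock", "client", "status"],
                capture_output=True, text=True).stdout
            if marker not in out:
                return time.monotonic() - start
            time.sleep(interval)

    if __name__ == "__main__":
        print("lockspaces ready after %.1f seconds" % wait_for_marker("ADD"))

Passing "REM" instead of "ADD" approximates the deactivation case. The
substring check is crude, so treat the result as a rough measurement.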

Comment 1 Nir Soffer 2020-12-01 21:45:24 UTC
Sanlock should provide the new configuration in RHEL 8.3.z.

Comment 2 Nir Soffer 2020-12-01 22:01:42 UTC
This is a rather simple change with a significant performance improvement.
I'm not sure when the required sanlock version will be available, but it
is likely to be available for 4.4.5.

Comment 4 Tzahi Ashkenazi 2021-01-05 08:53:53 UTC
Tested on:
         Engine: https://rhev-red-03.rdu2.scalelab.redhat.com
         40 storage domains, iSCSI connectivity

         Host:
           F02-h25-000-r620.rdu2.scalelab.redhat.com
         Version:
           vdsm-4.40.40-1.el8ev.x86_64
           rhv-release-4.4.4-7-001.noarch
           sanlock-3.8.2-1.el8.x86_64
           max_worker_threads = 50
         F02-h25-000-r620 - vdsm-4.40.40
         +------------------+-----------------+----------+
         | Scenario         | Operation       | Duration |
         +------------------+-----------------+----------+
         | Clean shutdown   | deactivate host | 17.76 s  |
         |                  | activate host   | 139 s    |
         | Unclean shutdown | activate host   | 140 s    |
         +------------------+-----------------+----------+


         Host:
           F01-h08-000-1029u.rdu2.scalelab.redhat.com
         Version:
           vdsm-4.40.37-1.el8ev.x86_64
           rhv-release-4.4.4-2-001.noarch
           sanlock-3.8.2-1.el8.x86_64

         F01-h08-000-1029u - vdsm-4.40.37
         +------------------+-----------------+----------+
         | Scenario         | Operation       | Duration |
         +------------------+-----------------+----------+
         | Clean shutdown   | deactivate host | 19.51 s  |
         |                  | activate host   | 136 s    |
         | Unclean shutdown | activate host   | 975 s    |
         +------------------+-----------------+----------+

   P.S.
   Once we get the official sanlock build we will be able to test it.

Comment 5 Tzahi Ashkenazi 2021-01-10 13:19:15 UTC
Hey Nir,
please provide the correct sanlock installation procedure:


[root@f01-h08-000-1029u ~]# yum localinstall sanlock-lib-3.8.2-2.el8.x86_64.rpm
Updating Subscription Management repositories.
Unable to read consumer identity

This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.

Last metadata expiration check: 2:01:48 ago on Sun 10 Jan 2021 11:13:48 AM UTC.
Error:
 Problem: problem with installed package python3-sanlock-3.8.2-1.el8.x86_64
  - package python3-sanlock-3.8.2-1.el8.x86_64 requires sanlock-lib = 3.8.2-1.el8, but none of the providers can be installed
  - sanlock-lib-3.8.2-1.el8.i686 has inferior architecture
  - cannot install both sanlock-lib-3.8.2-2.el8.x86_64 and sanlock-lib-3.8.2-1.el8.x86_64
  - conflicting requests
(try to add '--allowerasing' to command line to replace conflicting packages or '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)



[root@f01-h08-000-1029u ~]# yum install http://download.eng.bos.redhat.com/brewroot/vol/rhel-8/packages/sanlock/3.8.2/2.el8/x86_64/sanlock-3.8.2-2.el8.x86_64.rpm
Updating Subscription Management repositories.
Unable to read consumer identity

This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.

Last metadata expiration check: 2:01:55 ago on Sun 10 Jan 2021 11:13:48 AM UTC.
sanlock-3.8.2-2.el8.x86_64.rpm                                                                                                             906 kB/s | 154 kB     00:00
Error:
 Problem: conflicting requests
  - nothing provides sanlock-lib = 3.8.2-2.el8 needed by sanlock-3.8.2-2.el8.x86_64
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)



[root@f01-h08-000-1029u ~]# rpm -e sanlock-3.8.2-1.el8.x86_64
error: Failed dependencies:
	sanlock >= 2.4 is needed by (installed) libvirt-lock-sanlock-6.6.0-7.module+el8.3.0+8424+5ea525c5.x86_64
	sanlock >= 3.7.3 is needed by (installed) ovirt-hosted-engine-ha-2.4.5-1.el8ev.noarch
	sanlock >= 2.8 is needed by (installed) ovirt-hosted-engine-setup-2.4.8-1.el8ev.noarch



Currently, both of the above methods fail on package conflicts and unresolved dependencies.

Comment 6 Nir Soffer 2021-01-10 15:10:45 UTC
(In reply to Tzahi Ashkenazi from comment #5)
Try this:

    cd /etc/yum.repos.d
    wget http://brew-task-repos.usersys.redhat.com/repos/official/sanlock/3.8.2/2.el8/sanlock-3.8.2-2.el8.repo
    dnf upgrade

Note: you must put the host into maintenance before upgrading sanlock.

Comment 7 Tzahi Ashkenazi 2021-01-10 16:01:16 UTC
Tested on:
         Engine: https://rhev-red-03.rdu2.scalelab.redhat.com
         40 storage domains, iSCSI connectivity

         Host:
           F02-h25-000-r620.rdu2.scalelab.redhat.com
         Version:
           vdsm-4.40.40-1.el8ev.x86_64
           rhv-release-4.4.4-7-001.noarch
           max_worker_threads = 50

Comparison between sanlock 3.8.2-1 and sanlock 3.8.2-2 on the following scenarios:

F02-h25-000-r620 - VDSM-4.40.40
+------------------+-----------------+-----------------+-----------------+
| Scenario         | Operation       | sanlock 3.8.2-1 | sanlock 3.8.2-2 |
+------------------+-----------------+-----------------+-----------------+
| Clean shutdown   | deactivate host | 17.76 s         | 6.56 s          |
|                  | activate host   | 139 s           | 56.74 s         |
| Unclean shutdown | activate host   | 140 s           | 70.72 s         |
+------------------+-----------------+-----------------+-----------------+

Comment 8 Sandro Bonazzola 2021-01-12 16:23:52 UTC
This bugzilla is included in the oVirt 4.4.4 release, published on December 21st 2020.

Since the problem described in this bug report should be resolved in the oVirt 4.4.4 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

