Bug 1217401

Summary: [SCALE] Adding direct lun to a VM times out
Product: Red Hat Enterprise Virtualization Manager
Reporter: akotov
Component: vdsm
Assignee: Fred Rolland <frolland>
Status: CLOSED ERRATA
QA Contact: mlehrer
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 3.5.1
CC: aefrat, ahino, amureini, bazulay, bhaubeck, dmoessne, eedri, frolland, jbuchta, lpeer, lsurette, mlehrer, nashok, obockows, rgroten, rhodain, rmcswain, tnisan, yeylon, ykaul, ylavi, zkabelac
Target Milestone: ovirt-3.6.0-rc3
Target Release: 3.6.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: 3.6.0-16
Doc Type: Bug Fix
Doc Text:
Previously, adding a direct LUN to a virtual machine sometimes timed out. This occurred because a physical volume (PV) create test was performed for each device when calling getDeviceList. Since a PV create test requires significant resources, it affected the response time of getDeviceList in scale setups, sometimes causing timeouts on the Red Hat Enterprise Virtualization Manager. This has been fixed: the PV create test can now be skipped using a flag. If the PV test is needed to determine the usage state of a device, it can be run on specific devices, minimizing the impact and decreasing the user waiting time in the following operations:
1. UI - Add direct LUN disk
2. UI/REST - Add ISCSI/FC storage domain
3. UI/REST - Edit ISCSI/FC storage domain
4. UI/REST - Extend ISCSI/FC storage domain
5. REST - Add direct LUN disk (if <host> parameter is provided)
6. UI/REST - Import an iSCSI/FCP domain
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-03-09 19:38:42 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Scale
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Attachments (description / flags):
    Test PV create command / none
    pvcreate command run with the timestamps / none

Description akotov 2015-04-30 09:57:11 UTC
Description of problem:

When adding a new direct LUN to a VM, the operation takes a long time and then times out.


Version-Release number of selected component (if applicable):

vdsm-4.14.18-4.el6ev.x86_64

How reproducible:

Select a VM, then Disks, then Add, then External (Direct LUN) with virtio-scsi and Fibre Channel settings. After a while (some minutes) an empty list is returned.


Steps to Reproduce:
1. Select a VM, then Disks, then Add
2. Select type virtio-scsi and Fibre Channel

Actual results:

After some minutes an empty list is shown

Expected results:

The LUN list is displayed

Additional info:

The delay is caused by LVM operation:

Thread-76::DEBUG::2015-04-28 15:58:49,029::lvm::296::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm pvcreate --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list
_from_udev=0 filter = [ \'a|/dev/mapper/3600507680181870c2000000000000085...--test....
...
Thread-76::DEBUG::2015-04-28 16:01:44,518::lvm::296::Storage.Misc.excCmd::(cmd) FAILED: <err> = '  TEST MODE: Metadata will NOT be updated and volumes will not be (de)activated.\n  Found duplicate

It seems that the call is not used for anything:

vdsm/storage/hsm.py:

    def _getDeviceList(self, storageType=None, guids=None):
        ...
        # Look for devices that will probably fail if pvcreated.
        devNamesToPVTest = tuple(dev["GUID"] for dev in devices)
        unusedDevs, usedDevs = lvm.testPVCreate(
            devNamesToPVTest, metadataSize=blockSD.VG_METADATASIZE)
        # Assuming that unusables v unusables = None
        free = tuple(os.path.basename(d) for d in unusedDevs)
        used = tuple(os.path.basename(d) for d in usedDevs)
        for dev in devices:
            guid = dev['GUID']
            if guid in free:
                dev['status'] = "free"
            elif guid in used:
                dev['status'] = "used"
            else:
                raise KeyError("pvcreate response foresight is "
                               "can not be determined for %s", dev)

        return devices

We can see that we only return the devices; we do not return the free or used
variables, and we do not use the dev variable either. I guess that this part of the
code can be removed. Please confirm, and also offer a workaround until the issue is resolved.

Comment 1 Allon Mureinik 2015-04-30 12:39:10 UTC
Ala, as this week's QA contact, can you take a look please?

Comment 3 Ala Hino 2015-05-04 08:49:22 UTC
Alexandar,

I am not able to reproduce the issue as described.

Please note that the lvm call you referred to as the root cause is *not* related to adding a LUN - 'lvm pvcreate' is invoked when creating a storage domain, not when adding LUNs.

When adding LUNs, I'd look for 'getDeviceList' calls.

Can you isolate the operation and make sure you are not creating storage domains while adding the LUN?

Comment 5 Roman Hodain 2015-05-21 14:18:28 UTC
(In reply to Ala Hino from comment #3)
> Alexandar,
> 
> I am not able to reproduce the issue as described.
> 
> Please note that the lvm call you referred to as the root cause is *not*
> related to adding a LUN - 'lvm pvcreate' is invoked when creating a storage
> domain, not when adding LUNs.
> 
> When adding LUNs, I'd look for 'getDeviceList' calls.
> 
> Can you isolate the operation and make sure you are not creating storage
> domains while adding the LUN?

Hi,

Are you sure? getDeviceList is called any time you need to list the devices available on the hypervisor. If you want to create a direct LUN, you have to call this so you know what LUNs are available.

lvm.testPVCreate is part of this call as it is part of HSM._getDeviceList which is called by HSM.getDeviceList.

Click on Tab "Disks"
Click on "add"
Click on "External (Direct Lun)"
   The engine tries to retrieve the available LUNs and triggers:

Thread-72158::DEBUG::2015-05-21 14:13:44,227::BindingXMLRPC::318::vds::(wrapper) client [10.34.57.135]
Thread-72158::DEBUG::2015-05-21 14:13:44,227::task::595::Storage.TaskManager.Task::(_updateState) Task=`bc8d4989-cb21-4529-9385-b97c449ee6dc`::moving from state init -> state preparing
Thread-72158::INFO::2015-05-21 14:13:44,228::logUtils::44::dispatcher::(wrapper) Run and protect: getDeviceList(storageType=3, options={})
...
Thread-72158::DEBUG::2015-05-21 14:13:45,691::lvm::291::Storage.Misc.excCmd::(cmd) /usr/bin/sudo -n /sbin/lvm pvcreate --config
....


See the code snippet mentioned by Alexandar.

Roman

Comment 6 Ala Hino 2015-05-25 15:51:31 UTC
Zdenek,

We found out that testPVCreate takes almost 3 minutes.
In this deployment there are 183 devices: 141 failed in pvcreate and 42 succeeded.

I attached the command that is executed.

Is this reasonable?

Comment 7 Ala Hino 2015-05-25 15:52:43 UTC
Created attachment 1029508 [details]
Test PV create command

Comment 8 Ala Hino 2015-05-25 15:59:19 UTC
(In reply to Roman Hodain from comment #5)
> (In reply to Ala Hino from comment #3)
> > Alexandar,
> > 
> > I am not able to reproduce the issue as described.
> > 
> > Please note that the lvm call you referred to as the root cause is *not*
> > related to adding a LUN - 'lvm pvcreate' is invoked when creating a storage
> > domain, not when adding LUNs.
> > 
> > When adding LUNs, I'd look for 'getDeviceList' calls.
> > 
> > Can you isolate the operation and make sure you are not creating storage
> > domains while adding the LUN?
> 
> Hi,
> 
> Are you sure? getDeviceList is called any time you need to list the devices
> available on the hypervisor. If you want to create a direct LUN, you have to
> call this so you know what LUNs are available.
> 
> lvm.testPVCreate is part of this call as it is part of HSM._getDeviceList
> which is called by HSM.getDeviceList.
> 
> Click on Tab "Disks"
> Click on "add"
> Click on "External (Direct Lun)"
>    engine tries to retrieve the available LUNs and triggers:
> 
> Thread-72158::DEBUG::2015-05-21
> 14:13:44,227::BindingXMLRPC::318::vds::(wrapper) client [10.34.57.135]
> Thread-72158::DEBUG::2015-05-21
> 14:13:44,227::task::595::Storage.TaskManager.Task::(_updateState)
> Task=`bc8d4989-cb21-4529-9385-b97c449ee6dc`::moving from state init -> state
> preparing
> Thread-72158::INFO::2015-05-21
> 14:13:44,228::logUtils::44::dispatcher::(wrapper) Run and protect:
> getDeviceList(storageType=3, options={})
> ...
> Thread-72158::DEBUG::2015-05-21
> 14:13:45,691::lvm::291::Storage.Misc.excCmd::(cmd) /usr/bin/sudo -n
> /sbin/lvm pvcreate --config
> ....
> 
> 
> See the code snippet mentioned by Alexandar.
> 
> Roman

Hi,

You are correct, testPVCreate is also invoked when performing getDeviceList.
The thing is that in this deployment there are 183 devices and the testPVCreate takes almost 3 minutes, hence the timeout seen in the engine.

The code mentioned in the description is required: it is used to mark which devices are used and which are not. The engine needs this info to warn the admin when trying to add a LUN that is already in use.

Sent needinfo request to Zdenek to see what we can do here.

Comment 9 Zdenek Kabelac 2015-05-25 16:08:08 UTC
Which version of lvm2 is in use here?

Is lvmetad in use?

Also, if you plan to 'pvcreate + vgcreate', it is much more efficient to pass the devices directly to vgcreate (and skip the extra pvcreate step).
Would this help efficiency for now?
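
For illustration, a minimal sketch of what Zdenek suggests, assuming an actual VG creation flow. The VG name is a placeholder and the device paths are copied from the pvcreate log in comment 12; note that vdsm's getDeviceList runs pvcreate in --test mode only to probe usage, not to create anything, as comment 12 clarifies:

    # Sketch of the suggestion in comment 9: let vgcreate initialize the
    # PVs itself instead of running a separate pvcreate pass first.
    # "testvg" is a placeholder VG name.
    import subprocess

    devices = [
        "/dev/mapper/360014056628b5d6642d45f0a53c4415d",
        "/dev/mapper/360014051845d840046e498992e9be252",
    ]

    # vgcreate initializes any listed device that is not yet a PV, so the
    # explicit pvcreate step is redundant when actually creating a VG.
    subprocess.check_call(
        ["sudo", "-n", "/sbin/lvm", "vgcreate", "testvg"] + devices)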

Comment 10 Ala Hino 2015-05-27 10:16:49 UTC
Alexandar,

Can you please reply to Zdenek needinfo?

Comment 11 Roman Hodain 2015-06-04 11:36:45 UTC
It was reported on the following versions so far:
    lvm2-2.02.111-2.el6_6.1.x86_64
    lvm2-2.02.111-2.el6_6.2.x86_64

lvmetad is not in use. It is not enabled by default on RHEL or on RHEV-H.

We can test that with lvmetad enabled.

Comment 12 Roman Hodain 2015-06-13 12:02:04 UTC
I have run a couple of tests in my environment with iSCSI. There are 50 LUNs. The performance is not very high, as all of the LUNs are on a single disk exported via LIO (RHEL 7.1).

The parameter use_lvmetad=1 cannot simply be set, as we pass the option in the --config section of pvcreate, but I have tweaked the code and tested that. Here are the results:

lvmetad enabled:

    # time vdsClient -s 0 getDeviceList >/dev/null 
        real	0m29.070s
        user	0m0.187s
        sys	0m0.046s
    # time vdsClient -s 0 getDeviceList >/dev/null 
        real	0m28.417s
        user	0m0.198s
        sys	0m0.042s
    # time vdsClient -s 0 getDeviceList >/dev/null 
        real	0m30.982s
        user	0m0.195s
        sys	0m0.040s

lvmetad disabled:
    # time vdsClient -s 0 getDeviceList >/dev/null 
        real	2m54.038s
        user	0m0.201s
        sys	0m0.060s
    # time vdsClient -s 0 getDeviceList >/dev/null 
        real	2m55.566s
        user	0m0.193s
        sys	0m0.060s
    # time vdsClient -s 0 getDeviceList >/dev/null 
        real	3m1.949s
        user	0m0.196s
        sys	0m0.062s

So there is a significant improvement, but we cannot use it, as it would be a significant change in the storage stack.
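
For reference, a minimal sketch of the kind of tweak used for this test, assuming the flag can be injected into the same --config string that vdsm already builds (illustrative only, not the shipped code; lvmetad itself must also be running for the flag to take effect):

    # Illustrative sketch of the lvmetad test tweak: inject
    # global/use_lvmetad into the --config string passed to lvm.
    # The devices section is abbreviated; see the full command in the
    # log excerpt in the description.
    LVM_CONF = (
        'devices { preferred_names = ["^/dev/mapper/"] '
        'ignore_suspended_devices=1 write_cache_state=0 '
        'disable_after_error_count=3 obtain_device_list_from_udev=0 } '
        'global { use_lvmetad = 1 }'  # the tweaked setting under test
    )

    # The device paths under test are appended to this command line.
    cmd = ['/usr/bin/sudo', '-n', '/sbin/lvm', 'pvcreate', '--test',
           '--config', LVM_CONF]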

I have also run the pvcreate command manually without lvmetad, and it takes almost the same time as the vdsClient command. What is interesting is how the performance of the command degrades. I have added a timestamp before each line reported by the pvcreate command. All the PVs are on the command line:
$ grep 2015-06-13 pvcreate.out 
    2015-06-13 09:51:02   Physical volume "/dev/mapper/360014056628b5d6642d45f0a53c4415d" successfully created
    2015-06-13 09:51:03   Physical volume "/dev/mapper/360014051845d840046e498992e9be252" successfully created
    2015-06-13 09:51:03   Physical volume "/dev/mapper/36001405c36d824ad312400aa25c07f9b" successfully created
    2015-06-13 09:51:04   Physical volume "/dev/mapper/36001405bc71ffe3507f4e9db5fa17e8b" successfully created
    2015-06-13 09:51:05   Physical volume "/dev/mapper/36001405b6ac01b3cb1346448a86168b6" successfully created
    2015-06-13 09:51:06   Physical volume "/dev/mapper/360014053f20c8b48bc74e9fa94abce3c" successfully created
    2015-06-13 09:51:07   Physical volume "/dev/mapper/3600140524b0f150c17043b4a06006cf9" successfully created
    2015-06-13 09:51:08   Physical volume "/dev/mapper/3600140547d444218c574c61bf2102920" successfully created
    2015-06-13 09:51:09   Physical volume "/dev/mapper/360014058b44394769264c85b5092f717" successfully created
    2015-06-13 09:51:10   Physical volume "/dev/mapper/3600140588e25e0d887e4b7eaa5bb0aa3" successfully created
    2015-06-13 09:51:12   Physical volume "/dev/mapper/36001405925e187b37024b10a3cd30cfa" successfully created
    2015-06-13 09:51:12   Physical volume "/dev/mapper/360014059340348152274828aa152c894" successfully created
    2015-06-13 09:51:13   Physical volume "/dev/mapper/360014054959fd6b9d084be18826a025a" successfully created
    2015-06-13 09:51:14   Physical volume "/dev/mapper/36001405445bb63ae57d403e973f09cf2" successfully created
    2015-06-13 09:51:16   Physical volume "/dev/mapper/36001405946e227c6c424659b5d6c20ea" successfully created
    2015-06-13 09:51:17   Physical volume "/dev/mapper/3600140544a62a65ee3f455f9c0d72374" successfully created
    2015-06-13 09:51:18   Physical volume "/dev/mapper/360014050e40f9deb69d4d29918708173" successfully created
    2015-06-13 09:51:21   Physical volume "/dev/mapper/360014057deb77fec0e04de6a65f4ddd5" successfully created
    2015-06-13 09:51:22   Physical volume "/dev/mapper/36001405d3e3ebdab1544383b11391de6" successfully created
    2015-06-13 09:51:24   Physical volume "/dev/mapper/36001405f66e726ac84e4d0798b646e25" successfully created
    2015-06-13 09:51:27   Physical volume "/dev/mapper/36001405b76a1eef0c5f4c78a481fe501" successfully created
    2015-06-13 09:51:32   Physical volume "/dev/mapper/36001405c61c377577d04b29838ec458c" successfully created
    2015-06-13 09:51:36   Physical volume "/dev/mapper/36001405c14320db62c242088820cd568" successfully created
    2015-06-13 09:51:40   Physical volume "/dev/mapper/360014055af6a25fce3b4c7e8af621253" successfully created
    2015-06-13 09:51:44   Physical volume "/dev/mapper/3600140548ccb367d0494b24888334fb6" successfully created
    2015-06-13 09:51:48   Physical volume "/dev/mapper/36001405c639f497b715425486a3dfaad" successfully created
    2015-06-13 09:51:53   Physical volume "/dev/mapper/36001405cc6ea3749a524478b86d23626" successfully created
    2015-06-13 09:51:58   Physical volume "/dev/mapper/3600140539e18caa79ee43069598b4f25" successfully created
    2015-06-13 09:52:08   Physical volume "/dev/mapper/36001405839725329b92472fbf5911851" successfully created
    2015-06-13 09:52:13   Physical volume "/dev/mapper/360014057c4ae88d66d3455b80babcbc4" successfully created
    2015-06-13 09:52:19   Physical volume "/dev/mapper/360014050814d94d749747a1b3dac343f" successfully created
    2015-06-13 09:52:47   Physical volume "/dev/mapper/3600140563afa5c20c1740aba1dccc608" successfully created
    2015-06-13 09:53:08   Physical volume "/dev/mapper/360014057b5533bb930d4973b8a4664bf" successfully created
    2015-06-13 09:53:15   Physical volume "/dev/mapper/360014054eb95ba99c04429f9b666e082" successfully created
    2015-06-13 09:53:21   Physical volume "/dev/mapper/360014050f4642d00f904ddf8e79e3ab2" successfully created
    2015-06-13 09:53:27   Physical volume "/dev/mapper/3600140569245e0a199344d0ae7464062" successfully created
    2015-06-13 09:53:34   Physical volume "/dev/mapper/360014056cd8cd6a1ee44910a859f7391" successfully created
    2015-06-13 09:53:40   Physical volume "/dev/mapper/360014055db686a56c51465ca8fa66412" successfully created
    2015-06-13 09:53:47   Physical volume "/dev/mapper/360014053de8438cb7de4b6484b4d5db5" successfully created
    2015-06-13 09:53:54   Physical volume "/dev/mapper/3600140554a49dcf28aa4d0a96c0ea57a" successfully created

You can see that the first couple of PVs finished much faster than the later ones. I guess this should be explained by the LVM guys. How can we fix this (at the vdsm level or the LVM level)?

BTW, regarding Zdenek's question: we pass all of the LUNs to the command, as we are testing whether they are already in use or not.
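
For context, the usage probe relies on a single 'pvcreate --test' run over all devices: devices the test would refuse to initialize are considered used. A rough sketch of that classification, keyed off the "successfully created" lines visible in the log above (an illustration only, not vdsm's actual parser in lvm.testPVCreate):

    # Rough sketch (not vdsm's actual parser): classify devices by whether
    # a single 'pvcreate --test' run reports them as successfully created.
    import re
    import subprocess

    def classify(devices):
        cmd = ["sudo", "-n", "/sbin/lvm", "pvcreate", "--test"] + devices
        result = subprocess.run(cmd, capture_output=True, text=True)
        created = set(re.findall(
            r'Physical volume "([^"]+)" successfully created',
            result.stdout + result.stderr))
        free = [d for d in devices if d in created]      # safe to pvcreate
        used = [d for d in devices if d not in created]  # would fail pvcreate
        return free, used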

Comment 13 Roman Hodain 2015-06-13 12:03:38 UTC
Created attachment 1038248 [details]
pvcreate command run with the timestamps

Comment 14 Allon Mureinik 2015-06-14 14:39:50 UTC
Any change here will be too substantial and/or risky for 3.5.4.
Pushing out to 3.5.5 to reconsider.

Comment 18 Allon Mureinik 2015-10-07 07:49:28 UTC
Fred, this patch modifies getDeviceList's API, and should actually speed up several flows, not just adding a LUN disk.

Could you add some doctext detailing where we expect to see a performance gain?
Thanks!
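
For reference, a minimal sketch of the shape of the change under discussion, assuming a checkStatus flag that gates the PV create test. Only the flag name comes from the verb help shown in comment 23; the helper functions here are hypothetical stand-ins, not the actual vdsm patch:

    # Sketch only: a checkStatus flag gating the expensive PV create test.
    # The helpers are hypothetical scaffolding; the vdsm patch is authoritative.

    def scanDevices(storageType, guids):
        # Stand-in for the real multipath/device enumeration.
        return [{"GUID": g, "status": "unknown"} for g in guids]

    def markDeviceStatus(devices):
        # Stand-in for the single 'pvcreate --test' classification pass.
        for dev in devices:
            dev["status"] = "free"  # placeholder result

    def getDeviceList(storageType=None, guids=(), checkStatus=True):
        devices = scanDevices(storageType, guids)
        if checkStatus:
            # Only when asked do we pay for the PV create test.
            markDeviceStatus(devices)
        return devices

Defaulting checkStatus to True would preserve the old behavior for callers that do not pass the flag, which matches the backward-compatibility note in comment 26.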

Comment 19 Allon Mureinik 2015-10-08 21:56:37 UTC
Fred, if I'm reading the code correctly, I'd expect the following non-GUI flows to improve too:

- Add an iSCSI/FCP domain via REST (see BackendStorageDomainsResource)

- Import an iSCSI/FCP domain via GUI or REST (see GetUnregisteredBlockStorageDomainsQuery, BackendStorageDomainsResource)

- Adding a LUN disk via REST if a <host> parameter is passed (if it isn't, the LUN isn't validated, so there's no effect - see AddDiskCommand and DiskValidator).


Please confirm (and add to the doctext) or refute (and explain)

Comment 20 Eyal Edri 2015-10-14 14:52:39 UTC
Fixing wrong Fixed In Version/build_id.

Comment 21 mlehrer 2016-02-11 16:47:13 UTC
Using:
    RHEVM: 3.6.3.1-0.1.el6
    VDSM: 3.6.3-2

with a host connected to storage with 183 possible LUNs.
The VM successfully attached a direct LUN and showed the LUN list.

Details:
25s total time from clicking "direct lun" to showing the New Virtual Disk window
2min 4s total time to attach a direct LUN with 183 LUNs available to the host

# time vdsClient -s 0 getDeviceList >/dev/null
real    1m32.897s
user    0m1.013s
sys     0m0.048s

# time vdsClient -s 0 getDeviceList >/dev/null
real    1m29.455s
user    0m1.043s
sys     0m0.056s

Please let me know if additional info is needed for this test case or if you would like to review the VDSM log from this test.

Comment 23 Fred Rolland 2016-02-14 09:06:38 UTC
(In reply to mlehrer from comment #21)

In order to test the change introduced, you should use getDeviceList this way:

getDeviceList ISCSI False
or
getDeviceList FCP False


From vdsClient :
getDeviceList
	[storageType]
	[checkStatus]
	[<devlist>]
	List of all block devices (optionally - matching storageType,
	optionally - check device status of each device listed).
	    getDeviceList
	        return all devices
	    getDeviceList FCP
	        return only FCP devices
	    getDeviceList FCP True
	        return only FCP devices and perform PV creation test
	    getDeviceList ISCSI False guid1 guid2
	        return info for guid1 and guid2, assuming ISCSI type
	        and skip PV creation test

Comment 24 mlehrer 2016-02-14 13:32:22 UTC
total LUNs available: 383
         total paths: 1,532

The additional LUNs were added in order to increase the total number of paths, which was previously 732 and is now 1,532.


time vdsClient -s 0 getDeviceList ISCSI False > /dev/null

real    0m28.467s
user    0m1.973s
sys     0m0.047s

time vdsClient -s 0 getDeviceList FCP False > /dev/null                                                                                                                                    

real    0m27.981s
user    0m0.110s
sys     0m0.040s


and just for comparison's sake:
time vdsClient -s 0 getDeviceList > /dev/null

real    9m37.496s
user    0m1.905s
sys     0m0.047s


UI:
39s to display the "New Virtual Disk" dialog for a direct LUN disk
34s to add a disk of type direct LUN to the VM

Comment 25 Yaniv Lavi 2016-02-14 13:40:47 UTC
I would say this is reasonable with so many devices present. I think we can do better, but at least now it doesn't time out.

Freddy, are there any flows where we still run the equivalent of "time vdsClient -s 0 getDeviceList > /dev/null" that we should test? This could create a lot of issues, since it is so heavy.

Comment 26 Fred Rolland 2016-02-14 14:06:38 UTC
For DC level 3.6, the only place we still do the PV creation test when executing the GetDeviceList verb is through the REST API.
When getting the LUN list, we need to provide the "Status" of each LUN.

It was kept for backward compatibility. Note that in scale setups, this REST API call will time out as before.

Comment 27 Yaniv Lavi 2016-02-14 14:24:46 UTC
(In reply to Fred Rolland from comment #26)
> For DC level 3.6, the only place we still do the PV creation test when
> executing the GetDeviceList verb is through the REST API.
> When getting the LUN list, we need to provide the "Status" of each LUN.
> 
> It was kept for backward compatibility. Note that in scale setups, this REST
> API call will time out as before.

Did we provide any alternative to this REST call that will work at scale?

Comment 28 Fred Rolland 2016-02-14 14:38:57 UTC
Not currently.

Comment 29 Yaniv Lavi 2016-02-14 14:52:10 UTC
Created BZ #1308350 to track API improvement.

Comment 33 errata-xmlrpc 2016-03-09 19:38:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0362.html