Description of problem:
When running tests we see a significant difference in execution time between environments with rhels and environments with rhvhs. Note: the difference appears as soon as the rhvhs are connected, even without running anything on them (the tests only use another rhel host which is also in the environment).

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Create a setup with only rhels.
2. Create a setup with at least one rhvh.
3. Run some REST request (we saw it in all requests, but the major differences were seen in refresh capabilities & setup networks) and compare the response time.

Actual results:
I have multiple examples in the attached logs, but for example:

SEARCH:
2016-11-03 17:45:01,062 - MainThread - clusters - DEBUG - SEARCH request content is -- url:https://jenkins-vm-12.scl.lab.tlv.redhat.com/ovirt-engine/api/clusters?search=name%3Dgolden_env_mixed_1
2016-11-03 17:45:01,492 - MainThread - core_api - DEBUG - Request GET response time: 0.320
2016-11-10 16:49:50,374 - MainThread - clusters - DEBUG - SEARCH request content is -- url:https://jenkins-vm-12.scl.lab.tlv.redhat.com/ovirt-engine/api/clusters?search=name%3Dgolden_env_mixed_1
2016-11-10 16:49:50,593 - MainThread - core_api - DEBUG - Request GET response time: 0.140

PUT:
2016-11-03 18:01:32,851 - MainThread - macpools - INFO - Using Correlation-Id: macpools_update_4a69df48-ad13-45aa
2016-11-03 18:01:33,207 - MainThread - core_api - DEBUG - Request PUT response time: 0.400
2016-11-10 17:03:04,004 - MainThread - macpools - INFO - Using Correlation-Id: macpools_update_1cd61eb2-5d3f-47ea
2016-11-10 17:03:04,293 - MainThread - core_api - DEBUG - Request PUT response time: 0.160

DELETE:
2016-11-03 18:01:36,571 - MainThread - datacenters - DEBUG - DELETE request content is -- url:/ovirt-engine/api/datacenters/fdb4d54a-453d-4f1e-bbcb-64c4647fd8cf
2016-11-03 18:01:37,146 - MainThread - core_api - DEBUG - Request DELETE response time: 0.640
2016-11-10 17:03:06,612 - MainThread - datacenters - DEBUG - DELETE request content is -- url:/ovirt-engine/api/datacenters/5ce11483-317b-435d-a93c-a7f6ce53ca50
2016-11-10 17:03:07,087 - MainThread - core_api - DEBUG - Request DELETE response time: 0.160

Expected results:
Response time should be the same, no matter which host type is connected to the engine.

Additional info:
With Roy's help we were able to see that when running "explain analyze select * from vds;", the response time with rhvhs connected is much longer than with only rhels connected: with rhels it is around 2 ms (actually under 2 ms most of the time), with rhvhs it is around 2.8 ms and sometimes above 3 ms.

We mostly executed network tier1 tests. At the beginning we saw that on the rhels environment the tests took 23 minutes, while on an environment with a rhevh they took 30 minutes.

We tried stopping the dwh service to see if it has an effect, but it appears that it doesn't - the last executions with and without it on an environment with rhevhs took 27-28 minutes.
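For anyone trying to reproduce the two measurements above, here is a minimal sketch (not the actual test framework) that times one REST call and one "explain analyze select * from vds;" round trip. The engine hostname, credentials, database name and the insecure TLS handling are placeholders/assumptions for a lab setup.

```python
# Sketch only: time a REST GET against the oVirt API and run EXPLAIN ANALYZE
# on the engine database. All connection details below are placeholders.
import time

import psycopg2   # pip install psycopg2-binary
import requests

ENGINE = "https://engine.example.com/ovirt-engine/api"
AUTH = ("admin@internal", "password")            # placeholder credentials

start = time.time()
resp = requests.get(f"{ENGINE}/clusters",
                    params={"search": "name=golden_env_mixed_1"},
                    auth=AUTH, verify=False)      # verify=False only for a test lab
print(f"GET /clusters: {resp.status_code}, {time.time() - start:.3f}s")

conn = psycopg2.connect(host="engine.example.com", dbname="engine",
                        user="engine", password="password")
with conn.cursor() as cur:
    cur.execute("EXPLAIN ANALYZE SELECT * FROM vds;")
    for row in cur.fetchall():
        print(row[0])    # the last lines include planning and execution time
conn.close()
```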
Created attachment 1221525 [details] execution with rhevh
Created attachment 1221526 [details] execution with rhels
Can you try to narrow it down, and check whether specific parts took longer? Any other differences between the two runs?
We see it in each REST API call; the time until the engine answers is just longer.
Please supply the engine.log with DEBUG enabled so we can spot where the time is spent.
Created attachment 1223056 [details] debug logs
From the log analysis, I counted 1204 occurrences where "The exclusive lock for key *** is released and lock is removed from map" is followed by a pause of roughly 3 seconds before the next operation, e.g.:

2016-11-16 03:06:11,838 DEBUG [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler10) [62e8b35a] The exclusive lock for key 'e8933e9f-665e-4186-80bf-f17e94c44e8cVDS_INIT' is released and lock is removed from map
2016-11-16 03:06:14,139 DEBUG [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler8) [36c519c3] Before acquiring lock 'EngineLock:{exclusiveLocks='[fec8135e-5910-4412-bc9a-bae12f1fa757=<VDS_INIT, >]', sharedLocks='null'}'
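As a rough illustration, a count like the one above could be reproduced from engine.log with a small script; the timestamp format matches the excerpts in this bug, while the log path and the 2-second threshold (the gap shown above is about 2.3 s) are assumptions, not the tool that was actually used.

```python
# Sketch: count gaps between an exclusive-lock release and the next
# InMemoryLockManager line in engine.log (DEBUG level).
import re
from datetime import datetime

TS = "%Y-%m-%d %H:%M:%S,%f"
line_re = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) DEBUG "
    r"\[org\.ovirt\.engine\.core\.bll\.lock\.InMemoryLockManager\].*?"
    r"(released and lock is removed from map|Before acquiring lock)")

stalls = 0
release_ts = None
with open("engine.log") as f:
    for line in f:
        m = line_re.match(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), TS)
        if "released" in m.group(2):
            release_ts = ts
        elif release_ts is not None:
            if (ts - release_ts).total_seconds() >= 2.0:
                stalls += 1
            release_ts = None

print(f"lock release -> next acquisition gaps >= 2s: {stalls}")
```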
Could it be related to gluster? In the logs I see:

2016-11-15 06:12:45,946 DEBUG [org.ovirt.engine.core.bll.gluster.GlusterSyncJob] (DefaultQuartzScheduler3) [4f6836ab] Refreshing Gluster Data [lightweight]
2016-11-15 06:12:45,946 DEBUG [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresJdbcTemplate] (DefaultQuartzScheduler3) [4f6836ab] Executing prepared SQL query
2016-11-15 06:12:45,946 DEBUG [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresJdbcTemplate] (DefaultQuartzScheduler3) [4f6836ab] Executing prepared SQL statement [select * from getallfromcluster(?, ?)]
2016-11-15 06:12:46,548 DEBUG [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler8) [66ca7030] Before acquiring lock 'EngineLock:{exclusiveLocks='[e8933e9f-665e-4186-80bf-f17e94c44e8c=<VDS_INIT, >]', sharedLocks='null'}'
2016-11-15 06:12:46,548 DEBUG [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler8) [66ca7030] Success acquiring lock 'EngineLock:{exclusiveLocks='[e8933e9f-665e-4186-80bf-f17e94c44e8c=<VDS_INIT, >]', sharedLocks='null'}' succeeded
2016-11-15 06:12:46,548 DEBUG [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresJdbcTemplate] (DefaultQuartzScheduler8) [66ca7030] Executing prepared SQL query
2016-11-15 06:12:46,548 DEBUG [org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresJdbcTemplate] (DefaultQuartzScheduler8) [66ca7030] Executing prepared SQL statement [select * from getvdsbyvdsid(?, ?, ?)]
2016-11-15 06:12:46,554 DEBUG [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler8) [66ca7030] Before releasing a lock 'EngineLock:{exclusiveLocks='[e8933e9f-665e-4186-80bf-f17e94c44e8c=<VDS_INIT, >]', sharedLocks='null'}'
2016-11-15 06:12:46,554 DEBUG [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler8) [66ca7030] The exclusive lock for key 'e8933e9f-665e-4186-80bf-f17e94c44e8cVDS_INIT' is released and lock is removed from map
2016-11-15 06:12:49,555 DEBUG [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler5) [7535ab8a] Before acquiring lock 'EngineLock:{exclusiveLocks='[e8933e9f-665e-4186-80bf-f17e94c44e8c=<VDS_INIT, >]', sharedLocks='null'}'
2016-11-15 06:12:49,555 DEBUG [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (DefaultQuartzScheduler5) [7535ab8a] Success acquiring lock 'EngineLock:{exclusiveLocks='[e8933e9f-665e-4186-80bf-f17e94c44e8c=<VDS_INIT, >]', sharedLocks='null'}' succeeded

So the locking mentioned in comment 7 seems to be accompanied by the gluster refresh.

Nelly, could you please provide the logs of a good run? (That is, the RHEL logs for the same run you used to gather the logs in comment 6.)
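One way to test this suspicion would be to cross-reference the long lock gaps found by the previous sketch with the GlusterSyncJob refresh timestamps; the following is a hypothetical helper, with the log path and the 5-second correlation window assumed, not something shipped with the engine.

```python
# Sketch: check how many long lock gaps (release -> next acquisition >= 2s)
# are preceded by a "Refreshing Gluster Data" line within a few seconds.
import re
from datetime import datetime, timedelta

TS = "%Y-%m-%d %H:%M:%S,%f"
ts_re = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")

refreshes, stalls = [], []
release_ts = None
with open("engine.log") as f:
    for line in f:
        m = ts_re.match(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), TS)
        if "Refreshing Gluster Data" in line:
            refreshes.append(ts)
        elif "is released and lock is removed from map" in line:
            release_ts = ts
        elif "Before acquiring lock" in line and release_ts is not None:
            if (ts - release_ts).total_seconds() >= 2.0:
                stalls.append(release_ts)
            release_ts = None

window = timedelta(seconds=5)
correlated = sum(1 for s in stalls if any(s - window <= r <= s for r in refreshes))
print(f"{correlated}/{len(stalls)} long lock gaps preceded by a gluster refresh")
```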
We have the same gluster version on rhels & rhvhs: glusterfs-3.7.9-12.el7.x86_64
Created attachment 1226750 [details] Engine log on rhels only
Not sure whether this still reproduces, and we haven't identified any change that might cause it. Closing as WONTFIX.