Bug 854438

Summary: very big number of guest systems registered causes long responses on Details page of their host
Product: [Community] Spacewalk Reporter: Jan Hutař <jhutar>
Component: WebUIAssignee: Tomas Lestach <tlestach>
Status: CLOSED CURRENTRELEASE QA Contact: Red Hat Satellite QA List <satqe-list>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 1.8CC: jhutar
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-11-28 08:31:31 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1564160    

Description Jan Hutař 2012-09-05 04:02:40 UTC
Description of problem:
I have virtualization host (Xen in my case) with ~2300 registered guests (to be honest, it is just one host registered this may times). Accessing Details page of their host takes too long.


Version-Release number of selected component (if applicable):
This was reproduced on Satellite, but expected to be broken on current SW as well.


How reproducible:
always


Steps to Reproduce:
1. Have a registered virt. host with a guest and register it multiple times
2. Make sure they appear in host's Virtulzation tab
3. Reload host's Details page and observe page load time


Actual results:
For ~2300 guests it took about 50 seconds on 1.3GHz x 24 cores, 4GB RAM machine.


Expected results:
Page load time should be independent on number of guests registered.


Additional info:
When I have tried to remove some systems and measure again as per tlestach's idea, these were the numbers:
0 guests deleted (i.e. ~2300 guests registered): 68 50 50 (3 attempmts to reload Details page, numbers are seconds)
500 guests deleted: 49 49 49
1000 guests deleted: 50 30 29
1500 guests deleted: 19 20 19
2000 guests deleted: 9 9 9
all guests deleted: 2 1 2

Comment 1 Jan Pazdziora 2013-03-08 15:14:31 UTC
Can you reproduce on Spacewalk with PostgreSQL? Can you use select from the pg_stat_activity table to see what select it hangs on?

Comment 2 Jan Hutař 2013-04-22 11:04:04 UTC
I have used system with 16 CPUs and 24 GB RAM and configured this in postgresql.conf (and restarted DB):

log_min_messages = info
log_min_duration_statement = 0

and used host with 1000 guests registered (actually one guest registered 1000 times). To load its system details page it takes 16s. To load system details page of another system without guests it takes 2 seconds.

Comparing how many lines are generated in PostgreSQL logs when refreshing page with system details:

system without guests: 1028
system with 1000 guests: 21138

I'll attach log for that big guest.

Comment 5 Jan Pazdziora 2017-10-18 07:41:45 UTC
Is this still an active issue?

Comment 6 Jan Hutař 2017-11-28 08:31:31 UTC
Looks like it is OK now. Tested on Satellite with 4 CPUs and 32 GB RAM with:

    satellite-schema-5.8.0.37-1.el6sat.noarch
    spacewalk-java-2.5.14-106.el6sat.noarch

and these are the timings (I have noticed page under "Provisioning" is slow now, so measured it as well):

1306 guests:
    Detail [s]: 2.1 3.6 3.0 3.5 3.2
    Provisioning [s]: 17.3 18.6 18.2 19.4 20.0

1000 guests:
    Detail [s]: 3.2 2.6 2.6 3.0 2.5
    Provisioning [s]: 13.2 13.6 15.4 15.9 15.2

500 guests:
    Detail [s]: 3.3 3.0 3.1 2.8 3.5
    Provisioning [s]: 6.8 9.7 8.5 9.3 9.6

25 guests (4 without profiles):
    Detail [s]: 2.7 2.8 2.7 2.7 2.5
    Provisioning [s]: 3.8 3.9 3.7 3.1 3.6

Load speeds of "Performance" tab page seems very linear. Also when I had "log_min_duration_statement = 500" in PostgreSQL log, it did not logged any long query when reloading that page. So IMO this is nothing to report.

Noticed this slow query when displaying Virtualization -> Details tab, but it is probably OK (given this was for 1306 guests while you will have tenths of guests per one host at maximum):

2017-11-28 02:26:25.806 EST LOG:  duration: 1288.037 ms  execute <unnamed>: SELECT
	  DISTINCT VI.id,
	           VI.host_system_id,
	           VI.virtual_system_id,
	           VI.uuid,
	         COALESCE(VII.name, '(none)') AS NAME,
	         COALESCE(VIS.name, '(unknown)') AS STATE_NAME,
	         COALESCE(VIS.label, 'unknown') AS STATE_LABEL,
	     COALESCE(VII.vcpus, 0) AS VCPUS,
	     COALESCE(VII.memory_size_k, 0) AS MEMORY,
	     COALESCE(((SELECT 1 FROM rhnUserServerPerms USP WHERE USP.user_id = $1 AND USP.server_id = VI.virtual_system_id)), 0 ) as accessible,
	         rhn_channel.user_role_check((select channel_id from rhnServerOverview where server_id = VI.virtual_system_id), $2, 'subscribe') AS subscribable,
	         1 AS selectable
	    FROM rhnVirtualInstance VI
	         LEFT OUTER JOIN rhnVirtualInstanceInfo VII ON VI.id = VII.instance_id
	         LEFT OUTER JOIN rhnVirtualInstanceState VIS ON VII.state = VIS.id
	   WHERE EXISTS (SELECT 1
	                   FROM rhnUserServerPerms USP
	                  WHERE USP.user_id = $3
	                    AND USP.server_id = VI.host_system_id)
	     AND VI.host_system_id = $4
	     AND VI.uuid IS NOT NULL
	ORDER BY NAME, VI.virtual_system_id, VI.uuid DESC
2017-11-28 02:26:25.806 EST DETAIL:  parameters: $1 = '1', $2 = '1', $3 = '1', $4 = '1000021025'