Red Hat Bugzilla – Bug 854438
very big number of guest systems registered causes long responses on Details page of their host
Last modified: 2017-11-28 03:31:31 EST
Description of problem:
I have virtualization host (Xen in my case) with ~2300 registered guests (to be honest, it is just one host registered this may times). Accessing Details page of their host takes too long.
Version-Release number of selected component (if applicable):
This was reproduced on Satellite, but expected to be broken on current SW as well.
Steps to Reproduce:
1. Have a registered virt. host with a guest and register it multiple times
2. Make sure they appear in host's Virtulzation tab
3. Reload host's Details page and observe page load time
For ~2300 guests it took about 50 seconds on 1.3GHz x 24 cores, 4GB RAM machine.
Page load time should be independent on number of guests registered.
When I have tried to remove some systems and measure again as per tlestach's idea, these were the numbers:
0 guests deleted (i.e. ~2300 guests registered): 68 50 50 (3 attempmts to reload Details page, numbers are seconds)
500 guests deleted: 49 49 49
1000 guests deleted: 50 30 29
1500 guests deleted: 19 20 19
2000 guests deleted: 9 9 9
all guests deleted: 2 1 2
Can you reproduce on Spacewalk with PostgreSQL? Can you use select from the pg_stat_activity table to see what select it hangs on?
I have used system with 16 CPUs and 24 GB RAM and configured this in postgresql.conf (and restarted DB):
log_min_messages = info
log_min_duration_statement = 0
and used host with 1000 guests registered (actually one guest registered 1000 times). To load its system details page it takes 16s. To load system details page of another system without guests it takes 2 seconds.
Comparing how many lines are generated in PostgreSQL logs when refreshing page with system details:
system without guests: 1028
system with 1000 guests: 21138
I'll attach log for that big guest.
Is this still an active issue?
Looks like it is OK now. Tested on Satellite with 4 CPUs and 32 GB RAM with:
and these are the timings (I have noticed page under "Provisioning" is slow now, so measured it as well):
Detail [s]: 2.1 3.6 3.0 3.5 3.2
Provisioning [s]: 17.3 18.6 18.2 19.4 20.0
Detail [s]: 3.2 2.6 2.6 3.0 2.5
Provisioning [s]: 13.2 13.6 15.4 15.9 15.2
Detail [s]: 3.3 3.0 3.1 2.8 3.5
Provisioning [s]: 6.8 9.7 8.5 9.3 9.6
25 guests (4 without profiles):
Detail [s]: 2.7 2.8 2.7 2.7 2.5
Provisioning [s]: 3.8 3.9 3.7 3.1 3.6
Load speeds of "Performance" tab page seems very linear. Also when I had "log_min_duration_statement = 500" in PostgreSQL log, it did not logged any long query when reloading that page. So IMO this is nothing to report.
Noticed this slow query when displaying Virtualization -> Details tab, but it is probably OK (given this was for 1306 guests while you will have tenths of guests per one host at maximum):
2017-11-28 02:26:25.806 EST LOG: duration: 1288.037 ms execute <unnamed>: SELECT
COALESCE(VII.name, '(none)') AS NAME,
COALESCE(VIS.name, '(unknown)') AS STATE_NAME,
COALESCE(VIS.label, 'unknown') AS STATE_LABEL,
COALESCE(VII.vcpus, 0) AS VCPUS,
COALESCE(VII.memory_size_k, 0) AS MEMORY,
COALESCE(((SELECT 1 FROM rhnUserServerPerms USP WHERE USP.user_id = $1 AND USP.server_id = VI.virtual_system_id)), 0 ) as accessible,
rhn_channel.user_role_check((select channel_id from rhnServerOverview where server_id = VI.virtual_system_id), $2, 'subscribe') AS subscribable,
1 AS selectable
FROM rhnVirtualInstance VI
LEFT OUTER JOIN rhnVirtualInstanceInfo VII ON VI.id = VII.instance_id
LEFT OUTER JOIN rhnVirtualInstanceState VIS ON VII.state = VIS.id
WHERE EXISTS (SELECT 1
FROM rhnUserServerPerms USP
WHERE USP.user_id = $3
AND USP.server_id = VI.host_system_id)
AND VI.host_system_id = $4
AND VI.uuid IS NOT NULL
ORDER BY NAME, VI.virtual_system_id, VI.uuid DESC
2017-11-28 02:26:25.806 EST DETAIL: parameters: $1 = '1', $2 = '1', $3 = '1', $4 = '1000021025'