Description of problem: I have virtualization host (Xen in my case) with ~2300 registered guests (to be honest, it is just one host registered this may times). Accessing Details page of their host takes too long. Version-Release number of selected component (if applicable): This was reproduced on Satellite, but expected to be broken on current SW as well. How reproducible: always Steps to Reproduce: 1. Have a registered virt. host with a guest and register it multiple times 2. Make sure they appear in host's Virtulzation tab 3. Reload host's Details page and observe page load time Actual results: For ~2300 guests it took about 50 seconds on 1.3GHz x 24 cores, 4GB RAM machine. Expected results: Page load time should be independent on number of guests registered. Additional info: When I have tried to remove some systems and measure again as per tlestach's idea, these were the numbers: 0 guests deleted (i.e. ~2300 guests registered): 68 50 50 (3 attempmts to reload Details page, numbers are seconds) 500 guests deleted: 49 49 49 1000 guests deleted: 50 30 29 1500 guests deleted: 19 20 19 2000 guests deleted: 9 9 9 all guests deleted: 2 1 2
Can you reproduce on Spacewalk with PostgreSQL? Can you use select from the pg_stat_activity table to see what select it hangs on?
I have used system with 16 CPUs and 24 GB RAM and configured this in postgresql.conf (and restarted DB): log_min_messages = info log_min_duration_statement = 0 and used host with 1000 guests registered (actually one guest registered 1000 times). To load its system details page it takes 16s. To load system details page of another system without guests it takes 2 seconds. Comparing how many lines are generated in PostgreSQL logs when refreshing page with system details: system without guests: 1028 system with 1000 guests: 21138 I'll attach log for that big guest.
Is this still an active issue?
Looks like it is OK now. Tested on Satellite with 4 CPUs and 32 GB RAM with: satellite-schema-5.8.0.37-1.el6sat.noarch spacewalk-java-2.5.14-106.el6sat.noarch and these are the timings (I have noticed page under "Provisioning" is slow now, so measured it as well): 1306 guests: Detail [s]: 2.1 3.6 3.0 3.5 3.2 Provisioning [s]: 17.3 18.6 18.2 19.4 20.0 1000 guests: Detail [s]: 3.2 2.6 2.6 3.0 2.5 Provisioning [s]: 13.2 13.6 15.4 15.9 15.2 500 guests: Detail [s]: 3.3 3.0 3.1 2.8 3.5 Provisioning [s]: 6.8 9.7 8.5 9.3 9.6 25 guests (4 without profiles): Detail [s]: 2.7 2.8 2.7 2.7 2.5 Provisioning [s]: 3.8 3.9 3.7 3.1 3.6 Load speeds of "Performance" tab page seems very linear. Also when I had "log_min_duration_statement = 500" in PostgreSQL log, it did not logged any long query when reloading that page. So IMO this is nothing to report. Noticed this slow query when displaying Virtualization -> Details tab, but it is probably OK (given this was for 1306 guests while you will have tenths of guests per one host at maximum): 2017-11-28 02:26:25.806 EST LOG: duration: 1288.037 ms execute <unnamed>: SELECT DISTINCT VI.id, VI.host_system_id, VI.virtual_system_id, VI.uuid, COALESCE(VII.name, '(none)') AS NAME, COALESCE(VIS.name, '(unknown)') AS STATE_NAME, COALESCE(VIS.label, 'unknown') AS STATE_LABEL, COALESCE(VII.vcpus, 0) AS VCPUS, COALESCE(VII.memory_size_k, 0) AS MEMORY, COALESCE(((SELECT 1 FROM rhnUserServerPerms USP WHERE USP.user_id = $1 AND USP.server_id = VI.virtual_system_id)), 0 ) as accessible, rhn_channel.user_role_check((select channel_id from rhnServerOverview where server_id = VI.virtual_system_id), $2, 'subscribe') AS subscribable, 1 AS selectable FROM rhnVirtualInstance VI LEFT OUTER JOIN rhnVirtualInstanceInfo VII ON VI.id = VII.instance_id LEFT OUTER JOIN rhnVirtualInstanceState VIS ON VII.state = VIS.id WHERE EXISTS (SELECT 1 FROM rhnUserServerPerms USP WHERE USP.user_id = $3 AND USP.server_id = VI.host_system_id) AND VI.host_system_id = $4 AND VI.uuid IS NOT NULL ORDER BY NAME, VI.virtual_system_id, VI.uuid DESC 2017-11-28 02:26:25.806 EST DETAIL: parameters: $1 = '1', $2 = '1', $3 = '1', $4 = '1000021025'