+++ This bug was initially created as a clone of Bug #1115149 +++ Description of problem: ++++++++++++++++++++++++ This issue is similar to the bug #906157. This customer had a RHN Satellite 5.4.1 and as suggested by the update https://bugzilla.redhat.com/show_bug.cgi?id=906157#c25. ++++++++++++++++++++++++ We were able to reproduce the issue internally. Even after upgrading to the RHN Satellite 5.6 + PostgreSQL the issue still continues. We need to provide to the customer a solution for this. ======== # rhn-schema-version 5.6.0.18-1.el5sat started remove the file ----------------- ssl_access_log: - - [01/Jul/2014:15:43:33 -0400] "POST /rhn/configuration/ChannelFilesSubmit.do?ccid=21 HTTP/1.1" 200 11442 postgresql.log ----- 2014-07-01 15:44:29.228 EDT LOG: checkpoints are occurring too frequently (24 seconds apart) 2014-07-01 15:44:29.228 EDT HINT: Consider increasing the configuration parameter "checkpoint_segments". 2014-07-01 15:44:43.050 EDT LOG: checkpoints are occurring too frequently (14 seconds apart) 2014-07-01 15:44:43.050 EDT HINT: Consider increasing the configuration parameter "checkpoint_segments". -------------- checkpoint_segments (integer) Maximum number of log file segments between automatic WAL checkpoints (each segment is normally 16 megabytes). The default is three segments. Increasing this parameter can increase the amount of time needed for crash recovery. This parameter can only be set in the postgresql.conf file or on the server command line. ----------- --> during the deletion, the webUI is unresponsive. --> hit the timeout issue ---> it's happening the same problem as on Sat541/Sat55 top - 15:50:24 up 12:36, 3 users, load average: 5.16, 4.18, 5.33 Tasks: 171 total, 11 running, 159 sleeping, 1 stopped, 0 zombie Cpu(s): 76.2%us, 1.3%sy, 0.0%ni, 0.0%id, 21.9%wa, 0.7%hi, 0.0%si, 0.0%st Mem: 4050976k total, 4025580k used, 25396k free, 103700k buffers Swap: 1020116k total, 96k used, 1020020k free, 2592196k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 3915 postgres 16 0 539m 363m 354m R 51.6 9.2 19:00.68 postmaster 3913 postgres 16 0 540m 355m 344m R 25.0 9.0 15:34.39 postmaster 16147 postgres 16 0 546m 375m 366m R 0.7 9.5 0:13.25 postmaster After +- 40 minutes, the file was removed and the webUI is back to respond. ======== On RHN Satellite 5.5, still running on Oracle we can see the an interesting behavior. # su - oracle $ db-control gather-stats --> this will update the database stats and indexes $ exit # sqlplus $(spacewalk-cfg-get default_db) SQL> set timming on; SQL> exec rhn_config.delete_file(1971); PL/SQL procedure successfully completed. Elapsed: 01:08:03.78 SQL> commit; ======== Version-Release number of selected component (if applicable): Red Hat Satellite 5.6 Steps to Reproduce: 1. Try to delete a configuration file from the webUI. Actual results: Server become unresponsive Expected results: Should work as expected. Additional info: Running the SQL script mentioned on the article https://access.redhat.com/site/solutions/323263, we can see thousands of entries for the configuration files. ++++++++++++++++++++++++++++ SQL> @/tmp/est.sql PL/SQL procedure successfully completed. Elapsed: 00:00:00.00 CONFIGREVISION_ID ORG_ID RHNSNAPSHOTCONFIGREVISION RHNCONFIGFILE RHNACTIONCONFIGREVISION RHNACTIONCONFIGFILENAME ----------------- ---------- ------------------------- ------------- ----------------------- ----------------------- 2051 1 1925 0 45147 0 3010 1 26327 1 183615 0 Elapsed: 00:00:15.39 SQL> exec rhn_config.delete_file(1971); PL/SQL procedure successfully completed. Elapsed: 01:08:03.78 ++++++++++++++++++++++++++++ ** trying another file via SQLplus ** SQL> @/tmp/est.sql PL/SQL procedure successfully completed. CONFIGREVISION_ID ORG_ID RHNSNAPSHOTCONFIGREVISION RHNCONFIGFILE RHNACTIONCONFIGREVISION RHNACTIONCONFIGFILENAME ----------------- ---------- ------------------------- ------------- ----------------------- ----------------------- 5952 1 30 0 0 0 5972 1 0 0 0 0 5973 1 44 0 0 0 5992 1 0 0 0 0 5993 1 883 0 14 0 6054 1 6012 1 104 0 6 rows selected. --workaround worked via SQLplus SQL> set timing on; SQL> exec rhn_config.delete_file(4850); PL/SQL procedure successfully completed. Elapsed: 00:01:23.06 PL/SQL procedure successfully completed. CONFIGREVISION_ID ORG_ID RHNSNAPSHOTCONFIGREVISION RHNCONFIGFILE RHNACTIONCONFIGREVISION RHNACTIONCONFIGFILENAME ----------------- ---------- ------------------------- ------------- ----------------------- ----------------------- 6072 1 1576 0 20 0 6133 1 3924 1 76 0 SQL> select table_name, index_name, uniqueness, status, distinct_keys, num_rows from dba_indexes where table_name='RHNACTIONCONFIGREVISION'; TABLE_NAME INDEX_NAME UNIQUENES STATUS DISTINCT_KEYS NUM_ROWS ------------------------------ ------------------------------ --------- -------- ------------- ---------- RHNACTIONCONFIGREVISION RHN_ACTIONCR_SID_IDX NONUNIQUE VALID 1155 150954533 RHNACTIONCONFIGREVISION RHN_ACTIONCR_ID_PK UNIQUE VALID 147883393 147883393 RHNACTIONCONFIGREVISION RHN_ACTIONCR_AID_SID_CRID_UQ UNIQUE VALID 152158027 152158027 RHNACTIONCONFIGREVISION RHN_ACT_CR_CRID_IDX NONUNIQUE VALID 1257 151787700 --- Additional comment from Stephen Herr on 2014-07-31 15:08:51 EDT --- (In reply to Marcelo Moreira de Mello from comment #0) > CONFIGREVISION_ID ORG_ID RHNSNAPSHOTCONFIGREVISION RHNCONFIGFILE > RHNACTIONCONFIGREVISION RHNACTIONCONFIGFILENAME > ----------------- ---------- ------------------------- ------------- > ----------------------- ----------------------- > 6072 1 1576 0 20 0 > 6133 1 3924 1 76 0 > > > SQL> select table_name, index_name, uniqueness, status, distinct_keys, > num_rows from dba_indexes where table_name='RHNACTIONCONFIGREVISION'; > > TABLE_NAME INDEX_NAME UNIQUENES STATUS DISTINCT_KEYS > NUM_ROWS > ------------------------------ ------------------------------ --------- > -------- ------------- ---------- > RHNACTIONCONFIGREVISION RHN_ACTIONCR_SID_IDX NONUNIQUE VALID > 1155 150954533 > RHNACTIONCONFIGREVISION RHN_ACTIONCR_ID_PK UNIQUE VALID > 147883393 147883393 > RHNACTIONCONFIGREVISION RHN_ACTIONCR_AID_SID_CRID_UQ UNIQUE VALID > 152158027 152158027 > RHNACTIONCONFIGREVISION RHN_ACT_CR_CRID_IDX NONUNIQUE VALID > 1257 151787700 This right here is the problem, the customer just has an too many stored actions. There are a few things I can tweek, but the vast majority of the performance problem is coming from the size of the table. The appropriate indexes are in place for rhnActionConfigRevision and the explain plans are reasonable, but even so it takes a while to work on a 150 Million row table. The only real solution there is for this problem is for the customer to delete her old actions. See https://access.redhat.com/solutions/93253 .
Committing to Spacewalk master: 373c48072798863b8f8c5914244f36037d08a54d This commit will allow config file deletion to be somewhat faster in the prescence of many system snapshots, however the user is still going to have to get rid of some old actions to make config file deletion be reasonably fast.
Moving bugs to ON_QA as we move to release Spacewalk 2.3
Spacewalk 2.3 has been released. See https://fedorahosted.org/spacewalk/wiki/ReleaseNotes23