1125428 – webUI becomes unresponsive and hit "The connection has timed out" error when trying to delete a configuration file

Bug 1125428 - webUI becomes unresponsive and hit "The connection has timed out" error when trying to delete a configuration file

Summary: webUI becomes unresponsive and hit "The connection has timed out" error when ...

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Spacewalk
Classification:	Community
Component:	Server
Sub Component:
Version:	2.2
Hardware:	All
OS:	Linux
Priority:	urgent
Severity:	high
Target Milestone:	---
Assignee:	Stephen Herr
QA Contact:	Red Hat Satellite QA List
Docs Contact:
URL:
Whiteboard:
Depends On:	1115149
Blocks:	space23
TreeView+	depends on / blocked

Reported:	2014-07-31 19:33 UTC by Stephen Herr
Modified:	2015-04-14 19:02 UTC (History)
CC List:	5 users (show)
Fixed In Version:	spacewalk-schema-2.3.5-1
Clone Of:	1115149
Environment:
Last Closed:	2015-04-14 19:02:56 UTC
Embargoed:

Attachments	(Terms of Use)

Description Stephen Herr 2014-07-31 19:33:41 UTC

+++ This bug was initially created as a clone of Bug #1115149 +++

Description of problem:

++++++++++++++++++++++++
This issue is similar to the bug #906157. 

This customer had a RHN Satellite 5.4.1 and as suggested by the update https://bugzilla.redhat.com/show_bug.cgi?id=906157#c25. 
++++++++++++++++++++++++

 We were able to reproduce the issue internally. Even after upgrading to the RHN Satellite 5.6 + PostgreSQL the issue still continues. 

 We need to provide to the customer a solution for this. 


========
# rhn-schema-version 
5.6.0.18-1.el5sat

started remove the file
-----------------
ssl_access_log: - - [01/Jul/2014:15:43:33 -0400] "POST /rhn/configuration/ChannelFilesSubmit.do?ccid=21 HTTP/1.1" 200 11442


postgresql.log
-----
2014-07-01 15:44:29.228 EDT LOG:  checkpoints are occurring too frequently (24 seconds apart)
2014-07-01 15:44:29.228 EDT HINT:  Consider increasing the configuration parameter "checkpoint_segments".
2014-07-01 15:44:43.050 EDT LOG:  checkpoints are occurring too frequently (14 seconds apart)
2014-07-01 15:44:43.050 EDT HINT:  Consider increasing the configuration parameter "checkpoint_segments".

--------------
checkpoint_segments (integer)

    Maximum number of log file segments between automatic WAL checkpoints (each segment is normally 16 megabytes). The default is three segments. Increasing this parameter can increase the amount of time needed for crash recovery. This parameter can only be set in the postgresql.conf file or on the server command line.
-----------

  --> during the deletion, the webUI is unresponsive. 
  
  --> hit the timeout issue
  
  ---> it's happening the same problem as on Sat541/Sat55


top - 15:50:24 up 12:36,  3 users,  load average: 5.16, 4.18, 5.33
Tasks: 171 total,  11 running, 159 sleeping,   1 stopped,   0 zombie
Cpu(s): 76.2%us,  1.3%sy,  0.0%ni,  0.0%id, 21.9%wa,  0.7%hi,  0.0%si,  0.0%st
Mem:   4050976k total,  4025580k used,    25396k free,   103700k buffers
Swap:  1020116k total,       96k used,  1020020k free,  2592196k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                
 3915 postgres  16   0  539m 363m 354m R 51.6  9.2  19:00.68 postmaster                              
 3913 postgres  16   0  540m 355m 344m R 25.0  9.0  15:34.39 postmaster                              
16147 postgres  16   0  546m 375m 366m R  0.7  9.5   0:13.25 postmaster

  After +- 40 minutes, the file was removed and the webUI is back to respond.

========

 On RHN Satellite 5.5, still running on Oracle we can see the an interesting behavior. 

 # su - oracle
      $ db-control gather-stats   --> this will update the database stats and indexes
      $ exit

      # sqlplus $(spacewalk-cfg-get default_db)
      SQL> set timming on;
      SQL> exec rhn_config.delete_file(1971);

        PL/SQL procedure successfully completed.

        Elapsed: 01:08:03.78
        
      SQL> commit;
========

Version-Release number of selected component (if applicable):
Red Hat Satellite 5.6


Steps to Reproduce:
1. Try to delete a configuration file from the webUI.

Actual results:
Server become unresponsive



Expected results:
Should work as expected. 


Additional info:

 Running the SQL script mentioned on the article https://access.redhat.com/site/solutions/323263, we can see thousands of entries for the configuration files.


++++++++++++++++++++++++++++

SQL> @/tmp/est.sql

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.00

CONFIGREVISION_ID     ORG_ID RHNSNAPSHOTCONFIGREVISION RHNCONFIGFILE RHNACTIONCONFIGREVISION RHNACTIONCONFIGFILENAME
----------------- ---------- ------------------------- ------------- ----------------------- -----------------------
             2051          1                      1925             0                   45147                       0
             3010          1                     26327             1                  183615                       0

Elapsed: 00:00:15.39

SQL> exec rhn_config.delete_file(1971);

PL/SQL procedure successfully completed.

Elapsed: 01:08:03.78
++++++++++++++++++++++++++++

** trying another file via SQLplus ** 

SQL> @/tmp/est.sql

PL/SQL procedure successfully completed.


CONFIGREVISION_ID     ORG_ID RHNSNAPSHOTCONFIGREVISION RHNCONFIGFILE RHNACTIONCONFIGREVISION RHNACTIONCONFIGFILENAME
----------------- ---------- ------------------------- ------------- ----------------------- -----------------------
             5952          1                        30             0                       0                       0
             5972          1                         0             0                       0                       0
             5973          1                        44             0                       0                       0
             5992          1                         0             0                       0                       0
             5993          1                       883             0                      14                       0
             6054          1                      6012             1                     104                       0

6 rows selected.

 --workaround worked via SQLplus


SQL> set timing on;
SQL> exec rhn_config.delete_file(4850);

PL/SQL procedure successfully completed.

Elapsed: 00:01:23.06


PL/SQL procedure successfully completed.


CONFIGREVISION_ID     ORG_ID RHNSNAPSHOTCONFIGREVISION RHNCONFIGFILE RHNACTIONCONFIGREVISION RHNACTIONCONFIGFILENAME
----------------- ---------- ------------------------- ------------- ----------------------- -----------------------
	     6072	   1			  1576		   0			  20			   0
	     6133	   1			  3924		   1			  76			   0


SQL> select table_name, index_name, uniqueness, status, distinct_keys, num_rows from dba_indexes where table_name='RHNACTIONCONFIGREVISION';

TABLE_NAME		       INDEX_NAME		      UNIQUENES STATUS	 DISTINCT_KEYS	 NUM_ROWS
------------------------------ ------------------------------ --------- -------- ------------- ----------
RHNACTIONCONFIGREVISION        RHN_ACTIONCR_SID_IDX	      NONUNIQUE VALID		  1155	150954533
RHNACTIONCONFIGREVISION        RHN_ACTIONCR_ID_PK	      UNIQUE	VALID	     147883393	147883393
RHNACTIONCONFIGREVISION        RHN_ACTIONCR_AID_SID_CRID_UQ   UNIQUE	VALID	     152158027	152158027
RHNACTIONCONFIGREVISION        RHN_ACT_CR_CRID_IDX	      NONUNIQUE VALID		  1257	151787700


--- Additional comment from Stephen Herr on 2014-07-31 15:08:51 EDT ---

(In reply to Marcelo Moreira de Mello from comment #0)
> CONFIGREVISION_ID     ORG_ID RHNSNAPSHOTCONFIGREVISION RHNCONFIGFILE
> RHNACTIONCONFIGREVISION RHNACTIONCONFIGFILENAME
> ----------------- ---------- ------------------------- -------------
> ----------------------- -----------------------
> 	     6072	   1			  1576		   0			  20			   0
> 	     6133	   1			  3924		   1			  76			   0
> 
> 
> SQL> select table_name, index_name, uniqueness, status, distinct_keys,
> num_rows from dba_indexes where table_name='RHNACTIONCONFIGREVISION';
> 
> TABLE_NAME		       INDEX_NAME		      UNIQUENES STATUS	 DISTINCT_KEYS	
> NUM_ROWS
> ------------------------------ ------------------------------ ---------
> -------- ------------- ----------
> RHNACTIONCONFIGREVISION        RHN_ACTIONCR_SID_IDX	      NONUNIQUE VALID		 
> 1155	150954533
> RHNACTIONCONFIGREVISION        RHN_ACTIONCR_ID_PK	      UNIQUE	VALID	    
> 147883393	147883393
> RHNACTIONCONFIGREVISION        RHN_ACTIONCR_AID_SID_CRID_UQ   UNIQUE	VALID	 
> 152158027	152158027
> RHNACTIONCONFIGREVISION        RHN_ACT_CR_CRID_IDX	      NONUNIQUE VALID		 
> 1257	151787700

This right here is the problem, the customer just has an too many stored actions. There are a few things I can tweek, but the vast majority of the performance problem is coming from the size of the table. The appropriate indexes are in place for rhnActionConfigRevision and the explain plans are reasonable, but even so it takes a while to work on a 150 Million row table. 

The only real solution there is for this problem is for the customer to delete her old actions. See https://access.redhat.com/solutions/93253 .

Comment 1 Stephen Herr 2014-07-31 20:02:28 UTC

Committing to Spacewalk master:
373c48072798863b8f8c5914244f36037d08a54d

This commit will allow config file deletion to be somewhat faster in the prescence of many system snapshots, however the user is still going to have to get rid of some old actions to make config file deletion be reasonably fast.

Comment 2 Grant Gainey 2015-03-23 16:58:56 UTC

Moving bugs to ON_QA as we move to release Spacewalk 2.3

Comment 3 Grant Gainey 2015-04-14 19:02:56 UTC

Spacewalk 2.3 has been released. See

https://fedorahosted.org/spacewalk/wiki/ReleaseNotes23

Note You need to log in before you can comment on or make changes to this bug.