Bug 753728

Summary: ISE when postgresql is restarted and not spacewalk
Product: [Community] Spacewalk Reporter: pierre.casenove
Component: Server Assignee: Michael Mráka <mmraka>
Status: CLOSED CURRENTRELEASE QA Contact: Red Hat Satellite QA List <satqe-list>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 1.5 CC: slukasik
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: spacewalk-java-1.6.98-1 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-12-22 16:49:39 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 723481    
Attachments:
Description      Flags
TaskOMatic log   none
Tomcat log       none

Description pierre.casenove 2011-11-14 10:10:58 UTC
Description of problem:
If postgresql is restarted while tomcat is up and running, the connection to the database is never recovered.
Attached files: catalina.out and rhn_taskomatic_daemon.log


Version-Release number of selected component (if applicable): 1.5


How reproducible: always


Steps to Reproduce:
1. go to https://myserver/rhn and log on
2. service postgresql restart
3. refresh the page --> Error 500
  
Actual results:
error 500


Expected results:
Spacewalk should automatically recover the connection to the DB.


Additional info:
It SEEMS that osa-dispatcher reconnects correctly (I am not sure from reading the logs).

Comment 1 pierre.casenove 2011-11-14 10:11:32 UTC
Created attachment 533483
TaskOMatic log

Comment 2 pierre.casenove 2011-11-14 10:12:03 UTC
Created attachment 533484
Tomcat log

Comment 3 pierre.casenove 2011-11-14 13:01:27 UTC
Additional tests:
In file /etc/rhn/default/rhn_hibernate.conf, the following parameter is set:
# test period value in seconds
hibernate.c3p0.idle_test_period=300

I just ran a test: 5 minutes (300 seconds) after the postgresql restart, the webui is functional again. But TaskOMatic is still failing with the same error and needs a restart. It seems that TaskOMatic never flushes its connection. Does it use the c3p0 connection pool?

From my point of view, idle_test_period value should be decreased. 5 minutes is quite a long period.
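
A rough sketch of why the delay tracks idle_test_period, assuming the pool sweeps all idle connections once every period and that stale connections are only discarded at a sweep. The class and method names below are hypothetical and are not part of Spacewalk or c3p0:

```java
public class IdleTestWindow {
    /**
     * Seconds between a database restart and the next periodic idle sweep.
     * restartOffsetSeconds is the time elapsed since the pool started sweeping
     * (an assumption of this simplified model, not a c3p0 setting).
     */
    public static long secondsUntilNextSweep(long periodSeconds, long restartOffsetSeconds) {
        if (periodSeconds <= 0) {
            throw new IllegalArgumentException("period must be positive");
        }
        if (restartOffsetSeconds < 0) {
            throw new IllegalArgumentException("offset must not be negative");
        }
        return periodSeconds - (restartOffsetSeconds % periodSeconds);
    }

    public static void main(String[] args) {
        // Restart right after a sweep with the shipped 300 s period: full wait.
        System.out.println(secondsUntilNextSweep(300, 0));   // 300
        // Restart 290 s into the period: only 10 s of stale connections.
        System.out.println(secondsUntilNextSweep(300, 290)); // 10
        // A 60 s period caps the stale window at one minute.
        System.out.println(secondsUntilNextSweep(60, 290));  // 10
    }
}
```

Under this model the worst case is one full period, which matches the 5 minutes observed with the value of 300.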

Comment 4 Michael Mráka 2011-12-20 10:00:59 UTC
This issue should be fixed by 
commit 2d41929a4ae4fc62f6b4c46a77d8b00f6972269a
    753728 - test database connection prior running query

Comment 5 pierre.casenove 2011-12-20 10:08:51 UTC
Should the testConnectionOnCheckout parameter really be set? Hibernate doesn't recommend using it:
Must be set in c3p0.properties, C3P0 default: false
Don't use it, this feature is very expensive. If set to true, an operation will be performed at every connection checkout to verify that the connection is valid. A better choice is to verify connections periodically using c3p0.idleConnectionTestPeriod.

Comment 6 Michael Mráka 2011-12-20 13:13:43 UTC
It's 5 ms per request, which is not that expensive. A periodic connection test either won't eliminate the ISE (with a long test period) or will overload the server (with a short test period).
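
The idea of testing on checkout can be illustrated with a minimal sketch. This is not the code from the commit and not c3p0 itself: CheckoutTestingPool, FakeConnection and the "generation" counter are hypothetical stand-ins, and the liveness probe stands for whatever cheap test query a real pool would run.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical minimal pool that validates each connection when it is checked out. */
public class CheckoutTestingPool<C> {
    private final Deque<C> idle = new ArrayDeque<>();
    private final Supplier<C> factory;   // opens a fresh connection
    private final Predicate<C> isAlive;  // cheap liveness probe

    public CheckoutTestingPool(Supplier<C> factory, Predicate<C> isAlive) {
        this.factory = factory;
        this.isAlive = isAlive;
    }

    /** Returns a pooled connection that passes the probe, or a new one. */
    public synchronized C checkout() {
        C conn;
        while ((conn = idle.poll()) != null) {
            if (isAlive.test(conn)) {
                return conn;
            }
            // Stale connection: drop it and try the next pooled one.
        }
        return factory.get();
    }

    public synchronized void checkin(C conn) {
        idle.push(conn);
    }

    /** Stand-in for a database connection bound to one server start. */
    public static final class FakeConnection {
        public final int generation;
        public FakeConnection(int generation) {
            this.generation = generation;
        }
    }

    public static void main(String[] args) {
        int[] serverGeneration = {1};
        CheckoutTestingPool<FakeConnection> pool = new CheckoutTestingPool<>(
                () -> new FakeConnection(serverGeneration[0]),
                c -> c.generation == serverGeneration[0]);

        FakeConnection first = pool.checkout();
        pool.checkin(first);
        serverGeneration[0]++; // simulates "service postgresql restart"

        FakeConnection second = pool.checkout();
        System.out.println(second.generation == serverGeneration[0]); // true
    }
}
```

The request made right after the restart already gets a working connection, at the cost of one probe per checkout. That per-request cost is the trade-off discussed above.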

Comment 7 Milan Zázrivec 2011-12-22 16:49:39 UTC
Spacewalk 1.6 has been released.