Description of problem:
If PostgreSQL is restarted while Tomcat is up and running, the connection to the database is never recovered.

Attached files: catalina.out and rhn_taskomatic_daemon.log

Version-Release number of selected component (if applicable):
1.5

How reproducible:
Always

Steps to Reproduce:
1. Go to https://myserver/rhn and log on
2. service postgresql restart
3. Refresh the page --> Error 500

Actual results:
Error 500

Expected results:
Spacewalk should automatically recover the connection to the DB.

Additional info:
It SEEMS that osa-dispatcher reconnects correctly (not sure from reading the logs).
Created attachment 533483: TaskOMatic log
Created attachment 533484: Tomcat log
Additional tests:

In file /etc/rhn/default/rhn_hibernate.conf, the following parameter is set:

# test period value in seconds
hibernate.c3p0.idle_test_period=300

I just ran a test: 5 minutes (300 seconds) after the postgresql restart, the web UI is functional again. But TaskOMatic is still failing with the same error and needs a restart. It seems that TaskOMatic never flushes its connection. Does it use the c3p0 connection pool?

From my point of view, the idle_test_period value should be decreased. 5 minutes is quite a long period.
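For illustration, a shorter test period in the same file might look like the fragment below. The value 60 is an arbitrary assumption chosen only to show the shape of the change; it has not been tested against Spacewalk, and it would only affect components that actually use the c3p0 pool.

```properties
# /etc/rhn/default/rhn_hibernate.conf
# test period value in seconds (60 is an illustrative assumption, not a recommendation)
hibernate.c3p0.idle_test_period=60
```

With a setting like this, a stale connection could still be handed out for up to the length of the test period after a database restart, so it shortens the outage window rather than removing it.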
This issue should be fixed by commit 2d41929a4ae4fc62f6b4c46a77d8b00f6972269a:

753728 - test database connection prior running query
Should the testConnectionOnCheckout parameter really be set? Hibernate does not recommend using it:

"Must be set in c3p0.properties, C3P0 default: false. Don't use it, this feature is very expensive. If set to true, an operation will be performed at every connection checkout to verify that the connection is valid. A better choice is to verify connections periodically using c3p0.idleConnectionTestPeriod."
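For reference, a minimal sketch of the two alternatives being compared, written as c3p0.properties entries. This is not taken from the commit. The preferredTestQuery line and the value 60 are assumptions added for illustration; a cheap test query is commonly set because c3p0 can otherwise fall back to a more expensive default connection test.

```properties
# c3p0.properties (illustrative sketch, not the actual Spacewalk change)

# Option A: validate every connection when it is checked out of the pool
c3p0.testConnectionOnCheckout=true
# assumed cheap validation query for PostgreSQL
c3p0.preferredTestQuery=SELECT 1

# Option B: validate idle connections periodically instead (seconds)
# c3p0.idleConnectionTestPeriod=60
```

Option A adds a round trip to every checkout but catches a restarted database immediately; Option B is cheaper per request but leaves a window in which a dead connection can still be handed out.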
It's 5 ms per request, which is not that expensive. A periodic connection test either won't eliminate the ISE (Internal Server Error) if the test period is long, or will overload the server if the test period is short.
Spacewalk 1.6 has been released.