Bug 826602

Summary: Connections hang daily in hosted Candlepin
Product: [Community] Candlepin
Component: candlepin
Version: 0.9
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Reporter: Samuel Munilla <smunilla>
Assignee: Chris Duryee <cduryee>
QA Contact: Eric Sammons <esammons>
CC: awood, cduryee
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2012-05-30 19:47:34 UTC
Bug Blocks: 826663

Description Samuel Munilla 2012-05-30 15:11:26 UTC
Description of problem: The Tomcat instances of Candlepin intermittently hang, causing all requests to fail. The likely cause is c3p0.


Version-Release number of selected component (if applicable): c3p0-0.9.0


How reproducible: Happens in QA and Stage daily.


Actual results:

The Ruby tier triggers a nagios alert and the following is found in the logs:

"Exception PartialOutageException: Could not find shard for XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"

In the Candlepin logs, among other errors, we see:

"Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: The last packet successfully received from the server was 34,864,978 milliseconds ago.  The last packet sent successfully to the server was 34,864,979 milliseconds ago. is longer than the server configured value of 'wait_timeout'. You should consider either expiring and/or testing connection validity before use in your application, increasing the server configured values for client timeouts, or using the Connector/J connection property 'autoReconnect=true' to avoid this problem."

Additional info: Our current version of c3p0 (0.9.0) dates from 2005.
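
To put the numbers in the exception above in context, here is a rough sketch, not the fix that was applied for this bug. It assumes the MySQL server uses the documented default wait_timeout of 28800 seconds (the actual server setting is not stated in this report), and the class and method names are hypothetical. It checks whether the reported idle time exceeds that timeout, then builds a set of c3p0 property keys (maxIdleTime, idleConnectionTestPeriod, preferredTestQuery) of the kind the exception message recommends for expiring or testing idle connections. The specific values chosen are illustrative only.

```java
import java.util.Properties;
import java.util.concurrent.TimeUnit;

public class StaleConnectionCheck {
    // ASSUMPTION: MySQL's documented default wait_timeout, in seconds.
    // The value configured on the hosted servers is not given in this bug.
    static final long ASSUMED_WAIT_TIMEOUT_SECONDS = 28_800;

    // True when a connection has been idle longer than the server's wait_timeout,
    // i.e. the server has most likely already closed its end.
    static boolean exceedsWaitTimeout(long idleMillis, long waitTimeoutSeconds) {
        return TimeUnit.MILLISECONDS.toSeconds(idleMillis) > waitTimeoutSeconds;
    }

    // Illustrative pool settings: retire idle connections well before the server
    // would drop them, and test idle connections periodically.
    static Properties poolSettings(long waitTimeoutSeconds) {
        Properties p = new Properties();
        long maxIdleSeconds = waitTimeoutSeconds / 2;   // arbitrary safety margin
        long testPeriodSeconds = maxIdleSeconds / 4;    // arbitrary test interval
        p.setProperty("c3p0.maxIdleTime", Long.toString(maxIdleSeconds));
        p.setProperty("c3p0.idleConnectionTestPeriod", Long.toString(testPeriodSeconds));
        p.setProperty("c3p0.preferredTestQuery", "SELECT 1");
        return p;
    }

    public static void main(String[] args) {
        long idleMillis = 34_864_978L; // "last packet successfully received" from the log
        System.out.println("Idle seconds: " + TimeUnit.MILLISECONDS.toSeconds(idleMillis));
        System.out.println("Exceeds assumed wait_timeout: "
                + exceedsWaitTimeout(idleMillis, ASSUMED_WAIT_TIMEOUT_SECONDS));
        poolSettings(ASSUMED_WAIT_TIMEOUT_SECONDS)
                .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Under that assumption, the logged idle time of roughly 9.7 hours is past the 8-hour timeout, which would be consistent with the pool handing out connections the server had already closed.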

Comment 1 Chris Duryee 2012-05-30 15:23:03 UTC
Sam,

Is 0.9.1.2 ok? That seems to be the latest version.

Also, how difficult is it to reproduce this issue?

Comment 2 Samuel Munilla 2012-05-30 15:36:02 UTC
0.9.1.2 works. 

The issue crops up daily in QA and Stage. I don't think we can reproduce it on demand.

Comment 3 Samuel Munilla 2012-05-30 15:38:14 UTC
(In reply to comment #1)
> Sam,
> 
> Is 0.9.1.2 ok? That seems to be the latest version.
> 
> Also, how difficult is it to reproduce this issue?

Comment 4 Alex Wood 2012-05-30 19:47:34 UTC
Fixed in branch 0.5 with 19dac38c215768ba3f7e244f5d7f8f62da396f6e

Released with candlepin-0.5.33-1