Bug 826663 - Connections hang daily in hosted Candlepin
Summary: Connections hang daily in hosted Candlepin
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Subscription Asset Manager
Classification: Retired
Component: candlepin
Version: 1.0.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 1.X
Assignee: Bryan Kearney
QA Contact: SAM QE List
URL:
Whiteboard:
Depends On: 826602
Blocks: sam12-tracker
Reported: 2012-05-30 17:34 UTC by Chris Duryee
Modified: 2012-10-24 18:39 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 826602
Environment:
Last Closed: 2012-10-24 18:39:29 UTC
Embargoed:



Description Chris Duryee 2012-05-30 17:34:48 UTC
This was reported to happen when Candlepin 0.5.x runs in Tomcat. I don't know if it affects SAM, but the stacks are similar enough that it might. IT does not have a repro case, aside from waiting for the issue to occur.

If the fix works for 826602 (against 0.5.x), I'd recommend upgrading c3p0 in candlepin for 0.6.x as well.

more info:
> http://sourceforge.net/tracker/?func=detail&aid=1383783&group_id=25357&atid=383690

+++ This bug was initially created as a clone of Bug #826602 +++

Description of problem: The Tomcat instances of Candlepin intermittently hang and cause all requests to fail. This seems likely to be caused by c3p0.


Version-Release number of selected component (if applicable): c3p0-0.9.0


How reproducible: Happens in QA and Stage daily.


Steps to Reproduce: No known reproducer; the hang shows up on its own, roughly daily.

Actual results:

The Ruby tier triggers a nagios alert and the following is found in the logs:

"Exception PartialOutageException: Could not find shard for XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"

In the Candlepin logs, among other errors, we see:

"Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: The last packet successfully received from the server was 34,864,978 milliseconds ago.  The last packet sent successfully to the server was 34,864,979 milliseconds ago. is longer than the server configured value of 'wait_timeout'. You should consider either expiring and/or testing connection validity before use in your application, increasing the server configured values for client timeouts, or using the Connector/J connection property 'autoReconnect=true' to avoid this problem."
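For reference, the "testing connection validity before use" option mentioned in that message can be sketched with plain JDBC. This is only an illustration, not what Candlepin or c3p0 actually does: the class name ConnectionCheck and the method isUsable are made up for this example. Connection.isValid() is the standard JDBC 4 call, which should be available here since the exception comes from the driver's jdbc4 package.

```java
import java.sql.Connection;
import java.sql.SQLException;

/**
 * Hypothetical helper (not part of Candlepin or c3p0): checks a connection
 * before handing it to application code, so a connection the server has
 * already dropped is detected up front instead of failing mid-request.
 */
public class ConnectionCheck {

    /**
     * Returns true only if the connection is open and the driver confirms
     * it is still valid within timeoutSeconds.
     */
    public static boolean isUsable(Connection conn, int timeoutSeconds) {
        if (conn == null) {
            return false;
        }
        try {
            // isValid() is standard JDBC 4; the driver decides how to probe
            // the server (the exact mechanism is driver-specific).
            return !conn.isClosed() && conn.isValid(timeoutSeconds);
        } catch (SQLException e) {
            // Any failure while probing is treated as "not usable".
            return false;
        }
    }
}
```

A caller would discard the connection and request a fresh one whenever isUsable() returns false. A pool can do the same thing internally, which is usually preferable to checking by hand.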

Additional info: Our current version of c3p0 is from 2005.

--- Additional comment from cduryee on 2012-05-30 11:23:03 EDT ---

Sam,

Is 0.9.1.2 ok? That seems to be the latest version.

Also, how difficult is it to reproduce this issue?

--- Additional comment from smunilla on 2012-05-30 11:36:02 EDT ---

0.9.1.2 works. 

The issue crops up daily in QA and Stage. I don't think we can reproduce it on demand.

--- Additional comment from smunilla on 2012-05-30 11:38:14 EDT ---

(In reply to comment #1)
> Sam,
> 
> Is 0.9.1.2 ok? That seems to be the latest version.
> 
> Also, how difficult is it to reproduce this issue?

Comment 2 RHEL Program Management 2012-05-30 17:57:48 UTC
Thank you for your bug report. This issue was evaluated for inclusion
in the current release of Subscription Asset Manager (SAM). Unfortunately,
we are unable to address this request. Because we are in the final stages
of development in the current release, only significant, release-blocking
issues involving serious regressions and data corruption can be considered.

If you believe this issue meets the release blocking criteria as defined and
communicated to you by your Red Hat Support representative, please ask
your representative to file this issue as a blocker for the current release.
Otherwise, ask that it be evaluated for inclusion in the next release of SAM.

Comment 3 Bryan Kearney 2012-10-24 18:39:29 UTC
This has been resolved in hosted Candlepin with a new keepalive directive in the JDBC adapter.
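
The exact directive is not recorded in this report. As a rough sketch of what keepalive-style pool settings could look like, the snippet below derives two Hibernate c3p0 properties from the server's wait_timeout. The property names hibernate.c3p0.idle_test_period and hibernate.c3p0.timeout are real Hibernate settings (both in seconds). The class name, the 1/2 and 3/4 ratios, and the 28800-second value (MySQL's default wait_timeout) are assumptions for illustration, not the values used in hosted.

```java
import java.util.Properties;

/**
 * Hypothetical helper: derives pool settings from the MySQL wait_timeout so
 * that idle connections are tested or retired before the server drops them.
 */
public class PoolKeepaliveSettings {

    /**
     * @param waitTimeoutSeconds the server's wait_timeout, in seconds
     * @return Hibernate c3p0 properties, with values in seconds
     */
    public static Properties forWaitTimeout(int waitTimeoutSeconds) {
        // Below 4 seconds the integer math yields 0, which c3p0 treats
        // as "disabled" for both settings.
        if (waitTimeoutSeconds < 4) {
            throw new IllegalArgumentException("wait_timeout too small: " + waitTimeoutSeconds);
        }
        Properties props = new Properties();
        // Test idle connections at half the server timeout (assumed ratio).
        props.setProperty("hibernate.c3p0.idle_test_period",
                String.valueOf(waitTimeoutSeconds / 2));
        // Retire connections idle for 3/4 of the server timeout (assumed ratio).
        props.setProperty("hibernate.c3p0.timeout",
                String.valueOf(waitTimeoutSeconds * 3 / 4));
        return props;
    }

    public static void main(String[] args) {
        // 28800 s is MySQL's default wait_timeout; the real server value
        // for this deployment is not stated in the report.
        forWaitTimeout(28800).list(System.out);
    }
}
```

Both values stay below wait_timeout, so a pooled connection is either exercised or discarded before the server closes it. The idle time in the logged exception (about 9.7 hours) is longer than the 8-hour MySQL default, which is consistent with no such test having been in place.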

