Bug 1095016
| Summary: | Quartz stops working if database connection is reset | | |
|---|---|---|---|
| Product: | [JBoss] JBoss Operations Network | Reporter: | Larry O'Leary <loleary> |
| Component: | Core Server, Database | Assignee: | Jay Shaughnessy <jshaughn> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Mike Foley <mfoley> |
| Severity: | urgent | Priority: | unspecified |
| Version: | JON 3.2 | CC: | ahovsepy, ctrianta, fbrychta, jshaughn, mazz, spinder |
| Target Milestone: | DR02 | Keywords: | Regression, TestCaseNeeded |
| Target Release: | JON 3.2.2 | Doc Type: | Bug Fix |
| Hardware: | Unspecified | OS: | Unspecified |
| Last Closed: | 2014-07-29 00:17:20 UTC | Type: | Bug |
Description
Larry O'Leary
2014-05-07 00:31:15 UTC
For the record, in JON 3.1.2, when I see the DB blip and Quartz reconnects to the DB, I see this message:

14:25:38,931 INFO [QuartzSchedulerThread] releaseTriggerRetryLoop: connection restored.

If this doesn't work in 3.2, we'll need to find out why the releaseTriggerRetryLoop in the QuartzSchedulerThread doesn't work anymore.

I've been at this for more than half a day, trying various combinations of settings in quartz.properties and in the DS declarations in the app server, without any success. I have to attribute the regression/change of behavior to our move from AS 4.2.3 in JON 3.1.2 to AS7 in JON 3.2.0. I don't think it's anything we've done. As of now I don't see a solution other than a server restart.

We are using an old version of Quartz, 1.6.5 (2009). I'm going to see if we can painlessly upgrade to the last release of the V1 line, 1.8.6 (2012). The V2 line has significant API changes. We'll see how much trouble it is to move to 1.8.6 and, if we get there, whether it causes any trouble. By the way, Bug 957691 already exists for upgrading Quartz, although it suggests a full upgrade to the most recent 2.x version.

The upgrade did not solve the issue. I've posted a message in the Quartz Scheduler forums here: http://forums.terracotta.org/forums/posts/list/6565.page

I'm out of ideas. It seems that, for whatever reason, Quartz continues to use a dead connection, either because it doesn't let it go or because AS7 keeps providing it. So at the moment, a server restart is the only workaround.

I may have a possible solution, but it requires a few packaging changes. I'll discuss further with mazz to review the feasibility.

In the end a simple solution was found; it just took a while to find it.

master commit a3c6d72a44b982abcb8b918daccd2dd74dc76fac
Author: Jay Shaughnessy <jshaughn>
Date: Mon May 12 15:44:19 2014 -0400

We create the NoTxRHQDS solely for use by Quartz to interact with its jobs. The errors resulted from connections not being validated in a timely manner by AS7/EAP.
It was hard to figure out because we needed to explicitly set
<validation><validate-on-match>true</validate-on-match></validation>
What this flag does is a little hard to figure from its description. And according to the EAP documentation [1] this is supposed to be true by default. But it isn't. EAP bug pending creation...
- added the new option to our DMR client used by the installer
- cleaned up quartz.properties a bit, organizing and removing ignored settings
- note that this is not needed on the XA DS

Via product triage, it was determined that this bug is to be included in the DR01 target milestone.

Moving to ON_QA as available for test in the latest cumulative patch build (DR01): http://jon01.mw.lab.eng.bos.redhat.com:8042/dist/release/jon/3.2.2.GA/5-29-2014/

The issue is still there on Version: 3.2.0.GA Update 02, Build Number: 055b880:0620403. <validate-on-match> on the NoTxRHQDS was false after installation. I tried setting it to true manually, and then the issue went away. So the question is: why didn't the patch set <validate-on-match> to true?

I think I messed up here and treated this BZ as if it were targeted to 3.3 instead of 3.2.2, because I see no backport of the fix. However, because the fix is an installer fix, a backport alone is not enough to repair an existing install. In that case, DMR must be executed against the JON server.

First, the backport to jon3.2.x:

commit 6f02db638000b52c031ab3402866f1a15279af1f
Author: Jay Shaughnessy <jshaughn>
Date: Mon Jun 2 10:45:31 2014 -0400

We create the NoTxRHQDS solely for use by Quartz to interact with its jobs. The errors resulted from connections not being validated in a timely manner by AS7/EAP. It was hard to figure out because we needed to explicitly set
<validation><validate-on-match>true</validate-on-match></validation>
What this flag does is a little hard to figure from its description. And according to the EAP documentation [1] this is supposed to be true by default. But it isn't. EAP bug pending creation...
- added the new option to our DMR client used by the installer
- cleaned up quartz.properties a bit, organizing and removing ignored settings
- note that this is not needed on the XA DS

Cherry-Pick master: a3c6d72a44b982abcb8b918daccd2dd74dc76fac

Next, here is the DMR for updating an existing (running) server via the JBoss CLI:

/subsystem=datasources/data-source=NoTxRHQDS/:write-attribute(name=validate-on-match,value=true)

Executing the above DMR operation will effectively apply the fix to an existing server. I'm not sure whether the change takes effect immediately; it may require a restart.

Yep. Jay and I already discussed this. Additional jon.git commit fix: c941596f86ff, so that the configuration update is applied as part of the cumulative fix.

Moving to MODIFIED for testing in the next build.

Moving to ON_QA as available for test in build: http://jon01.mw.lab.eng.bos.redhat.com:8042/dist/release/jon/3.2.2.GA/6-13-2014_0900/

Verified on Version: 3.2.0.GA Update 02, Build Number: dd34f04:ee37628

Created attachment 909069:
server.log.png
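The sketch below gives a rough picture of where the setting ends up in the server's datasource configuration. This is the state after the installer, or the CLI `write-attribute` operation above, has applied it. Only the pool name `NoTxRHQDS` and the `<validation>`/`<validate-on-match>` elements come from this report. The `jndi-name`, the `jta="false"` flag, the connection URL and the driver name are assumptions for illustration. A real JON install will also carry its own pool, security and other validation settings.

```xml
<!-- Hypothetical sketch: only pool-name="NoTxRHQDS" and the <validation> element
     are taken from this bug; jndi-name, jta, connection-url and driver are
     placeholder assumptions and will differ on a real install. -->
<datasource jndi-name="java:jboss/datasources/NoTxRHQDS" pool-name="NoTxRHQDS"
            enabled="true" jta="false">
    <connection-url>jdbc:postgresql://127.0.0.1:5432/rhq</connection-url>
    <driver>postgres</driver>
    <!-- The fix: set validate-on-match explicitly rather than relying on the
         documented default, so pooled connections are validated before use -->
    <validation>
        <validate-on-match>true</validate-on-match>
    </validation>
</datasource>
```

Per the commits above, this change is only needed on the non-XA `NoTxRHQDS` that Quartz uses, not on the XA datasource.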
This has been verified and released in Red Hat JBoss Operations Network 3.2 Update 02 (3.2.2), available from the Red Hat Customer Portal[1].

[1]: https://access.redhat.com/jbossnetwork/restricted/softwareDetail.html?softwareId=31783