Bug 660516

Summary:	cli list of Available Subscriptions does not always reflect changes to the 'Expires' date after it is changed in the database
Product:	[Community] Candlepin	Reporter:	John Sefler <jsefler>
Component:	candlepin	Assignee:	Bryan Kearney <bkearney>
Status:	CLOSED CURRENTRELEASE	QA Contact:	John Sefler <jsefler>
Severity:	medium	Docs Contact:
Priority:	low
Version:	0.5	CC:	bkearney, dgoodwin
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2012-05-30 20:44:30 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	568421

Description John Sefler 2010-12-06 23:36:59 UTC

Description of problem:
The following TCMS testcase was automated: https://tcms.engineering.redhat.com/case/56025/?search=56025

The basic steps are:
1. Use the cli to subscribe to a POOLID (take note of the Expires DATE1)
2. in the database, use sql to change the enddate
update cp_subscription set enddate='DATE1+1MONTH' where id=(select pool.subscriptionid from cp_pool pool where pool.id='POOLID');
3. Refresh the subscription pools using the RESTful API
curl -k -u admin:admin --request PUT https://jsefler-f12-candlepin.usersys.redhat.com:8443/candlepin/owners/admin/subscriptions
(wait for the jobdetail to be FINISHED)...
curl -k -u admin:admin --request GET https://jsefler-f12-candlepin.usersys.redhat.com:8443/candlepin/jobs/refresh_pools_0d740a16-472c-4b3a-9b71-01aa55f23097  <- comes from refresh json statusPath
4. subscription-manager list --all --available should show the POOLID with an Expires date equal to DATE1+1MONTH

^^^  This occasionally fails and the date displayed is equal to the original value (as if the database was never changed, but it WAS!).
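
The wait in step 3 can be sketched as a small polling loop. This is only an illustration, not the automated test's code: the "state" key and the "RUNNING" value are assumed names for the job JSON (only the FINISHED value and the statusPath field are mentioned above), and fetch_status is a hypothetical callable standing in for the GET on statusPath.

```python
import time


def wait_for_job(fetch_status, poll_interval=1.0, timeout=60.0, sleep=time.sleep):
    """Poll fetch_status() until the job reports FINISHED.

    fetch_status: hypothetical callable returning the job JSON as a dict
                  (in the real test this would be the GET on statusPath).
    The "state" key is an assumed field name, not confirmed by this report.
    """
    waited = 0.0
    while True:
        status = fetch_status()
        if status.get("state") == "FINISHED":
            return status
        if waited >= timeout:
            raise TimeoutError("job did not reach FINISHED within %s seconds" % timeout)
        sleep(poll_interval)
        waited += poll_interval


# Stand-in responses instead of a real HTTP call.
responses = iter([{"state": "RUNNING"}, {"state": "RUNNING"}, {"state": "FINISHED"}])
result = wait_for_job(lambda: next(responses), sleep=lambda seconds: None)
print(result["state"])
```

Only after this returns should step 4 be run, otherwise a stale date could simply mean the refresh had not completed yet.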

Version-Release number of selected component (if applicable):
Has been happening for months (Pre-Alpha, Alpha, Post-Alpha)


How reproducible:
often, but not every time



Additional info:
Have been working with anadathu to troubleshoot and resolve.  We have reproduced the situation, but have not yet found a fix.

Because this test is automated, its success/failure remains on our radar.

Comment 1 John Sefler 2011-02-04 23:10:11 UTC

Moving back to ASSIGNED...
See comments in https://bugzilla.redhat.com/show_bug.cgi?id=675244

Comment 2 James Bowes 2011-02-07 13:16:05 UTC

I haven't been able to reproduce this one at all. John, can you confirm that the automated test is passing now? If so, I say we close this out.

Comment 3 John Sefler 2011-02-07 13:58:04 UTC

Unfortunately, the automated test continues to fail.  As I said, the failure does not happen every time, but it happens often.  I think the best way to troubleshoot this is face-to-face with one of the local developers so we can trace through the automated test together.  I did this with Ajay, but we did not resolve it.  That's when I opened the bug.

Comment 5 James Bowes 2011-02-08 17:58:47 UTC

phew, that was a tricky one!

We use warp-persist to handle the setup of hibernate/jpa for us. Besides providing things like the @Transactional annotation, it also defines a unit of work (though I'm not convinced it's really like the unit of work as defined in P of AA). The unit of work we use is set to equal a single HTTP request. This is perfect for most candlepin operations: at the start of a request we get a new jpa entity manager set up for us, we have caching within the scope of the request, and we can treat the request as a single operation to commit or roll back.

We use the same warp-persist setup for pinsetter, the quartz-powered async job runner. Each pinsetter task runs in its own thread; these threads are kept around for the lifetime of pinsetter, so for a task to run there must be a free thread (which it is presumably taking over from a job that previously finished). Because there's no HTTP request, warp-persist doesn't know when to end the unit of work, so jobs that run within the same thread share the same cache of jpa/hibernate objects.
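
A minimal sketch of that effect, as an analogy only: this is plain Python, not warp-persist, hibernate or quartz. The dict stands in for the cp_subscription table, the thread-local stands in for the per-thread entity cache, and all names (database, load_end_date, refresh_pools_job) are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the cp_subscription table (hypothetical data).
database = {"sub-1": "2011-01-01"}

# Stand-in for the per-thread entity cache that is never closed.
_session = threading.local()


def load_end_date(sub_id):
    """Read through the thread's cache; hit the 'database' only on a miss."""
    if not hasattr(_session, "cache"):
        _session.cache = {}
    if sub_id not in _session.cache:
        _session.cache[sub_id] = database[sub_id]
    return _session.cache[sub_id]


def refresh_pools_job(sub_id):
    """Stand-in for the refresh pools task: report the end date it sees."""
    return load_end_date(sub_id)


# One long-lived worker thread, like org.quartz.threadPool.threadCount=1.
with ThreadPoolExecutor(max_workers=1) as pool:
    first = pool.submit(refresh_pools_job, "sub-1").result()
    database["sub-1"] = "2011-02-01"  # the enddate change made with sql
    second = pool.submit(refresh_pools_job, "sub-1").result()

print(first, second, database["sub-1"])
```

Both jobs run on the same worker thread, so the second job returns the cached old end date even though the stand-in table already holds the new one.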

You can see this happening by setting org.quartz.threadPool.threadCount=1 in candlepin.conf (you may also want to set maxThreads=1 in the tomcat connector, to compare and contrast the behaviour). Upon the first run of the refresh pools task in pinsetter, the catalina.out log will show debugging info from the task, including a list of all subscriptions affected and their end dates. If you watch for an arbitrary id, note its end date, then alter the end date in the db and re-run the refresh pools job, you'll see that the logs still list the old end date.

Compare this to a GET on the subscriptions rest interface, which always gives you the most up-to-date value for enddate.

The solution here is to manually override the unit of work settings for quartz jobs. I've got a patch to run by the rest of the team, hopefully we can land it shortly.

Comment 6 James Bowes 2011-02-09 17:58:39 UTC

Fix committed to both master and beta. It should be in candlepin 0.1.36 and 0.2.3.

Comment 7 John Sefler 2011-03-10 22:21:06 UTC

The non-repeatable failures in the problem description were surfaced by the nightly automated Certificate Revocation List (CRL) Tests.  Since the fix has been applied, the errors have gone away.

Moving this bug to VERIFIED

The latest automated test runs were run against these versions...

[root@jsefler-f12-candlepin candlepin]# git show-ref BETA
bbef85fc641dd1ec1a6fcb3527ad20108fe802f3 refs/heads/BETA
3b0042c82f8277ef3cc1e81771b0c19181721fa7 refs/remotes/origin/BETA

[root@jsefler-f14-candlepin candlepin]# git show-ref 0.2
3a300aa9e5724df9602995b5631be1e941b69d2d refs/heads/0.2
9c13d6bf6d83070bbc78638f6ca3bc1dc5267977 refs/remotes/origin/0.2

Comment 8 John Sefler 2011-05-04 14:41:32 UTC

Group move of VERIFIED Candlepin component bugs to RELEASE_PENDING