Bug 848872

Summary:	subscription-manager autosubscribe with --servicelevel does not ALWAYS persist the serviceLevel onto the consumer
Product:	[Community] Candlepin	Reporter:	John Sefler <jsefler>
Component:	candlepin	Assignee:	Devan Goodwin <dgoodwin>
Status:	CLOSED WONTFIX	QA Contact:	Katello QA List <katello-qa-list>
Severity:	unspecified	Docs Contact:
Priority:	low
Version:	0.9	CC:	alikins, bkearney, dgoodwin, jsefler
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2013-11-22 17:07:40 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	863175

Description John Sefler 2012-08-16 16:03:16 UTC

Description of problem:
Automation tests results are running into an intermittent failure whereby the test does this:
1. Registers
2. makes a curl call to update the auto-heal attribute to false for the registered consumer (to prevent any test result influence by auto healing)
3. Foreach service level available
4.    unsubscribe --all
5.    subscribe --auto --service-level <valid service level>
6.    assert that the stdout feedback confirms the service level was set
7.    use curl to get the current consumer and asert that the service level was persisted onto the serviceLevel attribute of the consumer 

Occassionally step 7 fails.  The service level does not always get persisted to the consumer.  When it does not persist, it's current value remains equal to its original value prior to the autosubscribe.


Version-Release number of selected component (if applicable):
[root@jsefler-59server ~]# subscription-manager version
Error while checking server version: No such file or directory  <<= bug 848409
remote entitlement server: Unknown
remote entitlement server type: Unknown
subscription-manager: 1.0.13-1.git.30.404361f.el5
python-rhsm: 1.0.5-1.git.1.1ce0910.el5


How reproducible:
Intermittent


Steps to Reproduce:
outlined above

Here is a trace from a test automation log showing that the service level remained unchanged and failed on the assertion that the requested autosubscribe service level "Premium" was not persisted to the current consumer.  Notice that the service level starts out as "None" and does not get changed to "Premium" even though the autosubscribe feedback says that it did.


201208160229:54.675 - INFO: ======= Starting Test: AutoSubscribeWithServiceLevel_Test([null, Premium]) (com.redhat.qe.auto.testng.TestNGListener.onTestStart)
201208160229:54.943 - FINE: ssh root.redhat.com subscription-manager service-level --show (com.redhat.qe.tools.SSHCommandRunner.run)
201208160229:56.376 - FINE: Stdout: Current service level: None
 (com.redhat.qe.tools.SSHCommandRunner.runCommandAndWait)
201208160229:56.377 - FINE: Stderr:  (com.redhat.qe.tools.SSHCommandRunner.runCommandAndWait)
201208160229:56.377 - FINE: ExitCode: 0 (com.redhat.qe.tools.SSHCommandRunner.runCommandAndWait)
201208160229:56.380 - FINE: ssh root.redhat.com subscription-manager unsubscribe --all (com.redhat.qe.tools.SSHCommandRunner.run)
201208160229:58.967 - FINE: Stdout: This machine has been unsubscribed from 3 subscriptions
 (com.redhat.qe.tools.SSHCommandRunner.runCommandAndWait)
201208160229:58.967 - FINE: Stderr:  (com.redhat.qe.tools.SSHCommandRunner.runCommandAndWait)
201208160229:58.968 - FINE: ExitCode: 0 (com.redhat.qe.tools.SSHCommandRunner.runCommandAndWait)
201208160230:00.237 - FINE: ssh root.redhat.com subscription-manager subscribe --auto --servicelevel=Premium (com.redhat.qe.tools.SSHCommandRunner.run)
201208160230:08.383 - FINE: Stdout: 
Service level set to: Premium
Installed Product Current Status:
Product Name:         	Awesome OS for S390 Bits
Status:               	Not Subscribed

Product Name:         	Stackable with Awesome OS for x86_64 Bits
Status:               	Not Subscribed

Product Name:         	Awesome OS Developer Basic
Status:               	Not Subscribed

Product Name:         	Awesome OS for x86_64/i686/ia64/ppc/ppc64/s390x/s390 Bits
Status:               	Not Subscribed

Product Name:         	Awesome OS for z80/x86_64 Bits
Status:               	Not Subscribed

Product Name:         	Awesome OS for z80/ALL Bits
Status:               	Not Subscribed

Product Name:         	Awesome OS for x86_64/ALL Bits
Status:               	Not Subscribed

Product Name:         	Awesome OS for x86_64/ALL Bits for ZERO sockets
Status:               	Not Subscribed

Product Name:         	Exempt Product Bits
Status:               	Subscribed

Product Name:         	Awesome OS Workstation Bits
Status:               	Not Subscribed

Product Name:         	Red Hat Enterprise Linux Server
Status:               	Not Subscribed

Product Name:         	Multiplier Product Bits
Status:               	Not Subscribed

Product Name:         	Awesome OS Developer Bits
Status:               	Not Subscribed

Product Name:         	Management Bits
Status:               	Not Subscribed

Product Name:         	Large File Support Bits
Status:               	Not Subscribed

Product Name:         	Awesome OS Modifier Bits
Status:               	Not Subscribed

Product Name:         	Clustering Bits
Status:               	Not Subscribed

Product Name:         	Shared Storage Bits
Status:               	Not Subscribed

Product Name:         	Awesome OS Server Bits
Status:               	Subscribed

Product Name:         	Awesome OS Premium Architecture Bits
Status:               	Not Subscribed

Product Name:         	Awesome OS for ia64 Bits
Status:               	Not Subscribed

Product Name:         	Awesome OS for i686/x86_64 Bits
Status:               	Not Subscribed

Product Name:         	Awesome OS for z80 Bits
Status:               	Not Subscribed

Product Name:         	Awesome OS for i686 Bits
Status:               	Not Subscribed

Product Name:         	Awesome OS for x86_64 Bits
Status:               	Not Subscribed

Product Name:         	Awesome OS for ppc64 Bits
Status:               	Not Subscribed

Product Name:         	Awesome OS for S390X Bits
Status:               	Not Subscribed

Product Name:         	Awesome OS for i386 Bits
Status:               	Not Subscribed

Product Name:         	Awesome OS for x86 Bits
Status:               	Not Subscribed

Product Name:         	Awesome OS for x86_64/s390x Bits
Status:               	Not Subscribed

Product Name:         	Awesome OS for Intel x86_64/i686/ia64 Bits
Status:               	Not Subscribed

Product Name:         	Awesome OS for x86/x64_64 Bits
Status:               	Not Subscribed

Product Name:         	Shared File System Bits
Status:               	Subscribed

Product Name:         	Awesome OS Scalable Filesystem Bits
Status:               	Not Subscribed

Product Name:         	Load Balancing Bits
Status:               	Not Subscribed

 (com.redhat.qe.tools.SSHCommandRunner.runCommandAndWait)
201208160230:08.385 - FINE: Stderr:  (com.redhat.qe.tools.SSHCommandRunner.runCommandAndWait)
201208160230:08.387 - FINE: ExitCode: 0 (com.redhat.qe.tools.SSHCommandRunner.runCommandAndWait)
201208160230:08.388 - INFO: Asserted: The autosubscribe stdout reports: Service level set to: Premium (com.redhat.qe.Assert.pass)
201208160230:08.389 - INFO: Asserted: The autosubscribe stdout reports: Installed Product Current Status (com.redhat.qe.Assert.pass)
201208160230:08.389 - INFO: Asserted: The exit code from the subscribe --auto command does not indicate a failure (exit code 0 indicates an entitlement was granted, 1 indicates an entitlement was not granted, 255 indicates a failure). (com.redhat.qe.Assert.pass)
201208160230:08.659 - INFO: SSH alternative to HTTP request: curl --stderr /dev/null --insecure --user testuser1:password --request GET https://jsefler-f14-candlepin.usersys.redhat.com:8443/candlepin/consumers/cbfc3d18-c9d0-448b-9401-92df9c914e6c | python -m simplejson/tool (com.redhat.qe.sm.cli.tasks.CandlepinTasks.getResourceUsingRESTfulAPI)
201208160230:09.109 - SEVERE: Test Failed: AutoSubscribeWithServiceLevel_Test (com.redhat.qe.auto.testng.TestNGListener.onTestFailure)
java.lang.AssertionError: The call to subscribe with auto and a servicelevel persisted the servicelevel setting on the current consumer object. expected:<Premium> but was:<None>
	at com.redhat.qe.Assert.fail(Assert.java:186)
	at com.redhat.qe.Assert.failNotEquals(Assert.java:606)
	at com.redhat.qe.Assert.assertEquals(Assert.java:210)
	at com.redhat.qe.Assert.assertEquals(Assert.java:266)
	at com.redhat.qe.sm.cli.tests.ServiceLevelTests.AutoSubscribeWithServiceLevel_Test(ServiceLevelTests.java:472)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:616)
	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:76)
	at org.testng.internal.Invoker.invokeMethod(Invoker.java:673)
	at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:846)
	at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1170)
	at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:125)
	at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:109)
	at org.testng.TestRunner.runWorkers(TestRunner.java:1147)
	at org.testng.TestRunner.privateRun(TestRunner.java:749)
	at org.testng.TestRunner.run(TestRunner.java:600)
	at org.testng.SuiteRunner.runTest(SuiteRunner.java:317)
	at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:312)
	at org.testng.SuiteRunner.privateRun(SuiteRunner.java:274)
	at org.testng.SuiteRunner.run(SuiteRunner.java:223)
	at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
	at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86)
	at org.testng.TestNG.runSuitesSequentially(TestNG.java:1039)
	at org.testng.TestNG.runSuitesLocally(TestNG.java:964)
	at org.testng.TestNG.run(TestNG.java:900)
	at org.testng.TestNG.privateMain(TestNG.java:1182)
	at org.testng.TestNG.main(TestNG.java:1146)

201208160230:09.111 - INFO: ======= Starting Test: AutoSubscribeWithServiceLevel_Test([null, Standard]) (com.redhat.qe.auto.testng.TestNGListener.onTestStart)
201208160230:09.311 - FINE: ssh root.redhat.com subscription-manager service-level --show (com.redhat.qe.tools.SSHCommandRunner.run)
201208160230:10.788 - FINE: Stdout: Current service level: None
 (com.redhat.qe.tools.SSHCommandRunner.runCommandAndWait)




Additional info:
excerpt from the candlpin log (catalina.out)
Aug 16 02:30:00 [QuartzScheduler_Worker-14] INFO  org.candlepin.pinsetter.tasks.CertificateRevocationListTask - Executing CRL Job. CRL filePath=/var/lib/candlepin/candlepin-crl.crl
Aug 16 02:30:00 [QuartzScheduler_Worker-14] INFO  org.candlepin.util.CrlFileUtil - CRL File: /var/lib/candlepin/candlepin-crl.crl exists. Loading the old CRL
Aug 16 02:30:00 [QuartzScheduler_Worker-14] INFO  org.candlepin.controller.CrlGenerator - Old CRLNumber is : 191
Aug 16 02:30:00 [QuartzScheduler_Worker-14] INFO  org.candlepin.pki.impl.BouncyCastlePKIUtility - Completed adding CRL numbers to the certificate.
Aug 16 02:30:00 [QuartzScheduler_Worker-14] INFO  org.candlepin.util.CrlFileUtil - Completed generating CRL. Writing it to disk
Aug 16 02:30:02 [http-8443-2] INFO  org.candlepin.resource.ConsumerResource - Consumer cbfc3d18-c9d0-448b-9401-92df9c914e6c updated.
Aug 16 02:30:03 [http-8443-2] INFO  org.candlepin.resource.ConsumerResource - Consumer cbfc3d18-c9d0-448b-9401-92df9c914e6c updated.
Aug 16 02:30:03 [http-8443-2] INFO  org.candlepin.controller.CandlepinPoolManager - Attempting for products on date: Thu Aug 16 02:30:03 EDT 2012
Aug 16 02:30:03 [http-8443-2] INFO  org.candlepin.controller.CandlepinPoolManager -   100000000000003
Aug 16 02:30:03 [http-8443-2] INFO  org.candlepin.controller.CandlepinPoolManager -   100000000000004
Aug 16 02:30:03 [http-8443-2] INFO  org.candlepin.controller.CandlepinPoolManager -   100000000000005
Aug 16 02:30:03 [http-8443-2] INFO  org.candlepin.controller.CandlepinPoolManager -   100000000000006
Aug 16 02:30:03 [http-8443-2] INFO  org.candlepin.controller.CandlepinPoolManager -   27060
Aug 16 02:30:03 [http-8443-2] INFO  org.candlepin.controller.CandlepinPoolManager -   100000000000000
Aug 16 02:30:03 [http-8443-2] INFO  org.candlepin.controller.CandlepinPoolManager -   100000000000001
Aug 16 02:30:03 [http-8443-2] INFO  org.candlepin.controller.CandlepinPoolManager -   100000000000002
Aug 16 02:30:03 [http-8443-2] INFO  org.candlepin.controller.CandlepinPoolManager -   1144
Aug 16 02:30:03 [http-8443-2] INFO  org.candlepin.controller.CandlepinPoolManager -   100000000000060
Aug 16 02:30:03 [http-8443-2] INFO  org.candlepin.controller.CandlepinPoolManager -   100000000000022
Aug 16 02:30:03 [http-8443-2] INFO  org.candlepin.controller.CandlepinPoolManager -   37070
Aug 16 02:30:03 [http-8443-2] INFO  org.candlepin.controller.CandlepinPoolManager -   100000000000007
Aug 16 02:30:03 [http-8443-2] INFO  org.candlepin.controller.CandlepinPoolManager -   100000000000008
Aug 16 02:30:03 [http-8443-2] INFO  org.candlepin.controller.CandlepinPoolManager -   100000000000009
Aug 16 02:30:03 [http-8443-2] INFO  org.candlepin.controller.CandlepinPoolManager -   37090
Aug 16 02:30:03 [http-8443-2] INFO  org.candlepin.controller.CandlepinPoolManager -   100000000000020
Aug 16 02:30:03 [http-8443-2] INFO  org.candlepin.controller.CandlepinPoolManager -   37065
Aug 16 02:30:03 [http-8443-2] INFO  org.candlepin.controller.CandlepinPoolManager -   69
Aug 16 02:30:03 [http-8443-2] INFO  org.candlepin.controller.CandlepinPoolManager -   37062
Aug 16 02:30:03 [http-8443-2] INFO  org.candlepin.controller.CandlepinPoolManager -   37069
Aug 16 02:30:03 [http-8443-2] INFO  org.candlepin.controller.CandlepinPoolManager -   917571
Aug 16 02:30:03 [http-8443-2] INFO  org.candlepin.controller.CandlepinPoolManager -   37068
Aug 16 02:30:03 [http-8443-2] INFO  org.candlepin.controller.CandlepinPoolManager -   37067
Aug 16 02:30:03 [http-8443-2] INFO  org.candlepin.controller.CandlepinPoolManager -   100000000000013
Aug 16 02:30:03 [http-8443-2] INFO  org.candlepin.controller.CandlepinPoolManager -   99000
Aug 16 02:30:03 [http-8443-2] INFO  org.candlepin.controller.CandlepinPoolManager -   100000000000012
Aug 16 02:30:03 [http-8443-2] INFO  org.candlepin.controller.CandlepinPoolManager -   100000000000011
Aug 16 02:30:03 [http-8443-2] INFO  org.candlepin.controller.CandlepinPoolManager -   100000000000010
Aug 16 02:30:03 [http-8443-2] INFO  org.candlepin.controller.CandlepinPoolManager -   37080
Aug 16 02:30:03 [http-8443-2] INFO  org.candlepin.controller.CandlepinPoolManager -   37060
Aug 16 02:30:03 [http-8443-2] INFO  org.candlepin.controller.CandlepinPoolManager -   100000000000015
Aug 16 02:30:03 [http-8443-2] INFO  org.candlepin.controller.CandlepinPoolManager -   1
Aug 16 02:30:03 [http-8443-2] INFO  org.candlepin.controller.CandlepinPoolManager -   88888
Aug 16 02:30:03 [http-8443-2] INFO  org.candlepin.controller.CandlepinPoolManager -   1000000000000023
Aug 16 02:30:04 [http-8443-2] INFO  org.candlepin.policy.js.entitlement.EntitlementRules - Selecting best entitlement pool for product: [100000000000003, 100000000000004, 100000000000005, 100000000000006, 27060, 100000000000000, 100000000000001, 100000000000002, 1144, 100000000000060, 100000000000022, 37070, 100000000000007, 100000000000008, 100000000000009, 37090, 100000000000020, 37065, 69, 37062, 37069, 917571, 37068, 37067, 100000000000013, 99000, 100000000000012, 100000000000011, 100000000000010, 37080, 37060, 100000000000015, 1, 88888, 1000000000000023]
Aug 16 02:30:04 [http-8443-9] INFO  org.candlepin.resource.ConsumerResource - Consumer cbfc3d18-c9d0-448b-9401-92df9c914e6c updated.
Aug 16 02:30:05 [http-8443-2] INFO  org.candlepin.policy.js.entitlement.EntitlementRules - Excuted javascript rule: select_pool_global
Aug 16 02:30:06 [http-8443-2] WARN  org.candlepin.jackson.JsonBeanPropertyFilter - Unable to serialize property 'canActivate without getter: getCanActivate
Aug 16 02:30:06 [http-8443-2] WARN  org.candlepin.jackson.JsonBeanPropertyFilter - Unable to serialize property 'autoheal without getter: getAutoheal
Aug 16 02:30:06 [http-8443-2] WARN  org.candlepin.jackson.JsonBeanPropertyFilter - Unable to serialize property 'canActivate without getter: getCanActivate
Aug 16 02:30:06 [http-8443-2] WARN  org.candlepin.jackson.JsonBeanPropertyFilter - Unable to serialize property 'autoheal without getter: getAutoheal
Aug 16 02:30:06 [http-8443-2] WARN  org.candlepin.jackson.JsonBeanPropertyFilter - Unable to serialize property 'canActivate without getter: getCanActivate
Aug 16 02:30:06 [http-8443-2] WARN  org.candlepin.jackson.JsonBeanPropertyFilter - Unable to serialize property 'autoheal without getter: getAutoheal
Aug 16 02:30:06 [http-8443-2] WARN  org.candlepin.jackson.JsonBeanPropertyFilter - Unable to serialize property 'canActivate without getter: getCanActivate
Aug 16 02:30:06 [http-8443-2] WARN  org.candlepin.jackson.JsonBeanPropertyFilter - Unable to serialize property 'autoheal without getter: getAutoheal
Aug 16 02:30:06 [http-8443-2] WARN  org.candlepin.jackson.JsonBeanPropertyFilter - Unable to serialize property 'canActivate without getter: getCanActivate
Aug 16 02:30:06 [http-8443-2] WARN  org.candlepin.jackson.JsonBeanPropertyFilter - Unable to serialize property 'autoheal without getter: getAutoheal
Aug 16 02:30:06 [http-8443-2] WARN  org.candlepin.jackson.JsonBeanPropertyFilter - Unable to serialize property 'canActivate without getter: getCanActivate
Aug 16 02:30:06 [http-8443-2] WARN  org.candlepin.jackson.JsonBeanPropertyFilter - Unable to serialize property 'autoheal without getter: getAutoheal

Comment 1 John Sefler 2012-08-17 14:29:04 UTC

I have a thought on a potential cause...

I see in catalina.out these two back-to-back update consumer transactions which occur at the start of the autosubscribe...

Aug 16 02:30:02 [http-8443-2] INFO  org.candlepin.resource.ConsumerResource - Consumer cbfc3d18-c9d0-448b-9401-92df9c914e6c updated.
Aug 16 02:30:03 [http-8443-2] INFO  org.candlepin.resource.ConsumerResource - Consumer cbfc3d18-c9d0-448b-9401-92df9c914e6c updated.

Is it possible that a "chatty fact" (ref bug 838123) could be the cause for one of the consumer updates and the new serviceLevel specified for autosubscribe the cause for the other consumer update.  Then if the updates are performed asynchronously, maybe the consumer update with the new serviceLevel attribute is getting clobbered by the other update (which has the former service level).  Just a thought.

Comment 2 Devan Goodwin 2012-10-03 13:28:50 UTC

John could you verify something for me, in your testing for each of the steps you take, do you block and wait for the requests to return, or do you ever fire off requests to say update autoheal or subscribe without waiting for a past command to complete?

I think your theory is probably quite likely to be correct. I don't see how it could be happening in subscription-manager though, as far as I know we would always wait for the update consumer to complete (when updating facts for example) before proceeding to the next step. The only possibility would perhaps be if rhsmcertd happened to run at exactly the same time.

Out of curiosity, what interval do you have the rhsmcertd set to in this environment?

I'm going to do a quick test with a big sleep in update consumer, firing off a facts update and then a set service level, almost surely this can cause what you are seeing. We will probably defer to 6.4 though as this would be a candlepin server side fix.

Comment 3 Devan Goodwin 2012-10-03 13:55:51 UTC

Successfully reproduced with a 10 second sleep in update consumer server side call, forcing facts to change, and firing off first a "set SLA" request, then an update facts request before the first one can finish.

The second request loads a representation of the consumer before the first one has committed, so the SLA is the old one, both requests complete, but the second one overwrites the first, and SLA remains the same.

My guess is either the test suite is firing off requests and not blocking waiting for a response, or rhsmcertd is running very frequently and happens to hit during an update consumer. 

John: debug logging might provide a little more info in terms of "facts changed" and what not if that is an option. (if you'd like to enable it and wait for it to happe nagain)

I don't think there's anything to be fixed client side here (we can't block across rhsmcertd and the app itself, at least not easily), so we would have to use database locking on the consumer server side, similar to how we lock on pools. 

If the above theory is correct, the odds of setting an SLA while rhsmcertd running seem fairly low so probably not a high priority fix here.

Pushing to 6.4.

Comment 4 Bryan Kearney 2013-11-01 17:55:20 UTC

John, are you still seeing this? I am tempted to close/wontfix.

Comment 5 John Sefler 2013-11-22 17:07:40 UTC

I have not seen this error in awhile.
Let's close this issue for now.