Description of problem: Automation tests results are running into an intermittent failure whereby the test does this: 1. Registers 2. makes a curl call to update the auto-heal attribute to false for the registered consumer (to prevent any test result influence by auto healing) 3. Foreach service level available 4. unsubscribe --all 5. subscribe --auto --service-level <valid service level> 6. assert that the stdout feedback confirms the service level was set 7. use curl to get the current consumer and asert that the service level was persisted onto the serviceLevel attribute of the consumer Occassionally step 7 fails. The service level does not always get persisted to the consumer. When it does not persist, it's current value remains equal to its original value prior to the autosubscribe. Version-Release number of selected component (if applicable): [root@jsefler-59server ~]# subscription-manager version Error while checking server version: No such file or directory <<= bug 848409 remote entitlement server: Unknown remote entitlement server type: Unknown subscription-manager: 1.0.13-1.git.30.404361f.el5 python-rhsm: 1.0.5-1.git.1.1ce0910.el5 How reproducible: Intermittent Steps to Reproduce: outlined above Here is a trace from a test automation log showing that the service level remained unchanged and failed on the assertion that the requested autosubscribe service level "Premium" was not persisted to the current consumer. Notice that the service level starts out as "None" and does not get changed to "Premium" even though the autosubscribe feedback says that it did. 201208160229:54.675 - INFO: ======= Starting Test: AutoSubscribeWithServiceLevel_Test([null, Premium]) (com.redhat.qe.auto.testng.TestNGListener.onTestStart) 201208160229:54.943 - FINE: ssh root.redhat.com subscription-manager service-level --show (com.redhat.qe.tools.SSHCommandRunner.run) 201208160229:56.376 - FINE: Stdout: Current service level: None (com.redhat.qe.tools.SSHCommandRunner.runCommandAndWait) 201208160229:56.377 - FINE: Stderr: (com.redhat.qe.tools.SSHCommandRunner.runCommandAndWait) 201208160229:56.377 - FINE: ExitCode: 0 (com.redhat.qe.tools.SSHCommandRunner.runCommandAndWait) 201208160229:56.380 - FINE: ssh root.redhat.com subscription-manager unsubscribe --all (com.redhat.qe.tools.SSHCommandRunner.run) 201208160229:58.967 - FINE: Stdout: This machine has been unsubscribed from 3 subscriptions (com.redhat.qe.tools.SSHCommandRunner.runCommandAndWait) 201208160229:58.967 - FINE: Stderr: (com.redhat.qe.tools.SSHCommandRunner.runCommandAndWait) 201208160229:58.968 - FINE: ExitCode: 0 (com.redhat.qe.tools.SSHCommandRunner.runCommandAndWait) 201208160230:00.237 - FINE: ssh root.redhat.com subscription-manager subscribe --auto --servicelevel=Premium (com.redhat.qe.tools.SSHCommandRunner.run) 201208160230:08.383 - FINE: Stdout: Service level set to: Premium Installed Product Current Status: Product Name: Awesome OS for S390 Bits Status: Not Subscribed Product Name: Stackable with Awesome OS for x86_64 Bits Status: Not Subscribed Product Name: Awesome OS Developer Basic Status: Not Subscribed Product Name: Awesome OS for x86_64/i686/ia64/ppc/ppc64/s390x/s390 Bits Status: Not Subscribed Product Name: Awesome OS for z80/x86_64 Bits Status: Not Subscribed Product Name: Awesome OS for z80/ALL Bits Status: Not Subscribed Product Name: Awesome OS for x86_64/ALL Bits Status: Not Subscribed Product Name: Awesome OS for x86_64/ALL Bits for ZERO sockets Status: Not Subscribed Product Name: Exempt Product Bits Status: Subscribed Product Name: Awesome OS Workstation Bits Status: Not Subscribed Product Name: Red Hat Enterprise Linux Server Status: Not Subscribed Product Name: Multiplier Product Bits Status: Not Subscribed Product Name: Awesome OS Developer Bits Status: Not Subscribed Product Name: Management Bits Status: Not Subscribed Product Name: Large File Support Bits Status: Not Subscribed Product Name: Awesome OS Modifier Bits Status: Not Subscribed Product Name: Clustering Bits Status: Not Subscribed Product Name: Shared Storage Bits Status: Not Subscribed Product Name: Awesome OS Server Bits Status: Subscribed Product Name: Awesome OS Premium Architecture Bits Status: Not Subscribed Product Name: Awesome OS for ia64 Bits Status: Not Subscribed Product Name: Awesome OS for i686/x86_64 Bits Status: Not Subscribed Product Name: Awesome OS for z80 Bits Status: Not Subscribed Product Name: Awesome OS for i686 Bits Status: Not Subscribed Product Name: Awesome OS for x86_64 Bits Status: Not Subscribed Product Name: Awesome OS for ppc64 Bits Status: Not Subscribed Product Name: Awesome OS for S390X Bits Status: Not Subscribed Product Name: Awesome OS for i386 Bits Status: Not Subscribed Product Name: Awesome OS for x86 Bits Status: Not Subscribed Product Name: Awesome OS for x86_64/s390x Bits Status: Not Subscribed Product Name: Awesome OS for Intel x86_64/i686/ia64 Bits Status: Not Subscribed Product Name: Awesome OS for x86/x64_64 Bits Status: Not Subscribed Product Name: Shared File System Bits Status: Subscribed Product Name: Awesome OS Scalable Filesystem Bits Status: Not Subscribed Product Name: Load Balancing Bits Status: Not Subscribed (com.redhat.qe.tools.SSHCommandRunner.runCommandAndWait) 201208160230:08.385 - FINE: Stderr: (com.redhat.qe.tools.SSHCommandRunner.runCommandAndWait) 201208160230:08.387 - FINE: ExitCode: 0 (com.redhat.qe.tools.SSHCommandRunner.runCommandAndWait) 201208160230:08.388 - INFO: Asserted: The autosubscribe stdout reports: Service level set to: Premium (com.redhat.qe.Assert.pass) 201208160230:08.389 - INFO: Asserted: The autosubscribe stdout reports: Installed Product Current Status (com.redhat.qe.Assert.pass) 201208160230:08.389 - INFO: Asserted: The exit code from the subscribe --auto command does not indicate a failure (exit code 0 indicates an entitlement was granted, 1 indicates an entitlement was not granted, 255 indicates a failure). (com.redhat.qe.Assert.pass) 201208160230:08.659 - INFO: SSH alternative to HTTP request: curl --stderr /dev/null --insecure --user testuser1:password --request GET https://jsefler-f14-candlepin.usersys.redhat.com:8443/candlepin/consumers/cbfc3d18-c9d0-448b-9401-92df9c914e6c | python -m simplejson/tool (com.redhat.qe.sm.cli.tasks.CandlepinTasks.getResourceUsingRESTfulAPI) 201208160230:09.109 - SEVERE: Test Failed: AutoSubscribeWithServiceLevel_Test (com.redhat.qe.auto.testng.TestNGListener.onTestFailure) java.lang.AssertionError: The call to subscribe with auto and a servicelevel persisted the servicelevel setting on the current consumer object. expected:<Premium> but was:<None> at com.redhat.qe.Assert.fail(Assert.java:186) at com.redhat.qe.Assert.failNotEquals(Assert.java:606) at com.redhat.qe.Assert.assertEquals(Assert.java:210) at com.redhat.qe.Assert.assertEquals(Assert.java:266) at com.redhat.qe.sm.cli.tests.ServiceLevelTests.AutoSubscribeWithServiceLevel_Test(ServiceLevelTests.java:472) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:76) at org.testng.internal.Invoker.invokeMethod(Invoker.java:673) at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:846) at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1170) at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:125) at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:109) at org.testng.TestRunner.runWorkers(TestRunner.java:1147) at org.testng.TestRunner.privateRun(TestRunner.java:749) at org.testng.TestRunner.run(TestRunner.java:600) at org.testng.SuiteRunner.runTest(SuiteRunner.java:317) at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:312) at org.testng.SuiteRunner.privateRun(SuiteRunner.java:274) at org.testng.SuiteRunner.run(SuiteRunner.java:223) at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52) at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86) at org.testng.TestNG.runSuitesSequentially(TestNG.java:1039) at org.testng.TestNG.runSuitesLocally(TestNG.java:964) at org.testng.TestNG.run(TestNG.java:900) at org.testng.TestNG.privateMain(TestNG.java:1182) at org.testng.TestNG.main(TestNG.java:1146) 201208160230:09.111 - INFO: ======= Starting Test: AutoSubscribeWithServiceLevel_Test([null, Standard]) (com.redhat.qe.auto.testng.TestNGListener.onTestStart) 201208160230:09.311 - FINE: ssh root.redhat.com subscription-manager service-level --show (com.redhat.qe.tools.SSHCommandRunner.run) 201208160230:10.788 - FINE: Stdout: Current service level: None (com.redhat.qe.tools.SSHCommandRunner.runCommandAndWait) Additional info: excerpt from the candlpin log (catalina.out) Aug 16 02:30:00 [QuartzScheduler_Worker-14] INFO org.candlepin.pinsetter.tasks.CertificateRevocationListTask - Executing CRL Job. CRL filePath=/var/lib/candlepin/candlepin-crl.crl Aug 16 02:30:00 [QuartzScheduler_Worker-14] INFO org.candlepin.util.CrlFileUtil - CRL File: /var/lib/candlepin/candlepin-crl.crl exists. Loading the old CRL Aug 16 02:30:00 [QuartzScheduler_Worker-14] INFO org.candlepin.controller.CrlGenerator - Old CRLNumber is : 191 Aug 16 02:30:00 [QuartzScheduler_Worker-14] INFO org.candlepin.pki.impl.BouncyCastlePKIUtility - Completed adding CRL numbers to the certificate. Aug 16 02:30:00 [QuartzScheduler_Worker-14] INFO org.candlepin.util.CrlFileUtil - Completed generating CRL. Writing it to disk Aug 16 02:30:02 [http-8443-2] INFO org.candlepin.resource.ConsumerResource - Consumer cbfc3d18-c9d0-448b-9401-92df9c914e6c updated. Aug 16 02:30:03 [http-8443-2] INFO org.candlepin.resource.ConsumerResource - Consumer cbfc3d18-c9d0-448b-9401-92df9c914e6c updated. Aug 16 02:30:03 [http-8443-2] INFO org.candlepin.controller.CandlepinPoolManager - Attempting for products on date: Thu Aug 16 02:30:03 EDT 2012 Aug 16 02:30:03 [http-8443-2] INFO org.candlepin.controller.CandlepinPoolManager - 100000000000003 Aug 16 02:30:03 [http-8443-2] INFO org.candlepin.controller.CandlepinPoolManager - 100000000000004 Aug 16 02:30:03 [http-8443-2] INFO org.candlepin.controller.CandlepinPoolManager - 100000000000005 Aug 16 02:30:03 [http-8443-2] INFO org.candlepin.controller.CandlepinPoolManager - 100000000000006 Aug 16 02:30:03 [http-8443-2] INFO org.candlepin.controller.CandlepinPoolManager - 27060 Aug 16 02:30:03 [http-8443-2] INFO org.candlepin.controller.CandlepinPoolManager - 100000000000000 Aug 16 02:30:03 [http-8443-2] INFO org.candlepin.controller.CandlepinPoolManager - 100000000000001 Aug 16 02:30:03 [http-8443-2] INFO org.candlepin.controller.CandlepinPoolManager - 100000000000002 Aug 16 02:30:03 [http-8443-2] INFO org.candlepin.controller.CandlepinPoolManager - 1144 Aug 16 02:30:03 [http-8443-2] INFO org.candlepin.controller.CandlepinPoolManager - 100000000000060 Aug 16 02:30:03 [http-8443-2] INFO org.candlepin.controller.CandlepinPoolManager - 100000000000022 Aug 16 02:30:03 [http-8443-2] INFO org.candlepin.controller.CandlepinPoolManager - 37070 Aug 16 02:30:03 [http-8443-2] INFO org.candlepin.controller.CandlepinPoolManager - 100000000000007 Aug 16 02:30:03 [http-8443-2] INFO org.candlepin.controller.CandlepinPoolManager - 100000000000008 Aug 16 02:30:03 [http-8443-2] INFO org.candlepin.controller.CandlepinPoolManager - 100000000000009 Aug 16 02:30:03 [http-8443-2] INFO org.candlepin.controller.CandlepinPoolManager - 37090 Aug 16 02:30:03 [http-8443-2] INFO org.candlepin.controller.CandlepinPoolManager - 100000000000020 Aug 16 02:30:03 [http-8443-2] INFO org.candlepin.controller.CandlepinPoolManager - 37065 Aug 16 02:30:03 [http-8443-2] INFO org.candlepin.controller.CandlepinPoolManager - 69 Aug 16 02:30:03 [http-8443-2] INFO org.candlepin.controller.CandlepinPoolManager - 37062 Aug 16 02:30:03 [http-8443-2] INFO org.candlepin.controller.CandlepinPoolManager - 37069 Aug 16 02:30:03 [http-8443-2] INFO org.candlepin.controller.CandlepinPoolManager - 917571 Aug 16 02:30:03 [http-8443-2] INFO org.candlepin.controller.CandlepinPoolManager - 37068 Aug 16 02:30:03 [http-8443-2] INFO org.candlepin.controller.CandlepinPoolManager - 37067 Aug 16 02:30:03 [http-8443-2] INFO org.candlepin.controller.CandlepinPoolManager - 100000000000013 Aug 16 02:30:03 [http-8443-2] INFO org.candlepin.controller.CandlepinPoolManager - 99000 Aug 16 02:30:03 [http-8443-2] INFO org.candlepin.controller.CandlepinPoolManager - 100000000000012 Aug 16 02:30:03 [http-8443-2] INFO org.candlepin.controller.CandlepinPoolManager - 100000000000011 Aug 16 02:30:03 [http-8443-2] INFO org.candlepin.controller.CandlepinPoolManager - 100000000000010 Aug 16 02:30:03 [http-8443-2] INFO org.candlepin.controller.CandlepinPoolManager - 37080 Aug 16 02:30:03 [http-8443-2] INFO org.candlepin.controller.CandlepinPoolManager - 37060 Aug 16 02:30:03 [http-8443-2] INFO org.candlepin.controller.CandlepinPoolManager - 100000000000015 Aug 16 02:30:03 [http-8443-2] INFO org.candlepin.controller.CandlepinPoolManager - 1 Aug 16 02:30:03 [http-8443-2] INFO org.candlepin.controller.CandlepinPoolManager - 88888 Aug 16 02:30:03 [http-8443-2] INFO org.candlepin.controller.CandlepinPoolManager - 1000000000000023 Aug 16 02:30:04 [http-8443-2] INFO org.candlepin.policy.js.entitlement.EntitlementRules - Selecting best entitlement pool for product: [100000000000003, 100000000000004, 100000000000005, 100000000000006, 27060, 100000000000000, 100000000000001, 100000000000002, 1144, 100000000000060, 100000000000022, 37070, 100000000000007, 100000000000008, 100000000000009, 37090, 100000000000020, 37065, 69, 37062, 37069, 917571, 37068, 37067, 100000000000013, 99000, 100000000000012, 100000000000011, 100000000000010, 37080, 37060, 100000000000015, 1, 88888, 1000000000000023] Aug 16 02:30:04 [http-8443-9] INFO org.candlepin.resource.ConsumerResource - Consumer cbfc3d18-c9d0-448b-9401-92df9c914e6c updated. Aug 16 02:30:05 [http-8443-2] INFO org.candlepin.policy.js.entitlement.EntitlementRules - Excuted javascript rule: select_pool_global Aug 16 02:30:06 [http-8443-2] WARN org.candlepin.jackson.JsonBeanPropertyFilter - Unable to serialize property 'canActivate without getter: getCanActivate Aug 16 02:30:06 [http-8443-2] WARN org.candlepin.jackson.JsonBeanPropertyFilter - Unable to serialize property 'autoheal without getter: getAutoheal Aug 16 02:30:06 [http-8443-2] WARN org.candlepin.jackson.JsonBeanPropertyFilter - Unable to serialize property 'canActivate without getter: getCanActivate Aug 16 02:30:06 [http-8443-2] WARN org.candlepin.jackson.JsonBeanPropertyFilter - Unable to serialize property 'autoheal without getter: getAutoheal Aug 16 02:30:06 [http-8443-2] WARN org.candlepin.jackson.JsonBeanPropertyFilter - Unable to serialize property 'canActivate without getter: getCanActivate Aug 16 02:30:06 [http-8443-2] WARN org.candlepin.jackson.JsonBeanPropertyFilter - Unable to serialize property 'autoheal without getter: getAutoheal Aug 16 02:30:06 [http-8443-2] WARN org.candlepin.jackson.JsonBeanPropertyFilter - Unable to serialize property 'canActivate without getter: getCanActivate Aug 16 02:30:06 [http-8443-2] WARN org.candlepin.jackson.JsonBeanPropertyFilter - Unable to serialize property 'autoheal without getter: getAutoheal Aug 16 02:30:06 [http-8443-2] WARN org.candlepin.jackson.JsonBeanPropertyFilter - Unable to serialize property 'canActivate without getter: getCanActivate Aug 16 02:30:06 [http-8443-2] WARN org.candlepin.jackson.JsonBeanPropertyFilter - Unable to serialize property 'autoheal without getter: getAutoheal Aug 16 02:30:06 [http-8443-2] WARN org.candlepin.jackson.JsonBeanPropertyFilter - Unable to serialize property 'canActivate without getter: getCanActivate Aug 16 02:30:06 [http-8443-2] WARN org.candlepin.jackson.JsonBeanPropertyFilter - Unable to serialize property 'autoheal without getter: getAutoheal
I have a thought on a potential cause... I see in catalina.out these two back-to-back update consumer transactions which occur at the start of the autosubscribe... Aug 16 02:30:02 [http-8443-2] INFO org.candlepin.resource.ConsumerResource - Consumer cbfc3d18-c9d0-448b-9401-92df9c914e6c updated. Aug 16 02:30:03 [http-8443-2] INFO org.candlepin.resource.ConsumerResource - Consumer cbfc3d18-c9d0-448b-9401-92df9c914e6c updated. Is it possible that a "chatty fact" (ref bug 838123) could be the cause for one of the consumer updates and the new serviceLevel specified for autosubscribe the cause for the other consumer update. Then if the updates are performed asynchronously, maybe the consumer update with the new serviceLevel attribute is getting clobbered by the other update (which has the former service level). Just a thought.
John could you verify something for me, in your testing for each of the steps you take, do you block and wait for the requests to return, or do you ever fire off requests to say update autoheal or subscribe without waiting for a past command to complete? I think your theory is probably quite likely to be correct. I don't see how it could be happening in subscription-manager though, as far as I know we would always wait for the update consumer to complete (when updating facts for example) before proceeding to the next step. The only possibility would perhaps be if rhsmcertd happened to run at exactly the same time. Out of curiosity, what interval do you have the rhsmcertd set to in this environment? I'm going to do a quick test with a big sleep in update consumer, firing off a facts update and then a set service level, almost surely this can cause what you are seeing. We will probably defer to 6.4 though as this would be a candlepin server side fix.
Successfully reproduced with a 10 second sleep in update consumer server side call, forcing facts to change, and firing off first a "set SLA" request, then an update facts request before the first one can finish. The second request loads a representation of the consumer before the first one has committed, so the SLA is the old one, both requests complete, but the second one overwrites the first, and SLA remains the same. My guess is either the test suite is firing off requests and not blocking waiting for a response, or rhsmcertd is running very frequently and happens to hit during an update consumer. John: debug logging might provide a little more info in terms of "facts changed" and what not if that is an option. (if you'd like to enable it and wait for it to happe nagain) I don't think there's anything to be fixed client side here (we can't block across rhsmcertd and the app itself, at least not easily), so we would have to use database locking on the consumer server side, similar to how we lock on pools. If the above theory is correct, the odds of setting an SLA while rhsmcertd running seem fairly low so probably not a high priority fix here. Pushing to 6.4.
John, are you still seeing this? I am tempted to close/wontfix.
I have not seen this error in awhile. Let's close this issue for now.