Description of problem:
On Satellite 5.7, after some time of running, I find the traceback below in the log file. This problem was also discovered on two other systems.

....
org.quartz.ObjectAlreadyExistsException: Unable to store Trigger with name: 'immediateTrigger-server' and group: 'group1', because one already exists with this identification.
....

In some cases the rhn-search service was stopped and searching no longer works via the web UI. The last messages in the log look like this:

....
ERROR | wrapper | 2016/09/18 15:42:30 | JVM appears hung: Timed out waiting for signal from JVM.
ERROR | wrapper | 2016/09/18 15:42:30 | JVM did not exit on request, terminated
STATUS | wrapper | 2016/09/18 15:42:31 | JVM exited in response to signal SIGKILL (9).
ERROR | wrapper | 2016/09/18 15:42:31 | Unable to start a JVM
STATUS | wrapper | 2016/09/18 15:42:39 | <-- Wrapper Stopped

Version-Release number of selected component (if applicable):
spacewalk-search-2.3.0-8.el6sat.noarch

How reproducible:
Randomly.

Steps to Reproduce:
1. After some time the traceback appears in the log file.

Actual results:
>> tail /var/log/rhn/search/rhn_search_daemon.log
...
INFO | jvm 1 | 2016/09/20 17:07:28 | 2016-09-20 17:07:28,738 [Thread-26] WARN com.redhat.satellite.search.scheduler.ScheduleManager - Scheduling trigger: Trigger 'group1.immediateTrigger-server': triggerClass: 'org.quartz.SimpleTrigger isVolatile: false calendar: 'null' misfireInstruction: 0 nextFireTime: Tue Sep 20 17:07:28 EDT 2016 failed.
INFO | jvm 1 | 2016/09/20 17:07:28 | 2016-09-20 17:07:28,841 [Thread-26] WARN com.redhat.satellite.search.scheduler.ScheduleManager - Exception was caught:
INFO | jvm 1 | 2016/09/20 17:07:28 | org.quartz.ObjectAlreadyExistsException: Unable to store Trigger with name: 'immediateTrigger-server' and group: 'group1', because one already exists with this identification.
INFO | jvm 1 | 2016/09/20 17:07:28 |   at org.quartz.simpl.RAMJobStore.storeTrigger(RAMJobStore.java:314)
INFO | jvm 1 | 2016/09/20 17:07:28 |   at org.quartz.core.QuartzScheduler.scheduleJob(QuartzScheduler.java:849)
INFO | jvm 1 | 2016/09/20 17:07:28 |   at org.quartz.impl.StdScheduler.scheduleJob(StdScheduler.java:254)
INFO | jvm 1 | 2016/09/20 17:07:28 |   at com.redhat.satellite.search.scheduler.ScheduleManager.triggerIndexTask(ScheduleManager.java:199)
INFO | jvm 1 | 2016/09/20 17:07:28 |   at com.redhat.satellite.search.rpc.handlers.AdminHandler.updateIndex(AdminHandler.java:56)
INFO | jvm 1 | 2016/09/20 17:07:28 |   at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source)
INFO | jvm 1 | 2016/09/20 17:07:28 |   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:56)
INFO | jvm 1 | 2016/09/20 17:07:28 |   at java.lang.reflect.Method.invoke(Method.java:620)
INFO | jvm 1 | 2016/09/20 17:07:28 |   at redstone.xmlrpc.handlers.ReflectiveInvocationHandler.execute(ReflectiveInvocationHandler.java:217)
INFO | jvm 1 | 2016/09/20 17:07:28 |   at redstone.xmlrpc.handlers.ReflectiveInvocationHandler.invoke(ReflectiveInvocationHandler.java:133)
INFO | jvm 1 | 2016/09/20 17:07:28 |   at redstone.xmlrpc.XmlRpcDispatcher.dispatch(XmlRpcDispatcher.java:123)
INFO | jvm 1 | 2016/09/20 17:07:28 |   at redstone.xmlrpc.XmlRpcServer.execute(XmlRpcServer.java:83)
INFO | jvm 1 | 2016/09/20 17:07:28 |   at com.redhat.satellite.search.rpc.XmlRpcInvoker.handle(XmlRpcInvoker.java:73)
INFO | jvm 1 | 2016/09/20 17:07:28 |   at simple.http.Dispatcher.run(Unknown Source)
INFO | jvm 1 | 2016/09/20 17:07:28 |   at simple.util.process.Daemon.execute(Unknown Source)
INFO | jvm 1 | 2016/09/20 17:07:28 |   at simple.util.process.Daemon.run(Unknown Source)

Expected results:
Without traceback.
I see this same error in my spacewalk rhn_search_daemon.log. I'll take this bug and investigate.
This issue is caused when two systems perform rhn_check at the same time, producing concurrent calls to xmlrpc/registration.upgrade_version(1000026053, '23'). The client systems involved are physical systems... virtual systems do not appear to make this call. To reproduce, connect to two client systems identified as physical systems and simultaneously execute a series of rhn_check commands on the command line on each: # rhn_check; rhn_check; rhn_check; ..... I issued about 30 on each system and saw the issue occur 2-5 times as they ran.
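The reproduction above can be sketched as a simple loop. This is a hedged sketch: rhn_check is the real Satellite/Spacewalk client command, but the iteration count of 30 is just what surfaced the race for me; run the same loop on both physical clients at the same time.

```shell
# Fire a burst of back-to-back rhn_check invocations on one client.
# Run the same loop simultaneously on a second physical client to
# provoke concurrent registration.upgrade_version() calls on the server.
for i in $(seq 1 30); do
    rhn_check
done
```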
After much more testing, I find that up2dateAuth.py has a maybeUpdateVersion() function that isn't working as designed. It gets the version from the Spacewalk server and compares it to the version from the system itself via up2dateUtils.getVersion(). Only if these two versions differ should it tell the server to update the version via s.registration.upgrade_version(...). The problem is that this conditional compares a byte array from up2dateUtils.getVersion() to a value returned from rpclib.xmlrpclib.loads(getSystemId())[0][0]['os_release'], so the two are never equal and the server is always told to upgrade the version. I think this is ultimately what we should fix, and we should leave in place the log warning that the immediately scheduled trigger already exists.
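A minimal illustration of the comparison bug described above (this is not the actual up2date code; the version strings are made-up examples): a byte string never compares equal to a text string in Python 3, so the "versions differ" branch fires on every check.

```python
# Hypothetical values standing in for the two sides of the comparison in
# maybeUpdateVersion(): one arrives as bytes, the other as text.
local_version = b"6.8"    # e.g. a byte string from a buggy getVersion() path
server_version = "6.8"    # e.g. 'os_release' decoded from the systemid XML

# bytes != str is always True, even when the content matches, so the
# client would call registration.upgrade_version() on every rhn_check.
assert local_version != server_version

# The fix is to normalize both sides to text before comparing.
def to_text(value):
    """Coerce a byte string to text; pass text through unchanged."""
    return value.decode("utf-8") if isinstance(value, bytes) else value

assert to_text(local_version) == to_text(server_version)
```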
Continuing from comment 4 above: BUT... possibly this happens only on certain client systems, which is likely why my physical systems do it but the virtual ones do not. The up2dateUtils.getVersion() call has many paths through it, depending on whether (1) "versionOverride" is set, (2) the platform is 'deb' or something else, and (2a) if something else, whether the Providename is "redhat-release" or "distribution-release", etc. At least one path through it returns a byte array instead of a string value, but likely not all paths do. I believe all paths through up2dateUtils.getVersion() should return a string value, not a byte array. I believe we've uncovered an issue that could have wider impact.
BTW, I got the two version calls mixed up... up2dateUtils.getVersion() gets the value from the server, while the rpclib call gets the locally defined version. In other places where we make a call to get values from the server, the value is retrieved from the dictionary wrapped in an sstr() call. I've added this to up2dateUtils.py.
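The sstr()-style normalization mentioned above can be sketched as follows. This is an assumption-laden sketch, not the actual rhn client helper: I am assuming sstr() essentially coerces byte strings to text when pulling values out of a decoded server response dictionary.

```python
def sstr(value, encoding="utf-8"):
    """Hypothetical stand-in for the rhn client's sstr() helper:
    decode byte strings to text, pass text values through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

# Values pulled out of the decoded systemid dictionary get wrapped, so
# downstream comparisons always see text, never bytes.
system_id = {"os_release": b"6Server"}   # made-up example payload
os_release = sstr(system_id["os_release"])
assert os_release == "6Server"
```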
Correct Spacewalk commit: d94017eb49a6161673b6999339d9d6e1d5953612
I incorrectly created this as a Satellite BZ when it needed to be a RHEL BZ. I've cloned this BZ for each of RHEL 6 and RHEL 7 BZs: RHEL 6 - 1415744 RHEL 7 - 1415742 Closing this BZ as dupe of the RHEL 6 BZ to leave a trail. *** This bug has been marked as a duplicate of bug 1415744 ***