+++ This bug was initially created as a clone of Bug #1378441 +++ Description of problem: On Satellite 5.7 after some time of running I find mentioned traceback in log file. This problem was discovered on two other systems. .... org.quartz.ObjectAlreadyExistsException: Unable to store Trigger with name: 'immediateTrigger-server' and group: 'group1', because one already exists with this identification. .... In some case service rhn-search was stopped and searching doesn't work by webUI. Last message in log looks following: .... ERROR | wrapper | 2016/09/18 15:42:30 | JVM appears hung: Timed out waiting for signal from JVM. ERROR | wrapper | 2016/09/18 15:42:30 | JVM did not exit on request, terminated STATUS | wrapper | 2016/09/18 15:42:31 | JVM exited in response to signal SIGKILL (9). ERROR | wrapper | 2016/09/18 15:42:31 | Unable to start a JVM STATUS | wrapper | 2016/09/18 15:42:39 | <-- Wrapper Stopped Version-Release number of selected component (if applicable): spacewalk-search-2.3.0-8.el6sat.noarch How reproducible: randomly Steps to Reproduce: 1. After some time traceback is in log file Actual results: >> tail /var/log/rhn/search/rhn_search_daemon.log ... INFO | jvm 1 | 2016/09/20 17:07:28 | 2016-09-20 17:07:28,738 [Thread-26] WARN com.redhat.satellite.search.scheduler.ScheduleManager - Scheduling trigger: Trigger 'group1.immediateTrigger-server': triggerClass: 'org.quartz.SimpleTrigger isVolatile: false calendar: 'null' misfireInstruction: 0 nextFireTime: Tue Sep 20 17:07:28 EDT 2016 failed. INFO | jvm 1 | 2016/09/20 17:07:28 | 2016-09-20 17:07:28,841 [Thread-26] WARN com.redhat.satellite.search.scheduler.ScheduleManager - Exception was caught: INFO | jvm 1 | 2016/09/20 17:07:28 | org.quartz.ObjectAlreadyExistsException: Unable to store Trigger with name: 'immediateTrigger-server' and group: 'group1', because one already exists with this identification. INFO | jvm 1 | 2016/09/20 17:07:28 | at org.quartz.simpl.RAMJobStore.storeTrigger(RAMJobStore.java:314) INFO | jvm 1 | 2016/09/20 17:07:28 | at org.quartz.core.QuartzScheduler.scheduleJob(QuartzScheduler.java:849) INFO | jvm 1 | 2016/09/20 17:07:28 | at org.quartz.impl.StdScheduler.scheduleJob(StdScheduler.java:254) INFO | jvm 1 | 2016/09/20 17:07:28 | at com.redhat.satellite.search.scheduler.ScheduleManager.triggerIndexTask(ScheduleManager.java:199) INFO | jvm 1 | 2016/09/20 17:07:28 | at com.redhat.satellite.search.rpc.handlers.AdminHandler.updateIndex(AdminHandler.java:56) INFO | jvm 1 | 2016/09/20 17:07:28 | at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source) INFO | jvm 1 | 2016/09/20 17:07:28 | at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:56) INFO | jvm 1 | 2016/09/20 17:07:28 | at java.lang.reflect.Method.invoke(Method.java:620) INFO | jvm 1 | 2016/09/20 17:07:28 | at redstone.xmlrpc.handlers.ReflectiveInvocationHandler.execute(ReflectiveInvocationHandler.java:217) INFO | jvm 1 | 2016/09/20 17:07:28 | at redstone.xmlrpc.handlers.ReflectiveInvocationHandler.invoke(ReflectiveInvocationHandler.java:133) INFO | jvm 1 | 2016/09/20 17:07:28 | at redstone.xmlrpc.XmlRpcDispatcher.dispatch(XmlRpcDispatcher.java:123) INFO | jvm 1 | 2016/09/20 17:07:28 | at redstone.xmlrpc.XmlRpcServer.execute(XmlRpcServer.java:83) INFO | jvm 1 | 2016/09/20 17:07:28 | at com.redhat.satellite.search.rpc.XmlRpcInvoker.handle(XmlRpcInvoker.java:73) INFO | jvm 1 | 2016/09/20 17:07:28 | at simple.http.Dispatcher.run(Unknown Source) INFO | jvm 1 | 2016/09/20 17:07:28 | at simple.util.process.Daemon.execute(Unknown Source) INFO | jvm 1 | 2016/09/20 17:07:28 | at simple.util.process.Daemon.run(Unknown Source) Expected results: Without traceback. --- Additional comment from Eric Herget on 2016-12-19 13:21:43 EST --- I see this same error in my spacewalk rhn_search_daemon.log. I'll take this bug and investigate. --- Additional comment from Eric Herget on 2017-01-12 15:18:35 EST --- This issue is caused when two systems perform rhn_check at the same time and concurrent calls to xmlrpc/registration.upgrade_version(1000026053, '23') occur. The client systems involved are physical systems... virtual systems do not appear to make this call. To reproduce, connect to two client systems identified as physical systems and simultaneously execute a series of rhn_check commands on the command line: # rhn_check; rhn_check; rhn_check; ..... I issued about 30 on each system and would see the issue occur 2-5 times as they ran. --- Additional comment from Eric Herget on 2017-01-12 16:04:24 EST --- After much more testing. I find that the up2dateAuth.py has a maybeUpdateVersion() function that isn't working as designed. It gets the version from the spacewalk server and compares to the version from the system itself via up2dateUtils.getVersion(). Only if these two versions differ should it be telling the server to update the version via s.registration.upgrade_version(...) Problem is this conditional is comparing a byte array from up2dateUtils.getVersion() to a value returned from rpclib.xmlrpclib.loads(getSystemId())[0][0]['os_release'] so they never are equal and the server is alway told to upgrade the version. I think this is ultimately what we should fix. And leave in place the log warning that the immediately scheduled trigger already exists. --- Additional comment from Eric Herget on 2017-01-12 16:16:53 EST --- continuing from comment 4 above: BUT... possibly this happens only on certain client systems, and likely the reason my physical systems do it, but not the virtual ones... The up2dateUtils.getVersion() call has many paths through it, depending on whether (1) "versionOverride" is set, (2) platform is 'deb' or something else, (2a) if something else, Providename is "redhat-release" or "distribution-release", etc. At least one path through is returning a byte array instead of a value, but likely not all paths through. I believe all paths through up2dateUtils.getVersion() should be returning a value, not a byte array. I believe we've uncovered an issue that could have wider impact. --- Additional comment from Eric Herget on 2017-01-18 16:13:47 EST --- BTW, I got the two calls to get version mixed... up2dateUtils.getVersion() gets the value from the server while the rpclib call gets the locally defined version. In other places where we make a call to get values from the server, the value is retrieved from the dictionary wrapped in a sstr() call. I've added this to up2dateUtils.py.
spacewalk.github: d94017eb49a6161673b6999339d9d6e1d5953612
issue identified in comment #2 fixed in spacewalk.github: eed4b7cf20ff5316a746f9df3ffa70a24c29b206
Spacewalk 2.7 has been released. https://github.com/spacewalkproject/spacewalk/wiki/ReleaseNotes27