Bug 1414579 - rhn-search traceback - immediateTrigger-server exists with certain identification
Summary: rhn-search traceback - immediateTrigger-server exists with certain identific...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Spacewalk
Classification: Community
Component: Clients
Version: 2.6
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Eric Herget
QA Contact: Red Hat Satellite QA List
URL:
Whiteboard:
Depends On:
Blocks: 1415742 1415744 space27
TreeView+ depends on / blocked
 
Reported: 2017-01-18 21:24 UTC by Eric Herget
Modified: 2017-09-27 19:31 UTC (History)
6 users (show)

Fixed In Version: rhn-client-tools-2.7.4-1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1378441
Environment:
Last Closed: 2017-09-27 19:31:20 UTC
Embargoed:


Attachments (Terms of Use)

Description Eric Herget 2017-01-18 21:24:12 UTC
+++ This bug was initially created as a clone of Bug #1378441 +++

Description of problem:
On Satellite 5.7 after some time of running I find mentioned traceback in log file. This problem was discovered on two other systems.

....
org.quartz.ObjectAlreadyExistsException: Unable to store Trigger with name: 'immediateTrigger-server' and group: 'group1', because one already exists with this identification.
....

In some case service rhn-search was stopped and searching doesn't work by webUI. Last message in log looks following:

....
ERROR  | wrapper  | 2016/09/18 15:42:30 | JVM appears hung: Timed out
waiting for signal from JVM.
ERROR  | wrapper  | 2016/09/18 15:42:30 | JVM did not exit on request,
terminated
STATUS | wrapper  | 2016/09/18 15:42:31 | JVM exited in response to
signal SIGKILL (9).
ERROR  | wrapper  | 2016/09/18 15:42:31 | Unable to start a JVM
STATUS | wrapper  | 2016/09/18 15:42:39 | <-- Wrapper Stopped


Version-Release number of selected component (if applicable):
spacewalk-search-2.3.0-8.el6sat.noarch

How reproducible:
randomly

Steps to Reproduce:
1. After some time traceback is in log file

Actual results:
>> tail /var/log/rhn/search/rhn_search_daemon.log
...
INFO   | jvm 1    | 2016/09/20 17:07:28 | 2016-09-20 17:07:28,738 [Thread-26] WARN  com.redhat.satellite.search.scheduler.ScheduleManager - Scheduling trigger: Trigger 'group1.immediateTrigger-server':  triggerClass: 'org.quartz.SimpleTrigger isVolatile: false calendar: 'null' misfireInstruction: 0 nextFireTime: Tue Sep 20 17:07:28 EDT 2016 failed.
INFO   | jvm 1    | 2016/09/20 17:07:28 | 2016-09-20 17:07:28,841 [Thread-26] WARN  com.redhat.satellite.search.scheduler.ScheduleManager - Exception was caught:
INFO   | jvm 1    | 2016/09/20 17:07:28 | org.quartz.ObjectAlreadyExistsException: Unable to store Trigger with name: 'immediateTrigger-server' and group: 'group1', because one already exists with this identification.
INFO   | jvm 1    | 2016/09/20 17:07:28 |   at org.quartz.simpl.RAMJobStore.storeTrigger(RAMJobStore.java:314)
INFO   | jvm 1    | 2016/09/20 17:07:28 |   at org.quartz.core.QuartzScheduler.scheduleJob(QuartzScheduler.java:849)
INFO   | jvm 1    | 2016/09/20 17:07:28 |   at org.quartz.impl.StdScheduler.scheduleJob(StdScheduler.java:254)
INFO   | jvm 1    | 2016/09/20 17:07:28 |   at com.redhat.satellite.search.scheduler.ScheduleManager.triggerIndexTask(ScheduleManager.java:199)
INFO   | jvm 1    | 2016/09/20 17:07:28 |   at com.redhat.satellite.search.rpc.handlers.AdminHandler.updateIndex(AdminHandler.java:56)
INFO   | jvm 1    | 2016/09/20 17:07:28 |   at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source)
INFO   | jvm 1    | 2016/09/20 17:07:28 |   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:56)
INFO   | jvm 1    | 2016/09/20 17:07:28 |   at java.lang.reflect.Method.invoke(Method.java:620)
INFO   | jvm 1    | 2016/09/20 17:07:28 |   at redstone.xmlrpc.handlers.ReflectiveInvocationHandler.execute(ReflectiveInvocationHandler.java:217)
INFO   | jvm 1    | 2016/09/20 17:07:28 |   at redstone.xmlrpc.handlers.ReflectiveInvocationHandler.invoke(ReflectiveInvocationHandler.java:133)
INFO   | jvm 1    | 2016/09/20 17:07:28 |   at redstone.xmlrpc.XmlRpcDispatcher.dispatch(XmlRpcDispatcher.java:123)
INFO   | jvm 1    | 2016/09/20 17:07:28 |   at redstone.xmlrpc.XmlRpcServer.execute(XmlRpcServer.java:83)
INFO   | jvm 1    | 2016/09/20 17:07:28 |   at com.redhat.satellite.search.rpc.XmlRpcInvoker.handle(XmlRpcInvoker.java:73)
INFO   | jvm 1    | 2016/09/20 17:07:28 |   at simple.http.Dispatcher.run(Unknown Source)
INFO   | jvm 1    | 2016/09/20 17:07:28 |   at simple.util.process.Daemon.execute(Unknown Source)
INFO   | jvm 1    | 2016/09/20 17:07:28 |   at simple.util.process.Daemon.run(Unknown Source)


Expected results:
Without traceback.

--- Additional comment from Eric Herget on 2016-12-19 13:21:43 EST ---

I see this same error in my spacewalk rhn_search_daemon.log.  I'll take this bug and investigate.

--- Additional comment from Eric Herget on 2017-01-12 15:18:35 EST ---

This issue is caused when two systems perform rhn_check at the same time and concurrent calls to xmlrpc/registration.upgrade_version(1000026053, '23') occur.  The client systems involved are physical systems... virtual systems do not appear to make this call.

To reproduce, connect to two client systems identified as physical systems and simultaneously execute a series of rhn_check commands on the command line:
# rhn_check; rhn_check; rhn_check; .....
I issued about 30 on each system and would see the issue occur 2-5 times as they ran.

--- Additional comment from Eric Herget on 2017-01-12 16:04:24 EST ---

After much more testing.  I find that the up2dateAuth.py has a maybeUpdateVersion() function that isn't working as designed.

It gets the version from the spacewalk server and compares to the version from the system itself via up2dateUtils.getVersion().

Only if these two versions differ should it be telling the server to update the version via s.registration.upgrade_version(...)

Problem is this conditional is comparing a byte array from up2dateUtils.getVersion() to a value returned from rpclib.xmlrpclib.loads(getSystemId())[0][0]['os_release']

so they never are equal and the server is alway told to upgrade the version.

I think this is ultimately what we should fix.  And leave in place the log warning that the immediately scheduled trigger already exists.

--- Additional comment from Eric Herget on 2017-01-12 16:16:53 EST ---

continuing from comment 4 above:

BUT... possibly this happens only on certain client systems, and likely the reason my physical systems do it, but not the virtual ones...

The up2dateUtils.getVersion() call has many paths through it, depending on whether (1) "versionOverride" is set, (2) platform is 'deb' or something else, (2a) if something else,  Providename is "redhat-release" or "distribution-release", etc.

At least one path through is returning a byte array instead of a value, but likely not all paths through.

I believe all paths through up2dateUtils.getVersion() should be returning a value, not a byte array.

I believe we've uncovered an issue that could have wider impact.

--- Additional comment from Eric Herget on 2017-01-18 16:13:47 EST ---

BTW, I got the two calls to get version mixed... up2dateUtils.getVersion() gets the value from the server while the rpclib call gets the locally defined version.  In other places where we make a call to get values from the server, the value is retrieved from the dictionary wrapped in a sstr() call.  I've added this to up2dateUtils.py.

Comment 1 Eric Herget 2017-01-18 21:27:42 UTC
spacewalk.github:
d94017eb49a6161673b6999339d9d6e1d5953612

Comment 4 Eric Herget 2017-02-01 17:51:16 UTC
issue identified in comment #2 fixed in
spacewalk.github:
eed4b7cf20ff5316a746f9df3ffa70a24c29b206

Comment 5 Eric Herget 2017-09-27 19:31:20 UTC
Spacewalk 2.7 has been released.

https://github.com/spacewalkproject/spacewalk/wiki/ReleaseNotes27


Note You need to log in before you can comment on or make changes to this bug.