Bug 1378441

Summary: rhn-search traceback - immediateTrigger-server exists with certain identification
Product: Red Hat Satellite 5
Component: Server
Version: 570
Status: CLOSED DUPLICATE
Reporter: Pavel Studeník <pstudeni>
Assignee: Eric Herget <eherget>
QA Contact: Red Hat Satellite QA List <satqe-list>
CC: eherget, lhellebr, risantam, tkasparek, tlestach
Fixed In Version: rhn-client-tools-2.5.1-4-sat
Clones: 1414579, 1415742, 1415744
Type: Bug
Last Closed: 2017-01-23 15:58:11 UTC

Description Pavel Studeník 2016-09-22 12:23:36 UTC
Description of problem:
On Satellite 5.7, after the server has been running for some time, the traceback below appears in the log file. The same problem was observed on two other systems.

....
org.quartz.ObjectAlreadyExistsException: Unable to store Trigger with name: 'immediateTrigger-server' and group: 'group1', because one already exists with this identification.
....

In some cases the rhn-search service stopped and searching no longer worked from the web UI. The last messages in the log look like this:

....
ERROR  | wrapper  | 2016/09/18 15:42:30 | JVM appears hung: Timed out waiting for signal from JVM.
ERROR  | wrapper  | 2016/09/18 15:42:30 | JVM did not exit on request, terminated
STATUS | wrapper  | 2016/09/18 15:42:31 | JVM exited in response to signal SIGKILL (9).
ERROR  | wrapper  | 2016/09/18 15:42:31 | Unable to start a JVM
STATUS | wrapper  | 2016/09/18 15:42:39 | <-- Wrapper Stopped


Version-Release number of selected component (if applicable):
spacewalk-search-2.3.0-8.el6sat.noarch

How reproducible:
randomly

Steps to Reproduce:
1. Run the Satellite server; after some time the traceback appears in the log file

Actual results:
>> tail /var/log/rhn/search/rhn_search_daemon.log
...
INFO   | jvm 1    | 2016/09/20 17:07:28 | 2016-09-20 17:07:28,738 [Thread-26] WARN  com.redhat.satellite.search.scheduler.ScheduleManager - Scheduling trigger: Trigger 'group1.immediateTrigger-server':  triggerClass: 'org.quartz.SimpleTrigger isVolatile: false calendar: 'null' misfireInstruction: 0 nextFireTime: Tue Sep 20 17:07:28 EDT 2016 failed.
INFO   | jvm 1    | 2016/09/20 17:07:28 | 2016-09-20 17:07:28,841 [Thread-26] WARN  com.redhat.satellite.search.scheduler.ScheduleManager - Exception was caught:
INFO   | jvm 1    | 2016/09/20 17:07:28 | org.quartz.ObjectAlreadyExistsException: Unable to store Trigger with name: 'immediateTrigger-server' and group: 'group1', because one already exists with this identification.
INFO   | jvm 1    | 2016/09/20 17:07:28 |   at org.quartz.simpl.RAMJobStore.storeTrigger(RAMJobStore.java:314)
INFO   | jvm 1    | 2016/09/20 17:07:28 |   at org.quartz.core.QuartzScheduler.scheduleJob(QuartzScheduler.java:849)
INFO   | jvm 1    | 2016/09/20 17:07:28 |   at org.quartz.impl.StdScheduler.scheduleJob(StdScheduler.java:254)
INFO   | jvm 1    | 2016/09/20 17:07:28 |   at com.redhat.satellite.search.scheduler.ScheduleManager.triggerIndexTask(ScheduleManager.java:199)
INFO   | jvm 1    | 2016/09/20 17:07:28 |   at com.redhat.satellite.search.rpc.handlers.AdminHandler.updateIndex(AdminHandler.java:56)
INFO   | jvm 1    | 2016/09/20 17:07:28 |   at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source)
INFO   | jvm 1    | 2016/09/20 17:07:28 |   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:56)
INFO   | jvm 1    | 2016/09/20 17:07:28 |   at java.lang.reflect.Method.invoke(Method.java:620)
INFO   | jvm 1    | 2016/09/20 17:07:28 |   at redstone.xmlrpc.handlers.ReflectiveInvocationHandler.execute(ReflectiveInvocationHandler.java:217)
INFO   | jvm 1    | 2016/09/20 17:07:28 |   at redstone.xmlrpc.handlers.ReflectiveInvocationHandler.invoke(ReflectiveInvocationHandler.java:133)
INFO   | jvm 1    | 2016/09/20 17:07:28 |   at redstone.xmlrpc.XmlRpcDispatcher.dispatch(XmlRpcDispatcher.java:123)
INFO   | jvm 1    | 2016/09/20 17:07:28 |   at redstone.xmlrpc.XmlRpcServer.execute(XmlRpcServer.java:83)
INFO   | jvm 1    | 2016/09/20 17:07:28 |   at com.redhat.satellite.search.rpc.XmlRpcInvoker.handle(XmlRpcInvoker.java:73)
INFO   | jvm 1    | 2016/09/20 17:07:28 |   at simple.http.Dispatcher.run(Unknown Source)
INFO   | jvm 1    | 2016/09/20 17:07:28 |   at simple.util.process.Daemon.execute(Unknown Source)
INFO   | jvm 1    | 2016/09/20 17:07:28 |   at simple.util.process.Daemon.run(Unknown Source)


Expected results:
No traceback in the log file; rhn-search keeps running.

Comment 2 Eric Herget 2016-12-19 18:21:43 UTC
I see this same error in my spacewalk rhn_search_daemon.log.  I'll take this bug and investigate.

Comment 3 Eric Herget 2017-01-12 20:18:35 UTC
This issue occurs when two systems perform rhn_check at the same time, producing concurrent calls to xmlrpc/registration.upgrade_version(1000026053, '23'). The client systems involved are physical systems... virtual systems do not appear to make this call.

To reproduce, connect to two client systems identified as physical systems and simultaneously execute a series of rhn_check commands on the command line:
# rhn_check; rhn_check; rhn_check; .....
I issued about 30 on each system and would see the issue occur 2-5 times as they ran.
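
To see why concurrent calls break the scheduler: each updateIndex request schedules a one-shot "immediate" trigger under a fixed name, and Quartz refuses to store a second trigger under a name that is still pending. A minimal Python sketch of that condition (the real code is Java/Quartz in ScheduleManager.triggerIndexTask; all names below are illustrative):

import threading

pending_triggers = set()
lock = threading.Lock()

def trigger_index_task(name="immediateTrigger-server"):
    # Mirrors the Quartz behaviour in the traceback: storing a trigger whose
    # name is already present raises ObjectAlreadyExistsException.
    with lock:
        if name in pending_triggers:
            raise RuntimeError(
                "Unable to store Trigger with name: %r, because one "
                "already exists with this identification." % name)
        pending_triggers.add(name)

def on_trigger_fired(name="immediateTrigger-server"):
    # Once the one-shot trigger fires, Quartz removes it and frees the name.
    with lock:
        pending_triggers.discard(name)

# Two overlapping updateIndex calls: the first schedules the trigger, the
# second arrives before it has fired and fails exactly as the log shows.
trigger_index_task()
try:
    trigger_index_task()
except RuntimeError as e:
    print(e)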

Comment 4 Eric Herget 2017-01-12 21:04:24 UTC
After much more testing, I find that up2dateAuth.py has a maybeUpdateVersion() function that isn't working as designed.

It gets the version from the spacewalk server and compares it to the version from the system itself via up2dateUtils.getVersion().

Only if these two versions differ should it tell the server to update the version via s.registration.upgrade_version(...).

The problem is that this conditional compares a byte array from up2dateUtils.getVersion() to a value returned from rpclib.xmlrpclib.loads(getSystemId())[0][0]['os_release'], so the two are never equal and the server is always told to upgrade the version.

I think this is ultimately what we should fix, while leaving in place the log warning that the immediately scheduled trigger already exists.
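
A minimal sketch of the broken comparison described above, with illustrative values (under Python 3 semantics, bytes and str never compare equal, so the branch always fires):

local_version = b"23"     # a byte array, as up2dateUtils.getVersion() returns
server_version = "23"     # os_release parsed out of the systemId via rpclib

if local_version != server_version:   # always True: bytes != str
    print("versions 'differ' -> calling s.registration.upgrade_version(...)")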

Comment 5 Eric Herget 2017-01-12 21:16:53 UTC
Continuing from comment 4 above:

BUT... possibly this happens only on certain client systems, which is likely the reason my physical systems do it but the virtual ones don't...

The up2dateUtils.getVersion() call has many paths through it, depending on whether (1) "versionOverride" is set, (2) the platform is 'deb' or something else, and (2a) if something else, whether the Provides name is "redhat-release" or "distribution-release", etc.

At least one path through it returns a byte array instead of a string value, but likely not all paths do.

I believe all paths through up2dateUtils.getVersion() should return a string value, not a byte array.

I believe we've uncovered an issue that could have wider impact.
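
A hedged sketch of the shape of this problem: a function with several return paths where only some normalize to a string. The branch structure and helper below are illustrative stand-ins, not the actual up2dateUtils code:

def _rpm_provides_version():
    # Stand-in for the rpm-backed branch, which can hand back raw bytes.
    return b"23"

def get_version(version_override=None, platform="rpm"):
    if version_override:              # (1) explicit override: str
        return version_override
    if platform == "deb":             # (2) deb platform: str (assumed)
        return "23"
    return _rpm_provides_version()    # (2a) rpm branch: leaks bytes

print(type(get_version("23")))   # <class 'str'>
print(type(get_version()))       # <class 'bytes'>  <- the inconsistent path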

Comment 6 Eric Herget 2017-01-18 21:13:47 UTC
BTW, I got the two version-fetching calls mixed up... up2dateUtils.getVersion() gets the value from the server, while the rpclib call gets the locally defined version. In other places where we make a call to get values from the server, the value is retrieved from the dictionary wrapped in an sstr() call. I've added this to up2dateUtils.py.
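
A sketch of that normalization, with a simplified stand-in for the sstr() helper (the real helper and call site live in the client tools; this is illustrative only):

def sstr(value):
    # Simplified stand-in: coerce bytes to str, pass str through unchanged.
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return value

local_version = sstr(b"23")   # the side that used to leak through as bytes
server_version = "23"

# With both sides normalized to str the comparison behaves as designed,
# so upgrade_version is only requested when the versions really differ:
print(local_version != server_version)   # False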

Comment 9 Tomáš Kašpárek 2017-01-23 15:39:46 UTC
correct spacewalk commit: d94017eb49a6161673b6999339d9d6e1d5953612

Comment 10 Eric Herget 2017-01-23 15:58:11 UTC
I incorrectly created this as a Satellite BZ when it needed to be a RHEL BZ. I've cloned this BZ into a RHEL 6 BZ and a RHEL 7 BZ:
RHEL 6 - 1415744
RHEL 7 - 1415742

Closing this BZ as a duplicate of the RHEL 6 BZ to leave a trail.

*** This bug has been marked as a duplicate of bug 1415744 ***