Description of problem:
On Satellite 5.7, after some time of running, I find the traceback below in the log file. This problem was also discovered on two other systems.

....
org.quartz.ObjectAlreadyExistsException: Unable to store Trigger with name: 'immediateTrigger-server' and group: 'group1', because one already exists with this identification.
....

In some cases the rhn-search service was stopped and searching no longer works via the web UI. The last messages in the log look like this:

....
ERROR | wrapper | 2016/09/18 15:42:30 | JVM appears hung: Timed out waiting for signal from JVM.
ERROR | wrapper | 2016/09/18 15:42:30 | JVM did not exit on request, terminated
STATUS | wrapper | 2016/09/18 15:42:31 | JVM exited in response to signal SIGKILL (9).
ERROR | wrapper | 2016/09/18 15:42:31 | Unable to start a JVM
STATUS | wrapper | 2016/09/18 15:42:39 | <-- Wrapper Stopped

Version-Release number of selected component (if applicable):
spacewalk-search-2.3.0-8.el6sat.noarch

How reproducible:
Randomly.

Steps to Reproduce:
1. After some time the traceback appears in the log file.

Actual results:
>> tail /var/log/rhn/search/rhn_search_daemon.log
...
INFO | jvm 1 | 2016/09/20 17:07:28 | 2016-09-20 17:07:28,738 [Thread-26] WARN com.redhat.satellite.search.scheduler.ScheduleManager - Scheduling trigger: Trigger 'group1.immediateTrigger-server': triggerClass: 'org.quartz.SimpleTrigger isVolatile: false calendar: 'null' misfireInstruction: 0 nextFireTime: Tue Sep 20 17:07:28 EDT 2016 failed.
INFO | jvm 1 | 2016/09/20 17:07:28 | 2016-09-20 17:07:28,841 [Thread-26] WARN com.redhat.satellite.search.scheduler.ScheduleManager - Exception was caught:
INFO | jvm 1 | 2016/09/20 17:07:28 | org.quartz.ObjectAlreadyExistsException: Unable to store Trigger with name: 'immediateTrigger-server' and group: 'group1', because one already exists with this identification.
INFO | jvm 1 | 2016/09/20 17:07:28 |   at org.quartz.simpl.RAMJobStore.storeTrigger(RAMJobStore.java:314)
INFO | jvm 1 | 2016/09/20 17:07:28 |   at org.quartz.core.QuartzScheduler.scheduleJob(QuartzScheduler.java:849)
INFO | jvm 1 | 2016/09/20 17:07:28 |   at org.quartz.impl.StdScheduler.scheduleJob(StdScheduler.java:254)
INFO | jvm 1 | 2016/09/20 17:07:28 |   at com.redhat.satellite.search.scheduler.ScheduleManager.triggerIndexTask(ScheduleManager.java:199)
INFO | jvm 1 | 2016/09/20 17:07:28 |   at com.redhat.satellite.search.rpc.handlers.AdminHandler.updateIndex(AdminHandler.java:56)
INFO | jvm 1 | 2016/09/20 17:07:28 |   at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source)
INFO | jvm 1 | 2016/09/20 17:07:28 |   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:56)
INFO | jvm 1 | 2016/09/20 17:07:28 |   at java.lang.reflect.Method.invoke(Method.java:620)
INFO | jvm 1 | 2016/09/20 17:07:28 |   at redstone.xmlrpc.handlers.ReflectiveInvocationHandler.execute(ReflectiveInvocationHandler.java:217)
INFO | jvm 1 | 2016/09/20 17:07:28 |   at redstone.xmlrpc.handlers.ReflectiveInvocationHandler.invoke(ReflectiveInvocationHandler.java:133)
INFO | jvm 1 | 2016/09/20 17:07:28 |   at redstone.xmlrpc.XmlRpcDispatcher.dispatch(XmlRpcDispatcher.java:123)
INFO | jvm 1 | 2016/09/20 17:07:28 |   at redstone.xmlrpc.XmlRpcServer.execute(XmlRpcServer.java:83)
INFO | jvm 1 | 2016/09/20 17:07:28 |   at com.redhat.satellite.search.rpc.XmlRpcInvoker.handle(XmlRpcInvoker.java:73)
INFO | jvm 1 | 2016/09/20 17:07:28 |   at simple.http.Dispatcher.run(Unknown Source)
INFO | jvm 1 | 2016/09/20 17:07:28 |   at simple.util.process.Daemon.execute(Unknown Source)
INFO | jvm 1 | 2016/09/20 17:07:28 |   at simple.util.process.Daemon.run(Unknown Source)

Expected results:
Without traceback.
I see this same error in my spacewalk rhn_search_daemon.log. I'll take this bug and investigate.
This issue is caused when two systems perform rhn_check at the same time, producing concurrent calls to xmlrpc/registration.upgrade_version(1000026053, '23'). The client systems involved are physical systems... virtual systems do not appear to make this call. To reproduce, connect to two client systems identified as physical systems and simultaneously execute a series of rhn_check commands on the command line on each: # rhn_check; rhn_check; rhn_check; ..... I issued about 30 on each system and saw the issue occur 2-5 times as they ran.
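The reproduction above can be sketched as a simple loop. This is a hedged sketch: rhn_check is the real Satellite/Spacewalk client command, but the iteration count of 30 is just what surfaced the race for me; run the same loop on both physical clients at the same time.

```shell
# Fire a burst of back-to-back rhn_check invocations on one client.
# Run the same loop simultaneously on a second physical client to
# provoke concurrent registration.upgrade_version() calls on the server.
for i in $(seq 1 30); do
    rhn_check
done
```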
After much more testing, I find that up2dateAuth.py has a maybeUpdateVersion() function that isn't working as designed. It gets the version from the Spacewalk server and compares it to the version from the system itself via up2dateUtils.getVersion(). Only if these two versions differ should it tell the server to update the version via s.registration.upgrade_version(...). The problem is that this conditional compares a byte array from up2dateUtils.getVersion() to a value returned from rpclib.xmlrpclib.loads(getSystemId())[0][0]['os_release'], so the two are never equal and the server is always told to upgrade the version. I think this is ultimately what we should fix, and we should leave in place the log warning that the immediately scheduled trigger already exists.
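A minimal illustration of the comparison bug described above (this is not the actual up2date code; the version strings are made-up examples): a byte string never compares equal to a text string in Python 3, so the "versions differ" branch fires on every check.

```python
# Hypothetical values standing in for the two sides of the comparison in
# maybeUpdateVersion(): one arrives as bytes, the other as text.
local_version = b"6.8"    # e.g. a byte string from a buggy getVersion() path
server_version = "6.8"    # e.g. 'os_release' decoded from the systemid XML

# bytes != str is always True, even when the content matches, so the
# client would call registration.upgrade_version() on every rhn_check.
assert local_version != server_version

# The fix is to normalize both sides to text before comparing.
def to_text(value):
    """Coerce a byte string to text; pass text through unchanged."""
    return value.decode("utf-8") if isinstance(value, bytes) else value

assert to_text(local_version) == to_text(server_version)
```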
Continuing from comment 4 above: BUT... possibly this happens only on certain client systems, which is likely why my physical systems do it but the virtual ones do not. The up2dateUtils.getVersion() call has many paths through it, depending on whether (1) "versionOverride" is set, (2) the platform is 'deb' or something else, and (2a) if something else, whether the Providename is "redhat-release" or "distribution-release", etc. At least one path through it returns a byte array instead of a string value, but likely not all paths do. I believe all paths through up2dateUtils.getVersion() should return a string value, not a byte array. I believe we've uncovered an issue that could have wider impact.
BTW, I got the two version calls mixed up... up2dateUtils.getVersion() gets the value from the server, while the rpclib call gets the locally defined version. In other places where we make a call to get values from the server, the value is retrieved from the dictionary wrapped in an sstr() call. I've added this to up2dateUtils.py.
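The sstr()-style normalization mentioned above can be sketched as follows. This is an assumption-laden sketch, not the actual rhn client helper: I am assuming sstr() essentially coerces byte strings to text when pulling values out of a decoded server response dictionary.

```python
def sstr(value, encoding="utf-8"):
    """Hypothetical stand-in for the rhn client's sstr() helper:
    decode byte strings to text, pass text values through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

# Values pulled out of the decoded systemid dictionary get wrapped, so
# downstream comparisons always see text, never bytes.
system_id = {"os_release": b"6Server"}   # made-up example payload
os_release = sstr(system_id["os_release"])
assert os_release == "6Server"
```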
Correct Spacewalk commit: d94017eb49a6161673b6999339d9d6e1d5953612
I incorrectly created this as a Satellite BZ when it needed to be a RHEL BZ. I've cloned this BZ for each of RHEL 6 and RHEL 7 BZs: RHEL 6 - 1415744 RHEL 7 - 1415742 Closing this BZ as dupe of the RHEL 6 BZ to leave a trail. *** This bug has been marked as a duplicate of bug 1415744 ***