Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1139528

Summary: java.lang.NullPointerException at org.jbpm.process.core.timer.impl.QuartzSchedulerService$InmemoryTimerJobInstanceDelegate.getJobHandle()
Product: [Retired] JBoss BPMS Platform 6 Reporter: Toshiya Kobayashi <tkobayas>
Component: jBPM CoreAssignee: Alessandro Lazarotti <alazarot>
Status: CLOSED EOL QA Contact: Marek Baluch <mbaluch>
Severity: high Docs Contact:
Priority: unspecified    
Version: 6.0.2CC: mbaluch
Target Milestone: ER2   
Target Release: 6.1.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-03-27 20:08:29 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
jBPM6Ex49_quartz_reproducer.zip
none
BZ1139528-testcase-6.0.x-incomplete.diff
none
jBPM6Ex49_quartz_reproducer_mvn.zip none

Description Toshiya Kobayashi 2014-09-09 07:12:41 UTC
Description of problem:

Conditions to reproduce are
- QuartzSchedulerService
- PerProcessInstanceRuntimeManager
- A process has a Timer in a loop
- The timer has been fired at least once. Then restart the JVM(RuntimeManager)

After reboot, the timer fails with NullPointerException. This Exception is thrown continuously so the timer is no longer fired.

====
java.lang.RuntimeException: Unable to load session snapshot
	at org.drools.persistence.SessionMarshallingHelper.loadSnapshot(SessionMarshallingHelper.java:88)
	at org.drools.persistence.SingleSessionCommandService.initExistingKnowledgeSession(SingleSessionCommandService.java:263)
	at org.drools.persistence.SingleSessionCommandService.<init>(SingleSessionCommandService.java:196)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.drools.persistence.jpa.KnowledgeStoreServiceImpl.buildCommandService(KnowledgeStoreServiceImpl.java:122)
	at org.drools.persistence.jpa.KnowledgeStoreServiceImpl.loadKieSession(KnowledgeStoreServiceImpl.java:90)
	at org.drools.persistence.jpa.KnowledgeStoreServiceImpl.loadKieSession(KnowledgeStoreServiceImpl.java:39)
	at org.kie.internal.persistence.jpa.JPAKnowledgeService.loadStatefulKnowledgeSession(JPAKnowledgeService.java:130)
	at org.jbpm.runtime.manager.impl.factory.JPASessionFactory.findKieSessionById(JPASessionFactory.java:52)
	at org.jbpm.runtime.manager.impl.PerProcessInstanceRuntimeManager.getRuntimeEngine(PerProcessInstanceRuntimeManager.java:104)
	at org.jbpm.process.core.timer.impl.GlobalTimerService.getCommandService(GlobalTimerService.java:207)
	at org.jbpm.process.core.timer.impl.GlobalTimerService.getCommandService(GlobalTimerService.java:191)
	at org.jbpm.persistence.timer.GlobalJpaTimerJobInstance.call(GlobalJpaTimerJobInstance.java:61)
	at org.jbpm.persistence.timer.GlobalJpaTimerJobInstance.call(GlobalJpaTimerJobInstance.java:42)
	at org.jbpm.process.core.timer.impl.QuartzSchedulerService$QuartzJob.execute(QuartzSchedulerService.java:279)
	at org.quartz.core.JobRunShell.run(JobRunShell.java:216)
	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)
Caused by: java.lang.NullPointerException
	at org.jbpm.process.core.timer.impl.QuartzSchedulerService$InmemoryTimerJobInstanceDelegate.getJobHandle(QuartzSchedulerService.java:320)
	at org.jbpm.process.core.timer.impl.QuartzSchedulerService.internalSchedule(QuartzSchedulerService.java:152)
	at org.jbpm.process.core.timer.impl.QuartzSchedulerService.internalSchedule(QuartzSchedulerService.java:191)
	at org.jbpm.process.core.timer.impl.DelegateSchedulerServiceInterceptor.internalSchedule(DelegateSchedulerServiceInterceptor.java:38)
	at org.jbpm.process.core.timer.impl.QuartzSchedulerService.scheduleJob(QuartzSchedulerService.java:118)
	at org.jbpm.process.core.timer.impl.GlobalTimerService.scheduleJob(GlobalTimerService.java:87)
	at org.jbpm.process.core.timer.impl.RegisteredTimerServiceDelegate.scheduleJob(RegisteredTimerServiceDelegate.java:65)
	at org.jbpm.process.instance.timer.TimerManager$ProcessTimerInputMarshaller.deserialize(TimerManager.java:265)
	at org.drools.core.marshalling.impl.ProtobufInputMarshaller.readTimer(ProtobufInputMarshaller.java:704)
	at org.drools.core.marshalling.impl.ProtobufInputMarshaller.readSession(ProtobufInputMarshaller.java:286)
	at org.drools.core.marshalling.impl.ProtobufInputMarshaller.readSession(ProtobufInputMarshaller.java:157)
	at org.drools.core.marshalling.impl.ProtobufMarshaller.unmarshall(ProtobufMarshaller.java:116)
	at org.drools.core.marshalling.impl.ProtobufMarshaller.unmarshall(ProtobufMarshaller.java:57)
	at org.drools.persistence.SessionMarshallingHelper.loadSnapshot(SessionMarshallingHelper.java:83)
	... 19 more
====

Steps to Reproduce:
1. Unzip jBPM6Ex49_quartz_reproducer.zip
2. Run ProcessMainJPA
3. After fishining ProcessMainJPA, run ProcessMainJPAContinue

Actual results:

Throws NullPointerException

Expected results:

Timer is fired.

Additional info:

This is not reproducible in community jBPM 6.1.0.Final because org.drools.core.impl.StatefulKnowledgeSessionImpl is Externalizable.

Comment 1 Toshiya Kobayashi 2014-09-09 07:14:36 UTC
Created attachment 935558 [details]
jBPM6Ex49_quartz_reproducer.zip

Comment 2 Toshiya Kobayashi 2014-09-09 07:56:57 UTC
[Analysis]
1) At first, the timer job id is '2-1-1'
2) After firing the timer, the next timer job id is '2-1-2'
3) After finishing ProcessMainJPA, the waiting timer job id is '2-1-2' (or '2-1-3'...). The id is persisted in QRTZ_JOB_DETAILS.
4) When starting ProcessMainJPAContinue, a timer thread looks for a job. But using a debugger, I observed that it is looking for '2-1-1'. (Why? I observed that the '1' is Timer's id in session which was read by ProtbufMashaller)

QuartzSchedulerService
====
 80     public JobHandle scheduleJob(Job job, JobContext ctx, Trigger trigger) {
...
 98         try {
 99             JobDetail jobDetail = scheduler.getJobDetail(jobname, "jbpm");
100 
101             if (jobDetail != null) {
102                 TimerJobInstance timerJobInstance = (TimerJobInstance) jobDetail.getJobDataMap().get("timerJobInstance");
103                 return timerJobInstance.getJobHandle();
104             }
105         } catch (SchedulerException e) {
====

5) scheduler.getJobDetail() cannot find '2-1-1' so it will create a jobInstance.

====
108         GlobalQuartzJobHandle quartzJobHandle = new GlobalQuartzJobHandle(id, jobname, "jbpm");
109         TimerJobInstance jobInstance = ((AcceptsTimerJobFactoryManager) globalTimerService).
110                 getTimerJobFactoryManager().createTimerJobInstance( job,
111                                                                     ctx,
112                                                                     trigger,
113                                                                     quartzJobHandle,
114                                                                     (InternalSchedulerService) globalTimerService );
115         quartzJobHandle.setTimerJobInstance( (TimerJobInstance) jobInstance );
116 
117         interceptor.internalSchedule(jobInstance);
====

6) But now, JobPersistenceException (caused by NotSerializableException) is thrown in scheduler.scheduleJob(jobq, triggerq). Because org.jbpm.process.instance.timer.TimerManager$OverdueTrigger holds StatefulKnowledgeSessionImpl which is not Serializable.

====
148     public void internalSchedule(TimerJobInstance timerJobInstance) {
149 
150         GlobalQuartzJobHandle quartzJobHandle = (GlobalQuartzJobHandle) timerJobInstance.getJobHandle();
151         // Define job instance
152         JobDetail jobq = new JobDetail(quartzJobHandle.getJobName(), quartzJobHandle.getJobGroup(), QuartzJob.class);
153 
154         jobq.getJobDataMap().put("timerJobInstance", timerJobInstance);
155 
156         // Define a Trigger that will fire "now"
157         org.quartz.Trigger triggerq = new SimpleTrigger(quartzJobHandle.getJobName()+"_trigger", quartzJobHandle.getJobGroup(), timerJobInstance.getTrigger().hasNextFireTime());
158 
159         // Schedule the job with the trigger
160         try {
161             if (scheduler.isShutdown()) {
162                 return;
163             }
164             ((AcceptsTimerJobFactoryManager) globalTimerService).getTimerJobFactoryManager().addTimerJobInstance( timerJobInstance );
165             JobDetail jobDetail = scheduler.getJobDetail(quartzJobHandle.getJobName(), quartzJobHandle.getJobGroup());
166             if (jobDetail == null) {
167                 scheduler.scheduleJob(jobq, triggerq);
168             } else {
169                 // need to add the job again to replace existing especially important if jobs are persisted in db
170                 scheduler.addJob(jobq, true);
171                 triggerq.setJobName(quartzJobHandle.getJobName());
172                 triggerq.setJobGroup(quartzJobHandle.getJobGroup());
173                 scheduler.rescheduleJob(quartzJobHandle.getJobName()+"_trigger", quartzJobHandle.getJobGroup(), triggerq);
174             }
175 
176         } catch (ObjectAlreadyExistsException e) {
177             // in general this should not happen even in clustered environment but just in case
178             // already registered jobs should be caught in scheduleJob but due to race conditions it might not 
179             // catch it in time - clustered deployments only
180             logger.warn("Job has already been scheduled, most likely running in cluster: {}", e.getMessage());
181 
182         } catch (JobPersistenceException e) {
183             if (e.getCause() instanceof NotSerializableException) {
184                 // in case job cannot be persisted, like rule timer then make it in memory
185                 internalSchedule(new InmemoryTimerJobInstanceDelegate(quartzJobHandle.getJobName(), ((GlobalTimerService) globalTimerService).getTimerServiceId()));
186             } else {
187                 ((AcceptsTimerJobFactoryManager) globalTimerService).getTimerJobFactoryManager().removeTimerJobInstance(timerJobInstance);
188                 throw new RuntimeException(e);
189             }
190         }
====

7) The JobPersistenceException/NotSerializableException seems to be intentionally caught. But the next try internalSchedule(new InmemoryTimerJobInstanceDelegate(quartzJobHandle.getJobName(), ((GlobalTimerService) globalTimerService).getTimerServiceId())) fails with NullPointerException as reported.

So... I'm sure which point is really a problem to be fixed but firstly, the id mismatch looks odd to me.

This issue is not reproducible in community jBPM 6.1.0.Final because org.drools.core.impl.StatefulKnowledgeSessionImpl is Externalizable. But I'm not sure if it is the enough fix.

Comment 3 Toshiya Kobayashi 2014-09-09 08:03:42 UTC
Created attachment 935571 [details]
BZ1139528-testcase-6.0.x-incomplete.diff

I'm sorry about not posting a unit test but an eclipse project. I couldn't reproduce the issue in GlobalQuartzDBTimerServiceTest somehow. I attach the the imcomplete unit test as diff for 6.0.x branch just in case.

Comment 4 Toshiya Kobayashi 2014-09-09 08:18:32 UTC
> So... I'm sure which point is really a problem to be fixed

oops,

 -> So... I'm not sure which point is really a problem to be fixed

Comment 5 Maciej Swiderski 2014-09-09 09:43:07 UTC
Toshiya,

could you please provide mavenized reproducer - it will be way easier to maintain dependencies?

As it comes to not being able to reproduce the problem in jbpm-test it might be because of quartz configuration as I can see you use same data source for managed and unmanaged connections. This is important that unmanaged will not be transactional data source. Take a look at who it is configured in jbpm-test:
https://github.com/droolsjbpm/jbpm/blob/master/jbpm-test/src/test/resources/quartz-db.properties#L61

https://github.com/droolsjbpm/jbpm/blob/master/jbpm-test/src/test/java/org/jbpm/test/timer/quartz/NonTransactionalConnectionProvider.java

if you could try with that configuration locally to confirm if the problem is still there that would be great.

The in memory timer support is only for special cases like rule timers used with processes as they should not be persisted and thus the explicit catch for not serializable exceptions. Process timers should never go into that catch block.

Comment 6 Toshiya Kobayashi 2014-09-10 05:00:24 UTC
Created attachment 935991 [details]
jBPM6Ex49_quartz_reproducer_mvn.zip

Comment 7 Toshiya Kobayashi 2014-09-10 05:10:20 UTC
Hi Maciej,

Attached jBPM6Ex49_quartz_reproducer_mvn.zip as mvn version.

1. Unzip jBPM6Ex49_quartz_reproducer_mvn.zip
2. cd jBPM6Ex49_quartz_reproducer_mvn
3. mvn clean test -Dtest=ProcessMainJPATest
After finishing ProcessMainJPATest,
4. mvn test -Dtest=ProcessMainJPAContinueTest

I can constantly reproduce the issue with the steps.

> This is important that unmanaged will not be transactional data source.

Thank you for the information.

I have changed the nonManagedTXDataSource settings with copying NonTransactionalConnectionProvider into this project. But the isuee is still reproducible.

org.quartz.dataSource.notManagedDS.connectionProvider.class=org.jbpm.test.timer.quartz.NonTransactionalConnectionProvider
org.quartz.dataSource.notManagedDS.driverClassName=org.h2.Driver
org.quartz.dataSource.notManagedDS.user=sa
org.quartz.dataSource.notManagedDS.password=
org.quartz.dataSource.notManagedDS.url=jdbc:h2:tcp://localhost/~/jbpm-db

jdbc:h2:tcp://localhost/~/jbpm-db is the same one which is used by JBPMHelper in this test. I'm not adding ";MVCC=TRUE" because it throws an Exception.

====
org.h2.jdbc.JdbcSQLException: Cannot change the setting "MVCC" when the database is already open; SQL statement:
SET MVCC TRUE [90133-168]
	at org.h2.message.DbException.getJdbcSQLException(DbException.java:329)
====

As the database is already opened by JBPMHelper with MVCC=TRUE, I think it's okay with this url for quartz.

Please let me know if you need any additional information / testing.

Thanks!

Comment 8 Maciej Swiderski 2014-09-10 13:18:07 UTC
fixed on master. main cause of this was with proper removal of completed timer job instances that are taken on ksession serialization. It kept the already completed jobs that on restart was causing failures.

jbpm
master:
https://github.com/droolsjbpm/jbpm/commit/ff59b332b5dc14ff9aab73616aa4169cf108dedd

Comment 9 Toshiya Kobayashi 2014-09-16 07:11:19 UTC
Thanks Maciej!

I have confirmed that patching BPMS 6.0.2 with this commit fixes the restarting issue.

As the next step, I'm testing it in quartz cluster and still seeing the NullPointerException. I have filed another BZ (https://bugzilla.redhat.com/show_bug.cgi?id=1142084). Please take a look.

Thanks!

Comment 10 Jiri Svitak 2014-12-11 15:51:25 UTC
Verified in BPM Suite 6.1.0.ER2 using the Toshiya's reproducer.

I also implemented a test for QE test suite:
https://gitlab.mw.lab.eng.bos.redhat.com/bxms/brms/commit/a70389ba1ece638ac88a96b6121b75b14aadf4c7