Bug 1139528
| Summary: | java.lang.NullPointerException at org.jbpm.process.core.timer.impl.QuartzSchedulerService$InmemoryTimerJobInstanceDelegate.getJobHandle() | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | [Retired] JBoss BPMS Platform 6 | Reporter: | Toshiya Kobayashi <tkobayas> | ||||||||
| Component: | jBPM Core | Assignee: | Alessandro Lazarotti <alazarot> | ||||||||
| Status: | CLOSED EOL | QA Contact: | Marek Baluch <mbaluch> | ||||||||
| Severity: | high | Docs Contact: | |||||||||
| Priority: | unspecified | ||||||||||
| Version: | 6.0.2 | CC: | mbaluch | ||||||||
| Target Milestone: | ER2 | ||||||||||
| Target Release: | 6.1.0 | ||||||||||
| Hardware: | Unspecified | ||||||||||
| OS: | Unspecified | ||||||||||
| Whiteboard: | |||||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||||
| Doc Text: | Story Points: | --- | |||||||||
| Clone Of: | Environment: | ||||||||||
| Last Closed: | 2020-03-27 20:08:29 UTC | Type: | Bug | ||||||||
| Regression: | --- | Mount Type: | --- | ||||||||
| Documentation: | --- | CRM: | |||||||||
| Verified Versions: | Category: | --- | |||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||
| Embargoed: | |||||||||||
| Attachments: |
|
||||||||||
|
Description
Toshiya Kobayashi
2014-09-09 07:12:41 UTC
Created attachment 935558 [details]
jBPM6Ex49_quartz_reproducer.zip
[Analysis]
1) At first, the timer job id is '2-1-1'
2) After firing the timer, the next timer job id is '2-1-2'
3) After finishing ProcessMainJPA, the waiting timer job id is '2-1-2' (or '2-1-3'...). The id is persisted in QRTZ_JOB_DETAILS.
4) When starting ProcessMainJPAContinue, a timer thread looks for a job. But using a debugger, I observed that it is looking for '2-1-1'. (Why? I observed that the '1' is Timer's id in session which was read by ProtbufMashaller)
QuartzSchedulerService
====
80 public JobHandle scheduleJob(Job job, JobContext ctx, Trigger trigger) {
...
98 try {
99 JobDetail jobDetail = scheduler.getJobDetail(jobname, "jbpm");
100
101 if (jobDetail != null) {
102 TimerJobInstance timerJobInstance = (TimerJobInstance) jobDetail.getJobDataMap().get("timerJobInstance");
103 return timerJobInstance.getJobHandle();
104 }
105 } catch (SchedulerException e) {
====
5) scheduler.getJobDetail() cannot find '2-1-1' so it will create a jobInstance.
====
108 GlobalQuartzJobHandle quartzJobHandle = new GlobalQuartzJobHandle(id, jobname, "jbpm");
109 TimerJobInstance jobInstance = ((AcceptsTimerJobFactoryManager) globalTimerService).
110 getTimerJobFactoryManager().createTimerJobInstance( job,
111 ctx,
112 trigger,
113 quartzJobHandle,
114 (InternalSchedulerService) globalTimerService );
115 quartzJobHandle.setTimerJobInstance( (TimerJobInstance) jobInstance );
116
117 interceptor.internalSchedule(jobInstance);
====
6) But now, JobPersistenceException (caused by NotSerializableException) is thrown in scheduler.scheduleJob(jobq, triggerq). Because org.jbpm.process.instance.timer.TimerManager$OverdueTrigger holds StatefulKnowledgeSessionImpl which is not Serializable.
====
148 public void internalSchedule(TimerJobInstance timerJobInstance) {
149
150 GlobalQuartzJobHandle quartzJobHandle = (GlobalQuartzJobHandle) timerJobInstance.getJobHandle();
151 // Define job instance
152 JobDetail jobq = new JobDetail(quartzJobHandle.getJobName(), quartzJobHandle.getJobGroup(), QuartzJob.class);
153
154 jobq.getJobDataMap().put("timerJobInstance", timerJobInstance);
155
156 // Define a Trigger that will fire "now"
157 org.quartz.Trigger triggerq = new SimpleTrigger(quartzJobHandle.getJobName()+"_trigger", quartzJobHandle.getJobGroup(), timerJobInstance.getTrigger().hasNextFireTime());
158
159 // Schedule the job with the trigger
160 try {
161 if (scheduler.isShutdown()) {
162 return;
163 }
164 ((AcceptsTimerJobFactoryManager) globalTimerService).getTimerJobFactoryManager().addTimerJobInstance( timerJobInstance );
165 JobDetail jobDetail = scheduler.getJobDetail(quartzJobHandle.getJobName(), quartzJobHandle.getJobGroup());
166 if (jobDetail == null) {
167 scheduler.scheduleJob(jobq, triggerq);
168 } else {
169 // need to add the job again to replace existing especially important if jobs are persisted in db
170 scheduler.addJob(jobq, true);
171 triggerq.setJobName(quartzJobHandle.getJobName());
172 triggerq.setJobGroup(quartzJobHandle.getJobGroup());
173 scheduler.rescheduleJob(quartzJobHandle.getJobName()+"_trigger", quartzJobHandle.getJobGroup(), triggerq);
174 }
175
176 } catch (ObjectAlreadyExistsException e) {
177 // in general this should not happen even in clustered environment but just in case
178 // already registered jobs should be caught in scheduleJob but due to race conditions it might not
179 // catch it in time - clustered deployments only
180 logger.warn("Job has already been scheduled, most likely running in cluster: {}", e.getMessage());
181
182 } catch (JobPersistenceException e) {
183 if (e.getCause() instanceof NotSerializableException) {
184 // in case job cannot be persisted, like rule timer then make it in memory
185 internalSchedule(new InmemoryTimerJobInstanceDelegate(quartzJobHandle.getJobName(), ((GlobalTimerService) globalTimerService).getTimerServiceId()));
186 } else {
187 ((AcceptsTimerJobFactoryManager) globalTimerService).getTimerJobFactoryManager().removeTimerJobInstance(timerJobInstance);
188 throw new RuntimeException(e);
189 }
190 }
====
7) The JobPersistenceException/NotSerializableException seems to be intentionally caught. But the next try internalSchedule(new InmemoryTimerJobInstanceDelegate(quartzJobHandle.getJobName(), ((GlobalTimerService) globalTimerService).getTimerServiceId())) fails with NullPointerException as reported.
So... I'm sure which point is really a problem to be fixed but firstly, the id mismatch looks odd to me.
This issue is not reproducible in community jBPM 6.1.0.Final because org.drools.core.impl.StatefulKnowledgeSessionImpl is Externalizable. But I'm not sure if it is the enough fix.
Created attachment 935571 [details] BZ1139528-testcase-6.0.x-incomplete.diff I'm sorry about not posting a unit test but an eclipse project. I couldn't reproduce the issue in GlobalQuartzDBTimerServiceTest somehow. I attach the the imcomplete unit test as diff for 6.0.x branch just in case. > So... I'm sure which point is really a problem to be fixed
oops,
-> So... I'm not sure which point is really a problem to be fixed
Toshiya, could you please provide mavenized reproducer - it will be way easier to maintain dependencies? As it comes to not being able to reproduce the problem in jbpm-test it might be because of quartz configuration as I can see you use same data source for managed and unmanaged connections. This is important that unmanaged will not be transactional data source. Take a look at who it is configured in jbpm-test: https://github.com/droolsjbpm/jbpm/blob/master/jbpm-test/src/test/resources/quartz-db.properties#L61 https://github.com/droolsjbpm/jbpm/blob/master/jbpm-test/src/test/java/org/jbpm/test/timer/quartz/NonTransactionalConnectionProvider.java if you could try with that configuration locally to confirm if the problem is still there that would be great. The in memory timer support is only for special cases like rule timers used with processes as they should not be persisted and thus the explicit catch for not serializable exceptions. Process timers should never go into that catch block. Created attachment 935991 [details]
jBPM6Ex49_quartz_reproducer_mvn.zip
Hi Maciej,
Attached jBPM6Ex49_quartz_reproducer_mvn.zip as mvn version.
1. Unzip jBPM6Ex49_quartz_reproducer_mvn.zip
2. cd jBPM6Ex49_quartz_reproducer_mvn
3. mvn clean test -Dtest=ProcessMainJPATest
After finishing ProcessMainJPATest,
4. mvn test -Dtest=ProcessMainJPAContinueTest
I can constantly reproduce the issue with the steps.
> This is important that unmanaged will not be transactional data source.
Thank you for the information.
I have changed the nonManagedTXDataSource settings with copying NonTransactionalConnectionProvider into this project. But the isuee is still reproducible.
org.quartz.dataSource.notManagedDS.connectionProvider.class=org.jbpm.test.timer.quartz.NonTransactionalConnectionProvider
org.quartz.dataSource.notManagedDS.driverClassName=org.h2.Driver
org.quartz.dataSource.notManagedDS.user=sa
org.quartz.dataSource.notManagedDS.password=
org.quartz.dataSource.notManagedDS.url=jdbc:h2:tcp://localhost/~/jbpm-db
jdbc:h2:tcp://localhost/~/jbpm-db is the same one which is used by JBPMHelper in this test. I'm not adding ";MVCC=TRUE" because it throws an Exception.
====
org.h2.jdbc.JdbcSQLException: Cannot change the setting "MVCC" when the database is already open; SQL statement:
SET MVCC TRUE [90133-168]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:329)
====
As the database is already opened by JBPMHelper with MVCC=TRUE, I think it's okay with this url for quartz.
Please let me know if you need any additional information / testing.
Thanks!
fixed on master. main cause of this was with proper removal of completed timer job instances that are taken on ksession serialization. It kept the already completed jobs that on restart was causing failures. jbpm master: https://github.com/droolsjbpm/jbpm/commit/ff59b332b5dc14ff9aab73616aa4169cf108dedd Thanks Maciej! I have confirmed that patching BPMS 6.0.2 with this commit fixes the restarting issue. As the next step, I'm testing it in quartz cluster and still seeing the NullPointerException. I have filed another BZ (https://bugzilla.redhat.com/show_bug.cgi?id=1142084). Please take a look. Thanks! Verified in BPM Suite 6.1.0.ER2 using the Toshiya's reproducer. I also implemented a test for QE test suite: https://gitlab.mw.lab.eng.bos.redhat.com/bxms/brms/commit/a70389ba1ece638ac88a96b6121b75b14aadf4c7 |