Description of problem:
Seen by our DBA

----- Information for the OTHER waiting sessions -----
Session 470:
  sid: 470 ser: 41209 audsid: 397421 user: 87/RHQ
  flags: (0x45) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
  flags2: (0x40009) -/-/INC
  pid: 62 O/S info: user: oracle, term: UNKNOWN, ospid: 2045
  client details:
    O/S info: user: rhq, term: unknown, ospid: 1234
    machine: vp25q03ad-hadoop098.iad.apple.com program: JDBC Thin Client
    application name: JDBC Thin Client, hash value=2546894660
  current SQL:
  insert into RHQ_CONFIG (CTIME, MTIME, NOTES, VERSION, id) values (:1 , :2 , :3 , :4 , :5 )

Session 398:
  sid: 398 ser: 3403 audsid: 397429 user: 87/RHQ
  flags: (0x45) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
  flags2: (0x40009) -/-/INC
  pid: 60 O/S info: user: oracle, term: UNKNOWN, ospid: 2088
  client details:
    O/S info: user: rhq, term: unknown, ospid: 1234
    application name: JDBC Thin Client, hash value=2546894660
  current SQL:
  delete from RHQ_CONFIG where id=:1

----- End of information for the OTHER waiting sessions -----

Information for THIS session:

----- Current SQL Statement for this session (sql_id=3b79gqjg9d7u2) -----
delete from RHQ_CONFIG where id in (select resourceco1_.CONFIGURATION_ID from RHQ_CONFIG_UPDATE resourceco1_, RHQ_RESOURCE resource2_ where resourceco1_.DTYPE='resource' and resourceco1_.CONFIG_RES_ID=resource2_.ID and (resourceco1_.CONFIG_RES_ID in (:1 )) and resourceco1_.CONFIGURATION_ID<>resource2_.RES_CONFIGURATION_ID)

Version-Release number of selected component (if applicable):
4.12

How reproducible:
Unclear, probably from doing a plugin update, or possibly a bulk delete of resources
Seems to come from:

    at org.rhq.enterprise.server.resource.ResourceManagerLocal$$$view23.uninventoryResourceAsyncWork(Unknown Source) [rhq-server.jar:4.12.0]
    at org.rhq.enterprise.server.scheduler.jobs.AsyncResourceDeleteJob.uninventoryResource(AsyncResourceDeleteJob.java:102) [rhq-server.jar:4.12.0]
    at org.rhq.enterprise.server.scheduler.jobs.AsyncResourceDeleteJob.executeJobCode(AsyncResourceDeleteJob.java:66) [rhq-server.jar:4.12.0]
    at org.rhq.enterprise.server.scheduler.jobs.AbstractStatefulJob.execute(AbstractStatefulJob.java:48) [rhq-server.jar:4.12.0]
    at org.rhq.enterprise.server.resource.metadata.ResourceMetadataManagerBean.removeObsoleteTypes(ResourceMetadataManagerBean.java:201) [rhq-server.jar:4.12.0]
    ...
    at org.rhq.enterprise.server.resource.metadata.PluginManagerLocal$$$view149.registerPlugin(Unknown Source) [rhq-server.jar:4.12.0]
    // 2) Immediately remove the uninventoried resources by forcing the normally async work to run in-band
    new AsyncResourceDeleteJob().execute(null);

    // 3) Immediately finish removing the deleted types by forcing the normally async work to run in-band
    new PurgeResourceTypesJob().executeJobCode(null);

Might not be a good idea to do this actually... Is it possible to simply chain the two together?
Right, looking at this it seems dangerous because these in-band executions could overlap scheduled executions. The question is how to get the behavior we want. We need the uninventoried resources to go away before we can purge the type completely, so the uninventory job code must execute to completion before the purge job code runs.

And then we have the issue of existing scheduled jobs, which could already be in progress. This should only be an issue for plugin updates executing after server startup, because the scheduled jobs haven't been started prior to the initial plugin update. I suppose we may also be at risk during on-demand plugin delete/purge.

How to do this seems a little difficult because, AFAIK, the Quartz API (at least in our ancient version) gives no cluster-aware way to determine whether a job is actually executing; it is only possible for a single node. So I think we need to maybe do something like:

1) Wait for there to be no resources in an uninventoried state
2) Pause the two relevant scheduled jobs
3) Execute the job code synchronously, like we do now
4) Un-pause the jobs

Thoughts?
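To make the four steps concrete, here is a rough sketch of the control flow only. Everything in it is hypothetical: `JobControl` is a stand-in for whatever pause/resume calls our scheduler wrapper exposes (it is not the Quartz API), the job names are placeholders, and the `uninventoryPending` check would have to be backed by a real query. Treat it as a starting point for discussion, not a patch.

```java
import java.util.function.BooleanSupplier;

public class InBandPurgeSketch {

    // Placeholder job names; the real scheduled job names would go here.
    public static final String DELETE_JOB = "delete";
    public static final String PURGE_JOB = "purge";

    /** Hypothetical stand-in for the scheduler's pause/resume operations. */
    public interface JobControl {
        void pause(String jobName);
        void resume(String jobName);
    }

    /**
     * Returns true if both pieces of work ran, false if we gave up waiting
     * (timeout or interrupt) before anything was paused or executed.
     */
    public static boolean runInBand(JobControl jobs,
                                    BooleanSupplier uninventoryPending,
                                    Runnable uninventoryWork,
                                    Runnable purgeWork,
                                    long timeoutMillis,
                                    long pollMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;

        // 1) Wait until no resources are left in the uninventoried state.
        while (uninventoryPending.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }

        // 2) Pause the two scheduled jobs so they cannot fire while we work.
        jobs.pause(DELETE_JOB);
        jobs.pause(PURGE_JOB);
        try {
            // 3) Run the job code synchronously, uninventory first, then purge.
            uninventoryWork.run();
            purgeWork.run();
        } finally {
            // 4) Always un-pause, even if the in-band work throws.
            jobs.resume(PURGE_JOB);
            jobs.resume(DELETE_JOB);
        }
        return true;
    }
}
```

Note the sketch still has a window between steps 1 and 2 where a scheduled execution could start, and pausing presumably does not stop an execution that is already running on another node, so this narrows the overlap rather than eliminating it.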