Description of problem:

It seems there is a possibility of deadlock. I'm not sure which SQL statement is the problematic one exactly, as there is no precise match.

Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TX-00060012-00080534        54     208     X             75     107           X
TX-002d0005-0002ae45        75     107     X             54     208           X

session 208: DID 0001-0036-24B4BB3C     session 107: DID 0001-004B-0B812C4E
session 107: DID 0001-004B-0B812C4E     session 208: DID 0001-0036-24B4BB3C

Rows waited on:
  Session 208: obj - rowid = 00014DA7 - AAAVXjAAFAAES3IAAG
  (dictionary objn - 85415, file - 5, block - 1125832, slot - 6)
  Session 107: obj - rowid = 00014DA7 - AAAVXjAAFAAAJp/AAB
  (dictionary objn - 85415, file - 5, block - 39551, slot - 1)

----- Information for the OTHER waiting sessions -----
Session 107:
  sid: 107 ser: 1531 audsid: 409493 user: 87/RHQ
    flags: (0x45) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
    flags2: (0x40009) -/-/INC
  pid: 75 O/S info: user: oracle, term: UNKNOWN, ospid: 28705
    image: oracle.apple.com
  client details:
    O/S info: user: rhq, term: unknown, ospid: 1234
    machine: vp25q03ad-hadoop098.iad.apple.com
    program: JDBC Thin Client
  application name: JDBC Thin Client, hash value=2546894660
  current SQL:
    delete from RHQ_RESOURCE_ERROR where RESOURCE_ID=:1 and ERROR_TYPE=:2
----- End of information for the OTHER waiting sessions -----

Information for THIS session:

----- Current SQL Statement for this session (sql_id=489wxtgjx80ct) -----
delete from RHQ_RESOURCE_ERROR where RESOURCE_ID=:1 and ERROR_TYPE=:2

My suspicion is this:

    @NamedQuery(name = ResourceError.QUERY_FIND_BY_RESOURCE_ID_AND_ERROR_TYPE, query = "SELECT re FROM ResourceError re "
        + "WHERE re.resource.id = :resourceId AND re.errorType = :errorType") })

which is used here:

    // finally let's remove the potential previous upgrade error. we've now successfully
    // upgraded the resource.
    List<ResourceError> upgradeErrors = resourceManager.findResourceErrors(subjectManager.getOverlord(),
        resource.getId(), ResourceErrorType.UPGRADE);
    for (ResourceError error : upgradeErrors) {
        entityManager.remove(error);
    }

But that named query is a SELECT, not a DELETE, while the statement in the trace is a DELETE. I'm wondering if Hibernate is optimizing this somehow.

Version-Release number of selected component (if applicable):
4.12

How reproducible:
Unclear. Perhaps if two agents connect to one server and both call into:

    @Override
    @SuppressWarnings("deprecation")
    public Set<ResourceUpgradeResponse> upgradeResources(Set<ResourceUpgradeRequest> upgradeRequests) {

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
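If concurrent deletes on RHQ_RESOURCE_ERROR really do lock rows in different sequences, one common mitigation for a per-row removal loop like the one above is to delete in a single deterministic order, so every session acquires the row locks in the same sequence. This is only a sketch of that idea, not the RHQ fix; `deletionOrder` is a hypothetical helper:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

public class OrderedDelete {
    // Hypothetical mitigation: process entities in ascending ID order, so any
    // two sessions deleting overlapping sets of rows lock them in the same
    // sequence and cannot form a wait cycle.
    static List<Integer> deletionOrder(Collection<Integer> errorIds) {
        List<Integer> ordered = new ArrayList<>(errorIds);
        Collections.sort(ordered); // one global, deterministic lock-acquisition order
        return ordered;
    }

    public static void main(String[] args) {
        // Two sessions observing the same error rows in different scan orders...
        List<Integer> session1 = List.of(42, 7, 19);
        List<Integer> session2 = List.of(19, 42, 7);
        // ...end up locking the rows in an identical sequence.
        System.out.println(deletionOrder(session1)); // [7, 19, 42]
        System.out.println(deletionOrder(session2)); // [7, 19, 42]
    }
}
```

Whether this applies here depends on whether the trace really comes from this loop rather than from a single bulk DELETE, which the next comments discuss.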
I've been looking at the code and trying to understand the deadlock report throughout the evening. Given the type of locks (row locks) and the query involved, it looks like the consequence of concurrent calls to ResourceManagerBean#clearResourceConfigErrorByType with the same parameters.

Such concurrent calls would not necessarily lead to a deadlock. It would happen only if the two sessions lock the rows in a different order. My knowledge of Oracle is limited, but I'd say this is possible (if the two queries get different execution plans) yet very unlikely.

How often does it happen? Maybe purging the table as requested in Bug 1139433 will be enough. Besides, as only one of the two sessions will be rolled back by Oracle, the data will be deleted anyway.
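The condition described here, deadlock only when the two sessions lock rows in a different order, is the classic lock-ordering cycle. A minimal sketch with plain JVM locks (generic illustration, not RHQ or Oracle code) shows the rule that rules it out: if every session takes its locks in one fixed global order, both always finish.

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockOrdering {
    static final ReentrantLock LOCK_A = new ReentrantLock();
    static final ReentrantLock LOCK_B = new ReentrantLock();

    // Global rule: LOCK_A is always acquired before LOCK_B, no matter which
    // caller runs. With a single global order, no circular wait can form.
    static void withBothLocks(Runnable body) {
        LOCK_A.lock();
        try {
            LOCK_B.lock();
            try {
                body.run();
            } finally {
                LOCK_B.unlock();
            }
        } finally {
            LOCK_A.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int[] counter = {0};
        Runnable work = () -> withBothLocks(() -> counter[0]++);
        Thread t1 = new Thread(() -> { for (int i = 0; i < 1000; i++) work.run(); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 1000; i++) work.run(); });
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Both threads complete; the count proves neither got stuck.
        System.out.println(counter[0]); // 2000
    }
}
```

The database analogue of "one fixed global order" is both sessions visiting the contended rows in the same sequence, which is exactly what differing execution plans can break.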
Sorry to have potentially wasted a lot of your time. I report bugs that I see, often without knowing or reporting the severity or impact.

My thinking, though, is that it isn't really a row lock, since the delete can remove N or more rows. Basically, the delete in session 1 is waiting for any other inserts or deletes in the other session to complete before it can commit.

I do see something similar with the rhq_config table, where two sessions attempt to delete the same row, but I don't really understand how that could be a deadlock either.
(In reply to Elias Ross from comment #2)
> Sorry to have potentially wasted a lot of your time. I report bugs that I
> see, often without knowing or reporting the severity or impact.

No worries.

> My thinking, though, is it isn't really a row lock, since the delete can
> delete N or more rows. Basically, the delete in session 1 is waiting for any
> sort of other inserts or deletes to take place in the other session, before
> it can commit.

Two things make me think we have two sessions holding write locks on different rows and waiting for each other:

* the "TX" prefix in the resource name:

> Deadlock graph:
>                        ---------Blocker(s)--------  ---------Waiter(s)---------
> Resource Name          process session holds waits  process session holds waits
> TX-00060012-00080534        54     208     X             75     107           X
> TX-002d0005-0002ae45        75     107     X             54     208           X

* the "Rows waited on" section:

> Rows waited on:
>   Session 208: obj - rowid = 00014DA7 - AAAVXjAAFAAES3IAAG
>   (dictionary objn - 85415, file - 5, block - 1125832, slot - 6)
>   Session 107: obj - rowid = 00014DA7 - AAAVXjAAFAAAJp/AAB
>   (dictionary objn - 85415, file - 5, block - 39551, slot - 1)

I tried to figure out why the same query could lead to different execution plans, and one possible reason would be that one session is using the index on resourceId while the other one is not.
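The execution-plan hypothesis can be modeled abstractly for the two-row case in the trace: if one plan (say, an index scan) visits the rows in one order and another plan (a full scan) visits them in the other, each session can hold its first row while waiting on the other's. This sketch is illustrative only; the row names and scan orders are invented, not taken from the trace.

```java
import java.util.List;

public class WaitForCycle {
    // Models the two-session, two-row case: each session has locked the first
    // row in its visit order and is about to lock the second. A deadlock is the
    // situation where each one's next row is the other's held row.
    static boolean deadlocks(List<String> order1, List<String> order2) {
        String held1 = order1.get(0), next1 = order1.get(1);
        String held2 = order2.get(0), next2 = order2.get(1);
        return next1.equals(held2) && next2.equals(held1);
    }

    public static void main(String[] args) {
        List<String> indexScan = List.of("rowA", "rowB"); // plan driven by the index
        List<String> fullScan  = List.of("rowB", "rowA"); // plan scanning blocks directly
        // Opposite visit orders can interleave into a cycle...
        System.out.println(deadlocks(indexScan, fullScan)); // true
        // ...but two sessions using the same order never can.
        System.out.println(deadlocks(indexScan, indexScan)); // false
    }
}
```

This matches the report: both sessions run the identical DELETE, yet they block on two different rowids of object 85415, which is only possible if they reached those rows in opposite orders.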
Pull request sent and waiting for review: https://github.com/rhq-project/rhq/pull/132
Merged in master

commit 8efb4f275b98d85a8bee2f12110dd955a1ebf67c
Merge: 86968f0 1f08ab3
Author: Libor Zoubek <lzoubek>
Date:   Thu Nov 13 18:16:58 2014 +0100

    Merge branch 'purgepurgepurge' of https://github.com/tsegismont/rhq into
    tsegismont-purgepurgepurge

    Conflicts:
        modules/core/dbutils/src/main/scripts/dbupgrade/db-upgrade.xml
        modules/enterprise/gui/coregui/src/main/resources/org/rhq/coregui/client/Messages_de.properties
        modules/enterprise/gui/coregui/src/main/resources/org/rhq/coregui/client/Messages_ja.properties
        modules/enterprise/gui/coregui/src/main/resources/org/rhq/coregui/client/Messages_ru.properties