Distributed transactional cache:
1. A sends Prepare to B.
2. B receives Prepare, but it is blocked due to ongoing state transfer (ST).
3. The replication timeout for B elapses on A.
4. A sends Rollback to B. The Rollback does not find the TX on B, as Prepare was not executed yet. The transaction is put into completedTransactions.
5. The completed transactions timeout elapses and the TX is dropped from completedTransactions. This is by default 15 seconds, way shorter than the ST timeout (due to which the Prepare was blocked).
6. Prepare is executed on B, acquiring the lock on K.

Nobody will roll back the TX, as the originator thinks it was already rolled back.

Result: key K will be locked forever; all attempts to update/remove it will fail.
As a workaround, transaction.completionTxTimeout should be increased to a value larger than the state transfer timeout. This option is not available in server mode, but transactions are not supported there anyway.
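
The scenario and the effect of the workaround can be illustrated with a small toy model of node B. This is only a sketch, not the actual implementation: the class and method names (LateBlockedPrepareSketch, NodeB, keyLeaked) are hypothetical, and the timestamps (Rollback arriving at 15 s, Prepare unblocked at 120 s) are assumed values chosen for illustration. The only things taken from the description above are the completedTransactions record, the 15 second default, and the ordering of events.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the race on node B. Hypothetical names; not the real implementation. */
class LateBlockedPrepareSketch {

    /** Minimal stand-in for node B's transaction bookkeeping. */
    static final class NodeB {
        private final long completedTxTimeoutMs;
        // txId -> time at which the Rollback recorded the TX as completed
        private final Map<String, Long> completedTransactions = new HashMap<>();
        // key -> txId that currently holds the lock
        private final Map<String, String> locks = new HashMap<>();

        NodeB(long completedTxTimeoutMs) {
            this.completedTxTimeoutMs = completedTxTimeoutMs;
        }

        /** Rollback: release any locks of the TX and remember it as completed. */
        void rollback(String txId, long nowMs) {
            locks.values().removeIf(txId::equals);
            completedTransactions.put(txId, nowMs);
        }

        /** Forget completed transactions older than the configured timeout. */
        void expireCompleted(long nowMs) {
            completedTransactions.values()
                    .removeIf(completedAt -> nowMs - completedAt >= completedTxTimeoutMs);
        }

        /** Prepare: rejected if the TX is still known as completed, else locks the key. */
        boolean prepare(String txId, String key, long nowMs) {
            expireCompleted(nowMs);
            if (completedTransactions.containsKey(txId) || locks.containsKey(key)) {
                return false;
            }
            locks.put(key, txId);
            return true;
        }

        boolean isLocked(String key) {
            return locks.containsKey(key);
        }
    }

    /** Replays steps 4 to 6 and reports whether key K ends up locked with no owner left. */
    static boolean keyLeaked(long completedTxTimeoutMs, long rollbackAtMs, long prepareUnblockedAtMs) {
        NodeB b = new NodeB(completedTxTimeoutMs);
        b.rollback("tx1", rollbackAtMs);           // step 4: Rollback arrives before Prepare ran
        b.prepare("tx1", "K", prepareUnblockedAtMs); // step 6: blocked Prepare finally executes
        return b.isLocked("K");
    }

    public static void main(String[] args) {
        long rollbackAt = 15_000;          // assumed time at which the Rollback reaches B
        long prepareUnblockedAt = 120_000; // assumed time at which state transfer unblocks Prepare

        // Default-like 15 s timeout: the completed record expires first, so K stays locked.
        System.out.println(keyLeaked(15_000, rollbackAt, prepareUnblockedAt));  // prints true
        // Timeout larger than the blocking period: the late Prepare is rejected.
        System.out.println(keyLeaked(300_000, rollbackAt, prepareUnblockedAt)); // prints false
    }
}
```

In this model the leak disappears as soon as the completed-transaction record outlives the period for which the Prepare is blocked, which is the reasoning behind setting the timeout above the state transfer timeout.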