Description of problem: When the prepare phase of a new cache view fails and is rolled back (for example due to a TimeoutException), the state transfer lock is never released, so all subsequent operations time out with a StateTransferInProgressException.
Dan Berindei <dberinde> made a comment on jira ISPN-2989

If I remember correctly, we didn't unblock transactions on rollback because the coordinator was supposed to retry the cache view installation in less than 1 second, and re-acquiring the exclusive state transfer lock via StateTransferLock.blockNewTransactions was very expensive (because it had to wait on all the other running commands to finish). When a cache view installation eventually finished successfully, it would unblock the transactions. If the coordinator died, another node was supposed to pick up the coordinator role and install the new view, releasing the state transfer lock at the end.

As such, I would close this issue as expected behaviour, and I would only try to fix the specific situations where the retry mechanism doesn't work properly.
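The behaviour described in this comment can be illustrated with a toy model. The sketch below is not Infinispan code: ToyStateTransferLock, ToyViewInstaller and all of their methods are hypothetical names invented for illustration, and block() only stands in for StateTransferLock.blockNewTransactions (it does not wait for running commands to finish, as the real call is said to do). It shows that a rollback which deliberately leaves transactions blocked makes later operations time out until some view installation commits.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class ViewLockSketch {

    /** Hypothetical stand-in for the state transfer lock; not the Infinispan class. */
    static class ToyStateTransferLock {
        private final ReentrantLock guard = new ReentrantLock();
        private final Condition unblocked = guard.newCondition();
        private boolean transactionsBlocked;

        /** Stands in for blockNewTransactions; the toy skips waiting for running commands. */
        void block() {
            guard.lock();
            try {
                transactionsBlocked = true;
            } finally {
                guard.unlock();
            }
        }

        void unblock() {
            guard.lock();
            try {
                transactionsBlocked = false;
                unblocked.signalAll();
            } finally {
                guard.unlock();
            }
        }

        /** Returns true if the operation may proceed, false if it timed out while blocked. */
        boolean tryStartOperation(long timeout, TimeUnit unit) {
            long nanos = unit.toNanos(timeout);
            guard.lock();
            try {
                while (transactionsBlocked) {
                    if (nanos <= 0) {
                        return false; // the real system would report a timeout here
                    }
                    nanos = unblocked.awaitNanos(nanos);
                }
                return true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            } finally {
                guard.unlock();
            }
        }
    }

    /** Hypothetical view installer following the prepare / rollback / commit flow. */
    static class ToyViewInstaller {
        private final ToyStateTransferLock lock;

        ToyViewInstaller(ToyStateTransferLock lock) {
            this.lock = lock;
        }

        void prepareView() {
            lock.block();
        }

        void rollbackView() {
            // Intentionally does NOT unblock: a retry is expected to follow shortly.
        }

        void commitView() {
            lock.unblock();
        }
    }

    public static void main(String[] args) {
        ToyStateTransferLock lock = new ToyStateTransferLock();
        ToyViewInstaller installer = new ToyViewInstaller(lock);

        installer.prepareView();
        installer.rollbackView(); // prepare failed; no retry has happened yet
        System.out.println("after rollback: "
                + lock.tryStartOperation(100, TimeUnit.MILLISECONDS));

        installer.prepareView();  // the retry
        installer.commitView();   // a successful installation unblocks transactions
        System.out.println("after commit: "
                + lock.tryStartOperation(100, TimeUnit.MILLISECONDS));
    }
}
```

In this model the first call returns false and the second returns true. If the retry never runs, every operation keeps timing out, which matches the symptom in the description and is the case the comment suggests fixing individually.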
Mircea Markus <mmarkus> updated the status of jira ISPN-2989 to Resolved
Mircea Markus <mmarkus> made a comment on jira ISPN-2989

See last comment.
Per #c1