Created attachment 875493 JMSCrashRec

Description of problem:
In some cases, after crash recovery scenarios, a "uid" remains in the TX log. This happens only for JTS. For JTA the test passes consistently.

Version-Release number of selected component (if applicable):
EAP 6.2.0, EAP 6.3.0 DR2

How reproducible:
Intermittent; not JDK related.

Steps to Reproduce:
1. Call StlessSB on server 1, which will send a message to its own queue.
2. On the server, add a mock test XA resource to the transaction before sending the message.
3. Crash the transaction on server 1 when entering the "commit" method of the test XA resource.
4. Reboot the server and call recovery for the server.
5. Check that the message sent to the server queue is committed.
6. Check that the server does not have any remaining "uid" in the TX log. This is where it fails: the server has a remaining "uid".

Actual results:
Server has remaining "uid".

Expected results:
Server should not have remaining "uid".

Additional info:
Please find the log file attached. Here is the project git repo: http://git.app.eng.bos.redhat.com/git/jbossqe/eap-tests-transactions.git (the scenario is under 'master').
To run the scenario locally, change directory to "..../eap-tests-transactions/integration/jbossts" and run:
mvn clean verify -Dtest=JMSMdbCrashRecoveryTestCase#commitHaltRev -Djbossts.hqobjectstore -Djboss.dist=${eap-6.3-home}
Does this only fail on hqstore?
It fails for standard store as well.
Hi Hayk,

Sorry for the delay. I can explain what is happening.

With JTS we have what is known as top-down and bottom-up recovery. When a resource calls replay completion on the coordinator, the return value tells it whether to commit or not. Simultaneously, the coordinator takes the opportunity to complete the entire transaction. Therefore there is a small race between the (threaded) coordinator and the resource's recovery manager to complete the resource.

If the coordinator completes the resource, it knows the outcome and automatically cleans up its transaction log. If the resource completes itself, the coordinator receives a warning status when it tries to complete the resource, so it leaves the transaction in the store.

After 3 attempts to commit the transaction that each get OBJECT_NOT_EXIST, a transaction is assumed to have fully committed its resources. In the debugger it looks like, depending on timing, it is easy for this counter to not reach 3, so the entries will still be in the object store. Each time a branch completes, the counter is reset, and in total you only have 3 recovery scans, so by default it should be impossible for recovery to remove the entry when the resources were completed bottom-up. It only passes when top-down recovery wins the race.

Tom
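To make the counter behaviour above concrete, here is a toy model in Java. It is only a sketch of the logic as described in this comment, not Narayana code: the class name, the ScanEvent enum, and the THRESHOLD constant are all hypothetical, and the real recovery module may differ in detail.

```java
import java.util.List;

/** Toy model of the retry counter described above; NOT actual Narayana code. */
public class RecoveryCounterSketch {

    /** Assumed threshold: consecutive OBJECT_NOT_EXIST results before the entry is dropped. */
    static final int THRESHOLD = 3;

    /** What one recovery scan observes for a log entry (hypothetical names). */
    enum ScanEvent { BRANCH_COMPLETED, OBJECT_NOT_EXIST }

    /** Returns true if the log entry would be removed during the given sequence of scans. */
    static boolean entryRemoved(List<ScanEvent> scans) {
        int consecutive = 0;
        for (ScanEvent event : scans) {
            if (event == ScanEvent.BRANCH_COMPLETED) {
                consecutive = 0; // each completed branch resets the counter
            } else {
                consecutive++;
                if (consecutive >= THRESHOLD) {
                    return true; // assumed fully committed, entry is cleaned up
                }
            }
        }
        return false; // counter never reached the threshold, uid stays in the store
    }

    public static void main(String[] args) {
        // Only three scans in total, and the first one completes a branch.
        System.out.println(entryRemoved(List.of(
                ScanEvent.BRANCH_COMPLETED,
                ScanEvent.OBJECT_NOT_EXIST,
                ScanEvent.OBJECT_NOT_EXIST)));

        // All branches recovered first, then three further scans.
        System.out.println(entryRemoved(List.of(
                ScanEvent.BRANCH_COMPLETED,
                ScanEvent.BRANCH_COMPLETED,
                ScanEvent.BRANCH_COMPLETED,
                ScanEvent.OBJECT_NOT_EXIST,
                ScanEvent.OBJECT_NOT_EXIST,
                ScanEvent.OBJECT_NOT_EXIST)));
    }
}
```

In this model the first sequence leaves the entry in place because the counter only reaches 2, while the second removes it because three consecutive OBJECT_NOT_EXIST results follow the last branch completion.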
Hi Tom, Thanks for the detailed description. So what would be the solution or workaround here to avoid leaving any uid in the log? I tried calling "recovery" 3 times, assuming that after 3 attempts the transaction would be considered fully committed and the log would be emptied, but the uid is still there. /Hayk
Hi Hayk, _After_ it has recovered the HQ x2 and TestXAResource, if you have three recovery calls it should be fine. I wouldn't think you need the minute wait between recovery scans if you are calling it yourself. Tom
Did you try three recovery calls?
Yes, it calls recovery 3 times, and the problem still exists.
The problem was in the test framework. Thanks to Gytis for researching this.