This release of JBoss EAP 6 contains a bug that causes incorrect transaction statistics to be reported when recovery is used to process in-doubt prepared transactions.
The total count of processed transactions is incorrectly increased before the server crashes and again when recovery resolves the in-doubt state after the server is restarted. In these cases, a transaction may be counted twice.
This issue is under investigation and is expected to be resolved in a future release of the product.
Created attachment 809970
Screenshot of the transaction statistics
When transactions are recovered, the transaction statistics are not calculated correctly: the percentage of committed transactions can exceed 100%.
I think this is because recovered transactions are not counted in the total number of transactions, but the commit count is still increased when they are committed.
See the attachment for what the statistics show in my case.
Heiko Braun <firstname.lastname@example.org> updated the status of jira HAL-437 to Coding In Progress
The problem is actually not related to the GUI. Here is an explanation of why we see bogus numbers:
hbraun: jhalliday: it seems QE tests recovery scenarios. Does that ring a bell?
[10:50am] jhalliday: hmm. if you're measuring the stats post crash only then you can't expect them to be equal. the sum of the stats from the container runs pre and post crash should be though. essentially the crash resets the counters since it's just in-memory.
[10:50am] hbraun: jhalliday: ah, makes sense
[10:51am] hbraun: jhalliday: you mean post crash only the committed value will be increased?
[10:52am] jhalliday: in the pre crash segment of the test the total count is increased, since that is done at tx begin. in the post crash segment the commit count is increased by recovery. overall they stay equal, but if you look at only one or the other you'll get weirdness.
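A minimal Python sketch of the counter behaviour described in the chat above, assuming two plain in-memory counters: "total" is incremented at transaction begin and "committed" after a successful commit, and both are lost when the server crashes. The class and function names are hypothetical illustrations, not Narayana's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class TxStats:
    """Hypothetical in-memory counters; a new instance models a freshly booted server."""
    total: int = 0      # incremented at transaction begin
    committed: int = 0  # incremented after the transaction has committed

    def begin(self) -> None:
        self.total += 1

    def commit(self) -> None:
        self.committed += 1


def run_crash_scenario(n_tx: int) -> tuple[TxStats, TxStats]:
    """Begin n_tx transactions, crash before commit, then let recovery commit them."""
    pre_crash = TxStats()
    for _ in range(n_tx):
        pre_crash.begin()    # counted only in the run that crashes
    # The crash discards pre_crash; the restarted server boots with zeroed counters.
    post_crash = TxStats()
    for _ in range(n_tx):
        post_crash.commit()  # recovery commits the in-doubt transactions; no begin() here
    return pre_crash, post_crash


pre, post = run_crash_scenario(3)
print(pre.total, pre.committed)    # 3 0
print(post.total, post.committed)  # 0 3
print(pre.total + post.total == pre.committed + post.committed)  # True
```

Looking at either run alone gives "weird" numbers; only the sum across both runs balances, which is the point jhalliday makes.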
[10:52am] hbraun: ok, i think that explains it
IMO this is not a bug, but a limitation of the management layer in general. However, I am assigning it to Brian to comment on it.
Heiko Braun <email@example.com> updated the status of jira HAL-437 to Resolved
Reassigning to the subsystem component, as this is purely a subsystem issue: either in the data the underlying component provides or in what the subsystem exposes based on that data.
I set the requires_doc_text flag to ? because I think this should be documented in the known issues. I put some *draft text* in the doc text field. Would you be so kind as to revise it?
As Heiko and Jonathan point out, we initialize statistics during boot so recovering transactions that were started by previous runs will not have a corresponding begin. The complexity involved in tracking transactions created by prior runs (we would have to modify the structure of our transaction log records) does not warrant changing the semantics of "number of transactions" versus "number of commits". As long as it is clear what these two statistics mean then there is no issue and we can simply mark it as a "feature" of how transaction statistics are reported and reject the bug accordingly.
I fully understand your point; I just have to say that I would rather have this fixed than leave it as a "feature".
If there is no way to fix it, then the best approach would probably be to create a JIRA feature request and document this for EAP.
But before that, I have some questions/points (maybe silly, as I do not understand the internals).
Wouldn't it help to increase the number of committed transactions at the end of the commit phase of 2PC? It seems to me that it's increased at the beginning of the commit phase of 2PC and it causes the trouble here.
If that is not a good idea, would it at least be possible to handle this somehow (a workaround) so that the web console statistics do not show numbers over 100%?
This is not a bug in the code but rather it is people misinterpreting the meaning of the statistic (in which case, as you suggest, we need a doc JIRA for it).
If we didn't reset the counters during boot, they would eventually wrap, which is not helpful when most application monitors expect the counters to apply to the current application server instance. It would also cause discrepancies when we have proxy recovery (where transactions are recovered on a different server from the one that created them).
If we artificially capped how the statistic is reported in the admin console, we would lose valid information.
I did not understand your point about "It seems to me that it's increased at the beginning of the commit phase of 2PC and it causes the trouble here". I checked the code and we only update the stat after the transaction has successfully ended.
Yep, it was just my assumption from the test's point of view.
The test runs as follows:
1. prepare phase
1a. prepare first XA resource
1b. prepare second XA resource
2. commit phase
2a. server crash
3. recovery phase
3a. commit first XA resource
3b. commit second XA resource
The console shows 200% for committed transactions, which would mean that the total number of transactions is counted as 1 and the number of committed transactions as 2.
My question was when the number of committed transactions is increased. I could be wrong, but I assumed it happens sometime during 2PC and not at the very end.
The question is: why is the commit of a transaction counted twice?
I tested it with 6.3.1.CP.CR1 (Narayana 4.17.22.Final) and I get 118% when the test finishes. At that point recovery is fully finished, so there should be no reason to get such numbers. I don't think #c4 explains my problem.
Mike and I chatted about this on Tuesday, and we could look at adding something to Narayana like this: https://github.com/jbosstm/narayana/pull/719
That being said, it's quite artificial. All it does is increment the total number of transactions started if we determine this is a recovery scenario.
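To illustrate the idea in a few lines, here is a hypothetical Python model of that compensation: when a commit is driven by recovery of a transaction begun in an earlier run, the "total" counter is incremented as well, so committed/total can no longer exceed 100% in the current run. This is not the code from the pull request; the `recovered` flag and all names are assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class CompensatingTxStats:
    """Hypothetical counters that also count a recovered commit as a started transaction."""
    total: int = 0
    committed: int = 0

    def begin(self) -> None:
        self.total += 1

    def commit(self, recovered: bool = False) -> None:
        # Assumption: the caller knows the commit comes from recovery of a transaction
        # begun before the last restart, so begin() was never called for it in this run.
        if recovered:
            self.total += 1
        self.committed += 1

    def committed_percentage(self) -> float:
        return 100.0 * self.committed / self.total if self.total else 0.0


stats = CompensatingTxStats()   # fresh counters after a restart
stats.begin()
stats.commit()                  # a normal transaction in this run
stats.commit(recovered=True)    # an in-doubt transaction committed by recovery
print(stats.total, stats.committed, stats.committed_percentage())  # 2 2 100.0
```

The trade-off is visible in the sketch: "total" no longer means "transactions begun in this run", which is why the change was described as artificial.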
Ondra, maybe you could take a look and let me know if this is what you are looking for?
Hi Tom, Hi Mike,
yes, that is what I was talking about. However, I have been rethinking my point of view; in particular, the word 'artificial' started to scare me :) I agree that such a fix would change the way Narayana has handled statistics for years.
I hope that I wasn't too prim.
What I was really fighting about is the fact that I can get a number over 100%.
If you do not object, I would suggest moving this issue back to 'WebConsole' and asking for a change in how transaction statistics are displayed. Percentages are not a good way to display transaction statistics. I would then also ask for a user hint explaining why the statistics may look a bit "strange" after a server crash.
Would that be a better solution for you, or would you consider another one?
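A small Python sketch of what such a display could look like, assuming the console receives the two raw counters: show the counts instead of a percentage, and add a hint when commits outnumber transactions begun since boot. The function name and hint wording are hypothetical; nothing is capped, so no information is lost.

```python
def describe_tx_stats(total: int, committed: int) -> str:
    """Hypothetical console text: raw counters plus a hint instead of a percentage."""
    lines = [
        f"Transactions started since boot: {total}",
        f"Committed: {committed}",
    ]
    if committed > total:
        # Under the counter semantics above, more commits than begins in this run
        # means some commits came from recovery of transactions begun before a restart.
        lines.append("Note: includes transactions recovered after a server restart.")
    return "\n".join(lines)


print(describe_tx_stats(1, 2))
```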
I agree with your recommendation, thanks for pursuing this Ondra.
Per the discussion above, closing this bz as not a bug and creating a new bz#1138561 to change how the statistics are displayed in the web console.