| Summary: | [Eng] (6.4.0) LargeMessages eventually left unattended after failures, making org.hornetq.tests.integration.cluster.failover.BackupSyncLargeMessageTest::testDeleteLargeMessages to intermittently fail | |||
|---|---|---|---|---|
| Product: | [JBoss] JBoss Enterprise Application Platform 6 | Reporter: | Clebert Suconic <csuconic> | |
| Component: | HornetQ | Assignee: | Clebert Suconic <csuconic> | |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Martin Svehla <msvehla> | |
| Severity: | medium | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | 6.2.0 | CC: | ataylor, csuconic, jawilson, kkhan, mnovak, msvehla | |
| Target Milestone: | DR1 | Flags: | dmichael: needinfo? (csuconic) | |
| Target Release: | EAP 6.4.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | Bug Fix | ||
| Doc Text: |
Cause:
The delete for a large message is done asynchronously after the journal record has been deleted.
Consequence:
After a failure, files could occasionally be left unattended, requiring manual intervention to delete them.
Fix:
When deleting a large message, we now add a temporary record to the journal and verify its existence on startup.
Result:
A few files could be left in the large message folder after crashes.
To hit this, the server had to crash after the journal record was deleted and before the executor running file.delete had finished.
|
Story Points: | --- | |
| Clone Of: | ||||
| : | 1132185 (view as bug list) | Environment: | ||
| Last Closed: | Type: | Bug | ||
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Bug Depends On: | 1016141 | |||
| Bug Blocks: | 1132168, 1132185 | |||
I added the suggested test to BackupSyncLargeMessageTest, but it still fails after some time in 2.3.9 with java.lang.AssertionError: we really ought to delete these after delivery expected:<10> but was:<11>

We have seen failures on this test in our runs... We could keep it for the next version. I investigated and the only scenario we could get was a test issue... Let's postpone this to the next release?

Agreed, this is not a functional issue, so we can postpone. (Setting qa nack for now to indicate we're ok with not having this in 6.2.0, I'll ack it later when we have some timeline for a fix.)

I'm working on this.

Appears to be fixed by the HQ upgrade to 2.3.21 https://bugzilla.redhat.com/show_bug.cgi?id=1132168. Setting to MODIFIED. |
| Description of problem:
Large messages are not deleted after failover in BackupSyncLargeMessageTest. In a race condition, a large message's journal record could be deleted before the file itself. The delete command could then occasionally be missed on replication, leaving no way to remove the files other than manually. The cost and risk are small: the only effect is a file that is not deleted after failover in replicated state, and it can easily be removed manually later.

Version-Release number of selected component (if applicable):

How reproducible: One in 100

Steps to Reproduce:
1. Add a loop method to BackupSyncLargeMessageTest:

    @Test
    public void testLoop() throws Exception {
        for (int i = 0; i < 1000; i++) {
            System.out.println("#test " + i);
            testDeleteLargeMessages();
            tearDown();
            setUp();
        }
    }

2. Run testLoop and watch it fail.

Actual results: The test fails.

Expected results: The test shouldn't fail even after 1000 iterations.

Additional info:
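
The fix described in the Doc Text (write a temporary record to the journal before deleting, then check for it on startup) can be illustrated with a minimal, self-contained sketch. This is not the HornetQ implementation: the class `PendingDeleteSketch`, the in-memory `Journal` stand-in, the method names, and the `.msg` file naming are all hypothetical and chosen only to show the ordering that closes the race.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; none of these names are HornetQ APIs. */
public class PendingDeleteSketch {

    /** In-memory stand-in for the journal: ids that carry a "pending delete" record. */
    public static final class Journal {
        private final Set<Long> pendingDeletes = ConcurrentHashMap.newKeySet();

        public void addPendingDelete(long id) { pendingDeletes.add(id); }
        public void removePendingDelete(long id) { pendingDeletes.remove(id); }
        public Set<Long> pendingDeletes() { return new HashSet<>(pendingDeletes); }
    }

    private final Journal journal;
    private final Path largeMessagesDir;

    public PendingDeleteSketch(Journal journal, Path largeMessagesDir) {
        this.journal = journal;
        this.largeMessagesDir = largeMessagesDir;
    }

    /** Assumed file naming, for illustration only. */
    public Path fileFor(long id) {
        return largeMessagesDir.resolve(id + ".msg");
    }

    /** Step 1 (synchronous): record the intent before the message record goes away. */
    public void markForDelete(long id) {
        journal.addPendingDelete(id);
    }

    /** Step 2 (normally run later on an executor): remove the file, then clear the marker. */
    public void deleteFile(long id) throws IOException {
        Files.deleteIfExists(fileFor(id));
        journal.removePendingDelete(id);
    }

    /** On startup: finish every delete that a crash interrupted. Returns how many were finished. */
    public int recoverOnStartup() throws IOException {
        int recovered = 0;
        for (long id : journal.pendingDeletes()) {
            deleteFile(id);
            recovered++;
        }
        return recovered;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("large-messages");
        Journal journal = new Journal();
        PendingDeleteSketch store = new PendingDeleteSketch(journal, dir);

        Files.createFile(store.fileFor(1L));
        store.markForDelete(1L);
        // Simulated crash: deleteFile(1L) never runs, so the file is still on disk.

        PendingDeleteSketch restarted = new PendingDeleteSketch(journal, dir);
        int recovered = restarted.recoverOnStartup();
        // prints "recovered=1 fileExists=false"
        System.out.println("recovered=" + recovered
                + " fileExists=" + Files.exists(restarted.fileFor(1L)));
    }
}
```

The point of the ordering is that the marker outlives the crash window: if the server stops between step 1 and step 2, the next startup still knows which file to remove, so no manual cleanup is needed.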