Bug 456272
| Summary: | Broker fails to create journal directory - File already exists | ||
|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | David Sommerseth <davids> |
| Component: | qpid-cpp | Assignee: | Kim van der Riet <kim.vdriet> |
| Status: | CLOSED ERRATA | QA Contact: | Kim van der Riet <kim.vdriet> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | ||
| Version: | 1.0 | CC: | freznice, ovasik |
| Target Milestone: | 1.0.1 | ||
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2008-10-06 19:09:09 UTC | Type: | --- |
| Bug Blocks: | 460109 | ||
|
Description
David Sommerseth
2008-07-22 15:38:22 UTC
I have been unable to reproduce this error. I have eliminated the following possibilities:

1. Directory permissions: this results in a different error message.
2. Directory exists: this works fine, and the test completes with up to 4 queues per directory.
3. Too many file handles: this results in a different error message on file creation, not on dir creation.

Since the code checks for the existence of a dir prior to creating it, the only remaining explanation for this error is a thread safety issue, i.e. two threads happen to create the same dir at the same time. Examination of the code shows that the current algorithm for creating the first-level dir uses a simple hash of the queue name to select one of 20 possible dirs. For random dir names there is therefore a 5% (1 in 20) probability that a second, equally paced thread will attempt to create the same top-level dir at the same time for another queue.

Although I have not reproduced the error, I am checking in a fix for this oversight in the hope that it will eliminate this bug. Not checking for dir existence, and allowing for a possible duplicate, would solve this problem without the need for a lock. I will leave this assigned for a little while longer and see if the bug can be reproduced on the RHTS hardware which originally found this problem.

r2215 on trunk; r2216 on 1.0 branch

Reproducer is now available from mrg-team SVN ... mrg-team/people/dsommers/bz456272

This reproducer works to some extent on hp-xw4800-01.rhts.bos.redhat.com. Start qpidd with:

    --auth no --tpl-wcache-page-size 128 --tpl-jfile-size-pgs 32 --num-jfiles 16 --jfile-size-pgs 32

In another screen, run this command:

    $ (find /usr -type f -exec cat {} \; > /tmp/filedata.dat) & python ./bz456272.py

It seems this issue arises much more often when the disk is busy. The failure rate is around 10% with this script on this box.

With the latest qpid and storage module from SVN (qpid.0-10, mrg-1.0) this bug seems to be fixed.
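The check-then-create race analysed in the comments above, and the lock-free fix of creating the dir unconditionally while tolerating a duplicate, can be sketched roughly as follows. This is a minimal illustration in Python, not the broker's C++ code: the function names, the CRC32 hash, and the two-digit dir naming are all assumptions made for the example; only the figure of 20 first-level dirs comes from this report.

```python
import os
import tempfile
import threading
import zlib

NUM_BUCKETS = 20  # the report mentions 20 possible first-level dirs


def bucket_dir(base: str, queue_name: str) -> str:
    """Map a queue name to one of NUM_BUCKETS first-level dirs.

    CRC32 is a stand-in hash (assumption); the broker's real hash
    is not shown in this report.
    """
    bucket = zlib.crc32(queue_name.encode("utf-8")) % NUM_BUCKETS
    return os.path.join(base, f"{bucket:02d}")


def create_racy(path: str) -> None:
    """Check-then-create: another thread can create `path` between the two calls."""
    if not os.path.isdir(path):
        os.mkdir(path)  # may raise FileExistsError under contention


def create_tolerant(path: str) -> None:
    """Create unconditionally and treat 'already exists' as success."""
    try:
        os.mkdir(path)
    except FileExistsError:
        if not os.path.isdir(path):
            raise  # something other than a directory is in the way


def create_concurrently(create, path: str, workers: int = 8) -> list:
    """Call create(path) from several threads released together.

    Returns the list of OSError exceptions raised by the workers.
    """
    barrier = threading.Barrier(workers)
    errors = []

    def run() -> None:
        barrier.wait()  # line the threads up to maximise contention
        try:
            create(path)
        except OSError as exc:
            errors.append(exc)

    threads = [threading.Thread(target=run) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    target = bucket_dir(base, "example-queue")
    print(create_concurrently(create_tolerant, target))
```

With `create_tolerant` the returned error list should be empty however the threads interleave. Passing `create_racy` instead may occasionally yield a `FileExistsError`, depending on scheduling and disk load, which would be consistent with the intermittent failures described here. In Python the same effect is available directly through `os.makedirs(path, exist_ok=True)`.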
RHTS test developed (MRG/qpid_broker_jfail_bz456272). Test results coming soon.

RHTS test (MRG/qpid_broker_jfail_bz456272) shows no more qpidd 'file already exists' failures. Bug going to VERIFIED.

Latest results show that the MRG/qpid_broker_jfail_bz456272 test is unstable; it sometimes fails when disconnecting queues from the broker. This behavior is under investigation by QA and might be reported to DEV as a new bug.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0640.html |