Bug 460109 - Broker fails to create journal directory - File already exists (RHEL 4)
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
All Linux
Priority: medium, Severity: high
: 1.0.1
Assigned To: messaging-bugs
Kim van der Riet
Depends On: 456272
Reported: 2008-08-26 04:46 EDT by Gordon Sim
Modified: 2008-10-06 15:00 EDT
Doc Type: Bug Fix
Last Closed: 2008-10-06 15:00:00 EDT

Description Gordon Sim 2008-08-26 04:46:15 EDT
+++ This bug was initially created as a clone of Bug #456272 +++

<davids> kpvdr:  I've lately seen a lot of "Directory creation failed"
exceptions .... with the explanation "File exists" ... is that something you
know about?
<kpvdr> davids: no
<davids> kpvdr:  2008-jul-20 15:41:
<davids> 35 error Unexpected exception: Queue
33c46d96-1601-4f6e-801d-8ed5f71c5e3e@guest: create() failed: jexception 0x0301
jdir::create_dir() threw JERR_JDIR_MKDIR: Directory creation failed.
<davids>  (dir="/tmp/rhts_qpidd/qpid-data/pt_broker.568/rhm/jrnl/0017" errno=17
(File exists)) (BdbMessageStore.cpp:356)
<davids> kpvdr:  on one box I have this 3-4 times using the RHTS test script ...
and the broker is started all the times from scratch in a new empty data directory
<davids> kpvdr:  Can I do something in another way to pin-point where/how/why it
<kpvdr> davids: thinking
<kpvdr> davids: what is the test doing?
<davids> kpvdr:  [15:41:17] Running perftest in topic mode (with storage): 1
iterations with 25000 msgs. Msg size: 64 bytes.  Extra test params: --nsubs 10
--qt 4 --durable yes
<kpvdr> davids: Hmmm, this is odd
<davids> kpvdr:  it happens only on topic tests
<kpvdr> davids: I have an idea on this..
<kpvdr> davids: ie when more than one journal maps into the same dir - in this
case 0017

--- Additional comment from kim.vdriet@redhat.com on 2008-07-24 16:33:38 EDT ---

I have been unable to reproduce this error. I have eliminated the following
possible causes:

1. Directory permission: this results in a different error message;
2. Directory exists: this works fine and the test completes with up to 4 queues
per directory;
3. Too many file handles: this results in a different error message on file
creation, not dir creation.

Since the code checks for the existence of a dir prior to creating it, the only
explanation for this error is a thread safety issue - ie two threads happen to
create the same dir at the same time. Examination of the code shows that the
current algorithm for creating the first level dir uses a simple hash of the
queue name to create one of 20 possible dirs. For random dir names there is
therefore a 5% (1 in 20) probability that a second, equally paced thread will
attempt to create the same top-level dir at the same time for another queue.
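
The suspected interleaving can be sketched in a few lines. This is only an
illustration in Python, not the store's C++ code: bucket_dir() and its toy
hash are hypothetical stand-ins (the real hash is not shown in this report),
and the barrier exists purely to force both threads past the existence check
before either one creates the dir.

```python
import errno
import os
import shutil
import tempfile
import threading

NUM_DIRS = 20  # number of first-level dirs mentioned in the comment above


def bucket_dir(base, queue_name):
    # Hypothetical stand-in for the store's "simple hash of the queue name";
    # the real hash function is not shown in this report.
    bucket = sum(queue_name.encode("utf-8")) % NUM_DIRS
    return os.path.join(base, "%04d" % bucket)


def racy_create(path, barrier, failures):
    # Check-then-create: the existence check and mkdir are two separate steps.
    if not os.path.exists(path):
        # Demo only: hold each thread here until both have passed the check,
        # which forces the interleaving suspected in this bug.
        barrier.wait()
        try:
            os.mkdir(path)
        except FileExistsError as exc:
            failures.append(exc.errno)


def run_race():
    base = tempfile.mkdtemp()
    try:
        # "ab" and "ba" are different queue names that share a bucket
        # under the toy hash above.
        paths = [bucket_dir(base, name) for name in ("ab", "ba")]
        barrier = threading.Barrier(2)
        failures = []
        threads = [
            threading.Thread(target=racy_create, args=(p, barrier, failures))
            for p in paths
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return paths, failures
    finally:
        shutil.rmtree(base, ignore_errors=True)


if __name__ == "__main__":
    paths, failures = run_race()
    print("same dir:", paths[0] == paths[1])
    print("loser saw EEXIST:", failures == [errno.EEXIST])
```

Both threads pass the check, one mkdir succeeds and the other fails with
EEXIST, which is the errno=17 (File exists) seen in the broker log.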

Although I have not reproduced the error, I am checking in a fix for this
oversight in the hope that it will eliminate this bug. Not checking for dir
existence, and instead allowing for a possible duplicate, solves this problem
without the need for a lock.
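
A minimal sketch of that approach, again in Python rather than the store's
C++, and not the code checked in as r2215/r2216: create_dir() and
run_concurrent() are hypothetical names, and the dir name "0017" is borrowed
from the log above only as an example.

```python
import errno
import os
import shutil
import tempfile
import threading


def create_dir(path):
    # No prior existence check: attempt the mkdir and treat "already exists"
    # as success, provided the existing entry really is a directory.
    try:
        os.mkdir(path)
    except OSError as exc:
        if exc.errno != errno.EEXIST or not os.path.isdir(path):
            raise


def run_concurrent(num_threads=8):
    base = tempfile.mkdtemp()
    path = os.path.join(base, "0017")
    barrier = threading.Barrier(num_threads)
    failures = []

    def worker():
        barrier.wait()  # release all threads together to maximise contention
        try:
            create_dir(path)
        except OSError as exc:
            failures.append(exc.errno)

    try:
        threads = [threading.Thread(target=worker) for _ in range(num_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        # Evaluated before the cleanup in the finally clause runs.
        return os.path.isdir(path), failures
    finally:
        shutil.rmtree(base, ignore_errors=True)


if __name__ == "__main__":
    created, failures = run_concurrent()
    print("dir created:", created, "failures:", failures)
```

The create call itself is the atomic step here, so whichever thread loses the
race simply sees EEXIST and carries on, and no lock is needed around the dir
creation.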

I will leave this assigned for a little while longer and see if the bug can be
reproduced on the RHTS hardware which originally found this problem.

r2215 on trunk; r2216 on 1.0 branch

--- Additional comment from davids@redhat.com on 2008-07-29 14:44:34 EDT ---

Reproducer is now available from mrg-team SVN ... mrg-team/people/dsommers/bz456272

This reproducer works, to some extent, on hp-xw4800-01.rhts.bos.redhat.com.
Start qpidd with: --auth no --tpl-wcache-page-size 128 --tpl-jfile-size-pgs 32
--num-jfiles 16 --jfile-size-pgs 32

In another screen, run this command:

  $ (find /usr -type f -exec cat {} \; > /tmp/filedata.dat) & python ./bz456272.py

It seems this issue arises much more often when the disk is busy. The
failure rate is around 10% with this script on this box.

--- Additional comment from davids@redhat.com on 2008-07-30 09:45:42 EDT ---

With the latest qpid and storage module from SVN (qpid.0-10, mrg-1.0) this bug
seems to be fixed.
Comment 2 Frantisek Reznicek 2008-08-29 03:22:07 EDT
RHTS test developed (MRG/qpid_broker_jfail_bz456272).
Test results coming soon.
Comment 3 Frantisek Reznicek 2008-09-04 06:17:55 EDT
RHTS test (MRG/qpid_broker_jfail_bz456272) shows no more qpidd 'file already exists' failures. Bug going to VERIFIED.

The latest results show that the MRG/qpid_broker_jfail_bz456272 test is unstable: it sometimes fails when disconnecting queues from the broker. This behavior is under investigation by QA and might be reported to DEV as a new bug.
Comment 5 errata-xmlrpc 2008-10-06 15:00:00 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

