Bug 456272 - Broker fails to create journal directory - File already exists
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.0
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: 1.0.1
Assigned To: Kim van der Riet
Blocks: 460109
Reported: 2008-07-22 11:38 EDT by David Sommerseth
Modified: 2016-05-22 19:27 EDT (History)
Doc Type: Bug Fix
Last Closed: 2008-10-06 15:09:09 EDT

Description David Sommerseth 2008-07-22 11:38:22 EDT
<davids> kpvdr:  I've lately seen a lot of "Directory creation failed"
exceptions .... with the explanation "File exists" ... is that something you
know about?
<kpvdr> davids: no
<davids> kpvdr:  2008-jul-20 15:41:
<davids> 35 error Unexpected exception: Queue
33c46d96-1601-4f6e-801d-8ed5f71c5e3e@guest: create() failed: jexception 0x0301
jdir::create_dir() threw JERR_JDIR_MKDIR: Directory creation failed.
<davids>  (dir="/tmp/rhts_qpidd/qpid-data/pt_broker.568/rhm/jrnl/0017" errno=17
(File exists)) (BdbMessageStore.cpp:356)
<davids> kpvdr:  on one box I have this 3-4 times using the RHTS test script ...
and the broker is started all the times from scratch in a new empty data directory
<davids> kpvdr:  Can I do something in another way to pin-point where/how/why it
happens?
<kpvdr> davids: thinking
<kpvdr> davids: what is the test doing?
<davids> kpvdr:  [15:41:17] Running perftest in topic mode (with storage): 1
iterations with 25000 msgs. Msg size: 64 bytes.  Extra test params: --nsubs 10
--qt 4 --durable yes
<kpvdr> davids: Hmmm, this is odd
<davids> kpvdr:  it happens only on topic tests
<kpvdr> davids: I have an idea on this..
<kpvdr> davids: ie when more than one journal maps into the same dir - in this
case 0017
Comment 1 Kim van der Riet 2008-07-24 16:33:38 EDT
I have been unable to reproduce this error. I have eliminated the following
possibilities:

1. Directory permission: this results in a different error message;
2. Directory exists: this works fine and the test completes with up to 4 queues
per directory;
3. Too many file handles: this results in a different error message on file
creation, not dir creation.

Since the code checks for the existence of a dir prior to creating it, the only
explanation for this error is a thread safety issue, i.e. two threads happen to
create the same dir at the same time. Examination of the code shows that the
current algorithm for creating the first-level dir uses a simple hash of the
queue name to select one of 20 possible dirs. For random dir names, there is
a 5% probability that a second, equally paced thread will attempt to create the
same top-level dir at the same time for another queue.
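
The 5% figure follows from having 20 equally likely first-level dirs. A minimal
sketch of that mapping is given below. The hash used here (sum of the name's
bytes modulo 20) and the function names are assumptions for illustration only,
not the store's actual algorithm; the zero-padded four-digit form is taken from
the "0017" path in the log above.

```python
NUM_DIRS = 20  # "one of 20 possible dirs", per the comment above


def journal_subdir(queue_name, num_dirs=NUM_DIRS):
    """Map a queue name to a first-level dir name such as '0017'.

    Hypothetical stand-in for the store's "simple hash": the real hash
    function is not shown in this report.
    """
    bucket = sum(queue_name.encode("utf-8")) % num_dirs
    return "%04d" % bucket


def same_dir_probability(num_dirs=NUM_DIRS):
    """Chance that a second, uniformly hashed queue name lands in the same dir."""
    return 1.0 / num_dirs
```

With 20 dirs, same_dir_probability() gives 0.05, i.e. the 5% quoted above.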

Although I have not reproduced the error, I am checking in a fix for this
oversight in the hope that it will eliminate this bug. Not checking for dir
existence beforehand, and tolerating a dir that already exists, should solve
this problem without the need for a lock.
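
The difference between the two approaches can be sketched as follows. This is
an illustration in Python (the language of the reproducer), not the journal's
C++ code; all function names here are hypothetical. The racy version mirrors
the check-then-create pattern described above, and the tolerant version creates
the dir unconditionally and treats errno 17 (EEXIST) as success.

```python
import errno
import os
import tempfile
import threading


def create_dir_racy(path):
    """Check-then-create: another thread can create 'path' between the
    existence check and the mkdir call, which then fails with EEXIST."""
    if not os.path.isdir(path):
        os.mkdir(path)


def create_dir_tolerant(path):
    """Create unconditionally; an already existing dir counts as success."""
    try:
        os.mkdir(path)
    except OSError as e:
        # Re-raise anything that is not "a dir is already there".
        if e.errno != errno.EEXIST or not os.path.isdir(path):
            raise


def create_from_threads(path, num_threads=8):
    """Have several threads create the same dir at once; return any errors."""
    errors = []
    barrier = threading.Barrier(num_threads)

    def worker():
        barrier.wait()  # release all threads together to provoke the race
        try:
            create_dir_tolerant(path)
        except OSError as e:
            errors.append(e)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors
```

Calling create_from_threads(os.path.join(tempfile.mkdtemp(), "0017")) should
return an empty list, since every thread either creates the dir or finds it
already present. No lock is needed because mkdir itself is atomic.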

I will leave this assigned for a little while longer and see if the bug can be
reproduced on the RHTS hardware which originally found this problem.

r2215 on trunk; r2216 on 1.0 branch
Comment 2 David Sommerseth 2008-07-29 14:44:34 EDT
Reproducer is now available from mrg-team SVN ... mrg-team/people/dsommers/bz456272

This reproducer works, to some extent, on hp-xw4800-01.rhts.bos.redhat.com.  Start
qpidd with: --auth no --tpl-wcache-page-size 128 --tpl-jfile-size-pgs 32
--num-jfiles 16 --jfile-size-pgs 32

In another screen, run this command:

  $ (find /usr -type f -exec cat {} \; > /tmp/filedata.dat) & python ./bz456272.py


It seems this issue arises much more often when the disk is busy.  The
failure rate is around 10% with this script on this box.
Comment 3 David Sommerseth 2008-07-30 09:45:42 EDT
With the latest qpid and storage module from SVN (qpid.0-10, mrg-1.0) this bug
seems to be fixed.
Comment 5 Frantisek Reznicek 2008-08-29 03:21:27 EDT
RHTS test developed (MRG/qpid_broker_jfail_bz456272).
Test results coming soon.
Comment 6 Frantisek Reznicek 2008-09-04 06:16:47 EDT
RHTS test (MRG/qpid_broker_jfail_bz456272) shows no more qpidd 'file already exists' failures. Bug going to VERIFIED.

The latest results show that the MRG/qpid_broker_jfail_bz456272 test is unstable: it sometimes fails when disconnecting queues from the broker. This behavior is under investigation by QA and might be reported to DEV as a new bug.
Comment 8 errata-xmlrpc 2008-10-06 15:09:09 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0640.html
