Bug 711799 - Durable queue creation: failure is not fully rolled-back
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 2.0
Hardware: x86_64
OS: Linux
Priority: low
Severity: medium
Target Milestone: 2.1.1
Target Release: ---
Assignee: Kim van der Riet
QA Contact: MRG Quality Engineering
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-06-08 14:24 UTC by Pavel Moravec
Modified: 2011-11-10 16:45 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-11-09 20:58:46 UTC
Target Upstream Version:



Description Pavel Moravec 2011-06-08 14:24:10 UTC
Description of problem:
When an error occurs while creating a durable queue (see below for details), the queue is not created, but the operation is not fully rolled back. One such case is creating a durable queue that needs more disk space than is available: the queue is not created, but some records for it remain after the failure. As a result, for example, a subsequent restart of the qpidd process fails.


Version-Release number of selected component (if applicable):
Any (tested on MRG 1.3 (qpidd 0.7) and MRG 2.0 (qpidd 0.10)).

How reproducible:
100%

Steps to Reproduce:
1. Almost fill your disk (leave only a few MB free):
# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                       5999168   5675952     13560 100% /

2. Try to add a big durable queue:
# qpid-config add queue DurableQueue --durable --file-count=100 --file-size=100
Failed: SessionException: (None, 'Queue DurableQueue: create() failed: jexception 0x0401 fcntl::clean_file() threw JERR_FCNTL_WRITE: Unable to write to file. (wr_size=2097152 errno=28 (No space left on device)) (MessageStoreImpl.cpp:533)')
#

3. Check that the queue has not been created:
# qpid-config queues | grep DurableQueue
#

4. Restart qpidd; the restart fails:
# service qpidd restart
Stopping Qpid AMQP daemon:                                 [  OK  ]
Starting Qpid AMQP daemon: rm to_delete.txtDaemon startup failed: BDB exception occurred while initializing store (MessageStoreImpl.cpp:373): DbEnv::open: No space left on device
                                                           [FAILED]
#
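
The failure in step 2 can be anticipated with a rough pre-flight check. The sketch below is not part of qpid. It assumes that --file-size is counted in 64 KiB pages, as qpid-config's help text for the legacy store describes, and it ignores any extra space BDB needs. The function names journal_kib, free_kib and has_room are made up for illustration. Under that assumption, the queue from step 2 would need about 100 * 100 * 64 KiB = 640000 KiB (roughly 625 MiB), against the 13560 KiB shown as available in step 1.

```shell
#!/bin/sh
# Assumption: --file-size is counted in 64 KiB pages (legacy store).
PAGE_KIB=64

# journal_kib FILE_COUNT FILE_SIZE_PAGES
# Prints the KiB that the preformatted journal files would occupy.
journal_kib() {
    echo $(( $1 * $2 * PAGE_KIB ))
}

# free_kib DIR
# Prints the KiB available on the filesystem that holds DIR.
# df -P keeps each filesystem on one line even for long device names.
free_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# has_room DIR NEEDED_KIB
# Exit status 0 if the filesystem holding DIR has at least NEEDED_KIB free.
has_room() {
    [ "$(free_kib "$1")" -ge "$2" ]
}

# Hypothetical usage (store path taken from this report):
#   need=$(journal_kib 100 100)
#   has_room /var/lib/qpidd "$need" &&
#       qpid-config add queue DurableQueue --durable --file-count=100 --file-size=100
```

Such a check only narrows the window: another process can still consume the space between the check and the queue creation.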

Actual results:
Step 4 fails to restart the qpidd process, even though the queue has not been created.

Expected results:
qpidd restart is successful and /var/lib/qpidd/rhm/jrnl/ is not affected by step 2.

Additional info:
qpidd also behaves oddly if disk space is freed before the restart. It can then be started, but an orphaned file /var/lib/qpidd/rhm/jrnl/000c/DurableQueue/JournalData.0000.jdat is left behind (the file is re-used when DurableQueue is created again, but still...).
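
Orphaned journal directories of this kind could be spotted with a small script. This is only a sketch. It assumes the layout seen in the path above, /var/lib/qpidd/rhm/jrnl/<bucket>/<queue name>/, and it takes the names of the queues the broker actually knows from a plain text file with one name per line. How that file is produced is left open, because the output format of "qpid-config queues" varies between versions. The function name orphaned_journals is hypothetical.

```shell
#!/bin/sh
# orphaned_journals JRNL_ROOT NAMES_FILE
# Prints every directory JRNL_ROOT/<bucket>/<queue> whose <queue> name
# does not appear as a whole line in NAMES_FILE.
orphaned_journals() {
    root=$1
    names=$2
    for dir in "$root"/*/*/; do
        [ -d "$dir" ] || continue      # the glob matched nothing
        queue=$(basename "$dir")
        # -F fixed string, -x whole-line match, -q quiet
        grep -Fxq -- "$queue" "$names" || echo "${dir%/}"
    done
}

# Hypothetical usage:
#   orphaned_journals /var/lib/qpidd/rhm/jrnl /tmp/known-queues.txt
```

The script only reports candidates. Whether a reported directory is safe to delete still needs to be decided with the broker stopped.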

Comment 2 Kim van der Riet 2011-11-09 20:33:31 UTC
I have not tried to reproduce this. However, I can make a few comments...

If the disk runs out of space, then all bets are off as far as the consistency and recoverability of the store is concerned. I don't think this is a condition we guarantee.

In this particular case, it looks as though BDB cannot open an environment because it needs disk space to do this. If some additional space were to be freed, would it start then? I can't say if this error is simply a not-enough-space error or if the database itself is corrupted. Unlike the async store, BDB files grow almost continuously, even for failed actions, so it is possible that the attempt at adding a queue used up the last of the disk space.

The message store itself should not be affected by disk space issues provided no new queues are added. Each queue has store files associated with it which are fully formatted and don't grow in size (assuming, of course, that these were created prior to the full disk condition). However, if a new queue is added, then the store will fail.

On the face of it, I am not certain this is a bug, or simply expected behaviour.
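
For illustration only, the all-or-nothing behaviour asked for in the report can be sketched for fixed-size, preformatted files like the ones described above. This is a generic shell sketch, not the store's actual code. The function name create_journal is hypothetical, and the file naming merely imitates the JournalData.0000.jdat name quoted in the report. It does not cover the BDB records, which this comment identifies as the likely cause of the restart failure.

```shell
#!/bin/sh
# create_journal DIR FILE_COUNT FILE_KIB
# Preallocates FILE_COUNT zero-filled files of FILE_KIB KiB each in a
# new directory DIR. If any write fails (for example with ENOSPC), DIR
# is removed again so that no partial files are left behind.
create_journal() {
    dir=$1
    count=$2
    kib=$3
    mkdir "$dir" || return 1
    i=0
    while [ "$i" -lt "$count" ]; do
        file=$(printf '%s/JournalData.%04x.jdat' "$dir" "$i")
        if ! dd if=/dev/zero of="$file" bs=1024 count="$kib" 2>/dev/null; then
            rm -rf "$dir"              # roll back everything created so far
            return 1
        fi
        i=$((i + 1))
    done
}

# Hypothetical usage:
#   create_journal /tmp/DurableQueue 4 64 || echo "creation failed, nothing left behind"
```

The directory acts as the unit of rollback here: either all files exist or the directory is gone.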

Comment 3 Justin Ross 2011-11-09 20:58:46 UTC
Thanks, Kim.  Marking this closed.  Pavel, please reopen if you feel this is in error.

Comment 4 Pavel Moravec 2011-11-10 16:45:33 UTC
I am fine with closing it, as 1) it is rather a misconfiguration issue (there should be enough disk space for journals), and 2) if one frees some space, the qpidd restart will be successful (though the orphaned files would remain there until a queue of the same name is created).

