Description of problem:
When an error occurs while creating a durable queue (see below for details), the queue is not created, but the transaction is not fully rolled back. One such case is creating a durable queue that demands more disk space than is available: the queue is not created, but some records for it remain after the failure. As a result, for example, a restart of the qpidd process fails.

Version-Release number of selected component (if applicable):
Any (tested on MRG 1.3 (qpidd 0.7) and 2.0 (qpidd 0.10)).

How reproducible:
100%

Steps to Reproduce:
1. Almost fill your disk (leave only a few MB free):
# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                       5999168   5675952     13560 100% /

2. Try to add a big durable queue:
# qpid-config add queue DurableQueue --durable --file-count=100 --file-size=100
Failed: SessionException: (None, 'Queue DurableQueue: create() failed: jexception 0x0401 fcntl::clean_file() threw JERR_FCNTL_WRITE: Unable to write to file. (wr_size=2097152 errno=28 (No space left on device)) (MessageStoreImpl.cpp:533)')
#

3. Check that the queue has not been created:
# qpid-config queues | grep DurableQueue
#

4. Restart qpidd; the restart fails:
# service qpidd restart
Stopping Qpid AMQP daemon: [ OK ]
Starting Qpid AMQP daemon: rm to_delete.txtDaemon startup failed: BDB exception occurred while initializing store (MessageStoreImpl.cpp:373): DbEnv::open: No space left on device [FAILED]
#

Actual results:
Step 4 fails to restart the qpidd process, even though the queue has not been created.

Expected results:
qpidd restart is successful, and /var/lib/qpidd/rhm/jrnl/ is not affected by step 2.

Additional info:
qpidd also behaves oddly if disk space is freed before restarting it. It can then be started, but an orphaned file /var/lib/qpidd/rhm/jrnl/000c/DurableQueue/JournalData.0000.jdat remains (the file is re-used when DurableQueue is created again, but still...).
I have not tried to reproduce this. However, I can make a few comments...

If the disk runs out of space, then all bets are off as far as the consistency and recoverability of the store is concerned. I don't think this is a condition we make any guarantees about.

In this particular case, it looks as though BDB cannot open an environment because it needs disk space to do this. If some additional space were to be freed, would it start then? I can't say whether this error is simply a not-enough-space error or whether the database itself is corrupted. Unlike the async store, BDB files grow almost continuously, even for failed actions, so it is possible that the attempt at adding a queue used up the last of the disk space.

The message store itself should not be affected by disk space issues provided no new queues are added. Each queue has store files associated with it which are fully formatted and don't grow in size (assuming, of course, that these were created prior to the full disk condition). However, if a new queue is added, then the store will fail.

On the face of it, I am not certain whether this is a bug or simply expected behaviour.
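Since each queue's journal files are fully formatted at creation and do not grow afterwards, the space a new durable queue needs can be estimated before running qpid-config. The following is only a rough sketch, not part of qpidd or qpid-config. It assumes that one --file-size unit is a 64 KiB page (as the qpid-config help text describes it) and it ignores any per-file overhead and the space BDB itself needs. The helper names journal_bytes and has_room are made up for illustration.

```python
import shutil

# Assumption: one --file-size unit is a 64 KiB page.
PAGE_BYTES = 64 * 1024


def journal_bytes(file_count, file_size_pages, page_bytes=PAGE_BYTES):
    """Rough size of a queue's pre-formatted journal, ignoring per-file overhead."""
    return file_count * file_size_pages * page_bytes


def has_room(path, file_count, file_size_pages, headroom_bytes=0):
    """True if the filesystem holding `path` can fit the journal plus headroom."""
    free = shutil.disk_usage(path).free
    return free >= journal_bytes(file_count, file_size_pages) + headroom_bytes


# Values from the reproduction steps: --file-count=100 --file-size=100
needed = journal_bytes(100, 100)
print(needed // 1024, "KiB needed")  # prints "640000 KiB needed"
```

Under that assumption the requested queue needs about 640000 KiB, against the 13560 KiB that df reported as available, so a check such as has_room("/var/lib/qpidd", 100, 100) would have flagged the problem before the journal was partly written.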
Thanks, Kim. Marking this closed. Pavel, please reopen if you feel this is in error.
I am fine with closing it, as 1) it is rather a misconfiguration issue (there should be enough disk space for journals), and 2) if one frees some space, the qpidd restart will be successful (though the orphaned files will remain there forever, or until a queue of the same name is created).
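For anyone who wants to find such orphaned journal files after freeing space, a small script along these lines may help. It is only a sketch. It assumes the layout <jrnl root>/<bucket>/<queue name>/ seen in the path quoted in the description, which may differ between store versions. The set of existing queue names must be supplied by the operator (for example, taken by hand from the output of qpid-config queues). The function name orphaned_journal_dirs is made up for illustration.

```python
import os


def orphaned_journal_dirs(jrnl_root, known_queues):
    """Return journal directories under jrnl_root whose name is not a known queue.

    Assumes the layout <jrnl_root>/<bucket>/<queue name>/, as in the path
    quoted in the report; other store versions may lay files out differently.
    """
    orphans = []
    for bucket in sorted(os.listdir(jrnl_root)):
        bucket_path = os.path.join(jrnl_root, bucket)
        if not os.path.isdir(bucket_path):
            continue  # skip stray files at the bucket level
        for name in sorted(os.listdir(bucket_path)):
            queue_path = os.path.join(bucket_path, name)
            if os.path.isdir(queue_path) and name not in known_queues:
                orphans.append(queue_path)
    return orphans
```

The function only reports candidates and deletes nothing. A call such as orphaned_journal_dirs("/var/lib/qpidd/rhm/jrnl", {"q1", "q2"}) would list DurableQueue's directory if that queue does not exist on the broker.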