Bug 711799

Summary:	Durable queue creation: failure is not fully rolled-back
Product:	Red Hat Enterprise MRG
Component:	qpid-cpp
Version:	2.0
Hardware:	x86_64
OS:	Linux
Status:	CLOSED CANTFIX
Severity:	medium
Priority:	low
Target Milestone:	2.1.1
Reporter:	Pavel Moravec <pmoravec>
Assignee:	Kim van der Riet <kim.vdriet>
QA Contact:	MRG Quality Engineering <mrgqe-bugs>
CC:	jross
Doc Type:	Bug Fix
Last Closed:	2011-11-09 20:58:46 UTC

Description Pavel Moravec 2011-06-08 14:24:10 UTC

Description of problem:
When an error occurs while creating a durable queue (see below for details), the queue is not created, but the transaction is not fully rolled back. One particular case is creating a durable queue that demands more disk space than is available: the queue is not created, but some records for it remain after the failure. As a result, for example, a restart of the qpidd process fails.


Version-Release number of selected component (if applicable):
Any (tested on MRG 1.3 (qpidd 0.7) and MRG 2.0 (qpidd 0.10)).

How reproducible:
100%

Steps to Reproduce:
1. Almost fill your disk (leave only a few MB free):
# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                       5999168   5675952     13560 100% /

2. Try to add a big durable queue:
# qpid-config add queue DurableQueue --durable --file-count=100 --file-size=100
Failed: SessionException: (None, 'Queue DurableQueue: create() failed: jexception 0x0401 fcntl::clean_file() threw JERR_FCNTL_WRITE: Unable to write to file. (wr_size=2097152 errno=28 (No space left on device)) (MessageStoreImpl.cpp:533)')
#

3. Check that the queue has not been created:
# qpid-config queues | grep DurableQueue
#

4. Restart qpidd; the restart fails:
# service qpidd restart
Stopping Qpid AMQP daemon:                                 [  OK  ]
Starting Qpid AMQP daemon: rm to_delete.txtDaemon startup failed: BDB exception occurred while initializing store (MessageStoreImpl.cpp:373): DbEnv::open: No space left on device
                                                           [FAILED]
#

Actual results:
Step 4 fails to restart the qpidd process, even though the queue was not created.

Expected results:
The qpidd restart is successful, and /var/lib/qpidd/rhm/jrnl/ is not affected by step 2.

Additional info:
qpidd also behaves oddly if disk space is freed before the restart: it can then be started, but an orphaned file /var/lib/qpidd/rhm/jrnl/000c/DurableQueue/JournalData.0000.jdat .. remains (the file is reused when DurableQueue is created again, but still..)
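A minimal sketch of how such orphaned journal directories could be spotted. It assumes the layout <jrnl root>/<bucket>/<queue name>/, which is inferred only from the single path quoted above, and it assumes the caller supplies the set of queue names the broker currently knows (for example, collected from the output of qpid-config queues). The function name find_orphaned_journals is hypothetical and is not part of qpid.

```python
import os


def find_orphaned_journals(jrnl_root, known_queues):
    """Return journal directories whose name is not a known queue.

    Assumed layout (inferred from one path in the report, not from
    the store's documentation): jrnl_root/<bucket>/<queue name>/
    """
    orphans = []
    for bucket in sorted(os.listdir(jrnl_root)):
        bucket_path = os.path.join(jrnl_root, bucket)
        if not os.path.isdir(bucket_path):
            continue  # skip stray files at the bucket level
        for queue_dir in sorted(os.listdir(bucket_path)):
            queue_path = os.path.join(bucket_path, queue_dir)
            # A directory named after a queue the broker does not
            # list is a candidate orphan; nothing is deleted here.
            if os.path.isdir(queue_path) and queue_dir not in known_queues:
                orphans.append(queue_path)
    return orphans
```

Calling find_orphaned_journals("/var/lib/qpidd/rhm/jrnl", names), where names is the set of existing queue names, would only report candidates. Whether removing them is safe while the broker is running is not something this report establishes.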

Comment 2 Kim van der Riet 2011-11-09 20:33:31 UTC

I have not tried to reproduce this. However, I can make a few comments...

If the disk runs out of space, then all bets are off as far as the consistency and recoverability of the store is concerned. I don't think this is a condition we guarantee.

In this particular case, it looks as though BDB cannot open an environment because it needs disk space to do this. If some additional space were to be freed, would it start then? I can't say if this error is simply a not-enough-space error or if the database itself is corrupted. Unlike the async store, BDB files grow almost continuously, even for failed actions, so it is possible that the attempt at adding a queue used up the last of the disk space.

The message store itself should not be affected by disk space issues provided no new queues are added. Each queue has store files associated with it which are fully formatted and don't grow in size (assuming, of course, that these were created prior to the full disk condition). However, if a new queue is added, then the store will fail.
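Because the journal files are fully formatted at queue creation, the space a new queue needs can be estimated up front. The sketch below is an assumption-laden pre-flight check, not part of qpid: it assumes --file-size is counted in 64 KiB pages (please verify against the qpid-config help text for your version), it ignores any per-file or BDB overhead, and the function names are hypothetical.

```python
import shutil

# Assumption: qpid-config --file-size is given in pages of 64 KiB.
PAGE_SIZE_BYTES = 64 * 1024


def journal_bytes(file_count, file_size_pages, page_size=PAGE_SIZE_BYTES):
    """Lower-bound estimate of the preallocated journal size in bytes."""
    return file_count * file_size_pages * page_size


def has_room_for_journal(path, file_count, file_size_pages, headroom_bytes=0):
    """True if the filesystem holding path has enough free space.

    headroom_bytes is extra space to keep free, for example for the
    BDB environment that failed to open in step 4.
    """
    free = shutil.disk_usage(path).free
    return free >= journal_bytes(file_count, file_size_pages) + headroom_bytes
```

Under that page-size assumption, the queue from step 2 (--file-count=100 --file-size=100) would need about 655,360,000 bytes (625 MiB), far more than the 13560 KB that df reported as available, so a check like has_room_for_journal("/var/lib/qpidd", 100, 100) would have returned False before any files were written.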

On the face of it, I am not certain whether this is a bug or simply expected behaviour.

Comment 3 Justin Ross 2011-11-09 20:58:46 UTC

Thanks, Kim.  Marking this closed.  Pavel, please reopen if you feel this is in error.

Comment 4 Pavel Moravec 2011-11-10 16:45:33 UTC

I am fine with closing it, as 1) it is rather a misconfiguration issue (there should be enough disk space for journals), and 2) if one frees some space, the qpidd restart will be successful (though the orphaned files would remain there forever, until a queue of the same name is created).