Bug 1136020 - Incomplete durable queues created : AIO error
Summary: Incomplete durable queues created : AIO error
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: Development
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: 3.3
Target Release: ---
Assignee: Kim van der Riet
QA Contact: Messaging QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-09-01 13:10 UTC by Valiantsina Hubeika
Modified: 2024-01-19 19:11 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 1159281 | 0 | high | CLOSED | Mention request to increase ulimit of nofiles for larger deployments | 2021-02-22 00:41:40 UTC
Red Hat Knowledge Base (Solution) | 1425893 | 0 | None | None | None | Never

Internal Links: 1159281

Description Valiantsina Hubeika 2014-09-01 13:10:07 UTC
Description of problem:

After AIO resources get exhausted, an attempt to create a durable queue fails but the queue is listed as created.

# qpid-config add queue queue_aio_fail --argument durable=True

Failed: Exception: Exception from Agent: {u'error_code': 7, u'error_text': 'Queue queue_aio_fail: create() failed: jexception 0x0103 pmgr::initialize() threw JERR__AIO: AIO error. (io_queue_init() failed:  errno=11 (Resource temporarily unavailable)) (/builddir/build/BUILD/qpid-0.22/cpp/src/qpid/linearstore/MessageStoreImpl.cpp:421)'}

# qpid-stat -q | grep queue_aio_fail
  queue_aio_fail                            Y                      0     0      0       0      0        0         0     0

Journal is created for this queue:

root@mrg-jca-rhel6i_1:/var/dtests/node_data/clients# ls /var/lib/qpidd/qls/jrnl/queue_aio_fail/
7d6203b8-0fdf-45b4-a435-e9987592e221.jrnl

After a broker restart, the corrupted queue no longer appears.

Version-Release number of selected component (if applicable):
qpid-cpp-0.22-47


How reproducible:
100%

Steps to Reproduce:
1. exhaust fs.aio-max-nr
2. create a durable queue
qpid-config add queue queue_aio_fail --argument durable=True
3. check queue for existence
qpid-config queues queue_aio_fail
4. check journal

Actual results:
The queue is listed as created, and its journal is created.

Expected results:
No queue is created, and no journal either.

Additional info:

Comment 1 Valiantsina Hubeika 2014-09-01 14:31:29 UTC
info:

 root@mrg-jca-rhel6i_1:/var/dtests/node_data/clients# sysctl -a | grep fs.aio
fs.aio-nr = 65505
fs.aio-max-nr = 65536

 root@mrg-jca-rhel6i_1:/var/dtests/node_data/clients# for i in `seq 1 2000`;do qpid-config add queue Q$i --argument durable=True; done
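
The sysctl values above can be turned into a quick headroom check. The following is a minimal sketch, not part of the original report: `aio_headroom` is a hypothetical helper name, and the `/proc/sys/fs/aio-nr` and `/proc/sys/fs/aio-max-nr` files are the standard Linux counterparts of the two sysctl keys.

```shell
# aio_headroom AIO_NR AIO_MAX_NR
# Hypothetical helper: prints how much of fs.aio-max-nr is still unused.
aio_headroom() {
    echo $(( $2 - $1 ))
}

# Values taken from the sysctl output in this comment.
aio_headroom 65505 65536    # prints 31

# On a live Linux host, the same check against the current counters:
if [ -r /proc/sys/fs/aio-nr ] && [ -r /proc/sys/fs/aio-max-nr ]; then
    aio_headroom "$(cat /proc/sys/fs/aio-nr)" "$(cat /proc/sys/fs/aio-max-nr)"
fi
```

A headroom of 31 is below the 33 per durable queue measured in comment 6, which would be consistent with the next queue creation failing.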

Comment 2 Frantisek Reznicek 2014-09-02 12:10:14 UTC
Further information (as I was asked to review this for Vienna urgency):

Qpidd survives this testing scenario; it just refuses to create new queues once around 1982 have been created.

Retested on the latest MRG/M 2.x (qpid-cpp-*0.18-25.el6.x86_64) with similar results, including 'failed queues' showing up in qpid-stat -q (see the bottom).


Note that this situation is IMHO similar to hitting the kernel open file limit (ulimit -n).
The key point is that qpidd keeps running.


As a result, I'm keeping 3.1 and will double-check with kpvdr/jross.



MRG/M 2.5 details
[root@localhost gamma]# service qpidd restart
Stopping Qpid AMQP daemon:                                 [  OK  ]
Starting Qpid AMQP daemon: 2014-09-02 14:04:05 [Broker] debug Forked daemon child process
                                                           [  OK  ]
[root@localhost gamma]# for i in `seq 1 2000`;do qpid-config add queue Q$i --argument durable=True; done
Failed: Exception: Exception from Agent: {u'error_code': 7, u'error_text': 'Queue Q157: create() failed: jexception 0x0103 pmgr::initialize() threw JERR__AIO: AIO error. (io_queue_init() failed:  errno=11 (Resource temporarily unavailable)) (MessageStoreImpl.cpp:539)'}

Comment 3 Frantisek Reznicek 2014-09-03 20:08:00 UTC
Further clarifications after yesterday's discussion on the 3.x call.

This defect does not track AIO resource reuse or smarter resource management in qpidd.

The core of the problem is that durable queue creation is not atomic. Atomicity should be established: if one of the requirements is not met (in this case an AIO resource, but I believe there are a couple of others), an exception should be raised (this already works) AND a rollback needs to trash all objects created so far (journal files, QMF objects, ...).
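
The create-then-roll-back behaviour requested here can be illustrated with a small shell sketch. This is not broker code and does not reflect how qpidd is implemented: `create_atomically` is a hypothetical helper, the directory stands in for the journal, and the command passed in stands in for the step that may fail (such as AIO initialization).

```shell
# create_atomically JOURNAL_DIR INIT_CMD [ARGS...]
# Hypothetical illustration of the rollback pattern only.
create_atomically() {
    journal_dir="$1"
    shift
    # Step 1: create the on-disk object (stands in for the journal).
    # Plain mkdir fails if the directory already exists, so a rollback
    # never deletes something this call did not create.
    mkdir "$journal_dir" || return 1
    # Step 2: run the step that may fail (stands in for AIO setup).
    if ! "$@"; then
        # Roll back step 1 so no partial object survives the failure.
        rm -rf "$journal_dir"
        echo "create failed, rolled back $journal_dir" >&2
        return 1
    fi
    return 0
}
```

Calling it with a failing command, for example `create_atomically /tmp/q_demo false`, returns non-zero and leaves no `/tmp/q_demo` directory behind, which is the outcome expected for the failed queue in this bug.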

Comment 5 Pavel Moravec 2015-04-27 15:12:36 UTC
Hi Kim,
could you please provide a formula for how many AIO requests one durable queue requires?

This is necessary to know when scaling qpid, e.g. in Satellite.

Thanks in advance.

Comment 6 Pavel Moravec 2015-05-04 17:34:36 UTC
(In reply to Pavel Moravec from comment #5)
> Hi Kim,
> could you please provide a formula for how many AIO requests one durable
> queue requires?
> 
> This is necessary to know when scaling qpid e.g. in Satellite.
> 
> Thanks in advance.

I checked this myself: one durable queue consumes 33 AIO requests, i.e. creating one durable queue increases fs.aio-nr by 33.
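
As a rough sizing aid, the measurement above can be turned into a small calculation. This is a sketch under stated assumptions: `max_durable_queues` is a hypothetical helper, the figure of 33 comes from this comment and was measured on one setup only, and any other AIO consumers on the host are ignored.

```shell
# max_durable_queues AIO_MAX_NR AIO_NR_IN_USE PER_QUEUE
# Hypothetical helper: how many more durable queues fit under fs.aio-max-nr,
# assuming each queue consumes PER_QUEUE units of fs.aio-nr.
max_durable_queues() {
    echo $(( ($1 - $2) / $3 ))
}

# fs.aio-max-nr from comment 1, nothing else using AIO, 33 per queue.
max_durable_queues 65536 0 33    # prints 1985
```

1985 queues at 33 each account for 65505, which matches the fs.aio-nr value shown in comment 1 and is close to the roughly 1982 queues reported in comment 2.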

