Bug 485429 - qpidd stopped by critical Broker start-up failed: Cannot lock ... Resource temporarily unavailable
Summary: qpidd stopped by critical Broker start-up failed: Cannot lock ... Resource te...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.1
Hardware: All
OS: Linux
high
high
Target Milestone: 1.3
: ---
Assignee: Andrew Stitcher
QA Contact: Frantisek Reznicek
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2009-02-13 15:30 UTC by Frantisek Reznicek
Modified: 2015-11-16 00:06 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-09-29 12:50:44 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2010:0773 0 normal SHIPPED_LIVE Moderate: Red Hat Enterprise MRG Messaging and Grid Version 1.3 2010-10-14 15:56:44 UTC

Description Frantisek Reznicek 2009-02-13 15:30:16 UTC
Description of problem:
This issue is very very rare, sometimes qpidd sometimes exits with ecode 1 sometime enters strange state. (I'm aware of weak description)

In all cases I used --data-dir <path> where path was relative or absolute.


Version-Release number of selected component (if applicable):
seen on
qpidd-0.4.738274-1.el4, rhm-0.4.3075-3.el4  [occurrence 2]
and
qpidd-0.4.743861-1.el5, rhm-0.4.3116-1.el5  [occurrence 1]

How reproducible:
<1% (very rarely)

Steps to Reproduce:
mkdir -p qpidd-data-dir.0000 ; qpidd --mgmt-enable yes --auth no --log-enable info+ --cluster-name qpid_mnode_cluster_test --data-dir qpidd-data-dir.0000 -p 0

in a loop or
qpid_test_transaction integrity
  
Actual results:
qpidd stops or enters strange state

Expected results:
qpidd should behave correctly

Additional info:

occurence 1]

mkdir -p qpidd-data-dir.0000 ; qpidd --mgmt-enable yes --auth no --log-enable info+ --cluster-name qpid_mnode_cluster_test --data-dir qpidd-data-dir.0000 -p 0

2009-feb-13 09:45:14 info Loaded Module: /usr/lib64/qpid/daemon/rdma.so
2009-feb-13 09:45:15 info Loaded Module: /usr/lib64/qpid/client/rdmaconnector.so
2009-feb-13 09:45:15 info SSL connector not enabled, you must set QPID_SSL_CERT_DB to enable it.
2009-feb-13 09:45:15 info Loaded Module: /usr/lib64/qpid/client/sslconnector.so
2009-feb-13 09:45:15 info Loaded Module: /usr/lib64/qpid/daemon/cluster.so
2009-feb-13 09:45:15 info Loaded Module: /usr/lib64/qpid/daemon/replicating_listener.so
2009-feb-13 09:45:16 info Loaded Module: /usr/lib64/qpid/daemon/msgstore.so
2009-feb-13 09:45:16 info Loaded Module: /usr/lib64/qpid/daemon/ssl.so
2009-feb-13 09:45:17 info Loaded Module: /usr/lib64/qpid/daemon/xml.so
2009-feb-13 09:45:17 info Loaded Module: /usr/lib64/qpid/daemon/replication_exchange.so
2009-feb-13 09:45:17 info Loaded Module: /usr/lib64/qpid/daemon/acl.so
2009-feb-13 09:45:17 critical Broker start-up failed: Cannot lock qpidd-data-dir.0000/lock: Resource temporarily unavailable
5f-a966-791a1a54d316
2009-feb-13 09:45:17 notice Journal "TplStore": Created
2009-feb-13 09:45:17 notice Store module initialized; dir=qpidd-data-dir.0000
2009-feb-13 09:45:17 info > Default files per journal: 8
2009-feb-13 09:45:17 info > Default journal file size: 24 (wpgs)
2009-feb-13 09:45:17 info > Default write cache page size: 32 (Kib)
2009-feb-13 09:45:17 info > Default number of write cache pages: 32
2009-feb-13 09:45:17 info > TPL files per journal: 8
2009-feb-13 09:45:17 info > TPL journal file size: 24 (wpgs)
2009-feb-13 09:45:17 info > TPL write cache page size: 4 (Kib)
2009-feb-13 09:45:17 info > TPL number of write cache pages: 64
2009-feb-13 09:45:18 notice SASL disabled: No Authentication Performed
2009-feb-13 09:45:18 notice Listening on TCP port 34523
libibverbs: Fatal: couldn't read uverbs ABI version.
2009-feb-13 09:45:18 info Rdma: Disabled: no rdma devices found
2009-feb-13 09:45:18 notice 10.16.64.106:728(INIT) joining cluster qpid_mnode_cluster_test with url=amqp:tcp:10.16.64.106:34523
2009-feb-13 09:45:18 info SSL plugin not enabled, you must set --qpid-ssl-cert-db to enable it.
2009-feb-13 09:45:18 info Registered xml exchange
2009-feb-13 09:45:18 info Registered replication exchange
2009-feb-13 09:45:18 info Policy file not specified. ACL Disabled, no ACL checking being done!
34523
2009-feb-13 09:45:18 notice Broker running
2009-feb-13 09:45:18 info 10.16.64.106:728(READY) member update: 10.16.64.106:728(member)
2009-feb-13 09:45:18 notice 10.16.64.106:728(READY) first in cluster
2009-feb-13 09:45:18 info 10.16.64.106:728(OFFER) send update-offer to 10.16.64.64:8392
2009-feb-13 09:45:18 info 10.16.64.106:728(UPDATER) stall for update to 10.16.64.64:8392 at amqp:tcp:10.16.64.64:48828
2009-feb-13 09:45:18 info Connecting to tcp:10.16.64.64:48828
2009-feb-13 09:45:18 info 10.16.64.106:728(UPDATER) sent update
2009-feb-13 09:45:18 info 10.16.64.106:728(OFFER) send update-offer to 10.16.65.59:5334
2009-feb-13 09:45:18 info 10.16.64.106:728(OFFER) member update: 10.16.65.59:5334(joiner) 10.16.64.64:8392(member) 10.16.64.106:728(member)
2009-feb-13 09:45:18 info 10.16.64.106:728(UPDATER) stall for update to 10.16.65.59:5334 at amqp:tcp:10.16.65.59:57392
2009-feb-13 09:45:18 info Connecting to tcp:10.16.65.59:57392
2009-feb-13 09:45:18 info 10.16.64.106:728(UPDATER) sent update
2009-feb-13 09:45:18 info 10.16.64.106:728(READY) member update: 10.16.65.59:5334(member) 10.16.64.64:8392(member) 10.16.64.106:728(member)
2009-feb-13 09:45:49 info 10.16.64.106:728(READY) member update: 10.16.65.59:5334(member) 10.16.64.106:728(member)
2009-feb-13 09:46:55 info 10.16.64.106:728(OFFER) send update-offer to 10.16.64.64:613
2009-feb-13 09:46:55 info 10.16.64.106:728(UPDATER) stall for update to 10.16.64.64:613 at amqp:tcp:10.16.64.64:43461
2009-feb-13 09:46:55 info Connecting to tcp:10.16.64.64:43461

occurrence 2]

data in mrg3..:/root/qpid_test_transaction_integrity_0006_fails_rhel47_090202.tar.bz2
qpidd log
qpid_test_transaction_integrity.0006/qpidd_txtest.transcript.log:8333

part here:
[14:42:45] begin (run(A) 61/75 @step7]   )
[14:42:45] Command:  /usr/sbin/qpidd -p 5672 --auth no --log-enable info+ --data-dir /root/qpid_test_transaction_integrity.0006/rhts_qpidd/20090131_1
30614/broker.0252
2009-jan-31 14:42:45 info Loaded Module: /usr/lib/qpid/daemon/xml.so
2009-jan-31 14:42:45 info Loaded Module: /usr/lib/qpid/daemon/msgstore.so
2009-jan-31 14:42:45 info Loaded Module: /usr/lib/qpid/daemon/acl.so
2009-jan-31 14:42:45 info Loaded Module: /usr/lib/qpid/daemon/ssl.so
2009-jan-31 14:42:45 info Loaded Module: /usr/lib/qpid/daemon/replicating_listener.so
2009-jan-31 14:42:45 info Loaded Module: /usr/lib/qpid/daemon/replication_exchange.so
2009-jan-31 14:42:45 critical Broker start-up failed: Cannot lock /root/qpid_test_transaction_integrity.0006/rhts_qpidd/20090131_130614/broker.0252/l
ock: Resource temporarily unavailable

Comment 1 Gordon Sim 2009-02-23 17:33:01 UTC
Did this occur on an NFS filesystem?

Comment 2 Jonathan Robie 2009-04-17 14:34:23 UTC
I managed to trigger this on mrg2, launching qpidd with & instead of -d:

/usr/sbin/qpidd -p 6672&
/usr/sbin/qpidd -p 7672&
/usr/sbin/qpidd -p 8672&

./runSample.sh org.apache.qpid.example.jmsexample.direct.Listener&
./runSample.sh org.apache.qpid.example.jmsexample.direct.Producer 1000000&

sleep 1
/usr/sbin/qpidd -p 5672 -q
sleep 1
/usr/sbin/qpidd -p 7672 -q
sleep 1
/usr/sbin/qpidd -p 6672 -q
sleep 1
/usr/sbin/qpidd -p 5672&
sleep 1
/usr/sbin/qpidd -p 8672 -q

Changing this to the following solved the problem:

/usr/sbin/qpidd -p 6672 -d
/usr/sbin/qpidd -p 7672 -d
/usr/sbin/qpidd -p 8672 -d

Reporting in case this gives any useful hints ...

Comment 3 Andrew Stitcher 2009-05-26 16:51:40 UTC
The only feasible way you can get this error is if there is already a running qpidd which has the lock file locked already.

So, given that if the taking the lock fails the broker startup immediately fails it might be that there is a slow to exit broker still running.

It may be that we're not using the best available lock file approach.

Comment 5 Frantisek Reznicek 2009-12-15 12:02:47 UTC
The issue has been not seen for couple of months. The posts above and my observation leads me to the conclusion that multiple qpidd were run on the same data-directory.

Anyone against for CLOSE/NOTABUG?

Comment 6 Andrew Stitcher 2009-12-15 15:35:33 UTC
I think there is a very good chance that this bug was fixed by checkin r824894. Before this was fixed it was much more likely that tests started a new broker in the same data dir before the previous one had actually exited.

So I'm changing to MODIFIED

Comment 8 Frantisek Reznicek 2010-09-29 12:19:20 UTC
My understanding of the scenarios from description and comment 2 is that in both cases there are run multiple brokers (in non-daemon mode) on the same data-dir directory.

I believe this is incorrect.

The long term test shows that
- single instance run in non-daemon mode on background
- single instance run in daemon mode

are able to start-up / stop (qpidd -q / kill) from existing / non-existing data-dir.

Multiple qpidd instances cannot share the same data-dir and thus I'm proposing CLOSED/NOTABUG as the issue resolution.

Could someone review my above lines, please?

Comment 9 Frantisek Reznicek 2010-09-29 12:50:44 UTC
The review with Gordon proved that my understanding is correct.

Therefore every qpidd instance should have it's dedicated data-dir storage. An attempt for running multiple qpidd instances on same data-dir rirectory causes 
 "critical Broker start-up failed: Cannot lock .../lock: Resource temporarily unavailable" which is corrent behavior.

Examination of above issue cases shows that multiple brokers were run on the same data-dir directory, which is incorrect.

Testing proved that qpidd broker starts / stops well from empty / clean qpidd data-dir.

I believe the above facts allow to close this defect as NOT-A-BUG.

-> CLOSED/NOTABUG

Comment 10 Andrew Stitcher 2010-10-11 19:27:08 UTC
No release note required as there was no real bug fixed here


Note You need to log in before you can comment on or make changes to this bug.