Bug 485429 - qpidd stopped by critical Broker start-up failed: Cannot lock ... Resource temporarily unavailable
qpidd stopped by critical Broker start-up failed: Cannot lock ... Resource te...
Status: CLOSED NOTABUG
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp (Show other bugs)
1.1
All Linux
high Severity high
: 1.3
: ---
Assigned To: Andrew Stitcher
Frantisek Reznicek
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2009-02-13 10:30 EST by Frantisek Reznicek
Modified: 2015-11-15 19:06 EST (History)
3 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-09-29 08:50:44 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Frantisek Reznicek 2009-02-13 10:30:16 EST
Description of problem:
This issue is very very rare, sometimes qpidd sometimes exits with ecode 1 sometime enters strange state. (I'm aware of weak description)

In all cases I used --data-dir <path> where path was relative or absolute.


Version-Release number of selected component (if applicable):
seen on
qpidd-0.4.738274-1.el4, rhm-0.4.3075-3.el4  [occurrence 2]
and
qpidd-0.4.743861-1.el5, rhm-0.4.3116-1.el5  [occurrence 1]

How reproducible:
<1% (very rarely)

Steps to Reproduce:
mkdir -p qpidd-data-dir.0000 ; qpidd --mgmt-enable yes --auth no --log-enable info+ --cluster-name qpid_mnode_cluster_test --data-dir qpidd-data-dir.0000 -p 0

in a loop or
qpid_test_transaction integrity
  
Actual results:
qpidd stops or enters strange state

Expected results:
qpidd should behave correctly

Additional info:

occurence 1]

mkdir -p qpidd-data-dir.0000 ; qpidd --mgmt-enable yes --auth no --log-enable info+ --cluster-name qpid_mnode_cluster_test --data-dir qpidd-data-dir.0000 -p 0

2009-feb-13 09:45:14 info Loaded Module: /usr/lib64/qpid/daemon/rdma.so
2009-feb-13 09:45:15 info Loaded Module: /usr/lib64/qpid/client/rdmaconnector.so
2009-feb-13 09:45:15 info SSL connector not enabled, you must set QPID_SSL_CERT_DB to enable it.
2009-feb-13 09:45:15 info Loaded Module: /usr/lib64/qpid/client/sslconnector.so
2009-feb-13 09:45:15 info Loaded Module: /usr/lib64/qpid/daemon/cluster.so
2009-feb-13 09:45:15 info Loaded Module: /usr/lib64/qpid/daemon/replicating_listener.so
2009-feb-13 09:45:16 info Loaded Module: /usr/lib64/qpid/daemon/msgstore.so
2009-feb-13 09:45:16 info Loaded Module: /usr/lib64/qpid/daemon/ssl.so
2009-feb-13 09:45:17 info Loaded Module: /usr/lib64/qpid/daemon/xml.so
2009-feb-13 09:45:17 info Loaded Module: /usr/lib64/qpid/daemon/replication_exchange.so
2009-feb-13 09:45:17 info Loaded Module: /usr/lib64/qpid/daemon/acl.so
2009-feb-13 09:45:17 critical Broker start-up failed: Cannot lock qpidd-data-dir.0000/lock: Resource temporarily unavailable
5f-a966-791a1a54d316
2009-feb-13 09:45:17 notice Journal "TplStore": Created
2009-feb-13 09:45:17 notice Store module initialized; dir=qpidd-data-dir.0000
2009-feb-13 09:45:17 info > Default files per journal: 8
2009-feb-13 09:45:17 info > Default journal file size: 24 (wpgs)
2009-feb-13 09:45:17 info > Default write cache page size: 32 (Kib)
2009-feb-13 09:45:17 info > Default number of write cache pages: 32
2009-feb-13 09:45:17 info > TPL files per journal: 8
2009-feb-13 09:45:17 info > TPL journal file size: 24 (wpgs)
2009-feb-13 09:45:17 info > TPL write cache page size: 4 (Kib)
2009-feb-13 09:45:17 info > TPL number of write cache pages: 64
2009-feb-13 09:45:18 notice SASL disabled: No Authentication Performed
2009-feb-13 09:45:18 notice Listening on TCP port 34523
libibverbs: Fatal: couldn't read uverbs ABI version.
2009-feb-13 09:45:18 info Rdma: Disabled: no rdma devices found
2009-feb-13 09:45:18 notice 10.16.64.106:728(INIT) joining cluster qpid_mnode_cluster_test with url=amqp:tcp:10.16.64.106:34523
2009-feb-13 09:45:18 info SSL plugin not enabled, you must set --qpid-ssl-cert-db to enable it.
2009-feb-13 09:45:18 info Registered xml exchange
2009-feb-13 09:45:18 info Registered replication exchange
2009-feb-13 09:45:18 info Policy file not specified. ACL Disabled, no ACL checking being done!
34523
2009-feb-13 09:45:18 notice Broker running
2009-feb-13 09:45:18 info 10.16.64.106:728(READY) member update: 10.16.64.106:728(member)
2009-feb-13 09:45:18 notice 10.16.64.106:728(READY) first in cluster
2009-feb-13 09:45:18 info 10.16.64.106:728(OFFER) send update-offer to 10.16.64.64:8392
2009-feb-13 09:45:18 info 10.16.64.106:728(UPDATER) stall for update to 10.16.64.64:8392 at amqp:tcp:10.16.64.64:48828
2009-feb-13 09:45:18 info Connecting to tcp:10.16.64.64:48828
2009-feb-13 09:45:18 info 10.16.64.106:728(UPDATER) sent update
2009-feb-13 09:45:18 info 10.16.64.106:728(OFFER) send update-offer to 10.16.65.59:5334
2009-feb-13 09:45:18 info 10.16.64.106:728(OFFER) member update: 10.16.65.59:5334(joiner) 10.16.64.64:8392(member) 10.16.64.106:728(member)
2009-feb-13 09:45:18 info 10.16.64.106:728(UPDATER) stall for update to 10.16.65.59:5334 at amqp:tcp:10.16.65.59:57392
2009-feb-13 09:45:18 info Connecting to tcp:10.16.65.59:57392
2009-feb-13 09:45:18 info 10.16.64.106:728(UPDATER) sent update
2009-feb-13 09:45:18 info 10.16.64.106:728(READY) member update: 10.16.65.59:5334(member) 10.16.64.64:8392(member) 10.16.64.106:728(member)
2009-feb-13 09:45:49 info 10.16.64.106:728(READY) member update: 10.16.65.59:5334(member) 10.16.64.106:728(member)
2009-feb-13 09:46:55 info 10.16.64.106:728(OFFER) send update-offer to 10.16.64.64:613
2009-feb-13 09:46:55 info 10.16.64.106:728(UPDATER) stall for update to 10.16.64.64:613 at amqp:tcp:10.16.64.64:43461
2009-feb-13 09:46:55 info Connecting to tcp:10.16.64.64:43461

occurrence 2]

data in mrg3..:/root/qpid_test_transaction_integrity_0006_fails_rhel47_090202.tar.bz2
qpidd log
qpid_test_transaction_integrity.0006/qpidd_txtest.transcript.log:8333

part here:
[14:42:45] begin (run(A) 61/75 @step7]   )
[14:42:45] Command:  /usr/sbin/qpidd -p 5672 --auth no --log-enable info+ --data-dir /root/qpid_test_transaction_integrity.0006/rhts_qpidd/20090131_1
30614/broker.0252
2009-jan-31 14:42:45 info Loaded Module: /usr/lib/qpid/daemon/xml.so
2009-jan-31 14:42:45 info Loaded Module: /usr/lib/qpid/daemon/msgstore.so
2009-jan-31 14:42:45 info Loaded Module: /usr/lib/qpid/daemon/acl.so
2009-jan-31 14:42:45 info Loaded Module: /usr/lib/qpid/daemon/ssl.so
2009-jan-31 14:42:45 info Loaded Module: /usr/lib/qpid/daemon/replicating_listener.so
2009-jan-31 14:42:45 info Loaded Module: /usr/lib/qpid/daemon/replication_exchange.so
2009-jan-31 14:42:45 critical Broker start-up failed: Cannot lock /root/qpid_test_transaction_integrity.0006/rhts_qpidd/20090131_130614/broker.0252/l
ock: Resource temporarily unavailable
Comment 1 Gordon Sim 2009-02-23 12:33:01 EST
Did this occur on an NFS filesystem?
Comment 2 Jonathan Robie 2009-04-17 10:34:23 EDT
I managed to trigger this on mrg2, launching qpidd with & instead of -d:

/usr/sbin/qpidd -p 6672&
/usr/sbin/qpidd -p 7672&
/usr/sbin/qpidd -p 8672&

./runSample.sh org.apache.qpid.example.jmsexample.direct.Listener&
./runSample.sh org.apache.qpid.example.jmsexample.direct.Producer 1000000&

sleep 1
/usr/sbin/qpidd -p 5672 -q
sleep 1
/usr/sbin/qpidd -p 7672 -q
sleep 1
/usr/sbin/qpidd -p 6672 -q
sleep 1
/usr/sbin/qpidd -p 5672&
sleep 1
/usr/sbin/qpidd -p 8672 -q

Changing this to the following solved the problem:

/usr/sbin/qpidd -p 6672 -d
/usr/sbin/qpidd -p 7672 -d
/usr/sbin/qpidd -p 8672 -d

Reporting in case this gives any useful hints ...
Comment 3 Andrew Stitcher 2009-05-26 12:51:40 EDT
The only feasible way you can get this error is if there is already a running qpidd which has the lock file locked already.

So, given that if the taking the lock fails the broker startup immediately fails it might be that there is a slow to exit broker still running.

It may be that we're not using the best available lock file approach.
Comment 5 Frantisek Reznicek 2009-12-15 07:02:47 EST
The issue has been not seen for couple of months. The posts above and my observation leads me to the conclusion that multiple qpidd were run on the same data-directory.

Anyone against for CLOSE/NOTABUG?
Comment 6 Andrew Stitcher 2009-12-15 10:35:33 EST
I think there is a very good chance that this bug was fixed by checkin r824894. Before this was fixed it was much more likely that tests started a new broker in the same data dir before the previous one had actually exited.

So I'm changing to MODIFIED
Comment 8 Frantisek Reznicek 2010-09-29 08:19:20 EDT
My understanding of the scenarios from description and comment 2 is that in both cases there are run multiple brokers (in non-daemon mode) on the same data-dir directory.

I believe this is incorrect.

The long term test shows that
- single instance run in non-daemon mode on background
- single instance run in daemon mode

are able to start-up / stop (qpidd -q / kill) from existing / non-existing data-dir.

Multiple qpidd instances cannot share the same data-dir and thus I'm proposing CLOSED/NOTABUG as the issue resolution.

Could someone review my above lines, please?
Comment 9 Frantisek Reznicek 2010-09-29 08:50:44 EDT
The review with Gordon proved that my understanding is correct.

Therefore every qpidd instance should have it's dedicated data-dir storage. An attempt for running multiple qpidd instances on same data-dir rirectory causes 
 "critical Broker start-up failed: Cannot lock .../lock: Resource temporarily unavailable" which is corrent behavior.

Examination of above issue cases shows that multiple brokers were run on the same data-dir directory, which is incorrect.

Testing proved that qpidd broker starts / stops well from empty / clean qpidd data-dir.

I believe the above facts allow to close this defect as NOT-A-BUG.

-> CLOSED/NOTABUG
Comment 10 Andrew Stitcher 2010-10-11 15:27:08 EDT
No release note required as there was no real bug fixed here

Note You need to log in before you can comment on or make changes to this bug.