Description of problem: This issue is very very rare, sometimes qpidd sometimes exits with ecode 1 sometime enters strange state. (I'm aware of weak description) In all cases I used --data-dir <path> where path was relative or absolute. Version-Release number of selected component (if applicable): seen on qpidd-0.4.738274-1.el4, rhm-0.4.3075-3.el4 [occurrence 2] and qpidd-0.4.743861-1.el5, rhm-0.4.3116-1.el5 [occurrence 1] How reproducible: <1% (very rarely) Steps to Reproduce: mkdir -p qpidd-data-dir.0000 ; qpidd --mgmt-enable yes --auth no --log-enable info+ --cluster-name qpid_mnode_cluster_test --data-dir qpidd-data-dir.0000 -p 0 in a loop or qpid_test_transaction integrity Actual results: qpidd stops or enters strange state Expected results: qpidd should behave correctly Additional info: occurence 1] mkdir -p qpidd-data-dir.0000 ; qpidd --mgmt-enable yes --auth no --log-enable info+ --cluster-name qpid_mnode_cluster_test --data-dir qpidd-data-dir.0000 -p 0 2009-feb-13 09:45:14 info Loaded Module: /usr/lib64/qpid/daemon/rdma.so 2009-feb-13 09:45:15 info Loaded Module: /usr/lib64/qpid/client/rdmaconnector.so 2009-feb-13 09:45:15 info SSL connector not enabled, you must set QPID_SSL_CERT_DB to enable it. 2009-feb-13 09:45:15 info Loaded Module: /usr/lib64/qpid/client/sslconnector.so 2009-feb-13 09:45:15 info Loaded Module: /usr/lib64/qpid/daemon/cluster.so 2009-feb-13 09:45:15 info Loaded Module: /usr/lib64/qpid/daemon/replicating_listener.so 2009-feb-13 09:45:16 info Loaded Module: /usr/lib64/qpid/daemon/msgstore.so 2009-feb-13 09:45:16 info Loaded Module: /usr/lib64/qpid/daemon/ssl.so 2009-feb-13 09:45:17 info Loaded Module: /usr/lib64/qpid/daemon/xml.so 2009-feb-13 09:45:17 info Loaded Module: /usr/lib64/qpid/daemon/replication_exchange.so 2009-feb-13 09:45:17 info Loaded Module: /usr/lib64/qpid/daemon/acl.so 2009-feb-13 09:45:17 critical Broker start-up failed: Cannot lock qpidd-data-dir.0000/lock: Resource temporarily unavailable 5f-a966-791a1a54d316 2009-feb-13 09:45:17 notice Journal "TplStore": Created 2009-feb-13 09:45:17 notice Store module initialized; dir=qpidd-data-dir.0000 2009-feb-13 09:45:17 info > Default files per journal: 8 2009-feb-13 09:45:17 info > Default journal file size: 24 (wpgs) 2009-feb-13 09:45:17 info > Default write cache page size: 32 (Kib) 2009-feb-13 09:45:17 info > Default number of write cache pages: 32 2009-feb-13 09:45:17 info > TPL files per journal: 8 2009-feb-13 09:45:17 info > TPL journal file size: 24 (wpgs) 2009-feb-13 09:45:17 info > TPL write cache page size: 4 (Kib) 2009-feb-13 09:45:17 info > TPL number of write cache pages: 64 2009-feb-13 09:45:18 notice SASL disabled: No Authentication Performed 2009-feb-13 09:45:18 notice Listening on TCP port 34523 libibverbs: Fatal: couldn't read uverbs ABI version. 2009-feb-13 09:45:18 info Rdma: Disabled: no rdma devices found 2009-feb-13 09:45:18 notice 10.16.64.106:728(INIT) joining cluster qpid_mnode_cluster_test with url=amqp:tcp:10.16.64.106:34523 2009-feb-13 09:45:18 info SSL plugin not enabled, you must set --qpid-ssl-cert-db to enable it. 2009-feb-13 09:45:18 info Registered xml exchange 2009-feb-13 09:45:18 info Registered replication exchange 2009-feb-13 09:45:18 info Policy file not specified. ACL Disabled, no ACL checking being done! 34523 2009-feb-13 09:45:18 notice Broker running 2009-feb-13 09:45:18 info 10.16.64.106:728(READY) member update: 10.16.64.106:728(member) 2009-feb-13 09:45:18 notice 10.16.64.106:728(READY) first in cluster 2009-feb-13 09:45:18 info 10.16.64.106:728(OFFER) send update-offer to 10.16.64.64:8392 2009-feb-13 09:45:18 info 10.16.64.106:728(UPDATER) stall for update to 10.16.64.64:8392 at amqp:tcp:10.16.64.64:48828 2009-feb-13 09:45:18 info Connecting to tcp:10.16.64.64:48828 2009-feb-13 09:45:18 info 10.16.64.106:728(UPDATER) sent update 2009-feb-13 09:45:18 info 10.16.64.106:728(OFFER) send update-offer to 10.16.65.59:5334 2009-feb-13 09:45:18 info 10.16.64.106:728(OFFER) member update: 10.16.65.59:5334(joiner) 10.16.64.64:8392(member) 10.16.64.106:728(member) 2009-feb-13 09:45:18 info 10.16.64.106:728(UPDATER) stall for update to 10.16.65.59:5334 at amqp:tcp:10.16.65.59:57392 2009-feb-13 09:45:18 info Connecting to tcp:10.16.65.59:57392 2009-feb-13 09:45:18 info 10.16.64.106:728(UPDATER) sent update 2009-feb-13 09:45:18 info 10.16.64.106:728(READY) member update: 10.16.65.59:5334(member) 10.16.64.64:8392(member) 10.16.64.106:728(member) 2009-feb-13 09:45:49 info 10.16.64.106:728(READY) member update: 10.16.65.59:5334(member) 10.16.64.106:728(member) 2009-feb-13 09:46:55 info 10.16.64.106:728(OFFER) send update-offer to 10.16.64.64:613 2009-feb-13 09:46:55 info 10.16.64.106:728(UPDATER) stall for update to 10.16.64.64:613 at amqp:tcp:10.16.64.64:43461 2009-feb-13 09:46:55 info Connecting to tcp:10.16.64.64:43461 occurrence 2] data in mrg3..:/root/qpid_test_transaction_integrity_0006_fails_rhel47_090202.tar.bz2 qpidd log qpid_test_transaction_integrity.0006/qpidd_txtest.transcript.log:8333 part here: [14:42:45] begin (run(A) 61/75 @step7] ) [14:42:45] Command: /usr/sbin/qpidd -p 5672 --auth no --log-enable info+ --data-dir /root/qpid_test_transaction_integrity.0006/rhts_qpidd/20090131_1 30614/broker.0252 2009-jan-31 14:42:45 info Loaded Module: /usr/lib/qpid/daemon/xml.so 2009-jan-31 14:42:45 info Loaded Module: /usr/lib/qpid/daemon/msgstore.so 2009-jan-31 14:42:45 info Loaded Module: /usr/lib/qpid/daemon/acl.so 2009-jan-31 14:42:45 info Loaded Module: /usr/lib/qpid/daemon/ssl.so 2009-jan-31 14:42:45 info Loaded Module: /usr/lib/qpid/daemon/replicating_listener.so 2009-jan-31 14:42:45 info Loaded Module: /usr/lib/qpid/daemon/replication_exchange.so 2009-jan-31 14:42:45 critical Broker start-up failed: Cannot lock /root/qpid_test_transaction_integrity.0006/rhts_qpidd/20090131_130614/broker.0252/l ock: Resource temporarily unavailable
Did this occur on an NFS filesystem?
I managed to trigger this on mrg2, launching qpidd with & instead of -d: /usr/sbin/qpidd -p 6672& /usr/sbin/qpidd -p 7672& /usr/sbin/qpidd -p 8672& ./runSample.sh org.apache.qpid.example.jmsexample.direct.Listener& ./runSample.sh org.apache.qpid.example.jmsexample.direct.Producer 1000000& sleep 1 /usr/sbin/qpidd -p 5672 -q sleep 1 /usr/sbin/qpidd -p 7672 -q sleep 1 /usr/sbin/qpidd -p 6672 -q sleep 1 /usr/sbin/qpidd -p 5672& sleep 1 /usr/sbin/qpidd -p 8672 -q Changing this to the following solved the problem: /usr/sbin/qpidd -p 6672 -d /usr/sbin/qpidd -p 7672 -d /usr/sbin/qpidd -p 8672 -d Reporting in case this gives any useful hints ...
The only feasible way you can get this error is if there is already a running qpidd which has the lock file locked already. So, given that if the taking the lock fails the broker startup immediately fails it might be that there is a slow to exit broker still running. It may be that we're not using the best available lock file approach.
The issue has been not seen for couple of months. The posts above and my observation leads me to the conclusion that multiple qpidd were run on the same data-directory. Anyone against for CLOSE/NOTABUG?
I think there is a very good chance that this bug was fixed by checkin r824894. Before this was fixed it was much more likely that tests started a new broker in the same data dir before the previous one had actually exited. So I'm changing to MODIFIED
My understanding of the scenarios from description and comment 2 is that in both cases there are run multiple brokers (in non-daemon mode) on the same data-dir directory. I believe this is incorrect. The long term test shows that - single instance run in non-daemon mode on background - single instance run in daemon mode are able to start-up / stop (qpidd -q / kill) from existing / non-existing data-dir. Multiple qpidd instances cannot share the same data-dir and thus I'm proposing CLOSED/NOTABUG as the issue resolution. Could someone review my above lines, please?
The review with Gordon proved that my understanding is correct. Therefore every qpidd instance should have it's dedicated data-dir storage. An attempt for running multiple qpidd instances on same data-dir rirectory causes "critical Broker start-up failed: Cannot lock .../lock: Resource temporarily unavailable" which is corrent behavior. Examination of above issue cases shows that multiple brokers were run on the same data-dir directory, which is incorrect. Testing proved that qpidd broker starts / stops well from empty / clean qpidd data-dir. I believe the above facts allow to close this defect as NOT-A-BUG. -> CLOSED/NOTABUG
No release note required as there was no real bug fixed here