Bug 491203 - "Timed out waiting for daemon" if recovery from journal takes a long time
Summary: "Timed out waiting for daemon" if recovery from journal takes a long time
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.1
Hardware: All
OS: Linux
Priority: high
Severity: medium
Target Milestone: 1.3
Target Release: ---
Assignee: Kim van der Riet
QA Contact: Jiri Kolar
URL:
Whiteboard:
Depends On:
Blocks: 527551
Reported: 2009-03-19 19:41 UTC by Gordon Sim
Modified: 2018-10-27 14:18 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
When running a broker as daemon (that is, using the "--daemon" command line option), the default timeout for its startup was set to 10 seconds. Because of this, having to recover a large storage may have prevented it from starting at all. With this update, the default timeout was increased to 10 minutes, so that the broker has enough time to recover even a large amount of data.
Clone Of:
Environment:
Last Closed: 2010-10-14 16:08:06 UTC
Target Upstream Version:
Embargoed:



Links
System: Red Hat Product Errata
ID: RHSA-2010:0773
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat Enterprise MRG Messaging and Grid Version 1.3
Last Updated: 2010-10-14 15:56:44 UTC

Description Gordon Sim 2009-03-19 19:41:51 UTC
If you have a large amount of data to recover from disk when restarting qpidd you can get:

Starting Qpid AMQP daemon: Timed out waiting for daemon
                                                           [FAILED]

when trying to start the qpidd service (or indeed any use of the --daemon mode). The workaround is to add e.g. QPIDD_OPTIONS="--wait 60" to /etc/sysconfig/qpidd to have the parent process wait longer for the forked child.

Comment 2 Kim van der Riet 2009-11-04 14:46:04 UTC
Looking at the code, there are not many options that are not disruptive.

1. It is not possible to get feedback as to the recovery without providing some sort of callback to the broker constructor, since the recovery is done within the broker construction. This seems messy and error-prone and would have to reach down to the store module (if it exists). So all we can do is wait for the constructor to finish, and this could under some conditions take a long time.

2. As the comment above suggests, setting an appropriate value for the timeout in the config file will fix the issue, but it does not work this way out of the box.

3. This comes down to a question of appropriate defaults. As this is the broker daemon starting, and there is no direct connection to the client, there is no reason why it should not wait longer in case of the possibility of an extended recovery. The startup can always be interrupted if needed.

This led to the solution of setting the timeout to 10 min instead of 10 sec. This should be long enough for most recoveries, but will not allow the startup to wait indefinitely if it should hang.

Changed timeout to 10 min in r. 832762
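
The trade-off described above (wait long enough for a slow recovery, but never wait forever) can be illustrated with a minimal sketch. This is not the qpidd implementation; it is a small Unix-only Python model in which a parent forks a child and waits on a pipe for a readiness message, up to a fixed timeout. The names start_daemon, slow_recovery and wait_seconds are hypothetical, and killing the child on timeout is a choice made for this sketch only.

```python
import os
import select
import signal
import time


def slow_recovery(seconds):
    """Hypothetical stand-in for journal recovery: just sleeps."""
    time.sleep(seconds)


def start_daemon(child_work, wait_seconds):
    """Fork a child and wait up to wait_seconds for it to report readiness.

    Returns True if the child reported ready in time, False otherwise.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: run the (possibly slow) startup work, then tell the parent.
        status = 1
        try:
            os.close(read_fd)
            child_work()
            os.write(write_fd, b"ready")
            status = 0
        finally:
            os._exit(status)

    # Parent: block on the pipe, but only for wait_seconds.
    os.close(write_fd)
    readable, _, _ = select.select([read_fd], [], [], wait_seconds)
    ready = bool(readable) and os.read(read_fd, 16) == b"ready"
    os.close(read_fd)
    if not ready:
        # Sketch-only choice: stop the child so the example cleans up.
        os.kill(pid, signal.SIGTERM)
    os.waitpid(pid, 0)
    return ready
```

With a short wait, a long slow_recovery makes start_daemon return False even though the child would eventually have finished, which mirrors the "Timed out waiting for daemon" failure. Raising the wait lets the same recovery succeed, while a child that hangs is still eventually reported as failed.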

Comment 3 Kim van der Riet 2009-11-04 14:49:44 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
The default timeout for starting in daemon mode (ie using --daemon) is now 10 minutes (it was 10 sec, but was timing out if there was a large store to recover on startup). This allows time for possibly long store recoveries to be completed before the broker port is opened.

Comment 4 Lana Brindley 2009-11-23 20:33:14 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1,9 @@
-The default timeout for starting in daemon mode (ie using --daemon) is now 10 minutes (it was 10 sec, but was timing out if there was a large store to recover on startup). This allows time for possibly long store recoveries to be completed before the broker port is opened.
+Messaging bug fix
+
+C: The default timeout for starting the broker in daemon mode (using --daemon) was ten seconds.
+C: If there was a large store to be recovered on startup, the broker would time out before it could begin
+F: The default value has now been changed to ten minutes, to provide the broker enough time to recover any large amounts of data.
+R:  The broker now has time for possibly long store recoveries to be completed before the broker port is opened.
+
+
+The default timeout for starting the broker in daemon mode (using --daemon) was ten seconds. If there was a large store to be recovered on startup, the broker would time out before it could begin. The default value has now been changed to ten minutes. The broker now has time for possibly long store recoveries to be completed before the broker port is opened.

Comment 7 Jiri Kolar 2010-10-05 13:18:57 UTC
fixed in qpid-cpp-server-0.7.946106-17, daemon waits 10mins.

validated on RHEL5.5 / RHEL4.8  i386 / x86_64  

packages:
# rpm -qa | grep -E '(qpid|openais|rhm)' | sort -u

openais-0.80.6-16.el5_5.7
openais-devel-0.80.6-16.el5_5.7
python-qpid-0.7.946106-14.el5
qpid-cpp-client-0.7.946106-17.el5
qpid-cpp-client-devel-0.7.946106-17.el5
qpid-cpp-client-devel-docs-0.7.946106-17.el5
qpid-cpp-client-ssl-0.7.946106-17.el5
qpid-cpp-mrg-debuginfo-0.7.946106-14.el5
qpid-cpp-server-0.7.946106-17.el5
qpid-cpp-server-cluster-0.7.946106-17.el5
qpid-cpp-server-devel-0.7.946106-17.el5
qpid-cpp-server-ssl-0.7.946106-17.el5
qpid-cpp-server-store-0.7.946106-17.el5
qpid-cpp-server-xml-0.7.946106-17.el5
qpid-java-client-0.7.946106-10.el5
qpid-java-common-0.7.946106-10.el5
qpid-tools-0.7.946106-11.el5
rhm-docs-0.7.946106-5.el5
rh-tests-distribution-MRG-Messaging-qpid_common-1.6-53


->VERIFIED

Comment 8 Kim van der Riet 2010-10-05 15:06:07 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,9 +1,7 @@
-Messaging bug fix
-
-C: The default timeout for starting the broker in daemon mode (using --daemon) was ten seconds.
-C: If there was a large store to be recovered on startup, the broker would time out before it could begin
-F: The default value has now been changed to ten minutes, to provide the broker enough time to recover any large amounts of data.
-R:  The broker now has time for possibly long store recoveries to be completed before the broker port is opened.
+Cause: The default timeout for starting the broker in daemon mode (using --daemon) was ten seconds.
+Consequence: If there was a large store to be recovered on startup, the broker would time out before it could begin
+Fix: The default value has now been changed to ten minutes, to provide the broker enough time to recover any large amounts of data.
+Result:  The broker now has time for possibly long store recoveries to be completed before the broker port is opened.
 
 
 The default timeout for starting the broker in daemon mode (using --daemon) was ten seconds. If there was a large store to be recovered on startup, the broker would time out before it could begin. The default value has now been changed to ten minutes. The broker now has time for possibly long store recoveries to be completed before the broker port is opened.

Comment 9 Jaromir Hradilek 2010-10-06 14:23:36 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,7 +1 @@
-Cause: The default timeout for starting the broker in daemon mode (using --daemon) was ten seconds.
+When running a broker as daemon (that is, using the "--daemon" command line option), the default timeout for its startup was set to 10 seconds. Because of this, having to recover a large storage may have prevented it from starting at all. With this update, the default timeout was increased to 10 minutes, so that the broker has enough time to recover even a large amount of data.
-Consequence: If there was a large store to be recovered on startup, the broker would time out before it could begin
-Fix: The default value has now been changed to ten minutes, to provide the broker enough time to recover any large amounts of data.
-Result:  The broker now has time for possibly long store recoveries to be completed before the broker port is opened.
-
-
-The default timeout for starting the broker in daemon mode (using --daemon) was ten seconds. If there was a large store to be recovered on startup, the broker would time out before it could begin. The default value has now been changed to ten minutes. The broker now has time for possibly long store recoveries to be completed before the broker port is opened.

Comment 11 errata-xmlrpc 2010-10-14 16:08:06 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0773.html

