Bug 491203 - "Timed out waiting for daemon" if recovery from journal takes a long time
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.1
Hardware: All
OS: Linux
Priority: high
Severity: medium
Target Milestone: 1.3
Target Release: ---
Assigned To: Kim van der Riet
QA Contact: Jiri Kolar
Depends On:
Blocks: 527551
Reported: 2009-03-19 15:41 EDT by Gordon Sim
Modified: 2010-10-14 12:08 EDT
CC List: 3 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
When running a broker as daemon (that is, using the "--daemon" command line option), the default timeout for its startup was set to 10 seconds. Because of this, having to recover a large storage may have prevented it from starting at all. With this update, the default timeout was increased to 10 minutes, so that the broker has enough time to recover even a large amount of data.
Last Closed: 2010-10-14 12:08:06 EDT


Attachments: None
Description Gordon Sim 2009-03-19 15:41:51 EDT
If you have a large amount of data to recover from disk when restarting qpidd you can get:

Starting Qpid AMQP daemon: Timed out waiting for daemon
                                                           [FAILED]

when trying to start the qpidd service (or indeed any use of the --daemon mode). The workaround is to add e.g. QPIDD_OPTIONS="--wait 60" to /etc/sysconfig/qpidd to have the parent process wait longer for the forked child.
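
For reference, the workaround amounts to a one-line change in the sysconfig file. The fragment below is a minimal sketch using the example value from this report; the right number of seconds depends on how long recovery actually takes on a given system.

```shell
# /etc/sysconfig/qpidd
# Extra options passed to qpidd when the service is started.
# --wait 60: let the parent process wait up to 60 seconds for the forked
# child. 60 is only the example value from this report, not a recommendation.
QPIDD_OPTIONS="--wait 60"
```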
Comment 2 Kim van der Riet 2009-11-04 09:46:04 EST
Looking at the code, there are few options that are not disruptive.

1. It is not possible to get feedback as to the recovery without providing some sort of callback to the broker constructor, since the recovery is done within the broker construction. This seems messy and error-prone and would have to reach down to the store module (if it exists). So all we can do is wait for the constructor to finish, and this could under some conditions take a long time.

2. As the comment above suggests, setting an appropriate value for the timeout in the config file will fix the issue, but it does not work this way out of the box.

3. This comes down to a question of appropriate defaults. Because this is the broker daemon starting, with no client directly connected, there is no reason why it should not wait longer to allow for an extended recovery. The startup can always be interrupted if needed.

This led to the solution of setting the timeout to 10 min instead of 10 sec. This should be long enough for most recoveries, but will not allow the startup to wait indefinitely if it should hang.

Changed timeout to 10 min in r. 832762
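
To illustrate the trade-off (wait long enough for a slow recovery, but never indefinitely), here is a rough shell sketch of a parent that waits for a background child with a bounded timeout. This is not the qpidd code: the function name wait_for_ready and the marker-file readiness signal are hypothetical stand-ins for however the real parent learns that the forked broker is up.

```shell
#!/bin/sh
# Sketch only, NOT qpidd's implementation.
# wait_for_ready TIMEOUT READY_FILE COMMAND [ARGS...]
# Starts COMMAND in the background and waits up to TIMEOUT seconds for
# READY_FILE to appear (a hypothetical readiness signal).
# Returns 0 if the file appears in time, 1 otherwise.
wait_for_ready() {
    timeout=$1
    ready_file=$2
    shift 2

    rm -f "$ready_file"
    "$@" &                        # stand-in for the forked broker process
    child=$!

    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if [ -e "$ready_file" ]; then
            return 0              # child reported readiness in time
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done

    # One last check after the final sleep before giving up.
    if [ -e "$ready_file" ]; then
        return 0
    fi

    echo "Timed out waiting for daemon" >&2
    kill "$child" 2>/dev/null || true   # do not leave the child behind
    return 1
}
```

With a 10-second limit, any child that needs longer than that to become ready is reported as failed even though it is still making progress; raising the limit to 600 seconds lets a slow recovery finish while still bounding the wait if the child hangs.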
Comment 3 Kim van der Riet 2009-11-04 09:49:44 EST
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
The default timeout for starting in daemon mode (ie using --daemon) is now 10 minutes (it was 10 sec, but was timing out if there was a large store to recover on startup). This allows time for possibly long store recoveries to be completed before the broker port is opened.
Comment 4 Lana Brindley 2009-11-23 15:33:14 EST
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1,9 @@
-The default timeout for starting in daemon mode (ie using --daemon) is now 10 minutes (it was 10 sec, but was timing out if there was a large store to recover on startup). This allows time for possibly long store recoveries to be completed before the broker port is opened.
+Messaging bug fix
+
+C: The default timeout for starting the broker in daemon mode (using --daemon) was ten seconds.
+C: If there was a large store to be recovered on startup, the broker would time out before it could begin
+F: The default value has now been changed to ten minutes, to provide the broker enough time to recover any large amounts of data.
+R:  The broker now has time for possibly long store recoveries to be completed before the broker port is opened.
+
+
+The default timeout for starting the broker in daemon mode (using --daemon) was ten seconds. If there was a large store to be recovered on startup, the broker would time out before it could begin. The default value has now been changed to ten minutes. The broker now has time for possibly long store recoveries to be completed before the broker port is opened.
Comment 7 Jiri Kolar 2010-10-05 09:18:57 EDT
Fixed in qpid-cpp-server-0.7.946106-17; the daemon now waits 10 minutes.

validated on RHEL5.5 / RHEL4.8  i386 / x86_64  

packages:
# rpm -qa | grep -E '(qpid|openais|rhm)' | sort -u

openais-0.80.6-16.el5_5.7
openais-devel-0.80.6-16.el5_5.7
python-qpid-0.7.946106-14.el5
qpid-cpp-client-0.7.946106-17.el5
qpid-cpp-client-devel-0.7.946106-17.el5
qpid-cpp-client-devel-docs-0.7.946106-17.el5
qpid-cpp-client-ssl-0.7.946106-17.el5
qpid-cpp-mrg-debuginfo-0.7.946106-14.el5
qpid-cpp-server-0.7.946106-17.el5
qpid-cpp-server-cluster-0.7.946106-17.el5
qpid-cpp-server-devel-0.7.946106-17.el5
qpid-cpp-server-ssl-0.7.946106-17.el5
qpid-cpp-server-store-0.7.946106-17.el5
qpid-cpp-server-xml-0.7.946106-17.el5
qpid-java-client-0.7.946106-10.el5
qpid-java-common-0.7.946106-10.el5
qpid-tools-0.7.946106-11.el5
rhm-docs-0.7.946106-5.el5
rh-tests-distribution-MRG-Messaging-qpid_common-1.6-53


->VERIFIED
Comment 8 Kim van der Riet 2010-10-05 11:06:07 EDT
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,9 +1,7 @@
-Messaging bug fix
-
-C: The default timeout for starting the broker in daemon mode (using --daemon) was ten seconds.
-C: If there was a large store to be recovered on startup, the broker would time out before it could begin
-F: The default value has now been changed to ten minutes, to provide the broker enough time to recover any large amounts of data.
-R:  The broker now has time for possibly long store recoveries to be completed before the broker port is opened.
+Cause: The default timeout for starting the broker in daemon mode (using --daemon) was ten seconds.
+Consequence: If there was a large store to be recovered on startup, the broker would time out before it could begin
+Fix: The default value has now been changed to ten minutes, to provide the broker enough time to recover any large amounts of data.
+Result:  The broker now has time for possibly long store recoveries to be completed before the broker port is opened.
 
 
 The default timeout for starting the broker in daemon mode (using --daemon) was ten seconds. If there was a large store to be recovered on startup, the broker would time out before it could begin. The default value has now been changed to ten minutes. The broker now has time for possibly long store recoveries to be completed before the broker port is opened.
Comment 9 Jaromir Hradilek 2010-10-06 10:23:36 EDT
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,7 +1 @@
-Cause: The default timeout for starting the broker in daemon mode (using --daemon) was ten seconds.
+When running a broker as daemon (that is, using the "--daemon" command line option), the default timeout for its startup was set to 10 seconds. Because of this, having to recover a large storage may have prevented it from starting at all. With this update, the default timeout was increased to 10 minutes, so that the broker has enough time to recover even a large amount of data.
-Consequence: If there was a large store to be recovered on startup, the broker would time out before it could begin
-Fix: The default value has now been changed to ten minutes, to provide the broker enough time to recover any large amounts of data.
-Result:  The broker now has time for possibly long store recoveries to be completed before the broker port is opened.
-
-
-The default timeout for starting the broker in daemon mode (using --daemon) was ten seconds. If there was a large store to be recovered on startup, the broker would time out before it could begin. The default value has now been changed to ten minutes. The broker now has time for possibly long store recoveries to be completed before the broker port is opened.
Comment 11 errata-xmlrpc 2010-10-14 12:08:06 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0773.html
