When the services are set up with 'chkconfig <service> on' to start on system boot with the default ordering values, cumin starts right after PostgreSQL and does not notice that the database is still initializing.

Version-Release number of selected component:
cumin-0.1.4560-1.el5 and many earlier versions

How reproducible:
100% on system boot

Steps to Reproduce:
1. Install and set up cumin along with PostgreSQL.
2. Reboot the machine.

Actual results:
Cumin was not started during boot (see the boot log below).
------------------------------------------------
...
Starting ntpd: [ OK ]
Starting postgresql service: [ OK ]
Cumin's database is not yet installed
Run 'cumin-database install' as root
Starting Sesame daemon: [ OK ]
...
------------------------------------------------

Expected results:
Cumin started correctly during boot.

Additional info:
Either the database check should be changed to handle a PostgreSQL server that has not finished starting, or cumin's default position in the start order should be moved later.
I am having trouble reproducing this, unless I skip the "cumin-dabase install" step after installing the packages. After the machine boots and you discover that cumin is not running, what do you have to do to make it run?

Please change the following line and try again. Based on the text above, this is the call that is generating the error, so its output may be helpful. At line 23 of /etc/init.d/cumin, remove the "&> /dev/null" so that it reads:

cumin-database check || {
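For anyone following along, the edit requested above can also be made non-interactively. This is a minimal sketch, assuming a sed that supports -i.bak (GNU sed does) and that the redirection sits on the line containing "cumin-database check"; the function name unsilence_check is hypothetical, and it matches on the command text rather than on line 23 so it still works if the line moves. A .bak copy of the original script is left next to it.

```sh
#!/bin/sh
# Hypothetical helper (not part of cumin): remove the "&> /dev/null"
# redirection from the "cumin-database check" line of an init script so
# the check's detailed output shows up on the console during boot.
# $1 is the script to edit, /etc/init.d/cumin in the scenario above.
# -i.bak edits in place and keeps the original as "$1.bak".
unsilence_check() {
    # In a sed regex "&" and ">" are literal; "|" is used as the
    # s-command delimiter so the slashes in /dev/null need no escaping.
    sed -i.bak '/cumin-database check/ s|[[:space:]]*&>[[:space:]]*/dev/null||' "$1"
}
```

For example, run unsilence_check /etc/init.d/cumin as root, reboot, and restore the script afterwards from /etc/init.d/cumin.bak.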
(In reply to comment #2)
> I am having trouble reproducing this, unless I skip the "cumin-dabase install"
> step after installing the packages.

That would be "cumin-database install", actually :)
I am sorry, the reproducibility is not 100%. A while ago I rebooted five times, and the problem occurred in two of them (cumin did not start).
Trevor, would it be possible to simply change the default chkconfig start order for cumin so that it starts sometime after qpidd? I am sure that would clear out any PostgreSQL timing issues we are coping with ATM. BTW, the recipe (the export-users part) has nothing to do with it; I can reproduce the problem without it as well.
(In reply to comment #6)
> Trevor, would it be possible to simply change the default chkconfig
> start order for cumin so that it starts sometime after qpidd? I am sure
> that would clear out any PostgreSQL timing issues we are coping with ATM.

Jan,

This is certainly possible, but I would like to avoid it and find out what the root cause is. Moving cumin's start further away in time from PostgreSQL masks the underlying issue, and it could bite us again later. However, if you can verify that this actually works, we can use it as a fallback until we find the root cause. Could you run some tests with cumin's start pushed toward the end of the init sequence and verify that the issue goes away?
Yes, I will do that today and report back with results.
It seems everything behaves normally when cumin starts last (start priority 99). A simple toggle script:
-------------------------------------------------------------------
# Toggle cumin's start priority in its chkconfig header between 80 and 99
if grep "chkconfig:" /etc/init.d/cumin | grep -q 80
then
    sed -i '/chkconfig:/ c# chkconfig: 2345 99 30' /etc/init.d/cumin
else
    sed -i '/chkconfig:/ c# chkconfig: 2345 80 30' /etc/init.d/cumin
fi
# Re-create the rc*.d links and show the resulting runlevel 3 link
chkconfig cumin off
chkconfig cumin on
ls /etc/rc3.d/*cumin
-------------------------------------------------------------------
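To confirm what the toggle actually changed, the start priorities can be read back from the S<NN><service> links in a runlevel directory and compared. This is a hypothetical sketch (the helpers start_priority and starts_after are not part of cumin or chkconfig); it assumes the usual SysV layout in which init runs the S links of /etc/rcN.d in lexical order, so a higher two-digit NN starts later.

```sh
#!/bin/sh
# Hypothetical helpers to check boot order from SysV rc directories.
# Assumes links named S<NN><service> with a two-digit priority NN.

# start_priority DIR SERVICE: print NN from DIR/S<NN>SERVICE;
# return 1 if no such link exists.
start_priority() {
    for link in "$1"/S[0-9][0-9]"$2"; do
        # An unmatched glob stays literal, so check the link really exists.
        [ -e "$link" ] || [ -L "$link" ] || return 1
        name=${link##*/}      # e.g. S99cumin
        name=${name#S}        # e.g. 99cumin
        printf '%s\n' "${name%"$2"}"
        return 0
    done
    return 1
}

# starts_after DIR LATER EARLIER: succeed if LATER has a strictly higher
# start priority than EARLIER; return 2 if either link is missing.
starts_after() {
    later=$(start_priority "$1" "$2") || return 2
    earlier=$(start_priority "$1" "$3") || return 2
    [ "$later" -gt "$earlier" ]
}
```

For example, starts_after /etc/rc3.d cumin postgresql && echo "cumin starts after postgresql". Note that starting after postgresql is necessary but, as this bug shows, not sufficient: the postgresql init script can report OK before the server accepts connections.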
This is what happens with cumin-0.1.4669-1.el5 on boot:
-----------------------------------------------------------------------
ntpd: Synchronizing with time server: [ OK ]
Starting ntpd: [ OK ]
Starting postgresql service: [ OK ]
Cumin's database is not yet installed
Run 'cumin-database install' as root
(detailed output from cumin-database check:)
Checking environment ........ OK
Checking initialization ..... OK
Checking configuration ...... OK
Checking server ............. OK
Checking database 'cumin' ... Error: The database is not created
Hint: Run 'cumin-database create'
-----------------------------------------------------------------------
When I start it a minute later without making any changes, everything works.
Possible fix in 4672. I added a check-created-wait function that retries once per second for up to 30 seconds if the output from psql indicates that a connection to the server could not be established. This function is called from 'cumin-database check' after the existing check that the postgres server is running. Based on Jan's comments above, on some systems there must be a window after the server starts during which psql commands cannot yet establish a connection. Because a psql command is used to determine whether the database has been created, a connection failure in that window is misreported as a missing database. Retrying for up to 30 seconds should give postgres enough time to spin up; if it still fails, we can make the timeout longer.
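The retry behavior described above can be sketched in POSIX shell as follows. This is a hypothetical illustration, not the code committed in 4672: the function name check_created_wait_sketch and its arguments are made up, and the matched text "could not connect to server" is an assumption about how psql words a connection failure (newer PostgreSQL releases phrase it differently).

```sh
#!/bin/sh
# Hypothetical sketch of "retry while the server is not yet accepting
# connections". Not cumin's actual implementation.
#
# check_created_wait_sketch MAX_TRIES CMD [ARGS...]
#   Runs CMD. If its combined stdout/stderr contains
#   "could not connect to server" (assumed psql wording), waits one
#   second and tries again, for at most MAX_TRIES attempts in total.
#   Any other result (success, or a different error such as a missing
#   database) ends the loop at once. Prints the last output and returns
#   CMD's last exit status.
check_created_wait_sketch() {
    max_tries=$1
    shift
    tries=0
    while :; do
        tries=$((tries + 1))
        # Capture output and status without tripping "set -e".
        if out=$("$@" 2>&1); then
            status=0
        else
            status=$?
        fi
        case $out in
            *"could not connect to server"*)
                # Server process is up but not accepting connections yet.
                [ "$tries" -ge "$max_tries" ] && break
                sleep 1
                ;;
            *)
                break
                ;;
        esac
    done
    printf '%s\n' "$out"
    return "$status"
}
```

For example, check_created_wait_sketch 30 psql -d cumin -c 'select 1' would wait roughly up to 30 seconds for the server to accept connections (the psql arguments are illustrative only). Only connection failures are retried, so a database that really does not exist is still reported immediately.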
Verified in cumin-0.1.4672-1.el5.

I think the 30-second limit will never be reached, as cumin seems to need less than a second after postgresql has started to establish a connection. I mentioned the "minute" in Comment #14 only as an example of "later". Anyway, the 30-second limit can stay; it shouldn't hurt anyone.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause
During bootup on some systems, the database server used by cumin may not be fully functional by the time the cumin service starts and checks the state of the database.

Consequence
This can result in /sbin/service cumin start reporting that the database has not been created when in fact it has. In this case, starting cumin after a short delay (a few seconds) works.

Fix
If cumin detects that the database server process is running but it cannot make a connection to the server, it will retry the connection for up to 30 seconds before it reports an error. The full 30 seconds should not be needed; a successful connection should be made within a few seconds. Normally the connection will be made on the first attempt and there will be no additional delay at all.

Result
Cumin now correctly detects and handles the case where the database server is not fully functional when the cumin service is started.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2011-0889.html