Bug 663827
Summary: | postgres-8 resource agent does not detect a failed start of postgres server
---|---
Product: | Red Hat Enterprise Linux 5
Component: | rgmanager
Version: | 5.5
Status: | CLOSED ERRATA
Severity: | high
Priority: | low
Reporter: | Magnus Glantz <mglantz>
Assignee: | Marek Grac <mgrac>
QA Contact: | Cluster QE <mspqa-list>
CC: | cluster-maint, djansa, edamato
Target Milestone: | rc
Hardware: | Unspecified
OS: | Unspecified
Fixed In Version: | rgmanager-2.0.52-18.el5
Doc Type: | Bug Fix
: | 694816
Last Closed: | 2011-07-21 10:48:12 UTC
Bug Blocks: | 694816, 917781
Description
Magnus Glantz
2010-12-17 00:09:24 UTC
Created attachment 469236: full version of patched postgres-8.sh script
Created attachment 469237: renamed metadata file for full version script
A note for anyone evaluating this: the patch adds 1 second of sleep to allow pg_ctl to detect a started server. It needs to be evaluated whether this is enough in most scenarios, or whether it would be better to use pg_ctl to start the server as well. Also, pg_ctl currently needs to fetch "-D /path/to/pgsql/data" from $OCF_RESKEY_postmaster_options to work.

Changed severity to high, as this bug renders postgres-8.sh dangerous to use: an end user may not notice the issue unless a postgres config/binary corruption scenario is tested.

Example cluster.conf extract. postmaster_options="-D /path/to/pgsql/data" needs to be defined for the patch to work.

    <resources>
        <postgres-9 config_file="/etc/cluster/postgresql.conf" name="database" postmaster_options="-D /var/lib/pgsql/9.0/data" postmaster_user="postgres" shutdown_wait="5"/>
    </resources>
    <service autostart="1" name="testservice">
        <postgres-9 ref="database"/>
    </service>

@Magnus: Thanks for the generous amount of information and the proposed patch. If there is a problem with the configuration, we are not able to detect it at start time, but we will find the problem at the first status check.

After reviewing the patch I found three possible problems:

1) stop_postgres(): I don't like the code duplication. If using stop_generic_sigkill (kill -TERM, wait, kill -QUIT) is not a good solution, then I would prefer to change the existing function so that it does not run 'kill -TERM' when stop_timeout = 0.

2) The part with ccs_fd=$(ccs_connect) ... get_service_ip_keys "$ccs_fd" $OCF_RESKEY_service_name is redundant, because if ccs is not working properly we are not able to obtain the IP address. The correct service configuration should be:

    <service autostart="1" name="testservice">
        <postgres-9 ref="database"/>
        <ip addr="1.1.1.1" monitor_link="yes" />
    </service>

so that postgres can bind to the proper IP address(es).

3) Changing postmaster to pg_ctl is the most important change. No real objections, as it makes sense to change it. A timeout will have to be configurable (1 second looks like a good default value). pg_ctl is ignoring the generated configuration file (that's why it is working for you even without an IP address). Is this intentional?

I can make the modifications for 1) and 3) after you agree on them. Mainly 1), as I'm not a postgres expert.

Thanks Marek,

1) So, if we have "stop_timeout = 0" then "kill -QUIT" will be issued? If so, that's fine.

3) A 1 second timeout is a good default value. I'm not sure I understand "pg_ctl is ignoring the generated configuration file". Can you please elaborate on this?

@Magnus: 3) All resource agents should be able to run several instances of an application on the same server (e.g. 2x apache + 3x postgres on 1 node). As applications usually do not accept configuration options (configuration file, locking file, ...) directly, we have to 'patch' the existing configuration and create a new one, which is then used by the application.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1000.html
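For readers following the discussion, the two behaviours debated above (checking the start with pg_ctl after a short wait, and skipping 'kill -TERM' when stop_timeout = 0) could be sketched roughly as follows. This is a minimal illustration, not the code from the attached patch or from the shipped rgmanager package: the function names get_pgdata_from_options, verify_postgres_started and stop_postgres_pid are hypothetical, and the PG_CTL and KILL_CMD variables are assumed hooks added only so the commands can be swapped out for testing. The only external call relied on is `pg_ctl status -D <datadir>`, which exits non-zero when no server is running in that data directory.

```shell
#!/bin/bash
# Sketch only: helper names and the PG_CTL/KILL_CMD hooks are hypothetical,
# not taken from the shipped postgres-8.sh.

# Print the data directory given after -D in an options string such as
# $OCF_RESKEY_postmaster_options. Accepts "-D /path" and "-D/path".
# Assumes simple whitespace-separated options (no quoted paths with spaces).
get_pgdata_from_options() {
    local opts="$1" prev="" word
    for word in $opts; do
        if [ "$prev" = "-D" ]; then
            printf '%s\n' "$word"
            return 0
        fi
        case "$word" in
            -D?*) printf '%s\n' "${word#-D}"; return 0 ;;
        esac
        prev="$word"
    done
    return 1
}

# After the server has been launched, wait (default 1 second) and ask
# pg_ctl whether a server is running in the data directory.
# Returns 0 if running, non-zero otherwise.
verify_postgres_started() {
    local pgdata="$1" wait_secs="${2:-1}"
    sleep "$wait_secs"
    "${PG_CTL:-pg_ctl}" status -D "$pgdata" >/dev/null 2>&1
}

# Stop a process: with stop_timeout > 0 send TERM and poll once per second;
# with stop_timeout = 0 skip TERM entirely. If the process is still alive
# afterwards, send QUIT.
stop_postgres_pid() {
    local pid="$1" stop_timeout="${2:-0}" waited=0
    if [ "$stop_timeout" -gt 0 ]; then
        "${KILL_CMD:-kill}" -TERM "$pid" 2>/dev/null
        while [ "$waited" -lt "$stop_timeout" ]; do
            "${KILL_CMD:-kill}" -0 "$pid" 2>/dev/null || return 0
            sleep 1
            waited=$((waited + 1))
        done
    fi
    "${KILL_CMD:-kill}" -0 "$pid" 2>/dev/null || return 0
    "${KILL_CMD:-kill}" -QUIT "$pid" 2>/dev/null
}
```

In a start action, one might call `pgdata=$(get_pgdata_from_options "$OCF_RESKEY_postmaster_options")` and then fail the start if `verify_postgres_started "$pgdata" 1` returns non-zero, so that a broken configuration or binary is reported at start time rather than at the first status check. Whether 1 second is long enough would still need to be evaluated per deployment, as noted in the report.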