Bug 817550

Summary: Change oracledb.sh script so that it properly checks the status of an Oracle database
Product: Red Hat Enterprise Linux 6
Reporter: cphillip
Component: resource-agents
Assignee: Ryan McCabe <rmccabe>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: medium
Version: 6.4
CC: agk, bugzilla, cfeist, cluster-maint, djansa, lhh, mjuricek, sbradley
Target Milestone: rc
Target Release: 6.4
Hardware: Unspecified
OS: Unspecified
Fixed In Version: resource-agents-3.9.2-14.el6
Doc Type: Bug Fix
Last Closed: 2013-02-21 07:52:06 UTC
Type: Bug
Bug Blocks: 782183, 840699    

Description cphillip 2012-04-30 13:27:25 UTC
Description of problem:

When called with a status argument, the oracledb.sh script should only check the status of Oracle and report it back to rgmanager. It should not restart the database as part of the status function.

The following function gets called by status_oracle when the oracledb.sh script is called with a status argument.

As you can see, instead of just reporting the status as down, it restarts the database without any notification to rgmanager:

get_db_status()
{
        declare -i subsys_lock=$1
        declare -i i=0
        declare -i rv=0
        declare ora_procname

        for procname in $DB_PROCNAMES ; do

                ora_procname="ora_${procname}_${ORACLE_SID}"

                status $ora_procname
                if [ $? -eq 0 ] ; then
                        # This one's okay; go to the next one.
                        continue
                fi

                #
                # We're not supposed to be running, and we are,
                # in fact, not running...
                # XXX only works when monitoring one db process; consider
                # extending in future.
                #
                if [ $subsys_lock -ne 0 ]; then
                        return 3
                fi

                for (( i=$RESTART_RETRIES ; i; i-- )) ; do
                        # this db process is down - stop and
                        # (re)start all ora_XXXX_$ORACLE_SID processes
                        initlog -q -n $SCRIPT -s "Restarting Oracle Database..."
                        stop_db immediate
                        if [ $? != 0 ] ; then
                                # stop failed - return 1
                                return 1
                        fi

                        start_db
                        if [ $? == 0 ] ; then
                                # ora_XXXX_$ORACLE_SID processes started
                                # successfully, so break out of the
                                # stop/start # 'for' loop
                                break
                        fi
                done

                if [ $i -eq 0 ]; then
                        # stop/start's failed - return 1 (failure)
                        return 1
                fi
        done
        return 0
}

This behaviour of the oracledb.sh script when called with a status argument is the bug that needs to be fixed. 
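For contrast, here is a rough sketch of what a status-only check could look like. It is not the code shipped in the errata. The name get_db_status_only, the RUNNING variable, the stubbed status helper, and the DB_PROCNAMES and ORACLE_SID values are all assumptions made so the example runs on its own. It keeps the same return codes as the function above (0 running, 3 not running while the lock says we should not be, 1 failed) but never calls stop_db or start_db.

```shell
#!/bin/bash
# Sketch only: a status check that reports and never restarts.
# DB_PROCNAMES and ORACLE_SID are placeholder values for this demo.
DB_PROCNAMES="pmon"
ORACLE_SID="orcl"

# Stub standing in for the 'status' helper the real script calls.
# RUNNING is a hypothetical space-separated list of live process names.
RUNNING=""
status() { [[ " $RUNNING " == *" $1 "* ]]; }

get_db_status_only()
{
        declare -i subsys_lock=$1
        declare procname ora_procname

        for procname in $DB_PROCNAMES ; do
                ora_procname="ora_${procname}_${ORACLE_SID}"

                if status "$ora_procname" ; then
                        # This one's okay; go to the next one.
                        continue
                fi

                # Same check as the original: not supposed to be
                # running, and not running.
                if [ $subsys_lock -ne 0 ]; then
                        return 3
                fi

                # Process is down: report failure, do not restart.
                return 1
        done
        return 0
}
```

With this shape, any restart decision is left to the caller (rgmanager in the real setup), which is the behaviour asked for in this report.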

As the code is written, Oracle could be failing and get restarted by the status checks repeatedly (an unlimited number of times) with no notification to rgmanager. This means that it is not possible to accurately control the behaviour of the oracle resource or the service based on the status of the database, and this affects options like coalesce.

This issue is solely with the oracledb.sh script. If a script resource is used, this will not occur, because the code that is run will not include the issue. It seems like overkill to rewrite the code to start, stop and monitor Oracle for what is a fairly simple issue with the existing resource script.

The following change to the oracledb.sh resource script should fix this:

declare -i      RESTART_RETRIES=3
to 
declare -i      RESTART_RETRIES=0
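
To show why this one-line change is enough, here is a small sketch that isolates the retry section of get_db_status() with stubbed helpers. The function name retry_section, the stubs, and the restarts_attempted counter are assumptions for illustration and do not exist in oracledb.sh. With RESTART_RETRIES=0 the loop condition is false on entry, so the body never runs, i stays 0, and the existing "if [ $i -eq 0 ]" check returns 1.

```shell
#!/bin/bash
# Sketch only: stubs stand in for the real helpers in oracledb.sh.
declare -i RESTART_RETRIES=0
declare -i restarts_attempted=0   # hypothetical counter, for illustration

stop_db()  { return 0; }                          # stub: stop always works
start_db() { restarts_attempted+=1; return 0; }   # stub: counts each start

# Mirrors the retry section of get_db_status() for one failed process.
retry_section()
{
        declare -i i=0

        for (( i=$RESTART_RETRIES ; i; i-- )) ; do
                stop_db immediate || return 1
                start_db && break
        done

        if [ $i -eq 0 ]; then
                # no retries allowed, or all retries failed
                return 1
        fi
        return 0
}

rc=0
retry_section || rc=$?
echo "rc=$rc restarts=$restarts_attempted"   # prints "rc=1 restarts=0"
```

Setting RESTART_RETRIES=3 in the same sketch makes the stubbed start_db run once and retry_section return 0, which is the silent restart described above.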

Comment 7 errata-xmlrpc 2013-02-21 07:52:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0288.html