Bug 960162

Summary: orainstance.sh
Product: Red Hat Enterprise Linux 5
Component: rgmanager
Version: 5.9
Hardware: All
OS: Linux
Status: CLOSED DUPLICATE
Severity: medium
Priority: unspecified
Target Milestone: rc
Target Release: ---
Reporter: Jeremy <jerlyon>
Assignee: Ryan McCabe <rmccabe>
QA Contact: Cluster QE <mspqa-list>
CC: cluster-maint, dvossel
Doc Type: Bug Fix
Type: Bug
Last Closed: 2013-05-20 18:42:31 UTC

Description Jeremy 2013-05-06 15:21:36 UTC
Description of problem:
orainstance.sh does not detect every DB start failure. A failed DB instance can therefore stay down while the cluster continues to report the service as started and get_db_status() repeatedly tries to restart the DB. The steps used here are an edge case, but any undetected failure in a cluster service created for fail-over is an issue.

Version-Release number of selected component (if applicable):
rgmanager-2.0.52-37.el5_9.1

How reproducible:
Always.

Steps to Reproduce:
1. Set up DB instance in cluster
2. Start DB instance successfully
3. Break DB by moving pfile.
4. Kill Oracle processes
5. Allow cluster to detect failed DB
  
Actual results:
The resource agent tries to restart the DB, does not detect that the start failed, and repeats the restart attempt every minute.

Expected results:
DB start failure detected and proper service recovery policy followed.

Additional info:
Logs from failure.

Apr 26 17:09:01 ilsvm0072 orainstance.sh: Restarting Oracle Database...
Apr 26 17:09:01 ilsvm0072 cat:
Apr 26 17:09:01 ilsvm0072 cat: SQL*Plus: Release 11.2.0.3.0 Production on Fri Apr 26 17:09:01 2013
Apr 26 17:09:01 ilsvm0072 cat:
Apr 26 17:09:01 ilsvm0072 cat: Copyright (c) 1982, 2011, Oracle.  All rights reserved.
Apr 26 17:09:01 ilsvm0072 cat:
Apr 26 17:09:01 ilsvm0072 cat: Connected to an idle instance.
Apr 26 17:09:01 ilsvm0072 cat:
Apr 26 17:09:01 ilsvm0072 cat: SQL> ORA-01078: failure in processing system parameters
Apr 26 17:09:01 ilsvm0072 cat: LRM-00109: could not open parameter file '/db/oratest/oracle/product/11203_64/dbs/initorarhel5.ora'
Apr 26 17:09:01 ilsvm0072 cat: SQL> Disconnected
Apr 26 17:10:01 ilsvm0072 orainstance.sh: Restarting Oracle Database...
Apr 26 17:10:01 ilsvm0072 cat:
Apr 26 17:10:01 ilsvm0072 cat: SQL*Plus: Release 11.2.0.3.0 Production on Fri Apr 26 17:10:01 2013
Apr 26 17:10:01 ilsvm0072 cat:
Apr 26 17:10:01 ilsvm0072 cat: Copyright (c) 1982, 2011, Oracle.  All rights reserved.
Apr 26 17:10:01 ilsvm0072 cat:
Apr 26 17:10:01 ilsvm0072 cat: Connected to an idle instance.
Apr 26 17:10:01 ilsvm0072 cat:
Apr 26 17:10:01 ilsvm0072 cat: SQL> ORA-01078: failure in processing system parameters
Apr 26 17:10:01 ilsvm0072 cat: LRM-00109: could not open parameter file '/db/oratest/oracle/product/11203_64/dbs/initorarhel5.ora'
Apr 26 17:10:01 ilsvm0072 cat: SQL> Disconnected
Apr 26 17:11:01 ilsvm0072 orainstance.sh: Restarting Oracle Database...

and so on...

The issue seems to be this:

# egrep -n "grep.*\^ORA" /usr/share/cluster/orainstance.sh
108:    grep -q "^ORA-" $logfile
158:    grep -q "^ORA-" $logfile

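The error line in $logfile starts with "SQL> ORA-", not "ORA-", so the anchored pattern "^ORA-" never matches and the start failure goes undetected.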
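
The mismatch can be reproduced outside the cluster with a small sketch. The sample file below is copied from the log excerpt above with the syslog prefix stripped, which is an assumption about what $logfile actually contains. The relaxed pattern is only one possible adjustment for illustration, not the fix shipped in rgmanager.

```shell
#!/bin/sh
# Sample sqlplus output, assumed to mirror the contents of $logfile.
logfile=$(mktemp)
cat > "$logfile" <<'EOF'
Connected to an idle instance.
SQL> ORA-01078: failure in processing system parameters
LRM-00109: could not open parameter file '/db/oratest/oracle/product/11203_64/dbs/initorarhel5.ora'
SQL> Disconnected
EOF

# Pattern used at lines 108 and 158 of orainstance.sh: the "SQL> " prompt
# in front of the error code prevents the anchored match.
if grep -q "^ORA-" "$logfile"; then anchored=match; else anchored=nomatch; fi

# Hypothetical alternative: allow an optional "SQL> " prompt before "ORA-".
if grep -Eq "^(SQL> )?ORA-" "$logfile"; then relaxed=match; else relaxed=nomatch; fi

echo "anchored: $anchored"
echo "relaxed: $relaxed"
rm -f "$logfile"
```

With this sample the anchored pattern reports no match while the relaxed one matches, which is consistent with the restart loop seen in the logs.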