Bug 960162 - orainstance.sh
Summary: orainstance.sh
Keywords:
Status: CLOSED DUPLICATE of bug 670024
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: rgmanager
Version: 5.9
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Ryan McCabe
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-05-06 15:21 UTC by Jeremy
Modified: 2013-05-20 18:42 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-05-20 18:42:31 UTC
Target Upstream Version:
Embargoed:



Description Jeremy 2013-05-06 15:21:36 UTC
Description of problem:
orainstance.sh does not detect a failed database start. As a result, a failed DB instance stays down while the cluster continues to see the service in a started state, and get_db_status() keeps trying to restart the DB. The steps used here are an edge case, but any undetected failure on a cluster service created for fail-over is an issue.

Version-Release number of selected component (if applicable):
rgmanager-2.0.52-37.el5_9.1

How reproducible:
Always.

Steps to Reproduce:
1. Set up a DB instance in the cluster.
2. Start the DB instance successfully.
3. Break the DB by moving the pfile.
4. Kill the Oracle processes.
5. Allow the cluster to detect the failed DB.
  
Actual results:
The agent tries to restart the DB, does not detect that the start failed, and a restart loop begins.

Expected results:
The DB start failure is detected and the service's configured recovery policy is followed.

Additional info:
Logs from failure.

Apr 26 17:09:01 ilsvm0072 orainstance.sh: Restarting Oracle Database...
Apr 26 17:09:01 ilsvm0072 cat:
Apr 26 17:09:01 ilsvm0072 cat: SQL*Plus: Release 11.2.0.3.0 Production on Fri Apr 26 17:09:01 2013
Apr 26 17:09:01 ilsvm0072 cat:
Apr 26 17:09:01 ilsvm0072 cat: Copyright (c) 1982, 2011, Oracle.  All rights reserved.
Apr 26 17:09:01 ilsvm0072 cat:
Apr 26 17:09:01 ilsvm0072 cat: Connected to an idle instance.
Apr 26 17:09:01 ilsvm0072 cat:
Apr 26 17:09:01 ilsvm0072 cat: SQL> ORA-01078: failure in processing system parameters
Apr 26 17:09:01 ilsvm0072 cat: LRM-00109: could not open parameter file '/db/oratest/oracle/product/11203_64/dbs/initorarhel5.ora'
Apr 26 17:09:01 ilsvm0072 cat: SQL> Disconnected
Apr 26 17:10:01 ilsvm0072 orainstance.sh: Restarting Oracle Database...
Apr 26 17:10:01 ilsvm0072 cat:
Apr 26 17:10:01 ilsvm0072 cat: SQL*Plus: Release 11.2.0.3.0 Production on Fri Apr 26 17:10:01 2013
Apr 26 17:10:01 ilsvm0072 cat:
Apr 26 17:10:01 ilsvm0072 cat: Copyright (c) 1982, 2011, Oracle.  All rights reserved.
Apr 26 17:10:01 ilsvm0072 cat:
Apr 26 17:10:01 ilsvm0072 cat: Connected to an idle instance.
Apr 26 17:10:01 ilsvm0072 cat:
Apr 26 17:10:01 ilsvm0072 cat: SQL> ORA-01078: failure in processing system parameters
Apr 26 17:10:01 ilsvm0072 cat: LRM-00109: could not open parameter file '/db/oratest/oracle/product/11203_64/dbs/initorarhel5.ora'
Apr 26 17:10:01 ilsvm0072 cat: SQL> Disconnected
Apr 26 17:11:01 ilsvm0072 orainstance.sh: Restarting Oracle Database...

and so on...

The issue seems to be this:

# egrep -n "grep.*\^ORA" /usr/share/cluster/orainstance.sh
108:    grep -q "^ORA-" $logfile
158:    grep -q "^ORA-" $logfile

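The matching line in $logfile starts with "SQL> ORA-", not "ORA-", so the anchored pattern "^ORA-" never matches and the error goes unnoticed.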
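
The mismatch can be reproduced outside the cluster with a small sketch. It writes a few lines from the log excerpt above into a temporary file standing in for $logfile, then runs the pattern found in orainstance.sh next to a relaxed one. The relaxed pattern "^(SQL> )?ORA-" is only an assumption made for illustration; it is not taken from the agent or from the fix tracked in bug 670024.

```shell
#!/bin/sh
# Temporary stand-in for $logfile; the contents are copied from the log excerpt.
logfile=$(mktemp)
cat > "$logfile" <<'EOF'
Connected to an idle instance.

SQL> ORA-01078: failure in processing system parameters
LRM-00109: could not open parameter file '/db/oratest/oracle/product/11203_64/dbs/initorarhel5.ora'
SQL> Disconnected
EOF

# Pattern as found in orainstance.sh: anchored at the start of the line.
if grep -q "^ORA-" "$logfile"; then
    anchored=matched
else
    anchored=missed
fi

# Assumed alternative (illustration only): allow an optional "SQL> " prompt
# in front of the error code.
if grep -Eq "^(SQL> )?ORA-" "$logfile"; then
    relaxed=matched
else
    relaxed=missed
fi

echo "anchored pattern: $anchored"
echo "relaxed pattern:  $relaxed"
rm -f "$logfile"
```

With this input the anchored pattern reports "missed" and the relaxed one reports "matched", which is consistent with the restart loop seen in the logs.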

