Bug 948258 - Incorrect regex in oracledb.sh
Summary: Incorrect regex in oracledb.sh
Keywords:
Status: CLOSED DUPLICATE of bug 670024
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: rgmanager
Version: 5.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Ryan McCabe
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-04-04 12:25 UTC by Josef Zimek
Modified: 2018-12-02 18:58 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-05-16 21:01:58 UTC
Target Upstream Version:
Embargoed:



Description Josef Zimek 2013-04-04 12:25:08 UTC
Description of problem:

This occurred during cluster tests with the oracledb resource agent script, which we used unmodified in this cluster to start up an Oracle DB instance.

For test purposes we temporarily renamed the instance's pfile (i.e. the init.ora) so that the oracledb RA was bound to fail when bringing the instance back up after we had shut it down manually.

We expected that, after a limited number of failed restart attempts on the local node, rgmanager would stop the whole service there and try to fail it over to the failover node, which at the time was ready and able to take over the service's resources.

However, this didn't happen.
Instead, the cluster service hung in an infinite loop in which the above RA kept retrying to bring the DB instance up locally.
To my understanding, this violates the atomic nature of a resource group/service:
if you cannot make the whole lot available on the current node, go and try to fail it over to another ready node in the failover domain.

I then tried, without much success, to add an attribute such as "max_restarts" to the oracledb resource tag, similar to the one you can add to a service tag. The XML parser of course did not know this attribute in the oracledb resource context, which is why the "ccs_tool update" command failed.

I mean these kinds of service tag attributes concerning restart attempts:

[root@aruba:~]
# grep service.*lola /etc/cluster/cluster.conf
                <service name="lola" autostart="0" domain="baros-fod" exclusive="0" max_restarts="1" recovery="restart" restart_expire_time="0">



Here is the excerpt of the oracledb tag from our cluster.conf:


<oracledb home="/app/oracle/product/11.2.0" name="LOLA" type="base" user="oracle" listener_name="L_LOLA">
    <script name="oracle_em" file="/etc/cluster/itdz/script_oracle_em.sh"/>
</oracledb>



Here's the resource hierarchy of the affected service during startup:

[root@aruba:~]
# /usr/sbin/rg_test noop /etc/cluster/cluster.conf start service lola
Running in test mode.
Starting lola...
[start] service:lola
[start] lvm:VG lola dbf
[start] fs:FS lola data01
[start] fs:FS lola data02
[start] fs:FS lola data03
[start] fs:FS lola data04
[start] lvm:VG lola log
[start] fs:FS lola data05
[start] fs:FS lola data06
[start] fs:FS lola data07
[start] fs:FS lola reorg
[start] ip:10.25.128.120
[start] oracledb:LOLA
[start] script:oracle_em
Start of lola complete





rgmanager-2.0.52-28.el5



How reproducible:
always 

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

A test patch was already provided by jrummy and successfully tested by the customer, but we need to have this in a supported fashion.



In oracledb.sh we're looking for the ORA-XXXX error:

        grep -q "^ORA-" $logfile

And this is the output we get:

  SQL> ORA-01078: failure in processing system parameters

"ORA-" is not at the start of the line (^), so this doesn't match.  We need to account for the possibility of "SQL> " at the start of the line.


Better regex: 

  grep -qE "^(SQL>)?\s*ORA-"
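
To illustrate the difference, here is a minimal sketch that runs both patterns against a throwaway log containing the ORA-01078 line quoted above. The temporary file and the variable names old_match/new_match are only for this demonstration and do not appear in oracledb.sh. The sketch spells the whitespace class as [[:space:]], which is the POSIX equivalent of the \s used in the proposed regex (\s is a GNU grep extension).

```shell
# Throwaway log file holding the sqlplus output line from this report.
logfile=$(mktemp)
printf '%s\n' 'SQL> ORA-01078: failure in processing system parameters' > "$logfile"

# Original pattern: ORA- must be at the very start of the line,
# so the leading "SQL> " prompt prevents a match.
if grep -q "^ORA-" "$logfile"; then old_match=yes; else old_match=no; fi

# Proposed pattern: optional "SQL>" prompt, optional whitespace, then ORA-.
if grep -qE "^(SQL>)?[[:space:]]*ORA-" "$logfile"; then new_match=yes; else new_match=no; fi

echo "old pattern matched: $old_match"
echo "new pattern matched: $new_match"

rm -f "$logfile"
```

With the old pattern the error goes unnoticed, so the start_db and stop_db checks shown in the patch below never see the failure.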





Private branch:
################
  private-jruemker-case711279

Scratch Build:
################
  http://brewweb.devel.redhat.com/brew/taskinfo?taskID=4993514

Patch:
#######
diff -up rgmanager-2.0.52/src/resources/oracledb.sh.case711279 rgmanager-2.0.52/src/resources/oracledb.sh
--- rgmanager-2.0.52/src/resources/oracledb.sh.case711279	2012-10-18 10:42:12.078366058 -0400
+++ rgmanager-2.0.52/src/resources/oracledb.sh	2012-10-18 10:45:34.310620633 -0400
@@ -299,7 +299,7 @@ start_db()
 	#
 
 	rm -f $tmpfile
-	grep -q "^ORA-" $logfile
+	grep -qE "^(SQL>)?\s*ORA-" $logfile
 	if [ $? -eq 0 ]; then
 		rm -f $tmpfile
 	echo "ORACLE_SID Incorrectly set?"
@@ -348,7 +348,7 @@ stop_db()
 	# If we see 'failure' in the log, we're done.
 	#
 	rm -f $tmpfile
-	grep -q "^ORA-" $logfile
+	grep -qE "^(SQL>)?\s*ORA-" $logfile
 	if [ $? -eq 0 ]; then
 		echo_failure
 		echo

Comment 4 Ryan McCabe 2013-05-16 21:01:58 UTC
This should be fixed as a side effect of fixing Bug 670024.

Please reopen if you still have problems with rgmanager-2.0.52-41.el5 or later.

*** This bug has been marked as a duplicate of bug 670024 ***

