+++ This bug was initially created as a clone of Bug #581515 +++

Description of problem:

When the DBAs install Oracle, they do not install Enterprise Manager or
iSQL*Plus. The oracledb.sh script does not check whether these components
are installed; it assumes they are and tries to start them. Because the
components are not installed, they fail to start, the script fails, and the
cluster service fails with it.

Version-Release number of selected component (if applicable):

All versions of rgmanager.

How reproducible:

Always, unless you comment out the following section of the oracledb.sh
script:

    # if [ "$ORACLE_TYPE" = "base-em" ]; then
    #     action "Starting iSQL*Plus:" isqlplusctl start || return 1
    #     action "Starting Oracle EM DB Console:" emctl start dbconsole || return 1
    # elif [ "$ORACLE_TYPE" = "ias" ]; then
    #     action "Starting Oracle EM:" emctl start em || return 1
    #     action "Starting iAS Infrastructure:" opmnctl startall || return 1
    # fi

Steps to Reproduce:
1. Install Oracle 10g without iSQL*Plus and the Oracle EM DB Console.
2. Cluster an Oracle 10g database.
3. Start, stop, or move it between nodes.

Actual results:

The Oracle service fails:

Apr 11 06:42:46 cusnwd0v kernel: kjournald starting.  Commit interval 5 seconds
Apr 11 06:42:46 cusnwd0v kernel: EXT3 FS on dm-0, internal journal
Apr 11 06:42:46 cusnwd0v kernel: EXT3-fs: mounted filesystem with ordered data mode.
Apr 11 06:42:47 cusnwd0v kernel: kjournald starting.  Commit interval 5 seconds
Apr 11 06:42:47 cusnwd0v kernel: EXT3 FS on dm-1, internal journal
Apr 11 06:42:47 cusnwd0v kernel: EXT3-fs: mounted filesystem with ordered data mode.
Apr 11 06:42:47 cusnwd0v kernel: kjournald starting.  Commit interval 5 seconds
Apr 11 06:42:47 cusnwd0v kernel: EXT3 FS on dm-2, internal journal
Apr 11 06:42:47 cusnwd0v kernel: EXT3-fs: mounted filesystem with ordered data mode.
Apr 11 06:42:47 cusnwd0v kernel: kjournald starting.  Commit interval 5 seconds
Apr 11 06:42:47 cusnwd0v kernel: EXT3 FS on dm-3, internal journal
Apr 11 06:42:47 cusnwd0v kernel: EXT3-fs: mounted filesystem with ordered data mode.
Apr 11 06:42:47 cusnwd0v kernel: kjournald starting.  Commit interval 5 seconds
Apr 11 06:42:47 cusnwd0v kernel: EXT3 FS on dm-4, internal journal
Apr 11 06:42:47 cusnwd0v kernel: EXT3-fs: mounted filesystem with ordered data mode.
Apr 11 06:42:48 cusnwd0v kernel: kjournald starting.  Commit interval 5 seconds
Apr 11 06:42:48 cusnwd0v kernel: EXT3 FS on dm-5, internal journal
Apr 11 06:42:48 cusnwd0v kernel: EXT3-fs: mounted filesystem with ordered data mode.
Apr 11 06:42:48 cusnwd0v kernel: kjournald starting.  Commit interval 5 seconds
Apr 11 06:42:48 cusnwd0v kernel: EXT3 FS on dm-7, internal journal
Apr 11 06:42:48 cusnwd0v kernel: EXT3-fs: mounted filesystem with ordered data mode.
Apr 11 06:42:49 cusnwd0v kernel: kjournald starting.  Commit interval 5 seconds
Apr 11 06:42:49 cusnwd0v kernel: EXT3 FS on dm-6, internal journal
Apr 11 06:42:49 cusnwd0v kernel: EXT3-fs: mounted filesystem with ordered data mode.
Apr 11 06:42:49 cusnwd0v kernel: kjournald starting.  Commit interval 5 seconds
Apr 11 06:42:49 cusnwd0v kernel: EXT3 FS on dm-8, internal journal
Apr 11 06:42:49 cusnwd0v kernel: EXT3-fs: mounted filesystem with ordered data mode.
Apr 11 06:42:49 cusnwd0v kernel: kjournald starting.  Commit interval 5 seconds
Apr 11 06:42:49 cusnwd0v kernel: EXT3 FS on dm-9, internal journal
Apr 11 06:42:49 cusnwd0v kernel: EXT3-fs: mounted filesystem with ordered data mode.
Apr 11 06:42:50 cusnwd0v kernel: kjournald starting.  Commit interval 5 seconds
Apr 11 06:42:50 cusnwd0v kernel: EXT3 FS on dm-10, internal journal
Apr 11 06:42:50 cusnwd0v kernel: EXT3-fs: mounted filesystem with ordered data mode.
Apr 11 06:42:52 cusnwd0v luci[8567]: Unable to retrieve batch 1675237673 status from cusnwd0v-ic.carrier.utc.com:11111: module scheduled for execution
Apr 11 06:43:04 cusnwd0v last message repeated 2 times
Apr 11 06:43:06 cusnwd0v cat:
Apr 11 06:43:06 cusnwd0v cat: SQL*Plus: Release 10.2.0.4.0 - Production on Sun Apr 11 06:42:53 2010
Apr 11 06:43:06 cusnwd0v cat:
Apr 11 06:43:06 cusnwd0v cat: Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.
Apr 11 06:43:06 cusnwd0v cat:
Apr 11 06:43:06 cusnwd0v cat: Connected to an idle instance.
Apr 11 06:43:06 cusnwd0v cat:
Apr 11 06:43:06 cusnwd0v cat: SQL> ORACLE instance started.
Apr 11 06:43:06 cusnwd0v cat:
Apr 11 06:43:06 cusnwd0v cat: Total System Global Area 1.6106E+10 bytes
Apr 11 06:43:06 cusnwd0v cat: Fixed Size            2112088 bytes
Apr 11 06:43:06 cusnwd0v cat: Variable Size      6777996712 bytes
Apr 11 06:43:06 cusnwd0v cat: Database Buffers   9294577664 bytes
Apr 11 06:43:06 cusnwd0v cat: Redo Buffers         31440896 bytes
Apr 11 06:43:06 cusnwd0v cat: Database mounted.
Apr 11 06:43:06 cusnwd0v cat: Database opened.
Apr 11 06:43:06 cusnwd0v cat: SQL> Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
Apr 11 06:43:06 cusnwd0v cat: With the Partitioning, Data Mining and Real Application Testing options
Apr 11 06:43:10 cusnwd0v clurgmgrd[8632]: <notice> start on oracledb "WCHILL1P" returned 1 (generic error)
Apr 11 06:43:10 cusnwd0v clurgmgrd[8632]: <warning> #68: Failed to start service:WNDCHLLDB; return value: 1
Apr 11 06:43:10 cusnwd0v clurgmgrd[8632]: <notice> Stopping service service:WNDCHLLDB
Apr 11 06:43:11 cusnwd0v luci[8567]: Unable to retrieve batch 1675237673 status from cusnwd0v-ic.carrier.utc.com:11111: module scheduled for execution
Apr 11 06:43:14 cusnwd0v clurgmgrd[8632]: <notice> stop on oracledb "WCHILL1P" returned 1 (generic error)
Apr 11 06:43:17 cusnwd0v luci[8567]: Unable to retrieve batch 1675237673 status from cusnwd0v-ic.carrier.utc.com:11111: module scheduled for execution
Apr 11 06:43:24 cusnwd0v luci[8567]: Unable to retrieve batch 1675237673 status from cusnwd0v-ic.carrier.utc.com:11111: module scheduled for execution
Apr 11 06:43:24 cusnwd0v clurgmgrd: [8632]: <notice> Forcefully unmounting /wchillp/redo_logsb
Apr 11 06:43:25 cusnwd0v clurgmgrd: [8632]: <warning> killing process 12608 (oracle oracle /wchillp/redo_logsb)
Apr 11 06:43:25 cusnwd0v clurgmgrd: [8632]: <warning> killing process 12656 (oracle oracle /wchillp/redo_logsb)
Apr 11 06:43:30 cusnwd0v luci[8567]: Unable to retrieve batch 1675237673 status from cusnwd0v-ic.carrier.utc.com:11111: module scheduled for execution
Apr 11 06:43:31 cusnwd0v clurgmgrd: [8632]: <notice> Forcefully unmounting /wchillp/app/oracle
Apr 11 06:43:32 cusnwd0v clurgmgrd: [8632]: <warning> killing process 12682 (oracle tnslsnr /wchillp/app/oracle)
Apr 11 06:43:37 cusnwd0v luci[8567]: Unable to retrieve batch 1675237673 status from cusnwd0v-ic.carrier.utc.com:11111: module scheduled for execution
Apr 11 06:43:37 cusnwd0v clurgmgrd[8632]: <crit> #12: RG service:WNDCHLLDB failed to stop; intervention required
Apr 11 06:43:37 cusnwd0v clurgmgrd[8632]: <notice> Service service:WNDCHLLDB is failed
Apr 11 06:43:38 cusnwd0v clurgmgrd[8632]: <crit> #13: Service service:WNDCHLLDB failed to stop cleanly
Apr 11 06:43:43 cusnwd0v luci[8567]: Unable to retrieve batch 1675237673 status from cusnwd0v-ic.carrier.utc.com:11111: clusvcadm start failed to start WNDCHLLDB:
Apr 11 07:13:25 cusnwd0v clurgmgrd[8632]: <notice> Stopping service service:WNDCHLLDB
Apr 11 07:13:26 cusnwd0v clurgmgrd[8632]: <notice> Service service:WNDCHLLDB is disabled

Expected results:

oracledb.sh should check whether Enterprise Manager or iSQL*Plus is
installed rather than assuming those components are present. For now,
whenever I patch the Linux server and rgmanager gets updated, I have to
make sure I edit oracledb.sh and comment out the following lines:

    # if [ "$ORACLE_TYPE" = "base-em" ]; then
    #     action "Starting iSQL*Plus:" isqlplusctl start || return 1
    #     action "Starting Oracle EM DB Console:" emctl start dbconsole || return 1
    # elif [ "$ORACLE_TYPE" = "ias" ]; then
    #     action "Starting Oracle EM:" emctl start em || return 1
    #     action "Starting iAS Infrastructure:" opmnctl startall || return 1
    # fi

Additional info:

--- Additional comment from lhh on 2010-04-12 09:57:53 EDT ---

Please attach your cluster.conf.

--- Additional comment from tkornmes on 2010-04-12 10:13:24 EDT ---

Created an attachment (id=405972)
Cluster.conf file from cluster

--- Additional comment from tkornmes on 2010-04-12 10:16:18 EDT ---

Created an attachment (id=405974)
oracledb.sh file with the changes I've had to make so that the cluster
starts Oracle

--- Additional comment from lhh on 2010-04-16 15:23:14 EDT ---

<longdesc lang="en">
This is the Oracle installation type:
    base             - Database Instance and Listener only
    base-em (or 10g) - Database, Listener, Enterprise Manager, and iSQL*Plus
    ias (or 10g-ias)
</longdesc>

Sounds like setting it to "base" should work for your configuration
without editing the script.

--- Additional comment from tkornmes on 2010-04-16 16:15:36 EDT ---

I understand I could set it to "base", but where would you set that such
that you don't run the risk of it getting overwritten? Any changes to the
oracledb.sh file will get overwritten when an update is applied to the
rgmanager package, which is exactly what happened here. The script
incorrectly treats any 10g install as base-em.

--- Additional comment from lhh on 2010-04-16 16:38:30 EDT ---

Changing this line in cluster.conf:

    <oracledb home="/wchillp/app/oracle/product/10.2.0" name="WCHILL1P" type="10g" user="oracle" vhost="vip-windchilldb.carrier.utc.com"/>

to:

    <oracledb home="/wchillp/app/oracle/product/10.2.0" name="WCHILL1P" type="base" user="oracle" vhost="vip-windchilldb.carrier.utc.com"/>

... should do it.

--- Additional comment from lhh on 2010-04-16 16:59:30 EDT ---

(Don't forget to change the config version and so forth; making a change
to the agent will cause the Oracle instance to be restarted, so do it at
your next maintenance window.)

--- Additional comment from lhh on 2010-04-16 17:00:23 EDT ---

Oops -- making a change to the resource line in cluster.conf, not the
agent, will cause the resource instance to be restarted.

--- Additional comment from tkornmes on 2010-04-21 14:15:46 EDT ---

Oh, OK... I guess. I wish this were better documented, and that the agent
configuration in luci gave you a pull-down menu for the type, letting you
choose from the three options: base, base-em, or base-ias. I had contacted
Red Hat support when I was initially trying to get this to work, but they
were not very helpful. All they did was have me create my own Oracle
agent, which was still not the right answer.

--- Additional comment from lhh on 2010-04-29 15:33:00 EDT ---

I'm sorry that Red Hat Support did not answer your question as required.
If you'd like, we can clone this bug against the luci interface and/or the
documentation to clarify what the base/base-em/base-ias possibilities mean.
However, as far as rgmanager is concerned, this isn't a bug.
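For reference, the kind of check the reporter asks for could be sketched as
follows. This is a hypothetical rewrite of the start section quoted earlier,
not the shipped agent code: `have_cmd` is an illustrative helper, and the
initscript `action` wrapper is omitted for brevity.

```shell
# Hypothetical guard for oracledb.sh: only attempt to start base-em
# components that are actually installed on this node.
have_cmd() {
    # Succeeds if the named command is found on PATH.
    command -v "$1" >/dev/null 2>&1
}

start_em_components() {
    if [ "$ORACLE_TYPE" = "base-em" ]; then
        if have_cmd isqlplusctl; then
            isqlplusctl start || return 1
        fi
        if have_cmd emctl; then
            emctl start dbconsole || return 1
        fi
    fi
    return 0
}
```

With this guard, a host where iSQL*Plus and the EM DB Console were never
installed simply skips those components instead of failing the whole
service.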
Fixing title. This bugzilla is a request for clarification in the Conga
user interface for the different possible types of oracledb resources:

    base     - Listener and database instance only
    base-em  - Listener, database, and Enterprise Manager
    base-ias - Listener, database, EM, and iAS
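The cluster.conf change and config_version bump described in the comments
above could be scripted roughly like this. A sketch only: the function name
is made up, and the sed edits assume the type and config_version attributes
each appear exactly once in the file.

```shell
# Hypothetical helper: switch the oracledb resource type from "10g" to
# "base" in a cluster.conf file and bump its config_version so that the
# other cluster nodes accept the updated configuration.
set_oracledb_type_base() {
    conf="$1"
    # Change the resource type on the oracledb line.
    sed -i 's/type="10g"/type="base"/' "$conf"
    # Read the current config_version and increment it by one.
    v=$(sed -n 's/.*config_version="\([0-9][0-9]*\)".*/\1/p' "$conf")
    sed -i "s/config_version=\"$v\"/config_version=\"$((v + 1))\"/" "$conf"
}
```

After editing, the file still has to be propagated to the rest of the
cluster (on RHEL 5 clusters, something like `ccs_tool update
/etc/cluster/cluster.conf`), and since the change restarts the resource,
it should be done in a maintenance window.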
http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.5/html/Cluster_Administration/ap-ha-resource-params-CA.html#tb-oracledb-resource-CA
is actually missing the 'type' parameter.
Created attachment 437295 [details]
Patch to fix bug
Having the ability to select the type for the Oracle resource from luci
would be perfect. Not sure why it was missing in the first place.
An advisory has been issued which should help the problem described in
this bug report. This report is therefore being closed with a resolution
of ERRATA. For more information on the solution and/or where to find the
updated files, please follow the link below. You may reopen this bug
report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0033.html