Description of problem:

When DBAs install Oracle, they often do not install Enterprise Manager or iSQL*Plus. The oracledb.sh script does not check whether these components are installed; it simply assumes they are and tries to start them. Because the components are absent, they fail to start, the script fails, and the cluster service fails with it.

Version-Release number of selected component (if applicable):

All versions of rgmanager.

How reproducible:

Always, unless you comment out the following section of the oracledb.sh script:

# if [ "$ORACLE_TYPE" = "base-em" ]; then
#     action "Starting iSQL*Plus:" isqlplusctl start || return 1
#     action "Starting Oracle EM DB Console:" emctl start dbconsole || return 1
# elif [ "$ORACLE_TYPE" = "ias" ]; then
#     action "Starting Oracle EM:" emctl start em || return 1
#     action "Starting iAS Infrastructure:" opmnctl startall || return 1
# fi

Steps to Reproduce:
1. Install Oracle 10g without iSQL*Plus and the Oracle EM DB Console.
2. Cluster an Oracle 10g database.
3. Start, stop, or move it between nodes.

Actual results:

The Oracle service fails:

Apr 11 06:42:46 cusnwd0v kernel: kjournald starting. Commit interval 5 seconds
Apr 11 06:42:46 cusnwd0v kernel: EXT3 FS on dm-0, internal journal
Apr 11 06:42:46 cusnwd0v kernel: EXT3-fs: mounted filesystem with ordered data mode.
Apr 11 06:42:47 cusnwd0v kernel: kjournald starting. Commit interval 5 seconds
Apr 11 06:42:47 cusnwd0v kernel: EXT3 FS on dm-1, internal journal
Apr 11 06:42:47 cusnwd0v kernel: EXT3-fs: mounted filesystem with ordered data mode.
Apr 11 06:42:47 cusnwd0v kernel: kjournald starting. Commit interval 5 seconds
Apr 11 06:42:47 cusnwd0v kernel: EXT3 FS on dm-2, internal journal
Apr 11 06:42:47 cusnwd0v kernel: EXT3-fs: mounted filesystem with ordered data mode.
Apr 11 06:42:47 cusnwd0v kernel: kjournald starting. Commit interval 5 seconds
Apr 11 06:42:47 cusnwd0v kernel: EXT3 FS on dm-3, internal journal
Apr 11 06:42:47 cusnwd0v kernel: EXT3-fs: mounted filesystem with ordered data mode.
Apr 11 06:42:47 cusnwd0v kernel: kjournald starting. Commit interval 5 seconds
Apr 11 06:42:47 cusnwd0v kernel: EXT3 FS on dm-4, internal journal
Apr 11 06:42:47 cusnwd0v kernel: EXT3-fs: mounted filesystem with ordered data mode.
Apr 11 06:42:48 cusnwd0v kernel: kjournald starting. Commit interval 5 seconds
Apr 11 06:42:48 cusnwd0v kernel: EXT3 FS on dm-5, internal journal
Apr 11 06:42:48 cusnwd0v kernel: EXT3-fs: mounted filesystem with ordered data mode.
Apr 11 06:42:48 cusnwd0v kernel: kjournald starting. Commit interval 5 seconds
Apr 11 06:42:48 cusnwd0v kernel: EXT3 FS on dm-7, internal journal
Apr 11 06:42:48 cusnwd0v kernel: EXT3-fs: mounted filesystem with ordered data mode.
Apr 11 06:42:49 cusnwd0v kernel: kjournald starting. Commit interval 5 seconds
Apr 11 06:42:49 cusnwd0v kernel: EXT3 FS on dm-6, internal journal
Apr 11 06:42:49 cusnwd0v kernel: EXT3-fs: mounted filesystem with ordered data mode.
Apr 11 06:42:49 cusnwd0v kernel: kjournald starting. Commit interval 5 seconds
Apr 11 06:42:49 cusnwd0v kernel: EXT3 FS on dm-8, internal journal
Apr 11 06:42:49 cusnwd0v kernel: EXT3-fs: mounted filesystem with ordered data mode.
Apr 11 06:42:49 cusnwd0v kernel: kjournald starting. Commit interval 5 seconds
Apr 11 06:42:49 cusnwd0v kernel: EXT3 FS on dm-9, internal journal
Apr 11 06:42:49 cusnwd0v kernel: EXT3-fs: mounted filesystem with ordered data mode.
Apr 11 06:42:50 cusnwd0v kernel: kjournald starting. Commit interval 5 seconds
Apr 11 06:42:50 cusnwd0v kernel: EXT3 FS on dm-10, internal journal
Apr 11 06:42:50 cusnwd0v kernel: EXT3-fs: mounted filesystem with ordered data mode.
Apr 11 06:42:52 cusnwd0v luci[8567]: Unable to retrieve batch 1675237673 status from cusnwd0v-ic.carrier.utc.com:11111: module scheduled for execution
Apr 11 06:43:04 cusnwd0v last message repeated 2 times
Apr 11 06:43:06 cusnwd0v cat:
Apr 11 06:43:06 cusnwd0v cat: SQL*Plus: Release 10.2.0.4.0 - Production on Sun Apr 11 06:42:53 2010
Apr 11 06:43:06 cusnwd0v cat:
Apr 11 06:43:06 cusnwd0v cat: Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
Apr 11 06:43:06 cusnwd0v cat:
Apr 11 06:43:06 cusnwd0v cat: Connected to an idle instance.
Apr 11 06:43:06 cusnwd0v cat:
Apr 11 06:43:06 cusnwd0v cat: SQL> ORACLE instance started.
Apr 11 06:43:06 cusnwd0v cat:
Apr 11 06:43:06 cusnwd0v cat: Total System Global Area 1.6106E+10 bytes
Apr 11 06:43:06 cusnwd0v cat: Fixed Size 2112088 bytes
Apr 11 06:43:06 cusnwd0v cat: Variable Size 6777996712 bytes
Apr 11 06:43:06 cusnwd0v cat: Database Buffers 9294577664 bytes
Apr 11 06:43:06 cusnwd0v cat: Redo Buffers 31440896 bytes
Apr 11 06:43:06 cusnwd0v cat: Database mounted.
Apr 11 06:43:06 cusnwd0v cat: Database opened.
Apr 11 06:43:06 cusnwd0v cat: SQL> Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
Apr 11 06:43:06 cusnwd0v cat: With the Partitioning, Data Mining and Real Application Testing options
Apr 11 06:43:10 cusnwd0v clurgmgrd[8632]: <notice> start on oracledb "WCHILL1P" returned 1 (generic error)
Apr 11 06:43:10 cusnwd0v clurgmgrd[8632]: <warning> #68: Failed to start service:WNDCHLLDB; return value: 1
Apr 11 06:43:10 cusnwd0v clurgmgrd[8632]: <notice> Stopping service service:WNDCHLLDB
Apr 11 06:43:11 cusnwd0v luci[8567]: Unable to retrieve batch 1675237673 status from cusnwd0v-ic.carrier.utc.com:11111: module scheduled for execution
Apr 11 06:43:14 cusnwd0v clurgmgrd[8632]: <notice> stop on oracledb "WCHILL1P" returned 1 (generic error)
Apr 11 06:43:17 cusnwd0v luci[8567]: Unable to retrieve batch 1675237673 status from cusnwd0v-ic.carrier.utc.com:11111: module scheduled for execution
Apr 11 06:43:24 cusnwd0v luci[8567]: Unable to retrieve batch 1675237673 status from cusnwd0v-ic.carrier.utc.com:11111: module scheduled for execution
Apr 11 06:43:24 cusnwd0v clurgmgrd: [8632]: <notice> Forcefully unmounting /wchillp/redo_logsb
Apr 11 06:43:25 cusnwd0v clurgmgrd: [8632]: <warning> killing process 12608 (oracle oracle /wchillp/redo_logsb)
Apr 11 06:43:25 cusnwd0v clurgmgrd: [8632]: <warning> killing process 12656 (oracle oracle /wchillp/redo_logsb)
Apr 11 06:43:30 cusnwd0v luci[8567]: Unable to retrieve batch 1675237673 status from cusnwd0v-ic.carrier.utc.com:11111: module scheduled for execution
Apr 11 06:43:31 cusnwd0v clurgmgrd: [8632]: <notice> Forcefully unmounting /wchillp/app/oracle
Apr 11 06:43:32 cusnwd0v clurgmgrd: [8632]: <warning> killing process 12682 (oracle tnslsnr /wchillp/app/oracle)
Apr 11 06:43:37 cusnwd0v luci[8567]: Unable to retrieve batch 1675237673 status from cusnwd0v-ic.carrier.utc.com:11111: module scheduled for execution
Apr 11 06:43:37 cusnwd0v clurgmgrd[8632]: <crit> #12: RG service:WNDCHLLDB failed to stop; intervention required
Apr 11 06:43:37 cusnwd0v clurgmgrd[8632]: <notice> Service service:WNDCHLLDB is failed
Apr 11 06:43:38 cusnwd0v clurgmgrd[8632]: <crit> #13: Service service:WNDCHLLDB failed to stop cleanly
Apr 11 06:43:43 cusnwd0v luci[8567]: Unable to retrieve batch 1675237673 status from cusnwd0v-ic.carrier.utc.com:11111: clusvcadm start failed to start WNDCHLLDB:
Apr 11 07:13:25 cusnwd0v clurgmgrd[8632]: <notice> Stopping service service:WNDCHLLDB
Apr 11 07:13:26 cusnwd0v clurgmgrd[8632]: <notice> Service service:WNDCHLLDB is disabled

Expected results:

oracledb.sh should check whether Enterprise Manager and iSQL*Plus are installed rather than simply assuming those components are present. For now, whenever I patch the Linux server and rgmanager gets updated, I have to edit oracledb.sh and comment out the following lines:

# if [ "$ORACLE_TYPE" = "base-em" ]; then
#     action "Starting iSQL*Plus:" isqlplusctl start || return 1
#     action "Starting Oracle EM DB Console:" emctl start dbconsole || return 1
# elif [ "$ORACLE_TYPE" = "ias" ]; then
#     action "Starting Oracle EM:" emctl start em || return 1
#     action "Starting iAS Infrastructure:" opmnctl startall || return 1
# fi

Additional info:
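A minimal sketch of the kind of guard the reporter is asking for (not the shipped agent code; the have_tool helper is hypothetical, and the initscripts "action" wrapper is dropped here to keep the sketch self-contained): probe for the control binaries on PATH and skip the optional components instead of failing the whole service when they are missing.

```shell
#!/bin/sh
# Hypothetical guard for oracledb.sh: only try to start EM / iSQL*Plus
# when their control scripts are actually installed.

# Returns 0 when the named tool is found on PATH (helper name is illustrative).
have_tool() {
	command -v "$1" >/dev/null 2>&1
}

start_em_components() {
	if [ "$ORACLE_TYPE" = "base-em" ]; then
		if have_tool isqlplusctl && have_tool emctl; then
			isqlplusctl start || return 1
			emctl start dbconsole || return 1
		else
			# Components not installed: warn and continue rather
			# than failing the cluster service.
			echo "iSQL*Plus / EM DB Console not installed; skipping" >&2
		fi
	fi
	return 0
}

# On a host without isqlplusctl/emctl, the start now succeeds anyway.
ORACLE_TYPE="base-em"
start_em_components && echo "service start continues"
```

The same pattern would apply to the "ias" branch (emctl and opmnctl). Whether skipping or merely warning is the right policy is a design choice for the agent maintainers; the point is that absence of an optional component should not abort the whole start.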
Please attach your cluster.conf
Created attachment 405972 [details] Cluster.conf file from cluster
Created attachment 405974 [details] oracledb.sh with the changes I had to make so that the cluster can start Oracle
<longdesc lang="en">
	This is the Oracle installation type:
	base - Database Instance and Listener only
	base-em (or 10g) - Database, Listener, Enterprise Manager, and iSQL*Plus
	ias (or 10g-ias)
</longdesc>

Sounds like setting it to "base" should work for your configuration without editing the script.
I understand I could set it to "base", but where would you set that so that you don't run the risk of it getting overwritten? Any changes to the oracledb.sh file get overwritten when an update is applied to the rgmanager package, which is exactly what happened here. The script treats any 10g install as base-em, which is incorrect.
Changing this line in cluster.conf:

<oracledb home="/wchillp/app/oracle/product/10.2.0" name="WCHILL1P" type="10g" user="oracle" vhost="vip-windchilldb.carrier.utc.com"/>

to:

<oracledb home="/wchillp/app/oracle/product/10.2.0" name="WCHILL1P" type="base" user="oracle" vhost="vip-windchilldb.carrier.utc.com"/>

... should do it.
(Don't forget to change the config version and so forth; making a change to the agent will cause the Oracle instance to be restarted, so do it at your next maintenance window)
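For reference, the type attribute lives on the <oracledb/> resource line, while the version lives on the top-level <cluster> tag; a minimal sketch (the cluster name and version number here are illustrative, not taken from the attached cluster.conf):

<!-- Bump config_version so the change propagates to all nodes -->
<cluster name="example" config_version="43">
  ...
  <oracledb home="/wchillp/app/oracle/product/10.2.0" name="WCHILL1P"
            type="base" user="oracle"
            vhost="vip-windchilldb.carrier.utc.com"/>
  ...
</cluster>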
Oops -- making a change to the resource line in cluster.conf, not the agent, will cause the resource instance to be restarted.
Oh, OK... I guess. I wish this were better documented, and that the agent configuration in luci gave you a pull-down menu for the type that let you choose from the three options: base, base-em, or ias. I had contacted Red Hat Support when I was initially trying to get this to work, but they were not very helpful. All they did was have me create my own Oracle agent, which was still not the right answer.
I'm sorry that Red Hat Support did not answer your question adequately. If you'd like, we can clone this bug against the luci interface and/or the documentation to clarify what the base/base-em/ias options mean. However, as far as rgmanager is concerned, this isn't a bug.