1318240 – Oracle resource agent unable to start because of ORA-01081

Summary: Oracle resource agent unable to start because of ORA-01081

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	resource-agents
Sub Component:
Version:	6.7
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	rc
Target Release:	---
Assignee:	Oyvind Albrigtsen
QA Contact:	cluster-qe@redhat.com
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1318985

Reported:	2016-03-16 10:45 UTC by michal novacek
Modified:	2017-03-21 09:27 UTC
CC List:	5 users
Fixed In Version:	resource-agents-3.9.5-43.el6
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1318985
Environment:
Last Closed:	2017-03-21 09:27:21 UTC
Target Upstream Version:
Embargoed:

Attachments
'pcs cluster report' output (1.56 MB, application/x-bzip) 2016-03-16 10:45 UTC, michal novacek	no flags
This patch seems to fix the issue. (543 bytes, patch) 2016-03-18 09:28 UTC, michal novacek	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2017:0602	0	normal	SHIPPED_LIVE	resource-agents bug fix update	2017-03-21 12:26:23 UTC

Description michal novacek 2016-03-16 10:45:01 UTC

Created attachment 1136972 [details]
'pcs cluster report' output

Description of problem:
The resource agent can fail to start Oracle, reporting the following error:
INFO: ORA-01081 error found, trying to cleanup oracle (dbstart_mount output:
ORA-01081: cannot start already-running ORACLE - shut it down first)

Version-Release number of selected component (if applicable):
resource-agents-3.9.5-34.el6.x86_64

How reproducible: always, once the database is in this state

Steps to Reproduce:
1. pcs resource debug-start oracle

Actual results: Oracle will never start.

Expected results: Oracle starts happily.

Additional info:

I believe this happens during recovery tests, after all ora_* processes have
been killed and the resource agent tries to start Oracle on another node.

To let the resource agent start Oracle again, it is necessary to issue
'shutdown immediate;' in sqlplus.
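
The manual workaround above can be sketched as a small shell snippet. This is an illustration only, not part of the original report: `shutdown_sql` is a hypothetical helper name, and the `sqlplus` invocation in the comment assumes a local connection as SYSDBA by the Oracle OS user with ORACLE_HOME and ORACLE_SID already set.

```shell
# Hypothetical helper: emit the SQL*Plus commands for the manual workaround.
shutdown_sql() {
    printf '%s\n' 'shutdown immediate;' 'exit;'
}

# Assumed invocation, run as the Oracle OS user with ORACLE_HOME set and
# ORACLE_SID=oradb (the sid used by the oracle resource in this cluster):
#   shutdown_sql | sqlplus -S "/ as sysdba"
```

Once the shutdown completes, 'pcs resource debug-start oracle' can be retried.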

This is how the problem manifests:

# pcs resource debug-start oracle
Error performing operation: Operation not permitted
Operation start for oracle (ocf:heartbeat:oracle) returned 1
 >  stderr: INFO: ORA-01081 error found, trying to cleanup oracle (dbstart_mount output: ORA-01081: cannot start already-running ORACLE - shut it down first)
 >  stderr: ls: cannot access /u01/app/oracle/product/12.1.0/dbhome_1/dbs/lk*: No such file or directory
 >  stderr: ERROR: oracle oradb can not be mounted (status: OPEN)

Comment 2 michal novacek 2016-03-18 09:28:00 UTC

Created attachment 1137726 [details]
This patch seems to fix the issue.

Comment 3 Oyvind Albrigtsen 2016-03-18 10:18:16 UTC

Comment on attachment 1137726 [details]
This patch seems to fix the issue.

Tested and verified patch available upstream: https://github.com/ClusterLabs/resource-agents/pull/783

Comment 4 Mike McCune 2016-03-28 23:14:23 UTC

This bug was accidentally moved from POST to MODIFIED by an error in automation. Please contact mmccune with any questions.

Comment 9 michal novacek 2017-01-23 15:56:54 UTC

I have verified that with resource-agents-3.9.5-43.el6 the oracle resource
agent is _always_ able to start, with no ORA-01081 error, after its processes
have been forcefully terminated.

-----

common setup: a running cluster configured with the oracle group (1), with the
group started (2)

before the patch (resource-agents-3.9.5-34)
===========================================

[root@tardis-02 ~]# pkill -9 ora_
[root@tardis-02 ~]# pgrep ora_
[root@tardis-02 ~]# sleep 30
[root@tardis-02 ~]# pcs resource
 Resource Group: ora-group
     vip        (ocf::heartbeat:IPaddr2):       Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
     halvm      (ocf::heartbeat:LVM):   Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
     fs (ocf::heartbeat:Filesystem):    Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
     oracle     (ocf::heartbeat:oracle):        Stopped

[root@tardis-02 ~]# pcs resource debug-start oracle
Error performing operation: Operation not permitted
Operation start for oracle (ocf:heartbeat:oracle) returned 1
 >  stderr: INFO: ORA-01081 error found, trying to cleanup oracle (dbstart_mount output: ORA-01081: cannot start already-running ORACLE - shut it down first)
 >  stderr: ERROR: oracle oradb can not be mounted (status: OPEN)

after the patch (resource-agents-3.9.5-43)
==========================================

[root@tardis-02 ~]# pkill -9 ora_
[root@tardis-02 ~]# pgrep ora_
[root@tardis-02 ~]# sleep 30
[root@tardis-02 ~]# pcs resource
 Resource Group: ora-group
     vip        (ocf::heartbeat:IPaddr2):       Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
     halvm      (ocf::heartbeat:LVM):   Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
     fs (ocf::heartbeat:Filesystem):    Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
     oracle     (ocf::heartbeat:oracle):        Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
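
The before/after check above could be scripted roughly as follows. This is a minimal sketch, not something taken from the report: `resource_state` is a hypothetical helper (not a pcs command) that assumes the `pcs resource` output format shown in this comment, where the state is the third whitespace-separated field on the resource's line.

```shell
# Hypothetical helper (not part of pcs): print the state of resource $1,
# reading `pcs resource` output on stdin in the format shown in this comment.
resource_state() {
    awk -v name="$1" '$1 == name { print $3 }'
}

# Assumed usage on a cluster node, mirroring the manual steps above:
#   pkill -9 ora_
#   sleep 30
#   pcs resource | resource_state oracle
```

With the unpatched package the helper would report "Stopped" for the oracle resource, and with the patched package "Started", matching the two transcripts.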

-----
> (1) pcs config
[root@tardis-02 ~]# pcs config
Cluster Name: STSRHTS1683
Corosync Nodes:
 tardis-01.cluster-qe.lab.eng.brq.redhat.com tardis-02.cluster-qe.lab.eng.brq.redhat.com
Pacemaker Nodes:
 tardis-01.cluster-qe.lab.eng.brq.redhat.com tardis-02.cluster-qe.lab.eng.brq.redhat.com

Resources:
 Group: ora-group
  Resource: vip (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: ip=10.34.69.58 cidr_netmask=22
   Operations: start interval=0s timeout=20s (vip-start-interval-0s)
               stop interval=0s timeout=20s (vip-stop-interval-0s)
               monitor interval=30s (vip-monitor-interval-30s)
  Resource: halvm (class=ocf provider=heartbeat type=LVM)
   Attributes: exclusive=true partial_activation=false volgrpname=shared
   Operations: start interval=0s timeout=30 (halvm-start-interval-0s)
               stop interval=0s timeout=30 (halvm-stop-interval-0s)
               monitor interval=10 timeout=30 (halvm-monitor-interval-10)
  Resource: fs (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/shared/shared0 directory=/u01 fstype=ext4 options=
   Operations: start interval=0s timeout=60 (fs-start-interval-0s)
               stop interval=0s timeout=60 (fs-stop-interval-0s)
               monitor interval=30s (fs-monitor-interval-30s)
  Resource: oracle (class=ocf provider=heartbeat type=oracle)
   Attributes: sid=oradb
   Operations: start interval=0s timeout=120 (oracle-start-interval-0s)
               stop interval=0s timeout=120 (oracle-stop-interval-0s)
               monitor interval=30s (oracle-monitor-interval-30s)

Stonith Devices:
 Resource: fence-tardis-01 (class=stonith type=fence_ipmilan)
  Attributes: delay=5 passwd=admin login=admin pcmk_host_check=static-list ipaddr=tardis-01-ilo pcmk_host_list=tardis-01.cluster-qe.lab.eng.brq.redhat.com
  Operations: monitor interval=60s (fence-tardis-01-monitor-interval-60s)
 Resource: fence-tardis-02 (class=stonith type=fence_ipmilan)
  Attributes: passwd=admin login=admin pcmk_host_check=static-list ipaddr=tardis-02-ilo pcmk_host_list=tardis-02.cluster-qe.lab.eng.brq.redhat.com
  Operations: monitor interval=60s (fence-tardis-02-monitor-interval-60s)
Fencing Levels:

Location Constraints:
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 No defaults set
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: cman
 dc-version: 1.1.15-4.el6-e174ec8
 have-watchdog: false
 no-quorum-policy: ignore

> (2) pcs resource
[root@tardis-02 ~]# pcs resource
 Resource Group: ora-group
     vip        (ocf::heartbeat:IPaddr2):       Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
     halvm      (ocf::heartbeat:LVM):   Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
     fs (ocf::heartbeat:Filesystem):    Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
     oracle     (ocf::heartbeat:oracle):        Started tardis-02.cluster-qe.lab.eng.brq.redhat.com

Comment 11 errata-xmlrpc 2017-03-21 09:27:21 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0602.html
