Bug 1318240

Summary:

Oracle resource agent unable to start because of ORA-01081

Product:

Red Hat Enterprise Linux 6

Reporter:

michal novacek <mnovacek>

Component:

resource-agents

Assignee:

Oyvind Albrigtsen <oalbrigt>

Status:

CLOSED ERRATA

QA Contact:

cluster-qe <cluster-qe>

Severity:

unspecified

Priority:

unspecified

Version:

6.7

CC:

agk, cfeist, cluster-maint, fdinitto, tlavigne

Hardware:

Unspecified

OS:

Unspecified

Fixed In Version:

resource-agents-3.9.5-43.el6

Clones:

1318985

Last Closed:

2017-03-21 09:27:21 UTC

Type:

Bug

Bug Blocks:

1318985

Attachments:

Description	Flags
'pcs cluster report' output	none
This patch seems to fix the issue.	none

Description michal novacek 2016-03-16 10:45:01 UTC

Created attachment 1136972
'pcs cluster report' output

Description of problem:
The resource agent can fail to start Oracle, reporting the following error:
INFO: ORA-01081 error found, trying to cleanup oracle (dbstart_mount output:
ORA-01081: cannot start already-running ORACLE - shut it down first)

Version-Release number of selected component (if applicable):
resource-agents-3.9.5-34.el6.x86_64

How reproducible: Always, once the instance is in that state.

Steps to Reproduce:
1. pcs resource debug-start oracle

Actual results: Oracle will never start.

Expected results: Oracle starts happily.

Additional info:

I believe this happens during recovery tests, after all ora_* processes have
been killed and the resource agent has tried to start Oracle on another node.

To let the resource agent start Oracle again, it is necessary to issue
'shutdown immediate;' in sqlplus.
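
The manual workaround above could be scripted roughly as follows. This is only a sketch under assumptions that are not part of the report: it is run as the Oracle OS user with ORACLE_HOME and PATH already set up, the instance accepts OS authentication ("/ as sysdba"), and the function name and the SQLPLUS override variable are hypothetical helpers introduced for illustration.

```shell
#!/bin/sh
# Sketch of the manual workaround: ask the stale instance to shut down so
# that the resource agent can start it cleanly afterwards.
# Assumptions: Oracle OS user, ORACLE_HOME/PATH set, OS authentication works.

oracle_shutdown_immediate() {
    sid="$1"
    # SQLPLUS is a hypothetical override, e.g. a full path to the binary.
    ORACLE_SID="$sid" "${SQLPLUS:-sqlplus}" -S "/ as sysdba" <<'EOF'
shutdown immediate;
exit;
EOF
}
```

With the configuration from this report it would be called as `oracle_shutdown_immediate oradb`, followed by another `pcs resource debug-start oracle`.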

This is how the problem manifests:

# pcs resource debug-start oracle
Error performing operation: Operation not permitted
Operation start for oracle (ocf:heartbeat:oracle) returned 1
 >  stderr: INFO: ORA-01081 error found, trying to cleanup oracle (dbstart_mount output: ORA-01081: cannot start already-running ORACLE - shut it down first)
 >  stderr: ls: cannot access /u01/app/oracle/product/12.1.0/dbhome_1/dbs/lk*: No such file or directory
 >  stderr: ERROR: oracle oradb can not be mounted (status: OPEN)
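
For anyone scripting this check, the failure can be recognised by looking for the ORA-01081 code in the captured output. A minimal sketch follows; the function name is a hypothetical helper, and it only matches the error code, not the rest of the message.

```shell
#!/bin/sh
# Return 0 if the captured start output mentions ORA-01081, 1 otherwise.
# The output is passed as a single argument (stdout and stderr combined).
has_ora_01081() {
    printf '%s\n' "$1" | grep -q 'ORA-01081'
}
```

A possible use is `out=$(pcs resource debug-start oracle 2>&1); has_ora_01081 "$out" && echo "instance needs a clean shutdown first"`.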

Comment 2 michal novacek 2016-03-18 09:28:00 UTC

Created attachment 1137726
This patch seems to fix the issue.

Comment 3 Oyvind Albrigtsen 2016-03-18 10:18:16 UTC

Comment on attachment 1137726
This patch seems to fix the issue.

Tested and verified patch available upstream: https://github.com/ClusterLabs/resource-agents/pull/783

Comment 4 Mike McCune 2016-03-28 23:14:23 UTC

This bug was accidentally moved from POST to MODIFIED by an error in automation. Please see mmccune with any questions.

Comment 9 michal novacek 2017-01-23 15:56:54 UTC

I have verified that with resource-agents-3.9.5-43.el6 the oracle resource
agent is _always_ able to start Oracle, with no ORA-01081 error, after the
Oracle processes have been forcefully terminated.

-----

common setup: a running cluster configured with an oracle resource group (1),
with the group started (2)

before the patch (resource-agents-3.9.5-34)
===========================================

[root@tardis-02 ~]# pkill -9 ora_
[root@tardis-02 ~]# pgrep ora_
[root@tardis-02 ~]# sleep 30
[root@tardis-02 ~]# pcs resource
 Resource Group: ora-group
     vip        (ocf::heartbeat:IPaddr2):       Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
     halvm      (ocf::heartbeat:LVM):   Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
     fs (ocf::heartbeat:Filesystem):    Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
     oracle     (ocf::heartbeat:oracle):        Stopped

[root@tardis-02 ~]# pcs resource debug-start oracle
Error performing operation: Operation not permitted
Operation start for oracle (ocf:heartbeat:oracle) returned 1
 >  stderr: INFO: ORA-01081 error found, trying to cleanup oracle (dbstart_mount output: ORA-01081: cannot start already-running ORACLE - shut it down first)
 >  stderr: ERROR: oracle oradb can not be mounted (status: OPEN)

after the patch (resource-agents-3.9.5-43)
==========================================

[root@tardis-02 ~]# pkill -9 ora_
[root@tardis-02 ~]# pgrep ora_
[root@tardis-02 ~]# sleep 30
[root@tardis-02 ~]# pcs resource
 Resource Group: ora-group
     vip        (ocf::heartbeat:IPaddr2):       Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
     halvm      (ocf::heartbeat:LVM):   Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
     fs (ocf::heartbeat:Filesystem):    Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
     oracle     (ocf::heartbeat:oracle):        Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
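
The before/after comparison above can be automated by reading the state of the oracle resource from the `pcs resource` listing. The sketch below assumes the column layout shown in this comment (name, agent, state, node); the function name is hypothetical, and other pcs versions may format the listing differently.

```shell
#!/bin/sh
# Print the state (e.g. Started or Stopped) of the named resource,
# reading `pcs resource` output on stdin.
# Assumes the layout shown above: <name> (<agent>): <state> [<node>]
resource_state() {
    awk -v name="$1" '$1 == name { print $3; exit }'
}
```

For example, `pcs resource | resource_state oracle` would be run after the `sleep 30` step to see whether the cluster recovered the resource.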

-----
> (1) pcs config
[root@tardis-02 ~]# pcs config
Cluster Name: STSRHTS1683
Corosync Nodes:
 tardis-01.cluster-qe.lab.eng.brq.redhat.com tardis-02.cluster-qe.lab.eng.brq.redhat.com
Pacemaker Nodes:
 tardis-01.cluster-qe.lab.eng.brq.redhat.com tardis-02.cluster-qe.lab.eng.brq.redhat.com

Resources:
 Group: ora-group
  Resource: vip (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: ip=10.34.69.58 cidr_netmask=22
   Operations: start interval=0s timeout=20s (vip-start-interval-0s)
               stop interval=0s timeout=20s (vip-stop-interval-0s)
               monitor interval=30s (vip-monitor-interval-30s)
  Resource: halvm (class=ocf provider=heartbeat type=LVM)
   Attributes: exclusive=true partial_activation=false volgrpname=shared
   Operations: start interval=0s timeout=30 (halvm-start-interval-0s)
               stop interval=0s timeout=30 (halvm-stop-interval-0s)
               monitor interval=10 timeout=30 (halvm-monitor-interval-10)
  Resource: fs (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/shared/shared0 directory=/u01 fstype=ext4 options=
   Operations: start interval=0s timeout=60 (fs-start-interval-0s)
               stop interval=0s timeout=60 (fs-stop-interval-0s)
               monitor interval=30s (fs-monitor-interval-30s)
  Resource: oracle (class=ocf provider=heartbeat type=oracle)
   Attributes: sid=oradb
   Operations: start interval=0s timeout=120 (oracle-start-interval-0s)
               stop interval=0s timeout=120 (oracle-stop-interval-0s)
               monitor interval=30s (oracle-monitor-interval-30s)

Stonith Devices:
 Resource: fence-tardis-01 (class=stonith type=fence_ipmilan)
  Attributes: delay=5 passwd=admin login=admin pcmk_host_check=static-list ipaddr=tardis-01-ilo pcmk_host_list=tardis-01.cluster-qe.lab.eng.brq.redhat.com
  Operations: monitor interval=60s (fence-tardis-01-monitor-interval-60s)
 Resource: fence-tardis-02 (class=stonith type=fence_ipmilan)
  Attributes: passwd=admin login=admin pcmk_host_check=static-list ipaddr=tardis-02-ilo pcmk_host_list=tardis-02.cluster-qe.lab.eng.brq.redhat.com
  Operations: monitor interval=60s (fence-tardis-02-monitor-interval-60s)
Fencing Levels:

Location Constraints:
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 No defaults set
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: cman
 dc-version: 1.1.15-4.el6-e174ec8
 have-watchdog: false
 no-quorum-policy: ignore

> (2) pcs resource
[root@tardis-02 ~]# pcs resource
 Resource Group: ora-group
     vip        (ocf::heartbeat:IPaddr2):       Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
     halvm      (ocf::heartbeat:LVM):   Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
     fs (ocf::heartbeat:Filesystem):    Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
     oracle     (ocf::heartbeat:oracle):        Started tardis-02.cluster-qe.lab.eng.brq.redhat.com

Comment 11 errata-xmlrpc 2017-03-21 09:27:21 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0602.html