Bug 1318240

Summary: Oracle resource agent unable to start because of ORA-01081
Product: Red Hat Enterprise Linux 6
Reporter: michal novacek <mnovacek>
Component: resource-agents
Assignee: Oyvind Albrigtsen <oalbrigt>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: unspecified
Priority: unspecified
Version: 6.7
CC: agk, cfeist, cluster-maint, fdinitto, tlavigne
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Fixed In Version: resource-agents-3.9.5-43.el6
Clones: 1318985
Last Closed: 2017-03-21 09:27:21 UTC
Type: Bug
Bug Blocks: 1318985
Attachments (description, flags):
  'pcs cluster report' output (none)
  This patch seems to fix the issue. (none)

Description michal novacek 2016-03-16 10:45:01 UTC
Created attachment 1136972
'pcs cluster report' output

Description of problem:
Under some conditions the resource agent cannot start Oracle and fails with the following error:
INFO: ORA-01081 error found, trying to cleanup oracle (dbstart_mount output:
ORA-01081: cannot start already-running ORACLE - shut it down first)
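
The INFO message shows that the agent looks for ORA-01081 in the output of its mount step before attempting a cleanup. A minimal sketch of such a check follows; the helper name has_ora_01081 is hypothetical and this is not the resource agent's actual code.

```shell
#!/bin/sh
# Hypothetical helper (not part of the oracle resource agent): returns 0
# when the given sqlplus output contains ORA-01081, i.e. Oracle refused to
# start because it believes an instance is already running.
has_ora_01081() {
    printf '%s\n' "$1" | grep -q 'ORA-01081'
}

# Example: output captured from a failed startup attempt.
out='ORA-01081: cannot start already-running ORACLE - shut it down first'
if has_ora_01081 "$out"; then
    echo "stale instance detected; cleanup needed"
fi
```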

Version-Release number of selected component (if applicable):
resource-agents-3.9.5-34.el6.x86_64

How reproducible: Always, once the database is in that state.

Steps to Reproduce:
1. pcs resource debug-start oracle

Actual results: Oracle never starts.

Expected results: Oracle starts successfully.

Additional info:

I believe this happens during recovery tests, after all ora_* processes have
been killed and the resource agent tries to start Oracle on another node.

To allow the resource agent to start it again, it is necessary to issue
'shutdown immediate;' in sqlplus.
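
The manual workaround can be wrapped in a small script. This is a sketch under stated assumptions: it runs as the Oracle software owner, ORACLE_HOME is set, sqlplus is on PATH, and the SID defaults to oradb from this cluster's configuration. The function name ora_shutdown_immediate is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper for the manual 'shutdown immediate;' workaround.
# Assumes: run as the Oracle software owner, ORACLE_HOME is set and
# sqlplus is on PATH.
ora_shutdown_immediate() {
    ORACLE_SID="${1:-oradb}"   # SID used by the cluster's oracle resource
    export ORACLE_SID
    # Connect locally as SYSDBA and shut down the stale instance.
    printf '%s\n' 'shutdown immediate;' 'exit;' | sqlplus -S "/ as sysdba"
}
```

Once the stale instance has been shut down, the start can be retried, for example with 'pcs resource debug-start oracle'.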

This is how the problem manifests itself:

# pcs resource debug-start oracle
Error performing operation: Operation not permitted
Operation start for oracle (ocf:heartbeat:oracle) returned 1
 >  stderr: INFO: ORA-01081 error found, trying to cleanup oracle (dbstart_mount output: ORA-01081: cannot start already-running ORACLE - shut it down first)
 >  stderr: ls: cannot access /u01/app/oracle/product/12.1.0/dbhome_1/dbs/lk*: No such file or directory
 >  stderr: ERROR: oracle oradb can not be mounted (status: OPEN)
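
The last stderr line reports the instance status the agent observed (OPEN). To check that status by hand, v$instance can be queried through sqlplus. This is a sketch, assuming it runs as the Oracle software owner with ORACLE_HOME and ORACLE_SID set and sqlplus on PATH; the name ora_instance_status is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the instance status (e.g. STARTED, MOUNTED,
# OPEN) reported by v$instance, or nothing if sqlplus only returns errors.
# The sed filter keeps only lines made of upper-case words.
ora_instance_status() {
    printf '%s\n' \
        'set heading off feedback off pagesize 0' \
        'select status from v$instance;' \
        'exit;' |
    sqlplus -S "/ as sysdba" |
    sed -n 's/^[[:space:]]*\([[:upper:]][[:upper:] ]*[[:upper:]]\)[[:space:]]*$/\1/p' |
    head -n 1
}
```

If no instance is running, sqlplus typically prints an error such as ORA-01034 instead, and the helper prints nothing.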

Comment 2 michal novacek 2016-03-18 09:28:00 UTC
Created attachment 1137726
This patch seems to fix the issue.

Comment 3 Oyvind Albrigtsen 2016-03-18 10:18:16 UTC
Comment on attachment 1137726
This patch seems to fix the issue.

Tested and verified. The patch is available upstream: https://github.com/ClusterLabs/resource-agents/pull/783

Comment 4 Mike McCune 2016-03-28 23:14:23 UTC
This bug was accidentally moved from POST to MODIFIED via an error in automation; please contact mmccune with any questions.

Comment 9 michal novacek 2017-01-23 15:56:54 UTC
I have verified that, with resource-agents-3.9.5-43.el6, the oracle resource
agent is _always_ able to start without the ORA-01081 error after its
processes have been forcefully terminated.

-----

Common setup: a running cluster configured with the oracle group (1), with the
group started (2).

before the patch (resource-agents-3.9.5-34)
===========================================

[root@tardis-02 ~]# pkill -9 ora_
[root@tardis-02 ~]# pgrep ora_
[root@tardis-02 ~]# sleep 30
[root@tardis-02 ~]# pcs resource
 Resource Group: ora-group
     vip        (ocf::heartbeat:IPaddr2):       Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
     halvm      (ocf::heartbeat:LVM):   Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
     fs (ocf::heartbeat:Filesystem):    Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
     oracle     (ocf::heartbeat:oracle):        Stopped

[root@tardis-02 ~]# pcs resource debug-start oracle
Error performing operation: Operation not permitted
Operation start for oracle (ocf:heartbeat:oracle) returned 1
 >  stderr: INFO: ORA-01081 error found, trying to cleanup oracle (dbstart_mount output: ORA-01081: cannot start already-running ORACLE - shut it down first)
 >  stderr: ERROR: oracle oradb can not be mounted (status: OPEN)

after the patch (resource-agents-3.9.5-43)
==========================================

[root@tardis-02 ~]# pkill -9 ora_
[root@tardis-02 ~]# pgrep ora_
[root@tardis-02 ~]# sleep 30
[root@tardis-02 ~]# pcs resource
 Resource Group: ora-group
     vip        (ocf::heartbeat:IPaddr2):       Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
     halvm      (ocf::heartbeat:LVM):   Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
     fs (ocf::heartbeat:Filesystem):    Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
     oracle     (ocf::heartbeat:oracle):        Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
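
The kill-and-recover test above can be scripted roughly as follows. This is a sketch, not the QA team's actual test: it assumes it runs as root on the node hosting ora-group, the resource name oracle comes from the configuration below, the 30-second wait mirrors the manual steps, and the status check assumes the 'pcs resource' line format shown above. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch of the kill-and-recover test shown above.
# WAIT (seconds) defaults to the 30 s used in the manual test.
WAIT="${WAIT:-30}"

oracle_started() {
    # Succeeds when 'pcs resource' lists the oracle resource as Started.
    pcs resource | grep -q '(ocf::heartbeat:oracle):[[:space:]]*Started'
}

kill_and_check_recovery() {
    pkill -9 ora_                      # forcefully kill all ora_* processes
    if pgrep ora_ >/dev/null; then     # make sure none survived
        echo "ora_ processes still running" >&2
        return 2
    fi
    sleep "$WAIT"                      # give the cluster time to react
    if oracle_started; then
        echo "PASS: oracle recovered"
        return 0
    fi
    echo "FAIL: oracle not started; debug-start output follows" >&2
    pcs resource debug-start oracle >&2
    return 1
}
```

In the runs recorded above, resource-agents-3.9.5-34 corresponds to the FAIL branch and 3.9.5-43 to the PASS branch.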

-----
> (1) pcs config
[root@tardis-02 ~]# pcs config
Cluster Name: STSRHTS1683
Corosync Nodes:
 tardis-01.cluster-qe.lab.eng.brq.redhat.com tardis-02.cluster-qe.lab.eng.brq.redhat.com
Pacemaker Nodes:
 tardis-01.cluster-qe.lab.eng.brq.redhat.com tardis-02.cluster-qe.lab.eng.brq.redhat.com

Resources:
 Group: ora-group
  Resource: vip (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: ip=10.34.69.58 cidr_netmask=22
   Operations: start interval=0s timeout=20s (vip-start-interval-0s)
               stop interval=0s timeout=20s (vip-stop-interval-0s)
               monitor interval=30s (vip-monitor-interval-30s)
  Resource: halvm (class=ocf provider=heartbeat type=LVM)
   Attributes: exclusive=true partial_activation=false volgrpname=shared
   Operations: start interval=0s timeout=30 (halvm-start-interval-0s)
               stop interval=0s timeout=30 (halvm-stop-interval-0s)
               monitor interval=10 timeout=30 (halvm-monitor-interval-10)
  Resource: fs (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/shared/shared0 directory=/u01 fstype=ext4 options=
   Operations: start interval=0s timeout=60 (fs-start-interval-0s)
               stop interval=0s timeout=60 (fs-stop-interval-0s)
               monitor interval=30s (fs-monitor-interval-30s)
  Resource: oracle (class=ocf provider=heartbeat type=oracle)
   Attributes: sid=oradb
   Operations: start interval=0s timeout=120 (oracle-start-interval-0s)
               stop interval=0s timeout=120 (oracle-stop-interval-0s)
               monitor interval=30s (oracle-monitor-interval-30s)

Stonith Devices:
 Resource: fence-tardis-01 (class=stonith type=fence_ipmilan)
  Attributes: delay=5 passwd=admin login=admin pcmk_host_check=static-list ipaddr=tardis-01-ilo pcmk_host_list=tardis-01.cluster-qe.lab.eng.brq.redhat.com
  Operations: monitor interval=60s (fence-tardis-01-monitor-interval-60s)
 Resource: fence-tardis-02 (class=stonith type=fence_ipmilan)
  Attributes: passwd=admin login=admin pcmk_host_check=static-list ipaddr=tardis-02-ilo pcmk_host_list=tardis-02.cluster-qe.lab.eng.brq.redhat.com
  Operations: monitor interval=60s (fence-tardis-02-monitor-interval-60s)
Fencing Levels:

Location Constraints:
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 No defaults set
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: cman
 dc-version: 1.1.15-4.el6-e174ec8
 have-watchdog: false
 no-quorum-policy: ignore

> (2) pcs resource
[root@tardis-02 ~]# pcs resource
 Resource Group: ora-group
     vip        (ocf::heartbeat:IPaddr2):       Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
     halvm      (ocf::heartbeat:LVM):   Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
     fs (ocf::heartbeat:Filesystem):    Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
     oracle     (ocf::heartbeat:oracle):        Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
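
For reference, a group equivalent to the resources in (1) could be created with commands along these lines. This is a reconstruction from the 'pcs config' output, not the commands actually used; the stonith devices are omitted, the function name create_ora_group is hypothetical, and pcs syntax may differ slightly between versions.

```shell
#!/bin/sh
# Reconstructed from the 'pcs config' output in (1), resources only.
# Each --group ora-group appends to the group, so creation order is
# also the start order. Stops at the first failing pcs call.
create_ora_group() {
    pcs resource create vip ocf:heartbeat:IPaddr2 \
        ip=10.34.69.58 cidr_netmask=22 \
        op start timeout=20s op stop timeout=20s \
        op monitor interval=30s --group ora-group || return
    pcs resource create halvm ocf:heartbeat:LVM \
        volgrpname=shared exclusive=true partial_activation=false \
        op start timeout=30 op stop timeout=30 \
        op monitor interval=10 timeout=30 --group ora-group || return
    pcs resource create fs ocf:heartbeat:Filesystem \
        device=/dev/shared/shared0 directory=/u01 fstype=ext4 \
        op start timeout=60 op stop timeout=60 \
        op monitor interval=30s --group ora-group || return
    pcs resource create oracle ocf:heartbeat:oracle sid=oradb \
        op start timeout=120 op stop timeout=120 \
        op monitor interval=30s --group ora-group
}
```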

Comment 11 errata-xmlrpc 2017-03-21 09:27:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0602.html