Created attachment 1136972
'pcs cluster report' output

Description of problem:
The oracle resource agent may fail to start Oracle with the following error:

INFO: ORA-01081 error found, trying to cleanup oracle (dbstart_mount output: ORA-01081: cannot start already-running ORACLE - shut it down first)

Version-Release number of selected component (if applicable):
resource-agents-3.9.5-34.el6.x86_64

How reproducible:
Always, once the resource is in that state.

Steps to Reproduce:
1. pcs resource debug-start oracle

Actual results:
Oracle never starts.

Expected results:
Oracle starts successfully.

Additional info:
I believe this happens during recovery tests, after all ora_* processes have been killed and the resource agent has tried to start Oracle on another node. To let the resource agent start Oracle again, 'shutdown immediate;' has to be issued in sqlplus.

This is how the problem manifests itself:

# pcs resource debug-start oracle
Error performing operation: Operation not permitted
Operation start for oracle (ocf:heartbeat:oracle) returned 1
 >  stderr: INFO: ORA-01081 error found, trying to cleanup oracle (dbstart_mount output: ORA-01081: cannot start already-running ORACLE - shut it down first)
 >  stderr: ls: cannot access /u01/app/oracle/product/12.1.0/dbhome_1/dbs/lk*: No such file or directory
 >  stderr: ERROR: oracle oradb can not be mounted (status: OPEN)
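The manual workaround described above can be scripted. This is a minimal sketch, assuming it runs as root on the node where the start failed, that the Oracle software owner is the OS user `oracle` whose login profile exports ORACLE_HOME and ORACLE_SID (oradb here), and that `pcs resource cleanup` is then enough for Pacemaker to retry the start. The function names `is_ora01081_stuck` and `shutdown_stale_instance` are hypothetical helpers, not part of resource-agents.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the manual workaround for this bug.
# Assumptions (not from resource-agents): run as root; the Oracle owner is
# the OS user "oracle" whose login profile exports ORACLE_HOME/ORACLE_SID.

# Return 0 if the given start output shows the stuck state from this bug:
# ORA-01081 is reported AND the database is still considered OPEN.
is_ora01081_stuck() {
    printf '%s\n' "$1" | grep -qF 'ORA-01081' &&
        printf '%s\n' "$1" | grep -qF 'can not be mounted (status: OPEN)'
}

# Issue 'shutdown immediate;' in sqlplus as the Oracle owner (the heredoc is
# passed through su as sqlplus's stdin), then clear the failed start so the
# cluster retries the oracle resource.
shutdown_stale_instance() {
    su - oracle -c 'sqlplus -S / as sysdba' <<'EOF'
shutdown immediate;
exit;
EOF
    pcs resource cleanup oracle
}

# Example usage (not executed here):
#   out=$(pcs resource debug-start oracle 2>&1)
#   is_ora01081_stuck "$out" && shutdown_stale_instance
```

Both strings are checked because ORA-01081 alone is also logged while the agent attempts its own cleanup; the "status: OPEN" error is what marks the case the agent could not recover from in resource-agents-3.9.5-34.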
Created attachment 1137726
This patch seems to fix the issue.
Comment on attachment 1137726
This patch seems to fix the issue.

Tested and verified patch available upstream: https://github.com/ClusterLabs/resource-agents/pull/783
This bug was accidentally moved from POST to MODIFIED via an error in automation; please contact mmccune with any questions.
I have verified that, with resource-agents-3.9.5-43.el6, the oracle resource agent is _always_ able to start without the ORA-01081 error after its processes have been forcefully terminated.

-----

common setup: a running cluster configured with the oracle group (1), with the group started (2)

before the patch (resource-agents-3.9.5-34)
===========================================

[root@tardis-02 ~]# pkill -9 ora_
[root@tardis-02 ~]# pgrep ora_
[root@tardis-02 ~]# sleep 30
[root@tardis-02 ~]# pcs resource
 Resource Group: ora-group
     vip        (ocf::heartbeat:IPaddr2):       Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
     halvm      (ocf::heartbeat:LVM):   Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
     fs         (ocf::heartbeat:Filesystem):    Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
     oracle     (ocf::heartbeat:oracle):        Stopped
[root@tardis-02 ~]# pcs resource debug-start oracle
Error performing operation: Operation not permitted
Operation start for oracle (ocf:heartbeat:oracle) returned 1
 >  stderr: INFO: ORA-01081 error found, trying to cleanup oracle (dbstart_mount output: ORA-01081: cannot start already-running ORACLE - shut it down first)
 >  stderr: ERROR: oracle oradb can not be mounted (status: OPEN)

after the patch (resource-agents-3.9.5-43)
==========================================

[root@tardis-02 ~]# pkill -9 ora_
[root@tardis-02 ~]# pgrep ora_
[root@tardis-02 ~]# sleep 30
[root@tardis-02 ~]# pcs resource
 Resource Group: ora-group
     vip        (ocf::heartbeat:IPaddr2):       Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
     halvm      (ocf::heartbeat:LVM):   Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
     fs         (ocf::heartbeat:Filesystem):    Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
     oracle     (ocf::heartbeat:oracle):        Started tardis-02.cluster-qe.lab.eng.brq.redhat.com

-----

> (1) pcs config

[root@tardis-02 ~]# pcs config
Cluster Name: STSRHTS1683
Corosync Nodes:
 tardis-01.cluster-qe.lab.eng.brq.redhat.com tardis-02.cluster-qe.lab.eng.brq.redhat.com
Pacemaker Nodes:
 tardis-01.cluster-qe.lab.eng.brq.redhat.com tardis-02.cluster-qe.lab.eng.brq.redhat.com

Resources:
 Group: ora-group
  Resource: vip (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: ip=10.34.69.58 cidr_netmask=22
   Operations: start interval=0s timeout=20s (vip-start-interval-0s)
               stop interval=0s timeout=20s (vip-stop-interval-0s)
               monitor interval=30s (vip-monitor-interval-30s)
  Resource: halvm (class=ocf provider=heartbeat type=LVM)
   Attributes: exclusive=true partial_activation=false volgrpname=shared
   Operations: start interval=0s timeout=30 (halvm-start-interval-0s)
               stop interval=0s timeout=30 (halvm-stop-interval-0s)
               monitor interval=10 timeout=30 (halvm-monitor-interval-10)
  Resource: fs (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/shared/shared0 directory=/u01 fstype=ext4 options=
   Operations: start interval=0s timeout=60 (fs-start-interval-0s)
               stop interval=0s timeout=60 (fs-stop-interval-0s)
               monitor interval=30s (fs-monitor-interval-30s)
  Resource: oracle (class=ocf provider=heartbeat type=oracle)
   Attributes: sid=oradb
   Operations: start interval=0s timeout=120 (oracle-start-interval-0s)
               stop interval=0s timeout=120 (oracle-stop-interval-0s)
               monitor interval=30s (oracle-monitor-interval-30s)

Stonith Devices:
 Resource: fence-tardis-01 (class=stonith type=fence_ipmilan)
  Attributes: delay=5 passwd=admin login=admin pcmk_host_check=static-list ipaddr=tardis-01-ilo pcmk_host_list=tardis-01.cluster-qe.lab.eng.brq.redhat.com
  Operations: monitor interval=60s (fence-tardis-01-monitor-interval-60s)
 Resource: fence-tardis-02 (class=stonith type=fence_ipmilan)
  Attributes: passwd=admin login=admin pcmk_host_check=static-list ipaddr=tardis-02-ilo pcmk_host_list=tardis-02.cluster-qe.lab.eng.brq.redhat.com
  Operations: monitor interval=60s (fence-tardis-02-monitor-interval-60s)
Fencing Levels:

Location Constraints:
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 No defaults set
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: cman
 dc-version: 1.1.15-4.el6-e174ec8
 have-watchdog: false
 no-quorum-policy: ignore

> (2) pcs resource

[root@tardis-02 ~]# pcs resource
 Resource Group: ora-group
     vip        (ocf::heartbeat:IPaddr2):       Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
     halvm      (ocf::heartbeat:LVM):   Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
     fs         (ocf::heartbeat:Filesystem):    Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
     oracle     (ocf::heartbeat:oracle):        Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
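The recovery test above (kill every ora_* process, wait, then inspect the group) can be wrapped in a small script so the before/after comparison is repeatable. This is a sketch, assuming it runs as root on the node currently hosting ora-group and that `pcs resource` prints the oracle line in the layout shown above; `oracle_state` and `run_recovery_test` are hypothetical names. The 30-second wait only mirrors the manual steps; with the configuration above (monitor interval=30s, start timeout=120) a slower restart may need a longer wait.

```shell
#!/bin/sh
# Hypothetical wrapper around the manual recovery test shown above.

# Read 'pcs resource' output on stdin and print the state column of the
# oracle resource ("Started" or "Stopped"), e.g. from a line like:
#   oracle  (ocf::heartbeat:oracle):  Started tardis-02...
oracle_state() {
    awk '$1 == "oracle" { print $3; exit }'
}

# Kill all Oracle background processes, wait as in the manual test, and
# report whether the cluster brought the oracle resource back.
# Returns 0 only if the resource is Started again.
run_recovery_test() {
    pkill -9 ora_
    sleep 30
    state=$(pcs resource | oracle_state)
    echo "oracle resource state: ${state:-unknown}"
    [ "$state" = "Started" ]
}
```

Against resource-agents-3.9.5-34 this would be expected to report Stopped, matching the "before the patch" output, and against 3.9.5-43 to report Started.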
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2017-0602.html