Bug 1318240
| Summary: | Oracle resource agent unable to start because of ORA-01081 | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | michal novacek <mnovacek> |
| Component: | resource-agents | Assignee: | Oyvind Albrigtsen <oalbrigt> |
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 6.7 | CC: | agk, cfeist, cluster-maint, fdinitto, tlavigne |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | resource-agents-3.9.5-43.el6 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1318985 (view as bug list) | Environment: | |
| Last Closed: | 2017-03-21 09:27:21 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1318985 | | |
| Attachments: | | | |
Created attachment 1137726 [details]
This patch seems to fix the issue.
Comment on attachment 1137726 [details]
This patch seems to fix the issue.
Tested and verified patch available upstream: https://github.com/ClusterLabs/resource-agents/pull/783
This bug was accidentally moved from POST to MODIFIED via an error in automation, please see mmccune with any questions
I have verified that with resource-agents-3.9.5-43.el6 the oracle resource
agent is _always_ able to start, with no ORA-01081 error, after its processes
have been forcefully terminated.
-----
common setup: running cluster configured with oracle group (1) with the group
started (2)
before the patch (resource-agents-3.9.5-34)
===========================================
[root@tardis-02 ~]# pkill -9 ora_
[root@tardis-02 ~]# pgrep ora_
[root@tardis-02 ~]# sleep 30
[root@tardis-02 ~]# pcs resource
Resource Group: ora-group
vip (ocf::heartbeat:IPaddr2): Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
halvm (ocf::heartbeat:LVM): Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
fs (ocf::heartbeat:Filesystem): Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
oracle (ocf::heartbeat:oracle): Stopped
[root@tardis-02 ~]# pcs resource debug-start oracle
Error performing operation: Operation not permitted
Operation start for oracle (ocf:heartbeat:oracle) returned 1
> stderr: INFO: ORA-01081 error found, trying to cleanup oracle (dbstart_mount output: ORA-01081: cannot start already-running ORACLE - shut it down first)
> stderr: ERROR: oracle oradb can not be mounted (status: OPEN)
after the patch (resource-agents-3.9.5-43)
==========================================
[root@tardis-02 ~]# pkill -9 ora_
[root@tardis-02 ~]# pgrep ora_
[root@tardis-02 ~]# sleep 30
[root@tardis-02 ~]# pcs resource
Resource Group: ora-group
vip (ocf::heartbeat:IPaddr2): Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
halvm (ocf::heartbeat:LVM): Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
fs (ocf::heartbeat:Filesystem): Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
oracle (ocf::heartbeat:oracle): Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
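The before/after check above can be scripted. The following is only a minimal sketch: resource_state and verify_recovery are hypothetical helper names (not part of pcs or resource-agents), and the parsing assumes the `pcs resource` line layout shown in this comment (name, agent, state).

```shell
# Hypothetical helper: read `pcs resource` output on stdin and print the
# state column ("Started", "Stopped", ...) of the named resource.
resource_state() {
    awk -v name="$1" '$1 == name { print $3; exit }'
}

# Hypothetical check mirroring the manual steps: kill the ora_ processes,
# wait (default 30 seconds, as in the transcript), then succeed only if
# the cluster reports the oracle resource as Started again.
verify_recovery() {
    pkill -9 ora_
    sleep "${1:-30}"
    state=$(pcs resource | resource_state oracle)
    [ "$state" = "Started" ]
}
```

On a test node one might run `verify_recovery && echo recovered`; with the unpatched package the check would be expected to fail, since the resource stays Stopped.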
-----
> (1) pcs config
[root@tardis-02 ~]# pcs config
Cluster Name: STSRHTS1683
Corosync Nodes:
tardis-01.cluster-qe.lab.eng.brq.redhat.com tardis-02.cluster-qe.lab.eng.brq.redhat.com
Pacemaker Nodes:
tardis-01.cluster-qe.lab.eng.brq.redhat.com tardis-02.cluster-qe.lab.eng.brq.redhat.com
Resources:
Group: ora-group
Resource: vip (class=ocf provider=heartbeat type=IPaddr2)
Attributes: ip=10.34.69.58 cidr_netmask=22
Operations: start interval=0s timeout=20s (vip-start-interval-0s)
stop interval=0s timeout=20s (vip-stop-interval-0s)
monitor interval=30s (vip-monitor-interval-30s)
Resource: halvm (class=ocf provider=heartbeat type=LVM)
Attributes: exclusive=true partial_activation=false volgrpname=shared
Operations: start interval=0s timeout=30 (halvm-start-interval-0s)
stop interval=0s timeout=30 (halvm-stop-interval-0s)
monitor interval=10 timeout=30 (halvm-monitor-interval-10)
Resource: fs (class=ocf provider=heartbeat type=Filesystem)
Attributes: device=/dev/shared/shared0 directory=/u01 fstype=ext4 options=
Operations: start interval=0s timeout=60 (fs-start-interval-0s)
stop interval=0s timeout=60 (fs-stop-interval-0s)
monitor interval=30s (fs-monitor-interval-30s)
Resource: oracle (class=ocf provider=heartbeat type=oracle)
Attributes: sid=oradb
Operations: start interval=0s timeout=120 (oracle-start-interval-0s)
stop interval=0s timeout=120 (oracle-stop-interval-0s)
monitor interval=30s (oracle-monitor-interval-30s)
Stonith Devices:
Resource: fence-tardis-01 (class=stonith type=fence_ipmilan)
Attributes: delay=5 passwd=admin login=admin pcmk_host_check=static-list ipaddr=tardis-01-ilo pcmk_host_list=tardis-01.cluster-qe.lab.eng.brq.redhat.com
Operations: monitor interval=60s (fence-tardis-01-monitor-interval-60s)
Resource: fence-tardis-02 (class=stonith type=fence_ipmilan)
Attributes: passwd=admin login=admin pcmk_host_check=static-list ipaddr=tardis-02-ilo pcmk_host_list=tardis-02.cluster-qe.lab.eng.brq.redhat.com
Operations: monitor interval=60s (fence-tardis-02-monitor-interval-60s)
Fencing Levels:
Location Constraints:
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:
Alerts:
No alerts defined
Resources Defaults:
No defaults set
Operations Defaults:
No defaults set
Cluster Properties:
cluster-infrastructure: cman
dc-version: 1.1.15-4.el6-e174ec8
have-watchdog: false
no-quorum-policy: ignore
> (2) pcs resource
[root@tardis-02 ~]# pcs resource
Resource Group: ora-group
vip (ocf::heartbeat:IPaddr2): Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
halvm (ocf::heartbeat:LVM): Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
fs (ocf::heartbeat:Filesystem): Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
oracle (ocf::heartbeat:oracle): Started tardis-02.cluster-qe.lab.eng.brq.redhat.com
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHBA-2017-0602.html
Created attachment 1136972 [details]
'pcs cluster report' output

Description of problem:
It might happen that oracle cannot be started by resource agent with the following error:
INFO: ORA-01081 error found, trying to cleanup oracle (dbstart_mount output: ORA-01081: cannot start already-running ORACLE - shut it down first)

Version-Release number of selected component (if applicable):
resource-agents-3.9.5-34.el6.x86_64

How reproducible:
always, once in that state

Steps to Reproduce:
1. pcs resource debug-start oracle

Actual results:
Oracle will never start.

Expected results:
Oracle starts happily.

Additional info:
I believe that this happens with recovery tests after all ora_* processes have been killed and resource agent tried to start Oracle on another node. To enable resource agent to start it again it is necessary to issue 'shutdown immediate;' in sqlplus.

This is how the problem demonstrates itself:
# pcs resource debug-start oracle
Error performing operation: Operation not permitted
Operation start for oracle (ocf:heartbeat:oracle) returned 1
> stderr: INFO: ORA-01081 error found, trying to cleanup oracle (dbstart_mount output: ORA-01081: cannot start already-running ORACLE - shut it down first)
> stderr: ls: cannot access /u01/app/oracle/product/12.1.0/dbhome_1/dbs/lk*: No such file or directory
> stderr: ERROR: oracle oradb can not be mounted (status: OPEN)
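
The manual workaround mentioned under "Additional info" could look roughly like the sketch below. The report only says to issue 'shutdown immediate;' in sqlplus; everything else here is an assumption: that the commands run as the Oracle software owner, that the connection is made as sysdba with OS authentication, and that ORACLE_SID and ORACLE_HOME match the sid and path visible in this bug. shutdown_sql and manual_cleanup are hypothetical helper names.

```shell
# Hypothetical helper: print the sqlplus input for the workaround.
# Only 'shutdown immediate;' comes from the report; connecting as sysdba
# is an assumption about how the instance is administered.
shutdown_sql() {
    printf '%s\n' 'connect / as sysdba' 'shutdown immediate;' 'exit'
}

# Hypothetical wrapper: feed the statements to sqlplus.
# Defaults are taken from this bug (sid=oradb and the dbhome_1 path in the
# stderr output); override them in the environment if they differ.
manual_cleanup() {
    ORACLE_SID="${ORACLE_SID:-oradb}"
    ORACLE_HOME="${ORACLE_HOME:-/u01/app/oracle/product/12.1.0/dbhome_1}"
    export ORACLE_SID ORACLE_HOME
    shutdown_sql | "$ORACLE_HOME/bin/sqlplus" -S /nolog
}
```

After manual_cleanup has run, `pcs resource debug-start oracle` can be retried to see whether the agent is able to start the instance again.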