Description of problem:
oracledb and orainstance implement Oracle instance and listener recovery retries inside get_db_status. If the instance starts successfully, status returns 0. This can lead to a restart loop in scenarios where Oracle dies after being started (or where the resource thinks it was started successfully when it actually failed) for some reason:

1. Oracle instance is running fine.
2. Oracle dies.
3. oracledb.sh or oracleinstance.sh status detects it and restarts it.
4. Oracle dies again.
5. oracledb.sh or oracleinstance.sh status detects it and restarts it.
6. This repeats without end, because oracledb has no maximum number of restarts or any similar limit.

I think instance recovery should be managed by rgmanager, which provides a mechanism to limit the maximum number of restarts in a period of time (max_restarts and restart_expire_time), instead of being handled inside the Oracle status check.

The number of retries inside the resource is fixed by the variable RESTART_RETRIES, so recovery can be disabled by setting it to 0. It could be made settable from cluster.conf through an oracledb or orainstance parameter.

Version-Release number of selected component (if applicable):
rgmanager-2.0.52-9.el5_6.1

How reproducible:
Whenever Oracle dies for any reason just after being started.

Steps to Reproduce:
1. Get a cluster running an Oracle instance.
2. Stop Oracle right after it is started by the resource.

Actual results:
The Oracle instance is restarted on the same node even if it repeatedly dies right after starting.

Expected results:
The service recovery policy should be applied to Oracle resource failures.

Additional info:
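As a rough illustration of the suggestion about making the retry count configurable, the sketch below reads RESTART_RETRIES from a resource parameter. rgmanager passes resource attributes to agents as OCF_RESKEY_<name> environment variables; the parameter name restart_retries, the helper resolve_restart_retries, and the fallback value 3 are assumptions made for this example and do not exist in the shipped agents.

```shell
#!/bin/sh
# Hypothetical sketch: "restart_retries" is NOT an existing oracledb or
# orainstance parameter, and the fallback of 3 is an arbitrary placeholder.

# Print the retry count to use: the value of the (assumed) restart_retries
# parameter if it is a non-negative integer, otherwise the fallback.
resolve_restart_retries() {
    value=${OCF_RESKEY_restart_retries:-3}
    case $value in
        ''|*[!0-9]*) echo 3 ;;        # unset, empty or non-numeric: fallback
        *)           echo "$value" ;; # 0 disables in-agent recovery
    esac
}

RESTART_RETRIES=$(resolve_restart_retries)
```

With something like this, setting restart_retries="0" on the resource would leave recovery entirely to the rgmanager service policy.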
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.
You can have a per-resource policy today:

<orainstance __max_restarts="3" __restart_expire_time="3600" __independent_subtree="1" ... />

Use __independent_subtree="2" if you want it to be allowed to fail gracefully without taking out other resources.
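To make the effect of a restart limit with an expiry window concrete, here is a minimal sketch of that kind of bookkeeping. It is not rgmanager's implementation: the function record_restart, the variable RESTART_TIMES, and the exact boundary behaviour (refuse once the number of remembered restarts reaches the limit) are assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical illustration of a "max restarts within an expiry window"
# policy. Not taken from rgmanager; names and boundaries are assumptions.

RESTART_TIMES=""   # space-separated timestamps (seconds) still remembered

# record_restart NOW MAX_RESTARTS EXPIRE_TIME
# Returns 0 and records NOW if another restart is within the limit.
# Returns 1 if the limit is exhausted and the failure should be escalated.
# Note: POSIX sh has no "local", so the helper variables are global.
record_restart() {
    now=$1; max=$2; expire=$3
    kept=""
    count=0
    # Forget restarts that are older than the expiry window.
    for t in $RESTART_TIMES; do
        if [ $((now - t)) -lt "$expire" ]; then
            kept="$kept $t"
            count=$((count + 1))
        fi
    done
    RESTART_TIMES=$kept
    if [ "$count" -ge "$max" ]; then
        return 1
    fi
    RESTART_TIMES="$RESTART_TIMES $now"
    return 0
}
```

With a limit of 3 and a window of 3600 seconds, three quick restarts are accepted and a fourth is refused; once the earlier restarts are more than an hour old they are forgotten and restarts are accepted again. This is the bounded behaviour the report asks for in place of the endless loop.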