Bug 1516180
Summary: db2 resource agent fails to promote Slave when Master has crashed/failed abruptly
Product: Red Hat Enterprise Linux 7
Reporter: Ondrej Faměra <ofamera>
Component: resource-agents
Assignee: Oyvind Albrigtsen <oalbrigt>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: high
Priority: high
Version: 7.4
CC: agk, aherr, cfeist, cluster-maint, fdinitto, jruemker, mmuzikov, mnovacek, qguo, sbradley
Target Milestone: rc
Keywords: ZStream
Hardware: Unspecified
OS: Unspecified
Fixed In Version: resource-agents-3.9.5-115.el7
Doc Type: If docs needed, set a value
Doc Text:
Previously, the DB2 resource agent failed to promote a Slave resource to Master after the Master had failed abruptly. As a consequence, automatic failover of the Master resource was not possible. With this update, additional keywords for a disconnected peer have been added to the agent. As a result, the DB2 agent detects the resource state correctly, and the described problem no longer occurs.
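To show what "additional keywords for a disconnected peer" means in practice, here is a minimal Python sketch. It assumes the agent maps each HADR status string (ROLE/STATE/CONNECT_STATUS, the format printed in the logs in the description) to a takeover command. The function `takeover_command` and the two state sets are hypothetical. The real agent is a shell script, and its full list of recognized states is not reproduced here. Only the two status strings come from this report's logs. `BY FORCE PEER WINDOW ONLY` is the DB2 takeover variant the description calls "takeover with peer window".

```python
from typing import Optional

# Hypothetical status tables, modeled on the HADR status strings
# (ROLE/STATE/CONNECT_STATUS) printed in this bug's logs. Not the agent's code.
GRACEFUL_TAKEOVER_STATES = {
    "STANDBY/PEER/CONNECTED",
}
PEER_WINDOW_TAKEOVER_STATES = {
    # Reported by the standby after the Master failed abruptly. Without a
    # keyword like this one, the status is not recognized and promote fails.
    "STANDBY/DISCONNECTED_PEER/DISCONNECTED",
}


def takeover_command(db: str, hadr_status: str) -> Optional[str]:
    """Map an HADR status to a DB2 CLP takeover command, or None if unknown."""
    status = hadr_status.strip().upper()
    if status in GRACEFUL_TAKEOVER_STATES:
        # Primary still reachable: ordinary role switch.
        return f"db2 takeover hadr on db {db}"
    if status in PEER_WINDOW_TAKEOVER_STATES:
        # Primary unreachable: forced takeover, permitted only within the peer window.
        return f"db2 takeover hadr on db {db} by force peer window only"
    # Unrecognized status: the caller should treat the promote as failed.
    return None
```

Keeping the states in explicit sets makes the fix a one-line addition to a table instead of a change to control flow. That matches the Doc Text's description of the update as added keywords.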
Clones: 1521019
Last Closed: 2018-04-10 12:09:28 UTC
Type: Bug
Bug Blocks: 1521019
Description
Ondrej Faměra
2017-11-22 08:36:21 UTC
Created attachment 1357296
extra debugging seen in outputs from description
== Additional information:
DB2 11.1 on the same RHEL 7.4 system experiences the same issue. Note that the original package now at least shows the first error message it failed with. The code still relies on the second for loop to get the takeover with peer window, and that works with the patched package.

=== original package - promote fails
Nov 22 13:56:41 [1310] fastvm-rhel-7-4-164 lrmd: info: log_execute: executing - rsc:DB2_HADR action:promote call_id:26
Nov 22 13:56:45 db2(DB2_HADR)[4776]: INFO: DB2 database db2inst1(0)/sample has HADR status STANDBY/PEER/CONNECTED and will be promoted
Nov 22 13:56:46 [1308] fastvm-rhel-7-4-164 cib: info: cib_process_ping: Reporting our current digest to fastvm-rhel-7-4-164: b94a9db433150f5a8109e2fe6bc33eba for 0.10.90 (0x5574388c19f0 0)
Nov 22 13:56:49 db2(DB2_HADR)[4776]: ERROR: DB2 database db2inst1(0)/sample promote failed: SQL1770N Takeover HADR cannot complete. Reason code = "1".
Nov 22 13:56:49 [1310] fastvm-rhel-7-4-164 lrmd: info: log_finished: finished - rsc:DB2_HADR action:promote call_id:26 pid:4776 exit-code:1 exec-time:7622ms queue-time:0ms
Nov 22 13:56:49 [1314] fastvm-rhel-7-4-164 crmd: notice: process_lrm_event: Result of promote operation for DB2_HADR on fastvm-rhel-7-4-164: 1 (unknown error) | call=26 key=DB2_HADR_promote_0 confirmed=true cib-update=47

=== updated package with patch proposed to upstream - promote works
Nov 22 14:03:11 [1310] fastvm-rhel-7-4-164 lrmd: info: log_execute: executing - rsc:DB2_HADR action:promote call_id:68
Nov 22 14:03:11 db2(DB2_HADR)[14174]: INFO: DB2 database db2inst1(0)/sample has HADR status STANDBY/PEER/CONNECTED and will be promoted
Nov 22 14:03:15 [1308] fastvm-rhel-7-4-164 cib: info: cib_process_ping: Reporting our current digest to fastvm-rhel-7-4-164: 193e41f6eded9a5c61cdcc23d4c8f925 for 0.10.151 (0x55743851a0d0 0)
Nov 22 14:03:28 db2(DB2_HADR)[14174]: INFO: DB2 database db2inst1(0)/sample has HADR status STANDBY/DISCONNECTED_PEER/DISCONNECTED and will be promoted
Nov 22 14:03:40 db2(DB2_HADR)[14174]: INFO: DB20000I The ARCHIVE LOG command completed successfully.
Nov 22 14:03:40 [1310] fastvm-rhel-7-4-164 lrmd: info: log_finished: finished - rsc:DB2_HADR action:promote call_id:68 pid:14174 exit-code:0 exec-time:29627ms queue-time:0ms

Marking Verified as SanityOnly in version resource-agents-3.9.5-119.el7.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0757
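The description notes that the patched agent still depends on a second pass through its for loop to reach the takeover with peer window. The patched-package log fits that pattern: the status is first STANDBY/PEER/CONNECTED, and on the next read it is STANDBY/DISCONNECTED_PEER/DISCONNECTED. Below is a minimal Python model of such a retry, assuming two passes that re-read the HADR status each time. The names `promote`, `pick_command`, `read_status`, and `run_takeover`, and the two-pass limit, are hypothetical illustrations, not the shell agent's actual code.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical model of a two-pass promote; not the resource agent's code.
PEER_WINDOW_STATES = {"STANDBY/DISCONNECTED_PEER/DISCONNECTED"}


def pick_command(db: str, status: str) -> Optional[str]:
    """Choose a takeover command for a status seen in this bug's logs."""
    if status == "STANDBY/PEER/CONNECTED":
        return f"db2 takeover hadr on db {db}"
    if status in PEER_WINDOW_STATES:
        return f"db2 takeover hadr on db {db} by force peer window only"
    return None


def promote(
    db: str,
    read_status: Callable[[], str],
    run_takeover: Callable[[str], Tuple[bool, str]],
    log: List[str],
    passes: int = 2,
) -> bool:
    """Try the takeover up to `passes` times, re-reading the HADR status each
    time so a later pass can see that the peer has become disconnected."""
    for _ in range(passes):
        status = read_status()
        log.append(f"HADR status {status} and will be promoted")
        cmd = pick_command(db, status)
        if cmd is None:
            # Status not in the table: give up, as an unpatched agent would.
            log.append(f"unhandled HADR status {status}")
            return False
        ok, output = run_takeover(cmd)
        if ok:
            return True
        log.append(f"promote failed: {output}")
    return False
```

For example, suppose `read_status` returns the two statuses from the patched-package log in order, and a `run_takeover` stub rejects the plain takeover with the SQL1770N message. The first pass then fails, and the second succeeds with the peer-window takeover. Remove `STANDBY/DISCONNECTED_PEER/DISCONNECTED` from `PEER_WINDOW_STATES` and the second pass stops at "unhandled HADR status" instead.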