Bug 2010084

Summary: VirtualDomain migration run twice and resource failed
Product: Red Hat Enterprise Linux 8
Reporter: tbsky <tbskyd>
Component: pacemaker
Assignee: Reid Wahl <nwahl>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: medium
Docs Contact: Steven J. Levine <slevine>
Priority: unspecified
Version: 8.5
CC: cluster-maint, kgaillot, msmazova, nwahl, slevine
Target Milestone: rc
Keywords: Reopened, Triaged
Target Release: 8.9
Flags: pm-rhel: mirror+
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: pacemaker-2.1.6-1.el8
Doc Type: Bug Fix
Doc Text:
.Pacemaker Designated Controller elections no longer finalized until all pending actions are complete
When a cluster elects a new Designated Controller (DC), all nodes send their current history to the new DC, which saves it to the CIB. As a consequence, if actions were already in progress when a new DC is elected, and the actions finish after the nodes send their current history to the new DC, the actions' results could be lost. With this fix, DC elections are not finalized until all pending actions are complete and no action results are lost.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-11-14 15:32:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
- migration log at source node (flags: none)
- migration log at destination node (flags: none)

Description tbsky 2021-10-03 15:28:52 UTC
Created attachment 1828623 [details]
migration log at source node

Description of problem:

Sometimes pacemaker decides to run a VirtualDomain live migration twice. The second migration fails and causes the resource to be stopped.

Please check the migration-source.log file logged at node-1. Resource kvm-16 starts migrate_to at "09:15:32" and succeeds at "09:15:33", but later at "09:15:33" pacemaker wants to migrate it again. That fails because the VM is 'Not active locally'.

Check the migration-destination.log logged at node-2. The first migration seems to be ignored there since the node is busy doing other things.

Maybe the situation is like bug 2000595?


Version-Release number of selected component (if applicable):
pacemaker-2.1.0-8.el8.x86_64


How reproducible:
Sometimes

Steps to Reproduce:
1. Create VMs and migrate them between different nodes concurrently for testing (a rough sketch of such a test loop follows below).
2. Sometimes a migration fails.
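
For illustration only, a rough sketch of such a test loop (not the reporter's actual scripts, which also create and delete VMs; the resource and node names are taken from the attached logs):

    # Run a loop like this on each node at the same time, each node moving its own
    # VirtualDomain resources back and forth between the two cluster nodes.
    while true; do
        pcs resource move kvm-16 love-2-test.example.com
        sleep 60
        pcs resource clear kvm-16
        pcs resource move kvm-16 love-1-test.example.com
        sleep 60
        pcs resource clear kvm-16
    done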

Actual results:
Migration sometimes fails.

Expected results:
Migration always succeeds.

Additional info:

Comment 1 tbsky 2021-10-03 15:37:33 UTC
Created attachment 1828624 [details]
migration log at destination node

Comment 2 Ken Gaillot 2021-10-04 14:36:09 UTC
*** Bug 2010086 has been marked as a duplicate of this bug. ***

Comment 3 Ken Gaillot 2021-10-04 15:32:56 UTC
(In reply to tbsky from comment #0)
> Please check the migration-source.log file logged at node-1. Resource kvm-16
> starts migrate_to at "09:15:32" and succeeds at "09:15:33", but later at
> "09:15:33" pacemaker wants to migrate it again. That fails because the VM is
> 'Not active locally'.
> 
> Check the migration-destination.log logged at node-2. The first migration
> seems to be ignored there since the node is busy doing other things.
> 
> Maybe the situation is like bug 2000595?

The situation has the same trigger: the DC election is being repeated frequently, which causes each node to re-record its resource history in the CIB status section. It's a timing issue that goes like this:

Oct 02 09:15:32 love-1-test.example.com pacemaker-controld  [3325405] (te_rsc_command)  notice: Initiating migrate_to operation kvm-16_migrate_to_0 on love-2-test.example.com | action 41

^ The first migration is started


Oct 02 09:15:32 love-1-test.example.com pacemaker-based     [3325400] (cib_process_request)     info: Completed cib_replace operation for section configuration: OK (rc=0, origin=love-1-test.example.com/cibadmin/2, version=0.29071.0)
...
Oct 02 09:15:32 love-1-test.example.com pacemaker-controld  [3325405] (do_state_transition)     info: State transition S_TRANSITION_ENGINE -> S_ELECTION | input=I_ELECTION cause=C_FSA_INTERNAL origin=do_cib_replaced

^ Something modifies the configuration with a full replacement (likely a pcs command, I'm guessing in a script or cron somewhere), which triggers a new DC election.


Oct 02 09:15:33 love-1-test.example.com pacemaker-controld  [3325405] (process_graph_event)     info: Transition 14127 action 41 (kvm-16_migrate_to_0 on love-2-test.example.com) confirmed: ok | rc=0 call-id=19369

^ The migration completes successfully and is recorded in the resource history


Oct 02 09:15:33 love-1-test.example.com pacemaker-controld  [3325405] (controld_delete_node_state)      info: Deleting resource history for node love-2-test.example.com (via CIB call 142828) | xpath=//node_state[@uname='love-2-test.example.com']/lrm

^ However the node's resource history is immediately erased due to the election


Oct 02 09:15:33 love-1-test.example.com pacemaker-based     [3325400] (cib_perform_op)  info: ++                                       <lrm_resource id="kvm-16" type="VirtualDomain" class="ocf" provider="heartbeat">
Oct 02 09:15:33 love-1-test.example.com pacemaker-based     [3325400] (cib_perform_op)  info: ++                                         <lrm_rsc_op id="kvm-16_last_failure_0" operation_key="kvm-16_monitor_0" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.11.0" transition-key="33:14062:7:4a7f8335-f28c-4644-9e9d-fa408252ce19" transition-magic="0:0;33:14062:7:4a7f8335-f28c-4644-9e9d-fa408252ce19" exit-reason="" on_node="love-2-test.example.com" call-id="19236" rc-code="0" op-status=
Oct 02 09:15:33 love-1-test.example.com pacemaker-based     [3325400] (cib_perform_op)  info: ++                                         <lrm_rsc_op id="kvm-16_last_0" operation_key="kvm-16_start_0" operation="start" crm-debug-origin="build_active_RAs" crm_feature_set="3.11.0" transition-key="36:14123:0:4a7f8335-f28c-4644-9e9d-fa408252ce19" transition-magic="0:0;36:14123:0:4a7f8335-f28c-4644-9e9d-fa408252ce19" exit-reason="" on_node="love-2-test.example.com" call-id="19364" rc-code="0" op-status="0" interval
Oct 02 09:15:33 love-1-test.example.com pacemaker-based     [3325400] (cib_perform_op)  info: ++                                       </lrm_resource>

^ When the node updates its resource history, it doesn't include the migration result, so the cluster thinks the migration still needs to be done


I suspect this is a timing issue related to the election-triggered history refresh and the migration action happening at the same time. The sequence appears to be:

1. The migration command completes and a new DC election is initiated around the same time.
2. For the election, the node grabs its resource history and sends it to the DC (before the migration result has been recorded).
3. For the migration, the node receives the result and records it in the resource history.
4. The DC receives the originally sent resource history, and overwrites the (newer) resource history in the CIB with it.

The fix for bz2000595 (erasing the old history and overwriting it with the new in one operation) won't help here since the problem is a result arriving after the new history has already been grabbed. I'm not sure at this point what the fix should be. Perhaps the newly elected DC should re-request the resource history if the most recent action result arrived after the timestamp in the history.

Comment 4 Ken Gaillot 2021-10-04 15:34:32 UTC
The best workaround in the meantime will be to identify what's repeatedly replacing the CIB configuration, and see if there's a way to reduce that frequency.
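
A hedged sketch of how to look for that in the logs, assuming the default pacemaker detail log location and the message text shown in comment 3 (both are assumptions; adjust for your environment):

    # Find full CIB configuration replacements; each one triggers a new DC election.
    # The "origin=" field of each match (e.g. origin=love-1-test.example.com/cibadmin/2)
    # shows which node and client issued the replacement.
    grep 'Completed cib_replace operation for section configuration' \
        /var/log/pacemaker/pacemaker.log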

Comment 5 tbsky 2021-10-04 15:55:53 UTC
Thanks again for the detailed analysis. I run scripts that create/migrate/delete VMs on two nodes to test cluster stability. The scripts lock each other, so at any time only one "pcs" command can run in the whole cluster. Unfortunately the result needs time to spread, so locking didn't prevent the race.

To work around bz2000595 I run "crm_mon -1" multiple times until I get the correct output, but this kind of race seems hard to work around.
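
For illustration, a minimal sketch of that kind of cross-cluster serialization, assuming the scripts wrap each pcs call in flock on shared storage (the lock file path is hypothetical and the actual scripts may differ):

    # Only one pcs command runs cluster-wide at a time, but the lock is released as
    # soon as pcs returns; nothing waits for the resulting transition to finish, so
    # a DC re-election can still race with an in-flight migration as described in
    # comment 3.
    flock /shared/pcs.lock pcs resource move kvm-16 love-2-test.example.com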

Comment 6 Ken Gaillot 2021-10-04 16:05:48 UTC
One possibility would be to use the --wait option with the pcs commands (the ones that support it). That will wait until the cluster settles before returning. Another way of doing the same thing would be to run "crm_resource --wait".
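
A hedged example of what that could look like (resource and node names are taken from the logs; check the pcs man page for which subcommands accept --wait in your version):

    # Move the resource and block until the cluster settles (or the timeout expires).
    pcs resource move kvm-16 love-2-test.example.com --wait=120

    # Alternatively, after any command, wait until all pending cluster actions complete.
    crm_resource --wait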

Comment 7 tbsky 2021-10-04 23:59:00 UTC
Indeed, pcs has many commands with a "--wait" option. I will try testing with it. Thanks a lot for the hint!

Comment 9 Reid Wahl 2023-03-23 23:18:44 UTC
Fixed upstream by commit 39a497e

Comment 10 tbsky 2023-03-24 01:48:15 UTC
Thanks a lot for the fix!
The story behind the commit is a good read.
It is great that we no longer need to keep the race in mind when doing these jobs.

Comment 11 RHEL Program Management 2023-04-03 07:28:07 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 12 Reid Wahl 2023-04-03 08:08:04 UTC
Why does it keep doing this...

Comment 22 Markéta Smazová 2023-07-19 10:24:00 UTC
Verified as SanityOnly in pacemaker-2.1.6-1.el8

Comment 25 errata-xmlrpc 2023-11-14 15:32:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (pacemaker bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:6970