Created attachment 1828623 [details] migration log at source node

Description of problem:

Sometimes pacemaker decides to run a VirtualDomain live migration twice. The
second migration fails and causes the resource to be stopped.

Please check the migration-source.log file logged at node-1. Resource kvm-16
starts migrate_to at "09:15:32" and succeeds at "09:15:33", but later at
"09:15:33" pacemaker wants to migrate again. That attempt fails because the
VM is 'Not active locally'.

Check the migration-destination.log logged at node-2. The first migration
seems to be ignored since the node is busy doing other things.

Maybe the situation is like bug 2000595?

Version-Release number of selected component (if applicable):

pacemaker-2.1.0-8.el8.x86_64

How reproducible:

sometimes

Steps to Reproduce:
1. Create VMs and migrate them between different nodes concurrently for
   testing (a sketch of such a loop is under "Additional info" below).
2. Sometimes a migration fails.

Actual results:

Migration sometimes fails.

Expected results:

Migration always succeeds.

Additional info:
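For reference, the concurrent test in the steps above can be approximated with
a loop like the following on each node. This is only a sketch: the resource
name (kvm-16) and node name (love-2-test.example.com) are taken from the
attached logs, and the timings are arbitrary.

    # Run on node-1 while a mirror-image loop runs on node-2, so that live
    # migrations are triggered from both sides concurrently.
    while true; do
        pcs resource move kvm-16 love-2-test.example.com  # triggers migrate_to
        sleep 10
        pcs resource clear kvm-16                         # drop the move constraint
        sleep 10
    done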
Created attachment 1828624 [details] migration log at destination node
*** Bug 2010086 has been marked as a duplicate of this bug. ***
(In reply to tbsky from comment #0)
> Please check the migration-source.log file logged at node-1. Resource kvm-16
> starts migrate_to at "09:15:32" and succeeds at "09:15:33", but later at
> "09:15:33" pacemaker wants to migrate again. That attempt fails because the
> VM is 'Not active locally'.
>
> Check the migration-destination.log logged at node-2. The first migration
> seems to be ignored since the node is busy doing other things.
>
> Maybe the situation is like bug 2000595?

The situation has the same trigger: the DC election is being repeated
frequently, which causes each node to re-record its resource history in the
CIB status section. It's a timing issue that goes like this:

Oct 02 09:15:32 love-1-test.example.com pacemaker-controld [3325405] (te_rsc_command) notice: Initiating migrate_to operation kvm-16_migrate_to_0 on love-2-test.example.com | action 41

^ The first migration is started

Oct 02 09:15:32 love-1-test.example.com pacemaker-based [3325400] (cib_process_request) info: Completed cib_replace operation for section configuration: OK (rc=0, origin=love-1-test.example.com/cibadmin/2, version=0.29071.0)
...
Oct 02 09:15:32 love-1-test.example.com pacemaker-controld [3325405] (do_state_transition) info: State transition S_TRANSITION_ENGINE -> S_ELECTION | input=I_ELECTION cause=C_FSA_INTERNAL origin=do_cib_replaced

^ Something modifies the configuration with a full replacement (likely a pcs
command, I'm guessing in a script or cron somewhere), which triggers a new DC
election

Oct 02 09:15:33 love-1-test.example.com pacemaker-controld [3325405] (process_graph_event) info: Transition 14127 action 41 (kvm-16_migrate_to_0 on love-2-test.example.com) confirmed: ok | rc=0 call-id=19369

^ The migration completes successfully and is recorded in the resource history

Oct 02 09:15:33 love-1-test.example.com pacemaker-controld [3325405] (controld_delete_node_state) info: Deleting resource history for node love-2-test.example.com (via CIB call 142828) | xpath=//node_state[@uname='love-2-test.example.com']/lrm

^ However, the node's resource history is immediately erased due to the
election

Oct 02 09:15:33 love-1-test.example.com pacemaker-based [3325400] (cib_perform_op) info: ++ <lrm_resource id="kvm-16" type="VirtualDomain" class="ocf" provider="heartbeat">
Oct 02 09:15:33 love-1-test.example.com pacemaker-based [3325400] (cib_perform_op) info: ++ <lrm_rsc_op id="kvm-16_last_failure_0" operation_key="kvm-16_monitor_0" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.11.0" transition-key="33:14062:7:4a7f8335-f28c-4644-9e9d-fa408252ce19" transition-magic="0:0;33:14062:7:4a7f8335-f28c-4644-9e9d-fa408252ce19" exit-reason="" on_node="love-2-test.example.com" call-id="19236" rc-code="0" op-status=
Oct 02 09:15:33 love-1-test.example.com pacemaker-based [3325400] (cib_perform_op) info: ++ <lrm_rsc_op id="kvm-16_last_0" operation_key="kvm-16_start_0" operation="start" crm-debug-origin="build_active_RAs" crm_feature_set="3.11.0" transition-key="36:14123:0:4a7f8335-f28c-4644-9e9d-fa408252ce19" transition-magic="0:0;36:14123:0:4a7f8335-f28c-4644-9e9d-fa408252ce19" exit-reason="" on_node="love-2-test.example.com" call-id="19364" rc-code="0" op-status="0" interval
Oct 02 09:15:33 love-1-test.example.com pacemaker-based [3325400] (cib_perform_op) info: ++ </lrm_resource>

^ When the node updates its resource history, it doesn't include the migration
result, so the cluster thinks the migration still needs to be done

I suspect this is a timing issue: the election-triggered history refresh and
the migration action happen at the same time. The sequence appears to be:

1. The migration command completes, and a new DC election is initiated around
   the same time.
2. For the election, the node grabs its resource history and sends it to the
   DC (before the migration result has been recorded).
3. For the migration, the node receives the result and records it in the
   resource history.
4. The DC receives the originally sent resource history and overwrites the
   (newer) resource history in the CIB with it.

The fix for bz2000595 (erasing the old history and overwriting it with the new
in one operation) won't help here, since the problem is a result arriving
after the new history has already been grabbed.

I'm not sure at this point what the fix should be. Perhaps the newly elected
DC should re-request the resource history if the most recent action result
arrived after the timestamp in the history.
The best workaround in the meantime will be to identify what's repeatedly
replacing the CIB configuration, and see if there's a way to reduce that
frequency.
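One way to spot the replacements (a sketch; the log path is the pacemaker 2.x
default on RHEL 8 and may differ on your systems) is to grep for the
cib_replace operations visible in the logs above; the origin= field shows
which node and client issued each replacement:

    # Full-CIB replacements appear as cib_replace operations; the origin=
    # field identifies the node and client (e.g. cibadmin) that sent them.
    grep 'Completed cib_replace operation' /var/log/pacemaker/pacemaker.log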
thanks again for the detailed analysis. I run scripts to create/migrate/delete
VMs at two nodes for testing cluster stability. the scripts lock each other,
so at any time only one "pcs" command can run in the whole cluster.
unfortunately the result needs time to spread, so locking didn't prevent the
race. to work around bz2000595 I run "crm_mon -1" multiple times until I get
the correct output. but this kind of race seems hard to work around.
One possibility would be to use the --wait option with the pcs commands (the ones that support it). That will wait until the cluster settles before returning. Another way of doing the same thing would be to run "crm_resource --wait".
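For example (a sketch only; the resource and node names are taken from the
logs above, and the 60-second timeout is arbitrary):

    # Move the resource and block until the cluster settles; pcs reports an
    # error if the cluster has not settled within the timeout.
    pcs resource move kvm-16 love-2-test.example.com --wait=60

    # Alternatively, after any pcs command, wait for all pending actions:
    crm_resource --wait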
indeed, pcs has many commands with a "--wait" option. I will try to test with
it. thanks a lot for the hint!
Fixed upstream by commit 39a497e
thanks a lot for the fix! the story in the commit message is a good read. it
is great that we no longer need to keep this race in mind when doing jobs.
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.
Why does it keep doing this...
Verified as SanityOnly in pacemaker-2.1.6-1.el8.