Could we reset here and start with a fresh description of the problem (ideally in the shape of "I did X, I expected Y, I got Z")? It's getting hard to follow with all the merged patches and passing/failing jobs in the 18 comments before this one.
Comment 20 Sofer Athlan-Guyot
2023-02-21 14:39:56 UTC
Hi,
Agreed. We're going to add a test that runs after the reboot and verifies the state of the VM after migration. That will make debugging easier if the problem still happens.
Meanwhile, closing this, as CI doesn't *seem* to fail.
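A minimal sketch of the kind of post-reboot check described above, written in shell to match the job's own commands. It assumes admin credentials are already sourced and that `openstack server show` exposes the `OS-EXT-STS:vm_state` and `OS-EXT-SRV-ATTR:host` columns (the job's jq query already reads the latter). The `server_field` and `check_instance` names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: verify an instance's state after a reboot / live migration.
# Assumes admin credentials are sourced and that `openstack server show`
# exposes the OS-EXT-STS:vm_state and OS-EXT-SRV-ATTR:host columns.

# Print a single column of `openstack server show` as a bare value.
server_field() {
    local server_id=$1 column=$2
    openstack server show "$server_id" -f value -c "$column"
}

# check_instance SERVER_ID EXPECTED_HOST
# Returns 0 if the instance is active on EXPECTED_HOST, 1 otherwise.
check_instance() {
    local server_id=$1 expected_host=$2
    local vm_state host
    vm_state=$(server_field "$server_id" "OS-EXT-STS:vm_state") || return 1
    host=$(server_field "$server_id" "OS-EXT-SRV-ATTR:host") || return 1

    if [ "$vm_state" = "error" ]; then
        # Nova rejects live migration (HTTP 409) while vm_state is error,
        # so report this explicitly instead of retrying blindly.
        echo "FAIL: $server_id is in vm_state error on $host" >&2
        return 1
    fi
    if [ "$vm_state" != "active" ] || [ "$host" != "$expected_host" ]; then
        echo "FAIL: $server_id is $vm_state on $host, expected active on $expected_host" >&2
        return 1
    fi
    echo "OK: $server_id is active on $host"
}
```

Run after each migration step, e.g. `check_instance 80cbc604-2f06-4a96-bd7b-b9b8fe361e64 compute-1.redhat.local`, so a failure points at the step where the instance entered the error state.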
Description of problem:
os-migrateLive fails during an Overcloud reboot (after a successful update):

2022-08-29 18:12:34.506 | failed: [compute-1 -> undercloud-0] (item=80cbc604-2f06-4a96-bd7b-b9b8fe361e64) => {
2022-08-29 18:12:34.512 |     "ansible_loop_var": "item",
2022-08-29 18:12:34.518 |     "attempts": 30,
2022-08-29 18:12:34.522 |     "changed": true,
2022-08-29 18:12:34.527 |     "cmd": "source /home/stack/qe-Cloud-0rc\nnova live-migration 80cbc604-2f06-4a96-bd7b-b9b8fe361e64 compute-1.redhat.local\nopenstack server show 80cbc604-2f06-4a96-bd7b-b9b8fe361e64 -f json | jq -r -c '. | .[\"OS-EXT-SRV-ATTR:host\"]'\n",
2022-08-29 18:12:34.533 |     "delta": "0:00:04.419188",
2022-08-29 18:12:34.538 |     "end": "2022-08-29 18:12:34.457701",
2022-08-29 18:12:34.541 |     "item": "80cbc604-2f06-4a96-bd7b-b9b8fe361e64",
2022-08-29 18:12:34.544 |     "rc": 0,
2022-08-29 18:12:34.547 |     "start": "2022-08-29 18:12:30.038513"
2022-08-29 18:12:34.549 | }
2022-08-29 18:12:34.551 |
2022-08-29 18:12:34.554 | STDOUT:
2022-08-29 18:12:34.557 |
2022-08-29 18:12:34.560 | compute-0.redhat.local
2022-08-29 18:12:34.562 |
2022-08-29 18:12:34.563 |
2022-08-29 18:12:34.566 | STDERR:
2022-08-29 18:12:34.568 |
2022-08-29 18:12:34.570 | ERROR (Conflict): Cannot 'os-migrateLive' instance 80cbc604-2f06-4a96-bd7b-b9b8fe361e64 while it is in vm_state error (HTTP 409) (Request-ID: req-b3f1f6a4-60f6-4757-98a2-8369e6ebdd45)
2022-08-29 18:12:34.575 |
2022-08-29 18:12:34.581 | NO MORE HOSTS LEFT *************************************************************
2022-08-29 18:12:34.585 |
2022-08-29 18:12:34.589 | PLAY RECAP *********************************************************************
2022-08-29 18:12:34.592 | ceph-0 : ok=19 changed=11 unreachable=0 failed=0 skipped=103 rescued=0 ignored=3
2022-08-29 18:12:34.594 | ceph-1 : ok=19 changed=11 unreachable=0 failed=0 skipped=103 rescued=0 ignored=3
2022-08-29 18:12:34.597 | ceph-2 : ok=19 changed=11 unreachable=0 failed=0 skipped=103 rescued=0 ignored=3
2022-08-29 18:12:34.599 | compute-0 : ok=17 changed=10 unreachable=0 failed=0 skipped=105 rescued=0 ignored=2
2022-08-29 18:12:34.605 | compute-1 : ok=19 changed=12 unreachable=0 failed=1 skipped=102 rescued=0 ignored=2

Version-Release number of selected component (if applicable):
RHOS-17.0-RHEL-9-20220729.n.2 to RHOS-17.0-RHEL-9-20220825.n.1

How reproducible:
Perform a reboot after a successful Overcloud update.

Steps to Reproduce:
1. Deploy the Overcloud.
2. Update the Overcloud (for now, 17.0 to 17.0).
3. Live-migrate the VMs from compute-1 to compute-0 (nova host-evacuate-live).
4. Reboot compute-1.
5. After reboot...
6. Migrate the VMs back to compute-1 (from compute-0).
7. ERROR: the VM instances are not migrated back to compute-1 properly.

Actual results:
Live migration fails with the errors shown above; the instance is still reported on compute-0 and is in vm_state error.

Expected results:
Success: the VMs are live-migrated back to compute-1.

Additional info:
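The job in the log above kept retrying (`"attempts": 30`), yet once an instance is in vm_state error nova rejects every live migration request with HTTP 409, so further retries cannot succeed. Note also that `"rc": 0` comes from the last command in the task's script (the `openstack server show | jq` pipeline), not from `nova live-migration`. A minimal sketch of a fail-fast variant of step 6, assuming admin credentials are sourced and the same `openstack server show` columns; `migrate_back`, `MAX_ATTEMPTS` and `POLL_SECONDS` are hypothetical names, and the 10-second poll interval is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Sketch only: fail-fast version of the "migrate back" retry loop (step 6).
# Assumes admin credentials are sourced; MAX_ATTEMPTS and POLL_SECONDS are
# hypothetical tuning knobs, overridable from the environment.
MAX_ATTEMPTS=${MAX_ATTEMPTS:-30}
POLL_SECONDS=${POLL_SECONDS:-10}

# migrate_back SERVER_ID TARGET_HOST
# Returns 0 once the instance is active on TARGET_HOST, 1 on error/timeout.
migrate_back() {
    local server_id=$1 target=$2 attempt vm_state host
    for ((attempt = 1; attempt <= MAX_ATTEMPTS; attempt++)); do
        # Decide based on the instance's reported state, not on CLI exit codes.
        vm_state=$(openstack server show "$server_id" -f value -c OS-EXT-STS:vm_state)
        host=$(openstack server show "$server_id" -f value -c OS-EXT-SRV-ATTR:host)
        if [ "$host" = "$target" ] && [ "$vm_state" = "active" ]; then
            echo "$server_id is active on $target after $attempt attempt(s)"
            return 0
        fi
        if [ "$vm_state" = "error" ]; then
            # Every request would get HTTP 409 from here on; stop retrying.
            echo "$server_id is in vm_state error on $host; not retrying" >&2
            return 1
        fi
        # A rejected request (e.g. a migration still in progress) is
        # tolerated; the next iteration re-reads the state.
        nova live-migration "$server_id" "$target" || true
        sleep "$POLL_SECONDS"
    done
    echo "$server_id did not reach $target after $MAX_ATTEMPTS attempts" >&2
    return 1
}
```

For example, `migrate_back 80cbc604-2f06-4a96-bd7b-b9b8fe361e64 compute-1.redhat.local` would stop at the first check that sees vm_state error, leaving the instance untouched for inspection instead of burning the remaining attempts.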