
Bug 2122654

Summary: os-migrateLive errors during an Overcloud reboot (after successful update)
Product: Red Hat OpenStack Reporter: Owen McGonagle <omcgonag>
Component: openstack-nova Assignee: Owen McGonagle <omcgonag>
Status: CLOSED WORKSFORME QA Contact: Archana Singh <arcsingh>
Severity: high Docs Contact:
Priority: medium    
Version: 17.0 (Wallaby) CC: alifshit, dasmith, eglynn, jgrosso, jhakimra, jpretori, kchamart, kthakre, mciecier, sathlang, sbauza, sgordon, smooney, twilson, vromanso
Target Milestone: --- Keywords: TestOnly, Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-02-21 14:39:56 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2085583    
Bug Blocks:    

Description Owen McGonagle 2022-08-30 14:25:58 UTC
Description of problem:

os-migrateLive errors during an Overcloud reboot (after successful update)

2022-08-29 18:12:34.506 | failed: [compute-1 -> undercloud-0] (item=80cbc604-2f06-4a96-bd7b-b9b8fe361e64) => {
2022-08-29 18:12:34.512 |     "ansible_loop_var": "item",
2022-08-29 18:12:34.518 |     "attempts": 30,
2022-08-29 18:12:34.522 |     "changed": true,
2022-08-29 18:12:34.527 |     "cmd": "source /home/stack/qe-Cloud-0rc\nnova live-migration  80cbc604-2f06-4a96-bd7b-b9b8fe361e64 compute-1.redhat.local\nopenstack server show 80cbc604-2f06-4a96-bd7b-b9b8fe361e64  -f json | jq -r -c '. | .[\"OS-EXT-SRV-ATTR:host\"]'\n",
2022-08-29 18:12:34.533 |     "delta": "0:00:04.419188",
2022-08-29 18:12:34.538 |     "end": "2022-08-29 18:12:34.457701",
2022-08-29 18:12:34.541 |     "item": "80cbc604-2f06-4a96-bd7b-b9b8fe361e64",
2022-08-29 18:12:34.544 |     "rc": 0,
2022-08-29 18:12:34.547 |     "start": "2022-08-29 18:12:30.038513"
2022-08-29 18:12:34.549 | }
2022-08-29 18:12:34.551 |
2022-08-29 18:12:34.554 | STDOUT:
2022-08-29 18:12:34.557 |
2022-08-29 18:12:34.560 | compute-0.redhat.local
2022-08-29 18:12:34.562 |
2022-08-29 18:12:34.563 |
2022-08-29 18:12:34.566 | STDERR:
2022-08-29 18:12:34.568 |
2022-08-29 18:12:34.570 | ERROR (Conflict): Cannot 'os-migrateLive' instance 80cbc604-2f06-4a96-bd7b-b9b8fe361e64 while it is in vm_state error (HTTP 409) (Request-ID: req-b3f1f6a4-60f6-4757-98a2-8369e6ebdd45)
2022-08-29 18:12:34.575 |
2022-08-29 18:12:34.581 | NO MORE HOSTS LEFT *************************************************************
2022-08-29 18:12:34.585 |
2022-08-29 18:12:34.589 | PLAY RECAP *********************************************************************
2022-08-29 18:12:34.592 | ceph-0                     : ok=19   changed=11   unreachable=0    failed=0    skipped=103  rescued=0    ignored=3
2022-08-29 18:12:34.594 | ceph-1                     : ok=19   changed=11   unreachable=0    failed=0    skipped=103  rescued=0    ignored=3
2022-08-29 18:12:34.597 | ceph-2                     : ok=19   changed=11   unreachable=0    failed=0    skipped=103  rescued=0    ignored=3
2022-08-29 18:12:34.599 | compute-0                  : ok=17   changed=10   unreachable=0    failed=0    skipped=105  rescued=0    ignored=2
2022-08-29 18:12:34.605 | compute-1                  : ok=19   changed=12   unreachable=0    failed=1    skipped=102  rescued=0    ignored=2
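
A note on reading the task result above: it reports "rc": 0 even though nova returned HTTP 409. This is presumably because the exit status reflects only the last command in the script (the openstack server show | jq pipeline). The following is a minimal sketch, not the actual CI code, of how such a result could be evaluated from stdout and stderr instead of rc. The names check_migration and MigrationCheck are hypothetical, and the regular expression is written against the single error line shown in this log only.

```python
import re
from dataclasses import dataclass

# Pattern modelled on the one STDERR line in this report; other nova
# error formats are not covered (assumption).
CONFLICT_RE = re.compile(
    r"ERROR \(Conflict\): Cannot '(?P<action>[^']+)' instance "
    r"(?P<instance>[0-9a-f-]{36}) while it is in vm_state "
    r"(?P<vm_state>\w+) \(HTTP (?P<status>\d+)\)"
)


@dataclass
class MigrationCheck:
    """Hypothetical result type: outcome plus a human-readable reason."""
    migrated: bool
    reason: str


def check_migration(stdout: str, stderr: str, target_host: str) -> MigrationCheck:
    """Decide whether the instance ended up on target_host.

    stdout is expected to hold the current host as printed by the jq
    filter; stderr may hold a nova Conflict error.
    """
    current_host = stdout.strip()
    match = CONFLICT_RE.search(stderr)
    if match:
        return MigrationCheck(
            False,
            f"{match['action']} rejected: instance {match['instance']} "
            f"is in vm_state {match['vm_state']} (HTTP {match['status']})",
        )
    if current_host != target_host:
        return MigrationCheck(
            False, f"instance is on {current_host}, expected {target_host}"
        )
    return MigrationCheck(True, f"instance is on {target_host}")
```

Fed the STDOUT and STDERR above with compute-1.redhat.local as the target, this sketch reports the migration as not done and names vm_state error as the reason, rather than relying on the zero return code.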

Version-Release number of selected component (if applicable):

RHOS-17.0-RHEL-9-20220729.n.2
to
RHOS-17.0-RHEL-9-20220825.n.1

How reproducible:
Perform reboot after successful Overcloud update

Steps to Reproduce:
1. Deploy Overcloud
2. Update Overcloud - for now, 17.0 to 17.0
3. Live-migrate the VMs from compute-1 to compute-0 (nova host-evacuate-live)
4. Reboot compute-1
5. After reboot...
6. Migrate VMs back to compute-1 (from compute-0)
7. ERROR: The VM instances are not migrated back to compute-1; nova rejects the os-migrateLive request with HTTP 409 because the instance is in vm_state error
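
Before step 6, it may help to check which instances nova would accept for live migration at all. The sketch below is an illustration under stated assumptions, not part of the reproducer. It assumes each input string is the JSON printed by openstack server show <id> -f json, and that the VM state appears under the key "OS-EXT-STS:vm_state" (the key name may differ between client versions). The function name split_by_migratability is hypothetical. The set of accepted states (active, paused) reflects nova's documented behaviour for live migration as I understand it and should be verified against the deployed release.

```python
import json
from typing import Iterable, List, Optional, Tuple

# vm_states in which nova is assumed to accept a live migration.
MIGRATABLE_VM_STATES = {"active", "paused"}

ServerState = Tuple[str, Optional[str]]


def split_by_migratability(
    server_json_docs: Iterable[str],
) -> Tuple[List[ServerState], List[ServerState]]:
    """Split servers into (ready, blocked) lists of (id, vm_state).

    Each document is assumed to be the JSON output of
    `openstack server show <id> -f json`.
    """
    ready: List[ServerState] = []
    blocked: List[ServerState] = []
    for raw in server_json_docs:
        server = json.loads(raw)
        # Key name is an assumption; a missing key yields None and
        # the server is treated as blocked.
        vm_state = server.get("OS-EXT-STS:vm_state")
        entry = (server["id"], vm_state)
        if vm_state in MIGRATABLE_VM_STATES:
            ready.append(entry)
        else:
            blocked.append(entry)
    return ready, blocked
```

An instance reported as blocked with vm_state error, as in the log above, would need to be investigated before the migration back to compute-1 is attempted.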

Actual results:
Migration failures, as shown in the log above.

Expected results:
Success

Additional info:

Comment 19 Artom Lifshitz 2022-12-12 16:03:25 UTC
Could we reset here and start with a fresh description of the problem (ideally in the shape of "I did X, I expected Y, I got Z")? It's getting hard to follow with all the merged patches and passing/failing jobs in the 18 comments before this one.

Comment 20 Sofer Athlan-Guyot 2023-02-21 14:39:56 UTC
Hi,

Agreed. We're going to implement some testing after reboot to verify the state of the VM after migration. That will ease debugging if the problem is still happening.

Meanwhile, closing this as CI doesn't *seem* to fail.
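
For illustration only, the kind of post-reboot check described in this comment could look roughly like the sketch below. It is not the test that was implemented. The function wait_for_instance and its parameters are hypothetical, and fetch_state stands in for whatever call returns the instance's current (host, vm_state) pair. The default of 30 attempts mirrors the "attempts": 30 in the description; the delay is an arbitrary assumption.

```python
import time
from typing import Callable, Tuple


def wait_for_instance(
    fetch_state: Callable[[], Tuple[str, str]],
    target_host: str,
    attempts: int = 30,
    delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until the instance is active on target_host.

    Returns the attempt number on success. Raises RuntimeError as soon
    as the instance is seen in vm_state error, and TimeoutError when
    all attempts are used up.
    """
    host, vm_state = "", ""
    for attempt in range(1, attempts + 1):
        host, vm_state = fetch_state()
        if vm_state == "error":
            # Fail fast: retrying cannot help while the instance is in error.
            raise RuntimeError(
                f"instance is in vm_state error on {host} (attempt {attempt})"
            )
        if host == target_host and vm_state == "active":
            return attempt
        if attempt < attempts:
            sleep(delay)
    raise TimeoutError(
        f"instance still on {host} in vm_state {vm_state} "
        f"after {attempts} attempts, expected {target_host}"
    )
```

Failing fast on vm_state error, instead of retrying until the attempts run out, would surface the underlying state in the job output and make the failure easier to debug.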