| Summary: | Network issue with storage on SPM host with a HA VM ends in having host in unassigned state and HA stuck in migration | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Jiri Belka <jbelka> | ||||
| Component: | ovirt-engine | Assignee: | Gilad Chaplik <gchaplik> | ||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Artyom <alukiano> | ||||
| Severity: | urgent | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 3.2.0 | CC: | acathrow, dfediuck, iheim, lpeer, mavital, ofrenkel, Rhev-m-bugs, yeylon | ||||
| Target Milestone: | --- | ||||||
| Target Release: | 3.3.0 | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | sla | ||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2014-01-21 22:11:48 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | SLA | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Bug Depends On: | |||||||
| Bug Blocks: | 1019461 | ||||||
| Attachments: |
|
Description
Jiri Belka
2013-09-13 09:12:05 UTC
engine=> select * from job;
job_id | action_type | description | status | owner_id | visible | start_time | end_time | last_update_time | correlation_id
-------+-------------+-------------+--------+----------+---------+------------+----------+------------------+---------------
a04fc611-6e93-40f2-b438-956e86f6a727 | InternalMigrateVm | Migrating VM test-rh6-x64 because previous Host became non-operational | STARTED | 00000000-0000-0000-0000-000000000000 | t | 2013-09-13 10:20:54.162+02 | | 2013-09-13 10:20:54.362+02 | 27886777
945abad7-1c26-4d94-bd20-3c210acd70c9 | SetNonOperationalVds | Setting Host dell-r210ii-04 to Non-Operational mode. | STARTED | 00000000-0000-0000-0000-000000000000 | t | 2013-09-13 10:26:03.468+02 | | 2013-09-13 10:26:03.482+02 | 72575726
(2 rows)
engine=> select * from async_tasks;
task_id | action_type | status | result | action_parameters | action_params_class | step_id | command_id | started_at | storage_pool_id | task_type | task_parameters | task_params_class
--------+-------------+--------+--------+-------------------+---------------------+---------+------------+------------+-----------------+-----------+-----------------+------------------
(0 rows)
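The two job rows above are both still in STARTED with no end_time, while async_tasks is empty. A minimal sketch of how such leftover jobs could be flagged is shown here. It works on the values copied from the query output rather than on a live database; the helper names (`parse_pg_timestamp`, `stale_jobs`), the 30-minute threshold, and the use of the report time as the reference clock are all assumptions made for illustration, not part of the engine.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the `select * from job` output (only the columns used here).
JOBS = [
    {
        "job_id": "a04fc611-6e93-40f2-b438-956e86f6a727",
        "action_type": "InternalMigrateVm",
        "status": "STARTED",
        "end_time": None,
        "last_update_time": "2013-09-13 10:20:54.362+02",
    },
    {
        "job_id": "945abad7-1c26-4d94-bd20-3c210acd70c9",
        "action_type": "SetNonOperationalVds",
        "status": "STARTED",
        "end_time": None,
        "last_update_time": "2013-09-13 10:26:03.482+02",
    },
]


def parse_pg_timestamp(text):
    """Parse a timestamp printed by psql, e.g. '2013-09-13 10:20:54.362+02'.

    Assumption: the UTC offset is always printed as a sign plus two digits,
    so it is padded to the four-digit form that strptime's %z expects.
    """
    return datetime.strptime(text + "00", "%Y-%m-%d %H:%M:%S.%f%z")


def stale_jobs(jobs, now, max_age):
    """Return jobs that are STARTED, have no end_time, and were last updated
    more than max_age before `now`."""
    return [
        job
        for job in jobs
        if job["status"] == "STARTED"
        and job["end_time"] is None
        and now - parse_pg_timestamp(job["last_update_time"]) > max_age
    ]


if __name__ == "__main__":
    # Assumed reference time: the moment the report was filed (09:12:05 UTC).
    now = datetime(2013, 9, 13, 9, 12, 5, tzinfo=timezone.utc)
    for job in stale_jobs(JOBS, now, timedelta(minutes=30)):
        print(job["job_id"], job["action_type"])
```

With the assumed reference time and threshold, both jobs are reported, which matches the observation that neither the migration nor the Non-Operational transition ever completed.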
After all of this I tried to 'Power Off' the VM. The engine reported the action as successful and showed the VM state as 'Down', but the qemu-kvm process was in fact still running on the original host (host4). I cleaned all tasks and killed the qemu-kvm process, but nothing changed: the host was still in the unassigned state. Restarting vdsmd did not help either. After some drastic steps (removing the forgotten VM from vds_dynamics for the original host, then restarting the engine and vdsmd), I got the host into the 'Non Operational' state, which I suppose is the correct status. After "repairing" the network issue and unmounting the hung NFS storage, I was able to activate the host successfully.

It's likely that this is already fixed in 3.2.2 (bz 984943). Omer, please confirm?

This is not related to bz 984943. It is related to the issues we encountered with host monitoring getting stuck, which cause the host status to stay in unassigned and the statuses of all VMs on that host not to be refreshed (bz 977169).

Since bug 977169 has a fix in 3.3 that may handle this issue, I suggest testing this bz to see whether it can be reproduced.

doron, per comment 5 - why isn't this ON_QA to be tested?

After an additional check we were not able to get the same results, so I'd like this to be re-tested using the fix included in bug 977169.

Verified on is26. After blocking the connection between the source host and storage, the SPM role passed to the second host and the HA VM also migrated to the second host without trouble. vdsClient showed no VMs on the source host, and the source host was in the unassigned state. After restoring the connection between the source host and storage, the host changed state to UP.

Closing - RHEV 3.3 Released |