Bug 1684350
| Summary: | VMs fail to shut down after rebalance | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | bipin <bshetty> |
| Component: | rhhi | Assignee: | Sahina Bose <sabose> |
| Status: | CLOSED DUPLICATE | QA Contact: | bipin <bshetty> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | rhhiv-1.6 | CC: | bperkins, bshetty, godas, pasik, rhs-bugs, sankarshan, sasundar |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2019-05-03 10:38:04 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
bipin
2019-03-01 03:26:03 UTC
Engine log:
===========
2019-02-28 11:13:55,444+05 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (EE-ManagedThreadFactory-engine-Thread-195506) [97b743be-e2d4-4f38-b986-19cf609c5a9e] Failed in 'DestroyVDS' method
2019-02-28 11:13:55,453+05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-195506) [97b743be-e2d4-4f38-b986-19cf609c5a9e] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM rhsqa-grafton9-nic2.lab.eng.blr.redhat.com command DestroyVDS failed: Virtual machine destroy error
2019-02-28 11:13:55,454+05 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (EE-ManagedThreadFactory-engine-Thread-195506) [97b743be-e2d4-4f38-b986-19cf609c5a9e] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand' return value 'StatusOnlyReturn [status=Status [code=42, message=Virtual machine destroy error]]'
2019-02-28 11:13:55,454+05 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (EE-ManagedThreadFactory-engine-Thread-195506) [97b743be-e2d4-4f38-b986-19cf609c5a9e] HostName = rhsqa-grafton9-nic2.lab.eng.blr.redhat.com
2019-02-28 11:13:55,454+05 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (EE-ManagedThreadFactory-engine-Thread-195506) [97b743be-e2d4-4f38-b986-19cf609c5a9e] Command 'DestroyVDSCommand(HostName = rhsqa-grafton9-nic2.lab.eng.blr.redhat.com, DestroyVmVDSCommandParameters:{hostId='a37bef1d-9214-4476-a22b-2e556f71949f', vmId='ceefada7-bda7-4c7e-b7d8-1b72bd131e61', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='false'})' execution failed: VDSGenericException: VDSErrorException: Failed to DestroyVDS, error = Virtual machine destroy error, code = 42
2019-02-28 11:13:55,454+05 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (EE-ManagedThreadFactory-engine-Thread-195506) [97b743be-e2d4-4f38-b986-19cf609c5a9e] FINISH, DestroyVDSCommand, return: , log id: 6977bc42
2019-02-28 11:13:55,454+05 ERROR [org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (EE-ManagedThreadFactory-engine-Thread-195506) [97b743be-e2d4-4f38-b986-19cf609c5a9e] Failed to destroy VM 'ceefada7-bda7-4c7e-b7d8-1b72bd131e61' in VDS = 'a37bef1d-9214-4476-a22b-2e556f71949f' , error = 'org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to DestroyVDS, error = Virtual machine destroy error, code = 42'
2019-02-28 11:13:55,454+05 INFO [org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (EE-ManagedThreadFactory-engine-Thread-195506) [97b743be-e2d4-4f38-b986-19cf609c5a9e] FINISH, DestroyVmVDSCommand, return: , log id: 58eabca
2019-02-28 11:13:55,454+05 ERROR [org.ovirt.engine.core.bll.StopVmCommand] (EE-ManagedThreadFactory-engine-Thread-195506) [97b743be-e2d4-4f38-b986-19cf609c5a9e] Command 'org.ovirt.engine.core.bll.StopVmCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to DestroyVDS, error = Virtual machine destroy error, code = 42 (Failed with error destroyErr and code 42)
2019-02-28 11:13:55,464+05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-195506) [97b743be-e2d4-4f38-b986-19cf609c5a9e] EVENT_ID: USER_FAILED_STOP_VM(56), Failed to power off VM vm1_Vol_expansion (Host: rhsqa-grafton9-nic2.lab.eng.blr.redhat.com, User: admin@internal-authz)
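For anyone triaging a similar failure, the useful fields in these engine.log lines are the VDSM status code and message. A minimal sketch of extracting them (the regex and helper name are mine, not part of oVirt):

```python
import re

# Hypothetical helper: pull the VDSM status code and message out of an
# oVirt engine.log line like the ones quoted above.
def parse_vdsm_status(line):
    m = re.search(r"code=(\d+), message=([^\]]+)\]", line)
    return (int(m.group(1)), m.group(2)) if m else None

line = ("2019-02-28 11:13:55,454+05 INFO ... return value "
        "'StatusOnlyReturn [status=Status [code=42, "
        "message=Virtual machine destroy error]]'")
print(parse_vdsm_status(line))  # (42, 'Virtual machine destroy error')
```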
FUSE mount log:
==============
[2019-02-27 11:28:04.401930] I [MSGID: 109064] [dht-layout.c:837:dht_layout_dir_mismatch] 2-Vol1_Expansion-dht: subvol: Vol1_Expansion-replicate-0; inode layout - 0 - 4294967295 - 1; disk layout - 2147483647 - 4294967295 - 3820200388
[2019-02-27 11:28:04.402050] I [MSGID: 109018] [dht-common.c:1790:dht_revalidate_cbk] 2-Vol1_Expansion-dht: Mismatching layouts for /52029ff1-9890-4c96-9edd-c5ff7052545c, gfid = 44063cb5-d123-423e-bef4-3b51bbe63ae6
[2019-02-27 11:28:04.402073] I [MSGID: 109064] [dht-layout.c:837:dht_layout_dir_mismatch] 2-Vol1_Expansion-dht: subvol: Vol1_Expansion-replicate-1; inode layout - 0 - 0 - 0; disk layout - 0 - 2147483646 - 3820200388
[2019-02-27 11:28:04.403391] I [MSGID: 109018] [dht-common.c:1790:dht_revalidate_cbk] 2-Vol1_Expansion-dht: Mismatching layouts for /52029ff1-9890-4c96-9edd-c5ff7052545c/images, gfid = 4986b231-207d-42a2-8eae-6d4dd09d1482
[2019-02-27 11:28:04.404863] I [MSGID: 109018] [dht-common.c:1790:dht_revalidate_cbk] 2-Vol1_Expansion-dht: Mismatching layouts for /52029ff1-9890-4c96-9edd-c5ff7052545c/images/58b4f4b2-f28a-4eae-90b5-7b576ad07874, gfid = 324c942a-6fcb-43e7-94f1-485ba0d9db3c
The message "I [MSGID: 109064] [dht-layout.c:837:dht_layout_dir_mismatch] 2-Vol1_Expansion-dht: subvol: Vol1_Expansion-replicate-0; inode layout - 0 - 4294967295 - 1; disk layout - 2147483647 - 4294967295 - 3820200388" repeated 3 times between [2019-02-27 11:28:04.401930] and [2019-02-27 11:28:10.331865]
[2019-02-27 11:28:10.331869] I [MSGID: 109018] [dht-common.c:1790:dht_revalidate_cbk] 2-Vol1_Expansion-dht: Mismatching layouts for /52029ff1-9890-4c96-9edd-c5ff7052545c/dom_md, gfid = c1d3d32a-80af-40b3-8775-c22e23b95cb1
[2019-02-27 11:28:10.331944] I [MSGID: 109018] [dht-common.c:1790:dht_revalidate_cbk] 2-Vol1_Expansion-dht: Mismatching layouts for /52029ff1-9890-4c96-9edd-c5ff7052545c/dom_md, gfid = c1d3d32a-80af-40b3-8775-c22e23b95cb1
[2019-02-27 11:28:22.392578] W [MSGID: 114031] [client-rpc-fops.c:2865:client3_3_lookup_cbk] 2-Vol1_Expansion-client-0: remote operation failed. Path: <gfid:a24c1d8e-ad3f-43f3-8c4a-96be4cf22496> (a24c1d8e-ad3f-43f3-8c4a-96be4cf22496) [No such file or directory]
[2019-02-27 11:28:22.392760] W [MSGID: 114031] [client-rpc-fops.c:2865:client3_3_lookup_cbk] 2-Vol1_Expansion-client-2: remote operation failed. Path: <gfid:a24c1d8e-ad3f-43f3-8c4a-96be4cf22496> (a24c1d8e-ad3f-43f3-8c4a-96be4cf22496) [No such file or directory]
[2019-02-27 11:28:22.392764] W [MSGID: 114031] [client-rpc-fops.c:2865:client3_3_lookup_cbk] 2-Vol1_Expansion-client-1: remote operation failed. Path: <gfid:a24c1d8e-ad3f-43f3-8c4a-96be4cf22496> (a24c1d8e-ad3f-43f3-8c4a-96be4cf22496) [No such file or directory]
The message "I [MSGID: 109064] [dht-layout.c:837:dht_layout_dir_mismatch] 2-Vol1_Expansion-dht: subvol: Vol1_Expansion-replicate-1; inode layout - 0 - 0 - 0; disk layout - 0 - 2147483646 - 3820200388" repeated 4 times between [2019-02-27 11:28:04.402073] and [2019-02-27 11:28:33.071945]
[2019-02-27 11:28:33.071948] I [MSGID: 109018] [dht-common.c:1790:dht_revalidate_cbk] 2-Vol1_Expansion-dht: Mismatching layouts for /52029ff1-9890-4c96-9edd-c5ff7052545c/images/86d94115-6969-41a9-afc2-f8453820370f, gfid = 5384fdfe-38b9-4ebb-9b97-02ac02f73f81
[2019-02-27 11:28:33.072133] I [MSGID: 109064] [dht-layout.c:837:dht_layout_dir_mismatch] 2-Vol1_Expansion-dht: subvol: Vol1_Expansion-replicate-0; inode layout - 0 - 4294967295 - 1; disk layout - 2147483647 - 4294967295 - 3820200388
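The layout-mismatch messages above are consistent with a rebalance re-splitting the DHT hash range: with one replica set the whole 32-bit range (0 - 4294967295) maps to a single subvolume, and after expanding to 2x3 the on-disk layout is split roughly in half, matching the disk layouts logged (0 - 2147483646 for replicate-1 and 2147483647 - 4294967295 for replicate-0). A simplified sketch of that split; Gluster's real layout algorithm can also weight ranges by brick capacity, so this is an approximation:

```python
MAX_HASH = 0xFFFFFFFF  # top of the 32-bit DHT hash space

def dht_ranges(n_subvols):
    """Split the hash space into n contiguous ranges, one per subvolume.

    Simplified model: equal chunks, with the remainder absorbed by the
    last subvolume. Real Gluster layouts may be weighted by brick size.
    """
    chunk = MAX_HASH // n_subvols
    ranges, start = [], 0
    for i in range(n_subvols):
        stop = MAX_HASH if i == n_subvols - 1 else start + chunk - 1
        ranges.append((start, stop))
        start = stop + 1
    return ranges

print(dht_ranges(1))  # [(0, 4294967295)] -- stale in-memory layout
print(dht_ranges(2))  # [(0, 2147483646), (2147483647, 4294967295)] -- on disk after expansion
```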
Hi Bipin,
I was checking the vdsm logs and could not find any error that explains the failure.
2019-02-28 13:24:45,811+0530 WARN (jsonrpc/3) [virt.vm] (vmId='6998fc6d-79e3-46ca-a741-5d5f0322679e') Failed to destroy VM '6998fc6d-79e3-46ca-a741-5d5f0322679e' forcefully (error=38) (vm:5212)
2019-02-28 13:24:45,812+0530 INFO (jsonrpc/3) [api.virt] FINISH destroy return={'status': {'message': 'Virtual machine destroy error', 'code': 42}} from=::ffff:10.70.36.247,57014, flow_id=2bfe5929-1906-44a2-bc3c-4d3520fd4070, vmId=6998fc6d-79e3-46ca-a741-5d5f0322679e (api:54)
Can you please confirm that the engine log and the vdsm log were captured at the same time?
Gobinda, I updated with the proper vdsm log now. Could you please check?

Bipin, I don't find anything related in the given logs. Could you please attach the libvirt log as well?

On a side note, I have also hit a similar problem in a qemu/kvm environment:

1. Used a RHEL 7.6 server as the hypervisor
2. Created a replica 3 volume across 3 hosts and FUSE mounted the volume on this RHEL 7.6 host
3. Created a few VMs and started installing the OS on them
4. While the OS installation was still in progress, scaled the volume from 1x3 to 2x3
5. Performed a rebalance
6. Expanded the volume once again from 2x3 to 3x3, followed by a successful rebalance
7. Removed a replica set to convert the volume back to 2x3
8. At this point, powered down a VM, and observed that it never powered down and remained stuck in the 'in shutdown' state

Gobinda, I tried reproducing twice but couldn't see it. It worked fine, but since Sas also hit the issue, let's not close the bug. I am reducing the severity and will update the bug if we see it later.

(In reply to SATHEESARAN from comment #7)
> On a side note, I have also hit a similar problem in a qemu/kvm environment:
>
> 1. Used a RHEL 7.6 server as the hypervisor
> 2. Created a replica 3 volume across 3 hosts and FUSE mounted the volume on
> this RHEL 7.6 host
> 3. Created a few VMs and started installing the OS on them

Can you try with a fully installed VM, with qemu-guest-agent, etc.?

This issue looks the same as https://bugzilla.redhat.com/show_bug.cgi?id=1702686, where we did see the QEMU process becoming a defunct process.

*** This bug has been marked as a duplicate of bug 1705953 ***
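The scale-out and shrink steps in the reproduction above map to roughly the following Gluster CLI sequence. This is a sketch of the workflow, not the exact commands used; the volume name matches the logs but the host and brick paths are hypothetical, and it is shown here only as a non-runnable ops fragment:

```shell
# Hypothetical hosts/bricks -- illustrative only.
# Create and start a 1x3 replica volume
gluster volume create Vol1_Expansion replica 3 \
    host1:/bricks/b1 host2:/bricks/b1 host3:/bricks/b1
gluster volume start Vol1_Expansion

# Scale 1x3 -> 2x3, then rebalance (the DHT layout-mismatch
# messages in the FUSE mount log appear after this step)
gluster volume add-brick Vol1_Expansion replica 3 \
    host1:/bricks/b2 host2:/bricks/b2 host3:/bricks/b2
gluster volume rebalance Vol1_Expansion start

# Scale 2x3 -> 3x3 and rebalance again
gluster volume add-brick Vol1_Expansion replica 3 \
    host1:/bricks/b3 host2:/bricks/b3 host3:/bricks/b3
gluster volume rebalance Vol1_Expansion start

# Shrink back to 2x3 by removing one replica set
gluster volume remove-brick Vol1_Expansion replica 3 \
    host1:/bricks/b3 host2:/bricks/b3 host3:/bricks/b3 start
```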