Bug 1603155
| Summary: | Guest fails to resume after paused due to I/O error when macTableManager='libvirt' | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux Advanced Virtualization | Reporter: | Fangge Jin <fjin> | ||||
| Component: | libvirt | Assignee: | Jiri Denemark <jdenemar> | ||||
| Status: | CLOSED ERRATA | QA Contact: | yalzhang <yalzhang> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | medium | ||||||
| Version: | 8.0 | CC: | dyuan, jdenemar, jsuchane, lmen, rbalakri, xuzhang, yalzhang | ||||
| Target Milestone: | rc | Keywords: | Reopened, Triaged | ||||
| Target Release: | --- | Flags: | pm-rhel: mirror+ | ||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | libvirt-7.4.0-1.el8 | Doc Type: | If docs needed, set a value | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2021-11-16 07:49:54 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | 7.4.0 | ||||
| Embargoed: | |||||||
| Attachments: | |||||||
I can also reproduce this bug with steps: https://bugzilla.redhat.com/show_bug.cgi?id=1560854#c10

*** Bug 1647058 has been marked as a duplicate of this bug. ***

Wherever the guest is being paused by the error, the fdb needs to be updated at that point to remove the entry for the guest's MAC. That way when the guest is restarted, attempting to re-add the entry for the guest's MAC won't generate an error.

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

This is now fixed upstream with
commit 241c22a9a531cb39d2b6b892561fe856f32f310d
Refs: v7.3.0-5-g241c22a9a5
Author: Jiri Denemark <jdenemar>
AuthorDate: Fri Apr 30 17:25:29 2021 +0200
Commit: Jiri Denemark <jdenemar>
CommitDate: Mon May 3 11:12:58 2021 +0200
virnetdevbridge: Ignore EEXIST when adding an entry to fdb
When updating entries in a bridge forwarding database (i.e., when
macTableManager='libvirt' is configured for the bridge), we may end up
in a situation when the entry we want to add is already present. Let's
just ignore the error in such a case.
This fixes an error to resume a domain when fdb entries were not
properly removed when the domain was paused:
virsh # resume test
error: Failed to resume domain test
error: error adding fdb entry for vnet2: File exists
For some reason, fdb entries are only removed when libvirt explicitly
stops CPUs, but nothing happens when we just get STOP event from QEMU.
An alternative approach would be to make sure we always remove the
entries regardless on why a domain was paused (e.g., during migration),
but that would be a significantly more disruptive change with possible
side effects.
https://bugzilla.redhat.com/show_bug.cgi?id=1603155
Signed-off-by: Jiri Denemark <jdenemar>
Reviewed-by: Laine Stump <laine>
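
To illustrate the behaviour the commit message describes, here is a minimal Python sketch. It is not libvirt code: `FakeBridgeFdb`, `add_entry_tolerant` and `resume` are hypothetical names, and the in-memory set only stands in for the bridge forwarding database. It models a guest whose fdb entry was left in place when it was paused, and shows how treating EEXIST as success lets the resume path proceed.

```python
import errno


class FakeBridgeFdb:
    """Toy stand-in for a bridge forwarding database (hypothetical, not libvirt)."""

    def __init__(self):
        self.entries = set()

    def add(self, ifname, mac):
        # Refuse to add an entry that is already present, reporting EEXIST
        # ("File exists"), which is the error seen in the bug report.
        if (ifname, mac) in self.entries:
            raise OSError(errno.EEXIST, "File exists")
        self.entries.add((ifname, mac))

    def remove(self, ifname, mac):
        self.entries.discard((ifname, mac))


def add_entry_tolerant(fdb, ifname, mac):
    """Add an fdb entry, treating 'already present' as success."""
    try:
        fdb.add(ifname, mac)
    except OSError as exc:
        # Only EEXIST is ignored; any other failure is still reported.
        if exc.errno != errno.EEXIST:
            raise


def resume(fdb, ifname, mac, tolerant):
    """Model of the resume path: re-add the guest's entry, then run."""
    if tolerant:
        add_entry_tolerant(fdb, ifname, mac)
    else:
        fdb.add(ifname, mac)
    return "running"


if __name__ == "__main__":
    fdb = FakeBridgeFdb()
    fdb.add("vnet2", "52:54:00:c6:3b:95")  # entry added when the guest started
    # The guest is paused by a STOP event, so the entry is never removed.
    try:
        resume(fdb, "vnet2", "52:54:00:c6:3b:95", tolerant=False)
    except FileExistsError as exc:
        print("old behaviour:", exc.strerror)
    print("new behaviour:", resume(fdb, "vnet2", "52:54:00:c6:3b:95", tolerant=True))
```

The sketch only ignores EEXIST and re-raises everything else, matching the narrow scope of the fix; it does not attempt the alternative of removing entries on every pause.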
Reproduce this bug on libvirt-7.0.0-13.module+el8.4.0+10604+5608c2b4.x86_64 with steps in bug 1647058#c0:

1. create a libvirt NAT network with <bridge name='kvm' stp='off' delay='0' macTableManager='libvirt'/>
2. create a domain with an interface connected to the network from step 1
3. start the domain
4. run below command:
# virsh qemu-monitor-command rhel '{"execute":"stop"}'
{"return":{},"id":"libvirt-13"}
5. check the vm status, it is paused:
# virsh list
Id Name State
---------------------
1 rhel paused
6. try to resume the vm, failed:
# virsh resume rhel
error: Failed to resume domain 'rhel'
error: error adding fdb entry for vnet0: File exists
# cat /var/log/libvirt/libvirtd.log | grep error
2021-05-08 10:30:58.375+0000: 24359: error : virNetDevBridgeFDBAddDel:1071 : error adding fdb entry for vnet0: File exists

Test on v7.3.0-163-g156315cff4 with the same steps as above, the bug is fixed
➜ ~ virsh qemu-monitor-command pc '{"execute":"stop"}'
{"return":{},"id":"libvirt-379"}
➜ ~ virsh list
Id Name State
---------------------
1 pc paused
➜ ~ virsh resume pc
Domain 'pc' resumed
➜ ~ virsh list
Id Name State
----------------------
1 pc running
Test on libvirt-7.4.0-1.module+el8.5.0+11218+83343022.x86_64 with the same steps in comment 10:

1. Configure the network with "macTableManager='libvirt'":
<bridge name='virbr0' stp='on' delay='0' macTableManager='libvirt'/>
2. Start a vm with an interface connected to the network;
3. # virsh qemu-monitor-command rhel '{"execute":"stop"}'
{"return":{},"id":"libvirt-13"}
4. check the vm is paused, then try to resume it, succeed, check the logs, no errors or failures.
# virsh list
Id Name State
---------------------
1 rhel paused
# virsh resume rhel
Domain 'rhel' resumed

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2021:4684
Created attachment 1459974 [details]
libvirtd log

Description of problem:
Guest fails to resume after paused due to I/O error when macTableManager='libvirt'

Version-Release number of selected component:
libvirt-4.5.0-3.virtcov.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare a virtual network, set macTableManager='libvirt'
# virsh net-dumpxml default
<network>
  <name>default</name>
  <uuid>ac4e8219-6225-40f7-95b2-7d731a91ea75</uuid>
  <forward mode='nat'>
    <nat>
      <port start='1024' end='65535'/>
    </nat>
  </forward>
  <bridge name='virbr0' stp='on' delay='0' macTableManager='libvirt'/>
  <mtu size='9000'/>
  <mac address='52:54:00:70:1f:6f'/>
  <bandwidth>
    <inbound average='1000' peak='5000' burst='5120'/>
    <outbound average='128' peak='256' burst='256'/>
  </bandwidth>
  <ip address='192.168.122.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.122.2' end='192.168.122.254'/>
    </dhcp>
  </ip>
</network>

2. Start a guest with virtual disk and interface as below:
...
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' cache='none' error_policy='stop' rerror_policy='stop' io='threads' discard='unmap'/>
  <source file='/nfs/RHEL-7.6-x86_64-latest.qcow2'/>
  <target dev='vda' bus='virtio'/>
...
<interface type='network'>
  <mac address='52:54:00:c6:3b:95'/>
  <source network='default'/>
  <bandwidth>
    <inbound average='1000' peak='5000' floor='200' burst='1024'/>
    <outbound average='128' peak='256' burst='256'/>
  </bandwidth>
  <model type='virtio'/>
  <driver name='vhost' txmode='iothread' ioeventfd='on' event_idx='off' queues='5'>
    <host csum='off' gso='off' tso4='off' tso6='off' ecn='off' ufo='off' mrg_rxbuf='off'/>
    <guest csum='off' tso4='off' tso6='off' ecn='off' ufo='off'/>
  </driver>
  <link state='up'/>
  <mtu size='9000'/>
  <coalesce>
    <rx>
      <frames max='7'/>
    </rx>
  </coalesce>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>

3. Start guest

4. Chown to root:root for /nfs/RHEL-7.6-x86_64-latest.qcow2
# chown root:root /nfs/RHEL-7.6-x86_64-latest.qcow2

5. Check guest status, it is paused
# virsh list
Id name status
----------------------------------------------------
5 rhel7.5 paused

6. Try to resume guest:
# chown qemu:qemu /nfs/RHEL-7.6-x86_64-latest.qcow2
# virsh resume 5
error:Resume domain 5 failed
error:error adding fdb entry for vnet2: File exists

Actual results:
As step6, guest fails to resume

Expected results:
Guest can resume successfully

Additional info:
Guest can resume successfully if suspend it manually:
# virsh suspend $guest
# virsh resume $guest
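
The problem only appears when the network's bridge is configured with macTableManager='libvirt' (step 1 above). As a rough aid for checking whether a given network definition falls into that case, the following Python sketch parses XML such as the output of `virsh net-dumpxml`. The function name `uses_libvirt_mac_table` is hypothetical, and the sketch assumes the `<bridge>` element is a direct child of `<network>`, as in the dump shown in step 1.

```python
import xml.etree.ElementTree as ET


def uses_libvirt_mac_table(network_xml):
    """Return True if the network's <bridge> sets macTableManager='libvirt'."""
    root = ET.fromstring(network_xml)
    # Assumes <bridge> is a direct child of the <network> root element.
    bridge = root.find("bridge")
    return bridge is not None and bridge.get("macTableManager") == "libvirt"


if __name__ == "__main__":
    sample = (
        "<network><name>default</name>"
        "<bridge name='virbr0' stp='on' delay='0' macTableManager='libvirt'/>"
        "</network>"
    )
    print(uses_libvirt_mac_table(sample))
```

A network for which this returns False is not managed through libvirt's fdb updates, so the "error adding fdb entry" failure described in this report would not be expected there.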