Bug 1607202

Summary: The referenced nwfilter can be undefined, causing the VM to shut off after libvirtd restart
Product: Red Hat Enterprise Linux 7 Reporter: yalzhang <yalzhang>
Component: libvirt Assignee: John Ferlan <jferlan>
Status: CLOSED WONTFIX QA Contact: yalzhang <yalzhang>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.6 CC: chhu, fjin, jdenemar, lcheng, lmen, tburke, xuzhang
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 1648544 (view as bug list) Environment:
Last Closed: 2019-04-03 11:58:37 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1648544    

Description yalzhang@redhat.com 2018-07-23 02:19:39 UTC
Description of problem:
The referenced nwfilter can be undefined, causing the VM to shut off after libvirtd restart.

Version-Release number of selected component (if applicable):
libvirt-4.5.0-3.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Start a VM with an interface that references an nwfilter.
# virsh dumpxml rhel |grep /interface -B6
    <interface type='network'>
      <mac address='52:54:00:fa:81:87'/>
      <source network='default'/>
      <model type='rtl8139'/>
      <filterref filter='clean-traffic'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
# virsh start rhel
Domain rhel started

# virsh nwfilter-binding-list 
 Port Dev              Filter               
------------------------------------------------------------------
 vnet0                 clean-traffic     

2. Undefining the referenced nwfilter is not allowed; this is expected.
# virsh nwfilter-undefine clean-traffic
error: Failed to undefine network filter clean-traffic
error: Requested operation is not valid: nwfilter is in use

3. However, after the binding is deleted, the referenced nwfilter can be undefined.
# virsh nwfilter-binding-delete vnet0
Network filter binding on vnet0 deleted

# virsh nwfilter-undefine clean-traffic
Network filter clean-traffic undefined

4. After the nwfilter is undefined, it is still referenced in the VM's XML, so restarting libvirtd causes the VM to shut off.
# virsh list
 Id    Name                           State
----------------------------------------------------
 4     rhel                           running

# systemctl restart libvirtd

# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     rhel                           shut off

Actual results:
The referenced nwfilter can be undefined, causing the VM to shut off after libvirtd restart.

Expected results:
The nwfilter cannot be undefined, even if the nwfilter binding has been deleted.
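
The expected result amounts to checking references in the guest configuration rather than only the live bindings. A minimal sketch of such a check, assuming the domain XML has been obtained with `virsh dumpxml`. The helper names `referenced_filters` and `can_undefine` are hypothetical and not part of libvirt, and the `<domain>` wrapper around the interface from step 1 is trimmed down for illustration:

```python
import xml.etree.ElementTree as ET

# Interface XML from the reproduction steps, wrapped in a minimal <domain>.
# The wrapper is an assumption made for illustration only.
DOMAIN_XML = """
<domain type='kvm'>
  <name>rhel</name>
  <devices>
    <interface type='network'>
      <mac address='52:54:00:fa:81:87'/>
      <source network='default'/>
      <model type='rtl8139'/>
      <filterref filter='clean-traffic'/>
    </interface>
  </devices>
</domain>
"""


def referenced_filters(domain_xml):
    """Return the set of nwfilter names named by <filterref> elements."""
    root = ET.fromstring(domain_xml.strip())
    return {
        ref.get("filter")
        for ref in root.iter("filterref")
        if ref.get("filter")
    }


def can_undefine(filter_name, running_domain_xmls):
    """Hypothetical policy: refuse while any running guest names the filter."""
    return all(
        filter_name not in referenced_filters(xml)
        for xml in running_domain_xmls
    )


if __name__ == "__main__":
    print(referenced_filters(DOMAIN_XML))
    print(can_undefine("clean-traffic", [DOMAIN_XML]))
```

This sketch only looks at direct `<filterref>` references in the guest XML; filters that are pulled in by other filters are not followed.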

Additional info:
On libvirt-3.9.0-14.el7_5.4.x86_64:
# virsh start rhel
Domain rhel started

# virsh dumpxml rhel  | grep /interface -B8
    <interface type='network'>
      <mac address='52:54:00:fa:81:87'/>
      <source network='default' bridge='virbr0'/>
      <target dev='vnet0'/>
      <model type='rtl8139'/>
      <filterref filter='clean-traffic'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>

# virsh nwfilter-undefine clean-traffic
error: Failed to undefine network filter clean-traffic
error: Requested operation is not valid: nwfilter is in use

Comment 2 John Ferlan 2018-08-22 21:45:06 UTC
It's an "interesting problem". On the one hand, as is documented in the code for the delete binding:

/*
 * Note that this is primarily intended for usage by the hypervisor
 * drivers. it is exposed to the admin, however, and nothing stops
 * an admin from deleting filter bindings created by the hypervisor
 * drivers. IOW, it is the admin's responsibility not to shoot
 * themself in the foot
 */
static int
nwfilterBindingDelete(virNWFilterBindingPtr binding)

So in a way, you've found a way to shoot yourself in the foot and what you see is what you get.

Still, we shouldn't cause the guest to shut down on libvirtd restart, so I've posted a patch upstream that handles this case:

https://www.redhat.com/archives/libvir-list/2018-August/msg01407.html

Comment 3 John Ferlan 2018-08-31 21:43:40 UTC
Created a v3: 

https://www.redhat.com/archives/libvir-list/2018-August/msg01587.html

Waiting on review.

Since someone has to work at it to make this happen, and it is really not blocker-type material, I'm moving this to rhel-7.7.0. Deleting a binding and a filter takes a couple of steps.

Comment 4 John Ferlan 2018-09-20 11:43:25 UTC
Patch has been pushed upstream:

commit 9e52c6496650d1412662a9e6cf98301141fbbbca
Author: John Ferlan <jferlan>
Date:   Fri Aug 24 09:29:24 2018 -0400

    qemu: Ignore nwfilter binding instantiation issues during reconnect
    
    ...
    
    It's essentially stated in the nwfilterBindingDelete that we
    will allow the admin to shoot themselves in the foot by deleting
    the nwfilter binding which then allows them to undefine the
    nwfilter that is in use for the running guest...
    
    However, by allowing this we cause a problem for libvirtd
    restart reconnect processing which would then try to recreate
    the missing binding attempting to use the deleted filter
    resulting in an error and thus shutting the guest down.
    
    So rather than keep adding virDomainConfNWFilterInstantiate
    flags to "ignore" specific error conditions, modify the logic
    to ignore, but VIR_WARN errors other than ignoreExists. This
    will at least allow the guest to not shutdown for only nwfilter
    binding errors that we can now perhaps recover from since we
    have the binding create/delete capability.
    
$ git describe 9e52c6496650d1412662a9e6cf98301141fbbbca
v4.7.0-173-g9e52c64966
$
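
The behavior described in the commit message (warn about a failed binding re-creation during reconnect instead of treating it as fatal) can be modeled roughly as follows. This is a simplified Python sketch, not the libvirt C code: `instantiate`, `reconnect_filters`, and `FilterMissingError` are hypothetical stand-ins.

```python
import logging

logger = logging.getLogger("reconnect_sketch")


class FilterMissingError(Exception):
    """Raised by the stand-in instantiate step when a filter is undefined."""


def instantiate(filter_name, ifname, defined_filters):
    """Stand-in for binding creation; fails when the filter is not defined."""
    if filter_name not in defined_filters:
        raise FilterMissingError(
            f"referenced filter '{filter_name}' is missing"
        )


def reconnect_filters(interfaces, defined_filters):
    """Try to re-create each binding; warn and continue on failure.

    `interfaces` is a list of (ifname, filter_name) pairs. The function
    returns the warning messages instead of raising, so a caller modeling
    the reconnect path has no error that would make it stop the guest.
    """
    warnings = []
    for ifname, filter_name in interfaces:
        # Interfaces without a filter or a device name have nothing to bind.
        if not filter_name or not ifname:
            continue
        try:
            instantiate(filter_name, ifname, defined_filters)
        except FilterMissingError as err:
            msg = (f"filter '{filter_name}' instantiation for "
                   f"'{ifname}' failed '{err}'")
            logger.warning(msg)
            warnings.append(msg)
    return warnings


if __name__ == "__main__":
    # The filter was undefined, so the binding cannot be re-created.
    for line in reconnect_filters([("vnet0", "clean-traffic")], set()):
        print(line)
```

The design point is that the failure is reported but never propagated, which matches the stated intent of keeping the guest running when only an nwfilter binding is affected.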

Comment 5 yalzhang@redhat.com 2019-02-26 05:31:53 UTC
Tested on upstream libvirt-5.1.0-1.el7.x86_64; it works as expected.

# virsh nwfilter-binding-list
 Port Dev   Filter
---------------------------
 vnet0      clean-traffic

# virsh nwfilter-binding-delete vnet0
Network filter binding on vnet0 deleted

# virsh nwfilter-binding-list
 Port Dev   Filter
--------------------

# virsh nwfilter-undefine clean-traffic
Network filter clean-traffic undefined

# systemctl restart libvirtd
# virsh list
 Id   Name     State
------------------------
 2    rhel77   running

# virsh nwfilter-binding-list
 Port Dev   Filter
--------------------

# cat /var/log/libvirt/libvirtd.log | grep error
2019-02-26 05:31:00.258+0000: 43382: error : virNWFilterObjListFindInstantiateFilter:201 : internal error: referenced filter 'clean-traffic' is missing
2019-02-26 05:31:00.258+0000: 43382: warning : qemuProcessFiltersInstantiate:3299 : filter 'clean-traffic' instantiation for 'vnet0' failed 'internal error: referenced filter 'clean-traffic' is missing'
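
Because the guest now keeps running, the only trace of the missing filter is in the log. A minimal sketch for pulling the affected filter and device out of such lines, assuming the line layout seen in the two entries above. The regular expressions and the helper names `parse_log` and `failed_instantiations` are illustrative assumptions, not a libvirt interface:

```python
import re

# Layout assumed from the two log lines above:
# "<date> <time>: <number>: <level> : <function>:<line> : <message>"
LOG_LINE = re.compile(
    r"^(?P<timestamp>\S+ \S+): (?P<tid>\d+): (?P<level>\w+) : "
    r"(?P<func>\w+):(?P<line>\d+) : (?P<message>.*)$"
)

FAILED = re.compile(r"filter '([^']+)' instantiation for '([^']+)' failed")

SAMPLE = """\
2019-02-26 05:31:00.258+0000: 43382: error : virNWFilterObjListFindInstantiateFilter:201 : internal error: referenced filter 'clean-traffic' is missing
2019-02-26 05:31:00.258+0000: 43382: warning : qemuProcessFiltersInstantiate:3299 : filter 'clean-traffic' instantiation for 'vnet0' failed 'internal error: referenced filter 'clean-traffic' is missing'
"""


def parse_log(text):
    """Split each recognizable log line into a dict of named fields."""
    records = []
    for line in text.splitlines():
        match = LOG_LINE.match(line)
        if match:
            records.append(match.groupdict())
    return records


def failed_instantiations(records):
    """Return (filter, device) pairs taken from warning-level messages."""
    pairs = []
    for record in records:
        if record["level"] != "warning":
            continue
        match = FAILED.search(record["message"])
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


if __name__ == "__main__":
    print(failed_instantiations(parse_log(SAMPLE)))
```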

Comment 6 RHEL Program Management 2019-04-03 11:58:37 UTC
Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.