Bug 1631622

Summary: guest with an interface referred to nwfilter killed by libvirt when restart libvirtd with access driver enabled
Product: Red Hat Enterprise Linux 7 Reporter: yafu <yafu>
Component: libvirtAssignee: John Ferlan <jferlan>
Status: CLOSED ERRATA QA Contact: yalzhang <yalzhang>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.6CC: berrange, dyuan, fjin, jdenemar, jferlan, xuzhang, yalzhang
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: libvirt-4.5.0-12.el7 Doc Type: No Doc Update
Doc Text:
undefined
Story Points: ---
Clone Of:
: 1648546 (view as bug list) Environment:
Last Closed: 2019-08-06 13:13:56 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1648546    

Description yafu 2018-09-21 06:04:02 UTC
Description of problem:
guest with an interface referred to nwfilter killed by libvirtd when restart libvirtd with access driver enabled

Version-Release number of selected component (if applicable):
libvirt-4.5.0-10.el7.x86_64
qemu-kvm-rhev-2.12.0-15.el7.x86_64
polkit-0.112-18.el7.x86_64


How reproducible:
100%

Steps to Reproduce:
1. Enable 'polkit' as the Access control driver 
#vim /etc/libvirt/libvirtd.conf
access_drivers = [ "polkit" ] 

2.Granted local user permission to connect to libvirt in full read-write mode
#vi /etc/libvirt/libvirtd.conf 
unix_sock_rw_perms = "0777"
auth_unix_rw = "none"

3.Restart libvirtd:
# systemctl restart libvirtd

4.Start a vm with an interface referred to nwfilter :
#virsh dumpxml 76
  <interface type='network'>
      <mac address='54:52:00:54:9e:f4'/>
      <source network='default'/>
      <model type='virtio'/>
      <filterref filter='clean-traffic'/>
      <alias name='ua-04c2decd-4e33-4889-84de-a2205c777af7'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/>
    </interface>


5.Restart libvirtd:
#systemctl restart libvirtd

6.Check the guest status:
#virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     76                             shut off

7.Check the qemu log:
#cat /var/log/libvirt/qemu/76.log
2018-09-21 05:24:28.272+0000: shutting down, reason=crashed
2018-09-21T05:24:28.275442Z qemu-kvm: terminating on signal 15 from pid 2370 (<unknown process>)


Actual results:
guest with an interface referred to nwfilter killed by libvirtd when restart libvirtd with access driver enabled

Expected results:
Guest should still running after libvirtd restart.

Additional info:
1.Check the libvirtd.log
#cat /var/log/libvirt/libvirtd.log  | grep -i error
2018-09-21 05:24:28.271+0000: 2440: error : virAccessDriverPolkitGetCaller:87 : access denied: Policy kit denied action org.libvirt.api.connect.getattr from <anonymous>
2018-09-21 05:24:28.272+0000: 2440: error : virAccessManagerSanitizeError:203 : access denied
2018-09-21 05:24:28.272+0000: 2440: error : virAccessManagerSanitizeError:203 : access denied
2018-09-21 05:24:28.272+0000: 2440: debug : virConnectOpenInternal:1045 : driver 8 nwfilter returned ERROR
2018-09-21 05:24:28.274+0000: 2440: error : virAccessDriverPolkitGetCaller:87 : access denied: Policy kit denied action org.libvirt.api.connect.getattr from <anonymous>
2018-09-21 05:24:28.274+0000: 2440: error : virAccessManagerSanitizeError:203 : access denied
2018-09-21 05:24:28.274+0000: 2440: error : virAccessManagerSanitizeError:203 : access denied
2018-09-21 05:24:28.274+0000: 2440: debug : virConnectOpenInternal:1045 : driver 8 nwfilter returned ERROR

Comment 2 John Ferlan 2018-11-10 00:40:56 UTC
Interesting problem - part of this is fixed as a "byproduct" of bz1607202. The fix for that results in failure of the API to just "VIR_WARN" any errors and continue on through restart without causing a failure.  So to that degree your guest will not reboot.

However, the second half of this is the error :

2018-09-21 05:24:28.271+0000: 2440: error : virAccessDriverPolkitGetCaller:87 : access denied: Policy kit denied action org.libvirt.api.connect.getattr from <anonymous>

and ends up generating those sanitized error reports.

What's interesting about this is that it doesn't seem we should be getting <anonymous> in virAccessDriverPolkitGetCaller, but that's coming because virIdentityGetCurrent is returning NULL.

This didn't quite make sense until one considers that the qemuProcessReconnect* code was made into a thread in order to process the reconnect - this way multiple domains can be reconnected simultaneously rather than in single file.

However, in creating that thread we don't maintain the identity values from the caller and thus when we try to make a connection call from that thread we fail because we don't have identity.

I've proposed a fix upstream as patch 2 of :

https://www.redhat.com/archives/libvir-list/2018-November/msg00339.html

Comment 3 John Ferlan 2018-11-14 19:35:37 UTC
After a review cycle, patch v2 series posted:

https://www.redhat.com/archives/libvir-list/2018-November/msg00434.html

reviewed and pushed:


commit b04b82f8cb671f067bad2d5e922acf88f13f0934
Author: John Ferlan <jferlan>
Date:   Mon Nov 12 08:27:26 2018 -0500

    qemu: Set identity for the reconnect all thread
    
    ...
    
    If polkit authentication is enabled, an attempt to open
    the connection failed during virAccessDriverPolkitGetCaller
    when the call to virIdentityGetCurrent returned NULL resulting
    in the errors:
    
      virAccessDriverPolkitGetCaller:87 : access denied:
      Policy kit denied action org.libvirt.api.connect.getattr from <anonymous>
    
    Because qemuProcessReconnect runs in a thread during
    daemonRunStateInit processing it doesn't have the thread
    local identity. Thus when the virGetConnectNWFilter is
    called as part of the qemuProcessFiltersInstantiate when
    virDomainConfNWFilterInstantiate is run the attempt to get
    the idenity fails and results in the anonymous error above.
    
    To fix this, let's grab/use the virIdenityPtr of the process
    that will be creating the thread, e.g. what daemonRunStateInit
    has set and use that for our thread. That way any other similar
    processing that uses/requires an identity for any other call
    that would have previously been successfully run won't fail in
    a similar manner.
 ...

$ git describe b04b82f8cb671f067bad2d5e922acf88f13f0934
v4.9.0-56-gb04b82f8cb
$

Comment 7 yalzhang@redhat.com 2019-04-10 02:51:59 UTC
Reproduce the error with the steps in comment 0 on libvirt-4.5.0-10.el7_6.7.x86_64
Test on libvirt-4.5.0-12.el7.x86_64, the bug is fixed.

1. Edit the libvirtd.conf with below settings:
unix_sock_rw_perms = "0777"
auth_unix_rw = "none"
access_drivers = [ "polkit" ]
log_level=2
log_outputs='2:file:/var/log/libvirt/libvirtd.log'

2. Restart libvirtd and start a vm with nwfilter configured.
3. Restart libvirtd.
4. check the vm is still running and healthy.
# virsh list 
 Id    Name                           State
----------------------------------------------------
 2     rhel                           running

5. check libvirtd log, no errors

Comment 9 errata-xmlrpc 2019-08-06 13:13:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2294