Bug 1416043

Summary:	libvirtd loses connection to dbus and needs to be restarted
Product:	Red Hat Enterprise Linux 7	Reporter:	Andreas Bleischwitz <ableisch>
Component:	libvirt	Assignee:	Martin Kletzander <mkletzan>
Status:	CLOSED DUPLICATE	QA Contact:	yafu <yafu>
Severity:	high	Docs Contact:
Priority:	high
Version:	7.3	CC:	ableisch, alexander.hass, amureini, berrange, dmardones, dmoessne, dyuan, eblake, erik, gwatson, henning.sackewitz, linux, mkardeh, mtessun, nsoffer, peter.engel, pzhang, rbalakri, rworkman, sbonazzo, tnisan, xuzhang, zpeng
Target Milestone:	pre-dev-freeze	Keywords:	Reopened
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2017-03-21 10:04:38 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1420851

Description Andreas Bleischwitz 2017-01-24 13:09:34 UTC

Description of problem:
If for whatever reason libvirtd looses the connection to dbus, vdsm will no longer be able to communicate with libvirtd.

One will see vdsm-messages similar to the following:
---
libvirtd[]: error from service: ListActivatableNames: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
libvirtd[]: Unable to get index for interface vnet3: No such device
---

Within /var/log/messages one could find messages like this:
---
journal: error from service: CheckAuthorization: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken
---

Version-Release number of selected component (if applicable):
* libvirt-daemon-1.2.17-13.el7_2.5.x86_64
* vdsm-4.17.33-1.el7ev.noarch
* dbus-1.6.12-13.el7.x86_64

How reproducible:
* stopp the dbus-daemon and dbus-socket services on a system, and try to start a VM on that host. You get:

  Jan 12 16:27:12 dhcp-49-15.bos.redhat.com libvirtd[9751]: error from service: ListActivatableNames: Connection is closed

The VM failed to start.

Restart libvirtd, and you will be able to start the VM on that host. In this specific case, libvirtd started up without some specific functionality as the dbus service was still down;

  Jan 12 16:32:51 dhcp-49-15.bos.redhat.com libvirtd[19288]: libvirt version: 1.2.17, package: 13.el7_2.5 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2016-05-26-07:48:46, x86-020.build.eng.bos.redhat.com)
  Jan 12 16:32:51 dhcp-49-15.bos.redhat.com libvirtd[19288]: internal error: Unable to get DBus system bus connection: Failed to connect to socket /var/run/dbus/system_bus_socket: Connection refused
  Jan 12 16:32:51 dhcp-49-15.bos.redhat.com libvirtd[19288]: DBus not available, disabling firewalld support in bridge_network_driver: internal error: Unable to get DBus system bus connection: Failed to connect to socket /var/run/dbus/system_bus_socket: Connection refused


So, what this shows is that libvirtd can run without dbus, but if it is started with dbus active and then dbus encounters a problem, libvirtd will not be able perform certain functions, like starting a VM, until it is restarted.

Actual results:
* virtual machine can not be managed (neither stopped, started, migrated)

Expected results:
* virtual machine can be managed

Comment 4 Daniel Berrangé 2017-01-24 16:22:25 UTC

AFAIK, it is *not* supported to restart the DBus system bus - it is expected to remain running uninterrupted for the duration of the OS boot. The libdbus client library also does not expect applications to reconnect to dbus - it just provides a single global dbus connection handle and applications aren't supposed to try recreate it. So if you do restart DBus daemon for whatever reason, it is normal that applications using dbus will break  and themselves need to be restarted. 

IOW, I don't think there's anything that needs fixing in libvirt here. Simply don't ever stop the dbus system daemon. In the event that dbus was mistakenly restarted, then simply restart libvirtd too.

Comment 5 Eric Blake 2017-01-24 16:45:47 UTC

Maybe systemd should be smarter about automatically restarting services that depend on the system bus - but if so, that's a systemd bug report, not libvirtd (unless the libvirtd.service file needs a one-line addition to inform systemd that it does indeed depend on the systemd bus)

Comment 9 Daniel Berrangé 2017-01-27 13:32:15 UTC

(In reply to Eric Blake from comment #5)
> Maybe systemd should be smarter about automatically restarting services that
> depend on the system bus - but if so, that's a systemd bug report, not
> libvirtd (unless the libvirtd.service file needs a one-line addition to
> inform systemd that it does indeed depend on the systemd bus)

If dbus daemon has stopped/restarted, then likely the entire OS should be restarted rather than trying to fix it up by figuring out which services need restarting.

Comment 12 Martin Tessun 2017-02-23 09:26:57 UTC

(In reply to Daniel Berrange from comment #9)
> (In reply to Eric Blake from comment #5)
> > Maybe systemd should be smarter about automatically restarting services that
> > depend on the system bus - but if so, that's a systemd bug report, not
> > libvirtd (unless the libvirtd.service file needs a one-line addition to
> > inform systemd that it does indeed depend on the systemd bus)
> 
> If dbus daemon has stopped/restarted, then likely the entire OS should be
> restarted rather than trying to fix it up by figuring out which services
> need restarting.

According to the process times, dbus hasn't been restarted, but somehow the libvirt-dbus-connection got dropped.

Any idea how to procede here?

From my point of view:
Workaround: Restart libvirt (could this be automated somehow?)
Solution: Find the reason why the connection drops and resolve this (any ideas on how to debug this, especially as it does not happen this often?)

So any ideas how to move the analysis of this one forward?

Some snippets from the processes:
dbus      1165  0.0  0.0  45664 12456 ?        Ssl   2016   3:05 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
root      2616  2.4  0.0 961808 34444 ?        S<sl 14:14   0:25 /usr/bin/python /usr/share/vdsm/supervdsmServer --sockfile /var/run/vdsm/svdsm.sock
root      2636  0.3  0.0 1236080 21356 ?       Ssl  14:14   0:03 /usr/sbin/libvirtd --listen
vdsm      3638 97.4  0.0 3773852 65020 ?       S<sl 14:14  16:43 /usr/bin/python /usr/share/vdsm/vdsm
vdsm      3639  0.5  0.0 892392 22064 ?        Ssl  14:14   0:06 python /usr/sbin/momd -c /etc/vdsm/mom.conf
vdsm      3754  0.0  0.0 603812  1292 ?        S<l  14:14   0:00 /usr/libexec/ioprocess --read-pipe-fd 53 --write-pipe-fd 51 --max-threads 10 --max-queued-requests 10
vdsm      3814  0.0  0.0 382616  1132 ?        S<l  14:14   0:00 /usr/libexec/ioprocess --read-pipe-fd 71 --write-pipe-fd 70 --max-threads 10 --max-queued-requests 10

So dbus was started in 2016 and libvirt was (re-)started 2017-01-09 14:14

Before the restart libvirt, vdsm and dbus looked like:
root      1155  7.3  0.0 1543456 35308 ?       S<sl  2016 3301:27 /usr/bin/python /usr/share/vdsm/supervdsmServer --sockfile /var/run/vdsm/svdsm.sock
dbus      1165  0.0  0.0  45532 12196 ?        Ssl   2016   3:05 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
root      2052 21.1  0.0 1308424 285836 ?      Ssl   2016 9468:21 /usr/sbin/libvirtd --listen
vdsm      5529 30.6  0.0 5576832 431476 ?      S<sl  2016 13712:09 /usr/bin/python /usr/share/vdsm/vdsm
vdsm      5533  0.6  0.0 1039856 138616 ?      Ssl   2016 289:54 python /usr/sbin/momd -c /etc/vdsm/mom.conf

As the problem didn't occur earlier than 2017-01, I would expect that somehow the libvirt-dbus connection got closed/lost, which should probably not happen or am I missing something here?

Comment 13 Daniel Berrangé 2017-02-23 09:54:49 UTC

(In reply to Martin Tessun from comment #12)
> (In reply to Daniel Berrange from comment #9)
> > (In reply to Eric Blake from comment #5)
> > > Maybe systemd should be smarter about automatically restarting services that
> > > depend on the system bus - but if so, that's a systemd bug report, not
> > > libvirtd (unless the libvirtd.service file needs a one-line addition to
> > > inform systemd that it does indeed depend on the systemd bus)
> > 
> > If dbus daemon has stopped/restarted, then likely the entire OS should be
> > restarted rather than trying to fix it up by figuring out which services
> > need restarting.
> 
> According to the process times, dbus hasn't been restarted, but somehow the
> libvirt-dbus-connection got dropped.

UNIX domain sockets simply don't break on their own - one end must have terminated the connection. If dbus wasn't restarted, either dbus closed the socket, or libvirt mistakenly closed its end of the socket. Unless the scenario is reliably reproducable this is pretty much impossible to debug further.
 
> Any idea how to procede here?

CLOSED -> WORKSFORME / INSUFFICIENT_DATA since AFAICT, there's not enough to show what's happening and without it being reliably reproducable, there's no way to get the required info.

Comment 14 Martin Tessun 2017-02-24 11:11:55 UTC

(In reply to Daniel Berrange from comment #13)
> (In reply to Martin Tessun from comment #12)
> > 
> > According to the process times, dbus hasn't been restarted, but somehow the
> > libvirt-dbus-connection got dropped.
> 
> UNIX domain sockets simply don't break on their own - one end must have
> terminated the connection. If dbus wasn't restarted, either dbus closed the
> socket, or libvirt mistakenly closed its end of the socket. Unless the
> scenario is reliably reproducable this is pretty much impossible to debug
> further.
>  
> > Any idea how to procede here?
> 
> CLOSED -> WORKSFORME / INSUFFICIENT_DATA since AFAICT, there's not enough to
> show what's happening and without it being reliably reproducable, there's no
> way to get the required info.

Ok. So anything, we can ask the affected customers to run/collect to being able to further debug this issue?
I believe we just need to ask for the data we need to further debugging the issue, so we can analyze it.

Thanks!
Martin

Comment 15 Mota Kardeh 2017-02-28 16:42:20 UTC

Hi All,

we had the same issue more times on our systems in SAP LinuxLab. SAP colleagues are running permanently tests on that.

As duplicate with this bug, we closed the https://bugzilla.redhat.com/show_bug.cgi?id=1422556 - VM occasionally losing disk.

I would like to share the info on our system

----------------------------------------------------------------
"libvirt" and "qemu" version are:

~# rpm -q libvirt
libvirt-2.0.0-10.el7_3.4.x86_64

~# rpm -qa | grep qemu-
qemu-img-rhev-2.6.0-28.el7_3.3.x86_64
qemu-kvm-common-rhev-2.6.0-28.el7_3.3.x86_64
qemu-kvm-tools-rhev-2.6.0-28.el7_3.3.x86_64
ipxe-roms-qemu-20160127-5.git6366fa7a.el7.noarch
qemu-kvm-rhev-2.6.0-28.el7_3.3.x86_64
libvirt-daemon-driver-qemu-2.0.0-10.el7_3.4.x86_64
-------------------------------------------------------------

I attached the system logs, vdsm log file and sosreport-LogCollector.tar.xz to the Bug 1422556. Maybe these help you also to analyse the error.

Two questions would be:

1- Why is libvirtd getting into a state where the "ListActivatableName" error produced?

2- What is the reason for deactivated disk or missing disk? 

Thanks and best regards,
Mota

Comment 16 Mota Kardeh 2017-03-03 14:29:35 UTC

(In reply to Martin Tessun from comment #14)
> (In reply to Daniel Berrange from comment #13)
> > (In reply to Martin Tessun from comment #12)
> > > 
> > > According to the process times, dbus hasn't been restarted, but somehow the
> > > libvirt-dbus-connection got dropped.
> > 
> > UNIX domain sockets simply don't break on their own - one end must have
> > terminated the connection. If dbus wasn't restarted, either dbus closed the
> > socket, or libvirt mistakenly closed its end of the socket. Unless the
> > scenario is reliably reproducable this is pretty much impossible to debug
> > further.
> >  
> > > Any idea how to procede here?
> > 
> > CLOSED -> WORKSFORME / INSUFFICIENT_DATA since AFAICT, there's not enough to
> > show what's happening and without it being reliably reproducable, there's no
> > way to get the required info.
> 
> Ok. So anything, we can ask the affected customers to run/collect to being
> able to further debug this issue?
> I believe we just need to ask for the data we need to further debugging the
> issue, so we can analyze it.

Martin,

We got again the issue on our VM.

Please let me know, if you need any additional info regarding to run/collect any log files to debug and analyze the issue.

P.S. I tried to call you a couple of minutes ago.

> 
> Thanks!
> Martin

Comment 17 Martin Tessun 2017-03-03 14:36:41 UTC

Hi Mota,

I am not sure what is needed to further analyze this is. I also asked for what is needed.

@Daniel: Anything that can be collected to get more informaton on this issue?

Thanks!
Martin

Comment 18 Jaroslav Suchanek 2017-03-03 15:05:38 UTC

Assigning to Martin for further discussion about libvirtd dependency on
systemd's bus restart, as was discussed by Daniel and Eric in comment 4 and
comment 5.

Please note that we might not be able to find the root cause of this issue and
close it the same way Daniel did. If you are reopening it please provide your
justification for doing it.

Related changes in handling connection to systemd bus were proposed recently
upstream by Jan Tomko:
https://www.redhat.com/archives/libvir-list/2017-February/msg01271.html

Comment 20 Daniel Berrangé 2017-03-03 16:17:03 UTC

The only think we can really do is turn up libvirtd debug logs to a high level, but that'll generate many 100's of MBs of logs in any host that's actively used. So identifying the problem in there will be very hard, if not impossible. Realistically we're only likely to be able to solve it if someone can identify a way to reliably reproduce the problem in clean system.

Comment 21 Mota Kardeh 2017-03-03 16:32:03 UTC

(In reply to Daniel Berrange from comment #20)
> The only think we can really do is turn up libvirtd debug logs to a high
> level, but that'll generate many 100's of MBs of logs in any host that's
> actively used. So identifying the problem in there will be very hard, if not
> impossible. Realistically we're only likely to be able to solve it if
> someone can identify a way to reliably reproduce the problem in clean system.

Daniel,
the issue is reproduced on our different systems. So, I have not yet get the reply of comment #17. 
Anything that can be collected to get more information on this issue? I attached the systemlogs, sosreport-LogCollector to the Bug 1422556. 
Maybe is there any way to collect the debug log on only the host, that the issue happened there?

Regards,
Mota

Comment 22 Martin Tessun 2017-03-07 08:05:53 UTC

Hi Mota,

(In reply to Mota Kardeh from comment #21)
> (In reply to Daniel Berrange from comment #20)
> > The only think we can really do is turn up libvirtd debug logs to a high
> > level, but that'll generate many 100's of MBs of logs in any host that's
> > actively used. So identifying the problem in there will be very hard, if not
> > impossible. Realistically we're only likely to be able to solve it if
> > someone can identify a way to reliably reproduce the problem in clean system.
> 
> Daniel,
> the issue is reproduced on our different systems. So, I have not yet get the
> reply of comment #17. 
> Anything that can be collected to get more information on this issue? I
> attached the systemlogs, sosreport-LogCollector to the Bug 1422556. 
> Maybe is there any way to collect the debug log on only the host, that the
> issue happened there?

as said by Daniel in comment 20 the thinug to do is, to turn up the libvirtd logs to debug level.
Please note that this will generate lots of logs, so be sure to have enough space for the logs available and don't forget to reduce the logging back to normal once the situation has been successfully captured.

Also please note that identifying the problem in these logs might not be possible due to the amounts of logs generated.

The loglevel is defined in /etc/libvirt/libvirtd.log:

log_level=1

Also the location should be set to a separate file, as systemd will probably not log everything. So setting

log_outputs="1:file:/<FILE_PATH>

should also be set.

Please zip and upload the logs to a location we can access and notify us, once the logfiles are there.

BTW: It is probably best to open a support case for this, so that a Support Engineer can have a look at the logs first.

> 
> Regards,
> Mota


Cheers,
Martin

Comment 23 Mota Kardeh 2017-03-07 10:33:45 UTC

(In reply to Martin Tessun from comment #22)
> Hi Mota,
> 
> (In reply to Mota Kardeh from comment #21)
> > (In reply to Daniel Berrange from comment #20)
> > > The only think we can really do is turn up libvirtd debug logs to a high
> > > level, but that'll generate many 100's of MBs of logs in any host that's
> > > actively used. So identifying the problem in there will be very hard, if not
> > > impossible. Realistically we're only likely to be able to solve it if
> > > someone can identify a way to reliably reproduce the problem in clean system.
> > 
> > Daniel,
> > the issue is reproduced on our different systems. So, I have not yet get the
> > reply of comment #17. 
> > Anything that can be collected to get more information on this issue? I
> > attached the systemlogs, sosreport-LogCollector to the Bug 1422556. 
> > Maybe is there any way to collect the debug log on only the host, that the
> > issue happened there?
> 
> as said by Daniel in comment 20 the thinug to do is, to turn up the libvirtd
> logs to debug level.
> Please note that this will generate lots of logs, so be sure to have enough
> space for the logs available and don't forget to reduce the logging back to
> normal once the situation has been successfully captured.
> 
> Also please note that identifying the problem in these logs might not be
> possible due to the amounts of logs generated.
> 
> The loglevel is defined in /etc/libvirt/libvirtd.log:
> 
Martin,

thanks for your update. :-)
Maybe you mean /etc/libvirt/libvirtd.conf, because /etc/libvirt/libvirtd.log doesn't exist in the system.

> log_level=1
> 
> Also the location should be set to a separate file, as systemd will probably
> not log everything. So setting
> 
> log_outputs="1:file:/<FILE_PATH>

It would be great to know, how can this separate file help us. Because the all required log collector file are attached to the Bug 1422556 and sosreport-LogCollector.tar.xz under:

https://drive.google.com/open?id=0BwWzNYssmDVabVo1THFHdXVFS2c

> 
> should also be set.
> 
> Please zip and upload the logs to a location we can access and notify us,
> once the logfiles are there.
> 
> BTW: It is probably best to open a support case for this, so that a Support
> Engineer can have a look at the logs first.

Unfortunately we can not open a support case for this, because we are here software engineers and the part of the support task is related to somebody from support team.

Thanks and best regards,
Mota
> 
> > 
> > Regards,
> > Mota
> 
> 
> Cheers,
> Martin

Comment 24 Martin Tessun 2017-03-07 14:33:17 UTC

(In reply to Mota Kardeh from comment #23)
> (In reply to Martin Tessun from comment #22)
> > The loglevel is defined in /etc/libvirt/libvirtd.log:
> > 
> Martin,
> 
> thanks for your update. :-)
> Maybe you mean /etc/libvirt/libvirtd.conf, because /etc/libvirt/libvirtd.log
> doesn't exist in the system.

Yes, indeed. Sorry for the typo.

> > Also the location should be set to a separate file, as systemd will probably
> > not log everything. So setting
> > 
> > log_outputs="1:file:/<FILE_PATH>
> 
> It would be great to know, how can this separate file help us. Because the
> all required log collector file are attached to the Bug 1422556 and
> sosreport-LogCollector.tar.xz under:

The debug level is the setting that does change the things. As standard logging is through syslog/systemd, you don't want this amount of data processed there. As such I proposed to write the debug data to a separate file instead of syslog.

Does this make it clear?

Cheers,
Martin

Comment 25 Mota Kardeh 2017-03-13 10:32:23 UTC

Hello Martin, 

Sorry for delay. I asked Jaroslav regarding to the issue, what is the best strategy to continue with the process?  Shall we continue working on this Bug or primary Bug of our issue Bug 1422556.

To get the debug log of the libvirt, I run the following steps:

- open /etc/libvirt/libvirtd.conf
- find & replace, or set these variables:
   log_level = 1
   log_filters="3:remote 4:event 3:json 3:rpc"
   log_outputs="1:file:/var/log/libvirt/libvirtd.log"
- save and exit
- restart libvirtd service
   systemctl restart libvirtd.service

The libvirt.log file of the system is available under:
https://drive.google.com/open?id=0BwWzNYssmDVaWE11bTVZQ0ZRSGM

Please let me know if you need any additional log files or Info.

Thanks and best regards,
Mota

Comment 26 Martin Kletzander 2017-03-21 10:04:38 UTC

Since the BZs are going the same direction, I'm closing this one as a duplicate so that we don't get confused even more.

*** This bug has been marked as a duplicate of bug 1422556 ***

Comment 27 Robby Workman 2017-06-11 19:43:32 UTC

I hate to comment on a closed bug, but I think we're running into this issue after enabling polkit access controls, and since the dupe bug is private, it's impossible to see ongoing progress. Any hints or tips for mitigation?

Comment 28 Martin Kletzander 2017-06-12 20:47:46 UTC

(In reply to Robby Workman from comment #27)
Oh, the private settings.  I don't know why the other bug s private, I'll ask about that.  Anyway, there is not much progress.  It looks like the problem is in the dbus daemon as the connection cannot break in any other place.  It must be an overloaded system or a bug along the way, but the disconnection is not easy to achieve on a UNIX socket otherwise.

Comment 29 Robby Workman 2017-06-12 21:45:04 UTC

What version of dbus are you other folks running? We're on 1.10.8 on the affected system, and in looking over the dbus git log for 1.10.x branch, I see this commit which went into 1.10.14:

commit 3f407671ecf821eb38ea7af5b160bfb93a9f4584
Author: Simon McVittie <simon.mcvittie.uk>
Date:   Fri Nov 11 16:40:44 2016 +0000

    Make uid 0 immune to pending_fd_timeout limit
    
    This is a workaround for
    <https://bugs.freedesktop.org/show_bug.cgi?id=95263>. If a service
    sends a file descriptor sufficiently frequently that its queue of
    messages never goes down to 0 fds pending, then it will eventually be
    disconnected. logind is one such service.
    
    We do not currently have a good solution for this: the proposed
    patches either don't work, or reintroduce a denial of service
    security vulnerability (CVE-2014-3637). Neither seems desirable.
    However, we can avoid the worst symptoms by trusting uid 0 not to be
    malicious.

...

That sounds like a possibility perhaps...

Comment 30 Robby Workman 2017-06-14 15:27:01 UTC

Nope, no change with dbus-1.10.18 now. Sigh. :/

Comment 31 Robby Workman 2017-10-09 17:28:20 UTC

I might be getting hopes up for nothing, but with libvirt-3.8.0 plus the patch from below, we've been up for over 24 hours with multiple virt-manager instances connected and monitoring with no lost connections. At the very least, this seems to have helped...

commit 65a983eca1f44eaf2a209ae232f72aa659184945
Author: Kothapally Madhu Pavan <kmp.ibm.com>
Date:   Wed Oct 4 14:08:45 2017 +0530

    util: Free a pointer in virPolkitCheckAuth
    
    Free DBusMessage pointer in virPolkitCheckAuth
    
    Signed-off-by: Kothapally Madhu Pavan <kmp.ibm.com>

diff --git a/src/util/virpolkit.c b/src/util/virpolkit.c
index c735ca9b1..4559431ba 100644
--- a/src/util/virpolkit.c
+++ b/src/util/virpolkit.c
@@ -138,6 +138,7 @@ int virPolkitCheckAuth(const char *actionid,
 
  cleanup:
     virStringListFreeCount(retdetails, nretdetails);
+    virDBusMessageUnref(reply);
     return ret;
 }

Comment 32 Martin Kletzander 2017-10-10 10:43:39 UTC

(In reply to Robby Workman from comment #31)
I don't think this has anything to do with it, but I'm glad to hear this is better for you now.  Are you sure you did not upgrade anything else on the system?

Comment 33 Daniel Berrangé 2017-10-10 10:46:49 UTC

(In reply to Robby Workman from comment #31)
> I might be getting hopes up for nothing, but with libvirt-3.8.0 plus the
> patch from below, we've been up for over 24 hours with multiple virt-manager
> instances connected and monitoring with no lost connections. At the very
> least, this seems to have helped...
> 
> commit 65a983eca1f44eaf2a209ae232f72aa659184945
> Author: Kothapally Madhu Pavan <kmp.ibm.com>
> Date:   Wed Oct 4 14:08:45 2017 +0530
> 
>     util: Free a pointer in virPolkitCheckAuth
>     
>     Free DBusMessage pointer in virPolkitCheckAuth
>     
>     Signed-off-by: Kothapally Madhu Pavan <kmp.ibm.com>
> 
> diff --git a/src/util/virpolkit.c b/src/util/virpolkit.c
> index c735ca9b1..4559431ba 100644
> --- a/src/util/virpolkit.c
> +++ b/src/util/virpolkit.c
> @@ -138,6 +138,7 @@ int virPolkitCheckAuth(const char *actionid,
>  
>   cleanup:
>      virStringListFreeCount(retdetails, nretdetails);
> +    virDBusMessageUnref(reply);
>      return ret;
>  }

I don't see how this could resolve problems with the dbus connection as it is merely a memory leak fix.

Comment 34 Robby Workman 2017-10-10 17:52:55 UTC

Note that the problem first appeared when we enabled polkit auth in libvirtd.conf
with the following:
  auth_unix_rw = "polkit"
  access_drivers = [ "polkit" ]

Aside from routine distribution-provided updates, here's a timeline of upgraded packages since we first encountered this problem (I first commented on Jun 11):

Jun 15  /var/log/packages/polkit-0.113-x86_64-1_spike14.2
Jul 27  /var/log/packages/dbus-1.10.20-x86_64-1_spike14.2
Sep 19  /var/log/packages/libasr-1.0.2-x86_64-1_spike14.2
Sep 19  /var/log/packages/opensmtpd-6.0.2p1-x86_64-1_spike14.2
Oct 7  /var/log/packages/libvirt-3.8.0-x86_64-1_spike14.2
Oct 7  /var/log/packages/libvirt-python-3.8.0-x86_64-1_spike14.2
Oct 7  /var/log/packages/libvirt-glib-1.0.0-x86_64-2_spike14.2

The custom polkit package includes several memleak fixes from the git repo.
This did not seem to have any effect on the bug.

The custom dbus package is an upgrade from 1.10.8 (the distro version).
This did not seem to have any effect on the bug.

libasr and opensmtpd should be irrelevant.

What's left are libvirt and friends, wherein libvirt 3.8.0 package also includes that patch referenced above. 

It may well not be the fix, but after digging through what seemed like infinite lines of debug output without a single clue being found, and knowing that there wasn't a problem at all without the polkit access controls in use (this machine is one of several virt hosts I have running almost identical setups other than polkit access controls), I've basically been watching git logs of libvirt, polkit, and dbus with a large amount of hope that someone stumbles across more than I did. 

On a slightly related note, when closing a bug as a dupe of another, it would be nice if the one remaining open isn't marked as private.

Comment 35 Martin Kletzander 2017-10-11 11:41:02 UTC

(In reply to Robby Workman from comment #34)
I asked for it to be publicized.