Bug 1269570

Summary: Running 'virsh list' with libvirt-0.10.2-54.el6.x86_64 causes "Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused"
Product: Red Hat Enterprise Linux 6 Reporter: Robert McSwain <rmcswain>
Component: libvirt Assignee: Jiri Denemark <jdenemar>
Status: CLOSED NOTABUG QA Contact: Virtualization Bugs <virt-bugs>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 6.5 CC: adevolder, dyuan, fjin, jdenemar, mkletzan, rbalakri, rmcswain, xuzhang, yafu, yalzhang, zhwang
Target Milestone: rc Keywords: Reopened
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-10-25 11:59:24 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1269194, 1359965    
Attachments:
Description Flags
LIBVIRT_DEBUG=1 LIBVIRT_LOG_OUTPUTS=stderr libvirtd none

Description Robert McSwain 2015-10-07 14:47:44 UTC
Description of problem:
After applying the patch "libvirt-0.10.2-54.el6.x86_64.rpm" on a physical server, we get the error message appended below, and the virtual machine becomes inaccessible.

When the command "virsh list" is executed we get the below error message.

error: Failed to reconnect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused

Version-Release number of selected component (if applicable):
libvirt-0.10.2-54.el6.x86_64
RHEL 6.5

How reproducible:
install libvirt-0.10.2-54.el6.x86_64 and attempt to start this customer's particular VM

Steps to Reproduce:
Other than running libvirt-0.10.2-54.el6.x86_64 with this particular VM, this is currently unknown.
 
Actual results:
VM is inaccessible

Expected results:
VM console is shown as normal

Additional info:
The system is a Sun x86, x2-4. libvirt-0.10.2-29.el6_5.11.x86_64 does not show this behavior, and downgrading to it reliably resolves the issue, but that is not a long-term solution for this customer.

Troubleshooting attempted:

# service libvirtd stop
# ps auxw | grep <guest_name>


# ls /var/run/libvirtd.pid
# ls /var/run/libvirt/libvirt-sock*  
# ls /var/run/libvirt/qemu/<guest_name>*
# ls /var/lib/libvirt/qemu/<guest_name>*
# ls /var/lib/libvirt/qemu/save/*

Any libvirt-sock file or save file relating to this VM was backed up:

# mv /var/run/libvirt/libvirt-sock*  /root/
# mv /var/lib/libvirt/qemu/save/* /root/

and then libvirtd was started and the status of the VM was checked again

# service libvirtd start
# service libvirtd status
# virsh list

Output from this will be added as a comment

Comment 3 Jiri Denemark 2015-10-08 07:48:05 UTC
> [root@hq1-beprod-s1 yum.repos.d]# service libvirtd start
> Starting libvirtd daemon:                                  [  OK  ]
> [root@hq1-beprod-s1 yum.repos.d]# service libvirtd status
> libvirtd dead but pid file exists

This suggests that libvirtd failed early after starting. Could you please attach /var/log/libvirt/libvirtd.log so that we can see the error?
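
A minimal sketch of how the "dead but pid file exists" state could be confirmed by hand, assuming the pid file is /var/run/libvirtd.pid as in the troubleshooting steps above. The function name pidfile_state is made up for illustration; it is not part of libvirt or of the init script.

```shell
# Classify a daemon pid file: prints "missing", "stale" or "running".
# Run as root, otherwise kill -0 can fail for a live process owned by
# another user and be misreported as "stale".
pidfile_state() {
    pidfile=$1
    [ -f "$pidfile" ] || { echo missing; return; }
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only tests whether the process exists.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo running
    else
        echo stale
    fi
}

# Example:
#   pidfile_state /var/run/libvirtd.pid
```

A result of "stale" would match the init script's report: the daemon wrote its pid file and then exited.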

Comment 4 Jiri Denemark 2015-10-08 08:37:09 UTC
BTW, I looked at the sosreport attached to the case (it should have been attached to this bz too) and the logs there are pretty strange and confusing. Partially, this is because they downgraded to an older running libvirt before capturing the sosreport. That's pretty useless. We need them to capture the sosreport after they installed -54 libvirt, and confirmed they can't connect to the daemon and restarting the daemon doesn't help.

Comment 11 Martin Kletzander 2016-01-20 10:02:58 UTC
One more question.  Does this happen *only* with 'virsh list' or with other commands too?  Could you try doing 'virsh destroy domain_name_that_does_NOT_exist'?

Comment 13 Martin Kletzander 2016-02-02 08:00:59 UTC
Putting back the needinfo as we still need the debug logs; without them there is not much we can do.

Comment 14 Martin Kletzander 2016-02-10 15:11:37 UTC
Since there are no debug logs available and this issue is not reproducible for us (or anyone else as far as I can tell), I'm closing this as INSUFFICIENT_DATA.  Feel free to reopen this BZ (or rather create a new one) with the debug logs included.  More info about how to enable/use them can be found here: http://wiki.libvirt.org/page/DebugLogs

Comment 19 Robert McSwain 2016-03-23 20:06:51 UTC
Martin,

The customer reminded me that Debug logs are not being written because as soon as the affected version of libvirt is installed, the libvirt service is not started and hence no logs are written. What can we do to work around this?

In his words:

Please note that after I update to the (-54) version I have the problem, and when I look for the process, no process has been started:

ps -C libvirtd
  PID TTY          TIME CMD

This is after I revert the packages back (-29):

ps -C libvirtd
  PID TTY          TIME CMD
41137 ?        00:00:00 libvirtd

Comment 20 Martin Kletzander 2016-03-24 14:50:07 UTC
That doesn't feel right because the first VIR_DEBUG() call is way before the non-fatal error that is seen in the log.  And there's plenty more in between, so there should be many lines in the logfile.

If the debug logs cannot be gathered, I reckon someone should try debugging the daemon on-site to see where it fails/crashes.  This looks more like a crash, so anything like abrt/ulimit/gdb that you can use should suffice.

Comment 21 Robert McSwain 2016-04-04 13:50:55 UTC
Martin,

I'm not sure what else to tell the customer, as we can't seem to get these logs from the affected version of libvirt. As soon as they install the affected version, libvirtd fails to start the daemon and no logs are generated. As logs are generated only when the daemon is started successfully, is there anything else you can think of or any specific directions on how to instruct this customer to use gdb to get what would be most helpful here?

Comment 22 Martin Kletzander 2016-04-04 14:02:28 UTC
(In reply to Robert McSwain from comment #21)
Logs are generated even when the daemon cannot start, but let's say that's not possible (maybe some other bug).  Let's try one more thing for getting the logs a bit differently.  In the meantime I'll try to come up with other ideas on how to move on with this.  After the affected package is installed, stop the service (even if it has already crashed) and then, as root, run the daemon manually with the following command line:

  LIBVIRT_DEBUG=1 LIBVIRT_LOG_OUTPUTS=stderr libvirtd
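
A small sketch for saving that output to a file so it can be attached to the bug. The helper name capture_debug_log and the log path are placeholders, not anything libvirt provides; the environment variables are the ones from the command above.

```shell
# Run a command in the foreground with libvirt debug logging directed to
# stderr, and save that stderr stream to a file.
# $1 = log file (placeholder path), remaining arguments = command to run.
capture_debug_log() {
    logfile=$1
    shift
    LIBVIRT_DEBUG=1 LIBVIRT_LOG_OUTPUTS=stderr "$@" 2>"$logfile"
}

# Example (as root, with the service stopped):
#   capture_debug_log /tmp/libvirtd-debug.log libvirtd
```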

Comment 23 Martin Kletzander 2016-04-04 14:11:11 UTC
(In reply to Robert McSwain from comment #21)
As another idea, would it be possible for the customer to set up a system that experiences this error and provide some (at least limited) access to that system?  I'm guessing not, but I had to ask :)

Another way would be running the daemon with strace, for example.  That will, however, generate a lot of output and there's a very low chance it will show something we need.  You can also run it with gdb as said in some previous comments.  Just running "gdb libvirtd" as root and then typing "run" and enter should show whether it crashes or not.  If it does, use the command "t a a bt full" to get all the stacktraces so we know where it crashed.
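
The interactive gdb steps above could be scripted roughly as follows, so that the output can be saved and attached. This is a sketch, not a procedure tested on the affected system: the function name and the GDB override variable are hypothetical, and "thread apply all bt full" is the long form of "t a a bt full".

```shell
# Run a program under gdb non-interactively and dump every thread's backtrace.
# GDB is a hypothetical override for the debugger binary (defaults to gdb).
gdb_all_backtraces() {
    "${GDB:-gdb}" -batch \
        -ex run \
        -ex 'thread apply all bt full' \
        --args "$@"
}

# Example (as root, with the service stopped):
#   gdb_all_backtraces libvirtd > /tmp/libvirtd-gdb.txt 2>&1
```

If the daemon exits without crashing, gdb has no stack to print, which by itself indicates an early exit rather than a crash.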

However, it doesn't look like it's crashing, so that might not help either.  The last thing that I can think of right now is bisecting so we get closer to the change that caused it.  Doing git bisect and going commit by commit probably won't fly, so I'd suggest at least figuring out the exact package version that caused it.  Meaning find two packages with sequential release numbers (e.g. -51 and -52) where the older one works and the newer one does not.
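
The package-level bisection described above amounts to a binary search over release numbers. A rough sketch, assuming releases can be treated as plain integers and that every intermediate release is actually available to install (neither is guaranteed). The names first_bad_release and try_release are hypothetical; try_release stands for whatever manual or scripted check installs a given release and reports whether libvirtd stays up.

```shell
# Binary search between a known-good and a known-bad release number.
# $1 = good release, $2 = bad release, remaining arguments = a command that
# receives a release number as its last argument and exits 0 if it works.
# Prints the first release that fails.
first_bad_release() {
    good=$1
    bad=$2
    shift 2
    while [ $((bad - good)) -gt 1 ]; do
        mid=$(( (good + bad) / 2 ))
        if "$@" "$mid"; then
            good=$mid   # mid works: the breakage is in a later release
        else
            bad=$mid    # mid fails: the breakage is here or earlier
        fi
    done
    echo "$bad"
}

# Example, with try_release being a hypothetical install-and-check helper:
#   first_bad_release 29 54 try_release
```

Between -29 and -54 this needs at most five install-and-check rounds instead of trying every release in turn.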

I'll try to think of other options in case none of this is possible/helps, but for now I've got nothing else on my mind.

Comment 25 Robert McSwain 2016-05-06 13:22:41 UTC
Created attachment 1154648 [details]
LIBVIRT_DEBUG=1 LIBVIRT_LOG_OUTPUTS=stderr libvirtd

Comment 26 Martin Kletzander 2016-06-07 11:09:43 UTC
This is great news, there is valuable information in that log.  However, it doesn't look like a problem with libvirt, maybe with libvirt's requirements.  I would see it as some kind of backward incompatibility of libdevmapper.  Jirka, could you check the builds are OK for us and move it to appropriate places?  Thanks.

Comment 27 Jiri Denemark 2016-10-25 11:59:24 UTC
This is actually a bug in the device-mapper packages, which started to export the dm_task_get_info_with_deferred_remove symbol without bumping the so version (mainly because the symbol had already been there for some time, but it wasn't exported). That confused dependency tracking, since even an older library which does not export the symbol still satisfies the dependency. The bug should not show up on a fully updated system, though. So either don't install libvirt from 6.7 on a 6.5 system, or manually update both libvirt and device-mapper-libs, or just update the system completely.
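
One way to check for the mismatch described above is to look at whether the installed libdevmapper actually exports the symbol. A minimal sketch: exports_symbol is a made-up helper that reads "nm -D" output on stdin, and the library path in the usage comment is an assumption about where device-mapper-libs installs the library on this system.

```shell
# Reads `nm -D` output on stdin; exits 0 if SYMBOL ($1) is defined there.
exports_symbol() {
    awk -v sym="$1" '
        { name = $NF; sub(/@.*/, "", name) }   # drop any @VERSION suffix
        # Defined symbols have 3 fields (address, type, name);
        # undefined ones have only 2 (type "U", name).
        NF >= 3 && name == sym { found = 1 }
        END { exit !found }
    '
}

# Example (library path is an assumption; adjust to the installed file):
#   rpm -q libvirt device-mapper-libs
#   nm -D /lib64/libdevmapper.so.1.02 |
#       exports_symbol dm_task_get_info_with_deferred_remove &&
#       echo "symbol exported" || echo "symbol missing: update device-mapper-libs"
```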