Bug 665075 - minor libvirt API break in error reporting
Summary: minor libvirt API break in error reporting
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: libvirt
Version: 5.6
Hardware: Unspecified
OS: Unspecified
low
low
Target Milestone: rc
: 5.7
Assignee: Eric Blake
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2010-12-22 16:47 UTC by Eric Blake
Modified: 2011-07-21 10:32 UTC (History)
9 users (show)

Fixed In Version: libvirt-0.8.2-19.el5
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-07-21 10:32:26 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2011:1019 0 normal SHIPPED_LIVE Moderate: libvirt security, bug fix, and enhancement update 2011-07-21 10:31:00 UTC

Description Eric Blake 2010-12-22 16:47:03 UTC
Description of problem:
Upstream libvirt 0.8.0 introduced a minor API break in the definition of enum virErrorNumber, which is passed over the wire between server and client.  Since libvirt was rebased from 0.6.3 to 0.8.2 in the move from RHEL 5.5 to RHEL 5.6, this means that anyone using RHEL 5.6 to talk to an external libvirt server running libvirt versions 0.7.1 through 0.7.7 may misinterpret the virErrorNumber and print a misleading message for failures related to any of the affected enum values.  Fortunately, there is no way to crash a server or client due to the API difference; the extent of the problem is merely a misleading error message.

Version-Release number of selected component (if applicable):
libvirt-0.8.2-15.el5

How reproducible:
so far, this problem was found by code inspection; I haven't tried to actually create a testcase where it makes a difference, but think this will work:

Steps to Reproduce:
1. Use a RHEL 5.6 virsh client (libvirt 0.8.2) to connect to an external (non-RHEL) libvirt server, where the server is in the range of 0.7.1 to 0.7.7 inclusive.
2. Trigger an action that causes the server to report a failure with one of the affected virErrorNumber values.  To be more specific: it should be possible to trigger VIR_ERR_CONFIG_UNSUPPORTED by specifying a timezone in the <clock> element of the XML for a xen domain (see xen/xend_internal.c:xenDaemonFormatSxpr).
3. The libvirt client on RHEL will issue a misleading message; in the VIR_ERR_CONFIG_UNSUPPORTED case, the client will use the error msaage for VIR_ERR_BUILD_FIREWALL instead.
  
Actual results:


Expected results:


Additional info:
An upstream patch to teach clients how to interoperate with older servers was posted here:
https://www.redhat.com/archives/libvir-list/2010-December/msg00855.html
This patch should be backported to RHEL 5.7, but the issue is probably not severe enough to warrant worrying about it in RHEL 5.6.

In the converse direction, there's no easy way to make the server hand back modified enum values to an external libvirt 0.7.* client, but as 0.7.* was lacking APIs that would trigger the error messages that changed in 0.8.0, this is not likely to be much of a problem.

Comment 5 Eric Blake 2011-05-12 02:50:40 UTC
Another case that might reproduce the issue, using qemu rather than requiring a libvirt server on a xen host, might be as follows (also untested by me):

1. Two machines, client RHEL 5 machine with pre- and post-patch builds to be tested, and server on any other machine running a self-built libvirt from an 0.7.7 tarball
2. start a qemu guest running on the second machine
3. attach gdb to the libvirtd on the second machine, and register a breakpoint in virCondWaitUntil, and resume the program
4. from the RHEL 5 client, create a remote connection to the second machine and trigger virDomainSuspend of the running domain (via 'virsh -c ssh://second/system pause domain')
5. when gdb breaks inside virCondWaitUntil on the server, corrupt the timespec before the pthread_cond_timedwait call (via 'p ts.tv_nsec=-10), then step past the system call (which should fail with EINVAL), then change errno (via 'p ret=ETIMEDOUT') prior to the code that assigns 'errno = ret'.  This should be sufficient to make the caller (qemuDomainObjBeginJobWithDriver) see a failure and go ahead and report VIR_ERR_OPERATION_TIMEDOUT with "cannot acquire state change lock"
6. resume libvirt on the second machine, and the client on the first machine will see VIR_ERR_OPERATION_TIMEDOUT "Timed out during operation" if the libvirt client is patched, but VIR_ERR_INVALID_NWFILTER "Invalid network filter" if the client is not compensating for the older server's response being inconsistent with the client's expectations.

Comment 6 Vivian Bian 2011-05-13 08:06:49 UTC
(In reply to comment #5)

> 5. when gdb breaks inside virCondWaitUntil on the server, corrupt the timespec
> before the pthread_cond_timedwait call (via 'p ts.tv_nsec=-10), then step past
> the system call (which should fail with EINVAL), then change errno (via 'p
> ret=ETIMEDOUT') prior to the code that assigns 'errno = ret'.  This should be
> sufficient to make the caller (qemuDomainObjBeginJobWithDriver) see a failure
> and go ahead and report VIR_ERR_OPERATION_TIMEDOUT with "cannot acquire state
> change lock"

Hi Eric , 
Today I installed libvirt-debuginfo , and with the gdb libvirtd started on the second machine , I will get the following result :
(gdb) p ts.tv_nsec=-10
No symbol "ts" in current context.
(gdb) p ret='ETIMEDOUT'
No symbol "ETIMEDOUT" in current context.

So anything I missed to installed on the second machine ? 
And btw after perform 'p ts.tv_nsec=-10' what should I do to step past the system call  (which should fail with EINVAL) ? 

Addtional info :
What was installed on the second machine :
libvirt-client-0.7.7-3.fc13.x86_64
libvirt-0.7.7-3.fc13.x86_64
libvirt-python-0.7.7-3.fc13.x86_64
libvirt-debuginfo-0.7.7-3.fc13.x86_64
qemu-common-0.12.3-8.fc13.x86_64
qemu-debuginfo-0.13.0-1.fc13.x86_64
gpxe-roms-qemu-1.0.0-1.fc13.noarch
qemu-img-0.12.3-8.fc13.x86_64
qemu-system-x86-0.12.3-8.fc13.x86_64
qemu-kvm-0.12.3-8.fc13.x86_64


Thanks
Vivian

Comment 7 Vivian Bian 2011-05-13 08:07:07 UTC
(In reply to comment #5)

> 5. when gdb breaks inside virCondWaitUntil on the server, corrupt the timespec
> before the pthread_cond_timedwait call (via 'p ts.tv_nsec=-10), then step past
> the system call (which should fail with EINVAL), then change errno (via 'p
> ret=ETIMEDOUT') prior to the code that assigns 'errno = ret'.  This should be
> sufficient to make the caller (qemuDomainObjBeginJobWithDriver) see a failure
> and go ahead and report VIR_ERR_OPERATION_TIMEDOUT with "cannot acquire state
> change lock"

Hi Eric , 
Today I installed libvirt-debuginfo , and with the gdb libvirtd started on the second machine , I will get the following result :
(gdb) p ts.tv_nsec=-10
No symbol "ts" in current context.
(gdb) p ret='ETIMEDOUT'
No symbol "ETIMEDOUT" in current context.

So anything I missed to installed on the second machine ? 
And btw after perform 'p ts.tv_nsec=-10' what should I do to step past the system call  (which should fail with EINVAL) ? 

Addtional info :
What was installed on the second machine :
libvirt-client-0.7.7-3.fc13.x86_64
libvirt-0.7.7-3.fc13.x86_64
libvirt-python-0.7.7-3.fc13.x86_64
libvirt-debuginfo-0.7.7-3.fc13.x86_64
qemu-common-0.12.3-8.fc13.x86_64
qemu-debuginfo-0.13.0-1.fc13.x86_64
gpxe-roms-qemu-1.0.0-1.fc13.noarch
qemu-img-0.12.3-8.fc13.x86_64
qemu-system-x86-0.12.3-8.fc13.x86_64
qemu-kvm-0.12.3-8.fc13.x86_64


Thanks
Vivian

Comment 8 Osier Yang 2011-06-02 07:44:34 UTC
Vivian, I guess your gdb didn't breaks at virCondWaitUntil. As virCondWaitUntil will only be invoked when there is already a running job, and you trying to start another job.

<quote>
4. from the RHEL 5 client, create a remote connection to the second machine and
trigger virDomainSuspend of the running domain (via 'virsh -c
ssh://second/system pause domain')
</quote>

Change above step into following will work:

4. from the RHEL 5 client, connect to server and try to do a coredump ("virsh -c qemu+ssh://second/system dump $domain $domain.dump"), before the dumping completes, open another terminal, and try to pause the domain ("virsh -c qemu+ssh://second/system suspend $domain").

Then your gdb will breaks into virCondWaitUntil, then you can do as left steps as Eric wrote.

Comment 9 Vivian Bian 2011-06-02 09:40:42 UTC
Tried following steps to verify this bug . And passed . 


[Verified versions]

libvirt-0.7.7-3.fc13.x86_64
libvirt-0.8.2-19.el5

[Client] libvirt-0.8.2-19.el5
# virsh -c qemu+ssh://second/system suspend test
root@second's password: 
error: Failed to suspend domain test
error: Timed out during operation: cannot acquire state change lock


[Server] libvirt-0.7.7-3.fc13.x86_64

(gdb) b virCondWaitUntil
Breakpoint 1 at 0x42d450: file util/threads-pthread.c, line 96.
(gdb) r
Starting program: /usr/sbin/libvirtd 
[Thread debugging using libthread_db enabled]
[New Thread 0x7ffff6f99710 (LWP 10251)]
[New Thread 0x7ffff6598710 (LWP 10252)]
[New Thread 0x7ffff5b97710 (LWP 10253)]
Detaching after fork from child process 10254.
[New Thread 0x7ffff5196710 (LWP 10255)]
Detaching after fork from child process 10257.
[New Thread 0x7fffeffff710 (LWP 10258)]
[New Thread 0x7fffef5fe710 (LWP 10259)]
Detaching after fork from child process 10260.
Detaching after fork from child process 10261.
Detaching after fork from child process 10262.
Detaching after fork from child process 10263.
Detaching after fork from child process 10264.
Detaching after fork from child process 10265.
Detaching after fork from child process 10266.
Detaching after fork from child process 10267.
Detaching after fork from child process 10268.
Detaching after fork from child process 10269.
Detaching after fork from child process 10270.
Detaching after fork from child process 10271.
Detaching after fork from child process 10272.
Detaching after fork from child process 10273.
Detaching after fork from child process 10274.
Detaching after fork from child process 10275.
Detaching after fork from child process 10276.
Detaching after fork from child process 10277.
17:25:37.610: warning : qemudStartup:1150 : Unable to create cgroup for driver: No such device or address
Detaching after fork from child process 10278.
Detaching after fork from child process 10279.
Detaching after fork from child process 10280.
Detaching after fork from child process 10281.
Detaching after fork from child process 10282.
Detaching after fork from child process 10283.
Detaching after fork from child process 10284.
Detaching after fork from child process 10285.
Detaching after fork from child process 10286.
Detaching after fork from child process 10287.
17:25:37.883: warning : lxcStartup:1748 : Unable to create cgroup for driver: No such device or address
[Switching to Thread 0x7ffff6598710 (LWP 10252)]

Breakpoint 1, virCondWaitUntil (c=0x8dd900, m=0x8dd8a0, whenms=1307006777426)
    at util/threads-pthread.c:96
96	    ts.tv_sec = whenms / 1000;
(gdb) s
92	{
(gdb) s
96	    ts.tv_sec = whenms / 1000;
(gdb) s
92	{
(gdb) s
96	    ts.tv_sec = whenms / 1000;
(gdb) s
92	{
(gdb) s
96	    ts.tv_sec = whenms / 1000;
(gdb) s
97	    ts.tv_nsec = (whenms % 1000) * 1000;
(gdb) p ts.tv_nsec=-10
$1 = -10
(gdb) s
99	    if ((ret = pthread_cond_timedwait(&c->cond, &m->lock, &ts)) != 0) {
(gdb) p errno=110
$2 = 110
(gdb) s
97	    ts.tv_nsec = (whenms % 1000) * 1000;
(gdb) p ts.tv_nsec=-10
$3 = -10
(gdb) s
99	    if ((ret = pthread_cond_timedwait(&c->cond, &m->lock, &ts)) != 0) {
(gdb) s
pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:53
53		pushq	%r12
(gdb) p errno=110
$4 = 110
(gdb) s
56		pushq	%r13
(gdb) fin
Run till exit from #0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:56
0x000000000042d492 in virCondWaitUntil (c=<value optimized out>, 
    m=<value optimized out>, whenms=<value optimized out>)
    at util/threads-pthread.c:99
99	    if ((ret = pthread_cond_timedwait(&c->cond, &m->lock, &ts)) != 0) {
(gdb) p errno=110
$5 = 110
(gdb) s
100	        errno = ret;
(gdb) p errno=110
$6 = 110
(gdb) s
__errno_location () at ../csu/errno-loc.c:34
34	{
(gdb) s
__errno_location () at ../csu/errno-loc.c:36
36	}
(gdb) s
virCondWaitUntil (c=<value optimized out>, m=<value optimized out>, 
    whenms=<value optimized out>) at util/threads-pthread.c:101
101	        return -1;
(gdb) s
104	}
(gdb) s
qemuDomainObjBeginJobWithDriver (driver=0x8d9e30, obj=0x8dd8a0)
    at qemu/qemu_driver.c:377
377	            virDomainObjUnref(obj);
(gdb) s
virDomainObjUnref (dom=0x8dd8a0) at conf/domain_conf.c:705
705	{
(gdb) s
707	    VIR_DEBUG("obj=%p refs=%d", dom, dom->refs);
(gdb) s
705	{
(gdb) s
706	    dom->refs--;
(gdb) s
707	    VIR_DEBUG("obj=%p refs=%d", dom, dom->refs);
(gdb) s
virLogMessage (category=0x3d29b0e9f7 "file.conf/domain_conf.c", priority=1, 
    funcname=0x3d29b10b60 "virDomainObjUnref", linenr=707, flags=0, 
    fmt=0x3d29b0e9e8 "obj=%p refs=%d") at util/logging.c:517
517	                   long long linenr, int flags, const char *fmt, ...) {
(gdb) s
524	    if (!virLogInitialized)
(gdb) s
518	    char *str = NULL;
(gdb) s
524	    if (!virLogInitialized)
(gdb) s
527	    if (fmt == NULL)
(gdb) s
533	    fprio = virLogFiltersCheck(category);
(gdb) s
virLogFiltersCheck (category=0x3d29b0e9f7 "file.conf/domain_conf.c", 
    priority=1, funcname=0x3d29b10b60 "virDomainObjUnref", linenr=707, 
    flags=0, fmt=0x3d29b0e9e8 "obj=%p refs=%d") at util/logging.c:422
422	    virLogLock();
(gdb) s
virLogLock () at util/logging.c:138
138	    virMutexLock(&virLogMutex);
(gdb) s
virMutexLock (m=0x6ca640) at util/threads-pthread.c:52
52	    pthread_mutex_lock(&m->lock);
(gdb) s
__pthread_mutex_lock (mutex=0x6ca640) at pthread_mutex_lock.c:50
50	  unsigned int type = PTHREAD_MUTEX_TYPE (mutex);
(gdb) s
47	{
(gdb) s
51	  if (__builtin_expect (type & ~PTHREAD_MUTEX_KIND_MASK_NP, 0))
(gdb) s
50	  unsigned int type = PTHREAD_MUTEX_TYPE (mutex);
(gdb) s
51	  if (__builtin_expect (type & ~PTHREAD_MUTEX_KIND_MASK_NP, 0))
(gdb) s
54	  pid_t id = THREAD_GETMEM (THREAD_SELF, tid);
(gdb) s
56	  if (__builtin_expect (type, PTHREAD_MUTEX_TIMED_NP)
(gdb) s
61	      LLL_MUTEX_LOCK (mutex);
(gdb) s
127	  ++mutex->__data.__nusers;
(gdb) s
125	  mutex->__data.__owner = id;
(gdb) s
127	  ++mutex->__data.__nusers;
(gdb) s
__pthread_mutex_lock (mutex=0x6ca640) at pthread_mutex_lock.c:131
131	}
(gdb) s
virLogFiltersCheck (category=0x3d29b0e9f7 "file.conf/domain_conf.c", 
    priority=1, funcname=0x3d29b10b60 "virDomainObjUnref", linenr=707, 
    flags=0, fmt=0x3d29b0e9e8 "obj=%p refs=%d") at util/logging.c:423
423	    for (i = 0;i < virLogNbFilters;i++) {
(gdb) s
virLogMessage (category=0x3d29b0e9f7 "file.conf/domain_conf.c", priority=1, 
    funcname=0x3d29b10b60 "virDomainObjUnref", linenr=707, flags=0, 
    fmt=0x3d29b0e9e8 "obj=%p refs=%d") at util/logging.c:533
533	    fprio = virLogFiltersCheck(category);
(gdb) s
virLogFiltersCheck (category=0x3d29b0e9f7 "file.conf/domain_conf.c", 
    priority=1, funcname=0x3d29b10b60 "virDomainObjUnref", linenr=707, 
    flags=0, fmt=0x3d29b0e9e8 "obj=%p refs=%d") at util/logging.c:429
429	    virLogUnlock();
(gdb) s
virLogUnlock () at util/logging.c:142
142	    virMutexUnlock(&virLogMutex);
(gdb) s
virMutexUnlock (m=0x6ca640) at util/threads-pthread.c:57
57	    pthread_mutex_unlock(&m->lock);
(gdb) s
__pthread_mutex_unlock (mutex=0x6ca640) at pthread_mutex_unlock.c:290
290	  return __pthread_mutex_unlock_usercnt (mutex, 1);
(gdb) s
__pthread_mutex_unlock_usercnt (mutex=0x6ca640) at pthread_mutex_unlock.c:289
289	{
(gdb) s
290	  return __pthread_mutex_unlock_usercnt (mutex, 1);
(gdb) s
__pthread_mutex_unlock_usercnt (mutex=0x6ca640) at pthread_mutex_unlock.c:37
37	  int type = PTHREAD_MUTEX_TYPE (mutex);
(gdb) s
38	  if (__builtin_expect (type & ~PTHREAD_MUTEX_KIND_MASK_NP, 0))
(gdb) s
41	  if (__builtin_expect (type, PTHREAD_MUTEX_TIMED_NP)
(gdb) s
46	      mutex->__data.__owner = 0;
(gdb) s
49		--mutex->__data.__nusers;
(gdb) s
52	      lll_unlock (mutex->__data.__lock, PTHREAD_MUTEX_PSHARED (mutex));
(gdb) s
virLogMessage (category=0x3d29b0e9f7 "file.conf/domain_conf.c", priority=1, 
    funcname=0x3d29b10b60 "virDomainObjUnref", linenr=707, flags=0, 
    fmt=0x3d29b0e9e8 "obj=%p refs=%d") at util/logging.c:535
535	        if (priority < virLogDefaultPriority)
(gdb) s
587	}
(gdb) s
virDomainObjUnref (dom=0x8dd8a0) at conf/domain_conf.c:708
708	    if (dom->refs == 0) {
(gdb) s
714	}
(gdb) s
qemuDomainObjBeginJobWithDriver (driver=0x8d9e30, obj=0x8dd8a0)
    at qemu/qemu_driver.c:378
378	            if (errno == ETIMEDOUT)
(gdb) s
__errno_location () at ../csu/errno-loc.c:34
34	{
(gdb) s
__errno_location () at ../csu/errno-loc.c:36
36	}
(gdb) s
qemuDomainObjBeginJobWithDriver (driver=0x8d9e30, obj=<value optimized out>)
    at qemu/qemu_driver.c:379
379	                qemuReportError(VIR_ERR_OPERATION_TIMEOUT,
(gdb) s
__dcgettext (domainname=0x48da09 "libvirt", 
    msgid=0x494710 "cannot acquire state change lock", category=5)
    at dcgettext.c:52
52	{
(gdb) c
Continuing.
17:28:26.061: error : qemuDomainObjBeginJobWithDriver:380 : Timed out during operation: cannot acquire state change lock
libvir: QEMU error : Timed out during operation: cannot acquire state change lock

Comment 10 Cui Chun 2011-06-09 09:36:08 UTC
In comment 9, server can be setup with Fedora 13(release version) and install libvirt-debuginfo-0.7.7-3.fc13.x86_64.

First [Server]
1.# /etc/init.d/libvirtd stop
2.# gdb libivrtd
(gdb) b virCondWaitUntil
Breakpoint 1 at 0x42d450: file util/threads-pthread.c, line 96.
(gdb) r
Starting program: /usr/sbin/libvirtd 

Then [Client]
1. in one console 
# virsh -c qemu+ssh://server/system dump vm /home/vm.dump

2. in another console and before step1 finish
# virsh -c qemu+ssh://server/system suspend vm

Third [Server]
Continue "First" part gdb process

# gdb libivrtd
(gdb) b virCondWaitUntil
Breakpoint 1 at 0x42d450: file util/threads-pthread.c, line 96.
(gdb) r
Starting program: /usr/sbin/libvirtd 
...........
(gdb) s
........

Comment 11 errata-xmlrpc 2011-07-21 10:32:26 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1019.html


Note You need to log in before you can comment on or make changes to this bug.