Bug 820500 - Test program segfaults when trying to obtain Guest metrics from vhostmd
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: vhostmd
Version: 6.0
Hardware: x86_64 Linux
Priority: unspecified  Severity: medium
Target Milestone: rc
Assigned To: Richard W.M. Jones
QA Contact: Virtualization Bugs
Reported: 2012-05-10 04:32 EDT by Dinakar Guniguntala
Modified: 2013-11-21 03:12 EST
Fixed In Version: vhostmd-0.4-2.8.el6
Doc Type: Bug Fix
Last Closed: 2013-11-21 03:12:10 EST
Type: Bug

Attachments
Test program to gather host and guest metrics from vhostmd (2.06 KB, text/plain)
2012-05-10 04:36 EDT, Dinakar Guniguntala
0001-libmetrics-Return-error-indication-up-through-get_me.patch (881 bytes, patch)
2012-05-10 05:17 EDT, Richard W.M. Jones
Updated test program to gather host and guest metrics from vhostmd (2.25 KB, text/plain)
2012-05-10 13:11 EDT, Dinakar Guniguntala

Description Dinakar Guniguntala 2012-05-10 04:32:55 EDT
I have a RHEL 6 host system that is running several RHEL 5.x and RHEL 6.0 guest OSes.

1. Host Config

   Red Hat Enterprise Linux Server release 6.0 (Santiago)
   Linux nibble.in.ibm.com 2.6.32-131.17.1.el6.x86_64 #1 SMP Thu Sep 29 10:24:25 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

   vhostmd-0.4-2.7.el6.x86_64
   Configured vhostmd and added the vhostmd disk to the guests as read-only

2. Guest-I config

   Red Hat Enterprise Linux Server release 6.0 (Santiago)
   Linux lass.in.ibm.com 2.6.32-71.el6.x86_64 #1 SMP Wed Sep 1 01:33:01 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

   vm-dump-metrics-devel-0.4-2.7.el6.x86_64
   vm-dump-metrics-0.4-2.7.el6.x86_64

   Inside guest-I, run vm-dump-metrics to make sure that the vhostmd disk is set up correctly.
   Then run the following test program as root:

   gcc -Wall -O2 -g -ldl -o test_dl test.c

   # ./test_dl
   Host: TotalCPUTime:  15250446.36
   LIBMETRICS: get_mdef(): No metrics found that matches TotalCPUTime in context:vm or malformed definition
   Segmentation fault (core dumped)

3. Guest-II config

   Red Hat Enterprise Linux Server release 6.0 (Santiago)
   Linux loop.in.ibm.com 2.6.32-131.17.1.el6.x86_64 #1 SMP Thu Sep 29 10:24:25 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

   vm-dump-metrics-devel-0.4-2.7.el6.x86_64
   vm-dump-metrics-0.4-2.7.el6.x86_64

   Inside guest-II, run vm-dump-metrics to make sure that the vhostmd disk is set up correctly.
   Then run the following test program as root:

   gcc -Wall -O2 -g -ldl -o test_dl test.c

   # ./test_dl
   Host: TotalCPUTime:  15320326.74
   LIBMETRICS: get_mdef(): No metrics found that matches TotalCPUTime in context:vm or malformed definition
   Segmentation fault (core dumped)
Comment 1 Dinakar Guniguntala 2012-05-10 04:36:39 EDT
Created attachment 583474
Test program to gather host and guest metrics from vhostmd

For some reason Bugzilla threw away the attachment when creating the bug.
Comment 3 Richard W.M. Jones 2012-05-10 04:57:37 EDT
I can't reproduce this.  Can you please obtain a stack trace,
as I asked you before, and post it here?
Comment 4 Richard W.M. Jones 2012-05-10 05:17:56 EDT
Created attachment 583488
0001-libmetrics-Return-error-indication-up-through-get_me.patch

I don't know if this will fix the problem, but there is a bug
in libmetrics: it doesn't propagate an error back to the user
correctly.  A proposed patch (sent upstream) is attached.
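
A minimal sketch of the class of bug described in this comment, not the attached patch itself: an inner lookup fails, and the wrapper has to pass that failure up rather than report success. Every name here (struct metric, find_metric, get_metric_broken, get_metric_fixed) is hypothetical and does not reproduce the libmetrics source.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical metric record; not the libmetrics type. */
struct metric {
    const char *name;
    const char *context;   /* "host" or "vm" */
    double value;
};

/* Only a host metric is present, mirroring the failing runs in this report. */
static const struct metric table[] = {
    { "TotalCPUTime", "host", 15250446.36 },
};

/* Inner lookup: returns 0 and sets *out on success, -1 if nothing matches. */
static int find_metric(const char *name, const char *context,
                       const struct metric **out)
{
    size_t i;

    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(table[i].name, name) == 0 &&
            strcmp(table[i].context, context) == 0) {
            *out = &table[i];
            return 0;
        }
    }
    return -1;
}

/* Broken wrapper: discards the inner result, so the caller sees success
 * while *out is still NULL. */
static int get_metric_broken(const char *name, const char *context,
                             const struct metric **out)
{
    *out = NULL;
    find_metric(name, context, out);
    return 0;
}

/* Fixed wrapper: returns the inner result unchanged. */
static int get_metric_fixed(const char *name, const char *context,
                            const struct metric **out)
{
    *out = NULL;
    return find_metric(name, context, out);
}
```

A caller that trusts the return value of the broken wrapper goes on to dereference a NULL pointer. With the fixed wrapper the caller can test for -1 and skip the dereference.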
Comment 5 Dinakar Guniguntala 2012-05-10 13:08:39 EDT
Here is the backtrace, without your patch. This environment is slightly different: a RHEL 6.2 host with two Fedora guests (Fedora 16 and Fedora 17),
all of them running vhostmd-0.5-1.

[root@fedora16 f16]# ./test_ldl
Host: TotalCPUTime:     10625.07
LIBMETRICS: get_mdef(): No metrics found that matches TotalCPUTime in context:vm or malformed definition
Segmentation fault (core dumped)
[root@fedora16 f16]# gdb ./test_ldl
GNU gdb (GDB) Fedora (7.3.50.20110722-9.fc16)
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/dino/f16/test_ldl...done.
(gdb) run
Starting program: /home/dino/f16/test_ldl
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
Host: TotalCPUTime:     10641.27
LIBMETRICS: get_mdef(): No metrics found that matches TotalCPUTime in context:vm or malformed definition

Program received signal SIGSEGV, Segmentation fault.
0x0804860d in main (argc=1, argv=0xbffff604) at test-1.c:80
80                      printf("VM: TotalCPUTime: %12.2f\n", m->value.r64);
(gdb) bt
#0  0x0804860d in main (argc=1, argv=0xbffff604) at test-1.c:80
(gdb)

The segfault occurs because get_mdef fails to find the VM-related metric, so the metric pointer is NULL when test-1.c:80 dereferences it.
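
As a hedged illustration of guarding the print at test-1.c:80 on the caller's side, the sketch below formats the report line through a helper that checks for NULL first. The struct is a hypothetical stand-in that models only the value.r64 member visible in the backtrace, and format_metric is an invented name, not part of the attached test program.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for the metric structure used at test-1.c:80;
 * only the value.r64 member seen in the backtrace is modelled. */
struct metric {
    union { double r64; } value;
};

/* Format one report line into buf. A NULL metric yields a
 * "metric not found" line instead of dereferencing the pointer.
 * Returns the snprintf() result. */
static int format_metric(char *buf, size_t len, const char *label,
                         const struct metric *m)
{
    if (m == NULL)
        return snprintf(buf, len, "%s: TotalCPUTime: metric not found\n",
                        label);
    return snprintf(buf, len, "%s: TotalCPUTime: %12.2f\n",
                    label, m->value.r64);
}
```

Called with the label "VM" and a NULL pointer, the helper writes a "metric not found" line and the program can carry on.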

However, very strangely, the same test case passes if I shut down all but one VM
(here I shut down the Fedora 16 VM):

[root@fedora17 f17]# ./test_ldl 
Host: TotalCPUTime:     10879.24
VM: TotalCPUTime:       689.40

I think that when multiple VMs are running there is some issue parsing
the data for the correct VM (UUID of the current VM not available?), hence the error
 "LIBMETRICS: get_mdef(): No metrics found that matches TotalCPUTime in context:vm or malformed definition"

This is reproducible on my other RHEL 6.0 setup as well.
Comment 6 Dinakar Guniguntala 2012-05-10 13:11:15 EDT
Created attachment 583641
Updated test program to gather host and guest metrics from vhostmd

The earlier test case refused to run on a Fedora 16/17 VM. I had to load xenstore dynamically for it to work.
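
A rough sketch of what loading a library dynamically with a fallback name can look like, using only dlopen() from <dlfcn.h>. The helper name open_first is hypothetical, and the use of RTLD_GLOBAL is an assumption (it makes the loaded library's symbols available to libraries loaded afterwards); the attached test program may do this differently.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Try each candidate library name in order and return the first handle
 * that dlopen() can load, or NULL if none loads. The caller owns the
 * handle and releases it with dlclose(). */
static void *open_first(const char *const *names, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        /* RTLD_GLOBAL is an assumption, see the note above the code. */
        void *handle = dlopen(names[i], RTLD_NOW | RTLD_GLOBAL);

        if (handle != NULL)
            return handle;
    }
    return NULL;
}
```

For the library names that appear in this report, the candidate list would be "libxenstore.so" followed by "libxenstore.so.3.0". On systems where dlopen() lives in a separate library, the program is linked with -ldl, as in the gcc command in the description.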
Comment 7 Richard W.M. Jones 2012-05-10 13:21:20 EDT
(In reply to comment #5)
> However, very strangely, the same test case passes if I shut down all but one VM
> (here I shut down the Fedora 16 VM):
> 
> [root@fedora17 f17]# ./test_ldl 
> Host: TotalCPUTime:     10879.24
> VM: TotalCPUTime:       689.40
> 
> I think that when multiple VMs are running there is some issue parsing
> the data for the correct VM (UUID of the current VM not available?), hence the
> error
>  "LIBMETRICS: get_mdef(): No metrics found that matches TotalCPUTime in
> context:vm or malformed definition"

Yes, it looks like the XPath query that libmetrics performs is wrong, so
it's two bugs :-(
Comment 8 Dinakar Guniguntala 2012-05-10 13:33:30 EDT
Forgot to mention that your patch fixes the segfault, but I still don't get the
metrics when more than one VM is running:

[root@fedora17 f17]# ./test_ldl 
Host: TotalCPUTime:     11317.10
LIBMETRICS: get_mdef(): No metrics found that matches TotalCPUTime in
context:vm or malformed definition
VM: TotalCPUTime: metric not found
Comment 9 Dinakar Guniguntala 2012-05-25 01:23:19 EDT
Any update on this?
Comment 10 Richard W.M. Jones 2012-05-25 03:26:18 EDT
I'm going to fix this for RHEL 6.4.  If the bug is important,
please escalate it through ordinary support channels.
Comment 11 Dinakar Guniguntala 2012-05-29 02:58:38 EDT
Richard, thanks for the update. Any idea of the timeline for RHEL 6.4?
Comment 12 Richard W.M. Jones 2012-05-29 03:33:47 EDT
I believe it'll be released at the end of the year, but
development finishes much sooner than that.  I don't have
exact dates -- please ask your account manager if you need that.
Comment 14 RHEL Product and Program Management 2012-07-10 04:07:59 EDT
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.
Comment 15 RHEL Product and Program Management 2012-07-10 21:56:06 EDT
This request was erroneously removed from consideration in Red Hat Enterprise Linux 6.4, which is currently under development.  This request will be evaluated for inclusion in Red Hat Enterprise Linux 6.4.
Comment 16 Wei Shi 2012-08-15 22:09:46 EDT
Reproduced this bug in the following environment:

Host:
Linux intel-i72600-03.qe.lab.eng.nay.redhat.com.englab.nay.redhat.com 2.6.32-131.17.1.el6.x86_64

vhostmd-0.4-2.7.el6.x86_64

Guest1:
Linux rhel6.1 2.6.32-71.el6.x86_64

vm-dump-metrics-0.4-2.7.el6.x86_64
vm-dump-metrics-devel-0.4-2.7.el6.x86_64

Guest2:
Linux rhel6.1 2.6.32-71.el6.x86_64

vm-dump-metrics-0.4-2.7.el6.x86_64 (with rjones' patch)
vm-dump-metrics-devel-0.4-2.7.el6.x86_64 (with rjones' patch)

1. install vhostmd on rhel6.1 and start the vhostmd service; check that "/dev/shm/vhostmd0" exists
2. prepare guest1 with vm-dump-metrics installed
3. apply the patch to vhostmd.src and rebuild the rpm package
4. prepare guest2 with the patched vm-dump-metrics installed
5. add the following section to <devices> in both guest1 and guest2
   <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/dev/shm/vhostmd0'/>
      <target dev='vdd' bus='virtio'/>
      <readonly/>
   </disk>
6. launch both guest1 and guest2
7. compile test.c and run it on guest1
[root@rhel6 ~]# ./test_dl
Host: TotalCPUTime:      4970.82
LIBMETRICS: get_mdef(): No metrics found that matches TotalCPUTime in context:vm or malformed definition
Segmentation fault (core dumped)
8. compile test.c and run it on guest2
[root@rhel6 ~]# ./test_dl
Host: TotalCPUTime:      4980.04
LIBMETRICS: get_mdef(): No metrics found that matches TotalCPUTime in context:vm or malformed definition
VM: TotalCPUTime: metric not found
Comment 17 RHEL Product and Program Management 2012-09-07 01:28:51 EDT
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unable to address this
request at this time.

Red Hat invites you to ask your support representative to
propose this request, if appropriate, in the next release of
Red Hat Enterprise Linux.
Comment 18 Richard W.M. Jones 2012-09-07 04:18:31 EDT
Not exactly sure what happened here.  I have a fix lined
up for this bug ...
Comment 19 Richard W.M. Jones 2013-06-05 07:21:09 EDT
I *still* have a fix lined up for this bug.  Needs a PM ack however.
Comment 20 RHEL Product and Program Management 2013-06-07 10:31:39 EDT
This request was evaluated by Red Hat Product Management for
inclusion in a Red Hat Enterprise Linux release.  Product
Management has requested further review of this request by
Red Hat Engineering, for potential inclusion in a Red Hat
Enterprise Linux release for currently deployed products.
This request is not yet committed for inclusion in a release.
Comment 21 Richard W.M. Jones 2013-06-07 10:55:04 EDT
FYI this is now on the approved components list.  Will update
the package and provide an erratum soon.
Comment 22 Richard W.M. Jones 2013-06-07 17:09:25 EDT
Needs a new branch to be set up in git (called 'sap-rhel-6.5').
Comment 25 Laura Novich 2013-06-18 09:22:46 EDT
Please confirm that this is strictly a bug fix and does not require documentation outside a release note.
Thank you
Laura
Comment 26 Richard W.M. Jones 2013-06-18 10:17:37 EDT
Laura, that's correct.  This does not require any
docs changes.
Comment 32 bfan 2013-10-11 03:11:07 EDT
The same issue still occurs with the fixed vhostmd package.

host(rhel6.5):
kernel: 2.6.32-420.el6.x86_64 
vhostmd-0.4-2.8.el6.x86_64

guest(rhel5.9):
kernel: 2.6.18-371.el5 
vm-dump-metrics-0.4-2.2.el5.x86_64.rpm
vm-dump-metrics-devel-0.4-2.2.el5.x86_64.rpm
xen-libs-3.0.3-142.el5_9.3

Steps:
1. add the following section to <devices> in both guests
   <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/dev/shm/vhostmd0'/>
      <target dev='vdd' bus='virtio'/>
      <readonly/>
   </disk>

2. use "libxenstore.so.3.0" instead of "libxenstore.so" in test.c
3. compile test.c and run it
[root@]# gcc -Wall -O2 -g -ldl -o test_dl test.c
[root@]# ./test_dl
Host: TotalCPUTime:     20802.48
LIBMETRICS: get_mdef(): No metrics found that matches TotalCPUTime in context:vm or malformed definition
Segmentation fault


The result is the same on both guests.
Comment 33 Richard W.M. Jones 2013-10-16 08:04:31 EDT
(In reply to bfan from comment #32)
> The same issue still occurs with the fixed vhostmd package.
> 
> host(rhel6.5):
> kernel: 2.6.32-420.el6.x86_64 
> vhostmd-0.4-2.8.el6.x86_64
> 
> guest(rhel5.9):
> kernel: 2.6.18-371.el5 
> vm-dump-metrics-0.4-2.2.el5.x86_64.rpm
> vm-dump-metrics-devel-0.4-2.2.el5.x86_64.rpm
> xen-libs-3.0.3-142.el5_9.3

The problem is the fix is in the guest, and you're not
using a RHEL 6 guest (+ vhostmd-0.4-2.8.el6).

This bug isn't fixed in RHEL 5 (and won't be fixed -- no one
has asked for that).
Comment 34 bfan 2013-10-21 03:20:01 EDT
Hello, Richard
What is the expected result? I tried a RHEL 6.5 guest and got the same result as in comment 8.

It works when just one VM is running:
[root@localhost ~]# ./test_dl 
Host: TotalCPUTime:      3246.44
VM: TotalCPUTime:        22.80

But it fails when more than one VM is running:
[root@localhost ~]# ./test_dl 
Host: TotalCPUTime:      3248.70
LIBMETRICS: get_mdef(): No metrics found that matches TotalCPUTime in context:vm or malformed definition
VM: TotalCPUTime: metric not found

The segmentation fault no longer appears.
Comment 35 Richard W.M. Jones 2013-10-28 05:52:17 EDT
Yes, this looks fixed to me.
Comment 36 bfan 2013-10-28 05:57:50 EDT
Changing the status to VERIFIED according to the above comments.
Comment 37 errata-xmlrpc 2013-11-21 03:12:10 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1579.html