Bug 820500

Summary: Test program segfaults when trying to obtain Guest metrics from vhostmd
Product: Red Hat Enterprise Linux 6
Component: vhostmd
Version: 6.0
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
Target Milestone: rc
Reporter: Dinakar Guniguntala <dino>
Assignee: Richard W.M. Jones <rjones>
QA Contact: Virtualization Bugs <virt-bugs>
CC: bfan, leiwang, lnovich, qwan, rjones, wshi
Fixed In Version: vhostmd-0.4-2.8.el6
Doc Type: Bug Fix
Type: Bug
Last Closed: 2013-11-21 08:12:10 UTC
Attachments:
   Test program to gather host and guest metrics from vhostmd (flags: none)
   0001-libmetrics-Return-error-indication-up-through-get_me.patch (flags: none)
   Updated test program to gather host and guest metrics from vhostmd (flags: none)

Description Dinakar Guniguntala 2012-05-10 08:32:55 UTC
I have a RHEL 6 host system that is running several RHEL 5.x and RHEL 6.0 guest OSes.

1. Host Config

   Red Hat Enterprise Linux Server release 6.0 (Santiago)
   Linux nibble.in.ibm.com 2.6.32-131.17.1.el6.x86_64 #1 SMP Thu Sep 29 10:24:25 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

   vhostmd-0.4-2.7.el6.x86_64
   Configured vhostmd and added the vhostmd disk to the guests as read-only

2. Guest-I config

   Red Hat Enterprise Linux Server release 6.0 (Santiago)
   Linux lass.in.ibm.com 2.6.32-71.el6.x86_64 #1 SMP Wed Sep 1 01:33:01 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

   vm-dump-metrics-devel-0.4-2.7.el6.x86_64
   vm-dump-metrics-0.4-2.7.el6.x86_64

   Inside guest-I, run vm-dump-metrics to make sure that the vhostmd disk is set up correctly.
   Then run the following test program as root:

   gcc -Wall -O2 -g -ldl -o test_dl test.c

   # ./test_dl
   Host: TotalCPUTime:  15250446.36
   LIBMETRICS: get_mdef(): No metrics found that matches TotalCPUTime in context:vm or malformed definition
   Segmentation fault (core dumped)

3. Guest-II config

   Red Hat Enterprise Linux Server release 6.0 (Santiago)
   Linux loop.in.ibm.com 2.6.32-131.17.1.el6.x86_64 #1 SMP Thu Sep 29 10:24:25 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

   vm-dump-metrics-devel-0.4-2.7.el6.x86_64
   vm-dump-metrics-0.4-2.7.el6.x86_64

   Inside guest-II, run vm-dump-metrics to make sure that the vhostmd disk is set up correctly.
   Then run the following test program as root:

   gcc -Wall -O2 -g -ldl -o test_dl test.c

   # ./test_dl
   Host: TotalCPUTime:  15320326.74
   LIBMETRICS: get_mdef(): No metrics found that matches TotalCPUTime in context:vm or malformed definition
   Segmentation fault (core dumped)

Comment 1 Dinakar Guniguntala 2012-05-10 08:36:39 UTC
Created attachment 583474 [details]
Test program to gather host and guest metrics from vhostmd

For some reason Bugzilla threw away the attachment when creating the bug.

Comment 3 Richard W.M. Jones 2012-05-10 08:57:37 UTC
I can't reproduce this.  Can you please obtain a stack trace
as I asked you before and post it here.

Comment 4 Richard W.M. Jones 2012-05-10 09:17:56 UTC
Created attachment 583488 [details]
0001-libmetrics-Return-error-indication-up-through-get_me.patch

I don't know if this will fix the problem, but there is a bug
where libmetrics doesn't propagate an error back to the user
correctly.  Proposed patch (sent upstream) is attached.
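
To illustrate the kind of error propagation described above, here is a minimal sketch in C. It does not reproduce the attached patch or the libmetrics sources: `struct metric_rec`, `lookup_definition`, and `fetch_metric` are hypothetical names used only for this example. The point is that the outer function returns the inner failure and leaves the output pointer NULL, so a caller can detect the problem.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a metric record; not the libmetrics type. */
struct metric_rec {
    const char *name;
    double value;
};

/* Hypothetical sample data for the sketch. */
static const struct metric_rec table[] = {
    { "TotalCPUTime", 10625.07 },
};

/* Inner lookup: returns 0 and sets *out on success, -1 if nothing matches. */
static int lookup_definition(const char *name, const struct metric_rec **out)
{
    size_t i;
    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(table[i].name, name) == 0) {
            *out = &table[i];
            return 0;
        }
    }
    return -1;
}

/* Outer entry point: passes the inner failure up instead of reporting success. */
int fetch_metric(const char *name, const struct metric_rec **out)
{
    *out = NULL;
    if (lookup_definition(name, out) < 0)
        return -1;  /* the caller sees the failure and *out stays NULL */
    return 0;
}
```

A caller would then test the return value (or the pointer) before reading any field of the result.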

Comment 5 Dinakar Guniguntala 2012-05-10 17:08:39 UTC
Here is the backtrace, without your patch. (This environment is slightly different: a RHEL 6.2 host with two Fedora guests, Fedora 16 and Fedora 17,
all of them running vhostmd-0.5-1.)

[root@fedora16 f16]# ./test_ldl
Host: TotalCPUTime:     10625.07
LIBMETRICS: get_mdef(): No metrics found that matches TotalCPUTime in context:vm or malformed definition
Segmentation fault (core dumped)
[root@fedora16 f16]# gdb ./test_ldl
GNU gdb (GDB) Fedora (7.3.50.20110722-9.fc16)
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/dino/f16/test_ldl...done.
(gdb) run
Starting program: /home/dino/f16/test_ldl
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
Host: TotalCPUTime:     10641.27
LIBMETRICS: get_mdef(): No metrics found that matches TotalCPUTime in context:vm or malformed definition

Program received signal SIGSEGV, Segmentation fault.
0x0804860d in main (argc=1, argv=0xbffff604) at test-1.c:80
80                      printf("VM: TotalCPUTime: %12.2f\n", m->value.r64);
(gdb) bt
#0  0x0804860d in main (argc=1, argv=0xbffff604) at test-1.c:80
(gdb)

The segfault occurs because get_mdef fails to find the VM-related metric, so the metric pointer is NULL when test-1.c:80 dereferences it.

However, very strangely, the same test case passes if I shut down all but one VM
(here I shut down the Fedora 16 VM):

[root@fedora17 f17]# ./test_ldl 
Host: TotalCPUTime:     10879.24
VM: TotalCPUTime:       689.40

I think that when multiple VMs are running, there is some issue parsing
the data for the correct VM (UUID of the current VM not available?). Hence the error
 "LIBMETRICS: get_mdef(): No metrics found that matches TotalCPUTime in context:vm or malformed definition"

This is reproducible on my other RHEL 6.0 setup as well.
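
On the caller's side, the crash at the printf in the backtrace can be avoided by checking the metric pointer before dereferencing it. The following is a minimal sketch, not the attached test program: `struct vm_metric` and `format_vm_cpu_time` are hypothetical, and only the two output formats are taken from the output shown in this report.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for the metric structure the test program prints. */
struct vm_metric {
    double r64;
};

/* Writes the report line into buf; a missing metric is reported, not dereferenced. */
int format_vm_cpu_time(char *buf, size_t len, const struct vm_metric *m)
{
    if (m == NULL)
        return snprintf(buf, len, "VM: TotalCPUTime: metric not found\n");
    return snprintf(buf, len, "VM: TotalCPUTime: %12.2f\n", m->r64);
}
```

With this guard, a failed lookup produces a readable message instead of a SIGSEGV.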

Comment 6 Dinakar Guniguntala 2012-05-10 17:11:15 UTC
Created attachment 583641 [details]
Updated test program to gather host and guest metrics from vhostmd

The earlier test case refused to run on a Fedora 16/17 VM. I had to load xenstore dynamically for it to work.
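
Loading a library at run time, as mentioned above, is typically done with dlopen(). The sketch below tries a list of candidate names and returns the first handle that loads. The helper name `open_first_available` is hypothetical, and the use of RTLD_GLOBAL is an assumption (it makes the loaded symbols visible to libraries loaded afterwards); the attached test program may do this differently.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Tries each candidate name in order; returns the first handle that loads, or NULL. */
void *open_first_available(const char *const *names, size_t count)
{
    size_t i;
    for (i = 0; i < count; i++) {
        void *handle = dlopen(names[i], RTLD_NOW | RTLD_GLOBAL);
        if (handle != NULL)
            return handle;
    }
    return NULL;
}
```

A caller might pass names such as "libxenstore.so" followed by a versioned name, and report dlerror() if the result is NULL. On older glibc versions the program has to be linked with -ldl.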

Comment 7 Richard W.M. Jones 2012-05-10 17:21:20 UTC
(In reply to comment #5)
> However, very strangely, the same test case passes if I shut down all but one VM
> (here I shut down the Fedora 16 VM):
> 
> [root@fedora17 f17]# ./test_ldl 
> Host: TotalCPUTime:     10879.24
> VM: TotalCPUTime:       689.40
> 
> I think that when multiple VMs are running, there is some issue parsing
> the data for the correct VM (UUID of the current VM not available?). Hence the
> error
>  "LIBMETRICS: get_mdef(): No metrics found that matches TotalCPUTime in
> context:vm or malformed definition"

Yes, it looks like the XPath query that libmetrics performs is wrong, so
there are two bugs :-(
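
The thread does not show the query itself, so the following is only an illustration of the general requirement it implies: when the metrics data carries entries for several VMs, a VM-context lookup has to select by the guest's own identifier as well as by metric name. All types, field names, and the placeholder identifiers used with it are hypothetical, and the sketch uses a plain array rather than XML or XPath.

```c
#include <stddef.h>
#include <string.h>

enum ctx { CTX_HOST, CTX_VM };

/* Hypothetical flattened view of one metric entry. */
struct entry {
    enum ctx context;
    const char *uuid;   /* NULL for host entries */
    const char *name;
    double value;
};

/* Returns the first entry matching context and name; VM entries must also match uuid. */
const struct entry *find_entry(const struct entry *entries, size_t n,
                               enum ctx context, const char *uuid,
                               const char *name)
{
    size_t i;
    for (i = 0; i < n; i++) {
        const struct entry *e = &entries[i];
        if (e->context != context || strcmp(e->name, name) != 0)
            continue;
        if (context == CTX_VM &&
            (e->uuid == NULL || uuid == NULL || strcmp(e->uuid, uuid) != 0))
            continue;
        return e;
    }
    return NULL;
}
```

A lookup written this way returns NULL for an unknown guest identifier rather than another guest's value, which keeps the failure visible to the caller.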

Comment 8 Dinakar Guniguntala 2012-05-10 17:33:30 UTC
Forgot to mention that your patch fixes the segfault, but I still don't get the
metrics when more than one VM is running:

[root@fedora17 f17]# ./test_ldl 
Host: TotalCPUTime:     11317.10
LIBMETRICS: get_mdef(): No metrics found that matches TotalCPUTime in context:vm or malformed definition
VM: TotalCPUTime: metric not found

Comment 9 Dinakar Guniguntala 2012-05-25 05:23:19 UTC
Any update on this?

Comment 10 Richard W.M. Jones 2012-05-25 07:26:18 UTC
I'm going to fix this for RHEL 6.4.  If the bug is important,
please escalate it through ordinary support channels.

Comment 11 Dinakar Guniguntala 2012-05-29 06:58:38 UTC
Richard, thanks for the update. Any idea of the timeline for RHEL 6.4?

Comment 12 Richard W.M. Jones 2012-05-29 07:33:47 UTC
I believe it'll be released at the end of the year, but
development finishes much sooner than that.  I don't have
exact dates -- please ask your account manager if you need that.

Comment 14 RHEL Program Management 2012-07-10 08:07:59 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 15 RHEL Program Management 2012-07-11 01:56:06 UTC
This request was erroneously removed from consideration in Red Hat Enterprise Linux 6.4, which is currently under development.  This request will be evaluated for inclusion in Red Hat Enterprise Linux 6.4.

Comment 16 Wei Shi 2012-08-16 02:09:46 UTC
Reproduced this bug on these environments:

Host:
Linux intel-i72600-03.qe.lab.eng.nay.redhat.com.englab.nay.redhat.com 2.6.32-131.17.1.el6.x86_64

vhostmd-0.4-2.7.el6.x86_64

Guest1:
Linux rhel6.1 2.6.32-71.el6.x86_64

vm-dump-metrics-0.4-2.7.el6.x86_64
vm-dump-metrics-devel-0.4-2.7.el6.x86_64

Guest2:
Linux rhel6.1 2.6.32-71.el6.x86_64

vm-dump-metrics-0.4-2.7.el6.x86_64 (with rjones' patch)
vm-dump-metrics-devel-0.4-2.7.el6.x86_64 (with rjones' patch)

1. install vhostmd on rhel6.1 and start the vhostmd service, check if "/dev/shm/vhostmd0" exists
2. prepare guest1 with vm-dump-metrics installed
3. apply patch to vhostmd.src and rebuild rpm package
4. prepare guest2 with vm-dump-metrics(patched) installed
5. add the following section into <devices> in both guest1 and guest2
   <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/dev/shm/vhostmd0'/>
      <target dev='vdd' bus='virtio'/>
      <readonly/>
   </disk>
6. launch both guest1 and guest2
7. compile test.c and run it on guest1
[root@rhel6 ~]# ./test_dl
Host: TotalCPUTime:      4970.82
LIBMETRICS: get_mdef(): No metrics found that matches TotalCPUTime in context:vm or malformed definition
Segmentation fault (core dumped)
8. compile test.c and run it on guest2
[root@rhel6 ~]# ./test_dl
Host: TotalCPUTime:      4980.04
LIBMETRICS: get_mdef(): No metrics found that matches TotalCPUTime in context:vm or malformed definition
VM: TotalCPUTime: metric not found

Comment 17 RHEL Program Management 2012-09-07 05:28:51 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unable to address this
request at this time.

Red Hat invites you to ask your support representative to
propose this request, if appropriate, in the next release of
Red Hat Enterprise Linux.

Comment 18 Richard W.M. Jones 2012-09-07 08:18:31 UTC
Not exactly sure what happened here.  I have a fix lined
up for this bug ...

Comment 19 Richard W.M. Jones 2013-06-05 11:21:09 UTC
I *still* have a fix lined up for this bug.  Needs a PM ack however.

Comment 20 RHEL Program Management 2013-06-07 14:31:39 UTC
This request was evaluated by Red Hat Product Management for
inclusion in a Red Hat Enterprise Linux release.  Product
Management has requested further review of this request by
Red Hat Engineering, for potential inclusion in a Red Hat
Enterprise Linux release for currently deployed products.
This request is not yet committed for inclusion in a release.

Comment 21 Richard W.M. Jones 2013-06-07 14:55:04 UTC
FYI this is now on the approved components list.  Will update
the package and provide an erratum soon.

Comment 22 Richard W.M. Jones 2013-06-07 21:09:25 UTC
Needs a new branch to be set up in git (called 'sap-rhel-6.5').

Comment 25 Laura Novich 2013-06-18 13:22:46 UTC
Please confirm that this is strictly a bug fix and does not require documentation outside a release note.
Thank you
Laura

Comment 26 Richard W.M. Jones 2013-06-18 14:17:37 UTC
Laura, that's correct.  This does not require any
docs changes.

Comment 32 bfan 2013-10-11 07:11:07 UTC
The same issue still occurs with the fixed vhostmd package.

host(rhel6.5):
kernel: 2.6.32-420.el6.x86_64 
vhostmd-0.4-2.8.el6.x86_64

guest(rhel5.9):
kernel: 2.6.18-371.el5 
vm-dump-metrics-0.4-2.2.el5.x86_64.rpm
vm-dump-metrics-devel-0.4-2.2.el5.x86_64.rpm
xen-libs-3.0.3-142.el5_9.3

Steps:
1. add the following section into <devices> in both guests
   <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/dev/shm/vhostmd0'/>
      <target dev='vdd' bus='virtio'/>
      <readonly/>
   </disk>

2. use "libxenstore.so.3.0" instead of "libxenstore.so" in test.c
3. compile test.c and run it
[root@]# gcc -Wall -O2 -g -ldl -o test_dl test.c
[root@]# ./test_dl
Host: TotalCPUTime:     20802.48
LIBMETRICS: get_mdef(): No metrics found that matches TotalCPUTime in context:vm or malformed definition
Segmentation fault


Both guests behave the same.

Comment 33 Richard W.M. Jones 2013-10-16 12:04:31 UTC
(In reply to bfan from comment #32)
> The same issue still occurs with the fixed vhostmd package.
> 
> host(rhel6.5):
> kernel: 2.6.32-420.el6.x86_64 
> vhostmd-0.4-2.8.el6.x86_64
> 
> guest(rhel5.9):
> kernel: 2.6.18-371.el5 
> vm-dump-metrics-0.4-2.2.el5.x86_64.rpm
> vm-dump-metrics-devel-0.4-2.2.el5.x86_64.rpm
> xen-libs-3.0.3-142.el5_9.3

The problem is the fix is in the guest, and you're not
using a RHEL 6 guest (+ vhostmd-0.4-2.8.el6).

This bug isn't fixed in RHEL 5 (and won't be fixed -- no one
has asked for that).

Comment 34 bfan 2013-10-21 07:20:01 UTC
Hello, Richard
What is the expected result? I tried a RHEL 6.5 guest and got the same result as in comment 8.

It works when just one VM is running:
[root@localhost ~]# ./test_dl 
Host: TotalCPUTime:      3246.44
VM: TotalCPUTime:        22.80

But it fails when more than one VM is running:
[root@localhost ~]# ./test_dl 
Host: TotalCPUTime:      3248.70
LIBMETRICS: get_mdef(): No metrics found that matches TotalCPUTime in context:vm or malformed definition
VM: TotalCPUTime: metric not found

The segmentation fault no longer appears.

Comment 35 Richard W.M. Jones 2013-10-28 09:52:17 UTC
Yes, this looks fixed to me.

Comment 36 bfan 2013-10-28 09:57:50 UTC
Changing the status to VERIFIED according to the comments above.

Comment 37 errata-xmlrpc 2013-11-21 08:12:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1579.html