Bug 820500
| Summary: | Test program segfaults when trying to obtain Guest metrics from vhostmd | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Dinakar Guniguntala <dino> |
| Component: | vhostmd | Assignee: | Richard W.M. Jones <rjones> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 6.0 | CC: | bfan, leiwang, lnovich, qwan, rjones, wshi |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | vhostmd-0.4-2.8.el6 | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2013-11-21 08:12:10 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Attachments: | |||
|
Description
Dinakar Guniguntala
2012-05-10 08:32:55 UTC
Created attachment 583474 [details]
Test program to gather host and guest metrics from vhostmd
For some reason bugzilla threw away the attachment when creating the bug
I can't reproduce this. Can you please obtain a stack trace as I asked you before and post it here.
Created attachment 583488 [details]
0001-libmetrics-Return-error-indication-up-through-get_me.patch
I don't know if this will fix the problem, but there is a bug
that libmetrics doesn't propagate an error back to the user
correctly. Proposed patch (sent upstream) is attached.
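The failure mode described here, a lookup error that never reaches the caller, can be sketched as follows. This is a minimal illustration and not the libmetrics source: `struct metric`, `find_metric`, `get_metric_checked`, and the one-entry table are hypothetical names and data invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical metric record; not the libmetrics type. */
struct metric {
    const char *name;
    const char *context;   /* "host" or "vm" */
    double value;
};

/* Hypothetical in-memory table standing in for the parsed metrics. */
static const struct metric table[] = {
    { "TotalCPUTime", "host", 10625.07 },
};

/* Return the matching entry, or NULL when nothing matches. */
static const struct metric *find_metric(const char *name, const char *context)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(table[i].name, name) == 0 &&
            strcmp(table[i].context, context) == 0)
            return &table[i];
    }
    return NULL;
}

/* Propagate the lookup failure to the caller: 0 on success, -1 when the
 * metric is missing. *out is written only on success. */
static int get_metric_checked(const char *name, const char *context, double *out)
{
    assert(out != NULL);
    const struct metric *m = find_metric(name, context);
    if (m == NULL)
        return -1;
    *out = m->value;
    return 0;
}
```

A caller would check the return value before printing, for example reporting that the metric was not found when the call returns -1, rather than dereferencing a NULL result.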
Here is the backtrace, without your patch. This environment is slightly different: RHEL 6.2 host with two fedora guests - fedora 16 & fedora 17 (all of them running vhostmd-0.5-1)
[root@fedora16 f16]# ./test_ldl
Host: TotalCPUTime: 10625.07
LIBMETRICS: get_mdef(): No metrics found that matches TotalCPUTime in context:vm or malformed definition
Segmentation fault (core dumped)
[root@fedora16 f16]# gdb ./test_ldl
GNU gdb (GDB) Fedora (7.3.50.20110722-9.fc16)
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/dino/f16/test_ldl...done.
(gdb) run
Starting program: /home/dino/f16/test_ldl
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
Host: TotalCPUTime: 10641.27
LIBMETRICS: get_mdef(): No metrics found that matches TotalCPUTime in context:vm or malformed definition
Program received signal SIGSEGV, Segmentation fault.
0x0804860d in main (argc=1, argv=0xbffff604) at test-1.c:80
80 printf("VM: TotalCPUTime: %12.2f\n", m->value.r64);
(gdb) bt
#0 0x0804860d in main (argc=1, argv=0xbffff604) at test-1.c:80
(gdb)
The segfault occurs because get_mdef fails to get the VM-related info and the metric value is NULL.
However, very strangely, the same testcase passes if I shut down all but one VM (here I shut down the fedora 16 VM):
[root@fedora17 f17]# ./test_ldl
Host: TotalCPUTime: 10879.24
VM: TotalCPUTime: 689.40
I think when multiple VMs are running there seems to be some issue parsing the data for the correct VM (UUID of the current VM not available?). Hence the error
"LIBMETRICS: get_mdef(): No metrics found that matches TotalCPUTime in context:vm or malformed definition"
This is recreatable on my other RHEL 6.0 setup as well.
Created attachment 583641 [details]
Updated test program to gather host and guest metrics from vhostmd
The earlier testcase refused to run on a fedora 16/17 VM. I had to load xenstore dynamically for it to work.
(In reply to comment #5)
> However very strangely the same testcase passes if I shutdown all but one VM
> (Here I shutdown the fedora 16 VM)
>
> [root@fedora17 f17]# ./test_ldl
> Host: TotalCPUTime: 10879.24
> VM: TotalCPUTime: 689.40
>
> I think when multiple VM's are running the there seems to be some issue parsing
> the data for the correct VM (UUID of the current VM not available ?). Hence the
> error
> "LIBMETRICS: get_mdef(): No metrics found that matches TotalCPUTime in
> context:vm or malformed definition"

Yes, looks like the xpath query that libmetrics performs is wrong, so it's two bugs :-(

Forgot to mention that your patch fixes the segfault, but I still don't get the metrics when more than one VM is running
[root@fedora17 f17]# ./test_ldl
Host: TotalCPUTime: 11317.10
LIBMETRICS: get_mdef(): No metrics found that matches TotalCPUTime in context:vm or malformed definition
VM: TotalCPUTime: metric not found

Any update on this?

I'm going to fix this for RHEL 6.4. If the bug is important, please escalate it through ordinary support channels.

Richard, thanks for the update. Any idea on the timelines for RHEL 6.4?

I believe it'll be released at the end of the year, but development finishes much sooner than that. I don't have exact dates -- please ask your account manager if you need that.

This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.

This request was erroneously removed from consideration in Red Hat Enterprise Linux 6.4, which is currently under development. This request will be evaluated for inclusion in Red Hat Enterprise Linux 6.4.

Reproduced this bug on these environments:
Host:
Linux intel-i72600-03.qe.lab.eng.nay.redhat.com.englab.nay.redhat.com 2.6.32-131.17.1.el6.x86_64
vhostmd-0.4-2.7.el6.x86_64
Guest1:
Linux rhel6.1 2.6.32-71.el6.x86_64
vm-dump-metrics-0.4-2.7.el6.x86_64
vm-dump-metrics-devel-0.4-2.7.el6.x86_64
Guest2:
Linux rhel6.1 2.6.32-71.el6.x86_64
vm-dump-metrics-0.4-2.7.el6.x86_64 (with rjones' patch)
vm-dump-metrics-devel-0.4-2.7.el6.x86_64 (with rjones' patch)
1. install vhostmd on rhel6.1 and start the vhostmd service, check if "/dev/shm/vhostmd0" exists
2. prepare guest1 with vm-dump-metrics installed
3. apply patch to vhostmd.src and rebuild rpm package
4. prepare guest2 with vm-dump-metrics(patched) installed
5. add the following section into <devices> in both guest1 and guest2
<disk type='file' device='disk'>
<driver name='qemu' type='raw'/>
<source file='/dev/shm/vhostmd0'/>
<target dev='vdd' bus='virtio'/>
<readonly/>
</disk>
6. launch both guest1 and guest2
7. compile test.c and run it on guest1
[root@rhel6 ~]# ./test_dl
Host: TotalCPUTime: 4970.82
LIBMETRICS: get_mdef(): No metrics found that matches TotalCPUTime in context:vm or malformed definition
Segmentation fault (core dumped)
8. compile test.c and run it on guest2
[root@rhel6 ~]# ./test_dl
Host: TotalCPUTime: 4980.04
LIBMETRICS: get_mdef(): No metrics found that matches TotalCPUTime in context:vm or malformed definition
VM: TotalCPUTime: metric not found
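The multi-VM failure reproduced above is consistent with a lookup that cannot single out the entry belonging to the current guest. The report does not show the actual libmetrics query or its fix, so the following is only a sketch of the general idea under assumptions: `struct vm_metric`, its `uuid` field, and `find_vm_metric` are hypothetical and are not taken from vhostmd.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one VM-context metric; field names are assumptions. */
struct vm_metric {
    const char *name;
    const char *uuid;   /* identifies the guest the value belongs to */
    double value;
};

/* Return the entry for (name, uuid), or NULL if that guest has no such
 * metric. With several guests running, matching on the name alone would
 * be ambiguous, so both keys are compared. */
static const struct vm_metric *
find_vm_metric(const struct vm_metric *tab, size_t n,
               const char *name, const char *uuid)
{
    assert(tab != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(tab[i].name, name) == 0 &&
            strcmp(tab[i].uuid, uuid) == 0)
            return &tab[i];
    }
    return NULL;
}
```

Returning NULL for an unknown guest keeps the "metric not found" case distinct from a successful lookup, which is the behaviour the patched guest shows in step 8.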
This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate, in the next release of Red Hat Enterprise Linux.

Not exactly sure what happened here. I have a fix lined up for this bug ...

I *still* have a fix lined up for this bug. Needs a PM ack however.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.

FYI this is now on the approved components list. Will update the package and provide an erratum soon.

Needs a new branch to be set up in git (called 'sap-rhel-6.5').

Please confirm that this is strictly a bug fix and does not require documentation outside a release note.
Thank you
Laura

Laura, that's correct. This does not require any docs changes.

It still has the same issue with the fixed vhostmd package
host(rhel6.5):
kernel: 2.6.32-420.el6.x86_64
vhostmd-0.4-2.8.el6.x86_64
guest(rhel5.9):
kernel: 2.6.18-371.el5
vm-dump-metrics-0.4-2.2.el5.x86_64.rpm
vm-dump-metrics-devel-0.4-2.2.el5.x86_64.rpm
xen-libs-3.0.3-142.el5_9.3
steps:
1. add the following section into <devices> in both guests
<disk type='file' device='disk'>
<driver name='qemu' type='raw'/>
<source file='/dev/shm/vhostmd0'/>
<target dev='vdd' bus='virtio'/>
<readonly/>
</disk>
2. use "libxenstore.so.3.0" instead of "libxenstore.so" in test.c
3. compile test.c and run it
[root@]# gcc -Wall -O2 -g -ldl -o test_dl test.c
[root@]# ./test_dl
Host: TotalCPUTime: 20802.48
LIBMETRICS: get_mdef(): No metrics found that matches TotalCPUTime in context:vm or malformed definition
Segmentation fault
The result is the same on both guests.
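Step 2 above swaps one library name for another in the test program. A loader that tries several candidate names in turn avoids hard-coding a single one. The sketch below assumes the test program loads the library with `dlopen()`; `open_first_available` is a hypothetical helper and is not part of the attached test.c.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Try each candidate name in order and return the first handle that
 * dlopen() yields, or NULL if none can be loaded. The caller owns the
 * handle and should release it with dlclose(). */
static void *open_first_available(const char *const *names, size_t n)
{
    assert(names != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        void *handle = dlopen(names[i], RTLD_LAZY);
        if (handle != NULL)
            return handle;
    }
    return NULL;
}
```

For the case in step 2, the candidate list would be `libxenstore.so` followed by `libxenstore.so.3.0`. On older glibc the program has to be linked with `-ldl`, as in the gcc command above.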
(In reply to bfan from comment #32)
> It still has same issue with fixed vhostmd package
>
> host(rhel6.5):
> kernel: 2.6.32-420.el6.x86_64
> vhostmd-0.4-2.8.el6.x86_64
>
> guest(rhel5.9):
> kernel: 2.6.18-371.el5
> vm-dump-metrics-0.4-2.2.el5.x86_64.rpm
> vm-dump-metrics-devel-0.4-2.2.el5.x86_64.rpm
> xen-libs-3.0.3-142.el5_9.3

The problem is the fix is in the guest, and you're not using a RHEL 6 guest (+ vhostmd-0.4-2.8.el6). This bug isn't fixed in RHEL 5 (and won't be fixed -- no one has asked for that).

Hello, Richard
What's the expected result? I tried a rhel6.5 guest, and got the same as #C8.
It works when just one VM is running.
[root@localhost ~]# ./test_dl
Host: TotalCPUTime: 3246.44
VM: TotalCPUTime: 22.80
But it fails when more than one VM is running:
[root@localhost ~]# ./test_dl
Host: TotalCPUTime: 3248.70
LIBMETRICS: get_mdef(): No metrics found that matches TotalCPUTime in context:vm or malformed definition
VM: TotalCPUTime: metric not found
The segmentation fault no longer appears.

Yes, this looks fixed to me.

Changing the status to verified according to the above comments.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHBA-2013-1579.html
|