Bug 1540780

Summary: libvirtd crashes when trying to start a guest with cachetune after remounting resctrl
Product: Red Hat Enterprise Linux 7
Reporter: Luyao Huang <lhuang>
Component: libvirt
Assignee: Martin Kletzander <mkletzan>
Status: CLOSED ERRATA
QA Contact: Luyao Huang <lhuang>
Severity: medium
Docs Contact:
Priority: medium
Version: 7.5
CC: dyuan, lmiksik, mkletzan, mtessun, rbalakri, xuzhang, yalzhang
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: libvirt-3.9.0-12.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-04-10 11:04:21 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Luyao Huang 2018-02-01 00:56:01 UTC
Description of problem:
libvirtd crashes when trying to start a guest with cachetune after resctrl
has been remounted without restarting libvirtd

Version-Release number of selected component (if applicable):
libvirt-3.9.0-10.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. mount resctrl
# mount -t resctrl resctrl /sys/fs/resctrl/

2. restart libvirtd
# service libvirtd restart
Redirecting to /bin/systemctl restart libvirtd.service

3. umount resctrl and remount it with the cdp option:

# umount /sys/fs/resctrl
# mount -t resctrl resctrl -o cdp /sys/fs/resctrl

4. prepare a guest which has cachetune configured:

# virsh dumpxml vm1

...
  <cputune>
    <cachetune vcpus='0'>
      <cache id='0' level='3' type='code' size='1' unit='MiB'/>
      <cache id='0' level='3' type='data' size='2' unit='MiB'/>
    </cachetune>
    <cachetune vcpus='1'>
      <cache id='0' level='3' type='code' size='2' unit='MiB'/>
      <cache id='0' level='3' type='data' size='1' unit='MiB'/>
    </cachetune>
    <cachetune vcpus='2'>
      <cache id='0' level='3' type='code' size='2' unit='MiB'/>
      <cache id='0' level='3' type='data' size='1' unit='MiB'/>
    </cachetune>
    <cachetune vcpus='3'>
      <cache id='0' level='3' type='data' size='1' unit='MiB'/>
    </cachetune>
  </cputune>
...

5. start guest

# virsh start vm1
error: Disconnected from qemu:///system due to end of file
error: Failed to start domain vm1
error: End of file while reading data: Input/output error
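
Step 3 matters because of how the kernel resctrl interface names resources: without cdp the L3 line of the schemata file is named L3, while with cdp it is split into L3CODE and L3DATA. The info libvirtd read before the remount therefore describes a different set of cache types than the schemata it parses afterwards. A minimal sketch of classifying a schemata line by its resource name follows; demo_scope and demo_schemata_scope are hypothetical names for illustration and are not libvirt's parser.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical scope values, loosely mirroring both/code/data. */
enum demo_scope {
    DEMO_SCOPE_BOTH,
    DEMO_SCOPE_CODE,
    DEMO_SCOPE_DATA,
    DEMO_SCOPE_UNKNOWN
};

/* Classify a schemata line such as "L3CODE:0=fffff" by its resource name.
 * Assumption: cache lines start with 'L', and the name may be preceded by
 * padding spaces. */
static enum demo_scope demo_schemata_scope(const char *line)
{
    const char *colon;
    size_t len;

    while (*line == ' ')
        line++;

    colon = strchr(line, ':');
    if (!colon || line[0] != 'L')
        return DEMO_SCOPE_UNKNOWN;

    len = (size_t)(colon - line);
    if (len >= 6 && strncmp(colon - 4, "CODE", 4) == 0)
        return DEMO_SCOPE_CODE;
    if (len >= 6 && strncmp(colon - 4, "DATA", 4) == 0)
        return DEMO_SCOPE_DATA;

    /* A plain "L<level>" name covers both code and data. */
    return DEMO_SCOPE_BOTH;
}
```

With this sketch, a line read before the remount ("L3:0=fffff") classifies as both, while the lines seen after remounting with cdp classify as code or data.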


Actual results:
libvirtd crashes in step 5

Expected results:
libvirtd reports an error instead of crashing

Additional info:

backtrace:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f3096827700 (LWP 38384)]
0x00007f30a6453be8 in virResctrlAllocParseProcessCache (resctrl=<optimized out>, cache=<optimized out>, type=VIR_CACHE_TYPE_BOTH, level=3, alloc=0x7f30680112d0) at util/virresctrl.c:944
944	    virBitmapShrink(mask, resctrl->levels[level]->types[type]->bits);
(gdb) t a a bt

Thread 17 (Thread 0x7f3096827700 (LWP 38384)):
#0  0x00007f30a6453be8 in virResctrlAllocParseProcessCache (resctrl=<optimized out>, cache=<optimized out>, type=VIR_CACHE_TYPE_BOTH, level=3, alloc=0x7f30680112d0) at util/virresctrl.c:944
#1  virResctrlAllocParseProcessLine (line=0x7f3068011376 "", alloc=0x7f30680112d0, resctrl=0x7f308022b3a0) at util/virresctrl.c:1004
#2  virResctrlAllocParse (schemata=<optimized out>, alloc=0x7f30680112d0, resctrl=0x7f308022b3a0) at util/virresctrl.c:1027
#3  virResctrlAllocGetGroup (resctrl=0x7f308022b3a0, groupname=groupname@entry=0x7f30a6632ba9 ".", alloc=alloc@entry=0x7f30968263d0) at util/virresctrl.c:1058
#4  0x00007f30a6453dce in virResctrlAllocGetDefault (resctrl=<optimized out>) at util/virresctrl.c:1076
#5  0x00007f30a6454ea6 in virResctrlAllocGetUnused (resctrl=resctrl@entry=0x7f308022b3a0) at util/virresctrl.c:1201
#6  0x00007f30a6455221 in virResctrlAllocMasksAssign (alloc=0x7f308028fc70, resctrl=0x7f308022b3a0) at util/virresctrl.c:1441
#7  virResctrlAllocCreate (resctrl=0x7f308022b3a0, alloc=0x7f308028fc70, machinename=<optimized out>) at util/virresctrl.c:1558
#8  0x00007f308891ae9e in qemuProcessResctrlCreate (vm=0x7f3080233140, vm=0x7f3080233140, driver=0x7f30800f0b40) at qemu/qemu_process.c:2521
#9  qemuProcessLaunch (conn=conn@entry=0x7f3080005b40, driver=driver@entry=0x7f30800f0b40, vm=vm@entry=0x7f3080233140, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, incoming=incoming@entry=0x0, 
    snapshot=snapshot@entry=0x0, vmop=vmop@entry=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=flags@entry=17) at qemu/qemu_process.c:5937
#10 0x00007f308891ec27 in qemuProcessStart (conn=conn@entry=0x7f3080005b40, driver=driver@entry=0x7f30800f0b40, vm=vm@entry=0x7f3080233140, updatedCPU=updatedCPU@entry=0x0, 
    asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, migrateFrom=migrateFrom@entry=0x0, migrateFd=migrateFd@entry=-1, migratePath=migratePath@entry=0x0, snapshot=snapshot@entry=0x0, 
    vmop=vmop@entry=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17, flags@entry=1) at qemu/qemu_process.c:6210
#11 0x00007f30889830f6 in qemuDomainObjStart (conn=0x7f3080005b40, driver=driver@entry=0x7f30800f0b40, vm=0x7f3080233140, flags=flags@entry=0, asyncJob=QEMU_ASYNC_JOB_START) at qemu/qemu_driver.c:7298
#12 0x00007f3088983836 in qemuDomainCreateWithFlags (dom=0x7f306800ead0, flags=0) at qemu/qemu_driver.c:7352
#13 0x00007f30a652198c in virDomainCreate (domain=domain@entry=0x7f306800ead0) at libvirt-domain.c:6531
#14 0x000055981cd97a73 in remoteDispatchDomainCreate (server=0x55981dce6f90, msg=0x55981dd01160, args=<optimized out>, rerr=0x7f3096826c10, client=0x55981dd00e40) at remote_dispatch.h:4222
#15 remoteDispatchDomainCreateHelper (server=0x55981dce6f90, client=0x55981dd00e40, msg=0x55981dd01160, rerr=0x7f3096826c10, args=<optimized out>, ret=0x7f30680008e0) at remote_dispatch.h:4198
#16 0x00007f30a6591fe2 in virNetServerProgramDispatchCall (msg=0x55981dd01160, client=0x55981dd00e40, server=0x55981dce6f90, prog=0x55981dcfe990) at rpc/virnetserverprogram.c:437
#17 virNetServerProgramDispatch (prog=0x55981dcfe990, server=server@entry=0x55981dce6f90, client=0x55981dd00e40, msg=0x55981dd01160) at rpc/virnetserverprogram.c:307
#18 0x000055981cda8c7d in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x55981dce6f90) at rpc/virnetserver.c:148
#19 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x55981dce6f90) at rpc/virnetserver.c:169
#20 0x00007f30a646c191 in virThreadPoolWorker (opaque=opaque@entry=0x55981dcdb5e0) at util/virthreadpool.c:167
#21 0x00007f30a646b518 in virThreadHelper (data=<optimized out>) at util/virthread.c:206
#22 0x00007f30a3871dd5 in start_thread () from /lib64/libpthread.so.0
#23 0x00007f30a359baed in clone () from /lib64/libc.so.6
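
The faulting statement in frame #0 evaluates resctrl->levels[level]->types[type]->bits without checking any link in that chain, so an entry that is missing from the info cached before the remount would presumably lead to exactly this kind of SIGSEGV. A minimal sketch of a guarded lookup follows. The structures and the demo_lookup_bits helper are hypothetical simplifications for illustration; they are neither libvirt's real definitions nor the patch that was posted upstream.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the cached resctrl info. */
enum demo_cache_type {
    DEMO_TYPE_BOTH,
    DEMO_TYPE_CODE,
    DEMO_TYPE_DATA,
    DEMO_TYPE_LAST
};

struct demo_type {
    size_t bits;                              /* width of the cache mask */
};

struct demo_level {
    struct demo_type *types[DEMO_TYPE_LAST];  /* NULL if the type is absent */
};

struct demo_info {
    struct demo_level **levels;               /* indexed by cache level */
    size_t nlevels;
};

/* Store the mask width in *bits and return 0, or return -1 when the cached
 * info has no entry for this level/type instead of dereferencing NULL. */
static int demo_lookup_bits(const struct demo_info *info, unsigned int level,
                            enum demo_cache_type type, size_t *bits)
{
    if (!info || !info->levels || level >= info->nlevels ||
        !info->levels[level] ||
        (unsigned int)type >= DEMO_TYPE_LAST ||
        !info->levels[level]->types[type])
        return -1;

    *bits = info->levels[level]->types[type]->bits;
    return 0;
}
```

A caller would turn the -1 into a reported error and abort the guest start, which matches the expected result stated above: an error rather than a crash.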

Comment 1 Martin Kletzander 2018-02-02 14:28:20 UTC
Patches posted upstream:

https://www.redhat.com/archives/libvir-list/2018-February/msg00128.html

Comment 4 Luyao Huang 2018-02-08 02:48:18 UTC
Verified this bug with libvirt-3.9.0-12.el7.x86_64:

1. mount resctrl
# mount -t resctrl resctrl /sys/fs/resctrl/

2. restart libvirtd
# service libvirtd restart
Redirecting to /bin/systemctl restart libvirtd.service

3. umount resctrl and remount it with the cdp option:

# umount /sys/fs/resctrl
# mount -t resctrl resctrl -o cdp /sys/fs/resctrl

4. start a guest which has cachetune:

# virsh start vm1
error: Failed to start domain vm1
error: unsupported configuration: Not enough room for allocation of 1048576 bytes for level 3 cache 0 scope type 'code'


The following error can also be hit when starting the guest while resctrl is being mounted and unmounted in a loop:

error: Failed to start domain vm1
error: internal error: Missing or inconsistent resctrl info for level '3d' type 'both'

Comment 5 Luyao Huang 2018-02-08 02:53:47 UTC
Hi Martin,

I think there is a typo in the error message:

error: Failed to start domain vm1
error: internal error: Missing or inconsistent resctrl info for level '3d' type 'both'

The level should be '3'. Checking the code:

+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("Missing or inconsistent resctrl info for "
+                         "level '%ud' type '%s'"),
+                       level, virCacheTypeToString(type));

Could you please check whether this is a typo in the code? If yes, can I verify this bug and ignore this issue here? (Since it is a really tiny issue, we can fix it upstream.)

Thanks in advance for your reply
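
For reference, "%ud" is parsed as the conversion "%u" followed by a literal 'd', which would explain the '3d' in the message. A small sketch of the difference, using hypothetical helper names (demo_format_buggy and demo_format_fixed are not libvirt functions):

```c
#include <stdio.h>

/* Formats the level with the "%ud" specifier from the quoted code:
 * "%u" prints the number and the trailing 'd' is copied verbatim. */
static int demo_format_buggy(char *buf, size_t len, unsigned int level)
{
    return snprintf(buf, len, "level '%ud'", level);
}

/* Formats the level with plain "%u", which prints only the number. */
static int demo_format_fixed(char *buf, size_t len, unsigned int level)
{
    return snprintf(buf, len, "level '%u'", level);
}
```

For level 3 the first helper produces level '3d' and the second produces level '3'.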

Comment 6 Martin Kletzander 2018-02-08 10:38:32 UTC
(In reply to Luyao Huang from comment #5)

Yes, that's a typo =)  Thanks for letting me know, I'll add it to my future patches.

Comment 7 Luyao Huang 2018-02-09 00:40:06 UTC
(In reply to Martin Kletzander from comment #6)
> (In reply to Luyao Huang from comment #5)
> 
> Yes, that's a typo =)  Thanks for letting me know, I'll add it to my future
> patches.

You are welcome :) Thanks for your quick reply!

Moving this bug to verified.

Comment 11 errata-xmlrpc 2018-04-10 11:04:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0704