Bug 1570902 - LXC domains with nbd attached qcow2 image creates kernel stack trace
Status: NEW
Alias: None
Product: Virtualization Tools
Classification: Community
Component: libvirt
Version: unspecified
Hardware: x86_64
OS: Linux
Target Milestone: ---
Assignee: Libvirt Maintainers
QA Contact:
Depends On:
Reported: 2018-04-23 16:38 UTC by ralph.schmieder
Modified: 2018-10-16 16:53 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Last Closed:

Attachments:
libvirt stack trace and read errors from dmesg (4.96 KB, text/plain)
2018-04-23 16:38 UTC, ralph.schmieder

Description ralph.schmieder 2018-04-23 16:38:24 UTC
Created attachment 1425701
libvirt stack trace and read errors from dmesg

Description of problem:
When running an LXC domain in libvirt where the root filesystem is defined as

    <filesystem type='file' accessmode='passthrough'>
      <driver type='nbd' format='qcow2' wrpolicy='immediate'/>
      <source file='/var/local/some_disk.qcow2'/>
      <target dir='/'/>
    </filesystem>

then the domain comes up and runs fine. The container in this case is Alpine 3.7.
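
To check whether other domains use the same kind of definition, something like the following could be run against their XML. This is only a sketch: the function name `nbd_filesystems` is made up for illustration, and it assumes the domain XML has already been obtained as a string (for example from `virsh dumpxml`).

```python
import xml.etree.ElementTree as ET


def nbd_filesystems(domain_xml):
    """Return (source file, target dir) for every nbd-backed <filesystem>.

    Hypothetical helper, not part of libvirt. It only inspects the XML text.
    """
    root = ET.fromstring(domain_xml)
    hits = []
    # iter() also yields the root element, so a bare <filesystem> works too.
    for fs in root.iter("filesystem"):
        driver = fs.find("driver")
        if driver is None or driver.get("type") != "nbd":
            continue
        source = fs.find("source")
        target = fs.find("target")
        hits.append((
            source.get("file") if source is not None else None,
            target.get("dir") if target is not None else None,
        ))
    return hits
```

An empty list means the domain has no nbd-backed filesystem and should not be affected by this report.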

However, when stopping or destroying the container, read errors from the nbd device and a kernel stack trace (attached) show up in dmesg, and several zombie processes are left behind on the host.

I discussed this on IRC; this was the gist of the brainstorming:

<danpb> we're putting the qemu-nbd process into the same cgroup  as the rest of the container
<danpb> and thus just relying on all pids in the cgroup being purged
<danpb> there's nothing that ensures we kill qemu-nbd  last
<cbosdonnat> danpb, hum... that would be interesting to try that indeed
<danpb> so any process in the container could still be reading/writing files in the mount on top of the  NBD volume at the time qemu-nbd is killed
<danpb> i think we need to take qemu-nbd out of the cgroup and  use  qemu-nbd -d   to explicitly terminate it at the right time
<danpb> this would also solve the memory pressure deadlocks we sometimes can hit
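
The ordering suggested above can be sketched as follows. This is not how libvirt does or would implement it, only an illustration of the sequence: the function names, the mount point and the nbd device path are hypothetical, and it assumes all container processes have already exited before the plan is run. `qemu-nbd --disconnect` is the long form of `qemu-nbd -d`.

```python
import subprocess


def teardown_plan(mount_point, nbd_device):
    """Commands to run once every container process is gone.

    Hypothetical helper: the filesystem is unmounted first, and only then
    is the nbd device disconnected, so qemu-nbd goes away last.
    """
    return [
        ["umount", mount_point],
        ["qemu-nbd", "--disconnect", nbd_device],
    ]


def run_plan(plan, runner=subprocess.run):
    """Run each command in order and stop at the first failure."""
    for argv in plan:
        runner(argv, check=True)
```

For example, `run_plan(teardown_plan("/path/to/rootfs", "/dev/nbd0"))` would unmount and then disconnect; both paths here are placeholders.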

Version-Release number of selected component (if applicable):

Linux somebox 4.15.17-300.fc27.x86_64 #1 SMP Thu Apr 12 18:19:17 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:
Always, using the Alpine 3.7 container.

Steps to Reproduce:
1. Start a container whose root fs is on an nbd attached qcow2 image
2. Stop the container
3. Observe the problem
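
The steps above roughly correspond to the following virsh invocations. This is a sketch under assumptions: the helper name is made up, the domain is assumed to be already defined, and `lxc:///` is assumed to be the connection URI in use.

```python
def repro_commands(domain, uri="lxc:///"):
    """Build the command lines for the reproduction steps.

    Hypothetical helper; it only builds argv lists and runs nothing.
    """
    virsh = ["virsh", "-c", uri]
    return [
        virsh + ["start", domain],     # step 1: start the container
        virsh + ["shutdown", domain],  # step 2: stop it ("destroy" also triggers the problem)
        virsh + ["domstate", domain],  # step 3: check whether it really stopped
        ["dmesg"],                     # step 3: look for read errors and the stack trace
    ]
```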

Actual results:
- domain does not completely stop
- no "stopped" event seen
- several processes listed as zombies on host
- kernel stack trace
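
The leftover zombies can be listed on the host with something like the sketch below. It assumes a Linux `/proc` filesystem; the function names are made up for illustration. The state is read from `/proc/<pid>/stat`, where it is the first field after the closing parenthesis of the command name, and `Z` marks a zombie.

```python
import os


def parse_stat_state(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    The command name may itself contain spaces or parentheses, so the
    state is taken as the first field after the last ')'.
    """
    lparen = stat_line.find("(")
    rparen = stat_line.rfind(")")
    pid = int(stat_line[:lparen].strip())
    comm = stat_line[lparen + 1:rparen]
    state = stat_line[rparen + 1:].split()[0]
    return pid, comm, state


def list_zombies(proc_root="/proc"):
    """Return (pid, comm) for every process currently in state 'Z'."""
    zombies = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_state(f.read())
        except (OSError, ValueError, IndexError):
            # The process may have exited between listdir() and open().
            continue
        if state == "Z":
            zombies.append((pid, comm))
    return zombies
```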

Expected results:
- clean shutdown
- event "stopped" emitted
- no hanging processes

Additional info:
see attachment

Comment 1 ralph.schmieder 2018-04-23 17:17:52 UTC
When destroying the domain, the following can be seen in addition:

error: Failed to destroy domain vm_1
error: internal error: Some processes refused to die
