Bug 1570902 - LXC domain with nbd-attached qcow2 image creates kernel stack trace
Keywords:
Status: NEW
Alias: None
Product: Virtualization Tools
Classification: Community
Component: libvirt
Version: unspecified
Hardware: x86_64
OS: Linux
Target Milestone: ---
Assignee: Libvirt Maintainers
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-04-23 16:38 UTC by ralph.schmieder
Modified: 2018-10-16 16:53 UTC
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:


Attachments
libvirt stack trace and read errors from dmesg (4.96 KB, text/plain)
2018-04-23 16:38 UTC, ralph.schmieder

Description ralph.schmieder 2018-04-23 16:38:24 UTC
Created attachment 1425701
libvirt stack trace and read errors from dmesg

Description of problem:
When running an LXC domain in libvirt whose disk is defined as

    <filesystem type='file' accessmode='passthrough'>
      <driver type='nbd' format='qcow2' wrpolicy='immediate'/>
      <source file='/var/local/some_disk.qcow2'/>
      <target dir='/'/>
    </filesystem>

then the domain comes up and runs fine. In this case it is an Alpine 3.7 container.

However, when stopping/destroying the VM, read errors from the nbd device and a kernel stack trace (attached) can be observed, and several zombie processes are left behind.

This was discussed on IRC; the gist of the 'brainstorming' was:

<danpb> we're putting the qemu-nbd process into the same cgroup  as the rest of the container
<danpb> and thus just relying on all pids in the cgroup being purged
<danpb> there's nothing that ensures we kill qemu-nbd  last
<cbosdonnat> danpb, hum... that would be interesting to try that indeed
<danpb> so any process in the container could still be reading/writing files in the mount on top of the  NBD volume at the time qemu-nbd is killed
<danpb> i think we need to take qemu-nbd out of the cgroup and  use  qemu-nbd -d   to explicitly terminate it at the right time
<danpb> this would also solve the memory pressure deadlocks we sometimes can hit
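The fix danpb sketches implies a specific teardown order. A minimal dry-run illustration of that ordering (commands are printed, not executed; the mount point and the /dev/nbd0 device node are assumptions for illustration, not taken from libvirt's actual code):

```shell
# Dry-run sketch of the proposed shutdown ordering: purge the container's
# processes first, then detach the NBD device explicitly with `qemu-nbd -d`
# instead of relying on the cgroup kill to take qemu-nbd down with it.
run() { echo "+ $*"; }    # print the command instead of executing it

run virsh -c lxc:/// destroy vm_1           # kill all pids in the container cgroup
run umount /var/lib/libvirt/lxc/vm_1.root   # nothing may still touch the mount
run qemu-nbd -d /dev/nbd0                   # only now disconnect the NBD device
```

Keeping qemu-nbd outside the container's cgroup is what makes the last step possible: the process survives the cgroup purge and can flush and disconnect cleanly.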

Version-Release number of selected component (if applicable):

libvirt-daemon-lxc-3.7.0-4.fc27.x86_64
Linux somebox 4.15.17-300.fc27.x86_64 #1 SMP Thu Apr 12 18:19:17 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:
always, using the alpine3.7 container

Steps to Reproduce:
1. start a container whose root fs is on an nbd-attached qcow2 image
2. stop the container
3. observe the problem
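For reference, the steps above as virsh commands, again as a dry-run sketch (the lxc:/// URI and the domain name vm_1, taken from comment 1, are assumptions about the exact setup):

```shell
# Dry-run sketch of the reproduction; commands are echoed, not executed.
run() { echo "+ $*"; }    # print the command instead of executing it

run virsh -c lxc:/// start vm_1     # 1. root fs on an nbd-attached qcow2 image
run virsh -c lxc:/// destroy vm_1   # 2. stop the container
run dmesg                           # 3. nbd read errors + stack trace appear here
```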

Actual results:
- domain will not completely stop
- no "stopped" event seen
- several processes listed as zombies on host
- stack trace

Expected results:
- clean shutdown
- event "stopped" emitted
- no hanging processes


Additional info:
see attachment

Comment 1 ralph.schmieder 2018-04-23 17:17:52 UTC
When destroying the domain, this can additionally be seen:

error: Failed to destroy domain vm_1
error: internal error: Some processes refused to die

