Created attachment 1425701
libvirt stack trace and read errors from dmesg

Description of problem:
When running an LXC domain in libvirt where the disk is defined as

  <filesystem type='file' accessmode='passthrough'>
    <driver type='nbd' format='qcow2' wrpolicy='immediate'/>
    <source file='/var/local/some_disk.qcow2'/>
    <target dir='/'/>
  </filesystem>

the domain comes up and runs fine. In this case, this is an Alpine 3.7 container.

However, when stopping or destroying the domain, read errors from the NBD device and a kernel stack trace (attached) can be observed, and several zombie processes are left behind.

I discussed this on IRC; this was the gist of the brainstorming:

<danpb> we're putting the qemu-nbd process into the same cgroup as the rest of the container
<danpb> and thus just relying on all pids in the cgroup being purged
<danpb> there's nothing that ensures we kill qemu-nbd last
<cbosdonnat> danpb, hum... that would be interesting to try that indeed
<danpb> so any process in the container could still be reading/writing files in the mount on top of the NBD volume at the time qemu-nbd is killed
<danpb> i think we need to take qemu-nbd out of the cgroup and use qemu-nbd -d to explicitly terminate it at the right time
<danpb> this would also solve the memory pressure deadlocks we sometimes can hit

Version-Release number of selected component (if applicable):
libvirt-daemon-lxc-3.7.0-4.fc27.x86_64
Linux somebox 4.15.17-300.fc27.x86_64 #1 SMP Thu Apr 12 18:19:17 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:
Always, using the alpine3.7 container.

Steps to Reproduce:
1. Start the container using the root fs on an NBD-attached qcow2 image
2. Stop the container
3. Observe the problem

Actual results:
- domain will not completely stop
- no "stopped" event seen
- several processes listed as zombies on host
- stack trace

Expected results:
- clean shutdown
- event "stopped" emitted
- no hanging processes

Additional info:
See attachment.

When destroying the domain, the following can be seen in addition:

error: Failed to destroy domain vm_1
error: internal error: Some processes refused to die
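
To make the ordering from the IRC discussion concrete, here is a minimal sketch of a teardown in which qemu-nbd is disconnected last, via qemu-nbd -d, rather than being killed together with the container's cgroup. This is not libvirt's code. The function names, the mount point and the /dev/nbd0 device are hypothetical, and the sketch assumes that all container processes have already exited and that the filesystem is reachable from the namespace where the commands run.

```python
import subprocess


def teardown_plan(mount_point, nbd_device):
    """Return the teardown commands in the order suggested on IRC.

    Assumption: every process of the container has already been
    terminated, so nothing is reading or writing the mount any more.
    """
    return [
        # 1. Release the filesystem that sits on top of the NBD volume.
        ["umount", mount_point],
        # 2. Only then disconnect the NBD device, which ends qemu-nbd.
        ["qemu-nbd", "-d", nbd_device],
    ]


def run_teardown(mount_point, nbd_device, dry_run=True):
    """Print the plan (dry run) or execute it, stopping at the first failure."""
    plan = teardown_plan(mount_point, nbd_device)
    for cmd in plan:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return plan


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    run_teardown("/mnt/container_root", "/dev/nbd0", dry_run=True)
```

The dry-run default only prints the two commands; passing dry_run=False would execute them and requires root privileges and a real NBD device.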