Bug 1570902
| Summary: | LXC domains with nbd-attached qcow2 image create kernel stack trace |
|---|---|
| Product: | [Community] Virtualization Tools |
| Component: | libvirt |
| Status: | CLOSED DEFERRED |
| Reporter: | ralph.schmieder |
| Assignee: | Libvirt Maintainers <libvirt-maint> |
| CC: | berrange, libvirt-maint, ralph.schmieder |
| Severity: | unspecified |
| Priority: | unspecified |
| Version: | unspecified |
| Hardware: | x86_64 |
| OS: | Linux |
| Type: | Bug |
| Last Closed: | 2024-12-17 12:25:32 UTC |
Comment (reporter):

When destroying, this can be seen in addition:

```
error: Failed to destroy domain vm_1
error: internal error: Some processes refused to die
```

Closing comment (libvirt maintainers):

Thank you for reporting this issue to the libvirt project. Unfortunately we have been unable to resolve this issue due to insufficient maintainer capacity, and it will now be closed. This is not a reflection on the possible validity of the issue, merely the lack of resources to investigate and address it, for which we apologise. If you nonetheless feel the issue is still important, you may choose to report it again at the new project issue tracker: https://gitlab.com/libvirt/libvirt/-/issues. The project also welcomes contributions from anyone who believes they can provide a solution.
Created attachment 1425701 [details]: libvirt stack trace and read errors from dmesg

Description of problem:

When running an LXC domain in libvirt where the disk is defined as

```xml
<filesystem type='file' accessmode='passthrough'>
  <driver type='nbd' format='qcow2' wrpolicy='immediate'/>
  <source file='/var/local/some_disk.qcow2'/>
  <target dir='/'/>
</filesystem>
```

the domain comes up and runs fine (in this case, an Alpine 3.7 container). However, when stopping/destroying the VM, read errors from the nbd device and a kernel stack trace (attached) can be observed, and several zombie processes are left behind.

Discussed this on IRC; this was the gist of the brainstorming:

```
<danpb> we're putting the qemu-nbd process into the same cgroup as the rest of the container
<danpb> and thus just relying on all pids in the cgroup being purged
<danpb> there's nothing that ensures we kill qemu-nbd last
<cbosdonnat> danpb, hum... that would be interesting to try that indeed
<danpb> so any process in the container could still be reading/writing files in the mount
        on top of the NBD volume at the time qemu-nbd is killed
<danpb> i think we need to take qemu-nbd out of the cgroup and use qemu-nbd -d to
        explicitly terminate it at the right time
<danpb> this would also solve the memory pressure deadlocks we sometimes can hit
```

Version-Release number of selected component (if applicable):

- libvirt-daemon-lxc-3.7.0-4.fc27.x86_64
- Linux somebox 4.15.17-300.fc27.x86_64 #1 SMP Thu Apr 12 18:19:17 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:

Always, using the alpine3.7 container.

Steps to Reproduce:
1. Start the container using a root fs on an nbd-attached qcow2 image.
2. Stop the container.
3. Observe the problem.

Actual results:
- domain does not completely stop
- no "stopped" event seen
- several processes listed as zombies on the host
- kernel stack trace

Expected results:
- clean shutdown
- "stopped" event emitted
- no hanging processes

Additional info:

See attachment.
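For illustration, the teardown ordering proposed in the IRC discussion could look roughly like the sketch below. These are illustrative commands only, not libvirt's actual implementation: the device node `/dev/nbd0` and the mount point `/var/lib/libvirt/lxc/vm_1` are assumptions, and the commands require root plus the `nbd` kernel module.

```sh
# Setup (what libvirt does, in effect, when the container starts):
# attach the qcow2 image to an NBD device and mount it as the root fs.
modprobe nbd
qemu-nbd --connect=/dev/nbd0 --format=qcow2 /var/local/some_disk.qcow2
mount /dev/nbd0 /var/lib/libvirt/lxc/vm_1

# Proposed teardown order: kill the container's processes first, then
# unmount, and only then disconnect the NBD device, so that no process
# can still be reading/writing the mount when qemu-nbd goes away.
virsh -c lxc:/// destroy vm_1
umount /var/lib/libvirt/lxc/vm_1
qemu-nbd --disconnect /dev/nbd0   # long form of "qemu-nbd -d /dev/nbd0"
```

The key point is that `qemu-nbd` must not sit in the same cgroup as the container, so it cannot be killed in arbitrary order along with the container's PIDs; it is terminated explicitly via `qemu-nbd -d` after the filesystem is quiesced.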