Bug 1570902
Summary: | LXC domains with nbd attached qcow2 image creates kernel stack trace
---|---
Product: | [Community] Virtualization Tools
Component: | libvirt
Reporter: | ralph.schmieder
Assignee: | Libvirt Maintainers <libvirt-maint>
CC: | libvirt-maint, ralph.schmieder
Status: | NEW
Severity: | unspecified
Priority: | unspecified
Version: | unspecified
Hardware: | x86_64
OS: | Linux
Type: | Bug
When destroying, this can be seen in addition:

error: Failed to destroy domain vm_1
error: internal error: Some processes refused to die
Created attachment 1425701
libvirt stack trace and read errors from dmesg

Description of problem:
When running an LXC domain in libvirt where the disk is defined as

  <filesystem type='file' accessmode='passthrough'>
    <driver type='nbd' format='qcow2' wrpolicy='immediate'/>
    <source file='/var/local/some_disk.qcow2'/>
    <target dir='/'/>
  </filesystem>

the domain comes up and runs fine. In this case, it is an Alpine 3.7 container. However, when stopping or destroying the domain, read errors from the nbd device and a kernel stack trace (attached) can be observed, and several zombie processes are left behind.

I discussed this on IRC; this was the gist of the brainstorming:

<danpb> we're putting the qemu-nbd process into the same cgroup as the rest of the container
<danpb> and thus just relying on all pids in the cgroup being purged
<danpb> there's nothing that ensures we kill qemu-nbd last
<cbosdonnat> danpb, hum... that would be interesting to try that indeed
<danpb> so any process in the container could still be reading/writing files in the mount on top of the NBD volume at the time qemu-nbd is killed
<danpb> i think we need to take qemu-nbd out of the cgroup and use qemu-nbd -d to explicitly terminate it at the right time
<danpb> this would also solve the memory pressure deadlocks we sometimes can hit

Version-Release number of selected component (if applicable):
libvirt-daemon-lxc-3.7.0-4.fc27.x86_64
Linux somebox 4.15.17-300.fc27.x86_64 #1 SMP Thu Apr 12 18:19:17 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:
Always, using the alpine3.7 container

Steps to Reproduce:
1. start container using the root fs on nbd attached qcow2 image
2. stop container
3. observe problem

Actual results:
- domain will not completely stop
- no "stopped" event seen
- several processes listed as zombies on host
- stack trace

Expected results:
- clean shutdown
- event "stopped" emitted
- no hanging processes

Additional info:
see attachment
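
For illustration only, the teardown ordering proposed in the IRC discussion (stop the container's processes first, and only then disconnect the device with qemu-nbd -d) might be sketched as below. This is not libvirt's code: the function names, the PIDs, the mount point and the /dev/nbd0 device are all hypothetical, and the sketch assumes the processes have actually exited before the unmount is attempted.

```python
import subprocess
from typing import Callable, List, Sequence


def teardown_plan(container_pids: Sequence[int], mount_point: str,
                  nbd_device: str) -> List[List[str]]:
    """Build the shutdown commands in the order suggested on IRC.

    All arguments are hypothetical example values supplied by the caller.
    """
    # 1. Stop everything that may still read or write the mounted image.
    plan = [["kill", "-KILL", str(pid)] for pid in container_pids]
    # 2. Unmount the filesystem that sits on top of the NBD volume.
    plan.append(["umount", mount_point])
    # 3. Only now disconnect the device, which lets qemu-nbd terminate.
    plan.append(["qemu-nbd", "-d", nbd_device])
    return plan


def run_plan(plan: List[List[str]],
             runner: Callable[..., object] = subprocess.run) -> None:
    """Run each command in order, stopping at the first failure."""
    for argv in plan:
        runner(argv, check=True)


if __name__ == "__main__":
    # Dry run: print the ordered commands instead of executing them.
    for argv in teardown_plan([4242], "/mnt/rootfs", "/dev/nbd0"):
        print(" ".join(argv))
```

The point of the ordering is that qemu-nbd is the last thing to go away, so nothing in the container can still be doing I/O on the mount when the NBD backend disappears.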