Bug 1478201
Summary: kernel runs out of memory with 256 virtio-scsi disks

| Field | Value | Field | Value |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Richard W.M. Jones <rjones> |
| Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
| Status: | CLOSED UPSTREAM | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rawhide | CC: | gansalmon, ichavero, itamar, jforbes, jonathan, kernel-maint, labbott, madhu.chinakonda, mchehab, pcahyna, rjones, tbzatek |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-05-02 08:55:16 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 910269 | | |
Description
Richard W.M. Jones
2017-08-03 21:44:22 UTC
I bisected this to:

```
5c279bd9e40624f4ab6e688671026d6005b066fa is the first bad commit
commit 5c279bd9e40624f4ab6e688671026d6005b066fa
Author: Christoph Hellwig <hch>
Date:   Fri Jun 16 10:27:55 2017 +0200

    scsi: default to scsi-mq

    Remove the SCSI_MQ_DEFAULT config option and default to the blk-mq I/O
    path now that we had plenty of testing, and have I/O schedulers for
    blk-mq. The module option to disable the blk-mq path is kept around for
    now.

    Signed-off-by: Christoph Hellwig <hch>
    Signed-off-by: Martin K. Petersen <martin.petersen>

:040000 040000 57ec7d5d2ba76592a695f533a69f747700c31966 c79f6ecb070acc4fadf6fc05ca9ba32bc9c0c665 M drivers
```

To bisect this, I used the following libguestfs script, which adds 1 appliance disk + 255 scratch disks (all virtio-scsi) to a VM and checks that it boots up to userspace. The crash happens before userspace is reached.

```perl
#!/usr/bin/perl -w
use strict;
use Sys::Guestfs;

my $g = Sys::Guestfs->new ();
$g->set_trace (1);
$g->set_verbose (1);

# The appliance disk is added by launch(); add 255 scratch disks
# (1 MiB each) on top of it, for 256 virtio-scsi disks in total.
for (my $i = 0; $i < 255; ++$i) {
    $g->add_drive_scratch (1024*1024);
}

# launch() dies if the appliance fails to boot to userspace.
$g->launch ();
$g->shutdown ();
print "PASSED\n";
```

I wrote a script that uses a binary search to find the maximum number of disks that can be added to our guest, which has 1 vCPU and 500 MB RAM (no swap):

- With scsi-mq enabled: 175 disks
- With scsi-mq disabled: 1755 disks

Created attachment 1309205: find-max-disks.pl

The test I used for comment 3. This requires supermin >= 5.1.18 and a patched libguestfs: https://github.com/rwmjones/libguestfs/tree/max-disks

I started a thread on LKML. No takers at present... https://lkml.org/lkml/2017/8/4/601

Patches posted to the kernel: https://lkml.org/lkml/2017/8/10/708
and qemu: https://lists.nongnu.org/archive/html/qemu-devel/2017-08/msg02085.html

If these are accepted, we will also need changes to libvirt and libguestfs.

Did these get picked up?

This is not fixed upstream. Please leave this bug open.

I have some more news to report on this.
It was temporarily fixed in 4.15/4.16, but it has regressed again in 4.17.0-rc1:

- 4.15.0-0.rc2.git2.1.fc28.x86_64: >= 256 virtio-scsi disks (*)
- 4.15.0-0.rc8.git0.1.fc28.x86_64: >= 256 virtio-scsi disks (*)
- 4.16.3-300.fc28.x86_64: >= 256 virtio-scsi disks (*)
- 4.17.0-0.rc1.git1.1.fc29.x86_64: 191 virtio-scsi disks

(*) The version of libguestfs I'm using doesn't allow me to add more than 256 disks.

Could this be something to do with Rawhide kernels and debug settings? How do I find out whether a Rawhide kernel has debug enabled?

In general, Rawhide kernels have debug enabled, except for the rc*-git0.1 versions. If you want to test whether it is a debug vs. non-debug issue, you can always try the kernels from the rawhide-nodebug repository.

kernel-4.17.0-0.rc3.git1.2.fc29.x86_64 (nodebug): >= 256 virtio-scsi disks

So yes, it looks like enabling debug reduces the number of virtio-scsi disks that can be added, for whatever reason. Since this is now working, I'm going to close this bug as fixed upstream.

Is it really fixed? I am having a similar problem with the scsi_debug driver, bz1675071. The bug does not show up in Fedora, but this seems to be simply because scsi-mq is off by default.
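The find-max-disks.pl attachment is not reproduced here, but the binary search it performs can be sketched roughly as follows. This is a minimal Perl sketch, assuming the guest boots for every disk count up to some limit and fails for every count above it. `find_max` and `boots_with` are hypothetical names. `boots_with` uses the Sys::Guestfs calls `set_memsize`, `set_smp`, `set_append`, `add_drive_scratch` and `launch`, and it assumes the "scsi-mq disabled" case is reproduced with the `scsi_mod.use_blk_mq=0` module option on the guest kernel command line, which may not be how the numbers above were obtained. Stock libguestfs also caps the number of drives, which is why the original test needed a patched libguestfs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Largest $n in [$lo, $hi] for which $ok->($n) is true, found by binary
# search. Assumes $ok is monotone: if the guest boots with $n disks, it
# also boots with any smaller number. Returns $lo - 1 if even $lo fails.
sub find_max {
    my ($lo, $hi, $ok) = @_;
    my $best = $lo - 1;
    while ($lo <= $hi) {
        my $mid = int (($lo + $hi) / 2);
        if ($ok->($mid)) {
            $best = $mid;       # $mid boots: search the upper half
            $lo = $mid + 1;
        }
        else {
            $hi = $mid - 1;     # $mid fails: search the lower half
        }
    }
    return $best;
}

# Hypothetical predicate: does a 1 vCPU / 500 MB guest with $n scratch
# disks (1 MiB each, plus the appliance disk) boot to userspace?
# If $use_mq is false, scsi-mq is disabled via scsi_mod.use_blk_mq=0
# (an assumption about how the "scsi-mq disabled" case was tested).
sub boots_with {
    my ($n, $use_mq) = @_;
    require Sys::Guestfs;       # loaded only when a real guest is launched
    my $g = Sys::Guestfs->new ();
    $g->set_memsize (500);
    $g->set_smp (1);
    $g->set_append ("scsi_mod.use_blk_mq=0") unless $use_mq;
    $g->add_drive_scratch (1024*1024) for 1 .. $n;
    # launch() dies if the appliance fails to boot, e.g. on an OOM
    # before userspace; eval turns that into a false result.
    my $ok = eval { $g->launch (); $g->shutdown (); 1 };
    $g->close ();
    return $ok ? 1 : 0;
}
```

For example, `find_max (1, 2000, sub { boots_with ($_[0], 1) })` would search for the limit with scsi-mq enabled, and passing 0 as the second argument would search with it disabled. The upper bound of 2000 is arbitrary; since every probe boots a full appliance, the search costs at most about 11 launches (log2 2000 is just under 11).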
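The rule of thumb given above (Rawhide kernels have debug enabled except the rc*-git0.1 builds) can be turned into a quick check on a kernel release string. This is a minimal Perl sketch, and `rawhide_debug_guess` is a hypothetical helper. It is only a guess from the version number: the kernel-4.17.0-0.rc3.git1.2.fc29 build from rawhide-nodebug, for example, would be reported as a debug kernel even though it isn't, so installing from that repository remains the reliable way to compare.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Guess whether a Rawhide kernel was built with debug options, using the
# rule of thumb that only the rc*-git0.1 snapshots are non-debug.
# Returns 1 (probably debug), 0 (probably nodebug), or undef when the
# release string is not a Rawhide rc snapshot, e.g. 4.16.3-300.fc28.
# Caveat: rawhide-nodebug kernels also carry rc/git numbers, so this
# only makes sense for the default Rawhide builds.
sub rawhide_debug_guess {
    my ($release) = @_;
    # Rawhide snapshots look like 4.17.0-0.rc1.git1.1.fc29.x86_64
    return unless $release =~ /-0\.rc\d+\.git(\d+)\.(\d+)\./;
    my ($git, $build) = ($1, $2);
    return ($git == 0 && $build == 1 ? 0 : 1);
}
```

On a running system, the release string can be obtained with `(POSIX::uname ())[2]` after `use POSIX ();`, which is the same value `uname -r` prints.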