Bug 1478201
| Summary: | kernel runs out of memory with 256 virtio-scsi disks | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Richard W.M. Jones <rjones> | ||||
| Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> | ||||
| Status: | CLOSED UPSTREAM | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||
| Severity: | unspecified | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | rawhide | CC: | gansalmon, ichavero, itamar, jforbes, jonathan, kernel-maint, labbott, madhu.chinakonda, mchehab, pcahyna, rjones, tbzatek | ||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Last Closed: | 2018-05-02 08:55:16 UTC | Type: | Bug | ||||
| Bug Blocks: | 910269 | ||||||
Description
Richard W.M. Jones
2017-08-03 21:44:22 UTC
I bisected this to:
5c279bd9e40624f4ab6e688671026d6005b066fa is the first bad commit
commit 5c279bd9e40624f4ab6e688671026d6005b066fa
Author: Christoph Hellwig <hch>
Date:   Fri Jun 16 10:27:55 2017 +0200

    scsi: default to scsi-mq

    Remove the SCSI_MQ_DEFAULT config option and default to the blk-mq I/O
    path now that we had plenty of testing, and have I/O schedulers for
    blk-mq.  The module option to disable the blk-mq path is kept around for
    now.

    Signed-off-by: Christoph Hellwig <hch>
    Signed-off-by: Martin K. Petersen <martin.petersen>

:040000 040000 57ec7d5d2ba76592a695f533a69f747700c31966 c79f6ecb070acc4fadf6fc05ca9ba32bc9c0c665 M drivers

To bisect this I used the following libguestfs script which adds
1 appliance disk + 255 scratch disks (all virtio-scsi) to a VM,
and checks that it boots up to userspace. The crash happens before
we reach userspace.
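
#!/usr/bin/perl -w
use strict;
use Sys::Guestfs;

my $g = Sys::Guestfs->new ();
$g->set_trace (1);
$g->set_verbose (1);

# Add 255 scratch disks of 1 MiB each. Together with the appliance
# disk this gives 256 virtio-scsi disks.
for (my $i = 0; $i < 255; ++$i) {
    $g->add_drive_scratch (1024*1024);
}

# Boot the appliance; the test passes only if it reaches userspace.
$g->launch ();
$g->shutdown ();
print "PASSED\n";
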
I wrote a script that uses a binary search to find the maximum number of disks that can be added to our guest, which has 1 vCPU and 500MB RAM (no swap):

With scsi-mq enabled: 175 disks
With scsi-mq disabled: 1755 disks

Created attachment 1309205
find-max-disks.pl

The test I used for comment 3. This requires supermin >= 5.1.18 and a patched libguestfs:
https://github.com/rwmjones/libguestfs/tree/max-disks

I started a thread on LKML. No takers at present ...
https://lkml.org/lkml/2017/8/4/601

Patches posted to the kernel:
https://lkml.org/lkml/2017/8/10/708
and qemu:
https://lists.nongnu.org/archive/html/qemu-devel/2017-08/msg02085.html
If these are accepted then we will also need changes to libvirt and libguestfs.

Did these get picked up?

This is not fixed upstream. Please leave this bug open.

I have some more news to report on this. It was temporarily fixed in 4.15/4.16, but it has regressed again in 4.17.0-rc1.

4.15.0-0.rc2.git2.1.fc28.x86_64: >= 256 virtio-scsi disks *
4.15.0-0.rc8.git0.1.fc28.x86_64: >= 256 virtio-scsi disks *
4.16.3-300.fc28.x86_64: >= 256 virtio-scsi disks *
4.17.0-0.rc1.git1.1.fc29.x86_64: 191 virtio-scsi disks

Could this be something to do with Rawhide kernels & debug settings? How do I find out if a Rawhide kernel has debug enabled?

* The version of libguestfs I'm using doesn't allow me to add more than 256 disks.

In general, rawhide kernels have debug enabled, other than the rc*-git0.1 versions. If you want to test whether it is a debug vs non-debug issue, you can always check the kernels from the rawhide-nodebug repository.

kernel-4.17.0-0.rc3.git1.2.fc29.x86_64 (nodebug): >= 256 virtio-scsi disks

So yes, it looks like enabling debug reduces the number of virtio-scsi disks that can be added, for whatever reason.

Since this is now working I'm going to close this bug as fixed upstream.

Is it really fixed? I am having a similar problem with the scsi_debug driver, bz1675071.
The bug does not show up in Fedora, but this seems to be simply because scsi-mq is off by default.
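
For readers without access to the find-max-disks.pl attachment, the binary search mentioned above (finding the largest number of disks with which the guest still boots) can be sketched roughly as follows. This is not the attached script: it is written in Python rather than Perl, the names `find_max_disks` and `boots_ok` are hypothetical, and the boot check is a stand-in function rather than a real libguestfs launch. It also assumes that if the guest boots with n disks it boots with any smaller number.

```python
from typing import Callable


def find_max_disks(boots_ok: Callable[[int], bool], upper_limit: int) -> int:
    """Return the largest n in [0, upper_limit] for which boots_ok(n) is true.

    Assumption (not from the bug report): booting is monotonic, i.e. if the
    guest boots with n disks it also boots with fewer, and it boots with 0.
    """
    lo, hi = 0, upper_limit
    while lo < hi:
        # Round up so that the range always shrinks when lo == hi - 1.
        mid = (lo + hi + 1) // 2
        if boots_ok(mid):
            lo = mid          # mid disks work; the answer is mid or higher
        else:
            hi = mid - 1      # mid disks fail; the answer is below mid
    return lo


def fake_guest(limit: int) -> Callable[[int], bool]:
    """Hypothetical stand-in for launching a VM with n scratch disks."""
    return lambda n: n <= limit


if __name__ == "__main__":
    # Stand-in limit only; a real run would launch the guest at each step.
    print(find_max_disks(fake_guest(175), upper_limit=4096))
```

In a real test, `boots_ok(n)` would add n scratch disks and try to launch the appliance, as the Perl reproducer in the description does for a fixed count of 255. Each probe is a full VM boot, so the logarithmic number of probes matters.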