Bug 1478201 - kernel runs out of memory with 256 virtio-scsi disks
Status: NEW
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Blocks: TRACKER-bugs-affecting-libguestfs
Reported: 2017-08-03 17:44 EDT by Richard W.M. Jones
Modified: 2017-08-10 13:17 EDT

Attachments
find-max-disks.pl (785 bytes, text/plain)
2017-08-04 17:02 EDT, Richard W.M. Jones

Description Richard W.M. Jones 2017-08-03 17:44:22 EDT
Description of problem:

A recent change to the Rawhide kernel has made it consume
much more RAM when scanning virtio-scsi disks.  Now it cannot
add 256 disks without failing with:

[    1.266507] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured
[    1.272271] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured
[    1.277880] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured

(This happens after 238 disks in this case).  The VM has 500 MB
of RAM and nothing else is running.

Version-Release number of selected component (if applicable):

kernel-4.13.0-0.rc2.git3.1.fc27.x86_64

Didn't fail with kernel-4.11.9-300.fc26.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Add 256 virtio-scsi disks to a VM.
Comment 1 Richard W.M. Jones 2017-08-04 16:07:58 EDT
I bisected this to:

5c279bd9e40624f4ab6e688671026d6005b066fa is the first bad commit
commit 5c279bd9e40624f4ab6e688671026d6005b066fa
Author: Christoph Hellwig <hch@lst.de>
Date:   Fri Jun 16 10:27:55 2017 +0200

    scsi: default to scsi-mq
    
    Remove the SCSI_MQ_DEFAULT config option and default to the blk-mq I/O
    path now that we had plenty of testing, and have I/O schedulers for
    blk-mq.  The module option to disable the blk-mq path is kept around for
    now.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

:040000 040000 57ec7d5d2ba76592a695f533a69f747700c31966 c79f6ecb070acc4fadf6fc05ca9ba32bc9c0c665 M	drivers
Comment 2 Richard W.M. Jones 2017-08-04 16:15:40 EDT
To bisect this I used the following libguestfs script which adds
1 appliance disk + 255 scratch disks (all virtio-scsi) to a VM,
and checks that it boots up to userspace.  The crash happens before
we reach userspace.

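#!/usr/bin/perl -w

use strict;
use Sys::Guestfs;

my $g = Sys::Guestfs->new ();
$g->set_trace (1);
$g->set_verbose (1);
# Add 255 scratch disks of 1 MiB each; the appliance disk makes 256.
for (my $i = 0; $i < 255; ++$i) {
    $g->add_drive_scratch (1024*1024);
}
# Boot the appliance; on a bad kernel this fails before userspace.
$g->launch ();
$g->shutdown ();
print "PASSED\n";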
Comment 3 Richard W.M. Jones 2017-08-04 16:59:48 EDT
I wrote a script that uses a binary search to find the maximum number
of disks that can be added to our guest, which has 1 vCPU and 500MB RAM
(no swap):

With scsi-mq enabled:   175 disks
With scsi-mq disabled: 1755 disks
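
For readers who want the shape of that search without opening the
attachment, here is a minimal sketch. It is not the attached
find-max-disks.pl. The names find_max_disks and $fake_boots are
hypothetical, and the predicate is a stand-in that only pretends a guest
boots with up to 175 disks. A real predicate would launch a guest with
$n scratch disks, much like the script in comment 2. The search also
assumes that if a guest boots with n disks it boots with any smaller
number.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the largest n in [$lo, $hi] for which $boots->($n) is true,
# or undef if even $lo disks fail.  Assumes the predicate is monotonic:
# once it fails for some n, it fails for every larger n.
sub find_max_disks {
    my ($boots, $lo, $hi) = @_;
    return unless $boots->($lo);
    # Invariant: $lo is known to work; the answer lies in [$lo, $hi].
    while ($lo < $hi) {
        # Round the midpoint up so the range shrinks on every pass.
        my $mid = $lo + int (($hi - $lo + 1) / 2);
        if ($boots->($mid)) {
            $lo = $mid;         # $mid works, so the answer is >= $mid
        } else {
            $hi = $mid - 1;     # $mid fails, so the answer is < $mid
        }
    }
    return $lo;
}

# Stand-in predicate for illustration only (an assumption, not a real
# boot test): pretend the guest boots with at most 175 disks.
my $fake_boots = sub { my ($n) = @_; return $n <= 175; };

print find_max_disks ($fake_boots, 1, 4096), "\n";
```

Each probe in the real test is a full guest boot, so the logarithmic
number of probes is what makes an upper bound in the thousands practical.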
Comment 4 Richard W.M. Jones 2017-08-04 17:02 EDT
Created attachment 1309205
find-max-disks.pl

The test I used for comment 3.  This requires supermin >= 5.1.18 and
a patched libguestfs: https://github.com/rwmjones/libguestfs/tree/max-disks
Comment 5 Richard W.M. Jones 2017-08-05 02:27:46 EDT
I started a thread on LKML.  No takers at present ...
https://lkml.org/lkml/2017/8/4/601
Comment 6 Richard W.M. Jones 2017-08-10 13:17:29 EDT
Patches posted to the kernel:
https://lkml.org/lkml/2017/8/10/708

and qemu:
https://lists.nongnu.org/archive/html/qemu-devel/2017-08/msg02085.html

If these are accepted then we will also need changes to libvirt and
libguestfs.
