Bug 1471203 - BUG: unable to handle kernel NULL pointer dereference at 0000000000000018; IP: get_request+0x14a
Status: NEW
Product: Fedora
Classification: Fedora
Component: kernel
Version: 25
Hardware: x86_64 Linux
Priority: unspecified
Severity: medium
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2017-07-14 12:56 EDT by Scott Mayhew
Modified: 2017-07-14 12:59 EDT (History)

Description Scott Mayhew 2017-07-14 12:56:42 EDT
Description of problem:

I had the following oops when processing a GETDEVICEINFO call from a client:

[  123.079423] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[  123.080348] IP: get_request+0x14a/0x7b0
[  123.080728] PGD 0 
[  123.080728] 
[  123.081129] Oops: 0000 [#1] SMP
[  123.081470] Modules linked in: rpcsec_gss_krb5 nfsd auth_rpcgss nfs_acl lockd grace binfmt_misc snd_hda_codec_generic snd_hda_intel sunrpc snd_hda_codec crct10dif_pclmul crc32_pclmul snd_hda_core snd_hwdep ghash_clmulni_intel snd_pcm ppdev snd_timer joydev snd virtio_balloon soundcore parport_pc parport i2c_piix4 qemu_fw_cfg tpm_tis tpm_tis_core tpm xfs libcrc32c virtio_net virtio_console virtio_blk qxl drm_kms_helper ttm drm crc32c_intel virtio_pci serio_raw virtio_ring virtio ata_generic pata_acpi
[  123.085581] CPU: 2 PID: 11739 Comm: nfsd Not tainted 4.11.9-200.fc25.x86_64 #1
[  123.086256] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014
[  123.087211] task: ffff94fe3b512680 task.stack: ffffa091c1a0c000
[  123.087835] RIP: 0010:get_request+0x14a/0x7b0
[  123.088293] RSP: 0018:ffffa091c1a0fc58 EFLAGS: 00010046
[  123.088855] RAX: ffff94fdf6875198 RBX: ffff94fdf6875198 RCX: ffff94fe3b512680
[  123.089590] RDX: 0000000000000000 RSI: ffff94fdf68751c0 RDI: ffffffffa635f260
[  123.090208] RBP: ffffa091c1a0fd08 R08: ffff94fe3fd1ca80 R09: ffff94fe39fe9300
[  123.090806] R10: ffff94fe39fe9300 R11: ffff94fe3b854220 R12: 0000000000000000
[  123.091399] R13: 0000000000000020 R14: ffff94fe38266200 R15: 0000000000000000
[  123.092034] FS:  0000000000000000(0000) GS:ffff94fe3fd00000(0000) knlGS:0000000000000000
[  123.092703] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  123.093178] CR2: 0000000000000018 CR3: 0000000036079000 CR4: 00000000003406e0
[  123.093768] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  123.094461] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  123.095148] Call Trace:
[  123.095409]  ? remove_wait_queue+0x70/0x70
[  123.095791]  blk_get_request+0x8e/0x120
[  123.096143]  nfsd4_scsi_proc_getdeviceinfo+0xc8/0x3c0 [nfsd]
[  123.096652]  nfsd4_getdeviceinfo+0x11c/0x1c0 [nfsd]
[  123.097101]  nfsd4_proc_compound+0x371/0x6c0 [nfsd]
[  123.097553]  nfsd_dispatch+0xc8/0x260 [nfsd]
[  123.097941]  svc_process_common+0x374/0x6b0 [sunrpc]
[  123.098382]  svc_process+0xfe/0x1b0 [sunrpc]
[  123.098779]  nfsd+0xe9/0x150 [nfsd]
[  123.099105]  kthread+0x109/0x140
[  123.099390]  ? nfsd_destroy+0x60/0x60 [nfsd]
[  123.100082]  ? kthread_park+0x90/0x90
[  123.100686]  ret_from_fork+0x25/0x30
[  123.101167] Code: ff ff ff 48 8d 14 00 48 01 c2 48 c1 e0 02 48 8d 74 d3 28 48 89 45 80 48 01 d8 48 89 45 90 48 89 b5 60 ff ff ff 4c 8b 3b 4d 85 e4 <49> 8b 47 18 48 8b 00 48 89 85 78 ff ff ff 0f 84 a1 03 00 00 4d 
[  123.103141] RIP: get_request+0x14a/0x7b0 RSP: ffffa091c1a0fc58
[  123.103805] CR2: 0000000000000018
Version-Release number of selected component (if applicable):
4.11.9-200.fc25.x86_64

How reproducible:
Easy

Steps to Reproduce:
1. Create an export with 'pnfs' in the export options
2. Mount with '-o v4.1' from the client
3. Attempt to write a file from the client

Actual results:
The oops occurs.  If the machine isn't configured to panic on oops, you can ctrl+c on the client, and it looks to me like that ties up that session slot on the server (all subsequent SEQUENCE ops using the same slot number & sequence number return NFS4ERR_DELAY).

Expected results:
No oops.

Additional info:

I was actually trying to test flexfile layouts, but since the directory I'm exporting is on an xfs filesystem (albeit on a logical volume on my laptop's SSD), the server reports the block & scsi layout types in addition to flexfiles in the GETATTR reply for supported layout types.
Comment 1 Scott Mayhew 2017-07-14 12:57:26 EDT
crash> bt
PID: 11739  TASK: ffff94fe3b512680  CPU: 2   COMMAND: "nfsd"
 #0 [ffffa091c1a0f8a8] machine_kexec at ffffffffa505873a
 #1 [ffffa091c1a0f908] __crash_kexec at ffffffffa5139ced
 #2 [ffffa091c1a0f9d0] __crash_kexec at ffffffffa5139dc5
 #3 [ffffa091c1a0f9e8] crash_kexec at ffffffffa5139e0b
 #4 [ffffa091c1a0fa08] oops_end at ffffffffa502a484
 #5 [ffffa091c1a0fa30] no_context at ffffffffa506575f
 #6 [ffffa091c1a0fa98] __bad_area_nosemaphore at ffffffffa5065a31
 #7 [ffffa091c1a0fad8] bad_area_nosemaphore at ffffffffa5065b74
 #8 [ffffa091c1a0fae8] __do_page_fault at ffffffffa5065f1e
 #9 [ffffa091c1a0fb50] trace_do_page_fault at ffffffffa50663f1
#10 [ffffa091c1a0fb88] do_async_page_fault at ffffffffa505fdca
#11 [ffffa091c1a0fba0] async_page_fault at ffffffffa5879e28
    [exception RIP: get_request+0x14a]
    RIP: ffffffffa53e374a  RSP: ffffa091c1a0fc58  RFLAGS: 00010046
    RAX: ffff94fdf6875198  RBX: ffff94fdf6875198  RCX: ffff94fe3b512680
    RDX: 0000000000000000  RSI: ffff94fdf68751c0  RDI: ffffffffa635f260
    RBP: ffffa091c1a0fd08   R8: ffff94fe3fd1ca80   R9: ffff94fe39fe9300
    R10: ffff94fe39fe9300  R11: ffff94fe3b854220  R12: 0000000000000000
    R13: 0000000000000020  R14: ffff94fe38266200  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#12 [ffffa091c1a0fd10] blk_get_request at ffffffffa53e48be
#13 [ffffa091c1a0fd40] nfsd4_scsi_proc_getdeviceinfo at ffffffffc055e158 [nfsd]
#14 [ffffa091c1a0fd78] nfsd4_getdeviceinfo at ffffffffc053e9ec [nfsd]
#15 [ffffa091c1a0fdb8] nfsd4_proc_compound at ffffffffc053f7c1 [nfsd]
#16 [ffffa091c1a0fe18] nfsd_dispatch at ffffffffc052c338 [nfsd]
#17 [ffffa091c1a0fe50] svc_process_common at ffffffffc04c3ec4 [sunrpc]
#18 [ffffa091c1a0feb8] svc_process at ffffffffc04c516e [sunrpc]
#19 [ffffa091c1a0fee0] nfsd at ffffffffc052bd89 [nfsd]
#20 [ffffa091c1a0ff08] kthread at ffffffffa50c5f29
#21 [ffffa091c1a0ff50] ret_from_fork at ffffffffa5878b55

crash> dis -lr get_request+0x14a
...
/usr/src/debug/kernel-4.11.fc25/linux-4.11.9-200.fc25.x86_64/block/blk-core.c: 1054
0xffffffffa53e374a <get_request+0x14a>: mov    0x18(%r15),%rax

static struct request *__get_request(struct request_list *rl, unsigned int op,
                struct bio *bio, gfp_t gfp_mask)
{
        struct request_queue *q = rl->q;
        struct request *rq;
        struct elevator_type *et = q->elevator->type; <-----HERE
...

crash> request_queue.elevator -o
struct request_queue {
   [0x18] struct elevator_queue *elevator;
}
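This ties out with the faulting address: R15 held rl->q (here NULL), and the faulting instruction read offset 0x18 from it, which is exactly the CR2 value in the oops. A minimal userspace sketch, using a mock struct layout (the real request_queue is far larger; only the 0x18 offset of `elevator` is taken from the crash output):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct elevator_queue;  /* opaque, as in the kernel */

/* Mock of the start of struct request_queue: illustrative layout only.
 * Three 8-byte fields on x86_64 put 'elevator' at offset 0x18, matching
 * what crash's 'request_queue.elevator -o' showed for this build. */
struct request_queue_mock {
        void *placeholder0;
        void *placeholder1;
        void *placeholder2;
        struct elevator_queue *elevator;   /* offset 0x18 */
};

/* With q == NULL, the CPU computes NULL + offsetof(..., elevator) and
 * faults on that address; offsetof() gives the same value without UB. */
static uintptr_t fault_address_for_elevator(void)
{
        return offsetof(struct request_queue_mock, elevator);
}
```

So a NULL request_queue pointer dereferenced at q->elevator faults at address 0x18, which is the CR2 reported above.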

crash> mount | grep xfs
ffff94fe3ca15080 ffff94fe3ca21800 xfs    /dev/mapper/fedora-root /

crash> super_block.s_bdev ffff94fe3ca21800
  s_bdev = 0xffff94fe3b168d00

crash> block_device.bd_disk 0xffff94fe3b168d00
  bd_disk = 0xffff94fdf6779800

crash> gendisk.queue 0xffff94fdf6779800
  queue = 0xffff94fdf6875160

Call chain for nfsd4_scsi_proc_getdeviceinfo:

nfsd4_scsi_proc_getdeviceinfo
  nfsd4_block_get_device_info_scsi
    nfsd4_scsi_identify_device
      blk_get_request

struct request *blk_get_request(struct request_queue *q, int rw, gfp_t gfp_mask)
{
        if (q->mq_ops)
                return blk_mq_alloc_request(q, rw,
                        (gfp_mask & __GFP_DIRECT_RECLAIM) ?
                                0 : BLK_MQ_REQ_NOWAIT);
        else
                return blk_old_get_request(q, rw, gfp_mask);
...

crash> request_queue.mq_ops 0xffff94fdf6875160
  mq_ops = 0x0

So we call blk_old_get_request(), which calls:

        rq = get_request(q, rw, NULL, gfp_mask);

static struct request *get_request(struct request_queue *q, unsigned int op,
                struct bio *bio, gfp_t gfp_mask)
{
        const bool is_sync = op_is_sync(op);
        DEFINE_WAIT(wait);
        struct request_list *rl;
        struct request *rq;

        rl = blk_get_rl(q, bio);        /* transferred to @rq on success */
retry:
        rq = __get_request(rl, op, bio, gfp_mask);
...

static inline struct request_list *blk_get_rl(struct request_queue *q,
                                              struct bio *bio)
{
        struct blkcg *blkcg;
        struct blkcg_gq *blkg;

        rcu_read_lock();

        blkcg = bio_blkcg(bio);
...

static inline struct blkcg *bio_blkcg(struct bio *bio)
{
        if (bio && bio->bi_css)
                return css_to_blkcg(bio->bi_css);
        return task_blkcg(current);
}

blk_old_get_request() passed NULL for the bio, so we end up using task_blkcg(current).

static inline struct blkcg *task_blkcg(struct task_struct *tsk)
{
        return css_to_blkcg(task_css(tsk, io_cgrp_id));
}    

static inline struct cgroup_subsys_state *task_css(struct task_struct *task,
                                                   int subsys_id)
{
        return task_css_check(task, subsys_id, false);
}

#define task_css_check(task, subsys_id, __c)                            \
        task_css_set_check((task), (__c))->subsys[(subsys_id)]

#define task_css_set_check(task, __c)                                   \
        rcu_dereference((task)->cgroups)

static inline struct blkcg *css_to_blkcg(struct cgroup_subsys_state *css)
{
        return css ? container_of(css, struct blkcg, css) : NULL;
}

crash> task -R cgroups ffff94fe3b512680
PID: 11739  TASK: ffff94fe3b512680  CPU: 2   COMMAND: "nfsd"
  cgroups = 0xffffffffa5e86360, 

crash> io_cgrp_id
enum cgroup_subsys_id = 3

crash> p ((struct css_set *)0xffffffffa5e86360)->subsys[3]
$7 = (struct cgroup_subsys_state *) 0xffffffffa635f260

crash> blkcg.css -o
struct blkcg {
    [0x0] struct cgroup_subsys_state css;
}

So 0xffffffffa635f260 is also the address of blkcg.
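That follows because css is the first member of struct blkcg (offset 0x0), so container_of() subtracts nothing and the css pointer and the blkcg pointer are numerically identical. A userspace sketch with stand-in types, mirroring the css_to_blkcg() helper quoted above:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel types (illustrative only). */
struct cgroup_subsys_state { int dummy; };

struct blkcg {
        struct cgroup_subsys_state css;   /* offset 0x0, as crash showed */
        int other_state;
};

/* Userspace rendition of the kernel's container_of() */
#define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

static struct blkcg *css_to_blkcg(struct cgroup_subsys_state *css)
{
        return css ? container_of(css, struct blkcg, css) : NULL;
}

static struct blkcg sample_cg;   /* used to demonstrate pointer equality */
```

Since offsetof(struct blkcg, css) is 0, css_to_blkcg() returns the same address it was given, which is why the css address from the css_set doubles as the blkcg address.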

Continuing on in blk_get_rl() we have:
...
        if (blkcg == &blkcg_root)
                goto root_rl;
...
root_rl:
        rcu_read_unlock();
        return &q->root_rl;
...

crash> p &blkcg_root
$8 = (struct blkcg *) 0xffffffffa635f260

crash> px &((struct request_queue *)0xffff94fdf6875160)->root_rl
$9 = (struct request_list *) 0xffff94fdf6875198
crash> request_list.q 0xffff94fdf6875198
  q = 0x0

and when __get_request() dereferenced that NULL rl->q to read q->elevator, we oopsed.
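In other words, blk_get_rl() handed back &q->root_rl for a queue whose root_rl was never initialized (presumably because the dm device is bio-based, so blk_init_rl() never ran on its queue), yet nothing on this path checks for that. A simplified userspace model of the failing chain, with hypothetical cut-down types, just to show where the NULL comes from:

```c
#include <assert.h>
#include <stddef.h>

struct request_queue;

struct request_list {
        struct request_queue *q;   /* NULL on a queue never set up for requests */
};

struct request_queue {
        struct request_list root_rl;
        void *mq_ops;              /* NULL here, so the legacy path is taken */
};

/* Model of blk_get_rl() in the root-cgroup case: it returns &q->root_rl
 * without checking whether the queue was ever initialized for requests. */
static struct request_list *blk_get_rl_model(struct request_queue *q)
{
        return &q->root_rl;
}

/* __get_request() then reads rl->q->elevator->type; with rl->q == NULL
 * that first dereference is the oops.  This is the check it never makes. */
static int get_request_would_oops(struct request_list *rl)
{
        return rl->q == NULL;
}

/* Zero-initialized, like the dm queue in the crash dump (root_rl.q == 0). */
static struct request_queue dm_queue;
```

This matches the crash data: mq_ops = 0x0 sends us down the legacy path, root_rl sits at queue + 0x38 (0xffff94fdf6875198 above), and its q member is 0.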
