Bug 1471203

Summary: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018; IP: get_request+0x14a
Product: Fedora
Reporter: Scott Mayhew <smayhew>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED EOL
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: unspecified
Version: 25
CC: bfields, gansalmon, ichavero, itamar, jonathan, kernel-maint, madhu.chinakonda, mchehab
Hardware: x86_64
OS: Linux
Last Closed: 2017-12-12 10:22:47 UTC
Type: Bug

Description Scott Mayhew 2017-07-14 16:56:42 UTC
Description of problem:

I had the following oops when processing a GETDEVICEINFO call from a client:

[  123.079423] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[  123.080348] IP: get_request+0x14a/0x7b0
[  123.080728] PGD 0 
[  123.080728] 
[  123.081129] Oops: 0000 [#1] SMP
[  123.081470] Modules linked in: rpcsec_gss_krb5 nfsd auth_rpcgss nfs_acl lockd grace binfmt_misc snd_hda_codec_generic snd_hda_intel sunrpc snd_hda_codec crct10dif_pclmul crc32_pclmul snd_hda_core snd_hwdep ghash_clmulni_intel snd_pcm ppdev snd_timer joydev snd virtio_balloon soundcore parport_pc parport i2c_piix4 qemu_fw_cfg tpm_tis tpm_tis_core tpm xfs libcrc32c virtio_net virtio_console virtio_blk qxl drm_kms_helper ttm drm crc32c_intel virtio_pci serio_raw virtio_ring virtio ata_generic pata_acpi
[  123.085581] CPU: 2 PID: 11739 Comm: nfsd Not tainted 4.11.9-200.fc25.x86_64 #1
[  123.086256] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014
[  123.087211] task: ffff94fe3b512680 task.stack: ffffa091c1a0c000
[  123.087835] RIP: 0010:get_request+0x14a/0x7b0
[  123.088293] RSP: 0018:ffffa091c1a0fc58 EFLAGS: 00010046
[  123.088855] RAX: ffff94fdf6875198 RBX: ffff94fdf6875198 RCX: ffff94fe3b512680
[  123.089590] RDX: 0000000000000000 RSI: ffff94fdf68751c0 RDI: ffffffffa635f260
[  123.090208] RBP: ffffa091c1a0fd08 R08: ffff94fe3fd1ca80 R09: ffff94fe39fe9300
[  123.090806] R10: ffff94fe39fe9300 R11: ffff94fe3b854220 R12: 0000000000000000
[  123.091399] R13: 0000000000000020 R14: ffff94fe38266200 R15: 0000000000000000
[  123.092034] FS:  0000000000000000(0000) GS:ffff94fe3fd00000(0000) knlGS:0000000000000000
[  123.092703] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  123.093178] CR2: 0000000000000018 CR3: 0000000036079000 CR4: 00000000003406e0
[  123.093768] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  123.094461] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  123.095148] Call Trace:
[  123.095409]  ? remove_wait_queue+0x70/0x70
[  123.095791]  blk_get_request+0x8e/0x120
[  123.096143]  nfsd4_scsi_proc_getdeviceinfo+0xc8/0x3c0 [nfsd]
[  123.096652]  nfsd4_getdeviceinfo+0x11c/0x1c0 [nfsd]
[  123.097101]  nfsd4_proc_compound+0x371/0x6c0 [nfsd]
[  123.097553]  nfsd_dispatch+0xc8/0x260 [nfsd]
[  123.097941]  svc_process_common+0x374/0x6b0 [sunrpc]
[  123.098382]  svc_process+0xfe/0x1b0 [sunrpc]
[  123.098779]  nfsd+0xe9/0x150 [nfsd]
[  123.099105]  kthread+0x109/0x140
[  123.099390]  ? nfsd_destroy+0x60/0x60 [nfsd]
[  123.100082]  ? kthread_park+0x90/0x90
[  123.100686]  ret_from_fork+0x25/0x30
[  123.101167] Code: ff ff ff 48 8d 14 00 48 01 c2 48 c1 e0 02 48 8d 74 d3 28 48 89 45 80 48 01 d8 48 89 45 90 48 89 b5 60 ff ff ff 4c 8b 3b 4d 85 e4 <49> 8b 47 18 48 8b 00 48 89 85 78 ff ff ff 0f 84 a1 03 00 00 4d 
[  123.103141] RIP: get_request+0x14a/0x7b0 RSP: ffffa091c1a0fc58
[  123.103805] CR2: 0000000000000018

Version-Release number of selected component (if applicable):
4.11.9-200.fc25.x86_64

How reproducible:
Easy

Steps to Reproduce:
1. Create an export with 'pnfs' in the export options
2. Mount with '-o v4.1' from the client
3. Attempt to write a file from the client

Actual results:
The oops occurs.  If the machine isn't configured to panic on oops, you can ctrl+c on the client, which appears to tie up that session slot on the server (all subsequent SEQUENCE ops using the same slot number & sequence number return NFS4ERR_DELAY...).

Expected results:
No oops.

Additional info:

I was actually trying to test flexfile layouts... but since the directory I'm exporting is on an xfs filesystem (albeit on a logical volume on my laptop's ssd), the server reports the block & scsi layout types in addition to flexfiles in the GETATTR reply for supported layout types.

Comment 1 Scott Mayhew 2017-07-14 16:57:26 UTC
crash> bt
PID: 11739  TASK: ffff94fe3b512680  CPU: 2   COMMAND: "nfsd"
 #0 [ffffa091c1a0f8a8] machine_kexec at ffffffffa505873a
 #1 [ffffa091c1a0f908] __crash_kexec at ffffffffa5139ced
 #2 [ffffa091c1a0f9d0] __crash_kexec at ffffffffa5139dc5
 #3 [ffffa091c1a0f9e8] crash_kexec at ffffffffa5139e0b
 #4 [ffffa091c1a0fa08] oops_end at ffffffffa502a484
 #5 [ffffa091c1a0fa30] no_context at ffffffffa506575f
 #6 [ffffa091c1a0fa98] __bad_area_nosemaphore at ffffffffa5065a31
 #7 [ffffa091c1a0fad8] bad_area_nosemaphore at ffffffffa5065b74
 #8 [ffffa091c1a0fae8] __do_page_fault at ffffffffa5065f1e
 #9 [ffffa091c1a0fb50] trace_do_page_fault at ffffffffa50663f1
#10 [ffffa091c1a0fb88] do_async_page_fault at ffffffffa505fdca
#11 [ffffa091c1a0fba0] async_page_fault at ffffffffa5879e28
    [exception RIP: get_request+0x14a]
    RIP: ffffffffa53e374a  RSP: ffffa091c1a0fc58  RFLAGS: 00010046
    RAX: ffff94fdf6875198  RBX: ffff94fdf6875198  RCX: ffff94fe3b512680
    RDX: 0000000000000000  RSI: ffff94fdf68751c0  RDI: ffffffffa635f260
    RBP: ffffa091c1a0fd08   R8: ffff94fe3fd1ca80   R9: ffff94fe39fe9300
    R10: ffff94fe39fe9300  R11: ffff94fe3b854220  R12: 0000000000000000
    R13: 0000000000000020  R14: ffff94fe38266200  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#12 [ffffa091c1a0fd10] blk_get_request at ffffffffa53e48be
#13 [ffffa091c1a0fd40] nfsd4_scsi_proc_getdeviceinfo at ffffffffc055e158 [nfsd]
#14 [ffffa091c1a0fd78] nfsd4_getdeviceinfo at ffffffffc053e9ec [nfsd]
#15 [ffffa091c1a0fdb8] nfsd4_proc_compound at ffffffffc053f7c1 [nfsd]
#16 [ffffa091c1a0fe18] nfsd_dispatch at ffffffffc052c338 [nfsd]
#17 [ffffa091c1a0fe50] svc_process_common at ffffffffc04c3ec4 [sunrpc]
#18 [ffffa091c1a0feb8] svc_process at ffffffffc04c516e [sunrpc]
#19 [ffffa091c1a0fee0] nfsd at ffffffffc052bd89 [nfsd]
#20 [ffffa091c1a0ff08] kthread at ffffffffa50c5f29
#21 [ffffa091c1a0ff50] ret_from_fork at ffffffffa5878b55

crash> dis -lr get_request+0x14a
...
/usr/src/debug/kernel-4.11.fc25/linux-4.11.9-200.fc25.x86_64/block/blk-core.c: 1054
0xffffffffa53e374a <get_request+0x14a>: mov    0x18(%r15),%rax

static struct request *__get_request(struct request_list *rl, unsigned int op,
                struct bio *bio, gfp_t gfp_mask)
{
        struct request_queue *q = rl->q;
        struct request *rq;
        struct elevator_type *et = q->elevator->type; <-----HERE
...

crash> request_queue.elevator -o
struct request_queue {
   [0x18] struct elevator_queue *elevator;
}

crash> mount | grep xfs
ffff94fe3ca15080 ffff94fe3ca21800 xfs    /dev/mapper/fedora-root /

crash> super_block.s_bdev ffff94fe3ca21800
  s_bdev = 0xffff94fe3b168d00

crash> block_device.bd_disk 0xffff94fe3b168d00
  bd_disk = 0xffff94fdf6779800

crash> gendisk.queue 0xffff94fdf6779800
  queue = 0xffff94fdf6875160

Call chain for nfsd4_scsi_proc_getdeviceinfo:

nfsd4_scsi_proc_getdeviceinfo
  nfsd4_block_get_device_info_scsi
    nfsd4_scsi_identify_device
      blk_get_request

struct request *blk_get_request(struct request_queue *q, int rw, gfp_t gfp_mask)
{
        if (q->mq_ops)
                return blk_mq_alloc_request(q, rw,
                        (gfp_mask & __GFP_DIRECT_RECLAIM) ?
                                0 : BLK_MQ_REQ_NOWAIT);
        else
                return blk_old_get_request(q, rw, gfp_mask);
...

crash> request_queue.mq_ops 0xffff94fdf6875160
  mq_ops = 0x0

So we call blk_old_get_request(), which calls:

        rq = get_request(q, rw, NULL, gfp_mask);

static struct request *get_request(struct request_queue *q, unsigned int op,
                struct bio *bio, gfp_t gfp_mask)
{
        const bool is_sync = op_is_sync(op);
        DEFINE_WAIT(wait);
        struct request_list *rl;
        struct request *rq;

        rl = blk_get_rl(q, bio);        /* transferred to @rq on success */
retry:
        rq = __get_request(rl, op, bio, gfp_mask);
...

static inline struct request_list *blk_get_rl(struct request_queue *q,
                                              struct bio *bio)
{
        struct blkcg *blkcg;
        struct blkcg_gq *blkg;

        rcu_read_lock();

        blkcg = bio_blkcg(bio);
...

static inline struct blkcg *bio_blkcg(struct bio *bio)
{
        if (bio && bio->bi_css)
                return css_to_blkcg(bio->bi_css);
        return task_blkcg(current);
}

blk_old_get_request() passed NULL for the bio, so we use task_blkcg(current).

static inline struct blkcg *task_blkcg(struct task_struct *tsk)
{
        return css_to_blkcg(task_css(tsk, io_cgrp_id));
}    

static inline struct cgroup_subsys_state *task_css(struct task_struct *task,
                                                   int subsys_id)
{
        return task_css_check(task, subsys_id, false);
}

#define task_css_check(task, subsys_id, __c)                            \
        task_css_set_check((task), (__c))->subsys[(subsys_id)]

#define task_css_set_check(task, __c)                                   \
        rcu_dereference((task)->cgroups)

static inline struct blkcg *css_to_blkcg(struct cgroup_subsys_state *css)
{
        return css ? container_of(css, struct blkcg, css) : NULL;
}

crash> task -R cgroups ffff94fe3b512680
PID: 11739  TASK: ffff94fe3b512680  CPU: 2   COMMAND: "nfsd"
  cgroups = 0xffffffffa5e86360, 

crash> io_cgrp_id
enum cgroup_subsys_id = 3

crash> p ((struct css_set *)0xffffffffa5e86360)->subsys[3]
$7 = (struct cgroup_subsys_state *) 0xffffffffa635f260

crash> blkcg.css -o
struct blkcg {
    [0x0] struct cgroup_subsys_state css;
}

So 0xffffffffa635f260 is also the address of blkcg.

Continuing on in blk_get_rl() we have:
...
        if (blkcg == &blkcg_root)
                goto root_rl;
...
root_rl:
        rcu_read_unlock();
        return &q->root_rl;
...

crash> p &blkcg_root
$8 = (struct blkcg *) 0xffffffffa635f260

crash> px &((struct request_queue *)0xffff94fdf6875160)->root_rl
$9 = (struct request_list *) 0xffff94fdf6875198
crash> request_list.q 0xffff94fdf6875198
  q = 0x0

and when __get_request() dereferenced rl->q (NULL) to read q->elevator at offset 0x18, we oopsed at address 0x18.

Comment 2 Fedora End Of Life 2017-11-16 18:44:45 UTC
This message is a reminder that Fedora 25 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 25. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '25'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 25 reached end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed, as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Comment 3 Fedora End Of Life 2017-12-12 10:22:47 UTC
Fedora 25 changed to end-of-life (EOL) status on 2017-12-12. Fedora 25 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.