Bug 846023 - BUG: unable to handle kernel NULL pointer dereference at 0000000000000048 cifs_show_options
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Sachin Prabhu
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-08-06 14:42 UTC by Igor
Modified: 2012-09-12 15:53 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-09-12 15:53:48 UTC
Target Upstream Version:


Attachments
Serial console log (8.38 KB, text/plain)
2012-08-18 16:46 UTC, Vladimir Simonov


Links
System        ID     Priority  Status  Summary  Last Updated
Linux Kernel  16306  None      None    None     2019-03-25 08:24:57 UTC

Description Igor 2012-08-06 14:42:17 UTC
Description of problem:
Reproduced the bug reported upstream at https://bugzilla.kernel.org/show_bug.cgi?id=16306.
The issue was first caught while running our product, but I have also reproduced it without the product.

Version-Release number of selected component (if applicable):
2.6.32-71.el6.x86_64

How reproducible:
Very often

Steps to Reproduce:

I ran the following three endless loops in parallel:
1) while [ "1" ]; do cat /proc/1/mountinfo; done
2) while [ "1" ]; do mount -t cifs //host/share /mnt/cifs/ -o username=user,password=password; umount /mnt/cifs; done
3) while [ "1" ]; do ifconfig eth0 down; ifconfig eth0 up; done

  
Actual results:

After some time, the kernel crashes with the following call trace:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
IP: [<ffffffffa04842c9>] cifs_show_options+0xf9/0x480 [cifs]

....

Kernel panic - not syncing: Fatal exception
Pid: 3387, comm: mms Tainted: P D ---------------- 2.6.32-71.el6.x86_64 #1
Call Trace:
[<ffffffff814c7b23>] panic+0x78/0x137
[<ffffffff814cbbf4>] oops_end+0xe4/0x100
[<ffffffff8104651b>] no_context+0xfb/0x260
[<ffffffff8113eb95>] ? page_add_new_anon_rmap+0xb5/0xd0
[<ffffffff810467a5>] __bad_area_nosemaphore+0x125/0x1e0
[<ffffffff810468ce>] bad_area+0x4e/0x60
[<ffffffff814cd740>] do_page_fault+0x390/0x3a0
[<ffffffff814caf45>] page_fault+0x25/0x30
[<ffffffffa04842c9>] ? cifs_show_options+0xf9/0x480 [cifs]
[<ffffffffa04842c9>] ? cifs_show_options+0xf9/0x480 [cifs]
[<ffffffff8118aa42>] show_vfsmnt+0x112/0x150
[<ffffffff8118e895>] seq_read+0xe5/0x3f0
[<ffffffff8116d085>] vfs_read+0xb5/0x1a0
[<ffffffff810d40a2>] ? audit_syscall_entry+0x272/0x2a0
[<ffffffff8116d1c1>] sys_read+0x51/0x90
[<ffffffff81013172>] system_call_fastpath+0x16/0x1b


Expected results:
Any kind of userspace behavior is acceptable, but the kernel must not panic.

Comment 2 Andrew Cathrow 2012-08-06 16:41:38 UTC
2.6.32-71.el6.x86_64 is an old kernel from the original 6.0 release back in 2010. Have you tried the latest kernel?

Specifically kernel-2.6.32-279.2.1?

Comment 4 Igor 2012-08-08 14:35:26 UTC
OK, we will try the latest release, but we would like to know whether this bug was ever investigated and fixed. If so, this ticket is a duplicate of another one, and we need to know which one.

Comment 5 Jeff Layton 2012-08-08 15:20:44 UTC
Aside from the upstream report (which we never were able to reproduce), I'm not aware of any reports of this in RHEL.

Comment 6 Sachin Prabhu 2012-08-08 23:39:36 UTC
I ran a quick disassembly based on the Oops message given in the summary.

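The crash happens within cifs_show_address(), which the compiler has inlined into cifs_show_options(); that is why the faulting IP is reported as cifs_show_options+0xf9.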


static int
cifs_show_options(struct seq_file *s, struct vfsmount *m)
{
..

        cifs_show_address(s, tcon->ses->server);
..
}


static void
cifs_show_address(struct seq_file *s, struct TCP_Server_Info *server)
{
        seq_printf(s, ",addr=");

        switch (server->addr.sockAddr.sin_family) { <-- HERE
..


crash> dis cifs_show_options
..
0xffffffffa02c82c9 <cifs_show_options+0xf9>:    movzwl 0x48(%r15),%eax
..

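At this point, register %r15 is expected to hold the server pointer taken from struct cifsSesInfo (tcon->ses->server, a struct TCP_Server_Info *). The instruction is a 16-bit load of server->addr.sockAddr.sin_family at offset 0x48 from that pointer. Because server is NULL, the load targets address 0 + 0x48 = 0x48, which is exactly the faulting address reported in the oops.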
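The offset arithmetic above can be checked in user space. The sketch below uses a hypothetical struct, fake_server_info, whose padding is chosen only so that addr starts at 0x48; it does not reproduce the real TCP_Server_Info layout of 2.6.32-71.el6. It shows why a NULL server pointer faults at exactly 0x48 and why the load is 16 bits wide.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* HYPOTHETICAL stand-in for struct TCP_Server_Info. The real layout is not
 * reproduced; 'pad' is sized only so that 'addr' starts at offset 0x48,
 * the displacement used by the faulting movzwl instruction. */
struct fake_server_info {
    char pad[0x48];
    union {
        struct sockaddr_in  sockAddr;
        struct sockaddr_in6 sockAddr6;
    } addr;
};

/* Address the CPU would load from when evaluating
 * server->addr.sockAddr.sin_family with server == NULL:
 * the zero base pointer plus the member's offset. */
static uintptr_t null_deref_address(void)
{
    const uintptr_t server = 0; /* value held in %r15 at the crash */
    return server + offsetof(struct fake_server_info, addr.sockAddr.sin_family);
}

/* Width of the load: sin_family is an sa_family_t, which is 16 bits on
 * Linux, matching the 'w' (word source) in movzwl. */
static size_t family_load_width(void)
{
    return sizeof(sa_family_t);
}
```

With this layout, null_deref_address() evaluates to 0x48, the same address printed on the BUG: line, and family_load_width() is 2.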

Comment 7 Jeff Layton 2012-08-09 13:51:52 UTC
Yep, this looks like the same bug as the one in the upstream report. It would be very helpful to know whether it is still reproducible.

Comment 8 Jeff Layton 2012-08-09 13:52:29 UTC
If you can reproduce it on a more recent kernel, it would be especially helpful if you could capture a vmcore as well.

Comment 9 Igor 2012-08-09 14:19:39 UTC
We have already assigned this to our QA team and are now waiting for the results.

Comment 10 Vladimir Simonov 2012-08-18 16:45:18 UTC
QA is still processing our request.
It is probably not too difficult to do the same in your lab.
Initially I could not reproduce the problem, but when I manually mounted
the same share on the same mountpoint 40 times and then started Igor's test, I got the
crash within 10 minutes, using 1 CPU. In my opinion, the more CIFS entries there are
in /proc/mounts, the higher the probability of hitting the race on tcon->ses->server access.
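Each read of the mount table walks every mount, and for a CIFS mount the call trace shows show_vfsmnt() calling into cifs_show_options(), so more CIFS entries mean more passes through the racy code per read. A small C helper built on the glibc getmntent(3) family can count those entries. The helper name count_fstype is hypothetical; this is a sketch, not part of the original test.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Count entries in a mount table (for example /proc/mounts) whose
 * filesystem type equals 'fstype'. Returns -1 if the table cannot be
 * opened. Hypothetical helper for illustration only. */
static int count_fstype(const char *table, const char *fstype)
{
    FILE *fp = setmntent(table, "r");
    if (fp == NULL)
        return -1;

    int count = 0;
    struct mntent *ent;
    while ((ent = getmntent(fp)) != NULL) {
        if (strcmp(ent->mnt_type, fstype) == 0)
            count++;
    }
    endmntent(fp);
    return count;
}
```

For example, after the 40 manual mounts described above, count_fstype("/proc/mounts", "cifs") should report at least 40.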

2.6.32-71.el6.x86_64 was used, I'll try kernel-2.6.32-279.2.1 next week.

As I understand it, we are seeing a race on tcon->ses->server access.
Have you analyzed the tcon->ses->server protection scheme in 2.6.32-71.el6.x86_64?
Was it changed in 2.6.32-279.2.1? Why do you expect that 2.6.32-279.2.1 will
work better?
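The question is how the reader and the teardown path agree on protecting tcon->ses->server. The report does not say how, or whether, this changed between 2.6.32-71 and 2.6.32-279, so the following is only a hypothetical user-space model of the general pattern: the reader checks the pointer and copies what it needs under a lock, and the teardown path clears the pointer under the same lock before freeing. All names (struct session, show_family, attach_server, detach_server, run_model) are invented for illustration; compile with -pthread.

```c
#include <pthread.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* HYPOTHETICAL model: 'struct server' stands in for TCP_Server_Info,
 * 'struct session' for the session object that points to it. */
struct server { struct sockaddr_in addr; };

struct session {
    pthread_mutex_t lock;   /* protects 'server' */
    struct server *server;  /* NULL while the connection is torn down */
};

/* Reader (the role cifs_show_address() plays): check for NULL and copy
 * the field while holding the lock. Returns the family, or -1. */
static int show_family(struct session *ses)
{
    int family = -1;
    pthread_mutex_lock(&ses->lock);
    if (ses->server != NULL)
        family = ses->server->addr.sin_family;
    pthread_mutex_unlock(&ses->lock);
    return family;
}

/* Teardown: unpublish under the lock, free outside it, so no reader
 * can still be using the object when it is freed. */
static void detach_server(struct session *ses)
{
    pthread_mutex_lock(&ses->lock);
    struct server *old = ses->server;
    ses->server = NULL;
    pthread_mutex_unlock(&ses->lock);
    free(old);
}

static int attach_server(struct session *ses)
{
    struct server *srv = calloc(1, sizeof(*srv));
    if (srv == NULL)
        return -1;
    srv->addr.sin_family = AF_INET;
    pthread_mutex_lock(&ses->lock);
    struct server *old = ses->server;
    ses->server = srv;
    pthread_mutex_unlock(&ses->lock);
    free(old);
    return 0;
}

struct churn_arg { struct session *ses; int rounds; };

/* Mimics the mount/umount and ifdown/ifup churn from the reproducer. */
static void *churn(void *p)
{
    struct churn_arg *a = p;
    for (int i = 0; i < a->rounds; i++) {
        attach_server(a->ses);
        detach_server(a->ses);
    }
    return NULL;
}

/* Perform 'reads' reader calls while another thread churns the server
 * 'rounds' times; count reads that did and did not see a server. */
static int run_model(int rounds, int reads, int *seen, int *missing)
{
    struct session ses = { .server = NULL };
    pthread_mutex_init(&ses.lock, NULL);
    struct churn_arg arg = { &ses, rounds };
    pthread_t t;
    if (pthread_create(&t, NULL, churn, &arg) != 0)
        return -1;
    *seen = *missing = 0;
    for (int i = 0; i < reads; i++) {
        if (show_family(&ses) == AF_INET)
            (*seen)++;
        else
            (*missing)++;
    }
    pthread_join(t, NULL);
    pthread_mutex_destroy(&ses.lock);
    return 0;
}
```

A plain mutex is used here only to keep the model short; a real kernel fix might rely on different primitives, such as reference counting, and nothing in this report confirms which one was used.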

Attaching the serial console log captured before the crash.

Regards
Vladimir

Comment 11 Vladimir Simonov 2012-08-18 16:46:24 UTC
Created attachment 605353 [details]
Serial console log

Comment 12 Vladimir Simonov 2012-09-11 10:06:12 UTC
OK. Our QA confirms that the problem is fixed (no longer observed)
in the 2.6.32-279 kernel.
Thank you all.

Regards
Vladimir

Comment 13 Sachin Prabhu 2012-09-12 15:53:48 UTC
Closing this bz. Please re-open if the problem is seen on newer versions of the kernel.

Sachin Prabhu

