Bug 846023 - BUG: unable to handle kernel NULL pointer dereference at 0000000000000048 cifs_show_options
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.0
Hardware: x86_64, OS: Linux
Priority: unspecified, Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Sachin Prabhu
QA Contact: Red Hat Kernel QE team
Depends On:
Blocks:
Reported: 2012-08-06 10:42 EDT by Igor
Modified: 2012-09-12 11:53 EDT
CC List: 6 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-09-12 11:53:48 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Serial console log (8.38 KB, text/plain)
2012-08-18 12:46 EDT, Vladimir Simonov


External Trackers
Tracker       ID     Priority  Status  Summary  Last Updated
Linux Kernel  16306  None      None    None     2012-08-06 10:55:34 EDT

Description Igor 2012-08-06 10:42:17 EDT
Description of problem:
Reproduced the bug reported at https://bugzilla.kernel.org/show_bug.cgi?id=16306
The issue was first caught while running our product, but I have also reproduced it without the product.

Version-Release number of selected component (if applicable):
2.6.32-71.el6.x86_64

How reproducible:
Very often

Steps to Reproduce:

I ran the following three endless loops in parallel:
1) while [ "1" ]; do cat /proc/1/mountinfo; done
2) while [ "1" ]; do mount -t cifs //host/share /mnt/cifs/ -o username=user,password=password; umount /mnt/cifs; done
3) while [ "1" ]; do ifconfig eth0 down; ifconfig eth0 up; done

  
Actual results:

After some time, the kernel crashes with the following call stack:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
IP: [<ffffffffa04842c9>] cifs_show_options+0xf9/0x480 [cifs]

....

Kernel panic - not syncing: Fatal exception
Pid: 3387, comm: mms Tainted: P D ---------------- 2.6.32-71.el6.x86_64 #1
Call Trace:
[<ffffffff814c7b23>] panic+0x78/0x137
[<ffffffff814cbbf4>] oops_end+0xe4/0x100
[<ffffffff8104651b>] no_context+0xfb/0x260
[<ffffffff8113eb95>] ? page_add_new_anon_rmap+0xb5/0xd0
[<ffffffff810467a5>] __bad_area_nosemaphore+0x125/0x1e0
[<ffffffff810468ce>] bad_area+0x4e/0x60
[<ffffffff814cd740>] do_page_fault+0x390/0x3a0
[<ffffffff814caf45>] page_fault+0x25/0x30
[<ffffffffa04842c9>] ? cifs_show_options+0xf9/0x480 [cifs]
[<ffffffffa04842c9>] ? cifs_show_options+0xf9/0x480 [cifs]
[<ffffffff8118aa42>] show_vfsmnt+0x112/0x150
[<ffffffff8118e895>] seq_read+0xe5/0x3f0
[<ffffffff8116d085>] vfs_read+0xb5/0x1a0
[<ffffffff810d40a2>] ? audit_syscall_entry+0x272/0x2a0
[<ffffffff8116d1c1>] sys_read+0x51/0x90
[<ffffffff81013172>] system_call_fastpath+0x16/0x1b


Expected results:
Whatever the userspace-visible behavior, there should be no kernel panic.
Comment 2 Andrew Cathrow 2012-08-06 12:41:38 EDT
2.6.32-71.el6.x86_64 is an old kernel, from the original 6.0 release back in 2010. Have you tried the latest kernel?

Specifically kernel-2.6.32-279.2.1?
Comment 4 Igor 2012-08-08 10:35:26 EDT
OK, we will try the latest distribution, but we'd like to know whether this bug was ever investigated and fixed. If so, this ticket is a duplicate of another one, and we would like to know which.
Comment 5 Jeff Layton 2012-08-08 11:20:44 EDT
Aside from the upstream report (which we were never able to reproduce), I'm not aware of any reports of this in RHEL.
Comment 6 Sachin Prabhu 2012-08-08 19:39:36 EDT
I ran a quick disassembly based on the Oops message given in the summary.

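The crash happens within cifs_show_address(), which has been inlined into cifs_show_options(); that is why the faulting IP is reported as cifs_show_options+0xf9.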


static int
cifs_show_options(struct seq_file *s, struct vfsmount *m)
{
..

        cifs_show_address(s, tcon->ses->server);
..
}


static void
cifs_show_address(struct seq_file *s, struct TCP_Server_Info *server)
{
        seq_printf(s, ",addr=");

        switch (server->addr.sockAddr.sin_family) { <-- HERE
..


crash> dis cifs_show_options
..
0xffffffffa02c82c9 <cifs_show_options+0xf9>:    movzwl 0x48(%r15),%eax
..

At this point, register %r15 is expected to hold tcon->ses->server, the cifsSesInfo's TCP_Server_Info pointer. The instruction loads the 16-bit server->addr.sockAddr.sin_family field at offset 0x48, and it faults because server is NULL, which is why the reported fault address is 0000000000000048.
Comment 7 Jeff Layton 2012-08-09 09:51:52 EDT
Yep, looks like the same bug as was reported in the upstream report. It would be very helpful to know if that's still reproducible.
Comment 8 Jeff Layton 2012-08-09 09:52:29 EDT
If you can reproduce it on a more recent kernel, it would be extra helpful if you can get a vmcore as well...
Comment 9 Igor 2012-08-09 10:19:39 EDT
We have already handed the task to our QA team and are now waiting for the results.
Comment 10 Vladimir Simonov 2012-08-18 12:45:18 EDT
QA is still processing our request.
Perhaps it is not too difficult for you to do the same in your lab.
Initially I could not reproduce the problem, but when I manually mounted
the same share on the same mount point 40 times and then started Igor's test, I got the
crash within 10 minutes, using 1 CPU. In my opinion, the more CIFS entries there are
in /proc/mounts, the higher the probability of hitting the race on tcon->ses->server access.

2.6.32-71.el6.x86_64 was used; I'll try kernel-2.6.32-279.2.1 next week.

As I understand it, we are seeing a race on tcon->ses->server access.
Have you analyzed how tcon->ses->server is protected in 2.6.32-71.el6.x86_64?
Was that changed in 2.6.32-279.2.1? Why do you expect 2.6.32-279.2.1 to
work better?

Attaching the serial console log from before the crash.

Regards
Vladimir
Comment 11 Vladimir Simonov 2012-08-18 12:46:24 EDT
Created attachment 605353
Serial console log
Comment 12 Vladimir Simonov 2012-09-11 06:06:12 EDT
OK. Our QA confirms that the problem is fixed (no longer observed)
in the 2.6.32-279 kernel.
Thank you all.

Regards
Vladimir
Comment 13 Sachin Prabhu 2012-09-12 11:53:48 EDT
Closing this bz. Please re-open if the problem is seen on newer versions of the kernel.

Sachin Prabhu
