545382 – Kernel crash in serial driver. RIP: uart_put_char+0x42

Bug 545382 - Kernel crash in serial driver. RIP: uart_put_char+0x42

Summary: Kernel crash in serial driver. RIP: uart_put_char+0x42

Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.4.z
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Mauro Carvalho Chehab
QA Contact:	Red Hat Kernel QE team
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2009-12-08 12:43 UTC by Sachin Prabhu
Modified:	2018-10-27 15:37 UTC (History)
CC List:	3 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2011-10-17 03:13:32 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
Lock then check if cir->buf is NULL (622 bytes, patch) 2009-12-08 20:01 UTC, Mauro Carvalho Chehab	no flags	Details \| Diff
View All

Description Sachin Prabhu 2009-12-08 12:43:23 UTC

User reported a kernel crash with the following oops message

Unable to handle kernel NULL pointer dereference at 0000000000000075 RIP: 
 [<ffffffff801ba0bf>] uart_put_char+0x42/0x64
PGD 29dcad067 PUD 1cccac067 PMD 0 
Oops: 0002 [1] SMP 
last sysfs file: /devices/pci0000:00/0000:00:04.0/0000:0c:00.0/host2/rport-2:0-0/target2:0:0/2:0:0:9/state
CPU 1 
Modules linked in: nfsd exportfs auth_rpcgss gfs(U) lock_dlm gfs2 dlm configfs nfs fscache nfs_acl autofs4 lockd sunrpc bonding ipv6 xfrm_nalgo crypto_api dm_round_robin dm_emc dm_multipath scsi_dh video hwmon backlight sbs i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac parport_pc lp parport joydev i5000_edac bnx2 ide_cd edac_mc e1000e pcspkr serio_raw cdrom sg dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod usb_storage qla2xxx scsi_transport_fc ata_piix libata shpchp megaraid_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 27, comm: events/1 Tainted: G      2.6.18-164.6.1.el5 #1
RIP: 0010:[<ffffffff801ba0bf>]  [<ffffffff801ba0bf>] uart_put_char+0x42/0x64
RSP: 0000:ffff81043f559cc0  EFLAGS: 00010002
RAX: 0000000000000000 RBX: ffff81043a32bd48 RCX: 0000000000000000
RDX: 0000000000000075 RSI: 0000000000000286 RDI: ffffffff804ddc40
RBP: ffffffff804ddc40 R08: ffff81043f558000 R09: 000000000000003b
R10: ffff81043fcd4008 R11: 000000002000800b R12: 000000000000020d
R13: 000000000000000a R14: ffff81026715ad4c R15: ffff8102b25a51c0
FS:  0000000000000000(0000) GS:ffff81010eef7840(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000075 CR3: 000000019fc03000 CR4: 00000000000006e0
Process events/1 (pid: 27, threadinfo ffff81043f558000, task ffff81043f550040)
Stack:  ffff8102b25a5000 ffff8102b25a500a 0000000000000246 ffffffff8004e162
 ffff8102b25a5544 000000000000000a ffff8102b25a5000 ffffffff8001496e
 0000000000000141 00000000000000c1 ffff81026715ac92 ffff81026715ad93
Call Trace:
 [<ffffffff8004e162>] opost+0x99/0x1b9
 [<ffffffff8001496e>] n_tty_receive_buf+0x994/0xdc6
 [<ffffffff8008bba7>] dequeue_task+0x18/0x37
 [<ffffffff80062fc8>] thread_return+0x62/0xfe
 [<ffffffff801a21a4>] flush_to_ldisc+0x132/0x192
 [<ffffffff801a2072>] flush_to_ldisc+0x0/0x192
 [<ffffffff8004d7fb>] run_workqueue+0x94/0xe4
 [<ffffffff8004a043>] worker_thread+0x0/0x122
 [<ffffffff8004a133>] worker_thread+0xf0/0x122
 [<ffffffff8008c3bd>] default_wake_function+0x0/0xe
 [<ffffffff8003297c>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff8003287e>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11


Code: 44 88 24 10 8b 43 08 ff c0 25 ff 0f 00 00 89 43 08 5b 48 89 
RIP  [<ffffffff801ba0bf>] uart_put_char+0x42/0x64
 RSP <ffff81043f559cc0>


The crash seems to have happened in
static inline void
__uart_put_char(struct uart_port *port, struct circ_buf *circ, unsigned char c)
{
..
        if (!circ->buf)
                return;

        spin_lock_irqsave(&port->lock, flags);
        if (uart_circ_chars_free(circ) != 0) {
                circ->buf[circ->head] = c;  <-- here
                circ->head = (circ->head + 1) & (UART_XMIT_SIZE - 1);
        }
        spin_unlock_irqrestore(&port->lock, flags);
..
}

0xffffffff801ba0b9 <uart_put_char+0x3c>:        mov    (%rbx),%rax <-- rbx contains the address to struct circ_buf *circ. The value in circ.buf is copied over to %rax
0xffffffff801ba0bc <uart_put_char+0x3f>:        movslq %edx,%rdx
0xffffffff801ba0bf <uart_put_char+0x42>:        mov    %r12b,(%rax,%rdx,1) <-- Here value in %r12b('\r') is copied to memory address (%rax + %rdx * 1), ie. an offset in the string.

However, looking at the Oops message,
%RBX = 0xffff81043a32bd48

ie.
crash> struct circ_buf 0xffff81043a32bd48
struct circ_buf {
 buf = 0xffff810276ad2000 "",
 head = 0x0,
 tail = 0x0
}

Which should be a circular buffer. So, the value in %RAX should have been 0xffff810276ad2000, but for some reason is NULL. This condition is checked for earlier in __uart_put_char() and at this point, the buffer should not be NULL.

This looks like a race condition where values in this structure are changed while this process was waiting for the spinlock( circ_buf to NULL) and then again soon after(cir_buf to 0xffff810276ad2000).

Comment 1 Sachin Prabhu 2009-12-08 12:45:57 UTC

The 2 processes which are racing are

PID: 27     TASK: ffff81043f550040  CPU: 1   COMMAND: "events/1"
#0 [ffff81043f559a20] crash_kexec at ffffffff800acafe
#1 [ffff81043f559ae0] __die at ffffffff80065127
#2 [ffff81043f559b20] do_page_fault at ffffffff80066da7
#3 [ffff81043f559c10] error_exit at ffffffff8005dde9
   [exception RIP: uart_put_char+0x42]
   RIP: ffffffff801ba0bf  RSP: ffff81043f559cc0  RFLAGS: 00010002
   RAX: 0000000000000000  RBX: ffff81043a32bd48  RCX: 0000000000000000
   RDX: 0000000000000075  RSI: 0000000000000286  RDI: ffffffff804ddc40
   RBP: ffffffff804ddc40   R8: ffff81043f558000   R9: 000000000000003b
   R10: ffff81043fcd4008  R11: 000000002000800b  R12: 000000000000020d
   R13: 000000000000000a  R14: ffff81026715ad4c  R15: ffff8102b25a51c0
   ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
#4 [ffff81043f559cd8] opost at ffffffff8004e162
#5 [ffff81043f559cf8] n_tty_receive_buf at ffffffff8001496e
#6 [ffff81043f559de8] flush_to_ldisc at ffffffff801a21a4
#7 [ffff81043f559e38] run_workqueue at ffffffff8004d7fb
#8 [ffff81043f559e78] worker_thread at ffffffff8004a133
#9 [ffff81043f559ee8] kthread at ffffffff8003297c
#10 [ffff81043f559f48] kernel_thread at ffffffff8005dfb1

and

PID: 20886  TASK: ffff810146838040  CPU: 2   COMMAND: "login"
#0 [ffff81010ef6df20] crash_nmi_callback at ffffffff8007a5e6
#1 [ffff81010ef6df40] do_nmi at ffffffff8006585a
#2 [ffff81010ef6df50] nmi at ffffffff80064ebf
   [exception RIP: .text.lock.spinlock+0x11]
   RIP: ffffffff80064bdb  RSP: ffff81020e7e1d00  RFLAGS: 00000086
   RAX: 0000000000000206  RBX: ffffffff804ddc40  RCX: 0000000000000000
   RDX: 00000000000002fd  RSI: 0000000000000005  RDI: ffffffff804ddc40
   RBP: ffffffff804ddc40   R8: 0000000000000000   R9: 000000000006525f
   R10: 0000000000000000  R11: 0000000000000002  R12: 0000000000002710
   R13: 0000000000000000  R14: 00000000fffffff4  R15: ffff8103befeccc0
   ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <exception stack> ---
#3 [ffff81020e7e1d00] .text.lock.spinlock at ffffffff80064bdb (via _spin_lock_irqsave)
#4 [ffff81020e7e1d00] serial8250_startup at ffffffff801be735
#5 [ffff81020e7e1d30] uart_startup at ffffffff8005331f
#6 [ffff81020e7e1d60] uart_open at ffffffff800535b1
#7 [ffff81020e7e1dd0] tty_open at ffffffff801a3a8e
#8 [ffff81020e7e1e20] chrdev_open at ffffffff800495e3
#9 [ffff81020e7e1e60] __dentry_open at ffffffff8001e880
#10 [ffff81020e7e1ea0] do_filp_open at ffffffff80027316
#11 [ffff81020e7e1f50] do_sys_open at ffffffff80019cdb
#12 [ffff81020e7e1f80] tracesys at ffffffff8005d28d (via system_call)
   RIP: 00000033438c58b2  RSP: 00007ffff6dd2680  RFLAGS: 00000246
   RAX: ffffffffffffffda  RBX: ffffffff8005d28d  RCX: ffffffffffffffff
   RDX: 0000000000000000  RSI: 0000000000000802  RDI: 0000000002216230
   RBP: 0000000002216230   R8: 00007ffff6dd26b0   R9: 00007ffff6dd2610
   R10: 0000000000000000  R11: 0000000000000246  R12: 00007ffff6dd2a50
   R13: 0000000002216230  R14: 0000000000000000  R15: 0000000000000802
   ORIG_RAX: 0000000000000002  CS: 0033  SS: 002b

Both processes are working on the same tty devices.

tty_struct is ffff8102b25a5000
state(uart_state) is stored in tty_struct.driver_data = 0xffff81043e532838
info(uart_info) = state.info = uart_state.info = 0xffff81043a32bd40

Comment 4 Mauro Carvalho Chehab 2009-12-08 20:01:23 UTC

Created attachment 376993 [details]
Lock then check if cir->buf is NULL

uart_open is protected by kernel big lock, so the bug seems to be at __uart_put_char().

By looking carefully, the code first checks for the buffer, and then protects
with a spin_lock:

        if (!circ->buf)
                return;

        spin_lock_irqsave(&port->lock, flags);


It is possible that a race condition would occur exactly between the if and
the spin_lock_irqsave().

In this case, 
    circ->buf[circ->head] = c;

Will try to do a NULL pointer dereference as reported.

Could you please try the enclosed patch?

Comment 10 Mauro Carvalho Chehab 2010-03-05 22:56:43 UTC

As the bug got closed on Issue tracker, better to close the corresponding bugzilla.
Since the kernel with a fix were produced, but we got no tests if it fixes, I'm closing it as "Insufficient data". 

Please, if you re-open it, provide us the test results for the patched kernel.

Comment 11 Quentin Barnes 2010-09-15 21:27:46 UTC

We see the above panic about once every three days out of a rack of five machines.  (We also see an additional panic on the same machines.  More on that later.)

What makes these five machines unique is that they have their serial ttyS1 port accidentally wired into a loopback configuration and have a getty running on the port.  That ends up causing the getty to respond to its own banner, exec login, login whines a few times about bad login attempts, exits, and init respawns the getty.  I believe because the tty is being opened and closed with a substantial character backlog, it can trigger this panic.

I've studied the affected code for some time.  I don't think the patch mentioned in Comment #4 will do anything.  The circ->buf pointer is only ever reset to NULL during uart_shutdown() where that lock won't matter.  The problem is a bad race between release_dev() and flush_to_ldisc().  I've worked up a patch to correct the race and am testing it currently.  Once I'm confident of it, I'll upload it.

The "obvious" fix for the five machines is to simply turn off the mingetty on ttyS1, but might as well fix the bug while we're at it.  :-)

Someone else could hit this problem by sheer accident from closing a very backlogged serial tty, but I would guess that if they hit this panic, something's unusual about their serial port configuration such as the above.

About one time in five, we see the another panic, included below.  What triggers this one is if the freed "tty_struct" gets reallocated and initialized to zero.  (Every time I've seen this panic, it's because a fork has grabbed it for an audit_context struct.)  Why __run_timers() panics when dereferencing "fn" ("function") and not when dereferencing "next" took me awhile to figure out.  It's because the buf.work entry on the timer list is the only one and the list_replace_init() call overwrites its "next" and "prev" when moving it from one list to the other.


=======================================================
Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP: 
 [<0000000000000000>] _stext+0x7ffff000/0x1000
PGD 424a7f067 PUD 4254b6067 PMD 0 
Oops: 0010 [1] SMP 
last sysfs file: /devices/pci0000:00/0000:00:02.0/0000:06:00.0/0000:07:00.0/0000:08:00.0/0000:09:00.0/irq
CPU 2 
Modules linked in: ylock(U) xfs dm_multipath scsi_dh video hwmon backlight sbs i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac crash parport_pc lp parport serio_raw bnx2 sg pcspkr i5000_edac edac_mc dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod shpchp megaraid_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 0, comm: swapper Tainted: G      2.6.18-164.2.1.el5 #1
RIP: 0010:[<0000000000000000>]  [<0000000000000000>] _stext+0x7ffff000/0x1000
RSP: 0018:ffff81010ebebf28  EFLAGS: 00010202
RAX: ffff81010ebe7fd8 RBX: ffff81010ebdc000 RCX: ffff8103cd1ab170
RDX: ffff81010ebebf30 RSI: 11d160c19795886f RDI: 0000000000000000
RBP: 0000000000000100 R08: 000000000410c19e R09: 0000000000000080
R10: 0000000000004000 R11: 0000000000000080 R12: 0000000000000000
R13: 000000000000000a R14: 0000000000000002 R15: ffffffff803eb2a0
FS:  0000000000000000(0000) GS:ffff81010eb9ce40(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000424fd3000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffff81010ebe6000, task ffff81010eb9d080)
Stack:  ffffffff80096e0a ffff81010ebebf30 ffff81010ebebf30 0000000000000046
 0000000000000011 ffffffff803d8f90 ffffffff8001235a 0000000000000046
 ffff81010ebebf98 00000000000000ff ffffffff803c9180 0000000000000200
Call Trace:
 <IRQ>  [<ffffffff80096e0a>] run_timer_softirq+0x133/0x1af
 [<ffffffff8001235a>] __do_softirq+0x89/0x133
 [<ffffffff8005e2fc>] call_softirq+0x1c/0x28
 [<ffffffff8006cb20>] do_softirq+0x2c/0x85
 [<ffffffff800571c3>] mwait_idle+0x0/0x4a
 [<ffffffff8005dc8e>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff800571f9>] mwait_idle+0x36/0x4a
 [<ffffffff800493a3>] cpu_idle+0x95/0xb8
 [<ffffffff80076f7f>] start_secondary+0x498/0x4a7


Code:  Bad RIP value.
RIP  [<0000000000000000>] _stext+0x7ffff000/0x1000
 RSP <ffff81010ebebf28>
=======================================================

Comment 12 Sachin Prabhu 2010-09-16 08:32:49 UTC

Reopening this issue since we have new data from Quentin Barnes.

Sachin Prabhu

Comment 13 Quentin Barnes 2010-10-08 21:20:15 UTC

It's been about three weeks since I commented.  I thought I was
overdue for an update.

These panics have unfolded into a messier problem than first
appeared.  It took me much longer to debug than I expected because
I thought the two panics ("uart_put_char NULL pointer deref" and
"run_timer_softirq jump to address 0x0") had the same root cause.  I
was pouring over the code looking for a connection.  There wasn't
one.

The run_timer_softirq() panic was due to the the tty->buf.work
handler (flush_to_ldisc) left enabled after the tty struct had been
freed and reallocated.  (What I described in Comment #11.)  This was
a race between flush_to_ldisc() and release_dev().  A simple fix to
release_dev() is all that is needed.  Since I made that fix, the
machine testing the patch hasn't paniced in that way.

The original problem of this BZ ticket exploits a different race.
The problem here is that during ldisc processing of incoming
characters, they can trigger character transmission.  That happens
when one or more of the local echo modes are enabled (one or more
of the ECHO*() macros are true).  You can see this is the cause of
the panic because of the backtrace in Comment #0.  You see that the
PC was in n_tty_receive_buf() doing receive processing, but then
the next function in the stack that leads to the panic is opost()
for character output.

Why the uart_put_char() panic happens is because the transmit buffer
(circ->buf) was freed and the pointer zeroed by uart_shutdown()
[called from tty->driver->close() in release_dev()], but the
flush_to_ldisc() handler hadn't been shut down yet.  If the
handler is invoked after uart_shutdown() has been called, but
before cancel_delayed_work(&tty->buf.work), you see this panic.
This problem is not trivial to fix.  You simply can't just call
cancel_delayed_work() earlier in release_dev() due to other ordering
requirements.  To fix, I pulled back patches from kernel.org for a
basis then fixed up uart_close().  After the UART receiver is disabled,
I have uart_close() drain the ldisc work so if flush_to_ldisc() gets
called in the mean time, it finds nothing to do.  With the UART
receiver disabled, we know that after draining no new work can get
added to the ldisc queue before cancel_delayed_work() can get called
later.

For now this information is still preliminary.  I'm still testing
the patches.  In a few weeks after machines running the patched
kernel with both fixes haven't produced either panic and haven't
shown any new problems, I'll post the patches.  If for some reason
you need the patches earlier, such as a customer hitting the problem
again, just let me know and I'll pass them along.


Although these are rather large race windows, I don't think they've
been exercised often.  That's because to hit them, you have
to get your serial port flooded with incoming characters then
close the port with them still buffered and while using local
echo mode with line discipline.  Most of the time, that's not
going to happen during the normal course of talking to an RS-232
peripheral.  Peripherals usually don't use local echo mode and
typical handshaking or higher level protocol will not see a incoming
flood of characters while attempting to close the tty.

Comment 14 Mauro Carvalho Chehab 2010-11-30 10:54:52 UTC

Hi Quentin,

(In reply to comment #13)

> For now this information is still preliminary.  I'm still testing
> the patches.  In a few weeks after machines running the patched
> kernel with both fixes haven't produced either panic and haven't
> shown any new problems, I'll post the patches.  If for some reason
> you need the patches earlier, such as a customer hitting the problem
> again, just let me know and I'll pass them along.

Did the approach you took worked?

Comment 15 Quentin Barnes 2010-11-30 22:28:20 UTC

No, not yet.  My patch to date has fixed the panic in Comment #11, but so far to my frustration I've not been able to resolve the original uart_put_char() panic.  I've fixed several races involved with line discipline character processing in uart_close(), uart_hangup(), release_dev(), and flush_to_ldisc(), but that's not been good enough.

I'm still actively testing fixes for the panic.  I have two dumps from crashes in the last two days that I still need to analyze.  It's the same panic, but I have additional debug in those dumps that hopefully will point me more towards the source of the race I haven't isolated yet.

If you're interested, I can write you a longer response with my findings and fixes so far.  The biggest update to Comment #13 was I found the same race in uart_hangup() [the getty/login cycle exercises the hangup code], and I backported the changesets from 0fb5a532, 42fd552e, and 45242006 for flush_to_ldisc().

Has the panic happened again, or was this just a query since I've been quiet for awhile?

Comment 17 RHEL Program Management 2010-12-07 10:08:50 UTC

This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.6 and Red Hat does not plan to fix this issue the currently developed update.

Contact your manager or support representative in case you need to escalate this bug.

Comment 18 RHEL Program Management 2011-06-20 21:32:25 UTC

This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.7 and Red Hat does not plan to fix this issue the currently developed update.

Contact your manager or support representative in case you need to escalate this bug.

Note You need to log in before you can comment on or make changes to this bug.