User reported a kernel crash with the following oops message Unable to handle kernel NULL pointer dereference at 0000000000000075 RIP: [<ffffffff801ba0bf>] uart_put_char+0x42/0x64 PGD 29dcad067 PUD 1cccac067 PMD 0 Oops: 0002 [1] SMP last sysfs file: /devices/pci0000:00/0000:00:04.0/0000:0c:00.0/host2/rport-2:0-0/target2:0:0/2:0:0:9/state CPU 1 Modules linked in: nfsd exportfs auth_rpcgss gfs(U) lock_dlm gfs2 dlm configfs nfs fscache nfs_acl autofs4 lockd sunrpc bonding ipv6 xfrm_nalgo crypto_api dm_round_robin dm_emc dm_multipath scsi_dh video hwmon backlight sbs i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac parport_pc lp parport joydev i5000_edac bnx2 ide_cd edac_mc e1000e pcspkr serio_raw cdrom sg dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod usb_storage qla2xxx scsi_transport_fc ata_piix libata shpchp megaraid_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd Pid: 27, comm: events/1 Tainted: G 2.6.18-164.6.1.el5 #1 RIP: 0010:[<ffffffff801ba0bf>] [<ffffffff801ba0bf>] uart_put_char+0x42/0x64 RSP: 0000:ffff81043f559cc0 EFLAGS: 00010002 RAX: 0000000000000000 RBX: ffff81043a32bd48 RCX: 0000000000000000 RDX: 0000000000000075 RSI: 0000000000000286 RDI: ffffffff804ddc40 RBP: ffffffff804ddc40 R08: ffff81043f558000 R09: 000000000000003b R10: ffff81043fcd4008 R11: 000000002000800b R12: 000000000000020d R13: 000000000000000a R14: ffff81026715ad4c R15: ffff8102b25a51c0 FS: 0000000000000000(0000) GS:ffff81010eef7840(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000000000000075 CR3: 000000019fc03000 CR4: 00000000000006e0 Process events/1 (pid: 27, threadinfo ffff81043f558000, task ffff81043f550040) Stack: ffff8102b25a5000 ffff8102b25a500a 0000000000000246 ffffffff8004e162 ffff8102b25a5544 000000000000000a ffff8102b25a5000 ffffffff8001496e 0000000000000141 00000000000000c1 ffff81026715ac92 ffff81026715ad93 Call Trace: [<ffffffff8004e162>] opost+0x99/0x1b9 [<ffffffff8001496e>] n_tty_receive_buf+0x994/0xdc6 [<ffffffff8008bba7>] dequeue_task+0x18/0x37 [<ffffffff80062fc8>] thread_return+0x62/0xfe [<ffffffff801a21a4>] flush_to_ldisc+0x132/0x192 [<ffffffff801a2072>] flush_to_ldisc+0x0/0x192 [<ffffffff8004d7fb>] run_workqueue+0x94/0xe4 [<ffffffff8004a043>] worker_thread+0x0/0x122 [<ffffffff8004a133>] worker_thread+0xf0/0x122 [<ffffffff8008c3bd>] default_wake_function+0x0/0xe [<ffffffff8003297c>] kthread+0xfe/0x132 [<ffffffff8005dfb1>] child_rip+0xa/0x11 [<ffffffff8003287e>] kthread+0x0/0x132 [<ffffffff8005dfa7>] child_rip+0x0/0x11 Code: 44 88 24 10 8b 43 08 ff c0 25 ff 0f 00 00 89 43 08 5b 48 89 RIP [<ffffffff801ba0bf>] uart_put_char+0x42/0x64 RSP <ffff81043f559cc0> The crash seems to have happened in static inline void __uart_put_char(struct uart_port *port, struct circ_buf *circ, unsigned char c) { .. if (!circ->buf) return; spin_lock_irqsave(&port->lock, flags); if (uart_circ_chars_free(circ) != 0) { circ->buf[circ->head] = c; <-- here circ->head = (circ->head + 1) & (UART_XMIT_SIZE - 1); } spin_unlock_irqrestore(&port->lock, flags); .. } 0xffffffff801ba0b9 <uart_put_char+0x3c>: mov (%rbx),%rax <-- rbx contains the address to struct circ_buf *circ. The value in circ.buf is copied over to %rax 0xffffffff801ba0bc <uart_put_char+0x3f>: movslq %edx,%rdx 0xffffffff801ba0bf <uart_put_char+0x42>: mov %r12b,(%rax,%rdx,1) <-- Here value in %r12b('\r') is copied to memory address (%rax + %rdx * 1), ie. an offset in the string. However, looking at the Oops message, %RBX = 0xffff81043a32bd48 ie. crash> struct circ_buf 0xffff81043a32bd48 struct circ_buf { buf = 0xffff810276ad2000 "", head = 0x0, tail = 0x0 } Which should be a circular buffer. So, the value in %RAX should have been 0xffff810276ad2000, but for some reason is NULL. This condition is checked for earlier in __uart_put_char() and at this point, the buffer should not be NULL. This looks like a race condition where values in this structure are changed while this process was waiting for the spinlock( circ_buf to NULL) and then again soon after(cir_buf to 0xffff810276ad2000).
The 2 processes which are racing are PID: 27 TASK: ffff81043f550040 CPU: 1 COMMAND: "events/1" #0 [ffff81043f559a20] crash_kexec at ffffffff800acafe #1 [ffff81043f559ae0] __die at ffffffff80065127 #2 [ffff81043f559b20] do_page_fault at ffffffff80066da7 #3 [ffff81043f559c10] error_exit at ffffffff8005dde9 [exception RIP: uart_put_char+0x42] RIP: ffffffff801ba0bf RSP: ffff81043f559cc0 RFLAGS: 00010002 RAX: 0000000000000000 RBX: ffff81043a32bd48 RCX: 0000000000000000 RDX: 0000000000000075 RSI: 0000000000000286 RDI: ffffffff804ddc40 RBP: ffffffff804ddc40 R8: ffff81043f558000 R9: 000000000000003b R10: ffff81043fcd4008 R11: 000000002000800b R12: 000000000000020d R13: 000000000000000a R14: ffff81026715ad4c R15: ffff8102b25a51c0 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 #4 [ffff81043f559cd8] opost at ffffffff8004e162 #5 [ffff81043f559cf8] n_tty_receive_buf at ffffffff8001496e #6 [ffff81043f559de8] flush_to_ldisc at ffffffff801a21a4 #7 [ffff81043f559e38] run_workqueue at ffffffff8004d7fb #8 [ffff81043f559e78] worker_thread at ffffffff8004a133 #9 [ffff81043f559ee8] kthread at ffffffff8003297c #10 [ffff81043f559f48] kernel_thread at ffffffff8005dfb1 and PID: 20886 TASK: ffff810146838040 CPU: 2 COMMAND: "login" #0 [ffff81010ef6df20] crash_nmi_callback at ffffffff8007a5e6 #1 [ffff81010ef6df40] do_nmi at ffffffff8006585a #2 [ffff81010ef6df50] nmi at ffffffff80064ebf [exception RIP: .text.lock.spinlock+0x11] RIP: ffffffff80064bdb RSP: ffff81020e7e1d00 RFLAGS: 00000086 RAX: 0000000000000206 RBX: ffffffff804ddc40 RCX: 0000000000000000 RDX: 00000000000002fd RSI: 0000000000000005 RDI: ffffffff804ddc40 RBP: ffffffff804ddc40 R8: 0000000000000000 R9: 000000000006525f R10: 0000000000000000 R11: 0000000000000002 R12: 0000000000002710 R13: 0000000000000000 R14: 00000000fffffff4 R15: ffff8103befeccc0 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <exception stack> --- #3 [ffff81020e7e1d00] .text.lock.spinlock at ffffffff80064bdb (via _spin_lock_irqsave) #4 [ffff81020e7e1d00] serial8250_startup at ffffffff801be735 #5 [ffff81020e7e1d30] uart_startup at ffffffff8005331f #6 [ffff81020e7e1d60] uart_open at ffffffff800535b1 #7 [ffff81020e7e1dd0] tty_open at ffffffff801a3a8e #8 [ffff81020e7e1e20] chrdev_open at ffffffff800495e3 #9 [ffff81020e7e1e60] __dentry_open at ffffffff8001e880 #10 [ffff81020e7e1ea0] do_filp_open at ffffffff80027316 #11 [ffff81020e7e1f50] do_sys_open at ffffffff80019cdb #12 [ffff81020e7e1f80] tracesys at ffffffff8005d28d (via system_call) RIP: 00000033438c58b2 RSP: 00007ffff6dd2680 RFLAGS: 00000246 RAX: ffffffffffffffda RBX: ffffffff8005d28d RCX: ffffffffffffffff RDX: 0000000000000000 RSI: 0000000000000802 RDI: 0000000002216230 RBP: 0000000002216230 R8: 00007ffff6dd26b0 R9: 00007ffff6dd2610 R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffff6dd2a50 R13: 0000000002216230 R14: 0000000000000000 R15: 0000000000000802 ORIG_RAX: 0000000000000002 CS: 0033 SS: 002b Both processes are working on the same tty devices. tty_struct is ffff8102b25a5000 state(uart_state) is stored in tty_struct.driver_data = 0xffff81043e532838 info(uart_info) = state.info = uart_state.info = 0xffff81043a32bd40
Created attachment 376993 [details] Lock then check if cir->buf is NULL uart_open is protected by kernel big lock, so the bug seems to be at __uart_put_char(). By looking carefully, the code first checks for the buffer, and then protects with a spin_lock: if (!circ->buf) return; spin_lock_irqsave(&port->lock, flags); It is possible that a race condition would occur exactly between the if and the spin_lock_irqsave(). In this case, circ->buf[circ->head] = c; Will try to do a NULL pointer dereference as reported. Could you please try the enclosed patch?
As the bug got closed on Issue tracker, better to close the corresponding bugzilla. Since the kernel with a fix were produced, but we got no tests if it fixes, I'm closing it as "Insufficient data". Please, if you re-open it, provide us the test results for the patched kernel.
We see the above panic about once every three days out of a rack of five machines. (We also see an additional panic on the same machines. More on that later.) What makes these five machines unique is that they have their serial ttyS1 port accidentally wired into a loopback configuration and have a getty running on the port. That ends up causing the getty to respond to its own banner, exec login, login whines a few times about bad login attempts, exits, and init respawns the getty. I believe because the tty is being opened and closed with a substantial character backlog, it can trigger this panic. I've studied the affected code for some time. I don't think the patch mentioned in Comment #4 will do anything. The circ->buf pointer is only ever reset to NULL during uart_shutdown() where that lock won't matter. The problem is a bad race between release_dev() and flush_to_ldisc(). I've worked up a patch to correct the race and am testing it currently. Once I'm confident of it, I'll upload it. The "obvious" fix for the five machines is to simply turn off the mingetty on ttyS1, but might as well fix the bug while we're at it. :-) Someone else could hit this problem by sheer accident from closing a very backlogged serial tty, but I would guess that if they hit this panic, something's unusual about their serial port configuration such as the above. About one time in five, we see the another panic, included below. What triggers this one is if the freed "tty_struct" gets reallocated and initialized to zero. (Every time I've seen this panic, it's because a fork has grabbed it for an audit_context struct.) Why __run_timers() panics when dereferencing "fn" ("function") and not when dereferencing "next" took me awhile to figure out. It's because the buf.work entry on the timer list is the only one and the list_replace_init() call overwrites its "next" and "prev" when moving it from one list to the other. ======================================================= Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP: [<0000000000000000>] _stext+0x7ffff000/0x1000 PGD 424a7f067 PUD 4254b6067 PMD 0 Oops: 0010 [1] SMP last sysfs file: /devices/pci0000:00/0000:00:02.0/0000:06:00.0/0000:07:00.0/0000:08:00.0/0000:09:00.0/irq CPU 2 Modules linked in: ylock(U) xfs dm_multipath scsi_dh video hwmon backlight sbs i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac crash parport_pc lp parport serio_raw bnx2 sg pcspkr i5000_edac edac_mc dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod shpchp megaraid_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd Pid: 0, comm: swapper Tainted: G 2.6.18-164.2.1.el5 #1 RIP: 0010:[<0000000000000000>] [<0000000000000000>] _stext+0x7ffff000/0x1000 RSP: 0018:ffff81010ebebf28 EFLAGS: 00010202 RAX: ffff81010ebe7fd8 RBX: ffff81010ebdc000 RCX: ffff8103cd1ab170 RDX: ffff81010ebebf30 RSI: 11d160c19795886f RDI: 0000000000000000 RBP: 0000000000000100 R08: 000000000410c19e R09: 0000000000000080 R10: 0000000000004000 R11: 0000000000000080 R12: 0000000000000000 R13: 000000000000000a R14: 0000000000000002 R15: ffffffff803eb2a0 FS: 0000000000000000(0000) GS:ffff81010eb9ce40(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000000000000000 CR3: 0000000424fd3000 CR4: 00000000000006e0 Process swapper (pid: 0, threadinfo ffff81010ebe6000, task ffff81010eb9d080) Stack: ffffffff80096e0a ffff81010ebebf30 ffff81010ebebf30 0000000000000046 0000000000000011 ffffffff803d8f90 ffffffff8001235a 0000000000000046 ffff81010ebebf98 00000000000000ff ffffffff803c9180 0000000000000200 Call Trace: <IRQ> [<ffffffff80096e0a>] run_timer_softirq+0x133/0x1af [<ffffffff8001235a>] __do_softirq+0x89/0x133 [<ffffffff8005e2fc>] call_softirq+0x1c/0x28 [<ffffffff8006cb20>] do_softirq+0x2c/0x85 [<ffffffff800571c3>] mwait_idle+0x0/0x4a [<ffffffff8005dc8e>] apic_timer_interrupt+0x66/0x6c <EOI> [<ffffffff800571f9>] mwait_idle+0x36/0x4a [<ffffffff800493a3>] cpu_idle+0x95/0xb8 [<ffffffff80076f7f>] start_secondary+0x498/0x4a7 Code: Bad RIP value. RIP [<0000000000000000>] _stext+0x7ffff000/0x1000 RSP <ffff81010ebebf28> =======================================================
Reopening this issue since we have new data from Quentin Barnes. Sachin Prabhu
It's been about three weeks since I commented. I thought I was overdue for an update. These panics have unfolded into a messier problem than first appeared. It took me much longer to debug than I expected because I thought the two panics ("uart_put_char NULL pointer deref" and "run_timer_softirq jump to address 0x0") had the same root cause. I was pouring over the code looking for a connection. There wasn't one. The run_timer_softirq() panic was due to the the tty->buf.work handler (flush_to_ldisc) left enabled after the tty struct had been freed and reallocated. (What I described in Comment #11.) This was a race between flush_to_ldisc() and release_dev(). A simple fix to release_dev() is all that is needed. Since I made that fix, the machine testing the patch hasn't paniced in that way. The original problem of this BZ ticket exploits a different race. The problem here is that during ldisc processing of incoming characters, they can trigger character transmission. That happens when one or more of the local echo modes are enabled (one or more of the ECHO*() macros are true). You can see this is the cause of the panic because of the backtrace in Comment #0. You see that the PC was in n_tty_receive_buf() doing receive processing, but then the next function in the stack that leads to the panic is opost() for character output. Why the uart_put_char() panic happens is because the transmit buffer (circ->buf) was freed and the pointer zeroed by uart_shutdown() [called from tty->driver->close() in release_dev()], but the flush_to_ldisc() handler hadn't been shut down yet. If the handler is invoked after uart_shutdown() has been called, but before cancel_delayed_work(&tty->buf.work), you see this panic. This problem is not trivial to fix. You simply can't just call cancel_delayed_work() earlier in release_dev() due to other ordering requirements. To fix, I pulled back patches from kernel.org for a basis then fixed up uart_close(). After the UART receiver is disabled, I have uart_close() drain the ldisc work so if flush_to_ldisc() gets called in the mean time, it finds nothing to do. With the UART receiver disabled, we know that after draining no new work can get added to the ldisc queue before cancel_delayed_work() can get called later. For now this information is still preliminary. I'm still testing the patches. In a few weeks after machines running the patched kernel with both fixes haven't produced either panic and haven't shown any new problems, I'll post the patches. If for some reason you need the patches earlier, such as a customer hitting the problem again, just let me know and I'll pass them along. Although these are rather large race windows, I don't think they've been exercised often. That's because to hit them, you have to get your serial port flooded with incoming characters then close the port with them still buffered and while using local echo mode with line discipline. Most of the time, that's not going to happen during the normal course of talking to an RS-232 peripheral. Peripherals usually don't use local echo mode and typical handshaking or higher level protocol will not see a incoming flood of characters while attempting to close the tty.
Hi Quentin, (In reply to comment #13) > For now this information is still preliminary. I'm still testing > the patches. In a few weeks after machines running the patched > kernel with both fixes haven't produced either panic and haven't > shown any new problems, I'll post the patches. If for some reason > you need the patches earlier, such as a customer hitting the problem > again, just let me know and I'll pass them along. Did the approach you took worked?
No, not yet. My patch to date has fixed the panic in Comment #11, but so far to my frustration I've not been able to resolve the original uart_put_char() panic. I've fixed several races involved with line discipline character processing in uart_close(), uart_hangup(), release_dev(), and flush_to_ldisc(), but that's not been good enough. I'm still actively testing fixes for the panic. I have two dumps from crashes in the last two days that I still need to analyze. It's the same panic, but I have additional debug in those dumps that hopefully will point me more towards the source of the race I haven't isolated yet. If you're interested, I can write you a longer response with my findings and fixes so far. The biggest update to Comment #13 was I found the same race in uart_hangup() [the getty/login cycle exercises the hangup code], and I backported the changesets from 0fb5a532, 42fd552e, and 45242006 for flush_to_ldisc(). Has the panic happened again, or was this just a query since I've been quiet for awhile?
This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.6 and Red Hat does not plan to fix this issue the currently developed update. Contact your manager or support representative in case you need to escalate this bug.
This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.7 and Red Hat does not plan to fix this issue the currently developed update. Contact your manager or support representative in case you need to escalate this bug.