Bug 131674
| Summary: | CAN-2004-0814 potential race condition in RHEL 2.1/3 tty layer | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 3 | Reporter: | Tom "spot" Callaway <tcallawa> | ||||||||
| Component: | kernel | Assignee: | Jason Baron <jbaron> | ||||||||
| Status: | CLOSED ERRATA | QA Contact: | |||||||||
| Severity: | high | Docs Contact: | |||||||||
| Priority: | medium | ||||||||||
| Version: | 3.0 | CC: | alan, bressers, ckloiber, jbaron, knoel, lwoodman, mjc, nikola.ciprich, nobody+wcheng, peterm, petrides, redhat-bugzilla, riel, tao, tburke, vanhoof, vkanakas | ||||||||
| Target Milestone: | --- | Keywords: | Security | ||||||||
| Target Release: | --- | ||||||||||
| Hardware: | All | ||||||||||
| OS: | Linux | ||||||||||
| Whiteboard: | impact=important,public=20040907 | ||||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||||
| Doc Text: | Story Points: | --- | |||||||||
| Clone Of: | Environment: | ||||||||||
| Last Closed: | 2005-04-22 20:17:30 UTC | Type: | --- | ||||||||
| Regression: | --- | Mount Type: | --- | ||||||||
| Documentation: | --- | CRM: | |||||||||
| Verified Versions: | Category: | --- | |||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||
| Embargoed: | |||||||||||
| Bug Depends On: | |||||||||||
| Bug Blocks: | 132991 | ||||||||||
| Attachments: |
|
||||||||||
|
Description
Tom "spot" Callaway
2004-09-03 02:27:59 UTC
Thanks, Tom. I'll start the RHEL3 bug off with Jason, who is looking into this already for RHEL2.1. Tom, I think this is at risk for U4. *** Bug 133108 has been marked as a duplicate of this bug. *** This is promoted to security severity, with "Important" impact It looks like this fix needs a little touchup, specifically, when the serial connection is using DCD. See the LKML thread that starts here: http://marc.theaimsgroup.com/?l=linux-kernel&m=109820018332641&w=2 The default setup trusts LCP closes sent between the two ends, and it should work fine. However, with DCD, a new ldisc->hangup function needs to be written. I just read the LKML thread, this issue seems to be forgotten. Do we have any status on this? The ppp hangup fixes have been done, so that bit seems solid. However it changes the ABI, although it would be possible I guess to hack hardcode a PPP ldisc check and call into the tty layer core. Created attachment 106183 [details]
tty base patch
Created attachment 106184 [details]
tty base patch
patches to resolve this issue
Created attachment 106185 [details]
tty abi patch
We also need a per tty semaphore, see: http://marc.theaimsgroup.com/?l=linux-kernel&m=109985842421212&w=2 This of course breaks the ABI once again by adding a field to the tty_struct. Is it reasonable to have one global semaphore for this? Some USB stuff can take a while but you might get away with it Hello,
We're having OOPSes from 2.4.21-15.ELsmp (U2) and 2.4.21-20.ELsmp (U3)
kernels and it might be that we're hitting this race. Can you (just at
first glance) confirm or deny that this dump is one of those races?
ksymoops 2.4.9 on i686 2.4.21-15.ELsmp. Options used
-V (default)
-K (specified)
-l /proc/modules (default)
-o /lib/modules/2.4.21-15.ELsmp/ (default)
-m /boot/System.map-2.4.21-15.ELsmp (specified)
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
Unable to handle kernel NULL pointer dereference at virtual address
00000000
c01ad325
*pde = 1b017001
Oops: 0000
CPU: 2
EIP: 0060:[<c01ad325>] Tainted: PF
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246
eax: 00000000 ebx: 00000500 ecx: 00000000 edx: 00000000
esi: 00000000 edi: e175fc00 ebp: c04cfe00 esp: daf4de80
ds: 0068 es: 0068 ss: 0068
Process sh (pid: 2569, stackpage=daf4d000)
Stack: 00000000 00000000 c0140a62 0ac42025 00000000 00000000 cd085025
c4e85fa4
db102f00 00000000 b756a540 e1847e00 00030002 daf4c000 00000000
e175fc00
f57fc980 c01adf86 00000500 daf4dee4 c6abbd80 f169e008 00000000
c0179a00
Call Trace: [<c0140a62>] do_anonymous_page [kernel] 0x252 (0xdaf4de88)
[<c01adf86>] tty_open [kernel] 0x66 (0xdaf4dec4)
[<c0179a00>] dput [kernel] 0x30 (0xdaf4dedc)
[<c016f3e6>] link_path_walk [kernel] 0x656 (0xdaf4def0)
[<c0161288>] get_chrfops [kernel] 0x98 (0xdaf4df00)
[<c0139773>] in_group_p [kernel] 0x23 (0xdaf4df08)
[<f88e07ea>] ext3_permission [ext3] 0xaa (0xdaf4df10)
[<c0161541>] chrdev_open [kernel] 0x71 (0xdaf4df38)
[<c015f790>] dentry_open [kernel] 0x110 (0xdaf4df54)
[<c015f678>] filp_open [kernel] 0x68 (0xdaf4df70)
[<c015fa83>] sys_open [kernel] 0x53 (0xdaf4dfa8)
Code: 8b 04 88 89 44 24 30 85 c0 0f 84 9c 00 00 00 8b 54 24 30 8b
>>EIP; c01ad325 <init_dev+55/500> <=====
>>ebp; c04cfe00 <dev_tty_driver+0/c0>
Trace; c0140a62 <do_anonymous_page+252/510>
Trace; c01adf86 <tty_open+66/410>
Trace; c0179a00 <dput+30/1b0>
Trace; c016f3e6 <link_path_walk+656/7a0>
Trace; c0161288 <get_chrfops+98/170>
Trace; c0139773 <in_group_p+23/30>
Trace; f88e07ea <END_OF_CODE+383b2cd2/????>
Trace; c0161541 <chrdev_open+71/b0>
Trace; c015f790 <dentry_open+110/210>
Trace; c015f678 <filp_open+68/70>
Trace; c015fa83 <sys_open+53/c0>
Code; c01ad325 <init_dev+55/500>
00000000 <_EIP>:
Code; c01ad325 <init_dev+55/500> <=====
0: 8b 04 88 mov (%eax,%ecx,4),%eax <=====
Code; c01ad328 <init_dev+58/500>
3: 89 44 24 30 mov %eax,0x30(%esp,1)
Code; c01ad32c <init_dev+5c/500>
7: 85 c0 test %eax,%eax
Code; c01ad32e <init_dev+5e/500>
9: 0f 84 9c 00 00 00 je ab <_EIP+0xab>
Code; c01ad334 <init_dev+64/500>
f: 8b 54 24 30 mov 0x30(%esp,1),%edx
Code; c01ad338 <init_dev+68/500>
13: 8b 00 mov (%eax),%eax
Kernel panic: Fatal exception
Looks unrelated - more like random memory corruption I've seen this trace a few times notably with nvidia drivers loaded. Strange. We're not running X (a server machine) and we did run memtest86 for several days on it with no errors... From comment #28 and #29, I agree with Alan this doesn't look like related but with a small "deviation". We (support) happen to have two reports (and two crash dumps) with the exact same panic route. Doesn't look like a memory corruption to me - more like the driver registration routine (tty_init) that doesn't initialize the "table" field within tty_driver structure ? Seems possible...the routine that registers the driver is tty_register_driver, and it should set the table pointer appropriately...any idea which tty driver is involved? For Red Hat support reading this bugzilla, please don't tie the panic
comes down from tty_open->init_dev to this bugzilla *yet*. We need to
further look into that oops.
For Jason: Note that in tty_open(), TTY_DEV should be re-directed to
the actual physical driver via the following if statement:
if (device == TTY_DEV) {
if (!current->tty)
return -ENXIO;
device = current->tty->device;
filp->f_flags |= O_NONBLOCK;
}
Unfortunately, from the dump, current->tty->device was 1280 (=0x500,
which was major 5 minor 0) that happened to be /dev/tty. So it was
"redirected" *back* to /dev/tty. Then it came down to init_dev() to
access the NULL "table" field and died there.
hmmm, the tty pointer should either be NULL, or point to a real device, so you're saying it was pointing to a tty_struct which has a device field set to /dev/tty? Is it really possible to create such a tty_struct...since the open causes a re-direct? Might be worth auditing against a race with vhangup or other cases where the tty we are acquiring is vanishing. Thats all I can think off offhand Can you memset the buffer at the point the tty structure is freed, memset it to some clear signature and look for that elsewhere. I can confirm that the OOPS-es we are experiencing come from opening
the /dev/tty device (major 5, minor 0). It can happen that the system
is running for a month without the problem and then we get a series of
OOPS-es in a short period. I have worked arround them with the
following "patch":
*** tty_io.c.ORIGINAL 2004-08-19 02:30:35.000000000 +0200
--- tty_io.c 2004-11-26 11:54:20.000000000 +0100
***************
*** 849,854 ****
--- 849,862 ----
*/
down_tty_sem(idx);
+ /* check whether driver->table is NULL to avoid OOPS */
+ if (driver->table == NULL)
+ {
+ retval = -ENODEV;
+ printk(KERN_INFO "init_dev: major=%d minor=%d
driver->table is NULL (this would OOPS)\n", MAJOR(device), MINOR(device));
+ goto end_init;
+ }
+
/* check whether we're reopening an existing tty */
tty = driver->table[idx];
if (tty) goto fast_track;
and since applying this to our 7 servers we have been running without
a panic for a month until recently when I found the following in the
messages on one of our machines:
Dec 22 17:14:16 xk kernel: init_dev: major=5 minor=0 driver->table is
NULL (this would OOPS)
Dec 23 12:14:41 xk kernel: init_dev: major=5 minor=0 driver->table is
NULL (this would OOPS)
Dec 23 12:14:42 xk last message repeated 6 times
Dec 23 18:09:16 xk kernel: init_dev: major=5 minor=0 driver->table is
NULL (this would OOPS)
Dec 23 18:09:23 xk last message repeated 3 times
Dec 25 19:23:07 xk kernel: init_dev: major=5 minor=0 driver->table is
NULL (this would OOPS)
Dec 25 19:23:07 xk last message repeated 8 times
Peter
I can help by adding some more debug printk statements and wait for re-occurence of the event. Please advise what I should put to the printk... Regards, Peter the value of: current->tty->device would be helpful, but first check
current->tty being non-NULL as in tty_open():
if (current->tty)
device = current->tty->device;
printk("device: %i\n", device);
Also, i want to better understand the patch you did...it seems to me
that by having init_dev return 0, you end up getting tty_open to retry
which is likely why you have the message repeated multiple times, ie
the current->tty->device pointer is still incorrectly set...seems to
me that maybe this means a memory barrier is missing. ie eventully
this gets sets corretly and the tty_open succeeds. Are you noticing
any application errors. thanks.
Actualy I have init_dev return -ENODEV (looking at the return statement a few lines before that I made init_dev return the same as if there were no driver for device) and yes, the application might retry if it gets this error. Before this patch, the OOPS-es allways printed the process that was trying to open /dev/tty was /bin/sh. There are a lot of processes running /bin/sh (cron jobs, Oracle database, etc) but I have not found any error in the logs that could be related to not being able to open /dev/tty. I also don't see any possibility that /bin/sh was trying to open /dev/tty while the process was actualy associated with a terminal. The only place where this would happen are interactive remote logins (via ssh) or on console and noone was doing that at the times the event occured. So the most possible event I see is that /bin/sh is trying to open /dev/tty while the process is not associated with a terminal. What is the code path at that time and which memory barrier might be missing and where? I will put the current->tty->device to my printk and let you know of the results. Regards, Peter By the way, isn't this a potential race:
if (device == TTY_DEV) {
if (!current->tty)
return -ENXIO;
device = current->tty->device;
filp->f_flags |= O_NONBLOCK;
}
The current->tty is accessed twice and could be reset to NULL in-between.
I have made the following "adjustments" to tty_io.c:
*** tty_io.c.ORIGINAL 2004-08-19 02:30:35.000000000 +0200
--- tty_io.c 2004-12-27 16:47:36.000000000 +0100
***************
*** 849,854 ****
--- 849,862 ----
*/
down_tty_sem(idx);
+ /* check whether driver->table is NULL to avoid OOPS */
+ if (driver->table == NULL)
+ {
+ retval = -ENODEV;
+ printk(KERN_INFO "init_dev: process %d (%s): major=%d
minor=%d driver->table is NULL (this would OOPS)\n", current->pid,
current->comm, MAJOR(device), MINOR(device));
+ goto end_init;
+ }
+
/* check whether we're reopening an existing tty */
tty = driver->table[idx];
if (tty) goto fast_track;
***************
*** 1326,1343 ****
--- 1334,1356 ----
retry_open:
noctty = filp->f_flags & O_NOCTTY;
device = inode->i_rdev;
+
+ printk(KERN_INFO "tty_open: process %d (%s): major=%d
minor=%d\n", current->pid, current->comm, MAJOR(device), MINOR(device));
+
if (device == TTY_DEV) {
if (!current->tty)
return -ENXIO;
device = current->tty->device;
filp->f_flags |= O_NONBLOCK; /* Don't let /dev/tty
block */
/* noctty = 1; */
+ printk(KERN_INFO "tty_open: process %d (%s): after
swaping for current->tty->device: major=%d minor=%d\n", current->pid,
current->comm, MAJOR(device), MINOR(device));
}
#ifdef CONFIG_VT
if (device == CONSOLE_DEV) {
extern int fg_console;
device = MKDEV(TTY_MAJOR, fg_console + 1);
noctty = 1;
+ printk(KERN_INFO "tty_open: process %d (%s): after
swaping for MKDEV(TTY_MAJOR, fg_console + 1): major=%d minor=%d\n",
current->pid, current->comm, MAJOR(device), MINOR(device));
}
#endif
if (device == SYSCONS_DEV) {
***************
*** 1349,1354 ****
--- 1362,1368 ----
device = c->device(c);
filp->f_flags |= O_NONBLOCK; /* Don't let /dev/console
block */
noctty = 1;
+ printk(KERN_INFO "tty_open: process %d (%s): after
swaping for c->device(c): major=%d minor=%d\n", current->pid,
current->comm, MAJOR(device), MINOR(device));
}
if (device == PTMX_DEV) {
And after 2 days on one of the machines I got the following in the
messages:
Dec 30 12:26:33 xk kernel: tty_open: process 6184 (sh): major=5 minor=0
Dec 30 12:26:33 xk kernel: tty_open: process 6184 (sh): after swaping
for current->tty->device: major=5 minor=0
Dec 30 12:26:33 xk kernel: init_dev: process 6184 (sh): major=5
minor=0 driver->table is NULL (this would OOPS)
Dec 30 12:26:33 xk kernel: tty_open: process 6195 (sh): major=5 minor=0
Dec 30 12:26:33 xk kernel: tty_open: process 6195 (sh): after swaping
for current->tty->device: major=5 minor=0
Dec 30 12:26:33 xk kernel: init_dev: process 6195 (sh): major=5
minor=0 driver->table is NULL (this would OOPS)
Dec 30 12:26:40 xk kernel: tty_open: process 6212 (sh): major=5 minor=0
Dec 30 12:26:40 xk kernel: tty_open: process 6212 (sh): after swaping
for current->tty->device: major=5 minor=0
Dec 30 12:26:40 xk kernel: init_dev: process 6212 (sh): major=5
minor=0 driver->table is NULL (this would OOPS)
Dec 30 12:26:40 xk kernel: tty_open: process 6216 (sh): major=5 minor=0
Dec 30 12:26:40 xk kernel: tty_open: process 6216 (sh): after swaping
for current->tty->device: major=5 minor=0
Dec 30 12:26:40 xk kernel: init_dev: process 6216 (sh): major=5
minor=0 driver->table is NULL (this would OOPS)
... so the current->tty points to a structure that represents a
/dev/tty. The "repeats" that we observed in my previous post are
actualy from differen sh processes, each trying to open /dev/tty only
once. How's this possible?
Regards, Peter
hi Peter, thanks for all the info. Do you have the major/minor number for a successful open around the time of the oops? There could be a race in the open and close paths, and knowing which major/minor number would tell us which driver to look at. Regarding the race metioned in comment #48, current->tty cannot be reset in that window, since *we* are the one that owns current, meaning nobody else should be mucking with our task structure. I don't see why different shell processes trying to open /dev/tty at that same time is necessarily a problem? below, is a patch for a race condition that *might* fix this issue. I
haven't had a chance to test it yet, but i'll look further into it
tomorrow...
Peter, if you feel up to it, feel free to give this patch a try. thanks.
--- linux-2.4.21/drivers/char/tty_io.c.bak Mon Jan 3 19:14:55 2005
+++ linux-2.4.21/drivers/char/tty_io.c Mon Jan 3 19:15:39 2005
@@ -589,6 +589,8 @@ void disassociate_ctty(int on_exit)
struct list_head *l;
struct pid *pid;
+ lock_kernel();
+
if (tty) {
tty_pgrp = tty->pgrp;
if (on_exit && tty->driver.type != TTY_DRIVER_TYPE_PTY)
@@ -598,6 +600,7 @@ void disassociate_ctty(int on_exit)
kill_pg(current->tty_old_pgrp, SIGHUP, on_exit);
kill_pg(current->tty_old_pgrp, SIGCONT, on_exit);
}
+ unlock_kernel();
return;
}
if (tty_pgrp > 0) {
@@ -614,6 +617,7 @@ void disassociate_ctty(int on_exit)
for_each_task_pid(current->session, PIDTYPE_SID, p, l, pid)
p->tty = NULL;
read_unlock(&tasklist_lock);
+ unlock_kernel();
}
void stop_tty(struct tty_struct *tty)
Jason, I have added some more printks to find out what the
current->tty is actualy pointing at when the init_dev is called with
/dev/tty and I added some printks to the points where current->tty is
assigned. I made the following:
*** tty_io.c.ORIGINAL 2004-08-19 02:30:35.000000000 +0200
--- tty_io.c 2004-12-30 15:36:16.000000000 +0100
***************
*** 822,827 ****
--- 822,859 ----
static void release_mem(struct tty_struct *tty, int idx);
+ static void print_tty_struct(char *var, struct tty_struct *tty)
+ {
+ if (tty == NULL)
+ {
+ printk(KERN_INFO " %s = NULL\n", var);
+ return;
+ }
+
+ printk(KERN_INFO " %s->magic = %d\n", var, tty->magic);
+ printk(KERN_INFO " %s->pgrp = %d\n", var, tty->pgrp);
+ printk(KERN_INFO " %s->session = %d\n", var, tty->session);
+ printk(KERN_INFO " %s->device = (major:%d, minor:%d)\n", var,
MAJOR(tty->device), MINOR(tty->device));
+ }
+
+ static void print_task_struct(char *var, struct task_struct *t)
+ {
+ if (t == NULL)
+ {
+ printk(KERN_INFO " %s = NULL\n", var);
+ return;
+ }
+
+ printk(KERN_INFO " %s->pid = %d\n", var, t->pid);
+ printk(KERN_INFO " %s->pgrp = %d\n", var, t->pgrp);
+ printk(KERN_INFO " %s->tty_old_pgrp = %d\n", var,
t->tty_old_pgrp);
+ printk(KERN_INFO " %s->session = %d\n", var, t->session);
+ printk(KERN_INFO " %s->tgid = %d\n", var, t->tgid);
+ printk(KERN_INFO " %s->leader = %d\n", var, t->leader);
+
+ print_tty_struct(" ...->tty", t->tty);
+ }
+
/*
* WSH 06/09/97: Rewritten to remove races and properly clean up after a
* failed open. The new code protects the open with a semaphore, so
it's
***************
*** 849,854 ****
--- 881,896 ----
*/
down_tty_sem(idx);
+ /* check whether driver->table is NULL to avoid OOPS */
+ if (driver->table == NULL)
+ {
+ printk(KERN_INFO "init_dev: device=(major:%d,
minor:%d): driver->table is NULL (this would OOPS)\n", MAJOR(device),
MINOR(device));
+ print_task_struct("current", current);
+ print_task_struct("current->parent", current->parent);
+ retval = -ENODEV;
+ goto end_init;
+ }
+
/* check whether we're reopening an existing tty */
tty = driver->table[idx];
if (tty) goto fast_track;
***************
*** 1326,1331 ****
--- 1368,1374 ----
retry_open:
noctty = filp->f_flags & O_NOCTTY;
device = inode->i_rdev;
+
if (device == TTY_DEV) {
if (!current->tty)
return -ENXIO;
***************
*** 1439,1444 ****
--- 1482,1491 ----
current->tty_old_pgrp = 0;
tty->session = current->session;
tty->pgrp = current->pgrp;
+
+ printk(KERN_INFO "tty_open: current->tty was set\n");
+ print_task_struct("current", current);
+ print_task_struct("current->parent", current->parent);
}
if ((tty->driver.type == TTY_DRIVER_TYPE_SERIAL) &&
(tty->driver.subtype == SERIAL_TYPE_CALLOUT) &&
***************
*** 1621,1626 ****
--- 1668,1678 ----
current->tty_old_pgrp = 0;
tty->session = current->session;
tty->pgrp = current->pgrp;
+
+ printk(KERN_INFO "tiocsctty: current->tty was set\n");
+ print_task_struct("current", current);
+ print_task_struct("current->parent", current->parent);
+
return 0;
}
The results are this (just one of the occurences):
Jan 3 13:59:32 xk kernel: init_dev: device=(major:5, minor:0):
driver->table is NULL (this would OOPS)
Jan 3 13:59:32 xk kernel: current->pid = 28162
Jan 3 13:59:32 xk kernel: current->pgrp = 18059
Jan 3 13:59:32 xk kernel: current->tty_old_pgrp = 0
Jan 3 13:59:32 xk kernel: current->session = 16467
Jan 3 13:59:32 xk kernel: current->tgid = 28162
Jan 3 13:59:32 xk kernel: current->leader = 0
Jan 3 13:59:32 xk kernel: ...->tty->magic = 538976288
Jan 3 13:59:32 xk kernel: ...->tty->pgrp = 1734702177
Jan 3 13:59:32 xk kernel: ...->tty->session = 108622447
Jan 3 13:59:32 xk kernel: ...->tty->device = (major:5, minor:0)
Jan 3 13:59:32 xk kernel: current->parent->pid = 28159
Jan 3 13:59:32 xk kernel: current->parent->pgrp = 18059
Jan 3 13:59:32 xk kernel: current->parent->tty_old_pgrp = 0
Jan 3 13:59:32 xk kernel: current->parent->session = 16467
Jan 3 13:59:32 xk kernel: current->parent->tgid = 28159
Jan 3 13:59:32 xk kernel: current->parent->leader = 0
Jan 3 13:59:32 xk kernel: ...->tty->magic = 538976288
Jan 3 13:59:32 xk kernel: ...->tty->pgrp = 1734702177
Jan 3 13:59:32 xk kernel: ...->tty->session = 108622447
Jan 3 13:59:32 xk kernel: ...->tty->device = (major:5, minor:0)
... I couldn't find anything in the log about assigning current->tty
such a structure. This means either the memory occupied by
current->tty structure was freed prematurely and used by something
else or something is trashing the current->tty pointer. The tty->magic
should be 21505 but in these situations I found following different
values:
20
538976288
1936876880
1953460082
I will study your suggested patch closely.
Regarding the race mentioned in comment #48, and your comment #50,
what about the situation in the tiocsctty function where the tty is
being stolen away from a session group. In that case we *are* mucking
with somebody else's task structure:
static int tiocsctty(struct tty_struct *tty, int arg)
{
if (current->leader &&
(current->session == tty->session))
return 0;
/*
* The process must be a session leader and
* not have a controlling tty already.
*/
if (!current->leader || current->tty)
return -EPERM;
if (tty->session > 0) {
/*
* This tty is already the controlling
* tty for another session group!
*/
if ((arg == 1) && suser()) {
/*
* Steal it away
*/
struct task_struct *p;
struct list_head *l;
struct pid *pid;
read_lock(&tasklist_lock);
for_each_task_pid(tty->session, PIDTYPE_SID,
p, l, pid)
p->tty = NULL;
And regarding my "How's this possible?": I was refering to the
statement before: "the current->tty points to a structure that
represents a /dev/tty". As my latest logs show this doesn't look like
a tty structure at all...
Do these latest logs change you perspective on the issue?
Regards, Peter
Folks, since we're talking about two different issues, could we move the new problem into bugzilla 144059 ? Not sure which key I hit wrong - somehow I altered the following and don't know
how to "reset" it back.
wcheng changed:
What |Removed |Added
----------------------------------------------------------------------------
devel_whiteboard|impact=important,public=2004|
|0907 |
Hi Peter, thanks for the additinal printks. I came to the same conclusion too - that the current->tty was pointing to apparently the wrong place (but somehow the device number still made sense). I had a netdump of a kernel that panic'd, although i don't have a simple reproducer for this issue...anyways, i think like you said something is most likely free'ing the tty struct and then we are left pointing to it. I was trying to think how this could happen and my best guess was a process exit racing with an open. This problem does in fact appear in rhel3 and is plugged by the patch in comment #51. As far as the ioctl race mentioned in comment #52, i don't believe this is an issue since both sys_open and sys_ioctl do a lock_kernel before calling down into the above code. So i do not belive an ioctl vs. open race exists. Further, the structure is not freed in the ioctl path, which doesn't fit this case as closely as the exit vs. open case. In any case, i'm going to try and see if i can create or find a simple way to reproduce this, given my current thought on how this fails. thanks. Fixes for these problems have just been committed to the RHEL3 U5 patch pool this evening (in kernel version 2.4.21-27.13.EL). For reference the fix that got committed upstream into 2.4.29: http://linux.bkbits.net:8080/linux-2.4/cset@41c3801dxJnuBSRCpUiMRkwItPHjWA Fixes for these problems have also been committed to the RHEL3 E5 patch pool this evening (in kernel version 2.4.21-27.0.3.EL). An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2005-293.html An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2005-294.html *** Bug 173146 has been marked as a duplicate of this bug. *** |