Bug 151981 (IT_69402) - udevd fails to create /dev files after misc_register
Summary: udevd fails to create /dev files after misc_register
Keywords:
Status: CLOSED ERRATA
Alias: IT_69402
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: ia64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Jason Baron
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 181409
 
Reported: 2005-03-24 00:52 UTC by Hal Prince
Modified: 2013-03-06 05:58 UTC
CC List: 9 users

Fixed In Version: RHSA-2006-0575
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-08-10 21:02:36 UTC
Target Upstream Version:
Embargoed:


Attachments
should fix this problem (534 bytes, patch)
2005-12-05 20:48 UTC, Jason Baron


Links
System: Red Hat Product Errata
ID: RHSA-2006:0575
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Updated kernel packages available for Red Hat Enterprise Linux 4 Update 4
Last Updated: 2006-08-10 04:00:00 UTC

Description Hal Prince 2005-03-24 00:52:55 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.7.5) Gecko/20041109 Firefox/1.0

Description of problem:
Our FS product contains a module vxportal which needs a device file
/dev/vxportal.  We have the module init function call misc_register,
since we just need a character pseudo-device.  The deinit code calls
misc_deregister.

We notice that on x86 and x86_64 machines, the code works as expected,
namely /dev/vxportal disappears when we rmmod vxportal, and reappears
when we modprobe vxportal.  But on ia64, there appears to be a race
which causes /dev/vxportal not to be created when we modprobe.  By
running strace on udevd, we see udevd wake up from its select,
open /sys/class/misc/vxportal, and do getdents on this directory.
However, getdents returns just . and .., and not the "dev" file that
contains the major/minor numbers.  Since the directory is empty,
udevd does not call mknod to create /dev/vxportal.  If we then
ls this directory, we see the "dev" entry.  And if we then run
start_udev manually, the missing /dev/vxportal will get created.
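The "dev" attribute udevd needs is a one-line sysfs text file holding "MAJOR:MINOR"; in the x86_64 trace below, udevd reads "10:32\n" from /sys/class/misc/vxportal/dev. The following is a minimal userspace sketch of that lookup using plain C stdio. It is not udev's code, and parse_dev_attr/read_dev_attr are hypothetical names. It returns -ENOENT in the state udevd hit on ia64, where the class directory exists but its "dev" file does not yet.

```c
#include <errno.h>
#include <stdio.h>

/* Parse the text of a sysfs "dev" attribute, e.g. "10:32\n".
 * Returns 0 on success or -EINVAL if it is not "MAJOR:MINOR". */
static int parse_dev_attr(const char *text, unsigned *major_out, unsigned *minor_out)
{
    unsigned maj, min;

    if (sscanf(text, "%u:%u", &maj, &min) != 2)
        return -EINVAL;
    *major_out = maj;
    *minor_out = min;
    return 0;
}

/* Read <class_dir>/dev, e.g. /sys/class/misc/vxportal/dev.
 * Returns a negative errno on failure; -ENOENT means the attribute
 * (or its directory) is not there. */
static int read_dev_attr(const char *class_dir, unsigned *major_out, unsigned *minor_out)
{
    char path[512];
    char buf[64];
    FILE *f;

    if (snprintf(path, sizeof(path), "%s/dev", class_dir) >= (int)sizeof(path))
        return -ENAMETOOLONG;

    f = fopen(path, "r");
    if (f == NULL)
        return -errno;          /* ENOENT: no "dev" file to read yet */

    if (fgets(buf, sizeof(buf), f) == NULL) {
        fclose(f);
        return -EIO;
    }
    fclose(f);
    return parse_dev_attr(buf, major_out, minor_out);
}
```

Given those numbers, the node itself would then be created with mknod(2) and makedev(major, minor); that step needs root and is left out of the sketch.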

Version-Release number of selected component (if applicable):
kernel-2.6.9-5.EL

How reproducible:
Always

Steps to Reproduce:
1. Make a module that uses misc_register/misc_deregister in its init/deinit
   routines to create a device file with minor 32 and name "vxportal".
2. modprobe this module.
3. ls /dev/vxportal
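Step 3 only checks that the name exists. A slightly stricter check, sketched below for a Linux userspace with <sys/sysmacros.h>, also confirms that the node is a character device with the expected numbers (misc devices use major 10, matching the "10:32" in the x86_64 trace below). check_devnode is a hypothetical helper, not part of any tool named in this report.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Returns 0 if path is a character device with the given numbers,
 * -ENOENT (or another negative errno) if stat() fails, and -ENODEV if
 * it exists but is not the expected character device. */
static int check_devnode(const char *path, unsigned want_major, unsigned want_minor)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -errno;
    if (!S_ISCHR(st.st_mode))
        return -ENODEV;
    if (major(st.st_rdev) != want_major || minor(st.st_rdev) != want_minor)
        return -ENODEV;
    return 0;
}
```

For this report the call would be check_devnode("/dev/vxportal", 10, 32); on an affected system it would be expected to return -ENOENT after modprobe.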
  

Actual Results:  ia64: /dev/vxportal does not exist
x86, x86_64: /dev/vxportal exists

Expected Results:  x86, x86_64, ia64: /dev/vxportal exists

Additional info:

Our misc_register call:

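#include <linux/init.h>
#include <linux/fs.h>
#include <linux/miscdevice.h>

extern struct file_operations vxportal_fops;    /* defined elsewhere in the driver */

static struct miscdevice vxportal_dev = {
        /* [XXX] akale select a minor number */
        .minor = 32,
        .name  = "vxportal",
        .fops  = &vxportal_fops,
        /* list linkage and device pointer are left zero-initialized */
};

static int __init
init_vxportal(
        void)
{
        int     error;

        error = misc_register(&vxportal_dev);
        /* (rest of init_vxportal omitted from this excerpt) */
        return error;
}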

strace of udevd on ia64 (FAILS):

[pid 19178] lstat("/sys/class/misc/vxportal", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 19178] lstat("/sys/class/misc/vxportal", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 19178] open("/sys/class/misc/vxportal", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 4
[pid 19178] fstat(4, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 19178] fcntl(4, F_SETFD, FD_CLOEXEC) = 0
[pid 19178] getdents64(0x4, 0x600000000003c508, 0x4000) = 48 <<--!!!!!
[pid 19178] getdents64(0x4, 0x600000000003c508, 0x4000) = 0
[pid 19178] close(4)                    = 0
[pid 19178] open("/sys/class/misc/vxportal", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 4
[pid 19178] fstat(4, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 19178] fcntl(4, F_SETFD, FD_CLOEXEC) = 0
[pid 19178] getdents64(0x4, 0x600000000003c508, 0x4000) = 48 <<--!!!!!
[pid 19178] getdents64(0x4, 0x600000000003c508, 0x4000) = 0
[pid 19178] close(4)                    = 0
[pid 19178] munmap(0x2000000000400000, 163840) = 0
[pid 19178] close(3)                    = 0
[pid 19178] exit_group(-1)              = ?

strace of udevd on x86_64 (SUCCEEDS):

[pid 19936] lstat("/sys/class/misc/vxportal", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 19936] lstat("/sys/class/misc/vxportal", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 19936] open("/sys/class/misc/vxportal", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 4
[pid 19936] fstat(4, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 19936] fcntl(4, F_SETFD, FD_CLOEXEC) = 0
[pid 19936] getdents64(4, /* 3 entries */, 4096) = 72  <<--!!!!!!
[pid 19936] lstat("/sys/class/misc/vxportal/dev", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
[pid 19936] stat("/sys/class/misc/vxportal/dev", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
[pid 19936] open("/sys/class/misc/vxportal/dev", O_RDONLY) = 5
[pid 19936] read(5, "10:32\n", 4096)    = 6
[pid 19936] close(5)                    = 0
[pid 19936] getdents64(4, /* 0 entries */, 4096) = 0
[pid 19936] close(4)                    = 0
[pid 19936] open("/sys/class/misc/vxportal/dev", O_RDONLY) = 4
[pid 19936] read(4, "10:32\n", 4096)    = 6
[pid 19936] close(4)                    = 0
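The byte counts flagged in the two traces can be decoded. getdents64 returns packed struct linux_dirent64 records; each record is the 19-byte fixed header plus the name and its terminating NUL, rounded up to a multiple of 8. The sketch below mirrors that field layout in a hypothetically named struct. It shows that 48 bytes is exactly "." and "..", while 72 bytes adds one more short entry such as "dev", so the ia64 trace is consistent with the reporter's reading that only "." and ".." came back.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same field layout as the kernel's struct linux_dirent64. */
struct dirent64_layout {
    uint64_t       d_ino;     /* inode number */
    int64_t        d_off;     /* offset of the next record */
    unsigned short d_reclen;  /* length of this record */
    unsigned char  d_type;    /* file type */
    char           d_name[];  /* NUL-terminated name, starts at byte 19 */
};

/* Record length: header + name + NUL, rounded up to 8 bytes. */
static size_t dirent64_reclen(const char *name)
{
    size_t raw = offsetof(struct dirent64_layout, d_name) + strlen(name) + 1;
    return (raw + 7) & ~(size_t)7;
}

/* Total bytes one getdents64 call would return for these names. */
static size_t dirents64_total(const char *const names[], size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += dirent64_reclen(names[i]);
    return total;
}
```

Each of ".", ".." and "dev" fits in a single 24-byte record, so 48 = 2 x 24 and 72 = 3 x 24.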

Comment 1 Jason Baron 2005-04-14 21:05:16 UTC
Hmmm, pretty strange. If you had a small, complete module that reproduced this
issue, it would save me some time. Thanks.

Comment 2 Rob Kenna 2005-04-18 17:20:34 UTC
Reposted to make visible in IT

Comment 4 Hal Prince 2005-04-20 21:15:50 UTC
We have recently noticed that this problem occurs
on x86 machines as well, though it seems easier to
reproduce on ia64.

Jason asks for a small complete module that demonstrates the problem.
Here is a small incomplete module; it's just a trimmed-down excerpt from
our vxportal driver described above. It doesn't define vxportal_fops,
but perhaps that could be a NULL pointer anyway.


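#include <linux/module.h>
#include <linux/init.h>
#include <linux/fs.h>
#include <linux/miscdevice.h>

/* Not defined in the trimmed excerpt; a minimal placeholder (assumed,
 * not the real driver's fops) so the module builds and loads. */
static struct file_operations vxportal_fops = {
        .owner = THIS_MODULE,
};

static struct miscdevice vxportal_dev = {
        /* [XXX] akale select a minor number */
        .minor = 32,
        .name  = "vxportal",
        .fops  = &vxportal_fops,
};

static int __init
init_vxportal(
        void)
{
        int     error;

        error = misc_register(&vxportal_dev);
        return error;
}

static void __exit
exit_vxportal(
        void)
{
        misc_deregister(&vxportal_dev);
}
module_init(init_vxportal);
module_exit(exit_vxportal);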
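To measure how often the node goes missing, a reproduction loop can load and unload the module repeatedly. A minimal sketch, assuming a POSIX system with /bin/sh for system(3); node_appears and count_missing_nodes are hypothetical names, and the commands, node path and settle time are caller-supplied. Because udevd creates nodes asynchronously even when everything works, the check polls for a short settle time instead of testing once immediately after modprobe.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>

/* Poll for path for up to settle_ms milliseconds in 10 ms steps, so a
 * node that is merely late is not counted as missing.
 * Returns 1 if it appears, else 0. */
static int node_appears(const char *path, int settle_ms)
{
    const struct timespec step = { 0, 10L * 1000 * 1000 };  /* 10 ms */
    struct stat st;

    for (int waited = 0;; waited += 10) {
        if (stat(path, &st) == 0)
            return 1;
        if (waited >= settle_ms)
            return 0;
        nanosleep(&step, NULL);
    }
}

/* Run load_cmd, look for node_path, run unload_cmd, `iterations` times.
 * Returns how many loads never produced the node, or -1 if either
 * command exits unsuccessfully. */
static int count_missing_nodes(const char *load_cmd, const char *unload_cmd,
                               const char *node_path, int iterations,
                               int settle_ms)
{
    int missing = 0;

    for (int i = 0; i < iterations; i++) {
        if (system(load_cmd) != 0)
            return -1;
        if (!node_appears(node_path, settle_ms))
            missing++;
        if (system(unload_cmd) != 0)
            return -1;
    }
    return missing;
}
```

Run as root, a call such as count_missing_nodes("modprobe vxportal", "rmmod vxportal", "/dev/vxportal", 100, 2000) returning a nonzero count would point to the race described in this report.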



Comment 5 Jason Baron 2005-04-20 21:23:02 UTC
OK, thanks. I'll see if I can reproduce this race.

Comment 12 Jason Baron 2005-12-05 20:48:58 UTC
Created attachment 121864 [details]
should fix this problem

Comment 13 Jason Baron 2005-12-05 20:51:03 UTC
The above is a quick and dirty fix for this issue. The problem is that the kernel
sends the hotplug event before the "dev" attribute file is created. Thus, if
userspace happens to run before the file is added, the problem described here
occurs. The patch in comment #12 should fix this, although it's not pretty.
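The ordering problem described above can be modeled deterministically in userspace. The sketch below is a toy, not the attached patch (whose contents are not reproduced in this report): the toy_* names are hypothetical, the handler stands in for udevd, and it models the worst-case interleaving in which udevd runs as soon as the event is sent. Real systems hit that interleaving only some of the time, which fits the observation that the failure shows up more readily on some machines than others.

```c
#include <stdbool.h>

/* Toy stand-in for a class device directory in sysfs. */
struct toy_classdev {
    bool dev_attr_present;  /* has the "dev" attribute been created? */
};

/* Stand-in for udevd: a node can be made only if "dev" is readable. */
static bool toy_udev_handle_event(const struct toy_classdev *cd)
{
    return cd->dev_attr_present;
}

/* Event first, attribute second. The handler runs immediately,
 * i.e. userspace wins the race. */
static bool toy_register_event_first(struct toy_classdev *cd)
{
    bool node_created = toy_udev_handle_event(cd);
    cd->dev_attr_present = true;   /* attribute appears too late */
    return node_created;
}

/* Ordering that closes the window: attribute first, then the event. */
static bool toy_register_attr_first(struct toy_classdev *cd)
{
    cd->dev_attr_present = true;
    return toy_udev_handle_event(cd);
}
```

In the racy ordering the attribute does exist afterwards, matching the report that a later ls of /sys/class/misc/vxportal shows "dev" and a manual start_udev creates the node.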

Comment 14 Jason Baron 2005-12-07 04:39:19 UTC
Can we get the patch from comment #12 tested?

Comment 20 Bob Johnson 2006-04-11 16:48:26 UTC
This issue is on Red Hat Engineering's list of planned work items 
for the upcoming Red Hat Enterprise Linux 4.4 release.  Engineering 
resources have been assigned and barring unforeseen circumstances, Red 
Hat intends to include this item in the 4.4 release.

Comment 22 Jason Baron 2006-05-03 17:22:11 UTC
Committed in stream U4 build 34.28. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/


Comment 25 Red Hat Bugzilla 2006-08-10 21:02:36 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0575.html


