Bug 151981 - (IT_69402) udevd fails to create /dev files after misc_register
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: ia64 Linux
Priority: medium
Severity: medium
Assigned To: Jason Baron
QA Contact: Brian Brock
Blocks: 181409
Reported: 2005-03-23 19:52 EST by Hal Prince
Modified: 2013-03-06 00:58 EST (History)
Fixed In Version: RHSA-2006-0575
Doc Type: Bug Fix
Last Closed: 2006-08-10 17:02:36 EDT

Attachments:
should fix this problem (534 bytes, patch)
2005-12-05 15:48 EST, Jason Baron

Description Hal Prince 2005-03-23 19:52:55 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.7.5) Gecko/20041109 Firefox/1.0

Description of problem:
Our FS product contains a module vxportal which needs a device file
/dev/vxportal.  We have the module init function call misc_register,
since we just need a character pseudo-device.  The deinit code calls
misc_deregister.

We notice that on x86 and x86_64 machines, the code works as expected,
namely /dev/vxportal disappears when we rmmod vxportal, and reappears
when we modprobe vxportal.  But on ia64, there appears to be a race
which causes /dev/vxportal not to be created when we modprobe.  By
running strace on udevd, we see udevd wake up from its select,
open /sys/class/misc/vxportal, and do getdents on this directory.
However, getdents returns just . and .., and not the "dev" file that
contains the major/minor numbers.  Since the directory is empty,
udevd does not call mknod to create /dev/vxportal.  If we then
ls this directory, we see the "dev" entry.  And if we then run
start_udev manually, the missing /dev/vxportal will get created.

Version-Release number of selected component (if applicable):
kernel-2.6.9-5.EL

How reproducible:
Always

Steps to Reproduce:
1. Make a module that uses misc_register/misc_deregister in its init/deinit
   routines to create a device file with minor 32 and name "vxportal".
2. modprobe this module.
3. ls /dev/vxportal
  

Actual Results:  ia64: /dev/vxportal does not exist
x86, x86_64: /dev/vxportal exists

Expected Results:  x86, x86_64, ia64: /dev/vxportal exists

Additional info:

Our misc_register call:

extern struct file_operations vxportal_fops;

STATIC struct miscdevice vxportal_dev = {
        /* [XXX] akale select a minor number */
        32,             /* minor */
        "vxportal",     /* name */
        &vxportal_fops, /* fops */
        {NULL, NULL},   /* next, prev */
        NULL,           /* dev */
};

STATIC int __init
init_vxportal(
        void)
{
        int     error;

        error = misc_register(&vxportal_dev);
        return error;
}


strace of udevd on ia64 (FAILS):

[pid 19178] lstat("/sys/class/misc/vxportal", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 19178] lstat("/sys/class/misc/vxportal", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 19178] open("/sys/class/misc/vxportal", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 4
[pid 19178] fstat(4, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 19178] fcntl(4, F_SETFD, FD_CLOEXEC) = 0
[pid 19178] getdents64(0x4, 0x600000000003c508, 0x4000) = 48 <<--!!!!!
[pid 19178] getdents64(0x4, 0x600000000003c508, 0x4000) = 0
[pid 19178] close(4)                    = 0
[pid 19178] open("/sys/class/misc/vxportal", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 4
[pid 19178] fstat(4, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 19178] fcntl(4, F_SETFD, FD_CLOEXEC) = 0
[pid 19178] getdents64(0x4, 0x600000000003c508, 0x4000) = 48 <<--!!!!!
[pid 19178] getdents64(0x4, 0x600000000003c508, 0x4000) = 0
[pid 19178] close(4)                    = 0
[pid 19178] munmap(0x2000000000400000, 163840) = 0
[pid 19178] close(3)                    = 0
[pid 19178] exit_group(-1)              = ?

strace of udevd on x86_64 (SUCCEEDS):

[pid 19936] lstat("/sys/class/misc/vxportal", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 19936] lstat("/sys/class/misc/vxportal", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 19936] open("/sys/class/misc/vxportal", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 4
[pid 19936] fstat(4, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 19936] fcntl(4, F_SETFD, FD_CLOEXEC) = 0
[pid 19936] getdents64(4, /* 3 entries */, 4096) = 72  <<--!!!!!!
[pid 19936] lstat("/sys/class/misc/vxportal/dev", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
[pid 19936] stat("/sys/class/misc/vxportal/dev", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
[pid 19936] open("/sys/class/misc/vxportal/dev", O_RDONLY) = 5
[pid 19936] read(5, "10:32\n", 4096)    = 6
[pid 19936] close(5)                    = 0
[pid 19936] getdents64(4, /* 0 entries */, 4096) = 0
[pid 19936] close(4)                    = 0
[pid 19936] open("/sys/class/misc/vxportal/dev", O_RDONLY) = 4
[pid 19936] read(4, "10:32\n", 4096)    = 6
[pid 19936] close(4)                    = 0
Comment 1 Jason Baron 2005-04-14 17:05:16 EDT
Hmm, pretty strange. If you have a small, complete module that reproduces this
issue, it would save me some time. Thanks.
Comment 2 Rob Kenna 2005-04-18 13:20:34 EDT
Reposted to make visible in IT

Comment 4 Hal Prince 2005-04-20 17:15:50 EDT
We have recently noticed that this problem occurs on x86 machines as well,
though it seems easier to reproduce on ia64.

Jason asks for a small complete module that demonstrates the problem. Here
is a small incomplete module; it's just a trimmed-down excerpt from our
vxportal driver described above. It doesn't define vxportal_fops, but
perhaps that could be a NULL pointer anyway.


STATIC struct miscdevice vxportal_dev = {
        /* [XXX] akale select a minor number */
        32,             /* minor */
        "vxportal",     /* name */
        &vxportal_fops, /* fops */
        {NULL, NULL},   /* next, prev */
        NULL,           /* dev */
};

STATIC int __init
init_vxportal(
        void)
{
        int     error;

        error = misc_register(&vxportal_dev);
        return error;
}

STATIC void __exit
exit_vxportal(
        void)
{
        misc_deregister(&vxportal_dev);        
}
module_init(init_vxportal);
module_exit(exit_vxportal);

Comment 5 Jason Baron 2005-04-20 17:23:02 EDT
OK, thanks. I'll see if I can reproduce this race.
Comment 12 Jason Baron 2005-12-05 15:48:58 EST
Created attachment 121864
should fix this problem
Comment 13 Jason Baron 2005-12-05 15:51:03 EST
The above is a quick and dirty fix for this issue. The problem is that the kernel
emits the hotplug event before the "dev" file is created. Thus, if userspace
happens to run before the file is added, the problem described here occurs.
The patch in comment #12 should fix this, although it's not pretty.
Comment 14 Jason Baron 2005-12-06 23:39:19 EST
Can we get the patch from comment #12 tested?
Comment 20 Bob Johnson 2006-04-11 12:48:26 EDT
This issue is on Red Hat Engineering's list of planned work items 
for the upcoming Red Hat Enterprise Linux 4.4 release.  Engineering 
resources have been assigned and barring unforeseen circumstances, Red 
Hat intends to include this item in the 4.4 release.
Comment 22 Jason Baron 2006-05-03 13:22:11 EDT
Committed in stream U4 build 34.28. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/
Comment 25 Red Hat Bugzilla 2006-08-10 17:02:36 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0575.html
