Bug 151981 (IT_69402)

Summary: udevd fails to create /dev files after misc_register
Product: Red Hat Enterprise Linux 4
Reporter: Hal Prince <hal>
Component: kernel
Assignee: Jason Baron <jbaron>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 4.0
CC: davej, harald, jbaron, knoel, linux26port, lwang, riel, rkenna, tao
Target Milestone: ---   
Target Release: ---   
Hardware: ia64   
OS: Linux   
Fixed In Version: RHSA-2006-0575
Doc Type: Bug Fix
Last Closed: 2006-08-10 21:02:36 UTC
Bug Blocks: 181409    
Attachments:
should fix this problem (flags: none)

Description Hal Prince 2005-03-24 00:52:55 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.7.5) Gecko/20041109 Firefox/1.0

Description of problem:
Our FS product contains a module vxportal which needs a device file
/dev/vxportal.  We have the module init function call misc_register,
since we just need a character pseudo-device.  The deinit code calls
misc_deregister.

We notice that on x86 and x86_64 machines, the code works as expected,
namely /dev/vxportal disappears when we rmmod vxportal, and reappears
when we modprobe vxportal.  But on ia64, there appears to be a race
which causes /dev/vxportal not to be created when we modprobe.  By
running strace on udevd, we see udevd wake up from its select,
open /sys/class/misc/vxportal, and do getdents on this directory.
However, getdents returns only . and .., not the "dev" file that
contains the major/minor numbers.  Since the directory appears empty,
udevd does not call mknod to create /dev/vxportal.  If we then
ls this directory, we see the "dev" entry.  And if we then run
start_udev manually, the missing /dev/vxportal gets created.
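One way a user-space consumer could tolerate the attribute appearing late is to retry briefly before concluding that "dev" is missing. The sketch below is hypothetical: wait_for_attr(), its attempt count, and its delay are illustrative assumptions, not udevd's actual logic and not the fix that was eventually shipped.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: poll for a sysfs attribute such as
 * "/sys/class/misc/vxportal/dev" until it is readable or the attempt
 * budget runs out. Returns 0 if the file appeared, -1 if it never did.
 * This only illustrates tolerating the race; it is not udevd's code.
 */
int wait_for_attr(const char *path, int attempts, long delay_ms)
{
    struct timespec delay = {
        .tv_sec = delay_ms / 1000,
        .tv_nsec = (delay_ms % 1000) * 1000000L,
    };

    for (int i = 0; i < attempts; i++) {
        if (access(path, R_OK) == 0)
            return 0;                 /* attribute present and readable */
        if (i + 1 < attempts)
            nanosleep(&delay, NULL);  /* give the kernel time to add it */
    }
    return -1;                        /* still missing: do not mknod */
}
```

For example, wait_for_attr("/sys/class/misc/vxportal/dev", 10, 50) would wait up to roughly half a second. Polling only narrows the window; it masks the ordering problem rather than removing it.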

Version-Release number of selected component (if applicable):
kernel-2.6.9-5.EL

How reproducible:
Always

Steps to Reproduce:
1. Make a module that uses misc_register/misc_deregister in its init/deinit
routines to create a device file with minor 32 and name "vxportal".
2. modprobe this module.
3. ls /dev/vxportal
  

Actual Results:  ia64: /dev/vxportal does not exist
x86, x86_64: /dev/vxportal exists

Expected Results:  x86, x86_64, ia64: /dev/vxportal exists
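Step 3 of the reproduction amounts to checking that /dev/vxportal exists. A slightly stricter check, sketched below as a hypothetical helper, also confirms that the node is a character device carrying the numbers the x86_64 trace in this report reads from sysfs (10:32). char_node_matches() is an illustrative name; it relies only on stat(2) and the glibc major()/minor() macros.

```c
#define _DEFAULT_SOURCE          /* glibc: expose stat(), major(), minor() */
#include <sys/stat.h>
#include <sys/sysmacros.h>

/*
 * Hypothetical check: returns 1 if `path` exists, is a character
 * device, and has the expected major:minor numbers; 0 otherwise.
 */
int char_node_matches(const char *path, unsigned maj, unsigned min)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return 0;                /* node missing: the ia64 failure mode */
    if (!S_ISCHR(st.st_mode))
        return 0;                /* exists but is not a character device */
    return major(st.st_rdev) == maj && minor(st.st_rdev) == min;
}
```

After modprobe, char_node_matches("/dev/vxportal", 10, 32) returning 0 would correspond to the ia64 result under Actual Results.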

Additional info:

Our misc_register call:

extern struct file_operations vxportal_fops;

STATIC struct miscdevice vxportal_dev = {
        /* [XXX] akale select a minor number */
        32,             /* minor */
        "vxportal",     /* name */
        &vxportal_fops, /* fops */
        {NULL, NULL},   /* next, prev */
        NULL,           /* dev */
};

STATIC int __init
init_vxportal(
        void)
{
        int     error;

        error = misc_register(&vxportal_dev);
        return error;
}


strace of udevd on ia64 (FAILS):

[pid 19178] lstat("/sys/class/misc/vxportal", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 19178] lstat("/sys/class/misc/vxportal", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 19178] open("/sys/class/misc/vxportal", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 4
[pid 19178] fstat(4, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 19178] fcntl(4, F_SETFD, FD_CLOEXEC) = 0
[pid 19178] getdents64(0x4, 0x600000000003c508, 0x4000) = 48 <<--!!!!!
[pid 19178] getdents64(0x4, 0x600000000003c508, 0x4000) = 0
[pid 19178] close(4)                    = 0
[pid 19178] open("/sys/class/misc/vxportal", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 4
[pid 19178] fstat(4, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 19178] fcntl(4, F_SETFD, FD_CLOEXEC) = 0
[pid 19178] getdents64(0x4, 0x600000000003c508, 0x4000) = 48 <<--!!!!!
[pid 19178] getdents64(0x4, 0x600000000003c508, 0x4000) = 0
[pid 19178] close(4)                    = 0
[pid 19178] munmap(0x2000000000400000, 163840) = 0
[pid 19178] close(3)                    = 0
[pid 19178] exit_group(-1)              = ?
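The ia64 trace shows getdents64 results only as byte counts. As a cross-check, assuming the linux_dirent64 record layout documented in getdents64(2) (8-byte inode, 8-byte offset, 2-byte record length, 1-byte type, then the NUL-terminated name, padded to 8 bytes), 48 bytes is consistent with exactly two records (. and ..), while the 72 bytes in the x86_64 trace below is consistent with three (the third being "dev"). The struct mirrors that documented layout under a local, hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the documented linux_dirent64 record layout. */
struct dirent64_rec {
    uint64_t       d_ino;
    int64_t        d_off;
    unsigned short d_reclen;
    unsigned char  d_type;
    char           d_name[];    /* NUL-terminated name starts at offset 19 */
};

/* Bytes one entry occupies: header + name + NUL, rounded up to 8. */
size_t dirent64_reclen(const char *name)
{
    size_t raw = offsetof(struct dirent64_rec, d_name) + strlen(name) + 1;
    return (raw + 7) & ~(size_t)7;
}

/* Total bytes getdents64 would return for the given directory entries. */
size_t dirent64_total(const char *const names[], size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += dirent64_reclen(names[i]);
    return total;
}
```

Each of ".", "..", and "dev" fits in a 24-byte record, so 2 x 24 = 48 and 3 x 24 = 72.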

strace of udevd on x86_64 (SUCCEEDS):

[pid 19936] lstat("/sys/class/misc/vxportal", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 19936] lstat("/sys/class/misc/vxportal", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 19936] open("/sys/class/misc/vxportal", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 4
[pid 19936] fstat(4, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
[pid 19936] fcntl(4, F_SETFD, FD_CLOEXEC) = 0
[pid 19936] getdents64(4, /* 3 entries */, 4096) = 72  <<--!!!!!!
[pid 19936] lstat("/sys/class/misc/vxportal/dev", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
[pid 19936] stat("/sys/class/misc/vxportal/dev", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
[pid 19936] open("/sys/class/misc/vxportal/dev", O_RDONLY) = 5
[pid 19936] read(5, "10:32\n", 4096)    = 6
[pid 19936] close(5)                    = 0
[pid 19936] getdents64(4, /* 0 entries */, 4096) = 0
[pid 19936] close(4)                    = 0
[pid 19936] open("/sys/class/misc/vxportal/dev", O_RDONLY) = 4
[pid 19936] read(4, "10:32\n", 4096)    = 6
[pid 19936] close(4)                    = 0
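The successful trace shows udevd reading "10:32\n" from the "dev" attribute before creating the node. A minimal sketch of that parsing step follows; parse_dev_attr() is a hypothetical name, not a udev function, and it accepts only "<major>:<minor>" with an optional trailing newline.

```c
#include <stdio.h>

/*
 * Parse a sysfs "dev" attribute such as "10:32\n" into major/minor.
 * Returns 0 on success, -1 if the text is not "<major>:<minor>".
 */
int parse_dev_attr(const char *text, unsigned *maj, unsigned *min)
{
    int consumed = 0;

    if (sscanf(text, "%u:%u%n", maj, min, &consumed) != 2)
        return -1;                       /* missing or malformed numbers */
    /* Allow nothing after the numbers except a single trailing newline. */
    if (text[consumed] != '\0' &&
        !(text[consumed] == '\n' && text[consumed + 1] == '\0'))
        return -1;
    return 0;
}
```

A consumer would then create the node with something like mknod(path, S_IFCHR | mode, makedev(maj, min)), using whatever mode its rules specify.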

Comment 1 Jason Baron 2005-04-14 21:05:16 UTC
Hmm, pretty strange. If you had a small, complete module that reproduced this
issue, it would save me some time. Thanks.

Comment 2 Rob Kenna 2005-04-18 17:20:34 UTC
Reposted to make visible in IT

Comment 4 Hal Prince 2005-04-20 21:15:50 UTC
We have recently noticed that this problem occurs on x86 machines as
well, though it seems easier to reproduce on ia64.

Jason asks for a small complete module that demonstrates the problem.
Here is a small incomplete module; it's just a trimmed-down excerpt
from our vxportal driver described above. It doesn't define
vxportal_fops, but perhaps that could be a NULL pointer anyway.


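#include <linux/module.h>
#include <linux/init.h>
#include <linux/fs.h>
#include <linux/miscdevice.h>

#ifndef STATIC
#define STATIC static   /* assumed: the original build defines STATIC itself */
#endif

/* Minimal placeholder so the excerpt builds on its own; the real
 * driver supplies its own vxportal_fops. */
static struct file_operations vxportal_fops = {
        .owner = THIS_MODULE,
};

STATIC struct miscdevice vxportal_dev = {
        /* [XXX] akale select a minor number */
        .minor = 32,
        .name  = "vxportal",
        .fops  = &vxportal_fops,
};

STATIC int __init
init_vxportal(
        void)
{
        int     error;

        /* Creates /sys/class/misc/vxportal and emits the hotplug event
         * that udevd reacts to. */
        error = misc_register(&vxportal_dev);
        return error;
}

STATIC void __exit
exit_vxportal(
        void)
{
        misc_deregister(&vxportal_dev);
}

module_init(init_vxportal);
module_exit(exit_vxportal);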



Comment 5 Jason Baron 2005-04-20 21:23:02 UTC
OK, thanks. I'll see if I can reproduce this race.

Comment 12 Jason Baron 2005-12-05 20:48:58 UTC
Created attachment 121864
should fix this problem

Comment 13 Jason Baron 2005-12-05 20:51:03 UTC
The above is a quick and dirty fix for this issue. The problem is that the kernel
triggers the hotplug event before the "dev" file is created. Thus, if userspace
happens to run before the file is added, the problem described here will happen.
The patch in comment #12 should fix this, although it's not pretty.
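The ordering problem described above can be modelled in a few lines. The toy model below is hypothetical: every name in it is invented for illustration, it is not kernel code, and it does not reproduce the attached patch (whose contents are not included in this report). The listener runs synchronously, standing in for the worst-case scheduling in which udevd handles the event before the registering code continues.

```c
#include <stdbool.h>

/* Toy stand-in for a sysfs class directory: does "dev" exist yet? */
struct toy_class_dir {
    bool has_dev_attr;
};

/* Stand-in for udevd: invoked when the hotplug event is delivered. */
typedef bool (*hotplug_listener)(const struct toy_class_dir *dir);

/* Worst-case listener: runs immediately and reports what it sees. */
bool listener_sees_dev(const struct toy_class_dir *dir)
{
    return dir->has_dev_attr;
}

/* Racy order from the comment: event first, attribute second. */
bool register_event_first(struct toy_class_dir *dir, hotplug_listener l)
{
    bool seen = l(dir);          /* udevd may run right here */
    dir->has_dev_attr = true;    /* "dev" appears too late for it */
    return seen;
}

/* Safe order: create the attribute, then announce the device. */
bool register_attr_first(struct toy_class_dir *dir, hotplug_listener l)
{
    dir->has_dev_attr = true;
    return l(dir);
}
```

In the racy order the listener sees no "dev" file even though the attribute exists moments later, which matches the ia64 trace followed by a successful ls.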

Comment 14 Jason Baron 2005-12-07 04:39:19 UTC
Can we get the patch from comment #12 tested?

Comment 20 Bob Johnson 2006-04-11 16:48:26 UTC
This issue is on Red Hat Engineering's list of planned work items 
for the upcoming Red Hat Enterprise Linux 4.4 release.  Engineering 
resources have been assigned and barring unforeseen circumstances, Red 
Hat intends to include this item in the 4.4 release.

Comment 22 Jason Baron 2006-05-03 17:22:11 UTC
committed in stream U4 build 34.28. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/


Comment 25 Red Hat Bugzilla 2006-08-10 21:02:36 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0575.html