From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.7.5) Gecko/20041109 Firefox/1.0

Description of problem:
Our FS product contains a module vxportal which needs a device file /dev/vxportal. We have the module init function call misc_register, since we just need a character pseudo-device. The deinit code calls misc_deregister.

We notice that on x86 and x86_64 machines, the code works as expected, namely /dev/vxportal disappears when we rmmod vxportal, and reappears when we modprobe vxportal. But on ia64, there appears to be a race which causes /dev/vxportal not to be created when we modprobe.

By running strace on udevd, we see udevd wake up from its select, open /sys/class/misc/vxportal, and do getdents on this directory. However, getdents returns just . and .., and not the "dev" file that contains the major/minor numbers. Since the directory is empty, udevd does not call mknod to create /dev/vxportal.

If we then ls this directory, we see the "dev" entry. And if we then run start_udev manually, the missing /dev/vxportal will get created.
Version-Release number of selected component (if applicable):
kernel-2.6.9-5.EL

How reproducible:
Always

Steps to Reproduce:
1. make a module that uses misc_register/misc_deregister in its init/deinit routines to create a device file minor 32 name "vxportal"
2. modprobe this module
3. ls /dev/vxportal

Actual Results:
ia64: /dev/vxportal does not exist
x86, x86_64: /dev/vxportal exists

Expected Results:
x86, x86_64, ia64: /dev/vxportal exists

Additional info:
Our misc_register call:

    extern struct file_operations vxportal_fops;

    STATIC struct miscdevice vxportal_dev = {
        /* [XXX] akale select a minor number */
        32,                 /* minor */
        "vxportal",         /* name */
        &vxportal_fops,     /* fops */
        {NULL, NULL},       /* next, prev */
        NULL,               /* dev */
    };

    STATIC int __init
    init_vxportal(
        void)
    {
        int error;

        error = misc_register(&vxportal_dev);

strace of udevd on ia64 (FAILS):

    [pid 19178] lstat("/sys/class/misc/vxportal", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
    [pid 19178] lstat("/sys/class/misc/vxportal", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
    [pid 19178] open("/sys/class/misc/vxportal", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 4
    [pid 19178] fstat(4, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
    [pid 19178] fcntl(4, F_SETFD, FD_CLOEXEC) = 0
    [pid 19178] getdents64(0x4, 0x600000000003c508, 0x4000) = 48 <<--!!!!!
    [pid 19178] getdents64(0x4, 0x600000000003c508, 0x4000) = 0
    [pid 19178] close(4) = 0
    [pid 19178] open("/sys/class/misc/vxportal", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 4
    [pid 19178] fstat(4, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
    [pid 19178] fcntl(4, F_SETFD, FD_CLOEXEC) = 0
    [pid 19178] getdents64(0x4, 0x600000000003c508, 0x4000) = 48 <<--!!!!!
    [pid 19178] getdents64(0x4, 0x600000000003c508, 0x4000) = 0
    [pid 19178] close(4) = 0
    [pid 19178] munmap(0x2000000000400000, 163840) = 0
    [pid 19178] close(3) = 0
    [pid 19178] exit_group(-1) = ?
strace of udevd on x86_64 (SUCCEEDS):

    [pid 19936] lstat("/sys/class/misc/vxportal", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
    [pid 19936] lstat("/sys/class/misc/vxportal", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
    [pid 19936] open("/sys/class/misc/vxportal", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 4
    [pid 19936] fstat(4, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
    [pid 19936] fcntl(4, F_SETFD, FD_CLOEXEC) = 0
    [pid 19936] getdents64(4, /* 3 entries */, 4096) = 72 <<--!!!!!!
    [pid 19936] lstat("/sys/class/misc/vxportal/dev", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
    [pid 19936] stat("/sys/class/misc/vxportal/dev", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
    [pid 19936] open("/sys/class/misc/vxportal/dev", O_RDONLY) = 5
    [pid 19936] read(5, "10:32\n", 4096) = 6
    [pid 19936] close(5) = 0
    [pid 19936] getdents64(4, /* 0 entries */, 4096) = 0
    [pid 19936] close(4) = 0
    [pid 19936] open("/sys/class/misc/vxportal/dev", O_RDONLY) = 4
    [pid 19936] read(4, "10:32\n", 4096) = 6
    [pid 19936] close(4) = 0
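For anyone trying to observe the same window from userspace, here is a minimal sketch of a reader that tolerates a late "dev" attribute: it retries opening the file a few times and parses the MAJOR:MINOR text that the successful trace shows ("10:32\n"). The function names, the retry count and the delay are hypothetical and chosen for illustration only; this is not how udevd itself behaves, and it only papers over the race rather than fixing it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Parse a sysfs "dev" attribute such as "10:32\n" into major/minor.
 * Returns 0 on success, -1 if the text is not in MAJOR:MINOR form. */
static int parse_dev_numbers(const char *text, unsigned *major, unsigned *minor)
{
    assert(text != NULL && major != NULL && minor != NULL);
    return sscanf(text, "%u:%u", major, minor) == 2 ? 0 : -1;
}

/* Hypothetical helper: try to read dev_path up to `attempts` times,
 * sleeping `delay_ms` between tries, so that a reader which runs before
 * the attribute exists gets another chance.
 * Returns 0 on success, -1 if the attribute never became readable. */
static int wait_for_dev_numbers(const char *dev_path, int attempts,
                                long delay_ms,
                                unsigned *major, unsigned *minor)
{
    char buf[64];

    assert(dev_path != NULL);
    for (int i = 0; i < attempts; i++) {
        FILE *f = fopen(dev_path, "r");
        if (f != NULL) {
            char *line = fgets(buf, sizeof buf, f);
            fclose(f);
            if (line != NULL && parse_dev_numbers(buf, major, minor) == 0)
                return 0;
        }
        if (i + 1 < attempts) {
            /* Sleep only between attempts, not after the last one. */
            struct timespec ts = { delay_ms / 1000,
                                   (delay_ms % 1000) * 1000000L };
            nanosleep(&ts, NULL);
        }
    }
    return -1;
}
```

A caller would pass something like "/sys/class/misc/vxportal/dev" with a handful of attempts; on the failing ia64 trace the first attempt would presumably miss and a later one would find the file.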
hmmm, pretty strange. If you had a small complete module that reproduced this issue, it would save me some time. thanks.
Reposted to make visible in IT
We have recently noticed that this problem occurs on x86 machines as well, though it seems easier to reproduce on ia64.

Jason asks for a small complete module that demonstrates the problem. Here is a small incomplete module; it's just a trimmed-down excerpt from our vxportal driver described above. It doesn't define vxportal_fops, but perhaps that could be a NULL pointer anyway.

    STATIC struct miscdevice vxportal_dev = {
        /* [XXX] akale select a minor number */
        32,                 /* minor */
        "vxportal",         /* name */
        &vxportal_fops,     /* fops */
        {NULL, NULL},       /* next, prev */
        NULL,               /* dev */
    };

    STATIC int __init
    init_vxportal(
        void)
    {
        int error;

        error = misc_register(&vxportal_dev);
        return error;
    }

    STATIC void __exit
    exit_vxportal(
        void)
    {
        misc_deregister(&vxportal_dev);
    }

    module_init(init_vxportal);
    module_exit(exit_vxportal);
ok. thanks. i'll see if i can reproduce this race.
Created attachment 121864 [details] should fix this problem
The above is a quick and dirty fix for this issue. The problem is that the kernel sends the hotplug event before the "dev" file is created. Thus, if userspace happens to run before the file is added, udevd finds the class directory without a "dev" entry and the problem described here occurs. The patch in comment #12 should fix this, although it's not pretty.
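To make the ordering problem concrete, here is a small userspace model of it. It is only a toy under stated assumptions: the struct, the listener and both register functions are hypothetical names invented for illustration, the listener is assumed to run the instant the event is sent (the worst case of the race), and none of this is the kernel code or the attached patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for /sys/class/misc/<name>; not a kernel structure. */
struct model_class_dir {
    bool exists;   /* the class directory is visible */
    bool has_dev;  /* the "dev" attribute file is visible */
};

/* A udevd-like listener: returns true if it could create the node,
 * which requires "dev" to be present when the event arrives. */
typedef bool (*hotplug_listener)(const struct model_class_dir *dir);

static bool listener_can_mknod(const struct model_class_dir *dir)
{
    return dir->exists && dir->has_dev;
}

/* Ordering described in the thread: event first, attribute second. */
static bool register_event_before_attr(struct model_class_dir *dir,
                                       hotplug_listener notify)
{
    assert(dir != NULL && notify != NULL);
    dir->exists = true;
    bool node_created = notify(dir);  /* listener sees an empty directory */
    dir->has_dev = true;              /* appears too late for that listener */
    return node_created;
}

/* Reordered variant: attribute first, event second. */
static bool register_attr_before_event(struct model_class_dir *dir,
                                       hotplug_listener notify)
{
    assert(dir != NULL && notify != NULL);
    dir->exists = true;
    dir->has_dev = true;
    return notify(dir);               /* listener always finds "dev" */
}
```

In the first ordering the model ends with "dev" present but no node created, which matches the report: a later ls shows the entry, yet /dev/vxportal is missing until start_udev is run again.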
Can we get the patch from comment #12 tested?
This issue is on Red Hat Engineering's list of planned work items for the upcoming Red Hat Enterprise Linux 4.4 release. Engineering resources have been assigned and barring unforeseen circumstances, Red Hat intends to include this item in the 4.4 release.
committed in stream U4 build 34.28. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2006-0575.html