From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0

Description of problem:
Being an administrator, I often "grep -ri word /etc" to find things. That command now freezes my system with a kernel panic:

kernel panic - not syncing: arch/i386/kernel/semaphore.c: 64 spin_is_locked on uninitialized spinlock f9496c28

system is frozen with caps & scroll lock flashing

After looking at other bugzilla reports and toying with the grep command to narrow the problem, I discovered the panic is triggered when grep gets to /etc/udev/devices and the nvidia* modules within.

PLEASE, before marking this as a duplicate of the twenty million nvidia bugs, let's think about the problem for a minute. I have a feeling that this problem could occur with any non-plain file that may be put in that directory. Normally you wouldn't want to read from device files during a grep. It seems to me the whole idea of putting device files in /etc/udev/devices is flawed in this regard. They should live in another part of the filesystem, maybe even their own root dir, like /udev. I can't be the only one who greps entire root subdirs (/var, /etc, etc).

It may not be RH's responsibility to make sure nvidia drivers don't crash the system, but it should be RH's responsibility to ensure a safe and logical operating environment for these third party drivers.

Strangely enough, grep's --devices=skip didn't solve the problem -- it still crashed even with that option. That also seems a little fishy to me.

Version-Release number of selected component (if applicable):
udev-039-10.FC3.6

How reproducible:
Always

Steps to Reproduce:
1. have 3rd party device files in /etc/udev/devices (tested with nvidia)
2. grep -ri word /etc

Actual Results:
kernel panic / complete freeze

Expected Results:
grep completes as normal

Additional info:
nvidia driver version: NVIDIA-Linux-x86-1.0-6629-pkg1.run
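For anyone who wants to check whether their own /etc contains non-plain files without reading any of them, a minimal sketch follows. The helper name list_special_files is made up for illustration and is not part of udev or of this report. It relies only on find's -type tests, which work from file metadata rather than file contents.

```shell
# Hypothetical helper (not from the report): print every entry under a
# directory that is not a regular file, a directory, or a symlink, i.e.
# device nodes, FIFOs, and sockets.
# find classifies entries from lstat() metadata and does not open them.
list_special_files() {
    find "$1" ! -type f ! -type d ! -type l -print
}
```

Running `list_special_files /etc` should show the nodes under /etc/udev/devices on an affected system; whether merely listing them is safe with the nvidia module loaded has not been verified here.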
I have another question. Should it be safe to grep a device node regardless of whether it is in use or not? If it should always be safe then this is a bug in the nvidia drivers...
Hmm, it is not necessarily safe... on various chipsets, reading a register triggers some action, and if you trigger the wrong one you are doomed. But good point with your bugzilla. I will move it to another place.
I don't like "/udev" :-)

Some proposals:
- /etc/udev/static_dev.d/start_udev <= the current list of devices in /sbin/start_udev
- /etc/udev/static_dev.d/nvidia
- /etc/udev/static_dev.d/mga_vid

Or:
- /etc/udev/start_udev.d/default <= the current list of devices in /sbin/start_udev

The content:
DEV = tty1 ppp ...

For example, the nvidia package would use:
/etc/makedev.d/nvidia
/etc/udev/start_udev.d/nvidia
you caught a nice bug in the nvidia driver; please report that to nvidia, at least as far as the kernel crash is concerned.
Could this be the same thing as https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=141294 ??
*** Bug 141294 has been marked as a duplicate of this bug. ***
Yes, it sounds like the same problem. His output looks like what I saw except I get a panic and he only got a segfault. Perhaps it is card and/or driver state dependent. The root of the problem still exists however -- having device nodes in /etc and/or device nodes that are not readable without panics. I will try to notify nvidia of the bug.
A workaround for grep users (such as the original poster): grep -D skip (aka grep --devices=skip) will cause grep to skip device nodes, FIFOs, and sockets.
My apologies (and sorry for again adding noise to the email of those watching this bug). When I posted my previous comment on this bug, I didn't see the original poster's comment:

> Strangely enough, grep's --devices=skip didn't solve the problem -- it
> still crashed even with that option. That also seems a little fishy
> to me
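Since --devices=skip reportedly did not prevent the panic, another possible workaround is to let find pick out regular files and hand only those to grep, so grep is never given a device node in the first place. This is a sketch under the assumption that the panic is triggered by opening or reading the node; it has not been tested against the nvidia nodes, and the function name grep_regular_only is made up for illustration.

```shell
# Hypothetical helper (not from the report): case-insensitive search of
# regular files only.
# find's "-type f" test uses file metadata, so device nodes, FIFOs, and
# sockets are filtered out before grep is ever started on them.
# -H makes grep print the file name even when only one file is passed.
grep_regular_only() {
    pattern=$1
    dir=$2
    find "$dir" -type f -exec grep -iH -- "$pattern" {} +
}
```

Usage would be `grep_regular_only word /etc` in place of `grep -ri word /etc`. Symlinks are not followed, which differs slightly from what grep -r does with them.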
I've started a thread on the nvnews NVIDIA support forums. We'll see if anyone (at NV hopefully) has anything to say.

http://www.nvnews.net/vbulletin/showthread.php?p=533894#post533894

PS: I'm 99% positive I tested the grep --devices=skip option *twice*, once with the option manually on the cmd line and a second time with the option in a site-wide csh alias, and both times it panicked. I'm too scared of data corruption and rebooting my system (a production workstation/server!!) to try again unless absolutely necessary. It would be nice if one other person could grep their nv* device files to confirm that it's not specific to my particular card, driver version, kernel version, etc.
Does the same thing for me and several people I know. My machine panics daily because updatedb hits this device file. updatedb skips all of /dev, but some undeniably bright person decided to make the default location for all of udev /etc/udev. Did someone make a typo in the scripts for the package? This can't be intentional.
/etc/udev/devices was meant as a "last resort" and should not be used as a standard place. rawhide has a better mechanism, which I will document soon.