This is with kernel 2.6.9-55.0.2.EL.P1smp. This looks a lot like CVE-2007-3104; however, this version of the kernel seems to already include a patch for that issue.

The relevant kernel log:

Nov 12 17:33:21 myhost kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Nov 12 17:33:21 myhost kernel: printing eip:
Nov 12 17:33:21 myhost kernel: c018fd3a
Nov 12 17:33:21 myhost kernel: *pde = 2207c001
Nov 12 17:33:21 myhost kernel: Oops: 0000 [#1]
Nov 12 17:33:21 myhost kernel: SMP
Nov 12 17:33:21 myhost kernel: Modules linked in: pcspkr vmmemctl(U) md5 ipv6 ipt_NOTRACK iptable_raw ipt_REJECT ipt_state iptable_filter iptable_nat ip_conntrack ip_tables microcode vmhgfs(U) dm_mod button battery ac pcnet32 vmxnet(U) mii bonding(U) floppy raid1 megaraid_mbox megaraid_mm megaraid_sas ext3 jbd ata_piix libata mptscsih mptsas mptspi mptscsi mptbase sd_mod scsi_mod
Nov 12 17:33:21 myhost kernel: CPU: 0
Nov 12 17:33:20 myhost kernel: EIP: 0060:[<c018fd3a>] Not tainted VLI
Nov 12 17:33:21 myhost kernel: EFLAGS: 00210246 (2.6.9-55.0.2.EL.P1smp)
Nov 12 17:33:21 myhost kernel: EIP is at sysfs_readdir+0x123/0x187
Nov 12 17:33:21 myhost kernel: eax: 00000000 ebx: f7ba2240 ecx: ffffffff edx: 00000000
Nov 12 17:33:21 myhost kernel: esi: f7ba2244 edi: 00000000 ebp: cb9c0204 esp: c76dbf60
Nov 12 17:33:22 myhost kernel: ds: 007b es: 007b ss: 0068
Nov 12 17:33:22 myhost kernel: Process udevstart (pid: 2656, threadinfo=c76db000 task=f43fd3f0)
Nov 12 17:33:22 myhost kernel: Stack: bfeff5b8 f7f08200 c016b6d5 c76dbfa0 da30b4c0 c03325c0 da30b4c0 c5375208
Nov 12 17:33:22 myhost kernel: c016b6d5 c016b351 c76dbfa0 ffffffda 08a3999c da30b4c0 00000000 c016b98b
Nov 12 17:33:22 myhost kernel: 08a39a04 08a399ec 00000f98 ffffffea 00000005 08a3999c 003e3ff4 c76db000
Nov 12 17:33:22 myhost kernel: Call Trace:
Nov 12 17:33:22 myhost kernel: [<c016b6d5>] filldir64+0x0/0x11a
Nov 12 17:33:22 myhost kernel: [<c016b6d5>] filldir64+0x0/0x11a
Nov 12 17:33:22 myhost kernel: [<c016b351>] vfs_readdir+0x7d/0xa5
Nov 12 17:33:22 myhost kernel: [<c016b98b>] sys_getdents64+0x80/0xba
Nov 12 17:33:22 myhost kernel: [<c02d6093>] syscall_call+0x7/0xb

When disassembling the corresponding vmlinux from the kernel-debuginfo rpm, I see the following:

0xc018fd2a <sysfs_readdir+275>: mov %ebx,%eax
0xc018fd2c <sysfs_readdir+277>: call 0xc018e938 <sysfs_get_name>
0xc018fd31 <sysfs_readdir+282>: mov %eax,%edx
0xc018fd33 <sysfs_readdir+284>: or $0xffffffff,%ecx
0xc018fd36 <sysfs_readdir+287>: xor %eax,%eax
0xc018fd38 <sysfs_readdir+289>: mov %edx,%edi
0xc018fd3a <sysfs_readdir+291>: repnz scas %es:(%edi),%al
0xc018fd3c <sysfs_readdir+293>: not %ecx
0xc018fd3e <sysfs_readdir+295>: dec %ecx
0xc018fd3f <sysfs_readdir+296>: movzwl 0x1c(%ebx),%eax

* The faulting offset 0x123 is 291 in decimal:

(gdb) p/d 0x123
$2 = 291

* The offending instruction is therefore the repnz scas at sysfs_readdir+291, which would seem to be the strlen in the following:

439 name = sysfs_get_name(next);
440 len = strlen(name);
441 ino = next->s_ino;

* edi and edx are both zero in the register dump.

So sysfs_get_name() is returning NULL; it is unclear why.

This problem has occurred several times, on both virtual hardware (VMware) and real hardware.
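For readers less familiar with the inlined strlen idiom in the disassembly above, here is a minimal userspace sketch of what the or/xor/repnz scas/not/dec sequence computes, plus the offset arithmetic from the oops. The names scas_strlen and fault_offset are illustrative only and do not exist in the kernel, and the function start address used in the note below is derived by subtracting 0x123 from the reported EIP rather than taken from the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace model of the inlined strlen() sequence:
 *   or $0xffffffff,%ecx ; xor %eax,%eax ; mov %edx,%edi ;
 *   repnz scas %es:(%edi),%al ; not %ecx ; dec %ecx
 * scas_strlen is a hypothetical name used only for this illustration.
 */
static uint32_t scas_strlen(const char *s)
{
    /* The kernel code has no such check; with s == NULL the first
     * scas reads address 0 and faults, as in the oops. */
    assert(s != NULL);

    uint32_t ecx = 0xffffffffu;  /* or $0xffffffff,%ecx */
    const char *edi = s;         /* mov %edx,%edi */

    /* repnz scas: one decrement of ecx per byte compared, stopping
     * once a byte equals %al (zero). The "ecx reached 0" exit of
     * repnz is ignored here because it is not reachable in practice. */
    for (;;) {
        ecx--;
        if (*edi++ == '\0')
            break;
    }

    ecx = ~ecx;                  /* not %ecx: bytes scanned, NUL included */
    ecx--;                       /* dec %ecx: drop the terminating NUL */
    return ecx;
}

/* Offset of an instruction within its function, as printed in
 * "sysfs_readdir+0x123". */
static uint32_t fault_offset(uint32_t eip, uint32_t func_start)
{
    return eip - func_start;
}
```

With EIP c018fd3a and a derived function start of c018fc17, fault_offset() gives 0x123 (291), matching the gdb output. The register dump is also consistent with this model: ecx is still ffffffff and edi is 00000000, which is what one would expect if the fault happened on the very first byte read, before any decrement.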
Created attachment 364326 [details]
proposed patch

It seems that sysfs_readdir() walks the list without holding dentry->d_inode->i_sem, so a simple first guess is to wrap the traversal in down(i_sem)/up(i_sem).

NOTE: this patch is _totally_ untested; it has not even been compile-tested. I am sorry about that, but I can't reserve a RHEL4 machine to test on (RHTS takes too long to reserve a machine...).

Could you try it?
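To make the locking idea concrete, here is a userspace analogue, not the attached patch (whose contents are not reproduced in this report). It assumes the problem is a reader walking a list while another thread modifies it, and uses a pthread mutex in place of i_sem. The types struct entry and struct dir and both function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for a directory and its child list. */
struct entry {
    const char *name;
    struct entry *next;
};

struct dir {
    pthread_mutex_t lock;       /* plays the role of inode->i_sem */
    struct entry *children;
};

/* Reader: walks the list only while holding the lock, the analogue
 * of bracketing the traversal with down(i_sem) and up(i_sem). */
static size_t dir_total_name_len(struct dir *d)
{
    assert(d != NULL);
    size_t total = 0;

    pthread_mutex_lock(&d->lock);
    for (struct entry *e = d->children; e != NULL; e = e->next)
        total += strlen(e->name);
    pthread_mutex_unlock(&d->lock);

    return total;
}

/* Writer: unlinks the first child under the same lock, so a reader
 * never observes a half-removed entry. */
static void dir_remove_first(struct dir *d)
{
    assert(d != NULL);

    pthread_mutex_lock(&d->lock);
    if (d->children != NULL)
        d->children = d->children->next;
    pthread_mutex_unlock(&d->lock);
}
```

The point of the sketch is only that reader and writer must take the same lock; whether that is sufficient for sysfs_readdir() in this kernel is exactly what the untested patch would need to confirm. On older toolchains this needs to be built with -pthread.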
Created attachment 364327 [details]
updated version

This one is better. Please use this one.
Hi Andrew,

Has this happened since you submitted the bug? How many times? There is unfortunately very little information to go on regarding how udev got down that path in the first place.
Hello Andrew,

Could you answer James' questions above? In addition, I have a few of my own:

1. How can it be reproduced?
2. Did it occur on only one machine, or on every machine?
3. What is the full boot log?

Thanks.
Hi James, Amerigo, I've seen this bug on several machines within our operations network. In every case, RPMs were being upgraded, and we tracked it down to the udev rpm running udevstart (during which it iterates through /sys). I saw it several times, but was not able to reproduce it on command. thanks, Andrew
Thanks, Andrew. Would you mind trying the attached patch? If you just need a patched kernel RPM, just say so and I will build one.
I have a report from another customer (on RHEL4) who produced this same panic signature by running "find" on /sys
(In reply to comment #27)
> I have a report from another customer (on RHEL4) who produced this same panic
> signature by running "find" on /sys

How often can you catch it? I remember that I also tried running 'find', but had no luck reproducing it.

Thanks.
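For anyone who wants to try harder than a single 'find' run, here is a small sketch of a repeated directory walk. It is not a confirmed reproducer; it only repeats the readdir(3) traversal that find and udevstart perform (glibc typically implements readdir with getdents64, the system call in the trace). The names walk_tree and stress_walk and the depth limit of 16 are arbitrary choices for this illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Recursively list a tree, roughly like find(1). Returns the number
 * of entries seen, or -1 if the top-level directory cannot be opened.
 * Symlinks are not followed (lstat), since /sys is full of them. */
static long walk_tree(const char *path, int depth)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    long count = 0;
    struct dirent *de;
    while ((de = readdir(dir)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        count++;

        char child[PATH_MAX];
        int n = snprintf(child, sizeof child, "%s/%s", path, de->d_name);
        if (n < 0 || (size_t)n >= sizeof child)
            continue;   /* path too long, skip descending */

        struct stat st;
        if (depth > 0 && lstat(child, &st) == 0 && S_ISDIR(st.st_mode)) {
            long sub = walk_tree(child, depth - 1);
            if (sub > 0)
                count += sub;
        }
    }
    closedir(dir);
    return count;
}

/* Walk the same tree several times; returns the entry count of the
 * last pass, or -1 if the root could not be opened. */
static long stress_walk(const char *root, int passes)
{
    long last = -1;
    for (int i = 0; i < passes; i++)
        last = walk_tree(root, 16);
    return last;
}
```

A call such as stress_walk("/sys", 1000) would only be expected to matter if something else is changing /sys at the same time, for example the RPM upgrade activity Andrew described. That is an assumption based on the locking theory above, not something established in this report.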
Event posted on 12-09-2009 05:12pm JST by mfuruta

Hi Takahashi-san,

Thank you for the input from your customer!

I understand your customer's situation: they cannot provide a vmcore to us and only want to track BZ#481374. Since that BZ has already been linked to this IT ticket, you can track the BZ from this ticket.

Thanks in advance.

Regards,
Masaki Furuta

Internal Status set to 'Waiting on Customer'
Status set to: Waiting on Client

This event sent from IssueTracker by mfuruta
issue 362171
If anyone could provide steps to reproduce this, or test the proposed patch, it would be helpful.
I know for my part the customer sees this randomly. I can get them to test the patch though.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.