Bug 481374 - udevstart causes kernel oops when udev rpm is installed
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.5
Hardware: i386 Linux
Priority: high
Severity: high
Assigned To: Cong Wang
QA Contact: Martin Jenner
Depends On:
Blocks: 583726
Reported: 2009-01-23 15:02 EST by Andrew Elmore
Modified: 2013-09-29 22:08 EDT

Doc Type: Bug Fix
Last Closed: 2010-05-07 18:19:08 EDT


Attachments
updated version (727 bytes, patch)
2009-10-10 01:31 EDT, Cong Wang

Description Andrew Elmore 2009-01-23 15:02:33 EST
This is with kernel 2.6.9-55.0.2.EL.P1smp.

This looks a lot like CVE-2007-3104; however, this version of the kernel seems to have a patch for that issue included.

The relevant kernel log:
Nov 12 17:33:21 myhost kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Nov 12 17:33:21 myhost kernel:  printing eip:
Nov 12 17:33:21 myhost kernel: c018fd3a
Nov 12 17:33:21 myhost kernel: *pde = 2207c001
Nov 12 17:33:21 myhost kernel: Oops: 0000 [#1]
Nov 12 17:33:21 myhost kernel: SMP
Nov 12 17:33:21 myhost kernel: Modules linked in: pcspkr vmmemctl(U) md5 ipv6 ipt_NOTRACK iptable_raw ipt_REJECT ipt_state iptable_filter iptable_nat ip_conntrack ip_tables microcode vmhgfs(U) dm_mod button battery ac pcnet32 vmxnet(U) mii bonding(U) floppy raid1 megaraid_mbox megaraid_mm megaraid_sas ext3 jbd ata_piix libata mptscsih mptsas mptspi mptscsi mptbase sd_mod scsi_mod
Nov 12 17:33:21 myhost kernel: CPU:    0
Nov 12 17:33:20 myhost kernel: EIP:    0060:[<c018fd3a>]    Not tainted VLI
Nov 12 17:33:21 myhost kernel: EFLAGS: 00210246   (2.6.9-55.0.2.EL.P1smp)
Nov 12 17:33:21 myhost kernel: EIP is at sysfs_readdir+0x123/0x187
Nov 12 17:33:21 myhost kernel: eax: 00000000   ebx: f7ba2240   ecx: ffffffff   edx: 00000000
Nov 12 17:33:21 myhost kernel: esi: f7ba2244   edi: 00000000   ebp: cb9c0204   esp: c76dbf60
Nov 12 17:33:22 myhost kernel: ds: 007b   es: 007b   ss: 0068
Nov 12 17:33:22 myhost kernel: Process udevstart (pid: 2656, threadinfo=c76db000 task=f43fd3f0)
Nov 12 17:33:22 myhost kernel: Stack: bfeff5b8 f7f08200 c016b6d5 c76dbfa0 da30b4c0 c03325c0 da30b4c0 c5375208
Nov 12 17:33:22 myhost kernel:        c016b6d5 c016b351 c76dbfa0 ffffffda 08a3999c da30b4c0 00000000 c016b98b
Nov 12 17:33:22 myhost kernel:        08a39a04 08a399ec 00000f98 ffffffea 00000005 08a3999c 003e3ff4 c76db000
Nov 12 17:33:22 myhost kernel: Call Trace:
Nov 12 17:33:22 myhost kernel:  [<c016b6d5>] filldir64+0x0/0x11a
Nov 12 17:33:22 myhost kernel:  [<c016b6d5>] filldir64+0x0/0x11a
Nov 12 17:33:22 myhost kernel:  [<c016b351>] vfs_readdir+0x7d/0xa5
Nov 12 17:33:22 myhost kernel:  [<c016b98b>] sys_getdents64+0x80/0xba
Nov 12 17:33:22 myhost kernel:  [<c02d6093>] syscall_call+0x7/0xb

When disassembling the corresponding vmlinux from the kernel-debuginfo rpm, I see the following disassembly:
0xc018fd2a <sysfs_readdir+275>: mov    %ebx,%eax
0xc018fd2c <sysfs_readdir+277>: call   0xc018e938 <sysfs_get_name>
0xc018fd31 <sysfs_readdir+282>: mov    %eax,%edx
0xc018fd33 <sysfs_readdir+284>: or     $0xffffffff,%ecx
0xc018fd36 <sysfs_readdir+287>: xor    %eax,%eax
0xc018fd38 <sysfs_readdir+289>: mov    %edx,%edi
0xc018fd3a <sysfs_readdir+291>: repnz scas %es:(%edi),%al
0xc018fd3c <sysfs_readdir+293>: not    %ecx
0xc018fd3e <sysfs_readdir+295>: dec    %ecx
0xc018fd3f <sysfs_readdir+296>: movzwl 0x1c(%ebx),%eax
  * (gdb) p/d 0x123
    $2 = 291
  * the offending instruction is the repnz scas, which would seem to be the strlen in the following:
439                 name = sysfs_get_name(next);
440                 len = strlen(name);
441                 ino = next->s_ino;
  * and edi/edx are both zero, so sysfs_get_name() is returning NULL. It is unclear why.
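
As a user-space illustration of the analysis above (not kernel code and not a proposed fix), the sketch below models the failing pattern: a name lookup that can return NULL, followed by strlen(). The names fake_dirent, fake_get_name and safe_name_len are hypothetical stand-ins invented for this example. A NULL check like this would only avoid the fault at strlen(); it would not explain why sysfs_get_name() returned NULL in the first place.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's sysfs_dirent; only the fields
 * needed for this illustration are modeled. */
struct fake_dirent {
    const char *name;      /* NULL models the state seen in the oops */
    unsigned short ino;
};

/* Hypothetical stand-in for sysfs_get_name(). */
static const char *fake_get_name(const struct fake_dirent *sd)
{
    return sd ? sd->name : NULL;
}

/* Returns the name length, or -1 if no name is available.
 * Calling strlen() directly on a NULL name is the pattern that the
 * "repnz scas" at the faulting EIP corresponds to. */
static long safe_name_len(const struct fake_dirent *sd)
{
    const char *name = fake_get_name(sd);

    if (name == NULL)
        return -1;
    return (long)strlen(name);
}
```

A caller would skip or report any entry for which safe_name_len() returns -1 instead of passing the NULL pointer on.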

This problem has occurred several times, on both virtual hardware (VMware) and real hardware.
Comment 6 Cong Wang 2009-10-10 01:26:06 EDT
Created attachment 364326
proposed patch

It seems that sysfs_readdir() operates on the list without holding dentry->d_inode->i_sem, so a simple guess would be to just add down(i_sem)/up(i_sem).

NOTE: this patch is _totally_ untested; it has not even been compile-tested. I am sorry for this, but I can't reserve a RHEL4 machine to test on (RHTS takes too long to reserve a machine...).

Can you try it?
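
The locking idea described in this comment can be sketched in user space as follows. This is not the attached patch, whose contents are not reproduced in this report. It is a minimal analogy that assumes a pthread mutex in place of i_sem, with hypothetical names (struct dir, struct node, dir_add, dir_total_name_len) chosen only for illustration. The point is that the reader and every writer of the child list take the same lock.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical child entry in a directory's list. */
struct node {
    const char *name;
    struct node *next;
};

/* Hypothetical directory; the mutex plays the role of i_sem. */
struct dir {
    pthread_mutex_t lock;
    struct node *children;
};

/* Writer: links a new child in while holding the lock. */
static void dir_add(struct dir *d, struct node *n)
{
    pthread_mutex_lock(&d->lock);
    n->next = d->children;
    d->children = n;
    pthread_mutex_unlock(&d->lock);
}

/* Reader: walks the list under the same lock, so it never observes
 * a half-updated list. Returns the sum of all name lengths. */
static size_t dir_total_name_len(struct dir *d)
{
    size_t total = 0;

    pthread_mutex_lock(&d->lock);     /* analogous to down(i_sem) */
    for (struct node *n = d->children; n != NULL; n = n->next)
        total += strlen(n->name);
    pthread_mutex_unlock(&d->lock);   /* analogous to up(i_sem) */
    return total;
}
```

Whether missing locking is actually the cause of the oops is not established anywhere in this report; the sketch only shows the shape of the suggested change.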
Comment 7 Cong Wang 2009-10-10 01:31:49 EDT
Created attachment 364327
updated version

This one is better. Use this.
Comment 16 James M. Leddy 2009-10-14 10:03:43 EDT
Hi Andrew,

Has this happened since you submitted the bug? How many times? There is unfortunately very little information to go on regarding how udev got down that path in the first place.
Comment 18 Cong Wang 2009-10-20 22:46:05 EDT
Hello, Andrew,

Could you answer James' questions above? In addition, here are mine:

1. How can it be reproduced?
2. Did it occur on only one machine, or on every machine?
3. Can you provide the full boot log?

Thanks.
Comment 20 Andrew Elmore 2009-10-21 19:57:57 EDT
Hi James, Amerigo,

I've seen this bug on several machines within our operations network.  In every case, RPMs were being upgraded, and we tracked it down to the udev rpm running udevstart (during which it iterates through /sys).

I saw it several times, but was not able to reproduce it on command.

thanks,
Andrew
Comment 21 Cong Wang 2009-10-21 21:29:15 EDT
Thanks, Andrew.

Would you mind trying the attached patch? If you just need a patched kernel RPM, just say so and I will build one.
Comment 27 Guy Streeter 2009-11-25 14:30:07 EST
I have a report from another customer (on RHEL4) who produced this same panic signature by running "find" on /sys
Comment 28 Cong Wang 2009-11-27 04:47:05 EST
(In reply to comment #27)
> I have a report from another customer (on RHEL4) who produced this same panic
> signature by running "find" on /sys  

How often can you catch it?
I remember I also tried running 'find', but had no luck reproducing it.

Thanks.
Comment 31 Issue Tracker 2009-12-09 03:12:14 EST
Event posted on 12-09-2009 05:12pm JST by mfuruta@redhat.com

Hi Takahashi-san,

Thank you for your input from your customer!
I understand your customer's situation: they cannot provide a vmcore to us
and just want to track BZ#481374.

In this case, that BZ has already been linked to this IT ticket, so you can
track the BZ on this ticket.

Thanks in advance.

Regards,
Masaki Furuta

Internal Status set to 'Waiting on Customer'
Status set to: Waiting on Client

This event sent from IssueTracker by mfuruta@redhat.com 
 issue 362171
Comment 34 Cong Wang 2010-03-04 21:57:18 EST
If anyone could provide the steps to reproduce this, or test the proposed patch, it would be helpful.
Comment 35 James M. Leddy 2010-03-05 10:42:25 EST
I know for my part the customer sees this randomly. I can get them to test the patch though.
Comment 38 RHEL Product and Program Management 2010-04-20 02:15:13 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 44 RHEL Product and Program Management 2010-05-07 18:19:08 EDT
Development Management has reviewed and declined this request.  You may appeal
this decision by reopening this request.
