Description of problem:
Crashes when running hal near the end of the reboot sequence. Full dmesg output attached. The problem seems to be related to parports:

lp0: using parport0 (interrupt-driven).
lp0: console ready
ppdev: user-space parallel port driver
ppdev0: registered pardevice
ppdev0: unregistered pardevice
ppdev1: claim the port first
ppdev2: claim the port first
ppdev3: claim the port first
Unable to handle kernel NULL pointer dereference at 0000000000000040 RIP:
<ffffffff801d1d00>{sysfs_readdir+387}
PGD 36f2a067 PUD 36f2b067 PMD 0
Oops: 0000 [1] SMP
CPU 0

Version-Release number of selected component (if applicable):
kernel-2.6.14-1.1709_FC5
hal-0.5.5.1-1
udev-075-5

How reproducible:
Nearly 100%, on multiple kernel versions; eventually I was able to get it on kernel-2.6.14-1.1632_FC5.

Steps to Reproduce:
1. Reboot.

Actual results:
(see full dmesg attached)

Expected results:

Additional info:
Created attachment 121491
dmesg output
Can you try to reproduce this with the latest updated kernel? 1729 should show up at http://people.redhat.com/davej/kernels/Fedora/devel soon.
I can boot this kernel on a non-RAID i686 box in combination with mkinitrd-5.0.12-1. dmesg ends with:

SELinux: initialized (dev autofs, type autofs), uses genfs_contexts
lp0: using parport0 (interrupt-driven).
lp0: console ready
eth0: no IPv6 routers present

which looks OK.

I can boot this kernel on a software-RAID i686 box, but only in combination with mkinitrd-5.0.10-1. dmesg similarly looks OK.

I can't boot this kernel on the software-RAID x86_64 box with either version of mkinitrd: it fails to switchroot and the kernel panics. I think this may be different from bug#169059. Picture of the panic to follow.

Basically, I can't retest this problem on the original x86_64 box.
Created attachment 121693
pic of panic on x86_64 with kernel-2.6.14-1.1729_FC5 and mkinitrd-5.0.12-1
I'm now past the panic and onto an OOPS that's blocking me. I think the panic was bug#169059, and the workaround is mkinitrd-5.0.10-1. I don't know why this wasn't working for me earlier today; I could have sworn I tried this combination. I did apply the rest of today's updates before retesting.

Now I'm getting an OOPS at the end of init. I can use the main console, but there are no prompts in any of the virtual terminals, and "startx" results in a total system hang. The end of dmesg contains:

md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
ieee1394: Host added: ID:BUS[0-00:1023] GUID[0010dc000077077b]
cdrom: open failed.
EXT3 FS on dm-0, internal journal
kjournald starting. Commit interval 5 seconds
EXT3 FS on sda1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Adding 1124508k swap on /dev/sdb1. Priority:-1 extents:1 across:1124508k
Adding 923728k swap on /dev/sda2. Priority:-2 extents:1 across:923728k
Unable to handle kernel NULL pointer dereference at 0000000000000040 RIP:
<ffffffff801d1890>{sysfs_readdir+387}
PGD 7a151067 PUD 7a152067 PMD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in: autofs4 video button battery ac cx88_blackbird parport_pc parport tuner floppy cx8800 cx88_dvb cx8802 cx88xx sata_sil i2c_algo_bit ir_common v4l1_compat tveeprom v4l2_common mt352 or51132 btcx_risc video_buf_dvb videodev dvb_core video_buf ohci1394 ieee1394 nxt200x lgdt330x cx22702 dvb_pll snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm i2c_nforce2 snd_timer snd soundcore i2c_core forcedeth snd_page_alloc ohci_hcd ehci_hcd shpchp dm_snapshot dm_zero dm_mirror dm_mod ext3 jbd raid0 sata_nv libata sd_mod scsi_mod
Pid: 2339, comm: hald Not tainted 2.6.14-1.1729_FC5 #1
RIP: 0010:[<ffffffff801d1890>] <ffffffff801d1890>{sysfs_readdir+387}
RSP: 0018:ffff81007a19feb8 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff81007e9af198 RCX: 0000000000000014
RDX: ffff81007e9afc20 RSI: ffff81007efd3568 RDI: ffff81007e9b0260
RBP: ffff81007a2a3620 R08: 0000000000001e78 R09: 0000000000000004
R10: 0000000000000000 R11: 0000000000000246 R12: ffff81007e9af1a0
R13: ffff81007e9b024c R14: ffff81007e4f8660 R15: 0000000000000013
FS: 00002aaaaaeb2150(0000) GS:ffffffff805fc000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000040 CR3: 000000007a150000 CR4: 00000000000006e0
Process hald (pid: 2339, threadinfo ffff81007a19e000, task ffff81007dbbb040)
Stack: ffff81007fae0b10 ffffffff801a2fa3 ffff81007a19ff48 ffff81007e4f8660
       00000000fffffffe ffff81007fae9990 ffffffff801a2fa3 ffff81007a19ff48
       ffff81007fae9a68 ffffffff801a2e57
Call Trace:<ffffffff801a2fa3>{filldir64+0} <ffffffff801a2fa3>{filldir64+0}
       <ffffffff801a2e57>{vfs_readdir+159} <ffffffff801a30d1>{sys_getdents64+116}
       <ffffffff8010faea>{system_call+126}

Code: 48 8b 40 40 eb 11 48 8b 3d 5b 17 3e 00 be 02 00 00 00 e8 0a
RIP <ffffffff801d1890>{sysfs_readdir+387} RSP <ffff81007a19feb8>
CR2: 0000000000000040
<6>w83627hf 9191-0290: Reading VID from GPIO5
Reading back, I realize that this is the same OOPS as originally reported, so I guess the answer to your request is that kernel-2.6.14-1.1729_FC5 still exhibits the problem.

kernel-2.6.14-1.1729_FC5
hal-0.5.5.1-1
udev-075-4
mkinitrd-5.0.10-1
Ahh, there's at least one other bug that is oopsing in the same place in the sysfs code. I've added a debug patch to 1735 that will print the name of the sysfs file being accessed, which may give some clues.
Created attachment 121736
dmesg output of same oops on kernel-2.6.14-1.1735_FC5
Oops, I only added the debug printk on i386. I'm building 1737 right now, which has the same functionality for x86-64. You can grab it early from http://people.redhat.com/davej/kernels/Fedora/devel in an hour or so.
Created attachment 121795
dmesg output of same oops on kernel-2.6.14-1.1737_FC5

I see:

last sysfs file: /class/vc/vcsa1/dev

which seems consistent with the symptom of the virtual terminals not working.
Dave, a slightly off-topic issue: are your x86_64 kernels all SMP? See bug#174894; it might just be because I'm using your build.
Yes, all the x86-64 kernels are SMP, as UP x86-64 machines are a minority (especially with the advent of dual core, and Intel's EM64T CPUs all have HT). The penalty of locked operations is also a lot less than it is on 32-bit CPUs (Intel still hurts a bit, but on AMD it is negligible).
Back on topic. I noticed after one of my many reboots that 2.6.14-1.1737_FC5 had rebooted without the oops. Unfortunately I didn't capture its dmesg, and I haven't been able to repeat it since. Anyway, apparently the oops only happens about 95% of the time.
Created attachment 121810
dmesg from 2.6.14-1.1674_FC5 without oops

Example dmesg outputs for the same kernel version with and without the oops.
Created attachment 121811
dmesg from 2.6.14-1.1674_FC5 with oops
Three reboots in a row this morning without the oops. Virtual consoles are back. Was this a udev problem?

kernel-2.6.14-1.1740_FC5
mkinitrd-5.0.10-1
udev-076-1
udev changes are more likely; the kernel changes in the last few builds have been fairly benign. Does "ls -R /sys" cause the oops to recur?
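To repeat that check in a loop or on other boxes, a rough Python equivalent of "ls -R /sys" could look like the sketch here. It assumes, without confirmation from this thread, that hald's failing access boils down to plain recursive directory reads; the oops trace itself only shows sys_getdents64 -> vfs_readdir -> sysfs_readdir. The helper name walk_sysfs is hypothetical. os.scandir() reads each directory through readdir(3), which glibc on Linux typically implements with the getdents64 system call. Symlinks are not followed, matching "ls -R".

```python
import os


def walk_sysfs(root="/sys", follow_symlinks=False):
    """Hypothetical helper: recursively list every entry under `root`,
    roughly like `ls -R /sys`.

    Returns (entry_count, errors), where errors is a list of
    (path, OSError) pairs for directories that could not be read.
    """
    count = 0
    errors = []
    stack = [root]  # iterative depth-first walk, no recursion limit issues
    while stack:
        path = stack.pop()
        try:
            # Each scandir() call reads the directory contents via readdir(3).
            with os.scandir(path) as it:
                for entry in it:
                    count += 1
                    try:
                        # Do not descend through symlinks (sysfs has many),
                        # so the walk cannot loop.
                        if entry.is_dir(follow_symlinks=follow_symlinks):
                            stack.append(entry.path)
                    except OSError as exc:
                        errors.append((entry.path, exc))
        except OSError as exc:
            errors.append((path, exc))
    return count, errors


if __name__ == "__main__":
    total, errs = walk_sysfs()
    print(f"{total} entries, {len(errs)} unreadable")
```

Running it right after boot, and again from a login shell, would show whether a plain directory walk is enough to hit the oops or whether something specific to hald's startup timing is involved.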
No problems with "ls -R /sys". I was getting the oops yesterday with 1740, so it definitely wasn't a kernel change that fixed it. I think you can probably blame this one on udev and close it.
OK, please reopen if it recurs, as this one still smells a bit odd to me.