Bug 174188 – oops while running hal during startup

Status:	CLOSED RAWHIDE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Version:	rawhide
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Dave Jones
QA Contact:	Brian Brock

Reported:	2005-11-25 15:32 UTC by John Ellson
Modified:	2015-01-04 22:23 UTC
CC List:	2 users
Last Closed:	2005-12-08 04:13:32 UTC
Type:	---

Attachments
dmesg output (22.64 KB, text/plain) 2005-11-25 15:32 UTC, John Ellson
pic of panic on x86_64 with kernel-2.6.14-1.1729_FC5 and mkinitrd-5.0.12-1 (1.99 MB, image/jpeg) 2005-12-01 16:41 UTC, John Ellson
dmesg output of same oops on kernel-2.6.14-1.1735_FC5 (22.73 KB, text/plain) 2005-12-02 10:48 UTC, John Ellson
dmesg output of same oops on kernel-2.6.14-1.1737_FC5 (22.76 KB, text/plain) 2005-12-03 10:39 UTC, John Ellson
dmesg from 2.6.14-1.1674_FC5 without oops (20.31 KB, text/plain) 2005-12-03 22:01 UTC, John Ellson
dmesg from 2.6.14-1.1674_FC5 with oops (22.39 KB, text/plain) 2005-12-03 22:02 UTC, John Ellson

Description John Ellson 2005-11-25 15:32:47 UTC

Description of problem:
The kernel oopses when hal runs near the end of the boot sequence.  Full dmesg
output is attached.  The problem seems to be related to parports:
  lp0: using parport0 (interrupt-driven).
  lp0: console ready
  ppdev: user-space parallel port driver
  ppdev0: registered pardevice
  ppdev0: unregistered pardevice
  ppdev1: claim the port first
  ppdev2: claim the port first
  ppdev3: claim the port first
  Unable to handle kernel NULL pointer dereference at 0000000000000040 RIP:
<ffffffff801d1d00>{sysfs_readdir+387}
  PGD 36f2a067 PUD 36f2b067 PMD 0
  Oops: 0000 [1] SMP
  CPU 0


Version-Release number of selected component (if applicable):
kernel-2.6.14-1.1709_FC5
hal-0.5.5.1-1
udev-075-5

How reproducible:
Nearly 100%, across multiple kernel versions; eventually able to get
kernel-2.6.14-1.1632_FC5

Steps to Reproduce:
1. reboot
  
Actual results:
(see full dmesg attached)

Expected results:


Additional info:

Comment 1 John Ellson 2005-11-25 15:32:47 UTC

Created attachment 121491 [details]
dmesg output

Comment 2 Dave Jones 2005-12-01 09:52:37 UTC

Can you try to reproduce this with the latest update kernel? 1729 should show
up at http://people.redhat.com/davej/kernels/Fedora/devel soon.

Comment 3 John Ellson 2005-12-01 16:36:02 UTC

I can boot this kernel on a non-raid i686 box in combination with
mkinitrd-5.0.12-1.  dmesg ends with:
   SELinux: initialized (dev autofs, type autofs), uses genfs_contexts
   lp0: using parport0 (interrupt-driven).
   lp0: console ready
   eth0: no IPv6 routers present
which looks OK

I can boot this kernel on a software-raid i686 box, but only in combination with
mkinitrd-5.0.10-1.   dmesg similarly looks OK.

I can't boot this kernel on the software-raid x86_64 box with either version of
mkinitrd.   It fails to switchroot and the kernel panics.  I think this may be
different from bug#169059.   Picture of panic to follow.

Basically, I can't retest this problem on the original x86_64 box.

Comment 4 John Ellson 2005-12-01 16:41:05 UTC

Created attachment 121693 [details]
pic of panic on x86_64 with kernel-2.6.14-1.1729_FC5 and mkinitrd-5.0.12-1

Comment 5 John Ellson 2005-12-02 01:07:44 UTC

I'm now past the panic and onto an OOPS that's blocking me.

I think the panic was bug#169059 and the workaround is mkinitrd-5.0.10-1.  I
don't know why this wasn't working for me earlier today; I could have sworn I
tried this combo.  I did apply the rest of today's updates before retesting.

Now I'm getting an OOPS at the end of init.  I can use the main console, but
there are no prompts in any of the virtual terminals, and "startx" results in a
total system hang.   The end of dmesg contains:

md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
ieee1394: Host added: ID:BUS[0-00:1023]  GUID[0010dc000077077b]
cdrom: open failed.
EXT3 FS on dm-0, internal journal
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sda1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Adding 1124508k swap on /dev/sdb1.  Priority:-1 extents:1 across:1124508k
Adding 923728k swap on /dev/sda2.  Priority:-2 extents:1 across:923728k
Unable to handle kernel NULL pointer dereference at 0000000000000040 RIP:
<ffffffff801d1890>{sysfs_readdir+387}
PGD 7a151067 PUD 7a152067 PMD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in: autofs4 video button battery ac cx88_blackbird parport_pc
parport tuner floppy cx8800 cx88_dvb cx8802 cx88xx sata_sil i2c_algo_bit ir_common
v4l1_compat tveeprom v4l2_common mt352 or51132 btcx_risc video_buf_dvb videodev
dvb_core video_buf ohci1394 ieee1394 nxt200x lgdt330x cx22702 dvb_pll
snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_seq_dummy snd_seq_oss
snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm
i2c_nforce2 snd_timer snd soundcore i2c_core forcedeth snd_page_alloc ohci_hcd
ehci_hcd shpchp dm_snapshot
dm_zero dm_mirror dm_mod ext3 jbd raid0 sata_nv libata sd_mod scsi_mod
Pid: 2339, comm: hald Not tainted 2.6.14-1.1729_FC5 #1
RIP: 0010:[<ffffffff801d1890>] <ffffffff801d1890>{sysfs_readdir+387}
RSP: 0018:ffff81007a19feb8  EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff81007e9af198 RCX: 0000000000000014
RDX: ffff81007e9afc20 RSI: ffff81007efd3568 RDI: ffff81007e9b0260
RBP: ffff81007a2a3620 R08: 0000000000001e78 R09: 0000000000000004
R10: 0000000000000000 R11: 0000000000000246 R12: ffff81007e9af1a0
R13: ffff81007e9b024c R14: ffff81007e4f8660 R15: 0000000000000013
FS:  00002aaaaaeb2150(0000) GS:ffffffff805fc000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000040 CR3: 000000007a150000 CR4: 00000000000006e0
Process hald (pid: 2339, threadinfo ffff81007a19e000, task ffff81007dbbb040)
Stack: ffff81007fae0b10 ffffffff801a2fa3 ffff81007a19ff48 ffff81007e4f8660
       00000000fffffffe ffff81007fae9990 ffffffff801a2fa3 ffff81007a19ff48
       ffff81007fae9a68 ffffffff801a2e57
Call Trace:<ffffffff801a2fa3>{filldir64+0} <ffffffff801a2fa3>{filldir64+0}
       <ffffffff801a2e57>{vfs_readdir+159} <ffffffff801a30d1>{sys_getdents64+116}
       <ffffffff8010faea>{system_call+126}

Code: 48 8b 40 40 eb 11 48 8b 3d 5b 17 3e 00 be 02 00 00 00 e8 0a
RIP <ffffffff801d1890>{sysfs_readdir+387} RSP <ffff81007a19feb8>
CR2: 0000000000000040
 <6>w83627hf 9191-0290: Reading VID from GPIO5
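
As a reading aid for the trace above, here is a minimal Python sketch (not kernel tooling) that decodes the first instruction in the Code: line and checks it against RAX and CR2. It assumes the Code: bytes begin at the faulting RIP. The helper name decode_mov_disp8 is hypothetical, and it handles only the one encoding seen here: REX.W prefix 0x48, opcode 0x8B, 8-bit displacement.

```python
# Register order used by the 3-bit fields of the ModRM byte
# (valid only when no REX.R/REX.B extension bits are set, i.e. prefix 0x48).
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_disp8(code):
    """Decode 'mov disp8(%base),%dest' encoded as 48 8B /r with mod=01.

    Hypothetical helper for illustration: returns (dest, base, disp) and
    raises ValueError for any other encoding.
    """
    if len(code) < 4 or code[0] != 0x48 or code[1] != 0x8B:
        raise ValueError("not a REX.W MOV r64, r/m64 instruction")
    modrm = code[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:  # rm == 4 would mean a SIB byte follows
        raise ValueError("only the [base + disp8] form is handled")
    disp = code[3] - 256 if code[3] >= 128 else code[3]  # sign-extend disp8
    return REGS[reg], REGS[rm], disp


# Values copied from the oops in this comment.
code = bytes.fromhex("48 8b 40 40")
registers = {"rax": 0x0}
cr2 = 0x40

dest, base, disp = decode_mov_disp8(code)
fault_address = registers[base] + disp
print(f"mov {disp:#x}(%{base}),%{dest} -> load from {fault_address:#x}")
assert fault_address == cr2
```

Under that assumption, the faulting instruction loads from offset 0x40 of the pointer held in RAX. RAX is zero, so the access lands at 0x40, which matches CR2 and the "NULL pointer dereference at 0000000000000040" message.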

Comment 6 John Ellson 2005-12-02 01:16:13 UTC

Reading back I realize that this is the same OOPS as originally reported, so I
guess the answer to your request is that kernel-2.6.14-1.1729_FC5 still exhibits
the problem.

kernel-2.6.14-1.1729_FC5
hal-0.5.5.1-1
udev-075-4
mkinitrd-5.0.10-1

Comment 7 Dave Jones 2005-12-02 06:16:51 UTC

ahh, there's at least one other bug which is oopsing in the same place in sysfs
code.  I've added a debug patch to 1735 which will tell us the name of the sysfs
file that was being accessed, which may give some clues.

Comment 8 John Ellson 2005-12-02 10:48:55 UTC

Created attachment 121736 [details]
dmesg output of same oops on kernel-2.6.14-1.1735_FC5

Comment 9 Dave Jones 2005-12-03 01:49:24 UTC

oops, I only added the debug printk on i386. I'm building 1737 right now, which
has the same functionality for x86-64.  You can grab it early from
http://people.redhat.com/davej/kernels/Fedora/devel in an hour or so.

Comment 10 John Ellson 2005-12-03 10:39:41 UTC

Created attachment 121795 [details]
dmesg output of same oops on kernel-2.6.14-1.1737_FC5

I see:
     last sysfs file: /class/vc/vcsa1/dev
which seems consistent with the symptom of the virtual terminals not working.
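
For comparing several dmesg attachments quickly, a rough Python sketch like the one below could pull out the fields discussed in this report. It assumes the logs use the same line formats quoted in the comments here; the function name summarize_oops and the dictionary keys are made up for illustration.

```python
import re

# Patterns modelled on lines quoted in this bug report.
FAULT_RE = re.compile(r"NULL pointer dereference at ([0-9a-fA-F]+)")
SYMBOL_RE = re.compile(r"\{(\w+)\+(\d+)\}")
PROCESS_RE = re.compile(r"Pid: (\d+), comm: (\S+)")
SYSFS_RE = re.compile(r"last sysfs file: (\S+)")


def summarize_oops(dmesg_text):
    """Return a dict with whichever oops fields are present in the text."""
    summary = {}
    fault = FAULT_RE.search(dmesg_text)
    if fault:
        summary["fault_address"] = int(fault.group(1), 16)
        # The first {symbol+offset} after the fault message names the RIP.
        symbol = SYMBOL_RE.search(dmesg_text, fault.end())
        if symbol:
            summary["function"] = symbol.group(1)
    process = PROCESS_RE.search(dmesg_text)
    if process:
        summary["pid"] = int(process.group(1))
        summary["comm"] = process.group(2)
    sysfs = SYSFS_RE.search(dmesg_text)
    if sysfs:
        summary["last_sysfs_file"] = sysfs.group(1)
    return summary


# Sample lines copied from comments 5 and 10 (two different kernels),
# combined here only to exercise the parser.
sample = """\
Unable to handle kernel NULL pointer dereference at 0000000000000040 RIP:
<ffffffff801d1890>{sysfs_readdir+387}
Pid: 2339, comm: hald Not tainted 2.6.14-1.1729_FC5 #1
last sysfs file: /class/vc/vcsa1/dev
"""
print(summarize_oops(sample))
```

Running it over the "with oops" and "without oops" logs would show at a glance whether the same function, process and sysfs file are involved each time.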

Comment 11 John Ellson 2005-12-03 17:56:06 UTC

Dave, a slightly off-topic issue: are your x86_64 kernels all SMP?  See
bug#174894, though it might just be because I'm using your build.

Comment 12 Dave Jones 2005-12-03 20:54:55 UTC

Yes, all the x86-64 kernels are SMP, as UP x86-64 machines are a minority
(especially with the advent of dual core, and Intel EM64T CPUs all have HT).  The
penalty of locked operations is also a lot less than it is on 32 bit CPUs.
(Intel still hurts a bit, but on AMD it is negligible.)

Comment 13 John Ellson 2005-12-03 21:23:31 UTC

Back on topic.   I noticed after one of my many reboots that 2.6.14-1.1737_FC5
had rebooted without the oops.   Unfortunately I didn't capture its dmesg, and I
haven't been able to repeat since.

Anyway, apparently the oops only happens about 95% of the time.

Comment 14 John Ellson 2005-12-03 22:01:45 UTC

Created attachment 121810 [details]
dmesg from 2.6.14-1.1674_FC5 without oops

Example dmesg outputs for same kernel version with and without oops.

Comment 15 John Ellson 2005-12-03 22:02:36 UTC

Created attachment 121811 [details]
dmesg from 2.6.14-1.1674_FC5 with oops

Comment 16 John Ellson 2005-12-07 13:54:01 UTC

Three reboots in a row this morning without the oops.  Virtual consoles are back.
Was this a udev problem?

kernel-2.6.14-1.1740_FC5
mkinitrd-5.0.10-1
udev-076-1

Comment 17 Dave Jones 2005-12-07 20:12:02 UTC

udev changes are more likely; the changes in the last few kernel builds have
been fairly benign.   Does ls -R /sys cause the oops to reoccur?
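
A scripted equivalent of that check, as a minimal Python sketch: it lists every directory under a root without following symlinks, which on Linux exercises the same directory-read path (the getdents family of system calls) that appears in the call trace. The function name walk_readdir is hypothetical. Note that a kernel oops would not surface as a Python exception; the script only triggers the reads, and dmesg still has to be checked afterwards.

```python
import os


def walk_readdir(root):
    """Read every directory below root; return (dirs_read, errors).

    Symlinks are not followed, mirroring plain `ls -R`.
    """
    dirs_read = 0
    errors = []
    stack = [root]
    while stack:
        path = stack.pop()
        try:
            with os.scandir(path) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
        except OSError as exc:
            # Unreadable or vanished directories are recorded, not fatal.
            errors.append((path, exc))
            continue
        dirs_read += 1
    return dirs_read, errors


# Example use (run as root, then inspect dmesg):
#     count, errors = walk_readdir("/sys")
```

Calling it in a loop may be worthwhile, since the oops was reported as intermittent even with an unchanged kernel.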

Comment 18 John Ellson 2005-12-07 20:36:32 UTC

No problems with "ls -R /sys"

I was getting the oops yesterday with 1740, so it definitely wasn't a kernel
change that fixed it.    

I think you probably can blame this one on udev, and close it.

Comment 19 Dave Jones 2005-12-08 04:13:32 UTC

ok, please reopen if it reoccurs, as this one still smells a bit odd to me.
