Description of problem:
Kernel halts with an Oops while initializing system. Problem is in
usb-uhci module, and is triggered when cups initializes an HP printer
through ptal_mlcd, which is part of the hpoj package.
Version-Release number of selected component (if applicable):
How reproducible:
About 1/3 of the time during boot.
Steps to Reproduce:
1. Install an HP Photosmart 7350 printer, and configure HPOJ to
recognize it. Configure cups to recognize it. Configure cups to start
3. Oops will occur during cups start-up. If it doesn't happen, sorry,
try rebooting again.
Actual results:
Kernel Oops during cups initialization.
Expected results:
Smooth sailing. Happens on occasion.
Additional info:
Sometimes HPOJ fails to initialize, but an oops does not result. The
message in the log is:
ptal-mlcd: ERROR at ExMgr.cpp:2744, dev=<mlc:usb:
photosmart_7350@/dev/usb/lp0>, pid=920, e=19 llioService:
llioRead returns -1, expected=6!
ptal-mlcd: ERROR at ExMgr.cpp:902, dev=<mlc:usb:
photosmart_7350@/dev/usb/lp0>, pid=920, e=19
It's not clear to me that there is a bug in ptal-mlcd. The error
message could have resulted from flaky kernel behavior.
Anyway, I know you are wondering whether I am just going to leave you
with this information, or whether I investigated the oops. I did.
Here's what I found.
(EIP) uhci_submit_bulk_urb [usb-uhci] 0x16
do_select [kernel] 0x153
uhci_submit_urb [usb-uhci] 0x319
usb_submit_urb_Rsmp_93abab4d [usbcore] 0x3d
usblp_read [printer] 0x12e
sys_read [kernel] 0x97
system_call [kernel] 0x33
The code at EIP is:
820: _static int uhci_submit_bulk_urb (struct urb *urb,
struct urb *bulk_urb)
822: uhci_t *s = (uhci_t*) urb->dev->bus->hcpriv;
where offset 0x16 is:
where %ebx is "urb->dev", and 0xcc is the offset to "bus".
The kernel stops here because %ebx is zero.
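To make the failing dereference concrete, here is a minimal user-space sketch of the pointer chain from line 822. The mock_* types and functions are hypothetical stand-ins, not the real 2.4 kernel structures; the pad only places "bus" at offset 0xcc on a 32-bit build. With "dev" equal to zero, the load of "bus" reads address 0xcc, which is not mapped, hence the oops. The guard below only turns that fault into an error return. It does not explain why "dev" was null in the first place.

```c
#include <stddef.h>
#include <errno.h>

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct mock_bus { void *hcpriv; };
struct mock_dev {
    char pad[0xcc];          /* placeholder for the fields that precede bus */
    struct mock_bus *bus;    /* lands at offset 0xcc on a 32-bit build */
};
struct mock_urb { struct mock_dev *dev; int status; };

/* Walks urb->dev->bus->hcpriv, returning NULL if any link is missing. */
static void *mock_get_hcpriv(const struct mock_urb *urb)
{
    if (urb == NULL || urb->dev == NULL || urb->dev->bus == NULL)
        return NULL;
    return urb->dev->bus->hcpriv;
}

/* Mock submit: fails with -ENODEV where the unguarded code would fault. */
static int mock_submit_bulk(struct mock_urb *urb)
{
    void *s = mock_get_hcpriv(urb);

    if (s == NULL)
        return -ENODEV;
    urb->status = -EINPROGRESS;   /* URB is now considered in flight */
    return 0;
}
```

Calling mock_submit_bulk() on a URB whose "dev" is null returns -ENODEV instead of crashing, which is the difference between a logged error and the oops above.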
So in the urb structure, the "dev" field is null. That doesn't seem
right. I have USB 2.0 on the motherboard that has had plenty of time
to start up, and the printer has been on for weeks. I suspect the
initialization of the dev field is flaky, and this could also explain
the soft failures reported by the ptal-mlcd process and why it works
By the way, when ptal-mlcd fails during start up and the kernel does
not oops, I sometimes have to rmmod usb-uhci and then reload it with
A null dev means that URB was completed.
All HC drivers zap ->dev before they decrement device usage.
I'll look into this, although I do not have a printer.
Probably someone used (urb->status==-EINPROGRESS) test again,
or something simple like that.
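For readers following along, here is a single-threaded replay of one ordering that would fit this suspicion. It is only an assumption, not a confirmed root cause: the mock_* and hc_* names are hypothetical, and the real driver code is not reproduced here. The idea is that completion stores the final status first and zaps ->dev afterwards, so a second CPU that trusts the status test can re-arm and submit the URB in between.

```c
#include <stddef.h>
#include <errno.h>

/* Hypothetical stand-ins; not the real kernel structures. */
struct mock_dev { int id; };
struct mock_urb { struct mock_dev *dev; int status; };

/* Completion, split into two steps so they can be interleaved by hand. */
static void hc_store_status(struct mock_urb *urb, int status) { urb->status = status; }
static void hc_zap_dev(struct mock_urb *urb) { urb->dev = NULL; }

/* The suspected test: treats "status != -EINPROGRESS" as "URB is free". */
static int looks_idle(const struct mock_urb *urb)
{
    return urb->status != -EINPROGRESS;
}

/* Mock submit: reports -ENODEV where the real code would dereference NULL. */
static int mock_submit(struct mock_urb *urb)
{
    if (urb->dev == NULL)
        return -ENODEV;
    urb->status = -EINPROGRESS;
    return 0;
}

/* Unlucky ordering: the reader re-arms the URB before ->dev is zapped. */
static int replay_race(void)
{
    struct mock_dev dev = { 1 };
    struct mock_urb urb = { &dev, -EINPROGRESS };

    hc_store_status(&urb, 0);   /* CPU0: completion stores the final status */
    if (!looks_idle(&urb))      /* CPU1: reader sees "not in progress" */
        return -EBUSY;
    urb.dev = &dev;             /* CPU1: re-arms the URB */
    hc_zap_dev(&urb);           /* CPU0: completion finishes, zaps ->dev */
    return mock_submit(&urb);   /* CPU1: submits with dev == NULL */
}

/* Benign ordering: completion is fully done before the reader looks. */
static int replay_ordered(void)
{
    struct mock_dev dev = { 1 };
    struct mock_urb urb = { &dev, -EINPROGRESS };

    hc_store_status(&urb, 0);
    hc_zap_dev(&urb);
    if (!looks_idle(&urb))
        return -EBUSY;
    urb.dev = &dev;
    return mock_submit(&urb);
}
```

In this sketch replay_race() ends in -ENODEV and replay_ordered() succeeds. On a uniprocessor the first ordering would be much harder to hit, which would be consistent with the problem only showing up on SMP.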
BTW, Craig, can you try a Fedora kernel?
I don't think so. According to your Fedora Project pages, I have to
download 3 ISOs and configure a dual boot system. Sorry, but I don't
have the time to do that right now.
If I have misunderstood the situation, and can simply install
another kernel and add it to my grub.conf, then please tell me where
the fedora kernel RPM is, and I'll do it. But I'm guessing that
upgrading only the kernel without Fedora's glibc & other user space
friends may not work too well - true?
Craig, one more thing - please attach the actual dmesg capture
with the oops, if possible.
Re. the Fedora kernel, it can be downloaded separately from isos
and installed on top of RHL 9 userland. Both RHL 9 and FC 1
are NPTL based, so it matches. But let's concentrate on dmesg.
Created attachment 96065 [details]
Kernel oops detail
I copied the oops data manually from the screen. The system log did not have
Awwww, I did not mean to make all this extra work, especially
when I wanted to see if any other messages were present before
I continue to suspect (urb->status==-EINPROGRESS) at this point.
Created attachment 96067 [details]
dmesg preceding oops
Thanks for your concern, Pete, but it really was not a problem. The oops detail
was handy because I copied it into a file to run ksymoops with (and then
realized that modern oops reports pretty much obviate ksymoops).
I did not understand that you wanted to see the messages preceding the oops.
Here they are.
Created attachment 96078 [details]
I installed and booted with Fedora kernel 2.4.22-1.2115.nptlsmp.
Rebooted 6 times. Based on previous behavior, that should have
elicited either the oops or the ptal complaint at least once. Didn't.
So it appears the problem is cured with the kernel. Yet I have
lingering suspicions that this bug results from a timing problem in a
multiprocessor environment, and do not recommend closing this bug
Presumably, this problem has been in the kernel for a while. Yet it
did not show up until I upgraded my P-3 processor to a P-4, installed
an SMP kernel, and enabled both processors. Although I am now using
Fedora's SMP kernel, it appears to be using only one processor! Both
top and gkrellm show plenty of activity on CPU0 and no activity
whatsoever on CPU1, and /proc/cpuinfo shows two processors in the
system. Maybe the Fedora folks broke that part temporarily.
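A way to check the per-CPU claim without top or gkrellm is to read the per-CPU counters the kernel exposes. The sketch below assumes /proc/stat-style text with "cpuN user nice system idle" lines; cpu_busy_jiffies is a hypothetical helper name, and it takes the text as a string so it can be tried on a saved copy of the file.

```c
#include <stdio.h>
#include <string.h>

/*
 * Returns user + nice + system jiffies for cpu<N> from /proc/stat-style
 * text, or -1 if no matching "cpu<N> " line is found or it fails to parse.
 */
static long cpu_busy_jiffies(const char *stat_text, int cpu)
{
    char tag[16];
    const char *line = stat_text;
    size_t tag_len;

    snprintf(tag, sizeof tag, "cpu%d ", cpu);
    tag_len = strlen(tag);

    while (line != NULL && *line != '\0') {
        if (strncmp(line, tag, tag_len) == 0) {
            unsigned long user, nice, sys;

            if (sscanf(line + tag_len, "%lu %lu %lu", &user, &nice, &sys) == 3)
                return (long)(user + nice + sys);
            return -1;
        }
        line = strchr(line, '\n');   /* advance to the start of the next line */
        if (line != NULL)
            line++;
    }
    return -1;
}
```

Sampling this twice a few seconds apart for CPU 0 and CPU 1 and comparing the deltas would show whether the second processor is doing any work at all.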
Are you running the Fedora kernel on top of RHL 9 userland, or did
you yum the whole distro?
In any case, please try this:
If it blows up, please capture the trace for me with a serial
console, digicam, or some other method.
If it refuses to sit on top of RHL 9 userland with rpm -i, --force it.
It should work with old glibc just fine.
I ran the Fedora kernel on top of RH-9, and am using the latest
packages from RHN. Seemed to work fine.
I tried your 2121 build. Didn't blow up, nor require forced install.
Did not oops on me, either. I rebooted 3 times before I got bored
with sublime reboot behavior. In fact, 2121 seemed indistinguishable
from the Fedora 2115 build.
And this is a problem, because both Fedora kernels are labeled as
"SMP", but they are not. They enabled the second processor, but did
not utilize it. To reiterate my concern, I never saw this problem
with my single processor system, and fear that it could be due
to the SMP environment; Fedora's broken SMP could be masking the bug.
I suggest retesting the fix when the SMP is working again.
Craig, did you file a bug against the SMP utilization?
The printer backport was committed to 2.4.22-1.2136, but I cannot
do anything to this bug except close->worksforme, unless your
claims about SMP are resolved, and this bug is not a ticket
to track those.
Sorry, I thought this one was so obvious that I never checked. I just
now filed Bug 112597 for SMP utilization.
I also just now tried 2.4.22-1.2135, and it has the same SMP problem.
2136 has not yet been posted to the Fedora download site.
Craig, can I close this? Is the problem resolved?
Sorry, I cannot provide much more info. I switched away from Red Hat.
All I can tell you is that hpijs-1.4.1 works fine on kernel 2.6.6
(Gentoo), and hpoj appears to now be unnecessary.