Bug 588900

Summary: ibm webcam via USB causes a crash
Product: [Fedora] Fedora
Reporter: Bill Davidsen <davidsen>
Component: kernel
Assignee: Hans de Goede <hdegoede>
Status: CLOSED NEXTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: low
Version: 13
CC: anton, benjavalero, dougsland, gansalmon, hdegoede, itamar, jonathan, kernel-maint, mishu, vwfoxguru, zaitcev
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2011-02-23 18:42:13 UTC
Attachments (no flags set on any):
 - log of crash during boot with one camera attached
 - dmesg from Live-CD boot and working webcam
 - dmesg from RHEL 5 and xawtv
 - usbmon from RHEL 5 and xawtv
 - GFP_ATOMIC and disable message flood
 - messages log from motion program testing
 - Supported palettes from the motion config file

Description Bill Davidsen 2010-05-04 19:24:05 UTC
Created attachment 411380 [details]
log of crash during boot with one camera attached

Description of problem: Booting with an IBM webcam (ibmcam driver) attached causes a crash during boot, and the camera is unusable. NOTE: on 4/21 a fully patched beta worked and was used as a security server (using the 'motion' app). The upgrade to current on 5/2 introduced the problem.


Version-Release number of selected component (if applicable):
2.6.33.2-41.fc13.x86_64
2.6.33.2-57.fc13.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Attach camera before or after boot
  
Actual results: crash reported, camera(s) don't work


Expected results: Continued operation as the 4/21 system provided


Additional info: After upgrade 5/2 glxgears dropped from ~1k fps to 60 fps, and X performance became very sluggish.

Comment 1 Pete Zaitcev 2010-05-04 19:50:53 UTC
The stack dump is just a warning, not a crash. I have the same thing too
with ibmcam. However, the reports of usb_submit_urb failing are worrying.
Perhaps the bandwidth is missing.

Comment 2 Bill Davidsen 2010-05-04 21:09:11 UTC
(In reply to comment #1)
> The stack dump is just a warning, not a crash. I have the same thing too
> with ibmcam. However, the reports of usb_submit_urb failing are worrying.
> Perhaps the bandwidth is missing.    

This system ran, with this camera attached, from 4/21 to 5/2 monitoring the camera output with the 'motion' app. I checked it several times a day to keep in touch with my family, and every time the camera was working, and I had videos as people and cats walked near the camera. It also worked with 'cheese' to do snapshots or films.

When I returned I applied all updates, and subsequently the system has not been able to use the camera. I'm used to these warnings, this kernel spits a few during shutdown, too late to capture. But the non-functionality is concerning, as this system will be a security server in the future. Don't know where the bandwidth would have gone, I have over a week of 7x24 operation without a problem, and it's a BIG system, at least as webcam hosts go.

I think a week of continuous working is about as high an uptime as you will see on a beta release, so I hope the original functionality can be restored rather than roll back to FC11 (or FC12 in VESA mode).

Comment 3 Bill Davidsen 2010-05-05 13:53:24 UTC
Retested with 2.6.33.3-72.fc13.x86_64 this morning, which seems to eliminate the stack trace at boot. Unfortunately the camera still doesn't *work*, as it did with the updates current on 4/21. I'll attach dmesg parts (or the whole thing) if wanted, but the "usbvideo: usb_submit_urb error (-1)" message persists.

I'll try a different model of webcam and see if USB is working at all (have tried using three connections already).

Comment 4 Pete Zaitcev 2010-05-06 05:14:03 UTC
So far all I know is that urb->reject is somehow set on the ISO URB when
it's submitted for the second time, and then usb_hcd_link_urb_to_ep
bails with -1 (-EPERM).

It goes down like so on usbmon:

d6355f20 1.804055 S Co:2:002:0 s 02 01 0000 0081 0000 0
d6355f20 1.804873 C Co:2:002:0 0 0
d636c000 1.805048 S Zi:2:002:1 -:1:5220352 32 -18:0:1022 -18:1022:1022 -18:2044:1022 -18:3066:1022 -18:4088:1022 -18:5110:1022 -18:6132:1022 -18:7154:1022 32704 <
d636c860 1.805176 S Zi:2:002:1 -:1:5220384 32 -18:0:1022 -18:1022:1022 -18:2044:1022 -18:3066:1022 -18:4088:1022 -18:5110:1022 -18:6132:1022 -18:7154:1022 32704 <
d636c000 1.809974 C Zi:2:002:1 -2:1:5220925:0 32 -18:0:0 -18:1022:0 -18:2044:0 -18:3066:0 -18:4088:0 -18:5110:0 -18:6132:0 -18:7154:0 0
d636c000 1.810002 S Zi:2:002:1 -:1:5220925 32 -18:0:1022 -18:1022:1022 -18:2044:1022 -18:3066:1022 -18:4088:1022 -18:5110:1022 -18:6132:1022 -18:7154:1022 32704 <
d636c000 1.810186 E Zi:2:002:1 -1 0
d636c860 1.813978 C Zi:2:002:1 -2:1:5220957:0 32 -18:0:0 -18:1022:0 -18:2044:0 -18:3066:0 -18:4088:0 -18:5110:0 -18:6132:0 -18:7154:0 0
d636c860 1.814004 S Zi:2:002:1 -:1:5220957 32 -18:0:1022 -18:1022:1022 -18:2044:1022 -18:3066:1022 -18:4088:1022 -18:5110:1022 -18:6132:1022 -18:7154:1022 32704 <
d636c860 1.814188 E Zi:2:002:1 -1 0
d6355f20 1.819213 S Co:2:002:0 s 42 00 0000 010c 0000 0
d6355f20 1.820870 C Co:2:002:0 0 0

So the first time the URB ends normally, but transfers zero bytes (weird).
The second time it is rejected with -EPERM.
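For anyone reading along, here is a minimal sketch of pulling the event type and status out of usbmon text lines like the ones above. It is an illustration only: the function name parse_usbmon_line is made up for this note, it looks only at the first five whitespace-separated fields, and it assumes the status numbers are negated Linux errno values (so -1 is reported as EPERM).

```python
import errno


def parse_usbmon_line(line):
    """Pull the leading fields out of one usbmon text line.

    Returns a dict with the URB tag, timestamp, event type (S, C or E),
    the parts of the address word, and the numeric status if there is one.
    """
    fields = line.split()
    tag, stamp, event, address = fields[:4]
    xfer, bus, device, endpoint = address.split(":")

    # Control submissions show 's' plus a setup packet in this position, and
    # ISO submissions show '-' as a placeholder, so neither has a status yet.
    status = None
    head = fields[4].split(":")[0]
    if head.lstrip("-").isdigit():
        status = int(head)

    # Assumption: negative statuses are negated errno values.
    name = None
    if status is not None and status < 0:
        name = errno.errorcode.get(-status)

    return {
        "tag": tag,
        "time": stamp,
        "event": event,
        "transfer": xfer,
        "bus": bus,
        "device": device,
        "endpoint": endpoint,
        "status": status,
        "errno": name,
    }
```

Feeding it the line "d636c000 1.810186 E Zi:2:002:1 -1 0" gives event "E" (submission error) with status -1, which is the -EPERM described above.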

Comment 5 Pete Zaitcev 2010-05-06 19:13:47 UTC
Bill, are you able to get it working anywhere at all? I tested on
2.6.32.12-114.fc12.x86_64, same result. I suspect that it was a pure
accident that the Beta worked, it was based on 2.6.33.1-24.fc13.
I checked changelogs at kernel.org, there was nothing interesting
since Linus trees for 2.6.33 and 2.6.32 were cut.

I just need one usbmon trace and a working kernel version. Heck
even RHEL-5 would do, probably (its usbmon hasn't got support for
dumping ISO descriptors, but will do hopefully).

Comment 6 Bill Davidsen 2010-05-06 19:56:24 UTC
This worked on all the parts current on 4/21, I did an upgrade to current and started the monitor. I think that would be the 2.6.33.2-57.fc13.x86_64 kernel. I can go back and boot the 2.6.33.2-41.fc13.x86_64 kernel and verify functionality. 
Did so, still get the USB error. It seems to be something outside the kernel, but what that could be I can't say. If I think of anything else to try I'll add it.

Comment 7 Bill Davidsen 2010-05-06 20:11:41 UTC
Got it! I booted off the "live CD" kernel 2.6.33.1-19.fc13.x86_64 and the webcam works. There may have been intermediate kernels between that and the one I had, but it does give you a starting point.  Will add the dmesg output after I get it here.

Comment 8 Bill Davidsen 2010-05-06 20:14:12 UTC
Created attachment 412153 [details]
dmesg from Live-CD boot and working webcam

This is the dmesg, there's a trace at the boot, several failures of various kinds, but both cheese and motion work.

Comment 9 Pete Zaitcev 2010-05-06 21:35:31 UTC
Could you run usbmon on that? Not sure if it's installed or if you can
do "yum install usbmon" on a LiveCD...

Comment 10 Pete Zaitcev 2010-05-06 22:39:41 UTC
I installed 2.6.33.1-19.fc13.x86_64 on top of an F14 Rawhide and it fails
in exactly the same way as F14's own 2.6.34-0.38.rc5.git0.fc14.x86_64 and
the upstream.

Comment 11 Pete Zaitcev 2010-05-07 03:56:17 UTC
Created attachment 412208 [details]
dmesg from RHEL 5 and xawtv

Comment 12 Pete Zaitcev 2010-05-07 03:57:02 UTC
Created attachment 412209 [details]
usbmon from RHEL 5 and xawtv

Comment 13 Pete Zaitcev 2010-05-07 04:06:45 UTC
I ran the ibmcam on RHEL 5 with xawtv just now. The result is:
 - camera works
 - same -1 error from usb_submit_urb occurs
However, the errors only occur at the beginning, but not thereafter.
Here's the console trace:

[root@niphredil zaitcev]# xawtv -nodga
This is xawtv-3.88, running on Linux/x86_64 (2.6.18-160.el5-u)
xinerama 0: 1440x900+0+0
WARNING: remote display `localhost:10.0' not allowed, using `:10.0' instead
X Error of failed request:  XF86DGANoDirectVideoMode
  Major opcode of failed request:  130 (XFree86-DGA)
  Minor opcode of failed request:  1 (XF86DGAGetVideoLL)
  Serial number of failed request:  13
  Current serial number in output stream:  13
v4l-conf had some trouble, trying to continue anyway
Warning: Cannot convert string "-*-ledfixed-medium-r-*--39-*-*-*-c-*-*-*" to type FontStruct
config: invalid value for input: Television
valid choices for "input": "Camera"
ioctl: VIDIOCMCAPTURE(frame=0;height=4;width=8;format=7): Invalid argument
ioctl: VIDIOCMCAPTURE(frame=0;height=4;width=8;format=15): Invalid argument
ioctl: VIDIOCMCAPTURE(frame=0;height=4;width=8;format=9): Invalid argument
ioctl: VIDIOCMCAPTURE(frame=0;height=4;width=8;format=5): Invalid argument
[root@niphredil zaitcev]# 

So... Apparently ibmcam always was like this, but applications compensated
somehow. The usbmon trace shows the same exact failure that occurs with F14,
but then a large number of control requests and error-free ISOs.
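As a reading aid for the xawtv output above, the following sketch maps the format= numbers in the VIDIOCMCAPTURE lines to names. It assumes that format is the V4L1 palette number as defined by the VIDEO_PALETTE_* constants in linux/videodev.h; the helper name probed_palettes is made up for this note.

```python
import re

# V4L1 palette numbers, per the VIDEO_PALETTE_* constants in linux/videodev.h.
V4L1_PALETTES = {
    1: "GREY", 2: "HI240", 3: "RGB565", 4: "RGB24",
    5: "RGB32", 6: "RGB555", 7: "YUV422", 8: "YUYV",
    9: "UYVY", 10: "YUV420", 11: "YUV411", 12: "RAW",
    13: "YUV422P", 14: "YUV411P", 15: "YUV420P", 16: "YUV410P",
}


def probed_palettes(console_text):
    """Return the palette names from VIDIOCMCAPTURE lines, in order of appearance."""
    numbers = re.findall(r"VIDIOCMCAPTURE\([^)]*format=(\d+)\)", console_text)
    return [V4L1_PALETTES.get(int(n), "unknown(%s)" % n) for n in numbers]
```

Under that assumption, the four rejected requests above (format 7, 15, 9, 5) were for YUV422, YUV420P, UYVY and RGB32.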

In my case I used fswebcam on Fedora.

One suspicious thing here is that supposedly both "motion" and "cheese"
fail or work together, which they should not do unless they are built
from the same codebase.

BTW, usbvideo+ibmcam is busted on RHEL 5: it floods the logs with the
report of scheduling in interrupt, because ibmcam asks usbvideo to
resubmit URBs from an interrupt (how else, right?), and usbvideo
passes GFP_KERNEL to usb_submit_urb. I'm pretty sure this means that
nobody ever used ibmcam on RHEL 5. I had to fix it up in order to
run my tests.

Comment 14 Pete Zaitcev 2010-05-08 01:43:01 UTC
BTW, I got the fswebcam to work with ibmcam by:
1. fixing crashes (commits are in the upstream git now),
2. by specifying the right palette with  fswebcam -p BGR24

Comment 15 Pete Zaitcev 2010-05-08 01:45:08 UTC
Created attachment 412470 [details]
GFP_ATOMIC and disable message flood

The GFP_KERNEL from an interrupt handler is plain ridiculous, I wonder how
it ever worked for anyone.

Comment 16 Bill Davidsen 2010-05-24 19:28:08 UTC
I retried this when 2.6.34 came out, that kernel does not improve things. However, I did note that after trying to use the camera /dev/video0 vanishes, leaving only /dev/video1 by itself. When I get to a reboot I will look to see if /dev/video0 survives until used.

Also tried running the camera under KVM using USB passthru, with an fc9 image I had around which did work when it was current. The camera is detected, but ibmcam is not loaded. I may have tried to use the camera before that test, so will investigate more after reboot. Unfortunately I have a server in a VM there for a few more hours until the normal host is ready to go again.

Comment 17 Scott Williams 2010-07-11 04:10:58 UTC
I am seeing this same issue here.  Webcam worked at first in Fedora 13 and at least in the past two kernels it no longer is detected.

Comment 18 Scott Williams 2010-07-11 05:04:27 UTC
I'm sorry, after doing some more checking I have a different chipset for my camera, so I doubt my issue is related.

Comment 19 Hans de Goede 2011-02-17 16:15:08 UTC
Hi,

I just stumbled over this bug. Yes the old ibmcam driver has become rather buggy I'm afraid. That is mostly because no one has really maintained it since it was first written, and because it is an old v4l1 driver it needs to be rewritten to follow the v4l2 api. So that is what I've done :)

There now is a new clean v4l2 driver using the gspca framework for usb webcams.

Can you please download and install this kernel and give it a try? :
http://koji.fedoraproject.org/koji/buildinfo?buildID=213137

Note when installing kernels always use:
"rpm -ivh kernel..."

Do *not* use "rpm -Uvh kernel...", the difference is the first
one installs the kernel next to your existing one, whereas the second (wrong) one will replace your existing kernel, and if the new one for some reason won't boot on your hardware ...

Unfortunately this kernel also still builds and installs the old ibmcam driver, so before plugging in the cam do (as root):
rm /lib/modules/2.6.37-2.fc15.x86_64/kernel/drivers/media/video/usbvideo/ibmcam.ko

Thanks & Regards,

Hans

Comment 20 Bill Davidsen 2011-02-18 22:18:33 UTC
(In reply to comment #19)
> Hi,
> 
> I just stumbled over this bug. Yes the old ibmcam driver has become rather
> buggy I'm afraid. That is mostly because no one has really maintained it since
> it was first written, and because it is an old v4l1 driver it needs to be
> rewritten to follow the v4l2 api. So that is what I've done :)
> 
> There now is a new clean v4l2 driver using the gspca framework for usb webcams.
> 
> Can you please download and install this kernel and give it a try? :
> http://koji.fedoraproject.org/koji/buildinfo?buildID=213137
> 
Some good, some bad news. Installed in an FC13-x64 fully updated system, removed the module, system booted, when the camera was attached it was detected. Testing with the "Cheese" application worked, both snapshot and video worked.

The bad news is that the useful application, "motion" for monitoring, didn't detect any motion. That's not so good since the whole idea is to use these installed units for monitoring.

Testing continues, I'll post logs after being sure I didn't miss something. What module should it be loading instead of ibmcam?

Comment 21 Hans de Goede 2011-02-18 22:30:22 UTC
(In reply to comment #20)
> (In reply to comment #19)
> > Hi,
> > 
> > I just stumbled over this bug. Yes the old ibmcam driver has become rather
> > buggy I'm afraid. That is mostly because no one has really maintained it since
> > it was first written, and because it is an old v4l1 driver it needs to be
> > rewritten to follow the v4l2 api. So that is what I've done :)
> > 
> > There now is a new clean v4l2 driver using the gspca framework for usb webcams.
> > 
> > Can you please download and install this kernel and give it a try? :
> > http://koji.fedoraproject.org/koji/buildinfo?buildID=213137
> > 
> Some good, some bad news. Installed in an FC13-x64 fully updated system,
> removed the module, system booted, when the camera was attached it was
> detected. Testing with the "Cheese" application worked, both snapshot and video
> worked.
> 

Ok, so it seems that everything works as it should.

> The bad news is that the useful application, "motion" for monitoring, didn't
> detect any motion. That's not so good since the whole idea is to use these
> installed units for monitoring.

Hmm, maybe motion is not patched to use libv4l, and instead tries to access
/dev/video# directly and then does not grok the xirlink specific image format
the driver delivers frames in. Try launching motion like this:

LD_PRELOAD=/usr/lib64/libv4l/v4l1compat.so motion

This causes motion to be dynamically linked against a small C-lib wrapper which
redirects fd ops on /dev/video# to libv4l.
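If motion gets started from a script rather than by hand, the same thing can be done by setting LD_PRELOAD in the child's environment. A rough sketch follows: the helper names are made up for this note, and the library path is the one quoted above, which may differ on other installs (for example on 32-bit systems).

```python
import os
import subprocess

# Path as given in the comment above; treat it as an assumption elsewhere.
V4L1_COMPAT = "/usr/lib64/libv4l/v4l1compat.so"


def preload_env(library, base_env=None):
    """Return a copy of the environment with `library` first in LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD")
    # ld.so accepts a colon-separated (or space-separated) list here.
    env["LD_PRELOAD"] = library + ":" + existing if existing else library
    return env


def run_with_preload(command, library=V4L1_COMPAT):
    """Run `command` (a list such as ["motion"]) with the compat library preloaded."""
    if not os.path.exists(library):
        raise FileNotFoundError(library)
    return subprocess.call(command, env=preload_env(library))
```

Calling run_with_preload(["motion"]) is then meant to be equivalent to the one-line shell command above, and it raises an error up front if the wrapper library is missing.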

> Testing continues, I'll post logs after being sure I didn't miss something.
> What module should it be loading instead of ibmcam?

You should have a gspca_xirlink module loaded
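One way to check this is to look at /proc/modules. The sketch below parses its text, assuming the usual layout in which the module name is the first whitespace-separated field on each line; the function names are made up for this note.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules style text."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def check_driver(proc_modules_text, wanted="gspca_xirlink", unwanted="ibmcam"):
    """Return (wanted_loaded, unwanted_loaded) for the two driver modules."""
    modules = loaded_modules(proc_modules_text)
    return wanted in modules, unwanted in modules
```

Passing in open("/proc/modules").read() should ideally give (True, False), meaning the new gspca driver is loaded and the old ibmcam module is not.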

Regards,

Hans

Comment 22 Hans de Goede 2011-02-21 09:16:21 UTC
Bill,

Any news on this ?

Regards,

Hans

Comment 23 Bill Davidsen 2011-02-23 16:03:57 UTC
Created attachment 480516 [details]
messages log from motion program testing

I attach the log data from the motion program in the default (v4l2) mode and with the v4l1compat library loaded as well. It looks to me as if the driver doesn't support any of the eight palettes the application supports, or they have been renumbered in some way, perhaps?

Comment 24 Bill Davidsen 2011-02-23 16:06:32 UTC
Created attachment 480517 [details]
Supported palettes from the motion config file

These are the supported palettes for motion, none seem to be supported by the new driver. Don't know if that's the fault of the program or the driver, but motion works on these cams with the old driver, and with several Logitech cams as well.

Comment 25 Hans de Goede 2011-02-23 18:42:13 UTC
(In reply to comment #23)
> Created attachment 480516 [details]
> messages log from motion program testing
> 
> I attach the log data from the motion program in the default (v4l2) mode and
> with the v4l1compat library loaded as well. It looks to me as if the driver
> doesn't support any of the eight palettes the application supports, or they have
> been renumbered in some way, perhaps?

Hi,

If you look closely at the log with LD_PRELOAD, you see:
Feb 23 10:05:00 posidon motion: [1] Config palette index 1 (BA81) doesn't work.
Feb 23 10:05:00 posidon motion: [1] Supported palettes:
Feb 23 10:05:00 posidon motion: [1] 0: GRBG (GRBG)
Feb 23 10:05:00 posidon motion: [1] 1: CITV (CITV)
Feb 23 10:05:00 posidon motion: [1] Unable to find a compatible palette format.
Feb 23 10:05:00 posidon motion: [1] ioctl(VIDIOCGMBUF) - Error device does not support memory map
Feb 23 10:05:00 posidon motion: [1] V4L capturing using read is deprecated!

Which looks like the LD_PRELOAD is, for some reason, not working. Then you see:

Feb 23 10:05:01 posidon motion: [1] Test palette YU12 (352x240)
Feb 23 10:05:01 posidon motion: [1] Adjusting resolution from 352x240 to 320x240.
Feb 23 10:05:01 posidon motion: [1] Using palette YU12 (320x240) bytesperlines 320 sizeimage 115200 colorspace 00000008

Which looks good. Followed by:
Feb 23 10:05:01 posidon motion: [1] bind(): Address already in use
Feb 23 10:05:01 posidon motion: [1] Problem enabling stream server in port 8081: Address already in use
Feb 23 10:05:01 posidon motion: [1] Closing video device /dev/video0

Which points to another copy of motion running.
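A quick way to confirm that something already holds the port is to try binding to it. This is only a sketch: port_in_use is a made-up helper, and it checks a plain IPv4 TCP bind on the given host, which may not match exactly how motion opens its stream server.

```python
import errno
import socket


def port_in_use(port, host=""):
    """Return True if binding a TCP socket to (host, port) fails with EADDRINUSE."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            # Anything else (for example a permission error) is a different problem.
            raise
    return False
```

If port_in_use(8081) returns True before motion is started, some other process, quite possibly an earlier motion instance, still owns the stream port.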

Anyways this clearly points to a userspace problem. Not a kernel one. Since 2.6.38 has the new driver, and no longer the old conflicting one which I asked you to remove, I consider this fixed from the kernel side of things.

Please file a new bug against motion if you keep having issues with motion. If you can get motion to work with the LD_PRELOAD, but not without, again please file a bug as it should be patched to directly use libv4l. Please put me in the CC when you file a bug against motion. I'll try to make some time to look at it, but no promises.

Regards,

Hans