Bug 240771 - kernel oops when running 'test-dv' from libiec61883-utils with firewire dv camcorder
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Kristian Høgsberg
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-05-21 15:47 UTC by Will Woods
Modified: 2007-11-30 22:12 UTC
CC List: 18 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-10-11 17:31:28 UTC
Type: ---
Embargoed:


Attachments
oops (from dmesg) with a little extra context (3.46 KB, text/plain)
2007-05-21 15:47 UTC, Will Woods
messages log of Ooops (5.93 KB, text/plain)
2007-06-12 22:54 UTC, toddz
dmesg log of Oops (2.32 KB, text/plain)
2007-06-12 22:55 UTC, toddz
oops output (4.51 KB, text/plain)
2007-06-13 19:16 UTC, S. Zickler
2.6.22.4-65.fc7.x86_64 dvgrab oops (5.53 KB, text/plain)
2007-09-03 23:47 UTC, Anthony Messina
dvgrab (3.60 KB, text/plain)
2007-09-06 12:05 UTC, Mateusz Kurtas
dvgrab bug (3.62 KB, text/plain)
2007-09-27 11:03 UTC, Mateusz Kurtas


Links
System        ID    Private  Priority  Status  Summary  Last Updated
Linux Kernel  8623  0        None      None    None     Never

Description Will Woods 2007-05-21 15:47:01 UTC
kernel-2.6.21-1.3167.fc7 (x86_64)
libraw1394-1.2.1-7.fc7

After plugging in my DV camcorder, I ran test-dv (as root) and the kernel oopsed
in fw_core:
BUG: unable to handle kernel paging request at virtual address ffffffea

full oops is attached.

Comment 1 Will Woods 2007-05-21 15:47:01 UTC
Created attachment 155101
oops (from dmesg) with a little extra context

Comment 2 ElLocoGato 2007-06-11 21:18:36 UTC
I get a similar kernel "oops" when I attempt to run dvgrab as root on Fedora 7.
 If I run dvgrab as user, I get "No camera found" (permissions problem?).

kernel-2.6.21-1.3194.fc7
libraw1394-1.2.1-7.fc7


Comment 3 toddz 2007-06-12 22:54:05 UTC
I am experiencing a similar problem on two different machines - both running
Fedora 7, both using different firewire capture devices and both using different
cameras - same result. 

Here's a little more information.  When I plug in the camera I get the following
messages in the log:
Jun 12 15:30:42 floyd kernel: fw_core: created new fw device fw1 (0 config rom
retries)
Jun 12 15:30:42 floyd kernel: fw_core: phy config: card 0, new root=ffc1,
gap_count=5
Jun 12 15:31:13 floyd last message repeated 27 times
Jun 12 15:32:14 floyd last message repeated 54 times

At that point the system locks hard (can't ssh in, etc etc) and I need to hard
reboot.

When I plug a cam in, change the ownership of /dev/fw1 (it's owned rw by root
only) and run dvgrab, I get the attached Oops (dmesg and messages attached,
although I think they show the same thing).

Like I said, same issue on two machines, two capture cards, two cameras.
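
Before running a capture as a normal user, it can help to check whether the device node is accessible at all, so that a permissions failure is not confused with the oops. A minimal sketch in Python, assuming the node is /dev/fw1 as in the comment above; the helper names `can_open_rw` and `describe_node` are made up for illustration:

```python
import os
import stat

def can_open_rw(path):
    """Return True if the current user may both read and write `path`."""
    return os.access(path, os.R_OK | os.W_OK)

def describe_node(path):
    """Summarise owner, group and mode bits of a node as a short string."""
    st = os.stat(path)
    return "%s uid=%d gid=%d mode=%s" % (
        path, st.st_uid, st.st_gid, stat.filemode(st.st_mode))

if __name__ == "__main__":
    node = "/dev/fw1"  # assumption: the node name from the comment above
    if os.path.exists(node):
        print(describe_node(node))
        print("read/write access:", can_open_rw(node))
    else:
        print(node, "does not exist (no FireWire device attached?)")
```

Note that root passes this check regardless of the mode bits, so it is only informative when run as the user who will start dvgrab.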

Comment 4 toddz 2007-06-12 22:54:42 UTC
Created attachment 156832
messages log of Ooops

Comment 5 toddz 2007-06-12 22:55:10 UTC
Created attachment 156833
dmesg log of Oops

Comment 6 S. Zickler 2007-06-13 19:13:06 UTC
I receive a similar kernel oops when trying to initialize capturing using
libdc1394 as root (using the latest libraw1394) on Fedora 7. This happens not
with test-dv, but with any application that uses the latest libdc1394, and thus
libraw1394 and the new FW stack. I am a computer vision researcher, and basically
all my dc1394-based applications no longer work with Fedora 7 due to the new,
broken FW stack.

Looking at the oops-output (which will follow in an attachment) the error seems
in fact to be generated somewhere down the line in the new firewire-stack and/or
its libraw1394 support.

Please let me know if you need any additional info, I would be glad to help
track this thing down.

Here are my specs:

Fedora 7
Linux 2.6.21-1.3194.fc7 #1 SMP (i386)
libraw1394 1.2.1-7.fc7
libdc1394 latest SVN (svn revision number 401) compiled with "juju" backend support


Comment 7 S. Zickler 2007-06-13 19:16:04 UTC
Created attachment 156901
oops output

Attached is the oops error message.
It seems like the bug is somewhere in the FW stack, at
fw_iso_context_destroy / fw_device_op_release.

Comment 8 S. Zickler 2007-06-13 20:14:22 UTC
Ok, I did some more debugging to track down the place where the oops is 
originally triggered in user-space. I found that the oops is generated by the 
following ioctl in libdc1394 in file capture.c:

ioctl(craw->iso_fd, FW_CDEV_IOC_CREATE_ISO_CONTEXT, &create)

This call fails (returns -1) and it also triggers the oops.
This ioctl's target is part of the firewire stack and is declared in
linux/firewire-cdev.h, so the error should be somewhere in there.

Also, libdc1394 will segfault afterwards when attempting to handle the failed 
ioctl with its error handler (it attempts to close the iso_fd, but that's not 
directly related to the oops since it happens afterwards).

I hope this is of help.
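
Comment 8 describes two separate failures: the ioctl returns -1, and the library then crashes while cleaning up the descriptor. One defensive pattern on the user-space side, sketched here in Python rather than in libdc1394's C, is to report a failed ioctl as a value and to let the descriptor's owner close it at most once. The names `try_ioctl`, `IsoHandle` and `PLACEHOLDER_REQUEST` are hypothetical, and the request number 0 is a stand-in, not the real value of FW_CDEV_IOC_CREATE_ISO_CONTEXT. This does not address the kernel oops itself.

```python
import errno
import fcntl
import os

PLACEHOLDER_REQUEST = 0  # stand-in only; NOT the real firewire ioctl number

def try_ioctl(fd, request, arg=0):
    """Issue an ioctl; return (ok, errno_or_None) instead of raising."""
    try:
        fcntl.ioctl(fd, request, arg)
        return True, None
    except OSError as exc:
        return False, exc.errno

class IsoHandle:
    """Owns one file descriptor and closes it at most once."""

    def __init__(self, fd):
        self.fd = fd

    def close(self):
        if self.fd >= 0:
            os.close(self.fd)
            self.fd = -1  # mark as released so a second close() is a no-op

if __name__ == "__main__":
    handle = IsoHandle(os.open(os.devnull, os.O_RDWR))
    ok, err = try_ioctl(handle.fd, PLACEHOLDER_REQUEST)
    if not ok:
        print("ioctl failed:", errno.errorcode.get(err, err))
    handle.close()
    handle.close()  # safe: the second call does nothing
```

Keeping the "already closed" state inside the handle means an error path and the normal teardown path can both call close() without coordinating with each other.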


Comment 9 Mauro M. 2007-06-14 10:04:38 UTC
I have opened bug 8623 in the kernel bug tracker. I would like to know who took 
the decision to change the FW stack in fedora without running appropriate 
regression tests and what was the rationale behind his decision. 

Comment 10 Will Woods 2007-06-14 14:21:23 UTC
(In reply to comment #9)
> I would like to know who took the decision to change the FW stack in fedora

The firewire maintainer made that decision. Seems pretty obvious that he's the
one who knows best what the Right Thing To Do is with firewire.

> without running appropriate 
> regression tests and what was the rationale behind his decision. 

How, exactly, do you know what tests we ran? Furthermore this was a
well-publicized feature of Fedora 7 from the beginning:
http://fedoraproject.org/wiki/Releases/FeatureFirewireJuJu

There were 5 months of public testing, with four public test releases, and nobody
filed a bug about this until it was too late to fix for Fedora 7. 

Given that we have neither the time nor resources to test everything, perhaps
you will join us for Fedora 8 test releases to help ensure this doesn't happen
again. In the meantime this bug is being tracked and updates will be posted here
when there are fixes available.

Comment 11 Mauro M. 2007-06-17 10:19:07 UTC
For those who need to capture from their camera and do not want to wait for
Fedora, here is a fix that restores the former, working FireWire stack in the
kernel, libraries and kino:

http://www.ezplanetone.com/xwiki/bin/view/KnowledgeBase/BrokenFC7FireWire

Comment 12 Dominik 'Rathann' Mierzejewski 2007-06-26 17:59:20 UTC
(In reply to comment #11)
> For those who need to capture from their camera and do not want to wait for
> Fedora, here is a fix that restores the former, working FireWire stack in the
> kernel, libraries and kino:
> 
> http://www.ezplanetone.com/xwiki/bin/view/KnowledgeBase/BrokenFC7FireWire
[...]
Fedora supplies bleeding edge Linux in the real sense and this time someone over
there managed to replace a fairly good working FireWire stack with a broken one
that is not even included in the main stream Kernel. Why? Who knows, at Fedora
they are too arrogant to admit that they have made a mistake, and to listen to
their user base.
[...]

This is an outright lie; please stop spreading FUD. I am appalled that a member
of the community would write something like that instead of helping fix the
problem. You should've participated in the F7 test phase. If you didn't, you
have no right to complain now.

Comment 13 ElLocoGato 2007-06-26 18:20:22 UTC
(In reply to comment #12)

I agree that the statements on that page are reactionary and overly harsh. 
However, it is true that this is a pretty big bug that seems to be affecting a
lot of Fedora 7 users, and for which there is no workaround other than
recompiling the kernel and some libraries, which is beyond many users.

Moving forward, what can those of us who are experiencing this bug do to help
get it resolved quickly?


Comment 14 Kristian Høgsberg 2007-06-26 19:05:00 UTC
The problem has been fixed upstream and the rawhide kernels have the fix.  It's
possible to upgrade to a rawhide kernel without affecting the rest of the
system.  Doing so will make the crash go away but, unfortunately, not the
underlying problem.  The issue is that the new stack only works with an OHCI 1.1
compatible controller, but it turns out that OHCI 1.0 controllers are more
widely deployed than first assumed.

The plan is to issue an F7 update once 2.6.22 comes out.  I'm planning a
software fallback for the feature that the OHCI 1.0 controllers don't provide,
but it won't make it into the official 2.6.22 kernel.  We may be able to ship it
as a patch in the 2.6.22 kernel RPM, which would bring back test-dv in F7.
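
To see which side of the OHCI 1.0 / 1.1 split a given controller falls on, the raw value of its OHCI Version register can be decoded. The sketch below assumes the layout described in the OHCI 1394 specification (major version in bits 23-16, BCD revision in bits 7-0, so that 1.1 reads as 0x010010); the function name and the sample values are illustrative and do not come from this bug.

```python
OHCI_VERSION_1_1 = 0x010010  # assumed encoding of "1.1": major 0x01, revision 0x10

def decode_ohci_version(reg_value):
    """Split a raw Version register value into (major, revision, is_1_1_or_later)."""
    version = reg_value & 0x00FF00FF  # keep only the version and revision fields
    major = version >> 16
    revision = version & 0xFF
    return major, revision, version >= OHCI_VERSION_1_1

if __name__ == "__main__":
    for raw in (0x010000, 0x010010):  # made-up sample register values
        major, revision, ok = decode_ohci_version(raw)
        label = "1.1 or later" if ok else "older than 1.1"
        print("OHCI %x.%x -> %s" % (major, revision, label))
```

Masking first means that unrelated flag bits in the same register do not affect the comparison.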

Comment 15 Mauro M. 2007-06-26 20:54:03 UTC
Those who cannot or do not want to get involved with rebuilding kernel and
libraries can follow the link and instructions in my previous comment #11. I
have been using the updates for more than a week and all the problems have gone
away. I am also monitoring the site, which gets hundreds of downloads per day,
and so far there have been no complaints.

I hope this helps.
M.

Comment 16 Mark Alford 2007-07-02 21:04:49 UTC
Would it have been possible to adopt the new firewire stack but leave the old
raw1394 code in place and available as a fallback (perhaps activatable via a
kernel boot option, much as APM survives in the shadow of ACPI)?




Comment 17 Kristian Høgsberg 2007-07-05 18:47:50 UTC
*** Bug 243081 has been marked as a duplicate of this bug. ***

Comment 18 Mauro M. 2007-07-21 15:48:43 UTC
I have just updated Fedora 7 to kernel 2.6.22.1-27.fc7, but kino still does
not recognize my firewire device. I am working on a kernel with the good firewire
stack to release with the EzPlanet updates.

Comment 19 Mark Heslep 2007-08-21 12:39:13 UTC
Confirming that my Via firewire fails w/ the 2.6.22.1-41.fc7 kernel and
yesterday's development (fc8) kernel.  That is, kino, dvgrab, etc. crash
immediately when talking to an attached device.

Comment 20 Mark Heslep 2007-08-21 12:41:16 UTC
Follow-up: the same machine and the same devices worked fine in FC6.

Comment 21 W. Michael Petullo 2007-08-27 00:44:37 UTC
I am using kernel-2.6.22.4-65.fc7

Comment 22 W. Michael Petullo 2007-08-27 00:52:29 UTC
I am using kernel-2.6.22.4-65.fc7.  Dvgrab says:

$ dvgrab
Found AV/C device with GUID 0x0000850001097ee3
""     0.00 MB 0 frames                                                         
Capture Stopped                                                                 
Error: no DV

No oops.  Nothing is captured, although the camera does start playing for a few
seconds until dvgrab says "Capture Stopped."

I also was able to use dvgrab with Fedora Core 6.

Note: the device root permissions issue some people are reporting is documented
in Bug #191670.

Comment 23 Michiel 2007-09-02 03:52:38 UTC
I have the same problem. The "update" in comment #11 didn't work for me. I'm
torrenting Fedora 8 Test 1 now, and I'll report results in a few days.

Comment 24 Anthony Messina 2007-09-03 23:47:37 UTC
Created attachment 185451
2.6.22.4-65.fc7.x86_64 dvgrab oops

This is from dvgrab; kino produces a similar result.

Comment 25 Mateusz Kurtas 2007-09-06 12:05:16 UTC
Created attachment 188651
dvgrab

2.6.22.4-65.fc7

dvgrab

Comment 26 texas_ducod 2007-09-10 23:48:31 UTC
I just tried ezplanet's repo and it is down... so how does one go about fixing
this bug?

Comment 27 W. Michael Petullo 2007-09-16 17:44:23 UTC
I get the same results as documented in comment #22 when using Fedora 8 Test 2.

Comment 28 texas_ducod 2007-09-18 01:11:16 UTC
(In reply to comment #26)
> I just tried ezplanet's repo and it is down... so how does one go about fixing
> this bug?

Thank you for opening ezplanet's repo. It worked once I yum updated my kernel
and other 1394 libraries from ez repository.

Cheers,
Duc


Comment 29 texas_ducod 2007-09-18 01:15:40 UTC
(In reply to comment #28)
> (In reply to comment #26)
> > I just tried ezplanet's repo and it is down... so how does one go about fixing
> > this bug?
> 
> Thank you for opening ezplanet's repo. It worked once I yum updated my kernel
> and other 1394 libraries from ez repository.
> 
> Cheers,
> Duc

How does running the ez updates affect the other repos' packages and updates?
And is it safe to run the ez kernel?



Comment 30 Mateusz Kurtas 2007-09-18 22:16:32 UTC
How can I get the kmod for the ez kernel?

Comment 31 Mateusz Kurtas 2007-09-27 11:03:07 UTC
Created attachment 208221
dvgrab bug

kernel-2.6.22.7-85.fc7
dvgrab-2.1-2.fc7

Comment 32 Will Woods 2007-10-11 17:31:28 UTC
The kernel oops originally reported here seems to be fixed. The "oops" reported
in comment #24 and later is a bug in libraw1394, not a kernel bug. 

I've filed the libraw1394 oops as bug #328011, but I believe the kernel bug is
closed.

Comment 33 Chris Petersen 2007-10-11 17:48:30 UTC
So..  fixed in rawhide and not stable?  What about those of us running plain old
F-7?

Comment 34 Jarod Wilson 2007-10-19 14:40:07 UTC
I believe the oops should be gone in F-7 too, using the latest kernel from
updates-testing (2.6.23.1-4.fc7 as of this writing).

Comment 35 Mark Heslep 2007-10-22 15:15:56 UTC
It's not clear that this problem, as articulated by the maintainer, has been
resolved. Kristian Høgsberg's comment #14 pointed out:

"...The issue is that the new stack only works with an OHCI 1.1
compatible controller..."

and implied that the planned fix in the new firewire stack would only be useful
to certain controllers, and I see nothing posted here by Kristian to update that
statement.  My Via 1.0 controllers failed w/ the original F7. I'm concerned that
most of us so affected have moved on to a workaround (custom kernels) and thus
this latest kernel is not tested. Does this new rawhide kernel purport to
address the other controllers as well?

Comment 36 Jarod Wilson 2007-10-22 15:26:23 UTC
The oops is gone in rawhide kernels (and should be gone in the latest f7
updates-testing kernel, if not earlier kernels), and the double-free
problem (bug 328011) was recently fixed as well. OHCI 1.0 controllers still
don't work though, as noted in 328011, and that is now tracked in bug 344851.

Comment 37 Mateusz Kurtas 2007-11-15 10:21:58 UTC
In Fedora 8 it still doesn't work. I get:

[root@host-192]~# dvgrab -autosplit -format raw -size 0 -noavc -timestamp foo-
Found AV/C device with GUID 0x08004601044380f6
ioctl call failed, retval = -1
ieee1394io.cc:460: In function "virtual bool iec61883Reader::StartReceive()": 
"iec61883_dv_fb_start( m_iec61883.dv, channel )" evaluated to -1
ieee1394io.cc:460: errno: 38 (Function not implemented)
""     0.00 MB 0 frames
Capture Stopped
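
The numeric errno in the output above can be mapped back to its symbolic name to make the failure easier to search for. A small sketch in Python; the helper name `describe_errno` is made up for illustration, and the mapping is platform-specific (on Linux, 38 is ENOSYS, which matches the "Function not implemented" text printed by dvgrab). That the kernel rejects the request as unimplemented may be consistent with the OHCI 1.0 limitation noted in comment 36, though nothing in this thread confirms which controller was in use here.

```python
import errno
import os

def describe_errno(code):
    """Return 'NAME: message' for a numeric errno value on this platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return "%s: %s" % (name, os.strerror(code))

if __name__ == "__main__":
    print(describe_errno(38))  # 38 is the value reported by dvgrab above
```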



