Bug 243081
Summary: | dvgrab does not work with new firewire stack | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Mladen Kuntner <mladen.kuntner> |
Component: | dvgrab | Assignee: | Jarod Wilson <jarod> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | |
Severity: | low | Docs Contact: | |
Priority: | low | ||
Version: | 7 | CC: | deknuydt, krh, linuxuser101, lists, mail, montagne29, mpope, perja, stefan-r-rhbz, toddz, yatiohi |
Target Milestone: | --- | Keywords: | Reopened |
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | 2.6.24.5-85.fc8 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2008-04-29 20:54:42 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Attachments: |
Description
Mladen Kuntner
2007-06-07 08:27:58 UTC
I also had the exact same problems until I tried dvgrab as root. Then it worked! For me at least... In FC6 I could also use dvgrab as a normal user so something has changed. Anyone knows how normal users can get access as well (some udev rules for /dev/ fwX I guess)? mladen: does it work if you run dvgrab as root? dvgrab as normal user also worked great for me in FC6. Running dvgrab as root is no go for me, however. I get the behavior described in Bug 240771 (https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=240771). Running as root is working for me but only if i run it like: dvgrab --noavc ... and the tape is already running so it look like two bugs 1. not possible to run as normal user 2. libavc1394 does not work with my camera JVC GR-DVL157E I don't see bug 240771. I also experience Bug 240771 when attempting to run dvgrab as root. I've added my experiences and log files to bug 240771. Find advice that firewire is electrically noisy on dongle connectors. Put old firewire card in that has direct connectors. lspci -v : 01:07.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller (rev 43) (prog-if 10 [OHCI]) Subsystem: VIA Technologies, Inc. IEEE 1394 Host Controller Flags: bus master, medium devsel, latency 32, IRQ 19 Memory at f8000000 (32-bit, non-prefetchable) [size=2K] I/O ports at c000 [size=128] Capabilities: [50] Power Management version 2 And the result is worse: [root@localhost hack]# dvgrab Found AV/C device with GUID 0x0080880100a306bb Message from syslogd@ at Wed Jun 27 10:15:18 2007 ... localhost kernel: Oops: 0000 [1] SMP Message from syslogd@ at Wed Jun 27 10:15:18 2007 ... localhost kernel: CR2: ffffffffffffffea going back to onboard nvidia On suggestion from fedora-devel-list added Kristian Høgsberg to cc *** This bug has been marked as a duplicate of 240771 *** I must disagree with your decision. I think that this are two separate bugs. Can you please read: https://www.redhat.com/archives/fedora-devel-list/2007-June/msg02697.html summary: - test-dv is working on onboard nvidia controller - on onboard nvidia controller there is no kernel oops and in 50% dvgrab is working ok. If you need any additional data please let me know. thanks mladen *** Bug 308741 has been marked as a duplicate of this bug. *** Please re-test with the latest libraw1394 just pushed to f7 updates and let me know if that makes any significant difference. not working as user trying as root: **** try 1 ***** [root@localhost test]# dvgrab --duration 20 test Found AV/C device with GUID 0x0080880100a306bb ioctl call failed, retval = -1 ieee1394io.cc:426: In function "virtual bool iec61883Reader::StartReceive()": "iec61883_dv_fb_start( m_iec61883dv, channel )" evaluated to -1 ieee1394io.cc:426: errno: 22 (Invalid argument) "" 0.00 MB 0 frames Capture Stopped and hanging the terminal ( ctrl c not working..) this one is with: 01:07.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller (rev 43) (which is OHCI 1.0 and i think don't have support in kernel) **** try 2 ***** [root@localhost test]# dvgrab --duration 20 test send oops send oops send oops send oops send oops send oops send oops send oops send oops send oops send oops send oops send oops send oops rom1394_1 warning: read failed: 0x0000fffff000040c rom1394_1 warning: read failed: 0x0000fffff0000410 Found AV/C device with GUID 0x01ff317701ff3177 send oops send oops send oops send oops "" 0.00 MB 0 frames Capture Stopped Error: no DV this one with: 01:0a.0 FireWire (IEEE 1394): Texas Instruments TSB82AA2 IEEE-1394b Link Layer Controller (rev 01) (which is OHCI 1.1 and supported in kernel) Both FireWire cards are working ok on vanilla 2.6.23 kernel and old stack. (In reply to comment #11) > not working as user trying as root: I think we require some udev updates (which are in f8, where this does work) to get that working. > **** try 1 ***** > [root@localhost test]# dvgrab --duration 20 test > Found AV/C device with GUID 0x0080880100a306bb > ioctl call failed, retval = -1 > ieee1394io.cc:426: In function "virtual bool iec61883Reader::StartReceive()": > "iec61883_dv_fb_start( m_iec61883dv, channel )" evaluated to -1 > ieee1394io.cc:426: errno: 22 (Invalid argument) > "" 0.00 MB 0 frames > Capture Stopped > > and hanging the terminal ( ctrl c not working..) > > this one is with: > 01:07.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller > (rev 43) > (which is OHCI 1.0 and i think don't have support in kernel) Correct, 1.0 controllers aren't yet supported. As for the hang, not quite sure what that's about. I think I've seen that happen, but I believe its cleared up by the latest dvgrab 3.0 build in f8. I'm respinning the same for f7 right now... Ah, done. Please see if this dvgrab works better for you: http://koji.fedoraproject.org/packages/dvgrab/3.0/2.fc7/ > **** try 2 ***** > [root@localhost test]# dvgrab --duration 20 test > send oops > send oops > send oops > send oops > send oops > send oops > send oops > send oops > send oops > send oops > send oops > send oops > send oops > send oops > rom1394_1 warning: read failed: 0x0000fffff000040c > rom1394_1 warning: read failed: 0x0000fffff0000410 > Found AV/C device with GUID 0x01ff317701ff3177 > send oops > send oops > send oops > send oops > "" 0.00 MB 0 frames > Capture Stopped > Error: no DV > > this one with: > 01:0a.0 FireWire (IEEE 1394): Texas Instruments TSB82AA2 IEEE-1394b Link Layer > Controller (rev 01) > (which is OHCI 1.1 and supported in kernel) The 'send oops' stuff is new to me... Is this by chance a powerpc system? If so, yet another libraw1394 update coming up shortly should fix the lack of video captured there. Just got dv capture working on my own powerbook earlier today, need to backport the f8 fixes to f7 and push a build. Please also upgrade libraw1394 to this build, particularly if that's a ppc system: http://koji.fedoraproject.org/packages/libraw1394/1.2.1/11.fc7/ It is not powerpc. It is: 2.6.22.1-27.fc7 #1 SMP Tue Jul 17 17:19:58 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux libraw1394-1.2.1-10.fc7 libavc1394-0.5.3-1.fc6 dvgrab-3.0-2.fc7 I try again with Texas Instruments TSB82AA2 controller and maximum that i can get is: [root@localhost test]# dvgrab --noavc --duration 20 test send oops send oops send oops send oops send oops send oops send oops send oops send oops send oops send oops send oops send oops send oops rom1394_1 warning: read failed: 0x0000fffff000040c rom1394_1 warning: read failed: 0x0000fffff0000410 Found AV/C device with GUID 0x01ff317701ff3177 Capture Started "test001.avi": buffer underrun near: timecode 00:00:32.24 date 2007.10.13 09:24:33 This error means that the frames could not be written fast enough. "test001.avi": 72.33 MB 500 frames timecode 00:00:53.00 date 2007.10.13 09:24:53 Capture Stopped Warning: 1 dropped frames. This is with --noavc option for dvgrab and camera already playing. Captured avi file is ok. But this is not new it works like that from FC7 test 1. I think error is in libavc1394? (i could not try new libra1394 because of dependencies) Erm, what dependencies? The libraw1394-1.2.1-11.fc7 package should cleanly upgrade -10.fc7. What exactly do you get for output when you try that upgrade? If it doesn't cleanly upgrade, there could well be some bad bits in the mix, interfering with things... Your upgrade failure sounds suspiciously similar to bug 353231 (and others filed earlier), which are due to some 3rd-party packages being installed. If the upgrade failure does mirror bug 353231, please get rid of any 3rd-party libraw1394 bits and make sure you're not loading the ieee1394 driver stack, only the juju firewire driver stack. I'd forgot ATrpms was packaging up a libraw1394 w/the old ieee1394 support enabled instead of juju and ieee1394-kmdl packages. Mixing and matching the two is bound to cause bad things to happen, you have to settle on one or the other. And I can only do anything about the juju bits. I downloaded libraw1394-1.2.1-11.fc7.x86_64.rpm double click gives me: Missing Dependency: libraw1394.so.8 is needed by package libavc1394 Missing Dependency: libraw1394 = 1.2.1-10.fc7 is needed by package libraw1394-devel Missing Dependency: libraw1394.so.8 is needed by package libiec61883 Missing Dependency: libraw1394.so.8 is needed by package libfreebob There were for sure no 3rd-party bits when I was testing FC7 test1 and I don't think that there are any now. I tried that also on clean FC7 32 bit install and on clean LFS build and result was the same. I could try rpm -e --nodeps libraw1394 and then installing new one. OK, hrm, seems you have libraw1394-devel-1.2.1-10.fc7 installed as well as libraw1394-1.2.1-10.fc7. You'll have to either remove libraw1394-devel or also download the newer libraw1394-devel and upgrade both libraw1394 and libraw1394-devel simultaneously, then I think you should be all set. For sanity's sake, just check that there's nothing unexpected in the output of 'rpm -qa |grep libraw'. Of course, since you're not on ppc, I'm not all that hopeful that this is going to make a difference, but we might as well be certain... I did: yum remove libraw1394-devel but stil get: Missing Dependency: libraw1394.so.8 is needed by package libavc1394 Missing Dependency: libraw1394.so.8 is needed by package libiec61883 Missing Dependency: libraw1394.so.8 is needed by package libfreebob # rpm -qa | grep libraw libraw1394-1.2.1-10.fc7 libraw1394-1.2.1-10.fc7 should i try rpm -e --nodeps libraw1394 and then installing new one? Aha, I see. You've got a 64-bit system with both the i386 and x86_64 versions of libraw1394 installed. You'll have to either grab *both* arch libraw1394 upgrade packages and upgrade the both of them, or yum remove the one of the arch you don't need. I did: rpm -e --nodeps libraw1394.i386 rpm -e --nodeps libraw1394.x86_64 then installed libraw1394-1.2.1-11.fc7 but nothing changed. As for "send oops" I think it is a bus reset between the send command and the answer from camera (see comment #5). I just re-read the entire bug, and I think I may have been slightly confused and/or missed something at one point, so just to clarify... The main thing we're trying to fix is that your OHCI 1.1 controller only captures video 50% of the time, right? Also, when it does capture, are you always seeing the dropped frame (or frames)? The non-working OHCI 1.0 controller is a separate known issue, tracked in bug 344851. As you can see from the dates this is long story. There are really 3 bugs that i at that time did not know about: 1. firewire (testlibraw, dvgrab, dvcont...) working only as root - this one is still there in fully updated FC8 2. firewire not working on OHCI 1.0 card - this one is still there but as you said is bug 344851 3. the original bug ( "send oops" messages and dvgrab working about 50% of time) - this one is still there in fully updated FC8 and now it is 100% NOT working kernel: 2.6.23.1-49.fc8 #1 SMP x86_64 x86_64 x86_64 GNU/Linux libraw: libraw1394-1.3.0-3.fc8 dvgrab: dvgrab-3.0-2.fc8 in dmesg with no camera i see: firewire_core: created new fw device fw0 (0 config rom retries, S800) when camera is connected this lines are added: firewire_core: created new fw device fw1 (0 config rom retries, S200) firewire_core: phy config: card 0, new root=ffc1, gap_count=5 firewire_core: BM lock failed, making local node (ffc0) root. firewire_core: phy config: card 0, new root=ffc0, gap_count=5 ( what "BM lock failed" means? something wrong here? ) now i try to work with camera as root: testlibraw - > looks ok dvcont play send oops send oops send oops Could not find any AV/C devices on the 1394 bus. this is not working dvgrab test send oops send oops send oops Error: no camera exists not working dvgrab --noavc send oops send oops send oops Error: invalid source specified not working Conclusion: things changed in FC8 but to worse, firewire (dvgrab,dvcont) is now 100% not working on my machine. It is a kernel bug: this code is from linux-2.6.24.rc5 /drivers/firewire/fw-ohci.c /* FIXME: Document how the locking works. */ if (ohci->generation != packet->generation) { if (packet->payload_length > 0) dma_unmap_single(ohci->card.device, payload_bus, packet->payload_length, DMA_TO_DEVICE); packet->ack = RCODE_GENERATION; return -1; } When i run: dvcont play i get into this code and after that "send oops" is displayed. If i just comment out this part of code dvcont commands (play, stop ..) works ok. "dvgrab test" starts camera, makes avi file but in the end i get a: dvgrab[2264]:segfault .... So it looks like error in this part of code? One more interesting thing: the VIA OHCI 1.0 controller is working now (kernel 2.6.24.rc5) without any changes in code, But dvgrab is segfaulting in the end. Captured video is ok. Answering the request for testing in redhat-devel mailing list: with kernel 2.6.24-0.147.rc7.git2.fc9 we have: VIA controller: -32bit - video capture OK - segfault at the end -64bit - video capture OK NVIDIA controller: -32bit - send oops - video capture not working -64bit - send oops - video capture not working > NVIDIA controller: > -32bit - send oops - video capture not working > -64bit - send oops - video capture not working If it is NVidia nForce2, then that's bug 244576, unfortunately. Segfault of dvgrab on exit is known but not yet debugged. It's harmless though. It is: 01:0a.0 FireWire (IEEE 1394): Texas Instruments TSB82AA2 IEEE-1394b Link Layer Controller (rev 01) (prog-if 10 [OHCI]) Subsystem: Giga-byte Technology GA-K8N Ultra-9 Mainboard Flags: bus master, medium devsel, latency 32, IRQ 18 Memory at f8004000 (32-bit, non-prefetchable) [size=2K] Memory at f8000000 (32-bit, non-prefetchable) [size=16K] Capabilities: [44] Power Management version 2 It is on mainboard with NVidia chipset. >> If it is NVidia nForce2, > > It is: > 01:0a.0 FireWire (IEEE 1394): Texas Instruments TSB82AA2 IEEE-1394b Link Layer ... > It is on mainboard with NVidia chipset. That's different. TI OHCI 1.1 controllers, TSB82AA2 in particular, are supposed to work with dvgrab. Mine does. The mainboard's chipset behind the FireWire controller doesn't matter to the FireWire drivers, except perhaps if there are IOMMUs or/and very complicated bus bridges which might expose driver bugs that are hidden on simpler machines. (CardBus bridges and PCIe to PCI bridges are no problem, but at least a PCI-X to PCI multiplexer has been reported to make trouble.) On the other hand, I read that some mainboards equipped with 1394b/ FireWire 800 controller exhibited SBP-2 storage performance which was worse than 1394a/ FireWire 400 controllers. Unless the testers were fooled by Windows, those mainboards evidently had a buggy FireWire implementation. Please look comment 24. At one moment there is a packet with payload_length 0 and ohci->generation 1 bigger then packet->generation and after that i get "send oops". If i just comment out that code dvgrab is working?? > Please look comment 24.
>
> At one moment there is a packet with payload_length 0
> and ohci->generation 1 bigger then packet->generation
>
> and after that i get "send oops".
>
> If i just comment out that code dvgrab is working??
That code is asynchronous transmission. Either dvgrab wanted to send a request,
or fw-core wanted to send a request or response. It is certainly the former.
ohci->generation > packet->generation means that there was a bus reset after
dvgrab or fw-core generated the packet but before fw-ohci was able to enqueue
the packet. In this case, we definitely must not enqueue the packet. The code
which you quoted is correct and has to stay. There is obviously a bug somewhere
else (apparently higher up) which needs to be fixed. Whoever sent the request
or reply has to handle the transaction abortion due to RCODE_GENERATION
appropriately, according to whatever application-level protocol is involved.
> Whoever sent the request or reply has to handle the transaction abortion
> due to RCODE_GENERATION appropriately, according to whatever
> application-level protocol is involved.
PS: Either this condition doesn't happen on the reporter's setup with the old
driver stack due to whatever differences in timing, or there are differences how
old and new stack present such a transaction abortion to dvgrab.
(In reply to comment #28) > It is: > 01:0a.0 FireWire (IEEE 1394): Texas Instruments TSB82AA2 IEEE-1394b Link Layer > Controller (rev 01) (prog-if 10 [OHCI]) > Subsystem: Giga-byte Technology GA-K8N Ultra-9 Mainboard > Flags: bus master, medium devsel, latency 32, IRQ 18 > Memory at f8004000 (32-bit, non-prefetchable) [size=2K] > Memory at f8000000 (32-bit, non-prefetchable) [size=16K] > Capabilities: [44] Power Management version 2 > > It is on mainboard with NVidia chipset. Not sure how I didn't notice this before, but I have the exact same motherboard in my possession, complete with the same controller. However, its in my personal mail and web server, so I'm somewhat reluctant to do any extensive testing on it... But I'll throw some updated firewire drivers on it and see what I can see. Finally got an updated kernel (2.6.23.14-119.fc8) booted on the box, but I've been fixated on storage of late. Thus far, sbp2 stuff works great on it. :) Will try to mix in some dv testing soon though. Some of the bits regarding mismatched generations *might* be fixed in rawhide, less sure if they're fixed in the kernel I'm running on this box now... Finally got around to dvgrab on my GA-K8N... The results I get are identical to those in the attachment to bug 370931 (https://bugzilla.redhat.com/attachment.cgi?id=251301). Hi, new tests: With Kernel 2.6.25-0.54.rc2.fc9: #at start-up: [fedora@fedora ~]$ dmesg | grep firewire firewire_ohci: Added fw-ohci device 0000:00:0d.0, OHCI version 1.10 firewire_ohci: Added fw-ohci device 0000:02:0c.0, OHCI version 1.0 firewire_core: created device fw0: GUID 00e0180000cc2afe, S400 firewire_core: created device fw1: GUID 00308d0f00bc03e4, S400 #still same problem when pluging in the camera on my (motherboard, nForce2) firewire: firewire_core: BM lock failed, making local node (ffc0) root. firewire_core: phy config: card 0, new root=ffc0, gap_count=5 #ad libitum and system freeze... #when pluging in the camera (DV500+, after a "reset"): firewire_ohci: node ID not valid, new bus reset in progress firewire_core: created device fw2: GUID 00008500008cec1e, S100 firewire_core: phy config: card 1, new root=ffc1, gap_count=5 firewire_core: BM lock failed, making local node (ffc0) root. firewire_core: phy config: card 1, new root=ffc0, gap_count=5 firewire_core: phy config: card 1, new root=ffc1, gap_count=5 firewire_core: giving up on config rom for node id ffc1 #no more messages after running dvgrab! dvgrab: [root@fedora fedora]# dvgrab -i ./ttt.avi Found AV/C device with GUID 0x00008500008cec1e Going interactive. Press '?' for help. "stdout": buffer underrun near: timecode 00:4875865:-1993832910.00 date 2008.03.04 09:36:35 This error means that the frames could not be written fast enough. Capture Started" ff:ff:ff:ff "" sec "./ttt004.avi": 26.62 MiB 184 frames timecode 00:250000000:-1076310716.03 date 2008.03.04 09:36:45 Capture Stopped Warning: 1 dropped frames. Erreur de segmentation:ff:ff "" sec #Same problems as previous comment: "segfault" at living (and "buffer underrun" whem starting)! #But no more "dmesg" when leaving... #gstreamer: [root@fedora fedora]# gst-launch dv1394src ! decodebin name=d ! queue ! audioconvert ! audioresample ! alsasink d. ! ffmpegcolorspace ! xvimagesink Setting pipeline to PAUSED ... Pipeline is live and does not need PREROLL ... Setting pipeline to PLAYING ... New clock: GstSystemClock Caught interrupt -- handling interrupt. Interrupt: Setting pipeline to PAUSED ... Execution ended after 33318467000 ns. Setting pipeline to PAUSED ... Setting pipeline to READY ... Caught SIGSEGV accessing address 0xb6886004 #0 0x00110402 in _start () from /lib/ld-linux.so.2 #1 0x0065647b in ?? () #2 0x009c1218 in ?? () #3 0x00000000 in ?? () Spinning. Please run 'gdb gst-launch 3145' to continue debugging, Ctrl-C to quit, or Ctrl-\ to dump core. #Here everything work well too, until the exit (and another "segfault"!) #I can't quit by closing the vidéo window: I have to do 'ctrl-C' in the terminal #I can't do a dump (french keyboard: '\' with 'altgr', doesn't work...) #The video window doesn't even close until I close the terminal! So, few progresses since kernel 2.24... I'll continue testing! Bastien. Bastien, if the FireWire controller itself is the nForce2 integrated one (i.e. not a different controller on an otherwise nForce2 based motherboard), then it is unlikely that the driver updates do any good for you --- until there is some specific work done for bug 244576. (Of course sometimes a seemingly unrelated update may also unexpectedly fix a bug, but it is very unlikely that this happens in the nForce2 case, due to the special nature of the nForce2 quirk.) Stefan, with nForce2, nothing's working; The dvgrab and gstreamer dumps are done when I'm using my Pinacle DV500 card! Bastien. PS:I know this bug report isn't about the nForce2 bug -- I won't talk about it here! Okay, I've just now finally managed to free up my own GA-K8N system from its other duties and plan to do some more prodding on this issue in the upcoming week. FWIW as another data point: # lspci | grep 1394 04:05.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link) <plug old JVC camcorder in> # tail /var/log/messages ... Mar 24 20:39:37 malbec kernel: firewire_core: created device fw1: GUID 0080880100a05168, S200 Mar 24 20:39:37 malbec kernel: firewire_core: phy config: card 0, new root=ffc1, gap_count=5 # dvgrab send oops send oops send oops Error: no camera exists Mike, you might also want to get on the cc list for bug 435550, which specifically deals with the TI chipset you've got there. Might ultimately be the same issue for both that bug and this one, not sure yet. Bug 435550 is different from comment #40 (Mike's comment). Mike's dvgrab fails while trying to read the config ROMs of the nodes on the bus. Bug 435550 is about failing isochronous reception after having the camcorder successfully detected. Hm... I was thinking the camera's rom had been read, and it was set up as device fw1... Yep, GUID 0x008088 is "VICTOR COMPANY OF JAPAN, LTD.", aka JVC. Mike, it would be good to know exactly which kernel that was with too, since the firewire drivers have been changing quite rapidly of late. Just looked at 435550 again... Seems there are several different problems that have evolved over time there. Ugh. Can't keep 'em all straight in my head anymore... comment #42: >> Mike's dvgrab fails while trying to read the config ROMs of the nodes >> on the bus. comment #43: > Hm... I was thinking the camera's rom had been read, and it was set up > as device fw1... Dvgrab issues its own read requests. (I believe this is different from IIDC cameras and libdc1394 clients because AFAIK libdc1394 has an abstraction for config ROM reads and is thus able to use Juju's config ROM cache.) For a while I have also seen dvgrab saying that there was no camera on the bus, despite fw-core being able to probe it and creating fw1 and fw1.0 for the camcorder. This happened on x86-64 with Gentoo's dvgrab-3.1, while the same combination of dvgrab + libraw1394 + kernel (except this being x86-32) still worked on a different PC with various cards. Today, dvgrab works again on my x86-64 PC, but I am too lazy to find out what fixed it. > For a while I have also seen dvgrab saying that there was no camera > on the bus, despite fw-core being able to probe it and creating fw1 > and fw1.0 [...] Today, dvgrab works again on my x86-64 PC, but I am > too lazy to find out what fixed it. PS: This is mainline kernel plus latest firewire development patches, i.e. minimally ahead of firewire drivers in latest fedora kernel. The two gap count related fixes which I sent today are in principle relevant, but comment #40 shows that there is no gap count problem on Mike's setup. In reply to Jarod at #44, I am not with the machine in question right now, but the kernel would be the latest F8 as I made sure it was fully up to date (at least modulo the local mirror in .AU), so kernel-2.6.24.3-34.fc8 (x86_64). Stefan is right in #42, dvgrab is failing to recognize the camcorder, not failing to read it. I traced dvgrab into raw1394util.c:discoverAVC(), as far as the line where it calls rom1394_get_node_type(). This is after a successful call to rom1394_get_directory(), so it looks like its at least trying to read the ROM, and thinking it succeeds, but the data may be bad. Re comment #46: ah, didn't know that dvgrab was trying to do its own thing... Re comment #48: I think -50.fc8 was pushed to updates over the weekend, so yeah, might not have hit all the mirrors yet. Particularly since you're on x86_64, it would be interesting to see how it does with -50... Hi. I have kernel 2.6.25-0.136.rc6.git5.fc9 x86_64. dvgrab still not working. OK, -50.fc8 arrived this evening, but the behaviour remains identically broken. Mike, could you provide further diagnostics: # logger "== begin test ==" # echo -1 > /sys/module/firewire_ohci/parameters/debug <plug camcorder in> # dvgrab Then post the contents of /var/log/messages (I presume that's the correct log; am not 100% sure) from the beginning of the test till its end. Thanks. Re comment #52: Mladen, ditto. :-) Created attachment 299412 [details]
firewire_ohci debug messages
Created attachment 299571 [details]
strace run of dvgrab
Stefan, switching on that debug flag does not produce anything new:
...
Mar 29 22:59:22 malbec root: === begin test ===
Mar 29 22:59:31 malbec kernel: firewire_core: created device fw1: GUID
0080880100a05168, S200
Mar 29 22:59:31 malbec kernel: firewire_core: phy config: card 0, new
root=ffc1, gap_count=5
Mar 29 22:59:38 malbec root: === end test ===
...
On the user side, I have narrowed the trouble down to the call to
avc1394_check_subunit_type, as this ltrace excerpt shows:
...
raw1394_get_nodecount(0x2040040, 0x20462f8, 0, 0, 0xfefefefefefefeff) = 2
rom1394_get_directory(0x2040040, 1, 0x7fff6e050380, 0, 0xfefefefefefefeff) = 0
rom1394_get_node_type(0x7fff6e050380, 0x2046a60, 0xfefefefeff40442f, 0x414530,
0
xfefefefefefefeff) = 2
avc1394_check_subunit_type(0x2040040, 1, 0x200000, 0x414530,
0xfefefefefefefeffs
end oops
send oops
send oops
) = 0
raw1394_get_nodecount(0x2040040, 0, 0x3c81f529e0, 0x2046b80, 0x2046b90) = 2
...
Hopefully the attached strace run is helpful, albeit full of cryptic ioctls.
> switching on that debug flag does not produce anything new Probably because kernel log messages at "debug" level don't go to /var/log/messages. ``grep -rl "selfID 0" /var/log´´ may find the right file. If not, the output of dmesg will contain it. > Hopefully the attached strace run is helpful, albeit full of cryptic ioctls. The drawback is that we don't get to see the data passed in via ioctl. > Created an attachment (id=299412) [edit]
> firewire_ohci debug messages
Thanks. I probably need to get myself the AV/C specs to understand this.
Trouble begins apparently after the config ROM was successfully read (lots of
quadlet reads, local node is ffc0, camera is ffc1). Then one AV/C register
write (?) is successfully performed:
AT spd 1 tl 1d, ffc0 -> ffc1, ack_pending , BW req, fffff0000b00 8,0
AR spd 1 tl 1d, ffc1 -> ffc0, ack_complete, W resp
Now we get in turn a write request from the camera:
AR spd 1 tl 1d, ffc1 -> ffc0, ack_pending , BW req, fffff0000d00 8,0
but which ever software is supposed to service the address range covering
fffff0000d00 does not sending out the required "W resp" (write response).
However, there has to be a software which installed an address range handler for
it, otherwise we should see fw-core sending a write response (with address error
rcode).
Instead, dvgrab proceeds issuing write requests to fffff0000b00 which the
camera's transaction layer doesn't accept for processing:
AT spd 1 tl 1e, ffc0 -> ffc1, ack_busy_X, BW req, fffff0000b00 8,0
AT spd 1 tl 1f, ffc0 -> ffc1, ack_busy_X, BW req, fffff0000b00 8,0
AT spd 1 tl 00, ffc0 -> ffc1, ack_busy_X, BW req, fffff0000b00 8,0
until dvgrab gives up.
Would you have more luck with "dvgrab -noavc"?
Created attachment 299598 [details]
dmesg output
grep finds nothing, but attached is the contents of dmesg from the point the
camcorder firewire cable is plugged in, through a run of dvgrab, to unplug.
Thanks. Same as Mladen's log, except that ffc1 is the local node and ffc0 is the camera, and there is already an earlier write request from the camera which the local node doesn't respond to. After that, the camera still responds to config ROM reads and one register write, but it becomes unresponsive after its second write request passes without response from the PC. Ditto, you might be able to get further with "dvgrab -noavc". Created attachment 299607 [details]
dvgrab -noavc, plus kernel logging
dvgrab -noavc fails, albeit differently. The run and kernel logging are all in
the attachment, now that the kernel has been convinced to write debug messages
to /var/log/messages.
Created attachment 299609 [details]
dvgrab -noavc -I /dev/fw1, plus kernel logging
Following on, dvgrab -noavc -I /dev/fw1 also fails, albeit seemingly in the
"fails to get data and hangs" rather than "fails to find device" mode,
previously seen near the top of this bug.
> + dvgrab -noavc -I /dev/fw1
> "" 0.00 MB 0 frames
> Capture Stopped
"-I file" is probably wrong, because this most certainly has to be a previously
captured DV stream, or a stream like the output of the old dv1394 driver.
Instead, try the following:
Connect camera,
start replay (or recording if in webcam mode),
$ dvgrab -noavc -interactive
Dvgrab should print
Found AV/C device with GUID 0x0080880100a05168
Going interactive. Press '?' for help.
It may also print a buffer underrun which should be harmless.
'?' will show the accepted keys. Press 'c' for capture. dvgrab should print
Capture started
It should now write into the file dvgrab-001.dv. After a while, press 'escape'
to stop capture. dvgrab should print something like
"dvgrab-001.dv": 972.84 MiB 7084 frames timecode
-1075085276:-1075085544:-1210018631.168909024 date 2008.03.30 10:11:28
Capture Stopped
and maybe the number of dropped frames. Then quit dvgrab with 'q'.
[...testing this myself...]
Hmm, this way, i.e. trying to "capture" without sending a "start" command, still
involves several block write requests. I will attach the messages which I get.
Created attachment 299614 [details]
kernel log from a working system
local node: ffc1, camera: ffc0
I also get several block write requests to the camera which end in ack_busy_X.
But the subsequent retry of such an attempt then succeeds with write response
from the camera.
And the following here right after "created device fw3" is _not_ caused by
dvgrab:
AT spd 0 tl 07, ffc1 -> ffc0, ack_complete, BW req, fffff0000b00 8,0
AR spd 0 tl 07, ffc0 -> ffc1, ack_pending , BW req, fffff0000d00 8,0
AT spd 0 tl 07, ffc1 -> ffc0, ack_complete, W resp
There are BTW two flavors of write transactions: "Split transactions" look
like this if successful:
ack_pending , BW req
ack_complete, W resp
"Unified transactions" look like this if successful:
ack_complete, BW req
My camcorder happens to support the latter kind for write transactions with the
PC as requester and the camera as responder.
No I got to find out what's at offset fffff0000d00 and why my kernel is
responding at it but yours isn't.
$ uname -a
Linux stein 2.6.25-rc6 #12 SMP PREEMPT Sun Mar 23 00:55:50 CET 2008 i686
Intel(R) Core(TM)2 CPU T7200 @ 2.00GHz GenuineIntel GNU/Linux
(I have got patches as in linux1394-2.6.git in this kernel.)
> $ dvgrab -noavc -interactive
>
> Dvgrab should print
> Found AV/C device with GUID 0x0080880100a05168
> Going interactive. Press '?' for help.
No, this just fails immediately, with the same error message as
without -interactive. No time right now to look hard at the debug messages,
but that looks very similar.
>> $ dvgrab -noavc -interactive
> No, this just fails immediately, with the same error message as
> without -interactive. No time right now to look hard at the debug messages,
> but that looks very similar.
That's plausible.
I need to find out why the PC is not sending the write response on your and
Mladen's setup.
Please see #24. As i see it it is sending response but with wrong generation. Then it goes out with error instead of doing right thing. Possible? > Please see #24.
>
> As i see it it is sending response but with wrong generation.
Right... this bug entry has become too long...
The response generation is supposed to come from the last "AR evt_bus_reset,
link internal". Why is there a wrong generation? BTW, one of my cards is a
"Texas Instruments TSB82AA2 IEEE-1394b Link Layer Controller (rev 01)" too, and
it works perfectly. This would be so much easier if I had an affected system
myself.
Could you add a printk to fw-ohci.c to show the ohci->generation and
packet->generation when they are compared in at_context_queue_packet? And while
you are at it, maybe you find other places where a printk would tell us
something about how it goes wrong.
Explanation of the generation thing: The bus generation is a counter which
increases at each bus reset. We need the generation to fulfill the rule that a
bus reset terminates a transaction if there wasn't a response sent before the
bus reset. IOW a response is not accepted anymore if a bus reset happened
between request and response. The OHCI controller tells us the generation in
two ways (three if we count very pedantic): By way of the self ID complete
event (implemented in fw-ohci.c's bus_reset_tasklet) and by way of synthesized
asynchronous packets which are inserted in the asynchronous request receive
buffer (implemented in fw-ohci.c's handle_ar_packet by the
OHCI1394_evt_bus_reset). This second way is necessary because processing of the
receive buffers might be delayed WRT self ID receive events, and we could assign
a too new generation to received packets while processing the receive buffer if
it weren't for that synthesized delimiter packet.
Hmm, if the request from the camera came in right after a bus reset, I could
imagine that there might be a race condition where we execute the
ar_request_ctx.tasklet earlier than the bus_reset_tasklet and could attempt to
send a response with what fw-ohci would consider a "future" generation. But in
your logs, the request happens many interrupts after the last bus reset, so if
there is such a race condition, it didn't hit your system.
I cannot try now but from memory: after reboot and camera turned on on first dvgrab call ohci->generation and packet->generation fere off by 1 ( i think 4 and 5 ). I think something is wrong in bus reset when camera is turned on. Maybe it is not controller but some timing issue ( i have old S200 camera ) ? As for affected system could you use ssh to my machine? two options: 1. LFS system 2. Fedora rawhide Created attachment 300219 [details]
gdb excerpt and command line run of dvgrab -noavc
I just had a moment for a quick look at dvgrab with gdb, and provoked different
behaviour again (see attached) --- this time it actually finds the device,
although running the same dvgrab -noavc from the command line still fails as
ever. Anyway if you want to know what is actually being sent in the ioctls I
can put some instrumentation in now if it would help.
Created attachment 301399 [details]
dvcont play with printk-s
As suggested i put few printk-s in fw-ohci.
Command executed is "dvcont play"
As told before at first, response ohci and packet generations
are not equal and we get error.
Is it really ok to check generation for response?
Comment in handle_ar_packet says we don't?
Created attachment 301967 [details]
/var/log/messages, dvgrab -noavc
Still broken at 2.6.24.3-50.fc8.
Re comment #70: > Created an attachment (id=301399) > dvcont play with printk-s > > As suggested i put few printk-s in fw-ohci. > Command executed is "dvcont play" Thanks. Since the whole log shows only ohci->generation 5 for lots of outgoing requests, and only one ohci_send_response in the middle of a bunch of (successful?) transmissions of requests happens --- and this has a packet->generation of 4 --- it looks like the OHCI controller inserted a PHY packet with a wrong generation. When you got the time, please log what values are assigned... ...to ohci->request_generation in handle_ar_packet(), ...to ohci->generation in bus_reset_tasklet() whenever these assignments are performed. > Is it really ok to check generation for response? > Comment in handle_ar_packet says we don't? I tried it to explain in comment #67. Let me put it in some other words. The drivers have to fulfill the following jobs to ensure that neither outbound transactions nor inbound transactions wrap over bus resets. This is done in differing ways: 1. Outbound transactions = local node sends a request packet, remote node sends a response packet. When we receive a response packet, we look into our stash of incomplete transactions if there is one with the same transaction label like the transaction label of this response packet. The remote node is supposed to not send a response to any request which it got before the last bus reset. But we cannot rely on the remote node getting it right all the time. We avoid handling responses which violate that rule by discarding all pending transactions when a bus reset happens. 2. Inbound transaction = remote node sends a request packet, local node sends a response packet. Now we have the task to not send a response anymore if a bus reset happened after that request. However, it's not as easy as it sounds. The incoming requests are lined up in a DMA buffer with room for several of those. That buffer is processed not fully synchronously with bus reset events. But the OHCI chip helps us here by inserting a bus reset marker into the packet buffer (in the form of a synthesized bus reset packet as per OHCI 1.1 clause 8.4.2.3). This way we know exactly which requests came before a bus reset and which ones came after a reset. The bus reset packet contains the generation number which is current at the tine the bus reset packet is inserted. This generation is recorded in fw-ohci's variable ohci->request_generation and stored into all subsequently received request packets. When fw-core generates response packets, it uses the request packet's generation as the response's generation. fw-ohci on the other hand will prevent the sending of responses if a bus reset happened after the respective request, i.e. if the current generation is different from the request's == response's generation. > the OHCI chip helps us here by inserting a bus reset marker into > the packet buffer (in the form of a synthesized bus reset packet > as per OHCI 1.1 clause 8.4.2.3). These bus reset packets are logged as "AR evt_bus_reset, link internal" in attachment 301967 [details]. We should add logging of the selfIDGeneration of these packets so that we can check if the genereation of those is the same as the nearest self id complete event ( = properly finished bus reset). Right now we only see that there are as many bus reset packets received as there are self ID complete events. I will post a patch soon. In attachment 301967 [details], we again see some incoming requests which our drivers do not respond to. Like the one in the last line of the following part of the log: Apr 10 20:50:32 malbec kernel: firewire_core: phy config: card 0, new root=ffc1, gap_count=5 Apr 10 20:50:32 malbec kernel: firewire_core: phy config: card 0, new root=ffc1, gap_count=5 Apr 10 20:50:32 malbec kernel: AT ack_complete, phy config packet, 01c50000 Apr 10 20:50:32 malbec kernel: AR evt_bus_reset, link internal Apr 10 20:50:32 malbec kernel: AR evt_bus_reset, link internal Apr 10 20:50:32 malbec kernel: firewire_ohci: 2 selfIDs, generation 22 Apr 10 20:50:32 malbec kernel: selfID 0: 80458992, phy 0 [p-.] S400 gc=5 +15W Lci Apr 10 20:50:32 malbec kernel: selfID 0: 814548d4, phy 1 [c--] S200 gc=5 +0W Lc Apr 10 20:50:32 malbec kernel: AT spd 1 tl 0b, ffc0 -> ffc1, ack_pending , BW req, fffff0000b00 8,0 Apr 10 20:50:32 malbec kernel: AR spd 1 tl 0b, ffc1 -> ffc0, ack_complete, W resp Apr 10 20:50:32 malbec kernel: AR spd 1 tl 0b, ffc1 -> ffc0, ack_pending , BW req, fffff0000d00 8,0 ffc0 is the local node here. The last block write request from the remote node, transaction label 0b, is never concluded with a response packet ("W resp") from ffc0 to ffc1 with the same transaction label (tl) 0b. I see no reason why our node does not send a response. It might be because of the generation mismatch which Mladen reported. Even though our failure to respond is a problem, the remote node should nevertheless time out his pending request eventually. However, it appears to have trouble handling this situation because we see at some point later that it begins to answer _our_ requests to it with ack_busy_X. However, I see another seemingly unrelated but serious problem in this log: Apr 10 20:50:32 malbec kernel: firewire_core: created device fw1: GUID 0080880100a05168, S200 Apr 10 20:50:32 malbec kernel: firewire_core: created device fw1: GUID 0080880100a05168, S200 WTH is this logged twice? > However, I see another seemingly unrelated but serious problem in this log: > Apr 10 20:50:32 malbec kernel: firewire_core: created device fw1: GUID > 0080880100a05168, S200 > Apr 10 20:50:32 malbec kernel: firewire_core: created device fw1: GUID > 0080880100a05168, S200 > > WTH is this logged twice? I think its a false alarm. The camcorder on/mode switch is designed for aliens with small dexterous fingers--- I had a bit of a struggle turning it on. >> WTH is this logged twice?
>
> I think its a false alarm. The camcorder on/mode switch is designed
> for aliens with small dexterous fingers--- I had a bit of a struggle
> turning it on.
That can explain series of (self ID complete event; read config ROM with many AT
and AR events; created device)^n, but it does not explain a sequence of (created
device)^n alone.
Jarod, what is this kernel doing? (Or am I missing something obvious? It's
past 01:00 AM here now.)
I'm quite confused now too. There are quite a few things that appear to be getting logged multiple times... Whether that's a logging oddity or the driver really is trying to log those things multiple times, I dunno... Its certainly something I've never seen before. I really need to swipe the office dv camera back from krh and hook it up to a few rawhide boxes to see what I can see... Another odd thing in the log that I don't understand: AT spd 1 tl 0b, ffc0 -> ffc1, ack_pending , QR req, fffff0000490 AR spd 1 tl 0b, ffc1 -> ffc0, ack_complete, QR resp = 00000000 OK, quadlet read request, quadlet read response, transaction label 0b... Everything looks more or less okay up to this point... AT spd 1 tl 0c, ffc0 -> ffc1, ack_pending , BW req, fffff0000b00 8,0 AR spd 1 tl 0c, ffc1 -> ffc0, ack_complete, W resp Block write request and response, transaction label 0c... AR spd 1 tl 0c, ffc1 -> ffc0, ack_pending , BW req, fffff0000d00 8,0 Now another response with transaction label 0c? Huh? (and yet, we reply with an ack_pending?). Much weirdness... AT spd 1 tl 0d, ffc0 -> ffc1, ack_busy_X, BW req, fffff0000b00 8,0 AT spd 1 tl 0e, ffc0 -> ffc1, ack_busy_X, BW req, fffff0000b00 8,0 AT spd 1 tl 0f, ffc0 -> ffc1, ack_busy_X, BW req, fffff0000b00 8,0 And of course from there, we go off into the woods. Now, this is possibly relevant: I didn't recognize offsets 0xb00 and 0xd00 at first glance... The code says they are CSR_FCP_COMMAND and CSR_FCP_RESPONSE respectively. The ieee1394 driver stack has a number of functions that deal with them (FCP == function control protocol), but there isn't a lick of code in the firewire stack that explicitly touches them... Okay, just did a bit more digging... The FCP regs appear to be specific to the AV/C spec, they aren't mentioned anywhere that I can see in either the OHCI 1.1, 1394-1995, 1394a-200 or 1394b-2002 specs, or any other number of docs I've poked at thus far. First thing I've found that explicitly mentions them is this: http://www.sfc.wide.ad.jp/~koh39/paper/dps2004.pdf I suspect this might be stuff covered in the iec61883 specs or another 1394ta spec... > There are quite a few things that appear to be getting logged > multiple times... Some things do happen twice, like bus resets. (Bus generation increases by two after those.) But I'm not sure whether it is OK that we see two "firewire_core: phy config: card 0, new root=ffc1, gap_count=5" right after another. > AT spd 1 tl 0c, ffc0 -> ffc1, ack_pending , BW req, fffff0000b00 8,0 > AR spd 1 tl 0c, ffc1 -> ffc0, ack_complete, W resp > > Block write request and response, transaction label 0c... > > AR spd 1 tl 0c, ffc1 -> ffc0, ack_pending , BW req, fffff0000d00 8,0 > > Now another response with transaction label 0c? Huh? (and yet, we > reply with an ack_pending?). This is OK. One was a transaction with local node as requester and remote node as responder, the other a transaction with the roles swapped. Transaction labels correspond to ordered pairs of requester node ID and responder node ID. > AT spd 1 tl 0f, ffc0 -> ffc1, ack_busy_X, BW req, fffff0000b00 8,0 > > And of course from there, we go off into the woods. I occasionally see ack_busy_X with my DV camera too, but the next retry of the request then always succeeds. But in this log the request receive pipe of the camera is clogged and never opens up again. The reason /could/ be the fact that the PC did not respond to previous write requests from the camera. The fact that the PC did not respond may be a bug in the drivers or the controller. BTW, my cheap JVC DV camera is a little bit more optimized as it concludes 8 byte block write requests from the PC with ack_complete (unified transaction) instead ack_pending + write response (split transaction). But which transaction type is used shouldn't matter for the correctness of operation. > The FCP regs appear to be specific to the AV/C spec, [...] > I suspect this might be stuff covered in the iec61883 specs Yes, the FCP part of the AV/C spec went into IEC 61883-1 AFAIK. Re comment #73: Patch which shows bus reset packet generation in the AR event log: http://lkml.org/lkml/2008/4/10/370 (In reply to comment #77) > > There are quite a few things that appear to be getting logged > > multiple times... > > Some things do happen twice, like bus resets. (Bus generation increases by two > after those.) But I'm not sure whether it is OK that we see two "firewire_core: > phy config: card 0, new root=ffc1, gap_count=5" right after another. I've seen it logged twice from time to time and its not been a problem, but its not been amongst a fairly large group of events that are all duplicated, like the very top of the log -- i.e., the phy config x2, created fw1 x2, phy config x2, bm lock fail x2, phy config x2 part. > > AT spd 1 tl 0c, ffc0 -> ffc1, ack_pending , BW req, fffff0000b00 8,0 > > AR spd 1 tl 0c, ffc1 -> ffc0, ack_complete, W resp > > > > Block write request and response, transaction label 0c... > > > > AR spd 1 tl 0c, ffc1 -> ffc0, ack_pending , BW req, fffff0000d00 8,0 > > > > Now another response with transaction label 0c? Huh? (and yet, we > > reply with an ack_pending?). > > This is OK. One was a transaction with local node as requester and remote node > as responder, the other a transaction with the roles swapped. Transaction > labels correspond to ordered pairs of requester node ID and responder node ID. Ah, I was unaware of (or forgot about) the req and resp nodes mattering in transaction label uniqueness. > > AT spd 1 tl 0f, ffc0 -> ffc1, ack_busy_X, BW req, fffff0000b00 8,0 > > > > And of course from there, we go off into the woods. > > I occasionally see ack_busy_X with my DV camera too, but the next retry of the > request then always succeeds. But in this log the request receive pipe of the > camera is clogged and never opens up again. The reason /could/ be the fact that > the PC did not respond to previous write requests from the camera. Indeed, my assumption is the camera is still waiting for the response to that write request and rejecting all our later requests w/the ack_busy_x. > The fact that the PC did not respond may be a bug in the drivers or the controller. Given that we don't seem to have anything in the way of FCP handling code in the firewire stack, I'm leaning slightly toward us failing to do something in the driver, but I'm not quite sure yet what... And it certainly could still be a controller issue... > BTW, my cheap JVC DV camera is a little bit more optimized as it concludes 8 > byte block write requests from the PC with ack_complete (unified transaction) > instead ack_pending + write response (split transaction). But which transaction > type is used shouldn't matter for the correctness of operation. I finally dragged my GA-K8N box into the office, so I'll get the Canon we've got in the office hooked up to its on-board controller soonish. If it *is* a controller issue Mladen is hitting, and my parsing of this massive bug is correct... Then I'll at least finally be using the same controller as him... Presumably a different camera, of course. > > The FCP regs appear to be specific to the AV/C spec, [...] > > I suspect this might be stuff covered in the iec61883 specs > > Yes, the FCP part of the AV/C spec went into IEC 61883-1 AFAIK. I've emailed someone over at 1394ta.org to see what it takes to get ahold of iec61883 specs and/or their other AV/C stuff. Hopefully, not gobs and gobs of cash... I'm sure this has been done before, but I figure it doesn't hurt to try again. >> The fact that the PC did not respond may be a bug in the drivers or >> the controller. > > Given that we don't seem to have anything in the way of FCP handling > code in the firewire stack, I'm leaning slightly toward us failing to > do something in the driver, but I'm not quite sure yet what... And it > certainly could still be a controller issue... As long as userspace does not install an FCP handler, the fw-core transaction layer should generate a response with address error return code. The fact that no such response is visible in the AT event log could for example be caused by the generation issue which Mladen reports. > If it *is* a controller issue Mladen is hitting, and my parsing of this > massive bug is correct... Then I'll at least finally be using the same > controller as him... Presumably a different camera, of course. My 1394b card has the same lspci signature as Mladen's, i.e. "Texas Instruments TSB82AA2 IEEE-1394b Link Layer Controller (rev 01)". But so far I always got the same generation in the bus reset packet as in the corresponding self ID complete event. I suppose I and Mladen may nevertheless have different chip revisions though. > I've emailed someone over at 1394ta.org to see what it takes to get > ahold of iec61883 specs and/or their other AV/C stuff. Hopefully, not > gobs and gobs of cash... I'm sure this has been done before, but I > figure it doesn't hurt to try again. I wasn't "there" at the time, but the interwebs show that this was a FAQ when AV/C were turned into IEC norms. webstore.iec.ch currently lists IEC 61883-1:2008-02 as PDF for CHF 193 ~ USD 193. So... As I'd said before, I finally freed up a box at home that has the exact same motherboard (so far as I can tell) as Mladen, right down to lspci output that is 100% identical to comment #28. But with a Canon camera hooked up right now, I can capture video just fine (save the dvgrab segfault on exit). Kernel is 2.6.25-0.218.rc8.git7.fc9.x86_64. Ugh. Created attachment 302172 [details]
Capture to GA-K8N onboard from Canon camera
Created attachment 302220 [details] dvcont play additional debug re: comment 72 Thank you for sharing knowledge. Attached dmesg with requested printk-s. Actions done: 1. modprobe 2. turn camera on 3 dvcont play Attachment 302220 [details] shows that your TSB82AA2 repeatedly inserts a bus reset packet
whose generation is one off into the future:
- The only time it is correct is right at the start, where the generation
from the packet as well as the generation of the self ID buffer is 0.
- Then presumably a series of two bus resets happens.
- After that, the bus rest packet's generation already advanced to 2,
while the self ID buffer's generation is still 1.
Created attachment 302236 [details] kernel log from a working system This log shows a system with two controllers, an unused VT6307 and a TSB82AA2. (The log was generated with the patch from comment #78. I indented the log lines which belong to the VIA controller.) Superficially, this setup seems to work. But my TSB82AA2 is in fact defective too: - When I turn on the camera, it causes two bus resets. - The bus reset packets' generations advance from 1 to 2 and stay at 2, while the self ID buffer's generation advances from 1 to 3. From this point on, we wouldn't be able to send responses. Luckily we don't get requests yet. - Later, the bus manager code resets the bus for gap count optimization. This bus reset rescues the setup, because after it the bus reset packet's generation is 4, and the self ID buffer's generation is 4. (One other comment on this log: The block write request to fffff0000b00 = CSR_FCP_COMMAND at the camera at Apr 12 16:50:07, shortly after the camera's device file was created, was caused by the hald service. This is on Gentoo Linux.) My conclusion so far: This bus reset packet thing in the OHCI spec is ingenious. Alas we can't really use this feature because at least TSB82AA2's implementation is buggy. Some more things which I noticed in a few further tests: - This accident with the double bus reset and transient generation mismatch is about 90% repeatable if only the camcorder is attached to the TSB82AA2. If for example a 1394b hub is also plugged in into the card, switching on the camera causes only a normal single bus reset and the generations stay in sync. - A NEC 1394a cardbus card + the camcorder also experiences the double bus reset when the camcorder is switched on. But this card sets the generation correctly in the second bus reset. VT63063, VT6307, FW323 so far only had single bus resets and, like the NEC card, always kept bus reset packet generation and self ID buffer generation in sync. - IRQ event logs from TSB82AA2, VT63063, VT6307, NEC-something, FW323 usually show in case of a bus reset that the request-AR interrupt with the bus reset packet happens first, then the self ID complete interrupt. On two or three cards I also saw the request-AR event and self ID complete event combined in a single interrupt. I never saw that the self ID complete IRQ came before the request AR IRQ. Because we can neither rely on a strict order of request AR event vs. self ID complete event, nor on an order of execution of tasklets, it is hard or impossible to check whether the bus reset packet's generation is off by one. Since Mike has a TSB43AB22/A, I guess that it has the same bug as Mladen's or/and perhaps my TSB82AA2. Mike, please post the output of "lspci -nn | grep TSB". Proposed fix: http://lkml.org/lkml/2008/4/12/174 I'm tacking the proposed patch (well, the subsequent updated version) onto the rawhide kernel right now... Okay, should be in kernel-2.6.25-0.226.rc9.fc9 and later. [root@malbec ~]# lspci -nn | grep TSB 04:05.0 FireWire (IEEE 1394) [0c00]: Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link) [104c:8023] Stefan's patch now added to kernel-2.6.24.4-81.fc8, currently working its way through the build system. Please try with kernel-2.6.24.4-82.fc8 and report results back here... Created attachment 302681 [details]
dmesg for kernel 2.6.24.4-82
attached dmesg for camera on sequence
dvcont play
and
dvgrab -d 10 test
working OK :) (segfault at the end but this is another story)
Good, TSB82AA2 down, one to go. Mike, how about TSB43AB22? There was no sign of kernel-2.6.24.4-82.fc8 in my .au mirror this morning before I left for work two hours ago. Hopefully this evening I will have something to report. Mike, -82 hasn't been pushed as an update, it won't be on any mirrors any time soon, its currently only available via pulling it straight from the build system: http://koji.fedoraproject.org/packages/kernel/2.6.24.4/82.fc8/ OK, -82 looks good. dvgrab is now detecting the camcorder and capturing .avi. Out of time though, thorough check that its all fixed tomorrow. I have done a capture of a full tape and checked the results. Looks like you have nailed it. My congratulations to all. Excellent, glad to hear we've finally got this one figured out! I'm going to go ahead and close out this bug. 2.6.24.4-84.fc8 is the latest and greatest built in brew (and actually reverts one firewire bit in -82 that can cause sbp2 issues), I'll see if we can get that or something later pushed as an actual update shortly... I'm told we should have a formal kernel update carrying this fix pushed within a few days (basically, once 2.6.24.5 is out). Fixed 10 months after report. This is not even too bad for upstream IEEE 1394 driver standards... Thanks guys, for staying persistent. kernel-2.6.24.5-85.fc8 has been submitted as an update for Fedora 8 kernel-2.6.24.5-85.fc8 has been pushed to the Fedora 8 stable repository. If problems still persist, please make note of it in this bug report. Mike, could you please tell us with how much RAM you are (successfully) using the TSB43AB22(A), and what the chipset of the motherboard is? There is another bug filed where a TSB43AB22A does not work properly in isochronous reception if buffers are located at physical addresses above 2GB (bug 435550). It would be interesting to know if anybody else uses the same chip successfully with 3GB or more RAM present. RAM = 2GB, motherboard = ASUS M2NPV-VM, chipset = nvidia geforce 6150 GPU / nvidia nforce 430 MCP This is exactly the board for which bug 435550 was reported; but the bug happens only with more than 2GB RAM. (We still haven't figured out who is the culprit of that bug: Drivers or controller or board or a combination. Hence I am looking for someone with the same controller but /different/ board + 3GB or more RAM.) |