Description of problem:
It seems that the HW/SW combination I've in my hand does not work for capturing DV streams.

Version-Release number of selected component (if applicable):
kernel-2.6.23.15-137.fc8

How reproducible:
Not really systematically, but quite often. See below.

Steps to Reproduce:
1. Connect DV camera (Sony DCR-PC110E)
2. Launch: dvgrab -i -t -showstatus -debug all test
3. Play and capture.

Actual results:
It depends, I do not (yet) know on what. Sometimes, very seldom (actually only once), the stream is captured. Sometimes, there are a lot of "buffer underrun" errors and the stream is only partially captured, with many "holes". Almost always "dvgrab" reports something like "error no DV stream" (or similar). Sometimes, I got kernel panic and reboot was necessary (requested by the kernel itself).

Expected results:
Well, the stream should be captured.

Additional info:
The old stack seems to work fine, with only one issue (see below). The autosplit function of dvgrab does not work, the filename timestamp is not updated, so dvgrab overwrites always the same file. This might be a dvgrab problem or of some library in between, since it happens also with kino and with the old stack. Note that the timestamp is properly printed (old stack), when dvgrab captures the stream.

The motherboard is an ASUS M2NPV-VM, with NVIDIA 6150 + 430 chipset, lspci -vv returns the following for the firewire part:

01:05.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link) (prog-if 10 [OHCI])
        Subsystem: ASUSTeK Computer Inc. K8N4-E Mainboard
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 32 (500ns min, 1000ns max), Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 19
        Region 0: Memory at fddff000 (32-bit, non-prefetchable) [size=2K]
        Region 1: Memory at fddf8000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [44] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME+
        Kernel driver in use: firewire_ohci
        Kernel modules: firewire-ohci
Please try out the same with the latest rawhide kernel build if you would, just to make absolutely certain this isn't already fixed (I don't think so, but want to verify).
With kernel 2.6.24.3-13.fc8, from koji, trying the usual:

dvgrab -i -t -showstatus -debug all test

I got the following:

rom1394_1 warning: read failed: 0x0000fffff0000414
error reading config rom directory for node 1
Found AV/C device with GUID 0x08004601029441d8
Going interactive. Press '?' for help.
"" 0.00 MB 0 frames" sec
Capture Stopped
Error: no DV

Which is the same result as with previous kernel, so no improvement. Note that the first and second line happen always, even when the capturing works. The kernels for F9 do not seem to fit properly in F8. When I tried those, I got some errors/warnings at boot, apparently unrelated to the FW subsystem, but who knows... pg
Rawhide kernels are definitely installable on Fedora 8 systems, that's how a lot of us Fedora kernel folk tend to roll, since other parts of rawhide may well be broken, and we really only care about the kernel... :) Although of late, it does seem you may need rawhide lvm2 and mkinitrd (plus deps) to get booted on a rawhide kernel, but that should be it. Ah well, I'm pretty sure nothing that's been added to rawhide kernels will make a difference anyhow. One other thing to double-check... That's the latest dvgrab for F8, right? I think earlier versions still had some issues that have since been fixed. Not quite sure where to poke next, would be easier if I could find a setup on my end that produces the same results...
I guess we have a problem here. In order to upgrade to 2.6.25-0.80.rc3.git2.fc9, another 16 packages need to be installed or upgraded, among them "libstdc++" and "initscripts"... This is quite a bit too much, since I still need a stable working system. One possible solution would be to compile a vanilla kernel, possibly a 2.6.25-rc3, without all the other things, if this works. If you think this helps, I could give it a try. dvgrab is 3.0-2, which should be the latest for F8, but not the latest in general, since version 3.1 is out, which should fix something. Anyway, with the old FW stack it was working, even if the old stack also requires a different libraw1394; maybe there is something there to check. Thanks pg
Unless Jarod has a better idea for you, you could try vanilla 2.6.24.y or 2.6.25-rcX with the very latest firewire patches from http://me.in-berlin.de/~s5r6/linux1394/updates/. Fedora kernels have much newer firewire drivers than vanilla has. (Git users can obtain firewire updates from git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6.git .)
For a minute, I thought I had a good idea, but now I don't think so... Best idea I can come up with (as far as minimal time investment goes) is probably to just take your current kernel, install the matching kernel-devel and build the firewire drivers out of git, then drop the resulting .ko files in place of the existing ones. But again, I doubt if there's any code changes that help this particular problem, I think all the relevant firewire-ohci updates are already included in the 2.6.24.3 Fedora kernel. Rawhide does have dvgrab 3.1, which might be worth upgrading to and testing. I don't *think* it'll pull in a bunch of other changes, but I'm not certain (worst case, the rawhide dvgrab should rebuild on F8 just fine). I ought to push updated dvgrab packages for F8 too...
(In reply to comment #6) > For a minute, I thought I had a good idea, but now I don't think so... Best idea > I can come up with (as far as minimal time investment goes) is probably to just Well, time investment is a minor issue, the main concern I have is to keep the system changes minimal or easily reversible, since the system should be still "available", so to speak. > take your current kernel, install the machine kernel-devel and build the > firewire drivers out of git, then drop the resulting .ko files in place of the > existing ones. But again, I doubt if there's any code changes that help this Actually I'm on my way with 2.6.25-rc3 and Stefan's patches, but I'm anyway interested in your proposal. Could you please give more details, or point me to some documentation, on how to proceed with the kernel-devel package? I quickly tried to copy the firewire*.[ch] files from 2.6.25-rc3 to the same place in the kernel-devel tree, but then I've no idea how to build the module. Using "make drivers/firewire" complains it does not know how to build something. I'm unsure of the correct procedure. > particular problem, I think all the relevant firewire-ohci updates are already > included in the 2.6.24.3 Fedora kernel. Rawhide does have dvgrab 3.1, which > might be worth upgrading to and testing. I don't *think* it'll pull in a bunch > of other changes, but I'm not certain (worst case, the rawhide dvgrab should > rebuild on F8 just fine). I ought to push updated dvgrab packages for F8 too... I suspect rawhide is "out of range", since dvgrab requires the new libstdc++, which I would prefer not to upgrade. I'll try with the src.rpm. Thanks! pg
Nb: there's an f8 kernel (2.6.24.3-17.fc8) with all the same firewire patches as rawhide as of today, currently building in koji. (In reply to comment #7) > Actually I'm on my way with 2.6.25-rc3 and Stefan's patches, but I'm anyway > interested in your proposal. Could you please give more details, or point me to > some documentation, on how to proceed with the kernel-devel package? Assuming for example you've got kernel-2.6.24.2-12.fc8 (i686) installed, you want to then install kernel-devel-2.6.24.2-12.fc8 (i686) as well. Then, from within drivers/firewire, run: make -C /usr/src/kernels/2.6.24.2-12.fc8-i686/ M=`pwd` modules
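The external-module build described above can be sketched as a small helper. This is only an illustration: the function name `fw_build_cmd` is made up here, and it assumes the kernel-devel tree lives under /usr/src/kernels/ in a directory named after the kernel release plus the architecture, as in the example path above. It prints the command instead of running it, so the path can be checked first.

```shell
# Hypothetical helper: print (do not run) the kbuild command for building
# external modules against an installed kernel-devel tree.
# Assumption: the tree is at /usr/src/kernels/<release>-<arch>/.
fw_build_cmd() {
    release="${1:-$(uname -r)}"   # e.g. 2.6.24.2-12.fc8
    arch="${2:-$(uname -m)}"      # e.g. i686 or x86_64
    src="${3:-$PWD}"              # directory holding the firewire sources
    printf 'make -C /usr/src/kernels/%s-%s/ M=%s modules\n' \
        "$release" "$arch" "$src"
}
```

Called as `fw_build_cmd 2.6.24.2-12.fc8 i686 "$PWD"` from within drivers/firewire, it should print the same make line as given in the comment above.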
So, some updates. I was able to compile dvgrab-3.1, from rawhide. This one does not seem to improve the situation: in one single sequence of trials, I always got the "buffer underrun" errors and, in the end, a system freeze (no log available). One positive thing was that the lines: rom1394_1 warning: read failed: 0x0000fffff0000414 error reading config rom directory for node 1 did not show up. I was able to compile a vanilla kernel 2.6.25-rc3 with Stefan's firewire patches. Unfortunately this did not boot, I guess the new lvm2 thing is needed or I made some mistakes. I tried then to build the FW modules (from 2.6.25-rc3 + patches) in the 2.6.24.3-13, following your instructions: cd drivers/firewire make -C /usr/src/kernels/2.6.24.3-13.fc8-x86_64 M=`pwd` modules This one failed, claiming a function, "dma_allignement... something" is missing (implicit declaration of function). So, I guess I'll have to get the koji one and try it. Side note, maybe unrelated to this one. You tell me if another bug report is needed and to which component. While testing the DV camera, an SBP2 device was attached to the PC. Each time the camera was switched on, a bus reset occurred, detaching the SBP2 drive and then re-attaching it. First of all, I'm not sure how good this is with a mounted device. Second, if the device is not mounted, the re-attaching event causes the udev->hal->whatever->gnome-mount chain to be triggered, with the final result of mounting it... Which is unwanted, of course. Third, after these tricks, I started to get block errors while accessing the SBP2 disk (maybe fixed in latest FW patches?), which were solved by un-mount, detach and re-attach of the device. Thanks. pg
> I tried then to build the FW modules (from 2.6.25-rc3 + patches) > in the 2.6.24.3-13 [...] This one failed, claiming a function, > "dma_allignement... something" is missing Yes, alas copying sources from one kernel source tree to another is in general not possible. This is why I maintain the firewire patchkits on my website for a few different kernel releases. These patches still only work for kernel.org kernels though, not necessarily for distributor kernels (in particular not for Fedora, RHEL, CentOS, Oracle... kernels). So, easiest is to wait for the Fedora package maintainers to produce packages or source packages for you. > Side note, maybe unrelated to this one. You tell me if another > bug report is needed and to which component. > Each time the camera was switched on, a bus reset occurred, > detaching the SBP2 drive and then re-attaching it. This is worth putting into another bug report. Don't forget to quote the relevant part of the kernel log. (Log with time stamps please; dmesg perhaps doesn't contain them, so you have to take them from /var/log/messages or /var/log/syslog or wherever Fedora writes out kernel messages. Hmm, maybe I should finally install Fedora somewhere to be able to make qualified comments in this bugtracker...)
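A minimal sketch of pulling the timestamped FireWire lines out of the system log for such a report. The function name `fw_log_lines` is hypothetical, and /var/log/messages as the default is an assumption taken from the comment above; pass another file if kernel messages are written elsewhere.

```shell
# Hypothetical helper: print the log lines that mention the FireWire stack.
# Lines in a syslog file keep their timestamps, unlike plain dmesg output
# on kernels that do not print them.
fw_log_lines() {
    logfile="${1:-/var/log/messages}"
    grep -E 'firewire|ieee1394|sbp2' "$logfile"
}
```

For example, `fw_log_lines /var/log/messages > fw-report.txt` would collect the lines to attach to the new bug.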
> 01:05.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 > Controller (PHY/Link) (prog-if 10 [OHCI]) Do you happen to know whether it is a TSB43AB22 or TSB43AB22A? You would probably have to look inside the PC to know this. (Texas Instruments sell both versions but recommend the latter, without going into detail on their website.)
(In reply to comment #11) > Do you happen to know whether it is a TSB43AB22 or TSB43AB22A? You would > probably have to look inside the PC to know this. (Texas Instruments sell both > versions but recommend the latter, without going into detail on their website.) According to ASUS (motherboard docs) it is a TSB43AB22A, I'll check directly as soon as possible. In any case, keep always in mind that the old stack was working. pg
(In reply to comment #10) > > Side note, maybe unrelated to this one. You tell me if another > > bug report is needed and to which component. > > Each time the camera was switched on, a bus reset occurred, > > detaching the SBP2 drive and then re-attaching it. > > This is worth putting into another bug report. Don't forget to quote the > relevant part of the kernel log. (Log with time stamps please; dmesg perhaps > doesn't contain them, so you have to take them from /var/log/messages or > /var/log/syslog or wherever Fedora writes out kernel messages. Hmm, maybe I > should finally install Fedora somewhere to be able to make qualified comments in > this bugtracker...) A couple of questions: 1) Should this go to Fedora bugzilla or to kernel bug tracker? 2) Do you have any chance to test this situation? It seems to me a design issue: a bus reset of FW should trigger udev (or whatever) only for added/removed devices... Or not? Thanks. pg
(In reply to comment #9) > I tried then to build the FW modules (from 2.6.25-rc3 + patches) in the > 2.6.24.3-13, following your instruction: > > cd drivers/firewire > make -C /usr/src/kernels/2.6.24.3-13.fc8-x86_64 M=`pwd` modules > > This one failed, claiming a function, "dma_allignement... something" is missing > (implicit declaration of function). Oh crud, yeah, forgot about that. Yeah, that works better when Linus' tree and the linux1394 tree are based on a similar 2.6.x. > So, I guess I'll have to get the koji one and try it. It's built now. And I should add that the patchset it carries is actually off of Stefan's site, which he referenced in comment #10. http://me.in-berlin.de/~s5r6/linux1394/updates/2.6.24/ (In reply to comment #13) > A couple of questions: > > 1) Should this go to Fedora bugzilla or to kernel bug tracker? Sounds like it should be generic enough that it could go in the kernel bugzilla, but you may also put it in here if you like/prefer. > 2) Do you have any chance to test this situation? It seems to me a design issue: > a bus reset of FW should trigger udev (or whatever) only for added/removed > devices... Or not? I believe if we just reconnect to the device, no, we shouldn't trigger udev, but if we have to disconnect (logout and re-do an sbp2 login), there's not yet any way to distinguish between this being a re-login and a freshly plugged in device. However, I have a thought on something that may improve this situation slightly (patch coming soon, Stefan... :)
Re comment #12: > According to ASUS (motherboard docs) it is a TSB43AB22A, I'll check > directly as soon as possible. > In any case, keep always in mind that the old stack was working. I just asked because I saw a presumably TSB43AB22A based card in a web shop. :-) Side note: The old stack programs iso reception in buffer-fill mode or packet-per-buffer mode depending on what the application program requested. I have to look up which mode dvgrab would use. (This is with raw1394 which dvgrab uses. video1394 always uses buffer-fill.) The new stack OTOH always uses packet-per-buffer on OHCI 1.0 chips and dual-buffer on OHCI 1.1 chips such as yours. The upshot: There is now the possibility that we get bitten by previously unknown chip quirks which were irrelevant for the old drivers.
Re comment #13, comment #14: Please open another bug to keep this one on-topic, and post the log.
OK, some updates. kernel-2.6.24.3-17.fc8 did not improve the situation, as expected. I tried dvgrab 3.0 and 3.1, with somewhat different results: While the 3.0 returned: "" 0.00 MB 0 frames" sec Capture Stopped Error: no DV The 3.1 crashed. I'll provide the terminal dump. The chip on the MB is an "A" version, which was also expected. pg
Created attachment 296782 [details] Terminal dump of dvgrab 3.1 That's it, it seems something went wrong somewhere... :-) pg
Uhm, since it seems the other two FW issues are gone, maybe we continue here... :-) Searching the web returned the full TSB43AB22A data sheet (112 pages) (TI seems to offer the 2 pages version only). Could this be of any interest for debugging? Just for your info, the document I found is named "slls520.pdf". pg
Comment #18 looks a lot like bug 370931, which I can reproduce on one of my own boxes.
Piergiorgio, how much RAM is in your system? I'm wondering if your earlier failures could possibly be the coherent DMA issues fixed in 2.6.24.3-50.fc8 or later and in rawhide, and now we're just up against the same thing as bug 370931...
(In reply to comment #21) > Piergiorgio, how much RAM is in your system? I'm wondering if your earlier > failures could possibly be the coherent DMA issues fixed in 2.6.24.3-50.fc8 or > later and in rawhide, and now we're just up against the same thing as bug 370931... The machine has 4GB of RAM... But... I tried two modes. First was without memory hole remapping, that is, I've got only 3.25GB, since .75GB are the 32bit (PCI?) address space. Second was with memory hole remapping, that is 4GB RAM (minus 64MB of the UMA video buffer) crossing over the 4GB boundary, i.e. .75GB are mapped from 4GB to 4.75GB (for the same reason as above: 32bit address space). In this mode, BTW, the kernel complains about IOMMU not being available (?) and to work properly (3D OpenGL things) it needs "pci=nommconf". Maybe not helpful, but I do not have more... Well, anyhow both modes give same results. Final note, the dvgrab crash was only observable with dvgrab 3.1, the 3.0 version never did it. I've kernel-2.6.24.4-64.fc8 installed, I guess the DMA fix is in. I noticed that kernels .25, for F9, seem to have some more patches for the firewire (also DMA), any chance to get those in F8? Or I'll have to go to F9? Thanks, pg
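The two memory layouts described above can be checked with a bit of arithmetic. This is a rough sketch only: `mem_layout` is a made-up name, sizes are in MB, the hole is assumed to sit directly below the 4GB boundary, and the 64MB UMA video buffer is ignored.

```shell
# Hypothetical helper: show how a hole below 4GB affects the RAM layout.
#   $1 = installed RAM in MB, $2 = size of the 32-bit address hole in MB
mem_layout() {
    total_mb="$1"
    hole_mb="$2"
    four_gb=4096
    low_mb=$(( four_gb - hole_mb ))        # RAM addressable below the hole
    if [ "$total_mb" -gt "$low_mb" ]; then
        displaced_mb=$(( total_mb - low_mb ))
    else
        low_mb="$total_mb"
        displaced_mb=0
    fi
    echo "without remapping: ${low_mb} MB usable"
    echo "with remapping: ${displaced_mb} MB moved above 4GB, RAM ends at $(( four_gb + displaced_mb )) MB"
}
```

With `mem_layout 4096 768` this gives 3328 MB (3.25GB) usable without remapping and a top of RAM at 4864 MB (4.75GB) with remapping, matching the figures in the comment.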
fw-ohci stumbles over a bug in some TI controllers: Bug 243081 The bug exists in TSB82AA2 and possibly also in TSB43AB22(A) (not fully proven yet). Depending on 1. whether the generation mismatch found in bug 243081 also occurs on your setup and 2. how your camera would react on transaction timeouts, this may or may not affect your setup too. The failure mode here is quite different from bug 243081 though.
I believe that up to now, F8 had all the same possibly relevant patches as F9, but it definitely doesn't yet have the patch Stefan is referring to in comment #23, which I only just now added to rawhide. I'd like to beat on it some in rawhide before throwing it into F8, but soonish here I'll probably resync the F8 firewire bits with rawhide...
Looking for errata of the TSB43AB22(A) returned this document, which does not seem to include the A version (maybe that's why they made it), with an issue about bus reset: http://focus.ti.com/lit/er/sllz012/sllz012.pdf They claim it is a "lab only" problem, etc., etc., anyway they provide a software workaround. Maybe useful. About a possible new kernel for F8, if you've something, even just the fw modules, just let me know. pg
Okay, I've added all the latest firewire bits to the F8 kernel tree. They'll be present in 2.6.24.4-81.fc8 and later, should get a build started in just a sec. Piergiorgio, if you want to try sooner than when the kernel is built, you can grab the bits from cvs now and build 'em.
Uhm, uhm, uhm... I tried the new kernel... The dvgrab problem is still there:

$] dvgrab -i -t -showstatus -debug all test
rom1394_1 warning: read failed: 0x0000fffff0000414
error reading config rom directory for node 1
Found AV/C device with GUID 0x08004601029441d8
Going interactive. Press '?' for help.
"" 0.00 MB 0 frames" sec
Capture Stopped
Error: no DV

In addition, the new bus reset scheme kills the SBP2 device without any way out. I can see the following as output of "dmesg", when the camera is switched on (after the SBP2 is initialized and even mounted):

firewire_core: skipped bus generations, destroying all nodes
firewire_sbp2: released fw1.0
firewire_core: created device fw0: GUID 0011d800012a56d3, S400
scsi15 : SBP-2 IEEE-1394
firewire_sbp2: Workarounds for fw1.0: 0x1 (firmware_revision 0x002600, model_id 0x000000)
firewire_core: created device fw1: GUID 0030ffa046010076, S400
firewire_core: phy config: card 0, new root=ffc3, gap_count=8
firewire_sbp2: fw1.0: error status: 0:4
firewire_core: skipped bus generations, destroying all nodes
firewire_sbp2: released fw1.0
firewire_core: giving up on config rom for node id ffc2
firewire_core: created device fw0: GUID 0011d800012a56d3, S400
firewire_core: created device fw1: GUID 08004601029441d8, S100
scsi16 : SBP-2 IEEE-1394
firewire_sbp2: Workarounds for fw2.0: 0x1 (firmware_revision 0x002600, model_id 0x000000)
firewire_core: created device fw2: GUID 0030ffa046010076, S400
firewire_sbp2: fw2.0: error status: 0:4
firewire_sbp2: fw2.0: error status: 0:4
firewire_sbp2: fw2.0: error status: 0:4
firewire_sbp2: fw2.0: error status: 0:4
firewire_sbp2: fw2.0: error status: 0:4
firewire_sbp2: fw2.0: error status: 0:4
firewire_sbp2: fw2.0: failed to login to LUN 0000

Reconnecting the SBP2 (off/on sequence) somehow gets it back:

firewire_core: skipped bus generations, destroying all nodes
firewire_sbp2: released fw2.0
firewire_core: created device fw0: GUID 0011d800012a56d3, S400
firewire_core: created device fw1: GUID 08004601029441d8, S100
firewire_core: phy config: card 0, new root=ffc1, gap_count=5
firewire_core: skipped bus generations, destroying all nodes
firewire_core: created device fw0: GUID 0011d800012a56d3, S400
firewire_core: created device fw1: GUID 08004601029441d8, S100
firewire_core: skipped bus generations, destroying all nodes
firewire_core: created device fw0: GUID 0011d800012a56d3, S400
firewire_core: created device fw1: GUID 08004601029441d8, S100
scsi17 : SBP-2 IEEE-1394
firewire_sbp2: Workarounds for fw2.0: 0x1 (firmware_revision 0x002600, model_id 0x000000)
firewire_core: created device fw2: GUID 0030ffa046010076, S400
firewire_core: phy config: card 0, new root=ffc3, gap_count=8
firewire_core: skipped bus generations, destroying all nodes
firewire_core: created device fw0: GUID 0011d800012a56d3, S400
firewire_core: created device fw1: GUID 08004601029441d8, S100
scsi18 : SBP-2 IEEE-1394
firewire_sbp2: Workarounds for fw2.0: 0x1 (firmware_revision 0x002600, model_id 0x000000)
firewire_core: created device fw2: GUID 0030ffa046010076, S400
firewire_sbp2: fw2.0: orb reply timed out, rcode=0x11
firewire_sbp2: fw2.0: logged in to LUN 0000 (0 retries)
scsi 18:0:0:0: Direct-Access LSILogic SYM13FW500-Disk 1.00 PQ: 0 ANSI: 0
sd 18:0:0:0: [sdb] 117210240 512-byte hardware sectors (60012 MB)
sd 18:0:0:0: [sdb] Write Protect is off
sd 18:0:0:0: [sdb] Mode Sense: 10 00 00 00
sd 18:0:0:0: [sdb] Cache data unavailable
sd 18:0:0:0: [sdb] Assuming drive cache: write through
sd 18:0:0:0: [sdb] 117210240 512-byte hardware sectors (60012 MB)
sd 18:0:0:0: [sdb] Write Protect is off
sd 18:0:0:0: [sdb] Mode Sense: 10 00 00 00
sd 18:0:0:0: [sdb] Cache data unavailable
sd 18:0:0:0: [sdb] Assuming drive cache: write through
sdb: sdb1
sd 18:0:0:0: [sdb] Attached SCSI disk
sd 18:0:0:0: Attached scsi generic sg2 type 0
firewire_sbp2: released fw2.0

Switching off the camera does not seem to have negative effects:

firewire_core: phy config: card 0, new root=ffc2, gap_count=7
firewire_sbp2: fw2.0: orb reply timed out, rcode=0x11
firewire_sbp2: fw2.0: reconnected to LUN 0000 (1 retries)

All in all I would not say this patch improves the situation, eventually it makes it worse. My suggestion would be to reconsider it... pg
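When comparing runs like the ones above, it can help to count the relevant events instead of reading the whole log. The following is a minimal sketch; the function name `fw_dmesg_summary` is made up, and it only matches the message strings that appear in the dmesg output quoted in this comment.

```shell
# Hypothetical helper: count FireWire events of interest in a saved log.
#   $1 = file containing dmesg (or syslog) output
fw_dmesg_summary() {
    log="$1"
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    rebuilds=$(grep -c 'skipped bus generations' "$log" || true)
    errors=$(grep -c 'error status: 0:4' "$log" || true)
    failed=$(grep -c 'failed to login' "$log" || true)
    printf 'topology rebuilds: %s\n' "$rebuilds"
    printf 'status 0:4 errors: %s\n' "$errors"
    printf 'failed logins: %s\n' "$failed"
}
```

Typical use would be `dmesg > fw.log` right after switching the camera on, followed by `fw_dmesg_summary fw.log`.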
> In addition, the new bus reset scheme kills the SBP2 device without > any way out. I can see the following as output of "dmesg", when the > camera is switched on (after the SBP2 is initialized and even mounted): [...] > firewire_core: skipped bus generations, destroying all nodes [...] > firewire_core: created device fw2: GUID 0030ffa046010076, S400 > firewire_sbp2: fw2.0: error status: 0:4 [...] > Switching off the camera does not seem to have negative effects: Some explanation: The cause for the regression is not the workaround for the TI bus reset packet bug. SBP-2 status writes are the only AR events in case of SBP-2, and the status write works just fine. (We get 0:4 = "access denied" status -- which is of course not the optimum...) The cause is probably the patch which introduced "skipped bus generations, destroying all nodes". Patch "firewire: insist on successive self ID complete events" http://git.kernel.org/?p=linux/kernel/git/ieee1394/linux1394-2.6.git;a=commit;h=c4ea81fcdf2172f65632c3955a674b15bd1bb781 (Commit ID will become invalid soon when I'm going to prepare the next mainline merge.) This patch is necessary to prevent firewire-core from crashing the kernel. Alas firewire-sbp2 (or alternatively fw-device.c in firewire-core) has not yet been extended to better handle fw-topology's fundamental inability to match nodes across more than a single self ID generation increment. > All in all I would not say this patch improves the situation, > eventually it makes it worse. > My suggestion would be to reconsider it... As I said, it evidently is not the patch with the TI specific workaround, but that other patch which you probably did not have when you last tried the Datafab enclosure together with this camcorder. But it is good that you reported it. The solution though cannot be to revoke that patch; it needs to be to better handle the "destroying all nodes" situation in one of the layers above the topology code. Thanks for being our Guinea pig once again...
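The "successive self ID complete events" rule explained above can be illustrated with a small sketch. This is not the kernel code: the function names are made up, and it assumes the bus generation is an 8-bit counter that wraps from 255 to 0.

```shell
# Illustrative only: succeed when NEW is the direct successor of OLD.
# Assumption: generations are 8-bit values that wrap around.
is_next_generation() {
    old="$1"
    new="$2"
    [ $(( new & 255 )) -eq $(( (old + 1) & 255 )) ]
}

# Hypothetical decision helper: old and new topology can only be matched
# across a single generation step; otherwise all nodes are dropped and
# rediscovered, which is what forces firewire-sbp2 into a fresh login.
topology_action() {
    if is_next_generation "$1" "$2"; then
        echo "compare old and new topology"
    else
        echo "destroy all nodes"
    fi
}
```

For example, `topology_action 5 6` reports a comparison, while `topology_action 5 7` reports that all nodes are destroyed, corresponding to the "skipped bus generations" message in the log.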
> The solution though cannot be to revoke that patch Well, maybe the Fedora maintainers want to temporarily undo the patch until I have improved the upper layers. Vice versa, I am thinking about holding off the mainline submission of the patch until I have those other bits in place. Both mean living with the possibility of a crash or other corruption when self ID complete events are not sequential, while avoiding the more frequent hassle with the destruction and recreation of device representations (which can cause data loss, e.g. if you have a filesystem mounted on a FireWire device).
Piergiorgio, can you try building replacement modules w/just the patch Stefan referenced in comment #28 backed out and verify that you don't lose your disk drive? If so, I'll just back out that patch in the F8 tree for now.
(In reply to comment #30) > Piergiorgio, can you try building replacement modules w/just the patch Stefan > referenced in comment #28 backed out and verify that you don't lose your disk > drive? If so, I'll just back out that patch in the F8 tree for now. OK, no problem. Where or how do I get the proper source(s)? I guess I can just build the modules in the current kernel-devel dir tree, given the sources, or do you recommend a different method? pg
I wrote: > SBP-2 status writes are the only AR events in case of SBP-2 "request AR events", to be entirely precise. SBP-2 may also involve response AR events but those are not affected by the workaround for the TI quirk.
(In reply to comment #31) > Where or how do I get the proper source(s)? > I guess I can just build the modules in the current kernel-devel dir tree, given > the sources, or do you recommend a different method? Either grab the src.rpm out of koji and install it, then run 'rpmbuild -bp kernel.spec' and you'll get a patched kernel tree, or just check stuff out of cvs. From memory, cvs checkout procedure should be like so:

$ export CVSROOT=:pserver:anonymous.org:/cvs/pkgs
$ cvs co kernel/F-8
$ cd kernel/F-8
$ make prep
$ cd kernel-2.6.24/linux-2.6.24.noarch/drivers/firewire
$ <edit fw-topology.c, backing out that change in comment #28>
$ make -C /usr/src/kernels/2.6.24.4-81.fc8-i686/ M=`pwd` modules
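Once the modules are built, they still have to replace the installed ones. A minimal sketch of that step, keeping a copy of the distribution modules so the change stays reversible: the function name `fw_install_modules` and the `.orig` suffix are made up, and the default destination directory is an assumption about where the firewire modules are installed.

```shell
# Hypothetical helper: copy freshly built firewire-*.ko files over the
# installed ones, keeping the originals as <name>.ko.orig.
#   $1 = build directory, $2 = module directory (optional)
fw_install_modules() {
    src="$1"
    dest="${2:-/lib/modules/$(uname -r)/kernel/drivers/firewire}"
    for ko in "$src"/firewire-*.ko; do
        [ -e "$ko" ] || continue
        name=$(basename "$ko")
        # Back up the distribution module once, so it can be restored later.
        if [ -e "$dest/$name" ] && [ ! -e "$dest/$name.orig" ]; then
            cp -p "$dest/$name" "$dest/$name.orig"
        fi
        cp "$ko" "$dest/$name"
    done
}
```

It would need to run as root, and the old modules have to be unloaded and the new ones loaded again (or the machine rebooted) before the change takes effect.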
(In reply to comment #33) > Either grab the src.rpm out of koji and install it, then run 'rpmbuild -bp > kernel.spec' and you'll get a patched kernel tree, or just check stuff out of > cvs. From memory, cvs checkout procedure should be like so: > > $ export CVSROOT=:pserver:anonymous.org:/cvs/pkgs > $ cvs co kernel/F-8 > $ cd kernel/F-8 > $ make prep > $ cd kernel-2.6.24/linux-2.6.24.noarch/drivers/firewire > $ <edit fw-topology.c, backing out that change in comment #28> > $ make -C /usr/src/kernels/2.6.24.4-81.fc8-i686/ M=`pwd` modules This is really cool! :-) I'll go for it! OK, I removed the section as per comment #28, compiled and installed the new modules (all three). With this setup, switching on the camera does not kill the SBP2. "dmesg" reports the following: firewire_sbp2: fw1.0: reconnected to LUN 0000 (0 retries) firewire_core: phy config: card 0, new root=ffc3, gap_count=8 firewire_sbp2: fw1.0: reconnected to LUN 0000 (0 retries) firewire_core: created device fw2: GUID 08004601029441d8, S100 In one trial it required two retries to get it done: firewire_sbp2: fw1.0: reconnected to LUN 0000 (0 retries) firewire_core: phy config: card 0, new root=ffc3, gap_count=8 firewire_sbp2: fw1.0: reconnected to LUN 0000 (0 retries) firewire_core: created device fw2: GUID 08004601029441d8, S100 firewire_core: phy config: card 0, new root=ffc2, gap_count=7 firewire_sbp2: fw1.0: orb reply timed out, rcode=0x11 firewire_sbp2: fw1.0: reconnected to LUN 0000 (1 retries) I don't know if it matters, but this was with SBP2 fs mounted. In any case, with or without patch, the DV capture does not work. May I ask you both a completely unrelated question? Where are you located? I guess Stefan is in Berlin. And you, Jarod? Thanks! pg
(In reply to comment #34) > OK, I removed the section as per comment #28, compiled and installed the new > modules (all three). > > With this setup, switching on the camera does not kill the SBP2. > "dmesg" reports the following: > > firewire_sbp2: fw1.0: reconnected to LUN 0000 (0 retries) > firewire_core: phy config: card 0, new root=ffc3, gap_count=8 > firewire_sbp2: fw1.0: reconnected to LUN 0000 (0 retries) > firewire_core: created device fw2: GUID 08004601029441d8, S100 > > In one trial it required two retries to get it done: > > firewire_sbp2: fw1.0: reconnected to LUN 0000 (0 retries) > firewire_core: phy config: card 0, new root=ffc3, gap_count=8 > firewire_sbp2: fw1.0: reconnected to LUN 0000 (0 retries) > firewire_core: created device fw2: GUID 08004601029441d8, S100 > firewire_core: phy config: card 0, new root=ffc2, gap_count=7 > firewire_sbp2: fw1.0: orb reply timed out, rcode=0x11 > firewire_sbp2: fw1.0: reconnected to LUN 0000 (1 retries) > > I don't know if it matters, but this was with SBP2 fs mounted. Okay, this much looks good, I'll go ahead and back out that chunk for F8. > In any case, with or without patch, the DV capture does not work. Darn. Oh, now one thing I wanted to clarify... In comment #27, your command line shows you using interactive mode (-i switch), but doesn't seem to have any output suggesting the camera actually started rolling... To be 100% certain, if you omit that, and simply run 'dvgrab -d 2', (grab for 2 seconds), I presume nothing gets captured? > May I ask you both a completely unrelated question? Sure! > Where are you located? > I guess Stefan is in Berlin. And you, Jarod? I work out of the Red Hat engineering office in Westford, Massachusetts, USA. (Northeastern coast of the US).
(In reply to comment #35) > Darn. Oh, now one thing I wanted to clarify... In comment #27, your command line > shows you using interactive mode (-i switch), but doesn't seem to have any > output suggesting the camera actually started rolling... To be 100% certain, if > you omit that, and simply run 'dvgrab -d 2', (grab for 2 seconds), I presume > nothing gets captured? Well, I used "p/space" to play (it works) and "c" to capture. Actually the camera starts and it shows the movie (in its own screen). I used the interactive mode to have a second "control" path, the AVC one. This was to make sure other things, like the cable, are working (I had bad experience with 1394 cables...). Interestingly enough, it is possible to do everything: play, ff, backward, step motion and so on; the commands (I guess asynchronous mode) work fine. Just for further reassurance, I connected the camera to a Miranda Box, having analog video and audio output. These were then connected to a monitor. I can confirm that the camera and cable(s) work fine, I got perfect picture and sound. With "dvgrab -d 2" I get the same results. pg
Now that you got self-compiled drivers, you could try forcing fw-ohci into OHCI 1.0 mode. In drivers/firewire/fw-ohci.c, change the three occurrences of "if (... >= OHCI_VERSION_1_1)" to "if (0)". (I assume unmodified fw-ohci drives TSB43AB22 in OHCI 1.1 mode --- you should confirm that first before doing the modification, unless you already did so. E.g. insert a printk("...") into the >= OHCI_VERSION_1_1 branch of ohci_allocate_iso_context.)
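Before editing, it may be worth listing the checks in question to confirm there really are three of them in the tree being built. A minimal sketch, with a made-up function name and the source path from the comment above as the default:

```shell
# Hypothetical helper: list, with line numbers, the version checks that
# would be changed to "if (0)" to force OHCI 1.0 behaviour.
#   $1 = path to fw-ohci.c (optional)
list_ohci11_checks() {
    src="${1:-drivers/firewire/fw-ohci.c}"
    grep -n '>= OHCI_VERSION_1_1' "$src"
}
```

Running `list_ohci11_checks | wc -l` from the top of the kernel tree should then report the number of occurrences to edit by hand.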
(In reply to comment #37) > Now that you got self-compiled drivers, you could try forcing fw-ohci into OHCI > 1.0 mode. In drivers/firewire/fw-ohci.c, change the three occurrences of "if > (... >= OHCI_VERSION_1_1)" to "if (0)". > > (I assume unmodified fw-ohci drives TSB43AB22 in OHCI 1.1 mode --- you should > confirm that first before doing the modification, unless you already did so. > E.g. insert a printk("...") into the >= OHCI_VERSION_1_1 branch of > ohci_allocate_iso_context.) I've a couple of questions: 1) do you mean the driver with the selfID patch removed? (i.e. my own personal version...) 2) what's the idea behind forcing OHCI 1.0? Anyway, I'll try it this evening (CET). pg
> 1) do you mean the driver with the selfID patch removed? > (i.e. my own personal version...) The self ID thing only influences operation of your Datafab disk. It doesn't matter to DV reception. So you can keep or remove that patch. > 2) what's the idea behind forcing OHCI 1.0? firewire-ohci uses different DMA modes for isochronous reception in OHCI 1.1 vs. OHCI 1.0 mode. OHCI 1.1 chips get to do "dual buffer mode", OHCI 1.0 chips do "packet per buffer mode". Maybe the latter one gets other results. The old stack used to do "buffer fill mode" or "packet per buffer mode", depending on what the userspace program or library requested. I would have to investigate which of those would be used by dvgrab.
(In reply to comment #39) > The self ID thing only influences operation of your Datafab disk. It doesn't > matter to DV reception. So you can keep or remove that patch. Do you think, in general, it is better to test with or without other "things" on the 1394 bus? > firewire-ohci uses different DMA modes for isochronous reception in OHCI 1.1 vs. > OHCI 1.0 mode. OHCI 1.1 chips get to do "dual buffer mode", OHCI 1.0 chips do > "packet per buffer mode". Maybe the latter one gets other results. > > The old stack used to do "buffer fill mode" or "packet per buffer mode", > depending on what the userspace program or library requested. I would have to > investigate which of those would be used by dvgrab. I was suspecting this... Maybe, if/when you've time, it could be nice to have the 1.0/1.1 selection as module parameter, as possible fallback. Of course, if it is planned to do the same buffer handling mode in both versions, then there is no point in having a module parameter. pg
> Maybe, if/when you've time, it could be nice to have the 1.0/1.1 > selection as module parameter, as possible fallback. > Of course, if it is planned to do the same buffer handling mode in > both versions, then there is no point in having a module parameter. Actually the goal is that isochronous reception (and everything else) Just Works eventually, without extra configuration by the user.
OK, I confirmed that, in normal conditions, the TSB43AB22A is configured as OHCI 1.1. Second, I forced the OHCI 1.0 mode, as you suggested, and dvgrab worked as per the old 1394 stack. I tried back and forth a couple of times, just to make sure it was not a false positive, with similar results. So I'm quite confident that OHCI 1.0 is working stable. Only one note. During the tests, in OHCI 1.0 mode, I switched off the camera and this caused a complete sudden freeze of the PC (reset needed, no chances of anything else). The selfID thing was disabled during all tests. pg
Someone else's TSB43AB22 or TSB43AB22A is able to receive: https://bugzilla.redhat.com/show_bug.cgi?id=243081#c40 https://bugzilla.redhat.com/show_bug.cgi?id=243081#c90 https://bugzilla.redhat.com/show_bug.cgi?id=243081#c97 Sigh.
> OK, I confirmed that, in normal conditions, the TSB43AB22A is configured > as OHCI 1.1. > Second, I forced the OHCI 1.0 mode, as you suggested, and dvgrab worked > as per the old 1394 stack. Jarod, maybe we should just remove all of the dual buffer code. Unless someone of us finds a TI chip with the same problem and is able to fix up dual buffer... which right now sounds like a waste of time to me. ------------ > During the tests, in OHCI 1.0 mode, I switched off the camera and this > caused a complete sudden freeze of the PC (reset needed, no chances of > anything else). IOW a panic in the bus reset handler. This could be related to... > The selfID thing was disabled during all tests. ...that one. I will try working on the issue from comments #27 - #29 on the weekend, so that we can do the strict self ID sequence checking without the drawback of spurious device de- and reattachments.
>>> Sometimes, very seldom (actually only once), the stream is captured. >>> Sometimes, there are a lot of "buffer underrun" errors and the stream >>> is only partially captured, with many "holes". >>> Almost always "dvgrab" reports something like "error no DV stream" >>> (or similar). ... >> OHCI 1.0 is working stable. ... > Unless someone of us finds a TI chip with the same problem [...] Or maybe the mainboard's chipset rather than the controller (or the combination of the two) is the culprit. Though from what I understood about how the packet-per-buffer replacement for dual-buffer works, it should cause similar memory access patterns.
(In reply to comment #43) > Someone else's TSB43AB22 or TSB43AB22A is able to receive: > https://bugzilla.redhat.com/show_bug.cgi?id=243081#c40 > https://bugzilla.redhat.com/show_bug.cgi?id=243081#c90 > https://bugzilla.redhat.com/show_bug.cgi?id=243081#c97 > > Sigh. But a different camera! I've an i.Link(tm) one, maybe it does not like firewire(tm)... :-) Nevertheless, here OHCI 1.0 seems to work (more or less). Does this give any hints on how to proceed? For example, what about testing the "buffer fill mode", since this is closer to the "dual buffer mode" and see what happens. Any chance to do this? Does it make sense to you? pg
Hrmph. One TSB43AB22A works in dual-buffer mode, one doesn't... Yuk. But yeah, Stefan's understanding is correct, the memory access usage and patterns between dual-buffer and packet-per-buffer should be quite similar, actually moreso than dual-buffer vs. buffer-fill, I believe. Regardless, there's no way to test buffer-fill mode w/the new driver, as nobody has written buffer-fill code for this stack. I actually started down the buffer-fill route when first working on OHCI 1.0 support, and quickly found it would be a nasty mess to implement in a way where the upper layers wouldn't have to care if the underlying device was OHCI 1.0 or 1.1. Kristian would likely be very much against dumping dual-buffer mode, iirc, as it does have some measurable benefits over packet-per-buffer in latency-sensitive operations (I believe his primary example was high-end a/v stuff). At the moment, I'd be more inclined to maybe make it a module option to firewire-ohci to run dual-buffer or packet-per-buffer for 1.1 chips. (Actually, that makes me wonder... If one were to force an OHCI 1.0 Via controller to try to use dual-buffer, what would happen... unrelated to this bug, of course...) So far, I'm not finding a TSB43AB22* in my stash of controllers to try out, the closest I have is a TSB43AB23, which has worked just fine in dual-buffer mode for as long as I can remember.
Also, who knows if there aren't actually some bugs lingering in packet-per-buffer support as well... (back to the whole Via OHCI 1.0 thing -- bug 415841). :\
[written before I read Jarod's last two comments] > For example, what about testing the "buffer fill mode", since this is > closer to the "dual buffer mode" and see what happens. Dual-buffer has the feature to split a portion of each packet off and put it into a separate buffer. (A very handy feature for some important protocols. Every OHCI chip should have it... but alas that's not the case.) Buffer-fill could emulate dual-buffer only by some copying by the CPU. That might be an issue with systems with low CPU power. (It shouldn't be an issue on desktop systems which aren't totally ancient.) Packet-per-buffer can emulate dual-buffer simply by setting appropriate buffer boundaries, without the CPU having to copy between buffers. The old stack uses buffer-fill and packet-per-buffer, depending on whether raw1394, video1394, or dv1394 is at work, and in case of raw1394, depending on what the application client requested. Nevertheless, raw1394 is not universal enough to replace video1394. firewire-core/-ohci on the other hand is supposed to provide a single isochronous API for all purposes, hence started out with dual-buffer, which is the most capable of the modes. When it became clear that many card vendors disable OHCI 1.1 compatibility even if the chip supports it, the packet-per-buffer emulation of dual-buffer was added to firewire-ohci, having the benefit of providing the same split buffer layout to the application client as dual-buffer and still being a zero copy implementation. Still, firewire-ohci's packet-per-buffer isn't that great either, because VIA VT6306/7 still make trouble with it (while VT6307 works fine with dual-buffer if the card vendor didn't disable it --- it's a mess). BTW, raw1394 --- when used with libiec61883 clients such as dvgrab and kino --- as well as the old dv1394 driver use packet-per-buffer. But I don't know if they use it in a way like firewire-ohci.
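To make the difference between the emulation strategies concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not driver code: `HEADER_LEN`, the fixed packet size and all function names are hypothetical, and real buffer-fill reception also stores per-packet header/trailer quadlets and handles variable packet lengths. It only shows that buffer-fill needs a CPU copy to reach the split layout, while packet-per-buffer reaches the same layout by choosing buffer boundaries.

```python
# Toy model only: real OHCI DMA works on descriptors and physical memory,
# not Python byte strings. HEADER_LEN and the packet contents are made up.
HEADER_LEN = 8  # assumed number of bytes split off the front of each packet


def receive_buffer_fill(packets):
    """Buffer-fill: the controller concatenates packets into one buffer."""
    return b"".join(packets)


def split_by_cpu_copy(stream, packet_len, header_len=HEADER_LEN):
    """Emulate dual-buffer on top of buffer-fill: the CPU copies each part."""
    headers, payloads = bytearray(), bytearray()
    for off in range(0, len(stream), packet_len):
        pkt = stream[off:off + packet_len]
        headers += pkt[:header_len]
        payloads += pkt[header_len:]
    return bytes(headers), bytes(payloads)


def receive_packet_per_buffer(packets, header_len=HEADER_LEN):
    """Packet-per-buffer: the buffer boundaries are placed so that the
    controller itself writes each part to its destination (no later copy)."""
    headers, payloads = bytearray(), bytearray()
    for pkt in packets:
        headers += pkt[:header_len]    # first buffer of this packet
        payloads += pkt[header_len:]   # second buffer of this packet
    return bytes(headers), bytes(payloads)


# Three fake packets: an 8-byte header followed by a 16-byte payload each.
packets = [bytes([i]) * HEADER_LEN + bytes([0xA0 + i]) * 16 for i in range(3)]
via_copy = split_by_cpu_copy(receive_buffer_fill(packets),
                             packet_len=HEADER_LEN + 16)
via_boundaries = receive_packet_per_buffer(packets)
```

Both paths end with the same header and payload buffers; the only difference in this model is who does the splitting, the CPU or the controller.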
Maybe one more thing. As I mentioned at the beginning, sometimes I get "buffer underrun" errors. This (sometimes?) happens as soon as dvgrab is launched. Since it is in interactive mode, this means I get these errors _before_ the capture actually starts. Now, the first question is "who is returning these errors"? The second is "why"? Is it possible there is some issue in the _initialization_ of some hardware, that can result in different behavior in different environments? BIOS? Not so safe 32/64bit code? Maybe some undefined registers can lead to different reactions. Specifically this could explain: 1) different performances in different MB 2) the random "buffer underrun" errors before capturing (depending on who is generating those) 3) the fact that sometimes it works, sometimes not pg
> At the moment, I'd be more inclined to maybe make it a module option > to firewire-ohci to run dual-buffer or packet-per-buffer for 1.1 chips. And what would the default value of the option be? Packet-per-buffer for all chips? Or "automatic", i.e. dual-buffer for 1.1 chips unless a blacklisted chip was detected? (TSB43AB22/A to be blacklisted, for reasons that are still unclear.) And should the ioctl ABI provide that switch too? Probably not.
> As I mentioned at the beginning, sometimes I get "buffer underrun" errors. > This (sometimes?) happens as soon as dvgrab is launched. > Since it is in interactive mode, this means I get these errors _before_ the > capture actually starts. I get this initial alleged buffer underrun too. > Now, the first question is "who is returning these errors"? > The second is "why"? Maybe it is caused by junk timecode values.
(In reply to comment #49) > BTW, raw1394 --- when used with libiec61883 clients such as dvgrab and kino --- > as well as the old dv1394 driver use packet-per-buffer. But I don't know if > they use it in a way like firewire-ohci. Then, if I got it right, there is no way to test "buffer fill mode". IMHO this would have revealed HW problems, since it behaves like "dual buffer mode", but with only one buffer. One DMA vs. two, the rest is the same. This means, "dual buffer" could have two DMA engines or one multiplexed, hence problems if done not properly (or defective). If there is a DMA HW problem, we have 50% chance to get it. In "packet per buffer", the HW implementation is likely different, so there is less chance of detecting HW problems of "dual buffer". Related to comment #52, what do you mean by "junk timecode values"? Shouldn't everything be properly initialized? On the other hand, when I get these errors, the capture works badly and, usually, I later get some kernel crash. Just to close the circle, what if there is a bug in the libraw or libiec? Could this cause all these issues? pg
(In reply to comment #51) > > At the moment, I'd be more inclined to maybe make it a module option > > to firewire-ohci to run dual-buffer or packet-per-buffer for 1.1 chips. > > And what would the default value of the option be? > Packet-per-buffer for all chips? > Or "automatic", i.e. dual-buffer for 1.1 chips unless a blacklisted chip was > detected? (TSB43AB22/A to be blacklisted, for reasons that are still unclear.) My thought was that this is the one and only case I've seen/heard where dual-buffer fell down, and I can reproduce the OHCI 1.0 Via packet-per-buffer failure on 3 different Via controllers (as well as you being able to), so I'd go with "automatic", using dual-buffer still on all OHCI 1.1 chips, save those that are blacklisted. And then yeah, blacklist the TSB43AB22/A, but possibly with the ability to override the blacklist (similar to how the sbp2 work-arounds are set up). > And should the ioctl ABI provide that switch too? Probably not. I'd say probably not as well. I still think the upper layers shouldn't have to care. Although perhaps that would have to change anyway if someone writes buffer-fill support... (I have no plans to do so myself though). Oh, and I do get the buffer underrun thing from time to time too, right near the start of capture. Best as I could surmise, this happens when we start handling descriptors, and we handle them faster than we queue them and reach the end of the descriptor list (b=0x11, z=0) and the context is temporarily halted (this is usually 1-2 frames into capture) until we queue up more descriptors and restart the context. (Nb: this is also exactly where the Via controllers fall down and stall out, so far as I can tell).
(In reply to comment #54) > ... Best as I could surmise, this happens when we start handling > descriptors, and we handle them faster than we queue them and reach the end of > the descriptor list (b=0x11, z=0) That is '...we handle all queued descriptors before queueing more and reach...'.
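The suspected underrun mechanism (the controller working through every queued descriptor before more are queued) can be sketched with a small simulation. This is only a toy model under assumptions of mine: one descriptor is consumed per tick, the driver refills in fixed batches, and the function and parameter names are hypothetical.

```python
def simulate(ticks, initial, refill, refill_period):
    """Return the ticks at which the context found no descriptor to use.

    initial        descriptors queued before the context is started
    refill         descriptors the driver queues on each refill
    refill_period  the driver refills every this many ticks
    """
    queued = initial
    stalls = []
    for t in range(ticks):
        if t > 0 and t % refill_period == 0:
            queued += refill      # driver queues more descriptors
        if queued == 0:
            stalls.append(t)      # end of list reached: context halts
            continue
        queued -= 1               # controller fills one descriptor
    return stalls


enough = simulate(12, initial=4, refill=4, refill_period=4)
too_few = simulate(12, initial=2, refill=4, refill_period=4)
```

In this model `enough` stays empty, while `too_few` records a stall shortly after start-up and then recovers once the first refill arrives, which matches the "halted until we queue up more descriptors and restart the context" description.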
(In reply to comment #53) > Then, if I got it right, there is no way to test "buffer fill mode". With the current firewire stack, that is correct. Not implemented at all. > IMHO this would have revealed HW problems, since it behaves like "dual buffer > mode", but with only one buffer. One DMA vs. two, the rest is the same. > This means, "dual buffer" could have two DMA engines or one multiplexed, hence > problems if done not properly (or defective). > If there is a DMA HW problem, we have 50% chance to get it. No, packet-per-buffer really does behave more like dual-buffer than buffer-fill does in this case. In dual-buffer, each descriptor points to a header buffer and a payload buffer. We emulate that in packet-per-buffer by chaining together two descriptors, the first points to the header buffer, the second to the payload buffer. > Just to close the circle, what if there is a bug in the libraw or libiec? > Could this cause all these issues? I do still think its possible there's an issue somewhere in userspace that ultimately leads to the buffer underruns (and via stall-outs), but I don't think a userspace issue could explain more than that, and the via stall-out I do think is a controller problem (but it would possibly be circumvented if we didn't have the buffer underrun).
(In reply to comment #55) > That is '...we handle all queued descriptors before queueing more and reach...'. OK, but the symptoms here are not mixed up. When the "underruns" occur, a specific situation is set up: 1) it happens _before_ capturing (the camera is stopped) 2) capture is working, but broken 3) later kernel crash It never (ever) happened to have "underruns" and no capturing at all and it never (ever) happened to have "underruns" at start up and then good capturing. I'm still thinking that some HW is not completely or correctly initialized. Maybe it is a BIOS fault. Or a defective chip... The TSB43AB22A datasheet mentions a "DV/link enhanced mode", maybe there is a chance to enable it and see (an explosion). pg
(In reply to comment #56) > > IMHO this would have revealed HW problems, since it behaves like "dual buffer > > mode", but with only one buffer. One DMA vs. two, the rest is the same. > > This means, "dual buffer" could have two DMA engines or one multiplexed, hence > > problems if done not properly (or defective). > > If there is a DMA HW problem, we have 50% chance to get it. > > No, packet-per-buffer really does behave more like dual-buffer than buffer-fill > does in this case. In dual-buffer, each descriptor points to a header buffer and > a payload buffer. We emulate that in packet-per-buffer by chaining together two > descriptors, the first points to the header buffer, the second to the payload > buffer. I meant from the HW point of view. To implement in the chip "dual buffer" or "buffer fill" it is almost the same "logic", only dual vs. single DMA. The other mode, "packet per buffer", is different, still from the HW point of view. Hence, if there is a chip errata in "dual buffer", likely it will show up in "buffer fill", less likely in "packet per buffer". Anyway, this is just academic, since we cannot test. pg
> When the "underruns" occur, a specific situation is set up: > > 1) it happens _before_ capturing (the camera is stopped) > 2) capture is working, but broken > 3) later kernel crash > > It never (ever) happened to have "underruns" and no capturing at all and it > never (ever) happened to have "underruns" at start up and then good capturing. I get the alleged buffer underrun before capture starts on all chips which work for me (FW323/1.0, NEC/1.0, TSB82AA2/1.1, VT6307/1.1). "Work" means the stream is captured 100% perfect, as far as I and dvgrab can tell. I don't get this underrun on VT6306/1.0 which is currently unable to capture due to bug 415841. Both of these apparently always happen.
Hrm. I may be way off base then. My brain has been going all over the place trying to come up with some sort of explanation for the via failures...
The TSB43AB22A datasheet says that "dual buffer mode" does not work in multi channel mode. Specifically, it says that multichannel is automatically disabled, when enabling "dual buffer mode". Nevertheless, it is also mentioned that, in "single channel" mode, if multiple channels are enabled (in the channel mask register(s)), the results are undefined. Those ones (the channel mask register(s)) are undefined at reset. Does this ring a bell? Or is it standard? How difficult is it to play with the OHCI registers in the current stack? Is there any chance I can have a look and experiment a bit? Which are the files to investigate? And functions...? pg
Re comment 61: I haven't checked the TSB43AB22A manual in detail, but apart from the vendor extensions it repeats part of the OHCI 1.1 spec (particularly, the MMIO registers specifications). CPU and OHCI controller communicate by - memory mapped registers, - DMA programs (linked lists of buffer descriptors), - data buffers, - and of course interrupts. The formats of the DMA programs and data buffers are only described in the OHCI spec, not in TI's manual. There is a link to the spec at http://wiki.linux1394.org/Links/Specs . The chip is programmed by drivers/firewire/fw-ohci.[ch]. DV reception is single channel.
What does "lspci -nnv" say about the controller, BTW? If we are going the blacklist route, maybe we want to narrow it down to the subsystem_vendor:subsystem_device ID. Until the next one with the problem comes around. (I'm going to get myself a TSB43AB22A card as well to see how it works.)
(In reply to comment #63) > What does "lspci -nnv" say about the controller, BTW? 01:05.0 FireWire (IEEE 1394) [0c00]: Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link) [104c:8023] (prog-if 10 [OHCI]) Subsystem: ASUSTeK Computer Inc. K8N4-E Mainboard [1043:808b] Flags: bus master, medium devsel, latency 32, IRQ 19 Memory at fddff000 (32-bit, non-prefetchable) [size=2K] Memory at fddf8000 (32-bit, non-prefetchable) [size=16K] Capabilities: [44] Power Management version 2 Kernel driver in use: firewire_ohci Kernel modules: firewire-ohci The motherboard does not seem to fit the actual model, since it is a M2NPV-VM. > If we are going the blacklist route, maybe we want to narrow it down to the > subsystem_vendor:subsystem_device ID. Until the next one with the problem comes I'm looking forward to this. Hopefully I'll be able to capture something! I'll then have to file an other bug, due to the not working autosplit of dvgrab... > around. (I'm going to get myself a TSB43AB22A card as well to see how it works.) Well, I'm really curious to see if I'm the lucky one with broken chip/BIOS/MB... Thanks! pg
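The "automatic" policy discussed in this thread (dual-buffer on OHCI 1.1 chips unless blacklisted, with a way to override the blacklist) could be sketched as below. This is an illustration only, not a proposed patch: the table, the function name and the `override` parameter are hypothetical, the version masking is my assumption based on the OHCI Version register layout, and the single table entry simply reuses the IDs from the lspci output above.

```python
# Assumed layout: version in bits 23:16, revision in bits 7:0 of the
# OHCI Version register, so OHCI 1.1 reads as 0x010010 after masking.
OHCI_VERSION_1_1 = 0x010010

# Hypothetical quirk table keyed by
# (vendor, device, subsystem_vendor, subsystem_device).
NO_DUAL_BUFFER = {
    (0x104C, 0x8023, 0x1043, 0x808B),  # TSB43AB22/A on the ASUS board above
}


def ir_mode(version_reg, pci_ids, override=None):
    """Pick the isochronous receive mode for one controller."""
    version = version_reg & 0x00FF00FF
    if version < OHCI_VERSION_1_1:
        return "packet-per-buffer"     # OHCI 1.0: no dual-buffer available
    if override is not None:
        return override                # user explicitly overrides the table
    if pci_ids in NO_DUAL_BUFFER:
        return "packet-per-buffer"     # blacklisted OHCI 1.1 controller
    return "dual-buffer"


asus = (0x104C, 0x8023, 0x1043, 0x808B)
other = (0x104C, 0x8023, 0x0000, 0x0000)   # made-up subsystem IDs
```

Keying on the subsystem IDs keeps a second board with the same chip, such as `other` here, on dual-buffer until it is shown to have the same problem.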
(In reply to comment #62) > http://wiki.linux1394.org/Links/Specs . I'll have a look at this, maybe in comparison to the TI datasheet. It could be the TI chip has some constraints or so... > DV reception is single channel. Sorry, I was mis-quoting the DS, they claim (if I got it right) that multi channel works only when a single ISO context has it enabled. I guess, this means "active" ISO context, not all ISO contexts. pg
Maybe it is not a problem, but according to the OHCI 1.1 specs and TI datasheet, the rcvSelfID bit of the LinkControl register must be set only _after_ a valid address is loaded in the selfID buffer pointer register. It seems to me, that in ohci_init(), the bit is first set and only later the address loaded. This might not be an issue, but it could be anyhow not within the specs. Another point is the cycleSource bit of the same register. According to the OHCI 1.1 specs, it is cleared at (hw) reset, but according to TI (it seems) it is undefined (actually TI is a bit lacking info here). I'm planning (when, I don't know) to move the selfID buffer pointer load before the rcvSelfID is set and to explicitly clear (or set?) the cycleSource bit. Unless you have other ideas about the topics (which would save me the time). Topic change. I was enabling the irq debug option in the firewire-ohci, and I can confirm interrupts are raining as soon as dvgrab is started (even if the camera is paused). Is there any easy way to somehow benchmark these irqs? I mean, is it possible to confirm the irq flow is consistent with the, supposed, data flow? Thanks. pg
I remember having had a conversation, or at least thought out loud, about the selfID buffer pointer issue somewhere sometime ago. Strange that I haven't patched it yet. However, selfID receive DMA is per se unrelated to isochronous receive DMA. Cycle master related functions are important. But they should be OK if debug logging shows regular cycle64Seconds interrupt events.
> is it possible to confirm the irq flow is consistent with the, > supposed, data flow? Not entirely. Interrupt events which fw-ohci logs as "IR" are the events per OHCI 1.1 section 6.4.1 ("...if a packet completes and any of the buffers it spans have the i bits set to 2'b11..."). That is, you only get these interrupts as long as the DMA context keeps running && when the descriptors told the controller to send an interrupt. There is also an "unrecoverableError" interrupt event which would fire e.g. when a DMA context goes dead. But we don't enable this event in the IntMask register.
Stefan, I believe you and I discussed the ordering of setting the buffer pointer and setting the rcvSelfID bit on irc when I was poking at LPS issues with my JMicron card, but it was mostly inconsequential, since we don't do anything selfID related until a bit later on. (At least, I think that was why, my recollection may be slightly off... So yeah, technically, we should fix that ordering up, but in practice, it shouldn't matter).
(In reply to comment #67) > I remember having had a conversation, or at least thought out loud, about the > selfID buffer pointer issue somewhere sometime ago. Strange that I haven't > patched it yet. However, selfID receive DMA is per se unrelated to isochronous > receive DMA. OK, I tried both, moving the selfID pointer before setting the rcvSelfID bit and clearing the cycleSource bit, with no success. About the buffer story, I strongly recommend doing it by-the-book, in order to avoid potential problems somewhere in the future. You don't need a patch from me, do you? :-) > Cycle master related functions are important. But they should be OK if debug > logging shows regular cycle64Seconds interrupt events. Regular? I saw only one "cycle64Seconds" in dmesg, when the camera started (maybe during one of these "buffer underrun" situations). They are supposed to happen every 64 seconds, I hope... pg
(In reply to comment #68) > Not entirely. Interrupt events which fw-ohci logs as "IR" are the events per > OHCI 1.1 section 6.4.1 ("...if a packet completes and any of the buffers it > spans have the i bits set to 2'b11..."). That is, you only get these interrupts > as long as the DMA context keeps running && when the descriptors told the > controller to send an interrupt. The question is: how can dvgrab report "no DV" if the IR irq are coming? Does this mean some data is in the buffers, but not of "DV" type? Or that some event is not generated in time, thus leading dvgrab to believe there is no data at all? Thanks, pg
> I saw only one "cycle64Seconds" in dmesg, when the camera started They should occur in 64 seconds intervals. If not, something is wrong with the controller's cycle counter, or with the cycle master. (The cycle master is a node on the bus which sends "cycle start" packets in 125 µs intervals. Isochronous talkers listen for these packets and send an isochronous packet whenever they got the cycle start.)
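As a rough sanity check of the numbers involved, the arithmetic can be written down in a few lines of Python. The 125 µs cycle and the 64-second interval come from the comment above; the 7 + 13 + 12 bit split of the cycle timer value is my assumption, taken from the IEEE 1394 CYCLE_TIME layout, and the function names are made up.

```python
CYCLE_US = 125                               # one isochronous cycle
CYCLES_PER_SECOND = 1_000_000 // CYCLE_US    # cycle start packets per second
CYCLES_PER_64_SECONDS = 64 * CYCLES_PER_SECOND


def expected_cycle64_events(duration_s):
    """How many cycle64Seconds log lines to expect in a test of this length."""
    return duration_s // 64


def decode_cycle_timer(reg):
    """Split a 32-bit cycle timer value into (seconds, cycles, offset).

    Assumed field widths: 7 bits of seconds, 13 bits of cycle count
    (0..7999), 12 bits of offset within the cycle.
    """
    return (reg >> 25) & 0x7F, (reg >> 12) & 0x1FFF, reg & 0xFFF
```

So a capture attempt that runs for five minutes should leave about four cycle64Seconds events in dmesg; seeing only one over a longer session would point at the cycle counter or the cycle master, as described above.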
Good news and bad news: I have got a TSB43AB22(A) CardBus card now (Exsys EX-6600E). It works fine with dvgrab, i.e. I can't reproduce the problem. 06:00.0 FireWire (IEEE 1394) [0c00]: Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link) [104c:8023] (prog-if 10 [OHCI]) Flags: bus master, medium devsel, latency 64, IRQ 17 Memory at 80004000 (32-bit, non-prefetchable) [size=2K] Memory at 80000000 (32-bit, non-prefetchable) [size=16K] Memory at 80004800 (32-bit, non-prefetchable) [size=2K] Capabilities: [44] Power Management version 2 Kernel driver in use: firewire_ohci Kernel modules: firewire-ohci, ohci1394
(In reply to comment #73) > Good news and bad news: I have got a TSB43AB22(A) CardBus card now (Exsys > EX-6600E). It works fine with dvgrab, i.e. I can't reproduce the problem. Sob, sob... It seems I'm very lucky... > 06:00.0 FireWire (IEEE 1394) [0c00]: Texas Instruments TSB43AB22/A > IEEE-1394a-2000 Controller (PHY/Link) [104c:8023] (prog-if 10 [OHCI]) > Flags: bus master, medium devsel, latency 64, IRQ 17 > Memory at 80004000 (32-bit, non-prefetchable) [size=2K] > Memory at 80000000 (32-bit, non-prefetchable) [size=16K] > Memory at 80004800 (32-bit, non-prefetchable) [size=2K] Why you've one more memory range than I have? The last 2K do not appear in my setup. Is this OK? > Capabilities: [44] Power Management version 2 > Kernel driver in use: firewire_ohci > Kernel modules: firewire-ohci, ohci1394 So I guess you'll then have to get the other TI card I have working, the OHCI 1.0 one... :-) pg
>> Memory at 80004000 (32-bit, non-prefetchable) [size=2K] >> Memory at 80000000 (32-bit, non-prefetchable) [size=16K] >> Memory at 80004800 (32-bit, non-prefetchable) [size=2K] > > Why you've one more memory range than I have? > The last 2K do not appear in my setup. Is this OK? I have no idea. OHCI requires just 2K but allows more for vendor-specific memory-mapped registers. A XIO2000 + TSB82AA2 PCIe card and a TSB82AA2 CardBus card of mine feature a 2K and a 16K region. Other cards have all sorts of other configurations. The Linux drivers don't use any vendor-specific registers. In theory, anything outside the OHCI range should not matter at all.
Jarod, could you point Piergiorgio to the latest packages (kernel, libraw1394, dvgrab) which he should have? Would be good if he could retest. The comment about IR interrupts happening all the time though dvgrab not receiving anything makes me wonder if it actually is a fw-ohci problem. However, if the latest and greatest still doesn't work, it would be good if Piergiorgio could test this (admittedly rather uninspired) patch: http://marc.info/?l=linux1394-devel&m=120968142705449
Latest kernel, dvgrab and libraw1394 should all be available from the Fedora 8 updates repo now, so a simple 'yum upgrade' should do the trick (or 'yum upgrade kernel dvgrab libraw1394' would do to limit the scope of the upgrade).
I've kernel -85, which is the latest with some firewire updates, libraw1394 is -6, also latest, but dvgrab is still 3.0, I see a 3.1 in koji, I'll pick it from there (why is this one not in updates? It's from last year...). I've mixed feelings about this thing, at the moment. On one side, it could be HW (MB, BIOS, broken chip, etc.) related, thus the only solution could be the patch Stefan proposed, i.e. force packet-per-buffer, in this case. Another aspect could be the user space part, but this will not explain the "kernel panic" following the "buffer underrun" things, I guess. Assuming this is not a different problem. There is, unfortunately, something else. I've a dual core CPU and 4GiB. Due to the memory size, I've to boot with pci=nommconf, otherwise "unpredictable results may occur" (especially with openGL). The IOMMU seems to be "masked" by the BIOS, so the kernel has to workaround by its own. I also know that sometimes there are/were issues with dual core machines. Furthermore, the BIOS enables (but it is disable-able) the virtualization extensions of the CPU. So, overall it's a mess... In this situation, one possibility would be to physically remove 2GiB (is there any "soft" possibility?) and/or boot with maxcpus=1 and/or without pci=nommconf. If this could make any sense... Coming back to (my) comment #71, is there any way to dump the packet headers (and eventually data) from within the driver, after the reception? I mean, of course: where is(are) the buffer(s)? Is this the 16MiB allocated somewhere in fw-ohci.c? It would be interesting to know if and which data is there. I was thinking to memset this to some 0xD15EA5ED value (maybe 0x55 will be enough) and then check if and how is it overwritten. And how can dvgrab (because it's dvgrab, it seems) report the "underruns"? Finally, I'll try to add the patch and see how much further I can go. Thanks, pg
> Due to the memory size, I've to boot with pci=nommconf, otherwise > "unpredictable results may occur" (especially with openGL). > The IOMMU seems to be "masked" by the BIOS, so the kernel has to > workaround by its own. Yes, that's worrying. Please test with the unpatched driver and mem=3G on the boot loader's command line, perhaps also with memmap=something. See http://lxr.linux.no/linux/Documentation/kernel-parameters.txt Or if that doesn't work out, a test with RAM physically reduced to anywhere <= 3G would be good. > I also know that sometimes there are/were issues with dual core > machines. Furthermore, the BIOS enables (but it is disable-able) the > virtualization extensions of the CPU. I don't expect trouble from either of those. Although, if Fedora starts a userspace IRQ balancer, you may want to disable it to work around driver bugs. However, these bugs should be random in nature == not as systematic as the DV reception failure here, I presume. > Coming back to (my) comment #71, is there any way to dump the packet > headers (and eventually data) from within the driver, after the > reception? Only the upper layers can do that, i.e. fw-cdev.c or perhaps something in fw-iso.c. You can't do this in fw-ohci.c. That's because the CPU will only see the correct data in the buffers after having them dma_unmap_*()'d. Looking into the buffers before then is a bug which only works on simple platforms without IOMMU or "software IOMMU". Mmm, I wonder if fw-ohci already looks into the buffers. The firewire drivers used to be sprinkled with DMA mapping/ DMA syncing bugs. However, I believe there are already people using the current drivers successfully on platforms which require proper mapping/ syncing. > I mean, of course: where is(are) the buffer(s)? Is this the 16MiB > allocated somewhere in fw-ohci.c? No, these are the descriptor buffers. (Call them metadata buffers if you will, since they contain the description of the actual data buffers.)
The actual data buffers are allocated by fw-core on behalf of the userspace client, if I'm not mistaken. (And these data buffers are split into header buffer and payload buffer.) Browse fw-iso.c. If I knew how all this works, I would tell you. > It would be interesting to know if and which data is there. > I was thinking to memset this to some 0xD15EA5ED value (maybe 0x55 > will be enough) and then check if and how is it overwritten. Yes, I'm under the impression that you are on to something. > And how can dvgrab (because it's dvgrab, it seems) report the > "underruns"? If I knew how all this works... (Oh the good old times when I only maintained sbp2...)
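The poison-and-check idea discussed above can be tried out in plain userspace C before touching the driver. The following is only a minimal sketch under stated assumptions: the helper names poison_buffer() and count_overwritten() are hypothetical and do not exist in fw-ohci.c or fw-iso.c, and the 0x55 fill byte is simply the value floated in the thread. In the kernel, such a check would only be meaningful on a data buffer after it has been dma_unmap_*()'d or synced for the CPU, as noted above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POISON_BYTE 0x55  /* fill pattern suggested in the thread */

/* Fill a receive buffer with the poison pattern before handing it out. */
static void poison_buffer(unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    memset(buf, POISON_BYTE, len);
}

/*
 * Count the bytes that no longer hold the poison pattern.
 * Caveat: received bytes that happen to equal POISON_BYTE cannot be
 * told apart from untouched ones, so the result is a lower bound.
 */
static size_t count_overwritten(const unsigned char *buf, size_t len)
{
    size_t i, changed = 0;

    assert(buf != NULL);
    for (i = 0; i < len; i++)
        if (buf[i] != POISON_BYTE)
            changed++;
    return changed;
}
```

A count of zero after a supposedly completed reception would suggest that the controller never wrote into that buffer at all, which is the kind of evidence being looked for here.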
PS: > You can't do this in fw-ohci.c. That's because the CPU will only see > the correct data in the buffers after having them dma_unmap_*()'d This refers to the data buffers only. The descriptor buffers reside in coherent memory. In that one, PCI device and CPU always see the same contents.
Gah, I thought I'd pushed dvgrab 3.1 ages ago... Okay, just did so now. As for issues with multi-core systems and 4GB of RAM, fwiw, I have multiple multi-core boxes with 4GB of RAM or more that work just fine, so there's no generic issue there, but certainly could be a bios-specific issue.
OK, I ran the following tests, with this on the boot kernel command line:
1) nothing
2) pci=nommconf mem=2G
3) mem=2G
The good news is that in cases 2) and 3) dvgrab captured the DV stream flawlessly, while in 1) there was the usual "error no DV". The bad news is that in case 1) there was the usual "error no DV", while in 2) and 3) dvgrab captured the DV stream flawlessly. ;-)
The BIOS settings were untouched in all cases.
One thing is that the device works in dual-buffer mode, at least something was captured (actually I did not look at the video itself). So, chip errata and so on are excluded. The other thing is that only the mem=xG seems to affect the functionality, so it is either BIOS mis-mis-mis-configuration or a kernel problem.
A thing I forgot (I was in a hurry to inform you :-)) is to check what the kernel said about the IOMMU.
What do you think?
pg
I forgot, since the packet-per-buffer seems to work, with 4GiB, there could be something between this and the other mode, which does not fit with mem > 3GiB. What's the difference between the packet-per-buffer and dual-buffer modes, in terms of DMA, memory allocation and so on? Thanks again, pg
OK, further experiments, booting with pci=nommconf and the following:
1) mem=3G
2) mem=2800M
3) mem=2500M
4) mem=2200M
5) mem=2G (again)
Results were: 1), 2) and 3) do not work, while 4) and 5) do. I checked, when working, the captured video and it is fine, no problems. In all these cases, no mention of IOMMU is visible with dmesg.
pg
> 3) mem=2500M > 4) mem=2200M ... > Results were: 1), 2) and 3) do not work, while 4) and 5) do. Could you take the time to collect the first ~100 lines of dmesg for 2500M and 2200M?
...fresh after boot that is, before the dmesg ring buffer wraps.
As I just remembered while writing http://bugzilla.kernel.org/show_bug.cgi?id=10342#c22 (on SBP-2 I/O errors on Asus M2R32-MVP), we may still have memory access ordering bugs in fw-ohci.
I'm under the impression that the limit is 2G and 2200M works just because the upper 200MiB might be allocated before the firewire DMA buffers are. So, I dumped the 2G, 2200M, 2500M and normal (4G) dmesg output. Since dmesg is quite verbose, I cut them to 350 lines, after which nothing more related to memory or I/O seems to happen. pg
Created attachment 304485 [details] mem=2G
Created attachment 304486 [details] mem=2200M
Created attachment 304488 [details] mem=2500M
Created attachment 304489 [details] no specific memory setup
Forgot something... It might still be a HW issue, maybe the chip can't DMA above 2G in dual-buffer mode (and maybe even in packet-per-buffer). Somewhere I read about the PCI DMA mask, maybe this should be forced to 31 bits (or it is already like this, but it is not used). pg
You could insert
    dma_set_mask(&dev->dev, DMA_31BIT_MASK);
in pci_probe() of fw-ohci.c. I believe it can go between
    pci_set_master(dev);
    pci_write_config_dword(dev, OHCI1394_PCI_HCI_Control, 0);
If that makes it work, it still doesn't tell us whether it is a driver bug or hardware bug --- and if the latter, OHCI chip bug or board bug; or if the former, fw-ohci bug or platform code bug. AFAIU.
The randomness of failures according to comment #0 indeed indicates that some DMA memory ranges don't work for us, for whatever reason. And I agree that your latest findings point at a 2G limit.
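To make concrete what a 31-bit DMA mask restricts, here is a small userspace sketch. It is an illustration only, not kernel code: dma_mask_for_bits() and range_fits_mask() are hypothetical helpers that merely model the arithmetic, i.e. a buffer is reachable under a mask if its last byte address does not exceed the mask. With 31 bits the mask is 0x7fffffff, so anything at or above 2 GiB is out of reach.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build a mask with the low `bits` bits set; bits must be 1..64. */
static uint64_t dma_mask_for_bits(unsigned int bits)
{
    assert(bits >= 1 && bits <= 64);
    return bits == 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}

/* True if every byte of [addr, addr + len) is addressable under `mask`. */
static bool range_fits_mask(uint64_t addr, uint64_t len, uint64_t mask)
{
    assert(len > 0);
    /* Written this way to avoid overflow in addr + len - 1. */
    return addr <= mask && len - 1 <= mask - addr;
}
```

Under this model a 4096-byte page starting at 0x7ffff000 still fits a 31-bit mask, while a page one byte longer, or any page placed above 2 GiB, does not, and the platform would have to remap or bounce it.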
I added the DMA mask, as you wrote, and, from a first test, dvgrab was fine: the stream was captured and playable.
I was considering the following. All DMAs of this machine seem to work fine: SATA, an old BTTV PCI card, ethernet (if it has a DMA), SBP2, even ISO packet-per-buffer. Only the DMA of dual-buffer seems to be limited to 2GiB. It could be that TI added dual-buffer on the side of an existing design, maybe they did not make a fully new chip. So, they could have some "limitations" on the new part, independently from the old one.
One possibility, would be to add the DMA mask as specific workaround for this chip. Any drawbacks? What about SBP2? Maybe better than forcing packet-per-buffer?
Of course, it would be nice if TI confirmed or denied the findings. Unless you can point out differences between one ISO mode and the other, or someone can suggest some test revealing problems elsewhere in the system.
Any ideas?
pg
I am currently testing TSB43AB22/A on an i945GT based board with 3.2 GB RAM. But I have yet to hit physical addresses above 2 GB. (I added a printk to get notified when that happens.) I guess I have to allocate some memory before running dvgrab.
(In reply to comment #96)
> I am currently testing TSB43AB22/A on an i945GT based board with 3.2 GB RAM.
> But I have yet to hit physical addresses above 2 GB. (I added a printk to get
> notified when that happens.) I guess I have to allocate some memory before
> running dvgrab.
Cool! Good you could get such a thing. One question and one note. Is this a 64bit machine with 64bit kernel? I have /tmp on tmpfs; I don't know if it matters, but this results in, more or less, 2GiB virtually allocated (I guess they're not really allocated until something is written there, but maybe this has an impact on following allocations).
Hope this helps, and thanks again!
pg
I've got a pair of pcmcia cards coming that are both two-port fw400, hoping at least one of them is a TSB43AB22/A. Will plug them into my core 2 duo laptop running a 64-bit kernel with 4GB of RAM.
(In reply to comment #98) > I've got a pair of pcmcia cards coming that are both two-port fw400, hoping at > least one of them is a TSB43AB22/A. Will plug them into my core 2 duo laptop > running a 64-bit kernel with 4GB of RAM. Great! Hopefully someone can find what the matter is! One question for Stefan, could you please tell where and how to add the printing for the memory allocation? I would like to check a couple of things. Specifically, where, in my case, the memory is, with and without the 31 bit DMA limitation. What if in both cases the memory is below 2GiB? What surprises me is that I get the problem immediately, without any particular memory usage happening before the capture (OK, xorg, but nothing more...). I was even thinking that the memory is allocated starting from the higher addresses to the lower. pg
Created attachment 305409 [details] log bus addresses in dualbuffer IR
>>> One possibility, would be to add the DMA mask as specific workaround
>>> for this chip. Any drawbacks? What about SBP2? Maybe better than
>>> forcing packet-per-buffer?
This will kill performance on machines without IOMMU if more than 2 GB of memory is present because the CPU will have to copy back and forth to DMA bounce buffers. If it works in the first place.
>> Is this a 64bit machine with 64bit kernel?
32 bit kernel. Shouldn't matter though, because PCI physical addresses are 32 bits wide. (I have not yet continued to set it up that I actually get buffers above 2 GB.)
> could you please tell where and how to add the printing
> for the memory allocation?
The virtual addresses which we get at memory allocation probably aren't interesting (for now). More so the physical addresses (a.k.a. bus addresses) which we get by dma-mapping the memory. Attached is a stupid patch which logs the bus address of the descriptor and of the buffer page in the dualbuffer IR path. (If one of them is > 2G and only every 32nd time, to not flood the log. Well, as I said, I haven't triggered that yet. No guarantee that this patch does what I thought it would do.)
(In reply to comment #100)
> This will kill performance on machines without IOMMU if more than 2 GB of
> memory is present because the CPU will have to copy back and forth to DMA
> bounce buffers. If it works in the first place.
Assuming the chip cannot DMA above 2GiB in dual-buffer mode, would it be possible to force _only_ the memory allocation of the iso transfers to 31 bit addresses? The question is whether there is more overhead with 4GiB w/o IOMMU, in the cases when this occurs, or in always using packet-per-buffer mode. Assuming the async transfer part (SBP2) is working properly. Of course, this would be only for this chip.
> 32 bit kernel. Shouldn't matter though, because PCI physical addresses are 32
> bits wide. (I have not yet continued to set it up that I actually get buffers
> above 2 GB.)
Uhm, but there is this story of high/low mem on 32bit machines with more than 1GiB of memory. I have an Intel-based, 32 bit PC with 2GiB and the "memory split" is 3/1 (GiB); dmesg reports 1151MB highmem and 896MB lowmem. AFAIK there is bounce buffering going on with this setup, but I'm not sure about the details, I read the explanation long ago...
> The virtual addresses which we get at memory allocation probably aren't
> interesting (for now). More so the physical addresses (a.k.a. bus addresses)
> which we get by dma-mapping the memory. Attached is a stupid patch which logs
> the bus address of the descriptor and of the buffer page in the dualbuffer IR
> path. (If one of them is > 2G and only every 32nd time, to not flood the log.
> Well, as I said, I haven't triggered that yet. No guarantee that this patch
> does what I thought it would do.)
Thanks, I'll have a look, I'm curious to see what the allocation patterns are.
pg
Hi again.
I tried the fw_notify() patch printing the DMA addresses. If I understand it correctly, it prints only if the address is above 2GiB. Well, the machine was running several different things while I was patching, compiling and so on, so I could imagine some memory was allocated. In these conditions, without the DMA_31BIT thing, there was no print and I was able to capture the DV stream.
After this strange experience, I rebooted, and the situation went back to "normal" (i.e. no capturing), with something like this in /var/log/messages:
...
May 15 20:47:28 lazy kernel: firewire_ohci: ##### d_bus 3415088752x, page_bus 3414638592x
May 15 20:47:28 lazy kernel: firewire_ohci: ##### d_bus 3415090352x, page_bus 3414654976x
May 15 20:47:28 lazy kernel: firewire_ohci: ##### d_bus 3415091888x, page_bus 3414671360x
May 15 20:47:28 lazy kernel: firewire_ohci: ##### d_bus 3414384880x, page_bus 3414687744x
May 15 20:47:28 lazy kernel: firewire_ohci: ##### d_bus 3414386416x, page_bus 3414704128x
...
If I get it correctly, this somehow confirms that, for some unknown reason, DMA with high addresses does not work. Note that this was after a fresh reboot, so it also somehow confirms that, at least this memory, is allocated starting from higher addresses. I think the 32 bit kernel will never go that far, due to the 3/1 split.
Hope this helps.
pg
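The logged values above are decimal bus addresses, so it is easy to check them against the 2 GiB boundary outside the kernel. The sketch below is a userspace illustration under assumptions: parse_bus_line() is a hypothetical helper, and it treats the trailing "x" after each number as a literal character in the log output. All values quoted above lie between 3 GiB and 4 GiB, i.e. well above what a 31-bit mask can address.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define TWO_GIB (UINT64_C(1) << 31)

/*
 * Extract the two decimal bus addresses from one log line of the form
 * "... d_bus <num>x, page_bus <num>x". Returns true on success.
 */
static bool parse_bus_line(const char *line, uint64_t *d_bus, uint64_t *page_bus)
{
    const char *p;

    assert(line != NULL && d_bus != NULL && page_bus != NULL);
    p = strstr(line, "d_bus ");
    if (p == NULL)
        return false;
    return sscanf(p, "d_bus %" SCNu64 "x, page_bus %" SCNu64,
                  d_bus, page_bus) == 2;
}
```

Comparing each parsed value with TWO_GIB then shows at a glance whether the descriptor, the data page, or both ended up in the suspect range.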
So my two pcmcia cards arrived today. They're identical, save the stickers on 'em. That would be fine if they were the right chipset, but they aren't. They're both NEC cards. D'oh.
I was just reminded today that fw-ohci is still vulnerable to this: http://lkml.org/lkml/2008/5/26/297 (reordering of MMIO accesses vs. DMA buffer accesses, for a bunch of reasons) I don't know though whether this has a hand in this bug here.
Hi all, I was a bit busy upgrading some machines to F9. I also "patched" the fw-ohci.c for the new kernel (on the x86_64 box), with the DMA limit "feature". It even works :-), I can capture dv streams with dvgrab and kino. How should we proceed, then? Any ideas or a plan? pg
We still need to pinpoint whether the drivers or the TSB43AB22 or the board is at fault. A main obstacle on my side currently is lack of time.
(In reply to comment #106) > We still need to pinpoint whether the drivers or the TSB43AB22 or the board is > at fault. A main obstacle on my side currently is lack of time. Well, of course, the question was how to pinpoint this. Considering that a couple solutions are available, I was just wondering if you (both) had some ideas on how to go further (implement one, the other, all, try to find the root cause, etc.). Anyway, I guess we can wait until you'll have more time. pg
- code inspection (I did it to some degree but may have missed something)
- fix up those other seemingly unrelated bugs or sloppinesses while we are at it and watch whether it has unexpected positive results
- test different combos of software -- controller -- board to eliminate parts of the equation (hard because juju is the only known software which utilizes dual buffer, and you don't have a stash of controllers, notably not OHCI 1.1 ones, and I don't have this board)
- attempt to find someone at TI who knows something about dualbuffer and big physical addresses (questionable approach)
(In reply to comment #108)
> - code inspection (I did it to some degree but may have missed something)
For this I'll not be of big help.
> - fix up those other seemingly unrelated bugs or sloppinesses while we are at
> it and watch whether it has unexpected positive results
This even less.
> - test different combos of software -- controller -- board to eliminate parts
> of the equation (hard because juju is the only known software which utilizes
> dual buffer, and you don't have a stash of controllers, notably not OHCI 1.1
> ones, and I don't have this board)
Actually I've some, but all OHCI 1.0... :-( I can try to get some OHCI 1.1, but I cannot promise anything.
> - attempt to find someone at TI who knows something about dualbuffer and big
> physical addresses (questionable approach)
Why "questionable approach"? I was thinking to email their support, maybe something will happen, in the past they were quite friendly.
pg
> Why "questionable approach"? Depends on whether they already were in touch with somebody who extensively used dual buffer mode.
Created attachment 307293 [details] Simple fix
Hi, I created this simple little patch to temporarily fix the issue. Some notes:
1) It is entirely derived from Stefan's previous patch and suggestion
2) I inlined the TSB43AB22 PCI ID definition; I know it's ugly, but I was too lazy to get a patch for the entire source tree... :-)
3) There is a debug print inside, but I'm not sure it is done the correct way
4) I did not investigate how to use a parameter or sysconfig for it, which would be nice also for further testing (any hints, apart from looking at fw-sbp2.c?)
5) It compiles, but I did not (yet) test it
Please have a look.
Jarod, would it make sense to get this (or a similar one) temporarily into the Fedora kernel patchset? After verification, of course. At least until one of you has more time to follow the issue again?
Thanks a lot in advance, pg
I would rather agree to a tentative workaround which switches IR to packet-per-buffer, to avoid performance impact on all the other FireWire DMA functions. Notably, but not only, SBP-2. I don't know if this board has an IOMMU /and/ can transparently make up address mappings below 2G. If not, then the CPU will have to do useless copying to and fro bounce buffers, and buffer allocations are more prone to fail because those bounce buffers are AFAIK a scarce resource. (BTW, callers of dma_set_mask should check its return value for possible error return code, which happens if the architecture does not support the requested mask. But the architectures which can run on an Asus board probably all support this mask.)
...OTOH it is not my business what goes into Fedora kernels and what not.
(In reply to comment #112)
> I would rather agree to a tentative workaround which switches IR to
> packet-per-buffer, to avoid performance impact on all the other FireWire DMA
> functions. Notably, but not only, SBP-2. I don't know if this board has an
> IOMMU /and/ can transparently make up address mappings below 2G. If not, then
> the CPU will have to do useless copying to and fro bounce buffers, and buffer
> allocations are more prone to fail because those bounce buffers are AFAIK a
> scarce resource.
My thinking was the following:
1) the patch should be temporary, meaning until we find the root cause and a reasonable workaround, or I change PC :-)
2) the DMA change is minimally invasive, making your life easier...
3) this type of workaround "captures" the findings we had so far
4) the platform *should* have an IOMMU, even if the kernel complains about the memory aperture
5) the combination 64bit, 4GiB, no IOMMU seems to me problematic in any case (32bit has other issues anyway)
That said, I have nothing against packet-per-buffer vs. dual-buffer; I would only like to have some stable temporary workaround, and one or the other is the same to me.
> (BTW, callers of dma_set_mask should check its return value for possible error
> return code, which happens if the architecture does not support the requested
> mask. But the architectures which can run on an Asus board probably all support
> this mask.)
Oops... :-) Anyway, it was working for me... :-)
pg
I *finally* found a system here in the office with a TSB43AB22/A controller, and have borrowed some memory for it to knock its total up to 3GB. Will beat on it some tomorrow...
Bug 449252 is looking suspiciously like a duplicate of this one (erratic dvgrab failure with a TSB43AB22/A controller and >2GB memory), and I've now reproduced the problem on a system here on my end.
(In reply to comment #116) > Bug 449252 is looking suspiciously like a duplicate of this one (erratic dvgrab > failure with a TSB43AB22/A controller and >2GB memory), and I've now reproduced > the problem on a system here on my end. Ah! Good, very good! Luckily you were able to reproduce it. I started to feel like those characters, in certain movies, witnessing some conspiracy, telling it to everybody, but having no evidence... :-) This is really good news! pg
pg's board: nVIDIA GeForce 6150 based, Asus Jarod's: nVIDIA nForce Pro 3600 based, Tyan Jarod, did you already try other chips on the Tyan board?
Yup, and just triple-checked again. Texas Instruments TSB82AA2 IEEE-1394b Link Layer Controller (rev 01) in the same box captured video perfectly each of a dozen times attempted just now.
Jarod, it would be also interesting to confirm, if you did not already, that booting with mem=2G fixes the problem with the TSB43AB22/A. So we will be (if confirmed) in sync also with this finding. pg
Also, try IIDC capture too if you can spare a few minutes for that. We have seen IIDC and DV capture behave differently in bug 415841. But in this bug here I expect IIDC to fail very similarly to DV, i.e. frames will be corrupted or no frames received, while the DMA program keeps going.
(In reply to comment #120) > Jarod, it would be also interesting to confirm, if you did not already, that > booting with mem=2G fixes the problem with the TSB43AB22/A. > > So we will be (if confirmed) in sync also with this finding. Didn't try mem=2G, but did patch the driver to set a 31-bit DMA mask. Works just fine then. Also works just fine when forced into packet-per-buffer mode.
(In reply to comment #121) > Also, try IIDC capture too if you can spare a few minutes for that. We have > seen IIDC and DV capture behave differently in bug 415841. But in this bug here > I expect IIDC to fail very similarly to DV, i.e. frames will be corrupted or no > frames received, while the DMA program keeps going. Hrm. Just tried IIDC, and somehow or another, IIDC is working fine. (This is even with dvgrab attempts mixed in around it, all of which stalled).
(In reply to comment #123) > Hrm. Just tried IIDC, and somehow or another, IIDC is working fine. (This is > even with dvgrab attempts mixed in around it, all of which stalled). There's no visual corruption, and I could just be imagining things, but actually, IIDC seems to be a touch choppy. When I set the 31-bit DMA mask, video appears to be smoother.
(In reply to comment #124) > There's no visual corruption, and I could just be imagining things, but > actually, IIDC seems to be a touch choppy. When I set the 31-bit DMA mask, video > appears to be smoother. Well, this could be something similar to what I get sometimes together with those "buffer underrun" messages. Something is captured, but not really in a clean way. Maybe some buffer is above and some below the 2GiB border... pg
Created attachment 309397 [details] logging + allocation test This debug patch adds the __GFP_HIGHMEM flag to descriptor allocations and buffer allocations and logs when descriptor physical addresses or buffer physical addresses are located above 2G. The GFP flag causes my 945GM/ICH7 based system with 32bit kernel and 3.2GB usable RAM to use buffer addresses above 2G. But I did not get descriptor addresses above 2G yet. I captured 20GB from my TSB43AB22/A CardBus card, and dvgrab was entirely satisfied with what it got so far. This /may/ mean that the problem specifically depends on descriptors located at physical addresses above 2G, while the data buffer locations don't matter.
Created attachment 309398 [details] 31bit consistent DMA mask This patch only forces consistent allocations to be located below 2G. This influences allocations of descriptors and some others, but not data buffer allocations. pg or Jarod, please test this at your leisure to narrow the issue down a little bit further.
Hi, I just patched the module, rebooted (to have all memory free) and tried. It works! I could capture the DV streams and play them, without issues. Good point! pg
Created attachment 312363 [details] firewire: fw-ohci: TSB43AB22/A dualbuffer workaround Years later... Proposed patch, posted on lkml/linux1394-devel: http://lkml.org/lkml/2008/7/22/331
Hi all, I was just trying out kernel-2.6.26.2-14.fc9.x86_64, which seems to have the latest patch from Stefan. It seems to be working; I could capture the DV stream without issues. What now? Will this issue go to --> update --> QA --> CLOSED, or is there still something to do? Thanks, pg
This message is a reminder that Fedora 8 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 8. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '8'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 8's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 8 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
I changed the version to 9, but I guess this can be officially closed, since the fix is working and included. Should I close it, or will you do it, Jarod? Thanks, pg
Either one works, and since I'm commenting, I'll just close it too... :)