243081 – dvgrab does not work with new firewire stack

Bug 243081 - dvgrab does not work with new firewire stack

Summary: dvgrab does not work with new firewire stack

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	dvgrab
Sub Component:
Version:	7
Hardware:	All
OS:	Linux
Priority:	low
Severity:	low
Target Milestone:	---
Assignee:	Jarod Wilson
QA Contact:
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	308741 (view as bug list)
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2007-06-07 08:27 UTC by Mladen Kuntner
Modified:	2008-05-06 12:34 UTC (History)
CC List:	11 users (show)
Fixed In Version:	2.6.24.5-85.fc8
Clone Of:
Environment:
Last Closed:	2008-04-29 20:54:42 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
firewire_ohci debug messages (14.18 KB, text/plain) 2008-03-28 00:03 UTC, Mladen Kuntner	no flags	Details
strace run of dvgrab (42.24 KB, text/plain) 2008-03-29 13:04 UTC, Mike Pope	no flags	Details
dmesg output (16.19 KB, text/plain) 2008-03-29 23:27 UTC, Mike Pope	no flags	Details
dvgrab -noavc, plus kernel logging (19.37 KB, text/plain) 2008-03-30 04:02 UTC, Mike Pope	no flags	Details
dvgrab -noavc -I /dev/fw1, plus kernel logging (10.61 KB, text/plain) 2008-03-30 04:05 UTC, Mike Pope	no flags	Details
kernel log from a working system (22.21 KB, text/plain) 2008-03-30 09:48 UTC, Stefan Richter	no flags	Details
gdb excerpt and command line run of dvgrab -noavc (740 bytes, text/plain) 2008-04-03 11:14 UTC, Mike Pope	no flags	Details
dvcont play with printk-s (12.26 KB, application/octet-stream) 2008-04-05 22:23 UTC, Mladen Kuntner	no flags	Details
/var/log/messages, dvgrab -noavc (24.60 KB, text/plain) 2008-04-10 11:26 UTC, Mike Pope	no flags	Details
Capture to GA-K8N onboard from Canon camera (22.26 KB, text/plain) 2008-04-11 21:28 UTC, Jarod Wilson	no flags	Details
dvcont play additional debug (5.37 KB, text/plain) 2008-04-12 14:24 UTC, Mladen Kuntner	no flags	Details
kernel log from a working system (22.76 KB, text/plain) 2008-04-12 17:16 UTC, Stefan Richter	no flags	Details
dmesg for kernel 2.6.24.4-82 (505 bytes, text/plain) 2008-04-16 22:59 UTC, Mladen Kuntner	no flags	Details
View All

Description Mladen Kuntner 2007-06-07 08:27:58 UTC

Description of problem:
dvgrab does not work with new firewire stack

Version-Release number of selected component (if applicable):
dvgrab-2.1-2.fc7
kernel-2.6.21-1.3194.fc7
libraw1394-1.2.1-7.fc7

How reproducible:
always

Steps to Reproduce:
1. run : dvgrab test
2. get video in test.avi

  
Actual results:
# dvgrab test
send oops
send oops
send oops
Error: no camera exists


Expected results:
no errors, video from camera in test.avi

Additional info:
when camera is turned on there is entry /dev/fw1 in /dev directory
testlibraw program is seeing the camera

Comment 1 Per Thomas Jahr 2007-06-11 19:50:25 UTC

I also had the exact same problems until I tried dvgrab as root. Then it 
worked! For me at least...

In FC6 I could also use dvgrab as a normal user so something has changed. 
Anyone knows how normal users can get access as well (some udev rules for /dev/
fwX I guess)?

mladen: does it work if you run dvgrab as root?

Comment 2 ElLocoGato 2007-06-11 21:15:04 UTC

dvgrab as normal user also worked great for me in FC6.

Running dvgrab as root is no go for me, however.  I get the behavior described
in Bug 240771 (https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=240771).

Comment 3 Mladen Kuntner 2007-06-12 16:28:53 UTC

Running as root is working for me but only if i run it like:
dvgrab --noavc ... and the tape is already running so it look like two bugs

1. not possible to run as normal user
2. libavc1394 does not work with my camera JVC GR-DVL157E 


I don't see bug 240771.

Comment 4 toddz 2007-06-13 17:20:12 UTC

I also experience Bug 240771 when attempting to run dvgrab as root.  I've added
my experiences and log files to bug 240771.

Comment 5 Mladen Kuntner 2007-06-27 08:23:38 UTC

Find advice that firewire is electrically noisy on dongle connectors.
Put old firewire card in that has direct connectors.

lspci -v :
01:07.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller
(rev 43) (prog-if 10 [OHCI])
        Subsystem: VIA Technologies, Inc. IEEE 1394 Host Controller
        Flags: bus master, medium devsel, latency 32, IRQ 19
        Memory at f8000000 (32-bit, non-prefetchable) [size=2K]
        I/O ports at c000 [size=128]
        Capabilities: [50] Power Management version 2

And the result is worse:
[root@localhost hack]# dvgrab
Found AV/C device with GUID 0x0080880100a306bb

Message from syslogd@ at Wed Jun 27 10:15:18 2007 ...
localhost kernel: Oops: 0000 [1] SMP 
Message from syslogd@ at Wed Jun 27 10:15:18 2007 ...
localhost kernel: CR2: ffffffffffffffea

going back to onboard nvidia

Comment 6 Mladen Kuntner 2007-06-30 07:59:34 UTC

On suggestion from fedora-devel-list added Kristian Høgsberg to cc

Comment 7 Kristian Høgsberg 2007-07-05 18:47:43 UTC


*** This bug has been marked as a duplicate of 240771 ***

Comment 8 Mladen Kuntner 2007-07-06 17:30:14 UTC

I must disagree with your decision.
I think that this are two separate bugs.
Can you please read:
https://www.redhat.com/archives/fedora-devel-list/2007-June/msg02697.html

summary:
- test-dv is working on onboard nvidia controller
- on onboard nvidia controller there is no kernel oops
and in 50% dvgrab is working ok.

If you need any additional data please let me know.

thanks mladen

Comment 9 Jarod Wilson 2007-09-27 14:43:32 UTC

*** Bug 308741 has been marked as a duplicate of this bug. ***

Comment 10 Jarod Wilson 2007-10-24 13:12:28 UTC

Please re-test with the latest libraw1394 just pushed to f7 updates and let me
know if that makes any significant difference.

Comment 11 Mladen Kuntner 2007-10-24 19:33:40 UTC

not working as user trying as root:

**** try 1 *****
[root@localhost test]# dvgrab --duration 20 test
Found AV/C device with GUID 0x0080880100a306bb
ioctl call failed, retval = -1
ieee1394io.cc:426: In function "virtual bool iec61883Reader::StartReceive()":
"iec61883_dv_fb_start( m_iec61883dv, channel )" evaluated to -1
ieee1394io.cc:426: errno: 22 (Invalid argument)
""     0.00 MB 0 frames                                                         
Capture Stopped

and hanging the terminal ( ctrl c not working..)

this one is with: 
01:07.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller
(rev 43)
(which is OHCI 1.0 and i think don't have support in kernel)

**** try 2 *****
[root@localhost test]# dvgrab --duration 20 test
send oops
send oops
send oops
send oops
send oops
send oops
send oops
send oops
send oops
send oops
send oops
send oops
send oops
send oops
rom1394_1 warning: read failed: 0x0000fffff000040c
rom1394_1 warning: read failed: 0x0000fffff0000410
Found AV/C device with GUID 0x01ff317701ff3177
send oops
send oops
send oops
send oops
""     0.00 MB 0 frames                                                         
Capture Stopped                                                                 
Error: no DV

this one with: 
01:0a.0 FireWire (IEEE 1394): Texas Instruments TSB82AA2 IEEE-1394b Link Layer
Controller (rev 01)
(which is OHCI 1.1 and supported in kernel)

Both FireWire cards are working ok on vanilla 2.6.23 kernel and old stack.

Comment 12 Jarod Wilson 2007-10-25 01:00:05 UTC

(In reply to comment #11)
> not working as user trying as root:

I think we require some udev updates (which are in f8, where this does work) to
get that working.

> **** try 1 *****
> [root@localhost test]# dvgrab --duration 20 test
> Found AV/C device with GUID 0x0080880100a306bb
> ioctl call failed, retval = -1
> ieee1394io.cc:426: In function "virtual bool iec61883Reader::StartReceive()":
> "iec61883_dv_fb_start( m_iec61883dv, channel )" evaluated to -1
> ieee1394io.cc:426: errno: 22 (Invalid argument)
> ""     0.00 MB 0 frames                                                         
> Capture Stopped
> 
> and hanging the terminal ( ctrl c not working..)
> 
> this one is with: 
> 01:07.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller
> (rev 43)
> (which is OHCI 1.0 and i think don't have support in kernel)

Correct, 1.0 controllers aren't yet supported. As for the hang, not quite sure
what that's about. I think I've seen that happen, but I believe its cleared up
by the latest dvgrab 3.0 build in f8. I'm respinning the same for f7 right now...

Ah, done. Please see if this dvgrab works better for you:

http://koji.fedoraproject.org/packages/dvgrab/3.0/2.fc7/

> **** try 2 *****
> [root@localhost test]# dvgrab --duration 20 test
> send oops
> send oops
> send oops
> send oops
> send oops
> send oops
> send oops
> send oops
> send oops
> send oops
> send oops
> send oops
> send oops
> send oops
> rom1394_1 warning: read failed: 0x0000fffff000040c
> rom1394_1 warning: read failed: 0x0000fffff0000410
> Found AV/C device with GUID 0x01ff317701ff3177
> send oops
> send oops
> send oops
> send oops
> ""     0.00 MB 0 frames                                                         
> Capture Stopped                                                                 
> Error: no DV
> 
> this one with: 
> 01:0a.0 FireWire (IEEE 1394): Texas Instruments TSB82AA2 IEEE-1394b Link Layer
> Controller (rev 01)
> (which is OHCI 1.1 and supported in kernel)

The 'send oops' stuff is new to me... Is this by chance a powerpc system? If so,
yet another libraw1394 update coming up shortly should fix the lack of video
captured there. Just got dv capture working on my own powerbook earlier today,
need to backport the f8 fixes to f7 and push a build.

Please also upgrade libraw1394 to this build, particularly if that's a ppc system:

http://koji.fedoraproject.org/packages/libraw1394/1.2.1/11.fc7/

Comment 13 Mladen Kuntner 2007-10-25 20:18:38 UTC

It is not powerpc.
It is:
2.6.22.1-27.fc7 #1 SMP Tue Jul 17 17:19:58 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

libraw1394-1.2.1-10.fc7
libavc1394-0.5.3-1.fc6
dvgrab-3.0-2.fc7

I try again with Texas Instruments TSB82AA2 controller and maximum that i can get
is:

[root@localhost test]# dvgrab --noavc --duration 20 test
send oops
send oops
send oops
send oops
send oops
send oops
send oops
send oops
send oops
send oops
send oops
send oops
send oops
send oops
rom1394_1 warning: read failed: 0x0000fffff000040c
rom1394_1 warning: read failed: 0x0000fffff0000410
Found AV/C device with GUID 0x01ff317701ff3177
Capture Started

"test001.avi": buffer underrun near: timecode 00:00:32.24 date 2007.10.13 09:24:33
This error means that the frames could not be written fast enough.
"test001.avi":    72.33 MB 500 frames timecode 00:00:53.00 date 2007.10.13 09:24:53
Capture Stopped
Warning: 1 dropped frames.

This is with --noavc option for dvgrab and camera already playing.
Captured avi file is ok.
But this is not new it works like that from FC7 test 1.
I think error is in libavc1394?

(i could not try new libra1394 because of dependencies)

Comment 14 Jarod Wilson 2007-10-25 20:31:04 UTC

Erm, what dependencies? The libraw1394-1.2.1-11.fc7 package should cleanly
upgrade -10.fc7. What exactly do you get for output when you try that upgrade?
If it doesn't cleanly upgrade, there could well be some bad bits in the mix,
interfering with things...

Comment 15 Jarod Wilson 2007-10-25 21:05:10 UTC

Your upgrade failure sounds suspiciously similar to bug 353231 (and others filed
earlier), which are due to some 3rd-party packages being installed.

Comment 16 Jarod Wilson 2007-10-25 21:22:22 UTC

If the upgrade failure does mirror bug 353231, please get rid of any 3rd-party
libraw1394 bits and make sure you're not loading the ieee1394 driver stack, only
the juju firewire driver stack. I'd forgot ATrpms was packaging up a libraw1394
w/the old ieee1394 support enabled instead of juju and ieee1394-kmdl packages.
Mixing and matching the two is bound to cause bad things to happen, you have to
settle on one or the other. And I can only do anything about the juju bits.

Comment 17 Mladen Kuntner 2007-10-26 19:04:23 UTC

I downloaded libraw1394-1.2.1-11.fc7.x86_64.rpm
double click gives me:
Missing Dependency: libraw1394.so.8 is needed by package libavc1394
Missing Dependency: libraw1394 = 1.2.1-10.fc7 is needed by package libraw1394-devel
Missing Dependency: libraw1394.so.8 is needed by package libiec61883
Missing Dependency: libraw1394.so.8 is needed by package libfreebob

There were for sure no 3rd-party bits when I was testing FC7 test1 and
I don't think that there are any now.
I tried that also on clean FC7 32 bit install and on clean LFS build
and result was the same.

I could try rpm -e --nodeps libraw1394 and then installing new one.

Comment 18 Jarod Wilson 2007-10-26 19:18:32 UTC

OK, hrm, seems you have libraw1394-devel-1.2.1-10.fc7 installed as well as
libraw1394-1.2.1-10.fc7. You'll have to either remove libraw1394-devel or also
download the newer libraw1394-devel and upgrade both libraw1394 and
libraw1394-devel simultaneously, then I think you should be all set. For
sanity's sake, just check that there's nothing unexpected in the output of 'rpm
-qa |grep libraw'.

Of course, since you're not on ppc, I'm not all that hopeful that this is going
to make a difference, but we might as well be certain...

Comment 19 Mladen Kuntner 2007-10-26 20:18:30 UTC

I did: yum remove libraw1394-devel but stil get:

Missing Dependency: libraw1394.so.8 is needed by package libavc1394
Missing Dependency: libraw1394.so.8 is needed by package libiec61883
Missing Dependency: libraw1394.so.8 is needed by package libfreebob

# rpm -qa | grep libraw
libraw1394-1.2.1-10.fc7
libraw1394-1.2.1-10.fc7

should i try rpm -e --nodeps libraw1394 and then installing new one?

Comment 20 Jarod Wilson 2007-10-26 20:57:25 UTC

Aha, I see. You've got a 64-bit system with both the i386 and x86_64 versions of
libraw1394 installed. You'll have to either grab *both* arch libraw1394 upgrade
packages and upgrade the both of them, or yum remove the one of the arch you
don't need.

Comment 21 Mladen Kuntner 2007-10-27 17:15:06 UTC

I did:
rpm -e --nodeps libraw1394.i386
rpm -e --nodeps libraw1394.x86_64

then installed
libraw1394-1.2.1-11.fc7

but nothing changed.

As for "send oops" I think it is a bus reset between the send command and the
answer from camera (see comment #5).

Comment 22 Jarod Wilson 2007-11-20 22:22:43 UTC

I just re-read the entire bug, and I think I may have been slightly confused
and/or missed something at one point, so just to clarify... The main thing we're
trying to fix is that your OHCI 1.1 controller only captures video 50% of the
time, right? Also, when it does capture, are you always seeing the dropped frame
(or frames)?

The non-working OHCI 1.0 controller is a separate known issue, tracked in bug
344851.

Comment 23 Mladen Kuntner 2007-11-25 10:35:30 UTC

As you can see from the dates this is long story.
There are really 3 bugs that i at that time did not know about:

1. firewire (testlibraw, dvgrab, dvcont...) working only as root
- this one is still there in fully updated FC8

2. firewire not working on OHCI 1.0 card 
- this one is still there but as you said is bug 344851

3. the original bug ( "send oops" messages and dvgrab working about 50% of time)
- this one is still there in fully updated FC8 and now it is 100% NOT working

kernel: 2.6.23.1-49.fc8 #1 SMP x86_64 x86_64 x86_64 GNU/Linux
libraw: libraw1394-1.3.0-3.fc8
dvgrab: dvgrab-3.0-2.fc8

in dmesg with no camera i see:
firewire_core: created new fw device fw0 (0 config rom retries, S800)

when camera is connected this lines are added:
firewire_core: created new fw device fw1 (0 config rom retries, S200)
firewire_core: phy config: card 0, new root=ffc1, gap_count=5
firewire_core: BM lock failed, making local node (ffc0) root.
firewire_core: phy config: card 0, new root=ffc0, gap_count=5

( what "BM lock failed" means? something wrong here? )

now i try to work with camera as root:

testlibraw - > looks ok

dvcont play
send oops
send oops
send oops
Could not find any AV/C devices on the 1394 bus.

this is not working 

dvgrab test
send oops
send oops
send oops
Error: no camera exists

not working 

dvgrab --noavc
send oops
send oops
send oops
Error: invalid source specified

not working

Conclusion: things changed in FC8 but to worse, firewire (dvgrab,dvcont) is now
100% not working on my machine.

Comment 24 Mladen Kuntner 2008-01-01 18:51:40 UTC

It is a kernel bug:

this code is from linux-2.6.24.rc5
/drivers/firewire/fw-ohci.c

	/* FIXME: Document how the locking works. */
	if (ohci->generation != packet->generation) {
		if (packet->payload_length > 0)
			dma_unmap_single(ohci->card.device, payload_bus,
					 packet->payload_length, DMA_TO_DEVICE);
		packet->ack = RCODE_GENERATION;
		return -1;
	}

When i run: dvcont play
i get into this code and after that "send oops" is displayed.

If i just comment out this part of code 
dvcont commands (play, stop ..) works ok.

"dvgrab test" starts camera, makes avi file but in the end i get a:
dvgrab[2264]:segfault ....  

So it looks like error in this part of code?

Comment 25 Mladen Kuntner 2008-01-08 22:19:01 UTC

One more interesting thing:

the VIA OHCI 1.0 controller is working now (kernel 2.6.24.rc5)
without any changes in code,

But dvgrab is segfaulting in the end.
Captured video is ok.

Comment 26 Mladen Kuntner 2008-01-12 19:57:06 UTC

Answering the request for testing in redhat-devel mailing list:

with kernel 2.6.24-0.147.rc7.git2.fc9 we have:

VIA controller:
  -32bit - video capture OK - segfault at the end
  -64bit - video capture OK
NVIDIA controller:
  -32bit - send oops - video capture not working
  -64bit - send oops - video capture not working

Comment 27 Stefan Richter 2008-01-12 20:58:04 UTC

> NVIDIA controller:
>   -32bit - send oops - video capture not working
>   -64bit - send oops - video capture not working

If it is NVidia nForce2, then that's bug 244576, unfortunately.

Segfault of dvgrab on exit is known but not yet debugged.  It's harmless though.

Comment 28 Mladen Kuntner 2008-01-13 08:59:40 UTC

It is:
01:0a.0 FireWire (IEEE 1394): Texas Instruments TSB82AA2 IEEE-1394b Link Layer
Controller (rev 01) (prog-if 10 [OHCI])
        Subsystem: Giga-byte Technology GA-K8N Ultra-9 Mainboard
        Flags: bus master, medium devsel, latency 32, IRQ 18
        Memory at f8004000 (32-bit, non-prefetchable) [size=2K]
        Memory at f8000000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [44] Power Management version 2

It is on mainboard with NVidia chipset.

Comment 29 Stefan Richter 2008-01-13 09:37:28 UTC

>> If it is NVidia nForce2,
> 
> It is:
> 01:0a.0 FireWire (IEEE 1394): Texas Instruments TSB82AA2 IEEE-1394b Link Layer
...
> It is on mainboard with NVidia chipset.

That's different.  TI OHCI 1.1 controllers, TSB82AA2 in particular, are supposed
to work with dvgrab.  Mine does.

The mainboard's chipset behind the FireWire controller doesn't matter to the
FireWire drivers, except perhaps if there are IOMMUs or/and very complicated bus
bridges which might expose driver bugs that are hidden on simpler machines.
(CardBus bridges and PCIe to PCI bridges are no problem, but at least a PCI-X to
PCI multiplexer has been reported to make trouble.)

On the other hand, I read that some mainboards equipped with 1394b/ FireWire 800
controller exhibited SBP-2 storage performance which was worse than 1394a/
FireWire 400 controllers.  Unless the testers were fooled by Windows, those
mainboards evidently had a buggy FireWire implementation.

Comment 30 Mladen Kuntner 2008-01-13 09:55:36 UTC

Please look comment 24.

At one moment there is a packet with payload_length 0
and ohci->generation 1 bigger then packet->generation

and after that i get "send oops".

If i just comment out that code dvgrab is working??

Comment 31 Stefan Richter 2008-01-13 10:43:17 UTC

> Please look comment 24.
> 
> At one moment there is a packet with payload_length 0
> and ohci->generation 1 bigger then packet->generation
> 
> and after that i get "send oops".
> 
> If i just comment out that code dvgrab is working??

That code is asynchronous transmission.  Either dvgrab wanted to send a request,
or fw-core wanted to send a request or response.  It is certainly the former.

ohci->generation > packet->generation means that there was a bus reset after
dvgrab or fw-core generated the packet but before fw-ohci was able to enqueue
the packet.  In this case, we definitely must not enqueue the packet.  The code
which you quoted is correct and has to stay.  There is obviously a bug somewhere
else (apparently higher up) which needs to be fixed.  Whoever sent the request
or reply has to handle the transaction abortion due to RCODE_GENERATION
appropriately, according to whatever application-level protocol is involved.

Comment 32 Stefan Richter 2008-01-13 11:04:58 UTC

> Whoever sent the request or reply has to handle the transaction abortion
> due to RCODE_GENERATION appropriately, according to whatever
> application-level protocol is involved.

PS:  Either this condition doesn't happen on the reporter's setup with the old
driver stack due to whatever differences in timing, or there are differences how
old and new stack present such a transaction abortion to dvgrab.

Comment 33 Jarod Wilson 2008-01-13 18:41:16 UTC

(In reply to comment #28)
> It is:
> 01:0a.0 FireWire (IEEE 1394): Texas Instruments TSB82AA2 IEEE-1394b Link Layer
> Controller (rev 01) (prog-if 10 [OHCI])
>         Subsystem: Giga-byte Technology GA-K8N Ultra-9 Mainboard
>         Flags: bus master, medium devsel, latency 32, IRQ 18
>         Memory at f8004000 (32-bit, non-prefetchable) [size=2K]
>         Memory at f8000000 (32-bit, non-prefetchable) [size=16K]
>         Capabilities: [44] Power Management version 2
> 
> It is on mainboard with NVidia chipset.

Not sure how I didn't notice this before, but I have the exact same motherboard
in my possession, complete with the same controller. However, its in my personal
mail and web server, so I'm somewhat reluctant to do any extensive testing on
it... But I'll throw some updated firewire drivers on it and see what I can see.

Comment 34 Jarod Wilson 2008-01-30 05:13:14 UTC

Finally got an updated kernel (2.6.23.14-119.fc8) booted on the box, but I've
been fixated on storage of late. Thus far, sbp2 stuff works great on it. :)

Will try to mix in some dv testing soon though. Some of the bits regarding
mismatched generations *might* be fixed in rawhide, less sure if they're fixed
in the kernel I'm running on this box now...

Comment 35 Jarod Wilson 2008-02-04 14:55:39 UTC

Finally got around to dvgrab on my GA-K8N... The results I get are identical to
those in the attachment to bug 370931
(https://bugzilla.redhat.com/attachment.cgi?id=251301).

Comment 36 Bastien Montagne 2008-03-06 20:21:03 UTC

Hi, new tests:

With Kernel 2.6.25-0.54.rc2.fc9:

#at start-up:

[fedora@fedora ~]$ dmesg | grep firewire
firewire_ohci: Added fw-ohci device 0000:00:0d.0, OHCI version 1.10
firewire_ohci: Added fw-ohci device 0000:02:0c.0, OHCI version 1.0
firewire_core: created device fw0: GUID 00e0180000cc2afe, S400
firewire_core: created device fw1: GUID 00308d0f00bc03e4, S400

#still same problem when pluging in the camera on my (motherboard, nForce2)
firewire:

firewire_core: BM lock failed, making local node (ffc0) root.
firewire_core: phy config: card 0, new root=ffc0, gap_count=5

#ad libitum and system freeze...


#when pluging in the camera (DV500+, after a "reset"):

firewire_ohci: node ID not valid, new bus reset in progress
firewire_core: created device fw2: GUID 00008500008cec1e, S100
firewire_core: phy config: card 1, new root=ffc1, gap_count=5
firewire_core: BM lock failed, making local node (ffc0) root.
firewire_core: phy config: card 1, new root=ffc0, gap_count=5
firewire_core: phy config: card 1, new root=ffc1, gap_count=5
firewire_core: giving up on config rom for node id ffc1

#no more messages after running dvgrab!

dvgrab:

[root@fedora fedora]# dvgrab -i ./ttt.avi
Found AV/C device with GUID 0x00008500008cec1e
Going interactive. Press '?' for help.

"stdout": buffer underrun near: timecode 00:4875865:-1993832910.00 date
2008.03.04 09:36:35
This error means that the frames could not be written fast enough.
Capture Started" ff:ff:ff:ff ""          sec                                    
"./ttt004.avi":    26.62 MiB 184 frames timecode 00:250000000:-1076310716.03
date 2008.03.04 09:36:45
Capture Stopped
Warning: 1 dropped frames.
Erreur de segmentation:ff:ff ""          sec                                    

#Same problems as previous comment: "segfault" at living (and "buffer underrun"
whem starting)!
#But no more "dmesg" when leaving...


#gstreamer:

[root@fedora fedora]# gst-launch dv1394src ! decodebin name=d ! queue !
audioconvert ! audioresample ! alsasink d. ! ffmpegcolorspace ! xvimagesink
Setting pipeline to PAUSED ...
Pipeline is live and does not need PREROLL ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Caught interrupt -- handling interrupt.
Interrupt: Setting pipeline to PAUSED ...
Execution ended after 33318467000 ns.
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Caught SIGSEGV accessing address 0xb6886004
#0  0x00110402 in _start () from /lib/ld-linux.so.2
#1  0x0065647b in ?? ()
#2  0x009c1218 in ?? ()
#3  0x00000000 in ?? ()
Spinning.  Please run 'gdb gst-launch 3145' to continue debugging, Ctrl-C to
quit, or Ctrl-\ to dump core.

#Here everything work well too, until the exit (and another "segfault"!)
#I can't quit by closing the vidéo window: I have to do 'ctrl-C' in the terminal
#I can't do a dump (french keyboard: '\' with 'altgr', doesn't work...)
#The video window doesn't even close until I close the terminal!

So, few progresses since kernel 2.24... I'll continue testing!

Bastien.

Comment 37 Stefan Richter 2008-03-06 21:49:45 UTC

Bastien, if the FireWire controller itself is the nForce2 integrated one (i.e.
not a different controller on an otherwise nForce2 based motherboard), then it
is unlikely that the driver updates do any good for you --- until there is some
specific work done for bug 244576.  (Of course sometimes a seemingly unrelated
update may also unexpectedly fix a bug, but it is very unlikely that this
happens in the nForce2 case, due to the special nature of the nForce2 quirk.)

Comment 38 Bastien Montagne 2008-03-06 22:30:50 UTC

Stefan, with nForce2, nothing's working;
The dvgrab and gstreamer dumps are done when I'm using my Pinacle DV500 card!

Bastien.

PS:I know this bug report isn't about the nForce2 bug -- I won't talk about it here!

Comment 39 Jarod Wilson 2008-03-23 04:30:56 UTC

Okay, I've just now finally managed to free up my own GA-K8N system from its
other duties and plan to do some more prodding on this issue in the upcoming week.

Comment 40 Mike Pope 2008-03-24 11:22:24 UTC

FWIW as another data point:
# lspci | grep 1394
04:05.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 
Controller (PHY/Link)
<plug old JVC camcorder in>
# tail /var/log/messages
...
Mar 24 20:39:37 malbec kernel: firewire_core: created device fw1: GUID 
0080880100a05168, S200
Mar 24 20:39:37 malbec kernel: firewire_core: phy config: card 0, new 
root=ffc1, gap_count=5
# dvgrab
send oops
send oops
send oops
Error: no camera exists

Comment 41 Jarod Wilson 2008-03-24 12:30:09 UTC

Mike, you might also want to get on the cc list for bug 435550, which
specifically deals with the TI chipset you've got there. Might ultimately be the
same issue for both that bug and this one, not sure yet.

Comment 42 Stefan Richter 2008-03-24 17:25:31 UTC

Bug 435550 is different from comment #40 (Mike's comment).  Mike's dvgrab fails
while trying to read the config ROMs of the nodes on the bus.  Bug 435550 is
about failing isochronous reception after having the camcorder successfully
detected.

Comment 43 Jarod Wilson 2008-03-24 17:35:28 UTC

Hm... I was thinking the camera's rom had been read, and it was set up as device
fw1...

Comment 44 Jarod Wilson 2008-03-24 17:41:06 UTC

Yep, GUID 0x008088 is "VICTOR COMPANY OF JAPAN, LTD.", aka JVC.

Mike, it would be good to know exactly which kernel that was with too, since the
firewire drivers have been changing quite rapidly of late.

Comment 45 Jarod Wilson 2008-03-24 17:45:46 UTC

Just looked at 435550 again... Seems there are several different problems that
have evolved over time there. Ugh. Can't keep 'em all straight in my head anymore...

Comment 46 Stefan Richter 2008-03-24 22:27:47 UTC

comment #42:
>> Mike's dvgrab fails while trying to read the config ROMs of the nodes
>> on the bus.
comment #43:
> Hm... I was thinking the camera's rom had been read, and it was set up
> as device fw1...

Dvgrab issues its own read requests.  (I believe this is different from IIDC
cameras and libdc1394 clients because AFAIK libdc1394 has an abstraction for
config ROM reads and is thus able to use Juju's config ROM cache.)

For a while I have also seen dvgrab saying that there was no camera on the bus,
despite fw-core being able to probe it and creating fw1 and fw1.0 for the
camcorder.  This happened on x86-64 with Gentoo's dvgrab-3.1, while the same
combination of dvgrab + libraw1394 + kernel (except this being x86-32) still
worked on a different PC with various cards.  Today, dvgrab works again on my
x86-64 PC, but I am too lazy to find out what fixed it.

Comment 47 Stefan Richter 2008-03-24 22:39:36 UTC

> For a while I have also seen dvgrab saying that there was no camera
> on the bus, despite fw-core being able to probe it and creating fw1
> and fw1.0 [...]  Today, dvgrab works again on my x86-64 PC, but I am
> too lazy to find out what fixed it.

PS:  This is mainline kernel plus latest firewire development patches, i.e.
minimally ahead of firewire drivers in latest fedora kernel.  The two gap count
related fixes which I sent today are in principle relevant, but comment #40
shows that there is no gap count problem on Mike's setup.

Comment 48 Mike Pope 2008-03-25 00:57:17 UTC

In reply to Jarod at #44, I am not with the machine in question right now, but 
the kernel would be the latest F8 as I made sure it was fully up to date (at 
least modulo the local mirror in .AU), so kernel-2.6.24.3-34.fc8 (x86_64).

Stefan is right in #42, dvgrab is failing to recognize the camcorder, not 
failing to read it.  I traced dvgrab into raw1394util.c:discoverAVC(), as far 
as the line where it calls rom1394_get_node_type().  This is after a 
successful call to rom1394_get_directory(), so it looks like its at least 
trying to read the ROM, and thinking it succeeds, but the data may be bad.

Comment 49 Jarod Wilson 2008-03-25 02:01:43 UTC

Re comment #46: ah, didn't know that dvgrab was trying to do its own thing...

Re comment #48: I think -50.fc8 was pushed to updates over the weekend, so yeah,
might not have hit all the mirrors yet. Particularly since you're on x86_64, it
would be interesting to see how it does with -50...

Comment 50 Mladen Kuntner 2008-03-25 21:58:14 UTC

Hi.

I have kernel 2.6.25-0.136.rc6.git5.fc9 x86_64.
dvgrab still not working.

Comment 51 Mike Pope 2008-03-27 11:31:46 UTC

OK, -50.fc8 arrived this evening, but the behaviour remains identically 
broken.

Comment 52 Stefan Richter 2008-03-27 13:42:16 UTC

Mike, could you provide further diagnostics:

# logger "== begin test =="
# echo -1 > /sys/module/firewire_ohci/parameters/debug

<plug camcorder in>

# dvgrab

Then post the contents of /var/log/messages (I presume that's the correct log;
am not 100% sure) from the beginning of the test till its end.  Thanks.

Comment 53 Stefan Richter 2008-03-27 19:08:17 UTC

Re comment #52:
Mladen, ditto.  :-)

Comment 54 Mladen Kuntner 2008-03-28 00:03:20 UTC

Created attachment 299412 [details]
firewire_ohci debug messages

Comment 55 Mike Pope 2008-03-29 13:04:23 UTC

Created attachment 299571 [details]
strace run of dvgrab

Stefan, switching on that debug flag does not produce anything new:

...
Mar 29 22:59:22 malbec root: === begin test ===
Mar 29 22:59:31 malbec kernel: firewire_core: created device fw1: GUID 
0080880100a05168, S200
Mar 29 22:59:31 malbec kernel: firewire_core: phy config: card 0, new 
root=ffc1, gap_count=5
Mar 29 22:59:38 malbec root: === end test ===
...

On the user side, I have narrowed the trouble down to the call to 
avc1394_check_subunit_type, as this ltrace excerpt shows:

...
raw1394_get_nodecount(0x2040040, 0x20462f8, 0, 0, 0xfefefefefefefeff) = 2
rom1394_get_directory(0x2040040, 1, 0x7fff6e050380, 0, 0xfefefefefefefeff) = 0
rom1394_get_node_type(0x7fff6e050380, 0x2046a60, 0xfefefefeff40442f, 0x414530, 

0
xfefefefefefefeff) = 2
avc1394_check_subunit_type(0x2040040, 1, 0x200000, 0x414530, 
0xfefefefefefefeffs
end oops
send oops
send oops
) = 0
raw1394_get_nodecount(0x2040040, 0, 0x3c81f529e0, 0x2046b80, 0x2046b90) = 2
...

Hopefully the attached strace run is helpful, albeit full of cryptic ioctls.

Comment 56 Stefan Richter 2008-03-29 13:22:03 UTC

> switching on that debug flag does not produce anything new

Probably because kernel log messages at "debug" level don't go to
/var/log/messages.  ``grep -rl "selfID 0" /var/log´´ may find the right file. 
If not, the output of dmesg will contain it.

> Hopefully the attached strace run is helpful, albeit full of cryptic ioctls.

The drawback is that we don't get to see the data passed in via ioctl.

Comment 57 Stefan Richter 2008-03-29 14:23:36 UTC

> Created an attachment (id=299412) [edit]
> firewire_ohci debug messages

Thanks.  I probably need to get myself the AV/C specs to understand this.
Trouble begins apparently after the config ROM was successfully read (lots of
quadlet reads, local node is ffc0, camera is ffc1).  Then one AV/C register
write (?) is successfully performed:

AT spd 1 tl 1d, ffc0 -> ffc1, ack_pending , BW req, fffff0000b00 8,0
AR spd 1 tl 1d, ffc1 -> ffc0, ack_complete, W resp

Now we get in turn a write request from the camera:

AR spd 1 tl 1d, ffc1 -> ffc0, ack_pending , BW req, fffff0000d00 8,0

but which ever software is supposed to service the address range covering
fffff0000d00 does not sending out the required "W resp" (write response).
However, there has to be a software which installed an address range handler for
it, otherwise we should see fw-core sending a write response (with address error
rcode).

Instead, dvgrab proceeds issuing write requests to fffff0000b00 which the
camera's transaction layer doesn't accept for processing:

AT spd 1 tl 1e, ffc0 -> ffc1, ack_busy_X, BW req, fffff0000b00 8,0
AT spd 1 tl 1f, ffc0 -> ffc1, ack_busy_X, BW req, fffff0000b00 8,0
AT spd 1 tl 00, ffc0 -> ffc1, ack_busy_X, BW req, fffff0000b00 8,0

until dvgrab gives up.

Would you have more luck with "dvgrab -noavc"?

Comment 58 Mike Pope 2008-03-29 23:27:15 UTC

Created attachment 299598 [details]
dmesg output

grep finds nothing, but attached is the contents of dmesg from the point the
camcorder firewire cable is plugged in, through a run of dvgrab, to unplug.

Comment 59 Stefan Richter 2008-03-29 23:54:37 UTC

Thanks.  Same as Mladen's log, except that ffc1 is the local node and ffc0 is
the camera, and there is already an earlier write request from the camera which
the local node doesn't respond to.  After that, the camera still responds to
config ROM reads and one register write, but it becomes unresponsive after its
second write request passes without response from the PC.

Ditto, you might be able to get further with "dvgrab -noavc".

Comment 60 Mike Pope 2008-03-30 04:02:44 UTC

Created attachment 299607 [details]
dvgrab -noavc, plus kernel logging

dvgrab -noavc fails, albeit differently.  The run and kernel logging are all in
the attachment, now that the kernel has been convinced to write debug messages
to /var/log/messages.

Comment 61 Mike Pope 2008-03-30 04:05:09 UTC

Created attachment 299609 [details]
dvgrab -noavc -I /dev/fw1, plus kernel logging

Following on, dvgrab -noavc -I /dev/fw1 also fails, albeit seemingly in the
"fails to get data and hangs" rather than "fails to find device" mode,
previously seen near the top of this bug.

Comment 62 Stefan Richter 2008-03-30 09:30:57 UTC

> + dvgrab -noavc -I /dev/fw1
> ""     0.00 MB 0 frames
> Capture Stopped

"-I file" is probably wrong, because this most certainly has to be a previously
captured DV stream, or a stream like the output of the old dv1394 driver.

Instead, try the following:
Connect camera,
start replay (or recording if in webcam mode),
$ dvgrab -noavc -interactive

Dvgrab should print
    Found AV/C device with GUID 0x0080880100a05168
    Going interactive. Press '?' for help.

It may also print a buffer underrun which should be harmless.

'?' will show the accepted keys.  Press 'c' for capture.  dvgrab should print
    Capture started

It should now write into the file dvgrab-001.dv.  After a while, press 'escape'
to stop capture.  dvgrab should print something like
    "dvgrab-001.dv":   972.84 MiB 7084 frames timecode
    -1075085276:-1075085544:-1210018631.168909024 date 2008.03.30 10:11:28
    Capture Stopped

and maybe the number of dropped frames.  Then quit dvgrab with 'q'.


[...testing this myself...]

Hmm, this way, i.e. trying to "capture" without sending a "start" command, still
involves several block write requests.  I will attach the messages which I get.

Comment 63 Stefan Richter 2008-03-30 09:48:26 UTC

Created attachment 299614 [details]
kernel log from a working system

local node: ffc1, camera: ffc0

I also get several block write requests to the camera which end in ack_busy_X.
But the subsequent retry of such an attempt then succeeds with write response
from the camera.

And the following here right after "created device fw3" is _not_ caused by
dvgrab:
AT spd 0 tl 07, ffc1 -> ffc0, ack_complete, BW req, fffff0000b00 8,0
AR spd 0 tl 07, ffc0 -> ffc1, ack_pending , BW req, fffff0000d00 8,0
AT spd 0 tl 07, ffc1 -> ffc0, ack_complete, W resp

There are BTW two flavors of write transactions:  "Split transactions" look
like this if successful:
ack_pending , BW req
ack_complete, W resp
"Unified transactions" look like this if successful:
ack_complete, BW req

My camcorder happens to support the latter kind for write transactions with the
PC as requester and the camera as responder.

No I got to find out what's at offset fffff0000d00 and why my kernel is
responding at it but yours isn't.
$ uname -a
Linux stein 2.6.25-rc6 #12 SMP PREEMPT Sun Mar 23 00:55:50 CET 2008 i686
Intel(R) Core(TM)2 CPU T7200 @ 2.00GHz GenuineIntel GNU/Linux
(I have got patches as in linux1394-2.6.git in this kernel.)

Comment 64 Mike Pope 2008-03-30 11:27:15 UTC

> $ dvgrab -noavc -interactive
> 
> Dvgrab should print
>    Found AV/C device with GUID 0x0080880100a05168
>    Going interactive. Press '?' for help.

No, this just fails immediately, with the same error message as 
without -interactive.  No time right now to look hard at the debug messages, 
but that looks very similar.

Comment 65 Stefan Richter 2008-03-30 12:00:06 UTC

>> $ dvgrab -noavc -interactive
> No, this just fails immediately, with the same error message as 
> without -interactive.  No time right now to look hard at the debug messages, 
> but that looks very similar.

That's plausible.

I need to find out why the PC is not sending the write response on your and
Mladen's setup.

Comment 66 Mladen Kuntner 2008-03-30 22:41:41 UTC

Please see #24.

As i see it it is sending response but with wrong generation.
Then it goes out with error instead of doing right thing.

Possible?

Comment 67 Stefan Richter 2008-03-30 23:55:44 UTC

> Please see #24.
> 
> As i see it it is sending response but with wrong generation.

Right... this bug entry has become too long...

The response generation is supposed to come from the last "AR evt_bus_reset,
link internal".  Why is there a wrong generation?  BTW, one of my cards is a
"Texas Instruments TSB82AA2 IEEE-1394b Link Layer Controller (rev 01)" too, and
it works perfectly.  This would be so much easier if I had an affected system
myself.

Could you add a printk to fw-ohci.c to show the ohci->generation and
packet->generation when they are compared in at_context_queue_packet?  And while
you are at it, maybe you find other places where a printk would tell us
something about how it goes wrong.

Explanation of the generation thing:  The bus generation is a counter which
increases at each bus reset.  We need the generation to fulfill the rule that a
bus reset terminates a transaction if there wasn't a response sent before the
bus reset.  IOW a response is not accepted anymore if a bus reset happened
between request and response.  The OHCI controller tells us the generation in
two ways (three if we count very pedantic):  By way of the self ID complete
event (implemented in fw-ohci.c's bus_reset_tasklet) and by way of synthesized
asynchronous packets which are inserted in the asynchronous request receive
buffer (implemented in fw-ohci.c's handle_ar_packet by the
OHCI1394_evt_bus_reset).  This second way is necessary because processing of the
receive buffers might be delayed WRT self ID receive events, and we could assign
a too new generation to received packets while processing the receive buffer if
it weren't for that synthesized delimiter packet.

Hmm, if the request from the camera came in right after a bus reset, I could
imagine that there might be a race condition where we execute the
ar_request_ctx.tasklet earlier than the bus_reset_tasklet and could attempt to
send a response with what fw-ohci would consider a "future" generation.  But in
your logs, the request happens many interrupts after the last bus reset, so if
there is such a race condition, it didn't hit your system.

Comment 68 Mladen Kuntner 2008-03-31 19:51:47 UTC

I cannot try now but from memory:

after reboot and camera turned on on first dvgrab call
ohci->generation and packet->generation fere off by 1 ( i think 4 and 5 ).

I think something is wrong in bus reset when camera is turned on.
Maybe it is not controller but some timing issue ( i have old S200 camera ) ?

As for affected system could you use ssh to my machine?
two options:
1. LFS system 
2. Fedora rawhide

Comment 69 Mike Pope 2008-04-03 11:14:09 UTC

Created attachment 300219 [details]
gdb excerpt and command line run of dvgrab -noavc

I just had a moment for a quick look at dvgrab with gdb, and provoked different
behaviour again (see attached) --- this time it actually finds the device,
although running the same dvgrab -noavc from the command line still fails as
ever.  Anyway if you want to know what is actually being sent in the ioctls I
can put some instrumentation in now if it would help.

Comment 70 Mladen Kuntner 2008-04-05 22:23:21 UTC

Created attachment 301399 [details]
dvcont play with printk-s

As suggested i put few printk-s in fw-ohci.
Command executed is "dvcont play"

As told before at first, response ohci and packet generations
are not equal and we get error.

Is it really ok to check generation for response?
Comment in handle_ar_packet says we don't?

Comment 71 Mike Pope 2008-04-10 11:26:41 UTC

Created attachment 301967 [details]
/var/log/messages, dvgrab -noavc

Still broken at 2.6.24.3-50.fc8.

Comment 72 Stefan Richter 2008-04-10 20:27:48 UTC

Re comment #70:
> Created an attachment (id=301399)
> dvcont play with printk-s
> 
> As suggested i put few printk-s in fw-ohci.
> Command executed is "dvcont play"

Thanks.  Since the whole log shows only ohci->generation 5 for lots of outgoing
requests, and only one ohci_send_response in the middle of a bunch of
(successful?) transmissions of requests happens --- and this has a
packet->generation of 4 --- it looks like the OHCI controller inserted a PHY
packet with a wrong generation.

When you got the time, please log what values are assigned...
...to ohci->request_generation in handle_ar_packet(),
...to ohci->generation in bus_reset_tasklet()
whenever these assignments are performed.


> Is it really ok to check generation for response?
> Comment in handle_ar_packet says we don't?

I tried it to explain in comment #67.  Let me put it in some other words.  The
drivers have to fulfill the following jobs to ensure that neither outbound
transactions nor inbound transactions wrap over bus resets.  This is done in
differing ways:

1. Outbound transactions = local node sends a request packet, remote node sends
a response packet.  When we receive a response packet, we look into our stash of
incomplete transactions if there is one with the same transaction label like the
transaction label of this response packet.

The remote node is supposed to not send a response to any request which it got
before the last bus reset.  But we cannot rely on the remote node getting it
right all the time.  We avoid handling responses which violate that rule by
discarding all pending transactions when a bus reset happens.

2. Inbound transaction = remote node sends a request packet, local node sends a
response packet.  Now we have the task to not send a response anymore if a bus
reset happened after that request.

However, it's not as easy as it sounds.  The incoming requests are lined up in a
DMA buffer with room for several of those.  That buffer is processed not fully
synchronously with bus reset events.  But the OHCI chip helps us here by
inserting a bus reset marker into the packet buffer (in the form of a
synthesized bus reset packet as per OHCI 1.1 clause 8.4.2.3).  This way we know
exactly which requests came before a bus reset and which ones came after a reset.

The bus reset packet contains the generation number which is current at the tine
the bus reset packet is inserted.  This generation is recorded in fw-ohci's
variable ohci->request_generation and stored into all subsequently received
request packets.

When fw-core generates response packets, it uses the request packet's generation
as the response's generation.  fw-ohci on the other hand will prevent the
sending of responses if a bus reset happened after the respective request, i.e.
if the current generation is different from the request's == response's generation.

Comment 73 Stefan Richter 2008-04-10 21:04:03 UTC

> the OHCI chip helps us here by inserting a bus reset marker into
> the packet buffer (in the form of a synthesized bus reset packet
> as per OHCI 1.1 clause 8.4.2.3).

These bus reset packets are logged as "AR evt_bus_reset, link internal" in
attachment 301967 [details].  We should add logging of the selfIDGeneration of these
packets so that we can check if the genereation of those is the same as the
nearest self id complete event ( = properly finished bus reset).  Right now we
only see that there are as many bus reset packets received as there are self ID
complete events.  I will post a patch soon.

In attachment 301967 [details], we again see some incoming requests which our drivers do
not respond to.  Like the one in the last line of the following part of the log:
Apr 10 20:50:32 malbec kernel: firewire_core: phy config: card 0, new root=ffc1,
gap_count=5
Apr 10 20:50:32 malbec kernel: firewire_core: phy config: card 0, new root=ffc1,
gap_count=5
Apr 10 20:50:32 malbec kernel: AT ack_complete, phy config packet, 01c50000
Apr 10 20:50:32 malbec kernel: AR evt_bus_reset, link internal
Apr 10 20:50:32 malbec kernel: AR evt_bus_reset, link internal
Apr 10 20:50:32 malbec kernel: firewire_ohci: 2 selfIDs, generation 22
Apr 10 20:50:32 malbec kernel: selfID 0: 80458992, phy 0 [p-.] S400 gc=5 +15W Lci
Apr 10 20:50:32 malbec kernel: selfID 0: 814548d4, phy 1 [c--] S200 gc=5 +0W Lc
Apr 10 20:50:32 malbec kernel: AT spd 1 tl 0b, ffc0 -> ffc1, ack_pending , BW
req, fffff0000b00 8,0
Apr 10 20:50:32 malbec kernel: AR spd 1 tl 0b, ffc1 -> ffc0, ack_complete, W resp
Apr 10 20:50:32 malbec kernel: AR spd 1 tl 0b, ffc1 -> ffc0, ack_pending , BW
req, fffff0000d00 8,0

ffc0 is the local node here.  The last block write request from the remote node,
transaction label 0b, is never concluded with a response packet ("W resp") from
ffc0 to ffc1 with the same transaction label (tl) 0b.

I see no reason why our node does not send a response.  It might be because of
the generation mismatch which Mladen reported.

Even though our failure to respond is a problem, the remote node should
nevertheless time out his pending request eventually.  However, it appears to
have trouble handling this situation because we see at some point later that it
begins to answer _our_ requests to it with ack_busy_X.

However, I see another seemingly unrelated but serious problem in this log:
Apr 10 20:50:32 malbec kernel: firewire_core: created device fw1: GUID
0080880100a05168, S200
Apr 10 20:50:32 malbec kernel: firewire_core: created device fw1: GUID
0080880100a05168, S200

WTH is this logged twice?

Comment 74 Mike Pope 2008-04-10 22:38:08 UTC

> However, I see another seemingly unrelated but serious problem in this log:
> Apr 10 20:50:32 malbec kernel: firewire_core: created device fw1: GUID
> 0080880100a05168, S200
> Apr 10 20:50:32 malbec kernel: firewire_core: created device fw1: GUID
> 0080880100a05168, S200
>
> WTH is this logged twice?

I think its a false alarm.  The camcorder on/mode switch is designed for 
aliens with small dexterous fingers--- I had a bit of a struggle turning it 
on.

Comment 75 Stefan Richter 2008-04-10 23:09:08 UTC

>> WTH is this logged twice?
> 
> I think its a false alarm.  The camcorder on/mode switch is designed
> for aliens with small dexterous fingers--- I had a bit of a struggle
> turning it on.

That can explain series of (self ID complete event; read config ROM with many AT
and AR events; created device)^n, but it does not explain a sequence of (created
device)^n alone.

Jarod, what is this kernel doing?  (Or am I missing something obvious?  It's
past 01:00 AM here now.)

Comment 76 Jarod Wilson 2008-04-11 04:07:04 UTC

I'm quite confused now too. There are quite a few things that appear to be
getting logged multiple times... Whether that's a logging oddity or the driver
really is trying to log those things multiple times, I dunno... Its certainly
something I've never seen before. I really need to swipe the office dv camera
back from krh and hook it up to a few rawhide boxes to see what I can see...

Another odd thing in the log that I don't understand:

AT spd 1 tl 0b, ffc0 -> ffc1, ack_pending , QR req, fffff0000490
AR spd 1 tl 0b, ffc1 -> ffc0, ack_complete, QR resp = 00000000

OK, quadlet read request, quadlet read response, transaction label 0b...
Everything looks more or less okay up to this point...

AT spd 1 tl 0c, ffc0 -> ffc1, ack_pending , BW req, fffff0000b00 8,0
AR spd 1 tl 0c, ffc1 -> ffc0, ack_complete, W resp

Block write request and response, transaction label 0c...

AR spd 1 tl 0c, ffc1 -> ffc0, ack_pending , BW req, fffff0000d00 8,0

Now another response with transaction label 0c? Huh? (and yet, we reply with an
ack_pending?). Much weirdness...

AT spd 1 tl 0d, ffc0 -> ffc1, ack_busy_X, BW req, fffff0000b00 8,0
AT spd 1 tl 0e, ffc0 -> ffc1, ack_busy_X, BW req, fffff0000b00 8,0
AT spd 1 tl 0f, ffc0 -> ffc1, ack_busy_X, BW req, fffff0000b00 8,0

And of course from there, we go off into the woods.

Now, this is possibly relevant: I didn't recognize offsets 0xb00 and 0xd00 at
first glance... The code says they are CSR_FCP_COMMAND and CSR_FCP_RESPONSE
respectively. The ieee1394 driver stack has a number of functions that deal with
them (FCP == function control protocol), but there isn't a lick of code in the
firewire stack that explicitly touches them...

Okay, just did a bit more digging... The FCP regs appear to be specific to the
AV/C spec, they aren't mentioned anywhere that I can see in either the OHCI 1.1,
1394-1995, 1394a-200 or 1394b-2002 specs, or any other number of docs I've poked
at thus far.

First thing I've found that explicitly mentions them is this:
http://www.sfc.wide.ad.jp/~koh39/paper/dps2004.pdf

I suspect this might be stuff covered in the iec61883 specs or another 1394ta
spec...

Comment 77 Stefan Richter 2008-04-11 07:01:59 UTC

> There are quite a few things that appear to be getting logged
> multiple times...

Some things do happen twice, like bus resets.  (Bus generation increases by two
after those.)  But I'm not sure whether it is OK that we see two "firewire_core:
phy config: card 0, new root=ffc1, gap_count=5" right after another.


> AT spd 1 tl 0c, ffc0 -> ffc1, ack_pending , BW req, fffff0000b00 8,0
> AR spd 1 tl 0c, ffc1 -> ffc0, ack_complete, W resp
> 
> Block write request and response, transaction label 0c...
> 
> AR spd 1 tl 0c, ffc1 -> ffc0, ack_pending , BW req, fffff0000d00 8,0
> 
> Now another response with transaction label 0c? Huh? (and yet, we
> reply with an ack_pending?).

This is OK.  One was a transaction with local node as requester and remote node
as responder, the other a transaction with the roles swapped.  Transaction
labels correspond to ordered pairs of requester node ID and responder node ID.


> AT spd 1 tl 0f, ffc0 -> ffc1, ack_busy_X, BW req, fffff0000b00 8,0
> 
> And of course from there, we go off into the woods.

I occasionally see ack_busy_X with my DV camera too, but the next retry of the
request then always succeeds.  But in this log the request receive pipe of the
camera is clogged and never opens up again.  The reason /could/ be the fact that
the PC did not respond to previous write requests from the camera.

The fact that the PC did not respond may be a bug in the drivers or the controller.

BTW, my cheap JVC DV camera is a little bit more optimized as it concludes 8
byte block write requests from the PC with ack_complete (unified transaction)
instead ack_pending + write response (split transaction).  But which transaction
type is used shouldn't matter for the correctness of operation.


> The FCP regs appear to be specific to the AV/C spec, [...]
> I suspect this might be stuff covered in the iec61883 specs

Yes, the FCP part of the AV/C spec went into IEC 61883-1 AFAIK.

Comment 78 Stefan Richter 2008-04-11 07:05:34 UTC

Re comment #73:
Patch which shows bus reset packet generation in the AR event log:
http://lkml.org/lkml/2008/4/10/370

Comment 79 Jarod Wilson 2008-04-11 13:47:56 UTC

(In reply to comment #77)
> > There are quite a few things that appear to be getting logged
> > multiple times...
> 
> Some things do happen twice, like bus resets.  (Bus generation increases by two
> after those.)  But I'm not sure whether it is OK that we see two "firewire_core:
> phy config: card 0, new root=ffc1, gap_count=5" right after another.

I've seen it logged twice from time to time and its not been a problem, but its
not been amongst a fairly large group of events that are all duplicated, like
the very top of the log -- i.e., the phy config x2, created fw1 x2, phy config
x2, bm lock fail x2, phy config x2 part.


> > AT spd 1 tl 0c, ffc0 -> ffc1, ack_pending , BW req, fffff0000b00 8,0
> > AR spd 1 tl 0c, ffc1 -> ffc0, ack_complete, W resp
> > 
> > Block write request and response, transaction label 0c...
> > 
> > AR spd 1 tl 0c, ffc1 -> ffc0, ack_pending , BW req, fffff0000d00 8,0
> > 
> > Now another response with transaction label 0c? Huh? (and yet, we
> > reply with an ack_pending?).
> 
> This is OK.  One was a transaction with local node as requester and remote node
> as responder, the other a transaction with the roles swapped.  Transaction
> labels correspond to ordered pairs of requester node ID and responder node ID.

Ah, I was unaware of (or forgot about) the req and resp nodes mattering in
transaction label uniqueness.

> > AT spd 1 tl 0f, ffc0 -> ffc1, ack_busy_X, BW req, fffff0000b00 8,0
> > 
> > And of course from there, we go off into the woods.
> 
> I occasionally see ack_busy_X with my DV camera too, but the next retry of the
> request then always succeeds.  But in this log the request receive pipe of the
> camera is clogged and never opens up again.  The reason /could/ be the fact that
> the PC did not respond to previous write requests from the camera.

Indeed, my assumption is the camera is still waiting for the response to that
write request and rejecting all our later requests w/the ack_busy_x.

> The fact that the PC did not respond may be a bug in the drivers or the
controller.

Given that we don't seem to have anything in the way of FCP handling code in the
firewire stack, I'm leaning slightly toward us failing to do something in the
driver, but I'm not quite sure yet what... And it certainly could still be a
controller issue...

> BTW, my cheap JVC DV camera is a little bit more optimized as it concludes 8
> byte block write requests from the PC with ack_complete (unified transaction)
> instead ack_pending + write response (split transaction).  But which transaction
> type is used shouldn't matter for the correctness of operation.

I finally dragged my GA-K8N box into the office, so I'll get the Canon we've got
in the office hooked up to its on-board controller soonish. If it *is* a
controller issue Mladen is hitting, and my parsing of this massive bug is
correct... Then I'll at least finally be using the same controller as him...
Presumably a different camera, of course.

> > The FCP regs appear to be specific to the AV/C spec, [...]
> > I suspect this might be stuff covered in the iec61883 specs
> 
> Yes, the FCP part of the AV/C spec went into IEC 61883-1 AFAIK.

I've emailed someone over at 1394ta.org to see what it takes to get ahold of
iec61883 specs and/or their other AV/C stuff. Hopefully, not gobs and gobs of
cash... I'm sure this has been done before, but I figure it doesn't hurt to try
again.

Comment 80 Stefan Richter 2008-04-11 14:22:49 UTC

>> The fact that the PC did not respond may be a bug in the drivers or
>> the controller.
> 
> Given that we don't seem to have anything in the way of FCP handling
> code in the firewire stack, I'm leaning slightly toward us failing to
> do something in the driver, but I'm not quite sure yet what... And it
> certainly could still be a controller issue...

As long as userspace does not install an FCP handler, the fw-core transaction
layer should generate a response with address error return code.  The fact that
no such response is visible in the AT event log could for example be caused by
the generation issue which Mladen reports.

> If it *is* a controller issue Mladen is hitting, and my parsing of this
> massive bug is correct... Then I'll at least finally be using the same
> controller as him... Presumably a different camera, of course.

My 1394b card has the same lspci signature as Mladen's, i.e. "Texas Instruments
TSB82AA2 IEEE-1394b Link Layer Controller (rev 01)".  But so far I always got
the same generation in the bus reset packet as in the corresponding self ID
complete event.  I suppose I and Mladen may nevertheless have different chip
revisions though.

> I've emailed someone over at 1394ta.org to see what it takes to get
> ahold of iec61883 specs and/or their other AV/C stuff. Hopefully, not
> gobs and gobs of cash... I'm sure this has been done before, but I
> figure it doesn't hurt to try again.

I wasn't "there" at the time, but the interwebs show that this was a FAQ when
AV/C were turned into IEC norms.  webstore.iec.ch currently lists IEC
61883-1:2008-02 as PDF for CHF 193 ~ USD 193.

Comment 81 Jarod Wilson 2008-04-11 20:33:06 UTC

So... As I'd said before, I finally freed up a box at home that has the exact
same motherboard (so far as I can tell) as Mladen, right down to lspci output
that is 100% identical to comment #28.

But with a Canon camera hooked up right now, I can capture video just fine (save
the dvgrab segfault on exit). Kernel is 2.6.25-0.218.rc8.git7.fc9.x86_64. Ugh.

Comment 82 Jarod Wilson 2008-04-11 21:28:57 UTC

Created attachment 302172 [details]
Capture to GA-K8N onboard from Canon camera

Comment 83 Mladen Kuntner 2008-04-12 14:24:51 UTC

Created attachment 302220 [details]
dvcont play additional debug

re: comment 72

Thank you for sharing knowledge.

Attached dmesg with requested printk-s.
Actions done:
1. modprobe
2. turn camera on
3 dvcont play

Comment 84 Stefan Richter 2008-04-12 16:54:06 UTC

Attachment 302220 [details] shows that your TSB82AA2 repeatedly inserts a bus reset packet
whose generation is one off into the future:

   - The only time it is correct is right at the start, where the generation
     from the packet as well as the generation of the self ID buffer is 0.

   - Then presumably a series of two bus resets happens.

   - After that, the bus rest packet's generation already advanced to 2,
     while the self ID buffer's generation is still 1.

Comment 85 Stefan Richter 2008-04-12 17:16:41 UTC

Created attachment 302236 [details]
kernel log from a working system

This log shows a system with two controllers, an unused VT6307 and a TSB82AA2.
(The log was generated with the patch from comment #78.  I indented the log
lines which belong to the VIA controller.)

Superficially, this setup seems to work.  But my TSB82AA2 is in fact defective
too:

  - When I turn on the camera, it causes two bus resets.

  - The bus reset packets' generations advance from 1 to 2 and stay at 2,
    while the self ID buffer's generation advances from 1 to 3.
    From this point on, we wouldn't be able to send responses.	Luckily
    we don't get requests yet.

  - Later, the bus manager code resets the bus for gap count optimization.
    This bus reset rescues the setup, because after it the bus reset
    packet's generation is 4, and the self ID buffer's generation is 4.

(One other comment on this log:  The block write request to fffff0000b00 =
CSR_FCP_COMMAND at the camera at Apr 12 16:50:07, shortly after the camera's
device file was created, was caused by the hald service.  This is on Gentoo
Linux.)

My conclusion so far:  This bus reset packet thing in the OHCI spec is
ingenious.  Alas we can't really use this feature because at least TSB82AA2's
implementation is buggy.

Some more things which I noticed in a few further tests:

  - This accident with the double bus reset and transient generation mismatch
is about 90% repeatable if only the camcorder is attached to the TSB82AA2.  If
for example a 1394b hub is also plugged in into the card, switching on the
camera causes only a normal single bus reset and the generations stay in sync.

  - A NEC 1394a cardbus card + the camcorder also experiences the double bus
reset when the camcorder is switched on.  But this card sets the generation
correctly in the second bus reset.  VT63063, VT6307, FW323 so far only had
single bus resets and, like the NEC card, always kept bus reset packet
generation and self ID buffer generation in sync.

  - IRQ event logs from TSB82AA2, VT63063, VT6307, NEC-something, FW323 usually
show in case of a bus reset that the request-AR interrupt with the bus reset
packet happens first, then the self ID complete interrupt.  On two or three
cards I also saw the request-AR event and self ID complete event combined in a
single interrupt.  I never saw that the self ID complete IRQ came before the
request AR IRQ.

Because we can neither rely on a strict order of request AR event vs. self ID
complete event, nor on an order of execution of tasklets, it is hard or
impossible to check whether the bus reset packet's generation is off by one.

Comment 86 Stefan Richter 2008-04-12 17:33:22 UTC

Since Mike has a TSB43AB22/A, I guess that it has the same bug as Mladen's
or/and perhaps my TSB82AA2.

Comment 87 Stefan Richter 2008-04-12 18:10:42 UTC

Mike, please post the output of "lspci -nn | grep TSB".

Comment 88 Stefan Richter 2008-04-12 19:27:35 UTC

Proposed fix: http://lkml.org/lkml/2008/4/12/174

Comment 89 Jarod Wilson 2008-04-13 02:58:40 UTC

I'm tacking the proposed patch (well, the subsequent updated version) onto the
rawhide kernel right now... Okay, should be in kernel-2.6.25-0.226.rc9.fc9 and
later.

Comment 90 Mike Pope 2008-04-13 10:45:03 UTC

[root@malbec ~]# lspci -nn | grep TSB
04:05.0 FireWire (IEEE 1394) [0c00]: Texas Instruments TSB43AB22/A 
IEEE-1394a-2000 Controller (PHY/Link) [104c:8023]

Comment 91 Jarod Wilson 2008-04-15 19:34:50 UTC

Stefan's patch now added to kernel-2.6.24.4-81.fc8, currently working its way
through the build system.

Comment 92 Jarod Wilson 2008-04-16 20:15:23 UTC

Please try with kernel-2.6.24.4-82.fc8 and report results back here...

Comment 93 Mladen Kuntner 2008-04-16 22:59:19 UTC

Created attachment 302681 [details]
dmesg for kernel 2.6.24.4-82

attached dmesg for camera on sequence

dvcont play

and 

dvgrab -d 10 test

working OK :) (segfault at the end but this is another story)

Comment 94 Stefan Richter 2008-04-16 23:51:29 UTC

Good, TSB82AA2 down, one to go.  Mike, how about TSB43AB22?

Comment 95 Mike Pope 2008-04-17 00:22:26 UTC

There was no sign of kernel-2.6.24.4-82.fc8 in my .au mirror this morning 
before I left for work two hours ago.  Hopefully this evening I will have 
something to report.

Comment 96 Jarod Wilson 2008-04-17 01:50:30 UTC

Mike, -82 hasn't been pushed as an update, it won't be on any mirrors any time
soon, its currently only available via pulling it straight from the build system:

http://koji.fedoraproject.org/packages/kernel/2.6.24.4/82.fc8/

Comment 97 Mike Pope 2008-04-17 13:07:43 UTC

OK, -82 looks good.  dvgrab is now detecting the camcorder and capturing .avi.  
Out of time though, thorough check that its all fixed tomorrow.

Comment 98 Mike Pope 2008-04-18 13:17:02 UTC

I have done a capture of a full tape and checked the results.  Looks like you 
have nailed it.  My congratulations to all.

Comment 99 Jarod Wilson 2008-04-18 15:44:11 UTC

Excellent, glad to hear we've finally got this one figured out! I'm going to go
ahead and close out this bug. 2.6.24.4-84.fc8 is the latest and greatest built
in brew (and actually reverts one firewire bit in -82 that can cause sbp2
issues), I'll see if we can get that or something later pushed as an actual
update shortly...

Comment 100 Jarod Wilson 2008-04-18 16:23:14 UTC

I'm told we should have a formal kernel update carrying this fix pushed within a
few days (basically, once 2.6.24.5 is out).

Comment 101 Stefan Richter 2008-04-18 18:17:51 UTC

Fixed 10 months after report.  This is not even too bad for upstream IEEE 1394
driver standards...  Thanks guys, for staying persistent.

Comment 102 Fedora Update System 2008-04-22 00:04:26 UTC

kernel-2.6.24.5-85.fc8 has been submitted as an update for Fedora 8

Comment 103 Fedora Update System 2008-04-29 20:54:16 UTC

kernel-2.6.24.5-85.fc8 has been pushed to the Fedora 8 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 104 Stefan Richter 2008-05-05 17:37:27 UTC

Mike, could you please tell us with how much RAM you are (successfully) using
the TSB43AB22(A), and what the chipset of the motherboard is?

There is another bug filed where a TSB43AB22A does not work properly in
isochronous reception if buffers are located at physical addresses above 2GB
(bug 435550).  It would be interesting to know if anybody else uses the same
chip successfully with 3GB or more RAM present.

Comment 105 Mike Pope 2008-05-06 11:13:41 UTC

RAM = 2GB, motherboard = ASUS M2NPV-VM, chipset = nvidia geforce 6150 GPU / 
nvidia nforce 430 MCP

Comment 106 Stefan Richter 2008-05-06 12:34:35 UTC

This is exactly the board for which bug 435550 was reported; but the bug happens
only with more than 2GB RAM.  (We still haven't figured out who is the culprit
of that bug:  Drivers or controller or board or a combination.  Hence I am
looking for someone with the same controller but /different/ board + 3GB or more
RAM.)

Note You need to log in before you can comment on or make changes to this bug.