Description of problem:
There appears to be a problem with libiec61883: any use of the library seems to result in a crash of the application with an error from glibc about a double free or corruption:

    *** glibc detected *** ./firewire_tester: double free or corruption (top): 0x00000000006090a0 ***

Version-Release number of selected component (if applicable):
libiec61883-devel-1.1.0-1.fc7
libiec61883-1.1.0-1.fc7
libiec61883-debuginfo-1.1.0-1.fc7
libiec61883-utils-1.1.0-1.fc7
libraw1394-1.2.1-7.fc7
libraw1394-devel-1.2.1-7.fc7
glibc-devel-2.6-2
glibc-2.6-2
glibc-common-2.6-2
glibc-headers-2.6-2

How reproducible:
100%. It happens every time.

Steps to Reproduce:
1. Compile the attached program, firewire_tester, using the command in the comment.
2. Run the command (I was using: ./firewire_tester -p -n 0 -r 5).
3. Watch it crash.

(Note: I've got a DCT6200 cable box connected to my firewire port.)

Actual results:
See attached output and gdb backtrace.

Expected results:
Firewire should work and not crash.

Additional info:
Created attachment 155633 [details] Firewire test application
Created attachment 155637 [details] Output from the firewire test Here's the output, printed on the terminal when I run the firewire tester.
Created attachment 155638 [details] Backtrace of the failure This backtrace might not be helpful. All it shows is that it crashed in iec61883_mpeg2_close(). I do have the libiec debuginfo package installed but it doesn't seem to print debug info. Strange. Maybe an x86_64 thing?
Blah, been meaning to get back to this one for weeks now... But yeah, I'm thinking this may well be an x86_64 thing, as my i686 box hooked to a DCT6200 cable box doesn't seem to have any issues. Kernel version could potentially come into play as well, so if possible, please verify this is still an issue with kernel 2.6.21-1.3228.fc7 or later. I'll have to get my x86_64 shuttle cube into the living room for a bit...
I won't be able to test for a while, because I'm on the road. I decided to run FC6 because I needed it to work (it's my in-production myth frontend and runs a backend to record from the STB). So, testing would require backing up my FC6 configuration, then reloading my F7 image, updating to the current version, and testing, all while nothing is scheduled to record off the STB (and while I have the time). When I have this time I'll do it, but it'll be a while before I can.
Ah, no worries, I'll get things hooked up at my house. Neither the shuttle nor my cable box is part of my own production myth setup right now; I just have to make room for the shuttle and hook it up.
Earlier, I shouldn't have said my i686 box has no issues. It does have issues, just no crash w/glibc double free... :)

Anyhow, I got the x86_64 box hooked up last night. We're behaving slightly differently on x86_64 with kernel 2.6.21-1.3228.fc7 now:

[root@prometheus ~]# ./firewire_tester -p -n 1 -r 5
Action: Test P2P connection 5 times, node 1, channel 1
P2P: Testing...Killed
[root@prometheus ~]#
Message from syslogd@ at Wed Jul 18 11:51:04 2007 ...
prometheus kernel: Oops: 0000 [3] SMP

Message from syslogd@ at Wed Jul 18 11:51:04 2007 ...
prometheus kernel: CR2: ffffffffffffffea

Still poking around...
The oops I'm seeing now is identical to the oops in bug 243081, and can be reproduced at will.
Created attachment 159665 [details] firewire_tester ran under gdb w/rawhide kernel So the glibc double free is back w/a rawhide kernel, but the backtrace is slightly different now...
*** Bug 240774 has been marked as a duplicate of this bug. ***
Out of my league here, punting to krh... Kristian, I can provide access via ssh to the crashing box (and probably can rig up serial console, if it would help).
With

dvgrab-2.1-2.fc7.x86_64
libiec61883-1.1.0-1.fc7.x86_64
glibc-2.6-4.x86_64
kernel-2.6.23-0.43.rc0.git16.fc8.x86_64

the error message is a little different:

dvgrab -i --format raw 2007play-
Found AV/C device with GUID 0x00804580212881a0
ieee1394io.cc:456: In function "virtual bool iec61883Reader::StartReceive()": "iec61883_dv_fb_start( m_iec61883.dv, channel )" evaluated to -1
ieee1394io.cc:456: errno: 22 (Invalid argument)
*** glibc detected *** dvgrab: double free or corruption (top): 0x0000000002196300 ***
======= Backtrace: =========
/lib64/libc.so.6<0x3e80670412>
/lib64/libc.so.6(cfree+0x8c)<0x3e80673b1c>
/usr/lib64/libiec61883.so.0(iec61883_dv_close+0x15)<0x3e89407d45>
/usr/lib64/libiec61883.so.0(iec61883_dv_fb_close+0x11)<0x3e89407d91>
dvgrab<0x412443>
dvgrab<0x4111c5>
dvgrab<0x42d4e1>
dvgrab<0x420769>
/lib64/libc.so.6(__libc_start_main+0xf4)<0x3e8061dab4>
dvgrab(__gxx_personality_v0+0x209)<0x406b19>
...............
Now that f8t1 is out, could this issue get some attention? ATM, there's no way to use f8 for any video editing. We certainly can't import any video, and even editing is a problem because apps like kino also use libiec61883 for saving video. In any event, the fw subsystem just doesn't work, a major regression from fc7.
I think you mean a major regression from FC6! Firewire doesn't work in Fedora 7 either.
I stand corrected. Is this an upstream issue? or just Fedora? Looking at the linux1394 posts, I don't see anything on this. I'm hesitant to report it upstream since I don't know if it's a Fedora issue ( some odd interaction with Fedora glibc? ) or a general fw bug.
Fedora 7 debuted an entirely new firewire stack (primarily authored by krh), and not everything that worked in the old stack has been fully enabled in the new stack. Not sure exactly what the latest upstream status is on the new stack, but the good folks over at linux1394-devel do know all about it.
I believe this is fixed by the latest libraw1394 update pushed to f7 updates last night. The problem actually stemmed from a call libiec61883 was making over to libraw1394. Of course, you may well still not be able to get any video, as this bug appears to have only been triggering on ohci 1.0 firewire controllers, which, unfortunately, aren't fully supported by the new firewire stack yet, but the double-free should be gone and you should see a warning about a failed ioctl instead. At the very least, everything finally works peachy in a system of mine with an ohci 1.1 firewire controller, and firewire_tester doesn't crash on my ohci 1.0-equipped system hooked to my own firewire-equipped cable box. Adding full 1.0 support is high on the TODO Real Soon Now list.
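For anyone following along, the failure pattern in the dvgrab output above (a start call returning -1 with EINVAL, then a crash in free() during close) can be illustrated with a generic C sketch. This is not the libiec61883 or libraw1394 code, and the actual fix is not shown in this report; every name below (struct capture, capture_start, fake_start_ioctl, and so on) is hypothetical and exists only for illustration. The assumption is that an error path releases a buffer which the close routine later releases again.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical capture handle; NOT the real libiec61883 structure. */
struct capture {
    unsigned char *buf;  /* receive buffer owned by the handle */
    int running;
};

/* Stand-in for a kernel request that the driver rejects
 * (assumed here to fail with EINVAL when unsupported). */
static int fake_start_ioctl(int supported)
{
    if (!supported) {
        errno = EINVAL;
        return -1;
    }
    return 0;
}

struct capture *capture_new(size_t size)
{
    struct capture *c = calloc(1, sizeof *c);
    if (c == NULL)
        return NULL;
    c->buf = malloc(size);
    if (c->buf == NULL) {
        free(c);
        return NULL;
    }
    return c;
}

int capture_start(struct capture *c, int supported)
{
    if (fake_start_ioctl(supported) < 0) {
        int saved = errno;  /* keep the ioctl's errno for the caller */
        /* The error path releases the buffer. Without the NULL
         * assignment below, capture_close() would free the same
         * pointer again, which glibc reports as a double free. */
        free(c->buf);
        c->buf = NULL;
        errno = saved;
        return -1;
    }
    c->running = 1;
    return 0;
}

void capture_close(struct capture *c)
{
    if (c == NULL)
        return;
    free(c->buf);  /* free(NULL) is a no-op, so this is safe */
    free(c);
}
```

With this shape, a caller that sees capture_start() fail can still call capture_close() unconditionally, and ends up with an error report about the failed request rather than a glibc abort.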
Thanks, but... Why are we using the experimental firewire stack in f7 and f8, when it breaks most of the video hardware out there - ohci 1.0? Even linux1394 advises against using it. Is anybody better off? I would have thought most ( almost all ? ) users of firewire are video users. But now we ship progs - dvgrab and kino - that can't be used with the most common hardware. Are there some users that actually see a clear benefit for the new stack?? Grumble, grumble.
In the long run, users should be better off with the much cleaner and easier to maintain codebase. At the moment, it may not look as rosy to end-users, since video + ohci 1.0 isn't working, but video + ohci 1.1 and storage + ohci 1.0 and 1.1 all work quite well now. We unfortunately underestimated the volume of ohci 1.0 controllers out there (the ohci 1.1 spec was released in 2000), and the ohci 1.0 spec isn't readily available anywhere, while the 1.1 version is. However, as mentioned over in bug 344851, we now have a decent understanding of what needs to be done to get video + ohci 1.0 support up and running, it just needs a bit of time and effort. As it happens, I've been putting in said time and effort the past few days, and I'm cautiously optimistic that with just a bit more, I may have something that works for 1.0 by week's end -- I'm still learning the code, reading the spec, digesting, etc., but I think I've got a decent grasp on it now... (btw, Fedora doesn't actually ship kino for assorted codec-related reasons, but I understand your point...)
Sorry, I know you're trying to do the right thing in the long run, but I have to point out that users don't care about the long run, they care about the here-and-now. End users tend to care more about the feature working than about the quality of the code or the maintainability thereof. So, no, an end user really DOESN'T care that the new code is much cleaner and easier to maintain. They DO care that they can't record off their cablebox. Having said that, I would have hoped that Fedora had the insight to back out this change when it was shown to just not work at all. I'm still running FC6 on my myth box because I just don't want to risk my firewire not working. Indeed, I haven't updated the kernel for the same reason (I'm afraid that Fedora might have distributed the same broken 2.6.22 kernel with the bad firewire stack). I'm glad you're putting in the effort to finally fix this, but in the future I would hope that such major regressions could be avoided, or at least shipped with a switch so users could revert to the older (less clean but actually working) drivers.
We definitely should have got on top of this issue earlier, and I don't disagree with anything you said. We did sorta leave a lot of users hanging, and I feel bad about it, especially since the cable box side of things is sorta kinda near and dear to my own heart (though I've not actually been recording off mine for some time, since I've got plenty of HDTV capture cards...). Given that we were unable to put in the time to fix things earlier, reverting the stack until we could -- or at least giving users the fallback option -- certainly would have been more prudent. Btw, none of the updated Fedora Core 6 kernels should have the new firewire stack enabled (was accidentally enabled in one build after rebasing to 2.6.22, iirc, but never got pushed to updates), and the userspace is still built for the old stack. Hopefully, we do learn from this, though I don't know that we have any other major driver rewrites planned in the near future. :)
The glibc double-free issue is resolved; the primary remaining problems are being tracked in other bugs (bug 344851, ohci 1.0 controller support, and bug 370931, a different dvgrab segfault).