Bug 241729 - firewire stack crashes with glibc double free
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: libiec61883
Version: rawhide
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Jarod Wilson
Reported: 2007-05-29 21:11 UTC by Derek Atkins
Modified: 2007-11-30 22:12 UTC

Last Closed: 2007-11-20 14:55:13 UTC


Attachments
Firewire test application (9.94 KB, text/plain)
2007-05-29 21:11 UTC, Derek Atkins
Output from the firewire test (2.89 KB, text/plain)
2007-05-29 21:14 UTC, Derek Atkins
Backtrace of the failure (678 bytes, text/plain)
2007-05-29 21:17 UTC, Derek Atkins
firewire_tester ran under gdb w/rawhide kernel (4.06 KB, text/plain)
2007-07-20 16:45 UTC, Jarod Wilson

Description Derek Atkins 2007-05-29 21:11:30 UTC
Description of problem:

There appears to be a problem with libiec61883 in that any use of the library
seems to result in a crash of the application with an error from glibc about a
double-free or corruption:

*** glibc detected *** ./firewire_tester: double free or corruption (top): 0x00000000006090a0 ***

Version-Release number of selected component (if applicable):

libiec61883-devel-1.1.0-1.fc7
libiec61883-1.1.0-1.fc7
libiec61883-debuginfo-1.1.0-1.fc7
libiec61883-utils-1.1.0-1.fc7
libraw1394-1.2.1-7.fc7
libraw1394-devel-1.2.1-7.fc7
glibc-devel-2.6-2
glibc-2.6-2
glibc-common-2.6-2
glibc-headers-2.6-2

How reproducible:

100%.  It happens every time.

Steps to Reproduce:
1. compile the attached program, firewire_tester, using the command in the comment
2. run the command (I was using:  ./firewire_tester -p -n 0 -r 5)
3. watch it crash

(Note:  I've got a DCT6200 cable box connected to my firewire port)
  
Actual results:

See attached output and gdb backtrace

Expected results:

Firewire should work and not crash.

Additional info:

Comment 1 Derek Atkins 2007-05-29 21:11:30 UTC
Created attachment 155633
Firewire test application

Comment 2 Derek Atkins 2007-05-29 21:14:26 UTC
Created attachment 155637
Output from the firewire test

Here's the output, printed on the terminal when I run the firewire tester.

Comment 3 Derek Atkins 2007-05-29 21:17:03 UTC
Created attachment 155638
Backtrace of the failure

This backtrace might not be helpful.  All it shows is that it crashed in
iec61883_mpeg2_close().  I do have the libiec debuginfo package installed but
it doesn't seem to print debug info.  Strange.  Maybe an x86_64 thing?

Comment 4 Jarod Wilson 2007-07-17 15:16:16 UTC
Blah, been meaning to get back to this one for weeks now... But yeah, I'm
thinking this may well be an x86_64 thing, as my i686 box hooked to a DCT6200
cable box doesn't seem to have any issues. Kernel version could potentially come
into play as well, so if possible, please verify this is still an issue with
kernel 2.6.21-1.3228.fc7 or later. I'll have to get my x86_64 shuttle cube into
the living room for a bit...

Comment 5 Derek Atkins 2007-07-17 15:20:12 UTC
I won't be able to test for a while, because I'm on the road.  I decided to run
FC6 because I needed it to work (it's my in-production myth frontend and runs a
backend to record from the STB).  So, testing would require:  backing up my FC6
configuration, then reloading my F7 image and updating to the current version
and testing, all while nothing is scheduled to record off the STB (and I have
the time).  When I have this time I'll do it, but it'll be a while before I can.

Comment 6 Jarod Wilson 2007-07-17 15:35:16 UTC
Ah, no worries, I'll get things hooked up at my house. Neither the shuttle nor my
cable box is part of my own production myth setup right now, I just have to make
room for the shuttle and hook it up.

Comment 7 Jarod Wilson 2007-07-18 15:56:23 UTC
Earlier, I shouldn't have said my i686 box has no issues. It does have issues,
just no crash w/glibc double free... :)

Anyhow, I got the x86_64 box hooked up last night. We're behaving slightly
differently on x86_64 with kernel 2.6.21-1.3228.fc7 now:

[root@prometheus ~]# ./firewire_tester -p -n 1 -r 5
Action: Test P2P connection 5 times, node 1, channel 1
P2P: Testing...Killed
[root@prometheus ~]# 
Message from syslogd@ at Wed Jul 18 11:51:04 2007 ...
prometheus kernel: Oops: 0000 [3] SMP 
Message from syslogd@ at Wed Jul 18 11:51:04 2007 ...
prometheus kernel: CR2: ffffffffffffffea

Still poking around...

Comment 8 Jarod Wilson 2007-07-18 21:16:56 UTC
The oops I'm seeing now is identical to the oops in bug 243081, and can be
reproduced at will.

Comment 9 Jarod Wilson 2007-07-20 16:45:00 UTC
Created attachment 159665
firewire_tester ran under gdb w/rawhide kernel

So the glibc double free is back w/a rawhide kernel, but the backtrace is
slightly different now...

Comment 10 Jarod Wilson 2007-07-20 18:33:22 UTC
*** Bug 240774 has been marked as a duplicate of this bug. ***

Comment 11 Jarod Wilson 2007-07-20 18:34:59 UTC
Out of my league here, punting to krh... Kristian, I can provide access via ssh
to the crashing box (and probably can rig up serial console, if it would help).

Comment 12 sean 2007-07-23 14:17:12 UTC
With 
dvgrab-2.1-2.fc7.x86_64
libiec61883-1.1.0-1.fc7.x86_64
glibc-2.6-4.x86_64
kernel-2.6.23-0.43.rc0.git16.fc8.x86_64

the error message is a little different:

dvgrab -i --format raw 2007play-
Found AV/C device with GUID 0x00804580212881a0
ieee1394io.cc:456: In function "virtual bool iec61883Reader::StartReceive()":
"iec61883_dv_fb_start( m_iec61883.dv, channel )" evaluated to -1
ieee1394io.cc:456: errno: 22 (Invalid argument)
*** glibc detected *** dvgrab: double free or corruption (top): 0x0000000002196300 ***
======= Backtrace: =========
/lib64/libc.so.6<0x3e80670412>
/lib64/libc.so.6(cfree+0x8c)<0x3e80673b1c>
/usr/lib64/libiec61883.so.0(iec61883_dv_close+0x15)<0x3e89407d45>
/usr/lib64/libiec61883.so.0(iec61883_dv_fb_close+0x11)<0x3e89407d91>
dvgrab<0x412443>
dvgrab<0x4111c5>
dvgrab<0x42d4e1>
dvgrab<0x420769>
/lib64/libc.so.6(__libc_start_main+0xf4)<0x3e8061dab4>
dvgrab(__gxx_personality_v0+0x209)<0x406b19>
...............
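
One way a failed start followed by a close can end in a double free is an error path that releases a buffer without clearing the pointer, so the later close releases it a second time. The C sketch below illustrates that pattern and one defensive fix (clearing the pointer after freeing it). It is a hypothetical illustration only: `stream_t`, `stream_new`, `stream_start`, and `stream_close` are made-up names and do not reflect the actual libiec61883 or libraw1394 source.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stream handle; not the real libiec61883 structure. */
typedef struct {
    unsigned char *buf;   /* receive buffer owned by the handle */
    int running;          /* nonzero once the stream has started */
} stream_t;

/* Allocate a handle with a receive buffer; returns NULL on failure. */
stream_t *stream_new(size_t len)
{
    stream_t *s = calloc(1, sizeof *s);
    if (s == NULL)
        return NULL;
    s->buf = malloc(len);
    if (s->buf == NULL) {
        free(s);
        return NULL;
    }
    return s;
}

/* Start receiving. `fail` simulates the driver rejecting the request
 * (comparable to the "Invalid argument" case in the log above).
 * On failure the buffer is released and the pointer is cleared so
 * that a later stream_close() cannot free it again. */
int stream_start(stream_t *s, int fail)
{
    assert(s != NULL);
    if (fail) {
        free(s->buf);
        s->buf = NULL;    /* omitting this line would make stream_close() double free */
        return -1;
    }
    s->running = 1;
    return 0;
}

/* Release everything. free(NULL) is a no-op, so this is safe even
 * after a failed stream_start(). */
void stream_close(stream_t *s)
{
    if (s == NULL)
        return;
    free(s->buf);
    s->buf = NULL;
    s->running = 0;
    free(s);
}
```

With the pointer cleared in the error path, calling `stream_close()` after a failed `stream_start()` frees each allocation exactly once.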

Comment 13 sean 2007-08-03 22:32:11 UTC
Now that f8t1 is out, could this issue get some attention? ATM, there's no way
to use f8 for any video editing. We certainly can't import any video, and even
editing is a problem because apps like kino also use libiec61883 for saving video.

In any event, the fw subsystem just doesn't work, a major regression from fc7.

Comment 14 Derek Atkins 2007-08-03 22:54:19 UTC
I think you mean a major regression from FC6!  The firewire doesn't work in
Fedora 7 either.

Comment 15 sean 2007-08-05 16:38:35 UTC
I stand corrected.

Is this an upstream issue? or just Fedora?

Looking at the linux1394 posts, I don't see anything on this. I'm hesitant to
report it upstream since I don't know if it's a Fedora issue ( some odd
interaction with Fedora glibc? ) or a general fw bug.

Comment 16 Jarod Wilson 2007-08-06 14:57:06 UTC
Fedora 7 debuted an entirely new firewire stack (primarily authored by krh), and
not everything that worked in the old stack has been fully enabled in the new
stack. Not sure exactly what the latest upstream status is on the new stack, but
the good folks over at linux1394-devel do know all about it.

Comment 17 Jarod Wilson 2007-10-24 13:21:02 UTC
I believe this is fixed by the latest libraw1394 update pushed to f7 updates
last night. The problem actually stemmed from a call libiec61883 was making over
to libraw1394. Of course, you may well still not be able to get any video, as
this bug appears to have only been triggering on ohci 1.0 firewire controllers,
which, unfortunately, aren't fully supported by the new firewire stack yet, but
the double-free should be gone and you should see a warning about a failed ioctl
instead.

At the very least, everything finally works peachy in a system of mine with an
ohci 1.1 firewire controller, and firewire_tester doesn't crash on my ohci
1.0-equipped system hooked to my own firewire-equipped cable box. Adding full
1.0 support is high on the TODO Real Soon Now list.
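
The behaviour described above, a warning about a failed ioctl rather than a crash, comes down to checking the ioctl return value and reporting errno instead of carrying on with invalid state. A minimal C sketch, assuming a hypothetical helper named `checked_ioctl`; it is not taken from the libraw1394 update, and the request code in the usage note is only a stand-in, not one the firewire stack issues:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical helper: issue an ioctl and, on failure, print a warning
 * and return -errno so the caller can unwind cleanly instead of
 * continuing with state the kernel never set up. */
int checked_ioctl(int fd, unsigned long request, void *arg, const char *what)
{
    assert(what != NULL);
    if (ioctl(fd, request, arg) < 0) {
        int err = errno;  /* save before fprintf can overwrite it */
        fprintf(stderr, "warning: %s ioctl failed: %s\n", what, strerror(err));
        return -err;
    }
    return 0;
}
```

For example, `checked_ioctl(-1, FIONREAD, &n, "FIONREAD")` on an invalid descriptor prints a warning and returns `-EBADF`, leaving the caller free to close its handle normally.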

Comment 18 sean 2007-10-28 18:34:19 UTC
Thanks, but...

Why are we using the experimental firewire stack in f7 and f8, when it breaks 
most of the video hardware out there - ohci 1.0? Even linux1394 advises against 
using it.

Is anybody better off? I would have thought most ( almost all ? ) users of 
firewire are video users. But now we ship progs - dvgrab and kino - that can't 
be used with the most common hardware. Are there some users that actually see a 
clear benefit for the new stack??

Grumble, grumble.

 

Comment 19 Jarod Wilson 2007-10-29 03:02:28 UTC
In the long run, users should be better off with the much cleaner and easier to maintain codebase. At 
the moment, it may not look as rosy to end-users, since video + ohci 1.0 isn't working, but video + 
ohci 1.1 and storage + ohci 1.0 and 1.1 all work quite well now. We unfortunately underestimated the 
volume of ohci 1.0 controllers out there (the ohci 1.1 spec was released in 2000), and the ohci 1.0 spec 
isn't readily available anywhere, while the 1.1 version is. However, as mentioned over in bug 344851, 
we now have a decent understanding of what needs to be done to get video + ohci 1.0 support up and 
running, it just needs a bit of time and effort. As it happens, I've been putting in said time and effort 
the past few days, and I'm cautiously optimistic that with just a bit more, I may have something that 
works for 1.0 by week's end -- I'm still learning the code, reading the spec, digesting, etc., but I think 
I've got a decent grasp on it now...

(btw, Fedora doesn't actually ship kino for assorted codec-related reasons, but I understand your 
point...)

Comment 20 Derek Atkins 2007-10-29 13:04:04 UTC
Sorry, I know you're trying to do the right thing in the long run, but I have to
point out that users don't care about the long run, they care about the
here-and-now.  End users tend to care more about the feature working than on the
quality of the code or the maintainability thereof.  So, no, an end user really
DOESN'T care that the new code is much cleaner and easier to maintain.  They DO
care that they can't record off their cablebox.

Having said that, I would have hoped that Fedora had the insight to back out
this change when it was shown to just not work at all.  I'm still running FC6 on
my myth box because I just don't want to risk my firewire not working.  Indeed,
I haven't updated the kernel for the same reason (I'm afraid that Fedora might
have distributed the same broken 2.6.22 kernel with the bad firewire stack).

I'm glad you're putting in the effort to finally fix this, but in the future I
would hope that such major regressions could be avoided, or at best shipped with
a switch so users could revert to the older (less clean but actually working)
drivers.

Comment 21 Jarod Wilson 2007-10-29 16:39:37 UTC
We definitely should have got on top of this issue earlier, and I don't disagree
with anything you said. We did sorta leave a lot of users hanging, and I feel
bad about it, especially since the cable box side of things is sorta kinda near
and dear to my own heart (though I've not actually been recording off mine for
some time, since I've got plenty of HDTV capture cards...). Given that we were
unable to put in the time to fix things earlier, reverting the stack until we
could -- or at least giving users the fallback option -- certainly would have
been more prudent.

Btw, none of the updated Fedora Core 6 kernels should have the new firewire
stack enabled (was accidentally enabled in one build after rebasing to 2.6.22,
iirc, but never got pushed to updates), and the userspace is still built for the
old stack.

Hopefully, we do learn from this, though I don't know that we have any other
major driver rewrites planned in the near future. :)

Comment 22 Jarod Wilson 2007-11-20 14:55:13 UTC
The glibc double-free issue is resolved; the primary remaining problems are being
tracked in other bugs (bug 344851, ohci 1.0 controller support, and bug 370931,
a different dvgrab segfault).

