Bug 1135801 - Enabling Adobe Flash plugin crashes PCI bus
Summary: Enabling Adobe Flash plugin crashes PCI bus
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 21
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Neil Horman
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-08-31 19:52 UTC by AWF
Modified: 2023-09-14 02:46 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2015-02-24 16:15:21 UTC
Type: Bug
Embargoed:


Attachments
dmesg from fedora 19 (69.44 KB, text/plain)
2014-08-31 19:52 UTC, AWF
lspci from fedora 19 (40.85 KB, text/plain)
2014-08-31 19:53 UTC, AWF
rawhide lspci -vv after pci failure (58.06 KB, text/plain)
2014-09-03 13:37 UTC, AWF
rawhide dmesg.txt after pci failure (79.98 KB, text/plain)
2014-09-03 13:38 UTC, AWF

Description AWF 2014-08-31 19:52:12 UTC
Created attachment 933173 [details]
dmesg from fedora 19

Description of problem:
Enabling the Adobe Flash plugin crashes the PCI bus within one minute to one day. The onboard ethernet (Qualcomm Atheros AR8151, atl1c module) generates massive tx and rx errors, suggesting PHY problems. Certain PCI bus(es) and controllers also report as frozen or in the wrong state.

Version-Release number of selected component (if applicable):
Long-term problem, across many Firefox versions and many kernels. This bug has been noted over time, apparently manifesting in several different ways in several different bugzilla reports (atl1c), most closed as not reproducible, works-for-me, or timed out due to the next release. None of those reports tied it to the flash plugin.

Currently failing on kernel 3.14.17-100 x86_64 (up to date Fedora 19 as of 8/2014) and Fedora 20 Live CD with older 3.11 kernel.

How reproducible:
Always fails.

Steps to Reproduce:
1. Install Adobe flash from adobe repo
2. Run firefox, browse web (especially view youtube or other videos)
3. Within 1 minute to 1 day network fails, devices frozen on PCI bus(es).

Actual results:
Firefox freezes. Network hangs with massively incrementing tx and rx errors, sata devices (dvd in my case) drop off some PCI bus(es) as frozen or wrong state.

Expected results:
To watch the kittehs.

Additional info:

System runs a busy desktop workload without issues when flash is uninstalled or disabled via Firefox. Many different versions, now at 31.0.x from repo. Noted that other Fedora and kernel bug reports and web sightings showed similar or related symptoms (atl1c rx/tx errors, for example).

System runs heavy graphics games via steam and 14.6 Catalyst beta without problems. Runs gallium drivers well (but crashes on Left 4 Dead 2).

Significant disk i/o with no problems. No other real issues with system for a very long time.

System is up to date x86_64 Fedora 19 w/3.14.17-100, but crashes similarly on Fedora 20 Live CD (older 3.11 kernel), and many past stable repo kernels, including debug.

Reloading atl1c module restores network function, but PCI buses require reboot. System doesn't totally crash as main PCI bus with primary HD is spared, for whatever reason. But very little in dmesg, logs or debug.
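
A minimal sketch of how the runaway tx/rx error counters could be spotted before reloading the module. It assumes the standard Linux sysfs layout, `/sys/class/net/<iface>/statistics/`, and the function names and the `max_delta` threshold are hypothetical choices for illustration, not something taken from this report.

```python
from pathlib import Path


def read_counters(stats_dir, names=("rx_errors", "tx_errors")):
    """Read the named counters from a sysfs-style statistics directory."""
    base = Path(stats_dir)
    return {name: int((base / name).read_text().strip()) for name in names}


def looks_runaway(previous, current, max_delta=1_000_000):
    """Return True if any counter grew by more than max_delta between samples.

    max_delta is an arbitrary illustrative threshold, not a tuned value.
    """
    return any(current[name] - previous[name] > max_delta for name in current)
```

To use it, sample `read_counters("/sys/class/net/enp3s0/statistics")` twice a few seconds apart (substitute the real interface name) and pass both samples to `looks_runaway`. A True result would be the cue to reload atl1c as described above. The PCI devices that dropped off the bus would still need a reboot.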

Initially thought hardware error. AMD 870 North, and SB850 south on ASRock 870 Extreme 3 R2.0, latest 1.80B bios. Removed extraneous hardware and cards. Turned off devices, blacklisted firewire. Swapped the 4-core cpu for a 6-core, swapped video from ati HD5500 to HD7750, swapped between Catalyst and Gallium video drivers. Memtest x86 runs for days with no errors. Disabled onboard ethernet and installed a different PCI ethernet card.


Tried to run gdb on the external firefox plugin process, but no good, as I hit a backtrace.py bug.

Comment 1 AWF 2014-08-31 19:53:39 UTC
Created attachment 933174 [details]
lspci from fedora 19

Comment 2 Josh Boyer 2014-09-02 14:45:09 UTC
We're unlikely to get to this bug before F19 goes EOL. If you can recreate your issues on Fedora 20 with the 3.15.10 or 3.16.y kernels without using fglrx, please let us know.

Comment 3 AWF 2014-09-03 13:26:10 UTC
Also occurs on Fedora 21 Live x86_64 from koji, September 2nd, 2014

Kernel 3.16.1-301.fc21.x86_64


Steps:

1. Boot Live CD, installed Flash plugin from Adobe (latest xxx.400)

2. Ran Firefox from Gnome on Web video (Comedy Central).

3. Hung as described, taking down PCI bus.

Will file new bug report (dmesg.txt, lspci -vv) unless you say otherwise.

Koji task 7509503, livecd (rawhide, Fedora-Live-Workstation-x86_64-rawhide, fedora-live-workstation-72304db.ks)

Comment 4 AWF 2014-09-03 13:37:42 UTC
Created attachment 934094 [details]
rawhide lspci -vv after pci failure

lspci -vv after pci bus failure

Comment 5 AWF 2014-09-03 13:38:38 UTC
Created attachment 934096 [details]
rawhide dmesg.txt after pci failure

Comment 6 AWF 2014-09-03 13:40:14 UTC
adobe flash 11.1.202.400 from adobe.com, rpm

Comment 7 AWF 2014-09-08 14:43:20 UTC
Also fails on Fedora 21 TC6 Live from DVD in exactly the same way.
1. Boot Fedora Live DVD.
2. Install Adobe Flash from Adobe via rpm.
3. Watch videos (Comedy Central, but others like youtube crash it as well).

Eventually freezes, ethernet errors, sata devices frozen.

Comment 8 AWF 2014-09-10 09:17:36 UTC
Also crashes on Linux Mint 17 Live, so either a hardware problem (possible) or a common kernel issue (as likely).

Comment 9 AWF 2014-09-19 11:37:06 UTC
Solved. It appears to have been a hardware issue, despite the fact that it would only occur during flash video operation, and despite the great care I took to eliminate hardware.

Desperate, I removed the motherboard, reflowed the solder joints around both the southbridge and especially the atheros ethernet, and shook it upside down in case of any stray solder. I made sure that each standoff was securely grounded and each screw clean and firmly tightened.

No errors on flash nor any other operation after 4 days of various use.

I apologize for any wasted time.

Comment 10 AWF 2014-09-27 20:48:23 UTC
After one week, experienced issue again after watching an hour of youtube.

It appears directly related to bug https://bugzilla.redhat.com/show_bug.cgi?id=809706

Comment 11 AWF 2014-11-28 02:33:14 UTC
Adobe flash player update to 11.2.202.418 apparently has FIXED this problem on F19 on roughly 11-12-2014 or so, and then 11.2.202.424 soon after, after what, maybe over a year of these problems?

Why would a kernel allow this behavior from a Firefox plugin?

http://helpx.adobe.com/security/products/flash-player/apsb14-24.html

Will verify this fix after a few more weeks on F19, and will verify fix with the ship version of F21 in the next few weeks.

Thank you for your efforts, everyone involved.

Comment 12 AWF 2014-12-10 06:17:11 UTC
One failure after 12 days of medium desktop workload on Fedora 19. System failed while under heavy network load downloading Fedora 21 and concurrently streaming video from Comedy Central. But the system is significantly more reliable since the aforementioned flash update.

Comment 13 AWF 2014-12-26 23:28:25 UTC
Fedora 21, fresh install from DVD. Maintained original /home from F19.

After 5 days of great and flawless operation, while viewing youtube video the system dropped PCI bus devices, including ethernet, as before.

Ethernet stats had massive RX/TX errors, hung transmitting.

<...>
Dec 26 17:05:20 localhost.localdomain /etc/gdm/Xsession[1740]: Window manager warning: Log level 16: STACK_OP_RAISE_ABOVE: sibling window 0x3000003 not in stack
Dec 26 17:20:13 localhost.localdomain /etc/gdm/Xsession[1740]: ** (nemo:2113): WARNING **: Can not determine workarea, guessing at layout
Dec 26 17:21:02 localhost.localdomain kernel: sata_sil24 0000:06:00.0: IRQ status == 0xffffffff, PCI fault or device removal?
Dec 26 17:21:02 localhost.localdomain kernel: sata_sil24 0000:06:00.0: IRQ status == 0xffffffff, PCI fault or device removal?
<...>

I can troubleshoot further. Thank You.
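
The kernel lines in the excerpt above can be picked out of a longer journal mechanically. This is a small sketch, assuming the message format is exactly as quoted in this comment. The regular expression and function name are illustrative, and the sample list reuses only lines from the excerpt.

```python
import re

# Matches kernel lines such as:
#   kernel: sata_sil24 0000:06:00.0: IRQ status == 0xffffffff, PCI fault or device removal?
PCI_FAULT = re.compile(
    r"kernel: (?P<driver>\S+) "
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"IRQ status == 0xffffffff"
)


def find_pci_faults(lines):
    """Return (driver, PCI address) for every all-ones IRQ status line."""
    hits = []
    for line in lines:
        match = PCI_FAULT.search(line)
        if match:
            hits.append((match.group("driver"), match.group("bdf")))
    return hits


SAMPLE = [
    "Dec 26 17:20:13 localhost.localdomain /etc/gdm/Xsession[1740]: ** (nemo:2113): WARNING **: Can not determine workarea, guessing at layout",
    "Dec 26 17:21:02 localhost.localdomain kernel: sata_sil24 0000:06:00.0: IRQ status == 0xffffffff, PCI fault or device removal?",
    "Dec 26 17:21:02 localhost.localdomain kernel: sata_sil24 0000:06:00.0: IRQ status == 0xffffffff, PCI fault or device removal?",
]

for driver, address in find_pci_faults(SAMPLE):
    print(f"{driver} at {address} read all-ones IRQ status")
```

A register that reads back as 0xffffffff is what a PCI read typically returns when the device no longer answers, which is why the driver itself asks whether this is a PCI fault or a device removal.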

Comment 14 AWF 2014-12-27 19:19:56 UTC
enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.10.100.101  netmask 255.255.255.0  broadcast 10.10.100.255
        inet6 fe80::225:22ff:fed9:3e0a  prefixlen 64  scopeid 0x20<link>
        ether 00:25:22:d9:3e:0a  txqueuelen 1000  (Ethernet)
        RX packets 36180805495544  bytes 4523977006945 (4.1 TiB)
        RX errors 31658203931445  dropped 4522600561635  overruns 4522600561635  frame 13567801684905
        TX packets 22613003238638  bytes 4522648783059 (4.1 TiB)
        TX errors 18090402246540  dropped 0 overruns 4522600561635  carrier 9045201123271  collisions 18090402246540
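
The counters above are not random. A short arithmetic check, using only the numbers from this output, shows that the dropped/overruns value is exactly 1053 x 0xFFFFFFFF and that most of the other counters are small whole multiples of it. The interpretation is an assumption and is not confirmed anywhere in this report: it would fit a statistics path that kept adding all-ones register reads from a device that had stopped answering on the bus.

```python
ALL_ONES = 0xFFFFFFFF  # what a 32-bit read typically returns from an unresponsive PCI device

# Values copied from the interface statistics in this comment.
counters = {
    "rx_packets": 36180805495544,
    "rx_errors": 31658203931445,
    "rx_dropped": 4522600561635,
    "rx_overruns": 4522600561635,
    "rx_frame": 13567801684905,
    "tx_packets": 22613003238638,
    "tx_errors": 18090402246540,
    "tx_overruns": 4522600561635,
    "tx_carrier": 9045201123271,
    "tx_collisions": 18090402246540,
}

# Use the dropped counter as the base unit and express it in all-ones reads.
unit = counters["rx_dropped"]
reads, leftover = divmod(unit, ALL_ONES)
print(f"rx_dropped = {reads} x 0xFFFFFFFF + {leftover}")

# Express every counter as (multiple of unit, remainder).
breakdown = {name: divmod(value, unit) for name, value in counters.items()}
for name, (multiple, remainder) in breakdown.items():
    print(f"{name:14s} = {multiple} x rx_dropped + {remainder}")
```

On this reading, the remainders on the packet counters (about a million rx and 430 thousand tx packets) would be the real traffic, and everything else would be an artifact of the bogus reads.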

Comment 15 Neil Horman 2014-12-28 14:45:13 UTC
I'm not sure exactly what you want done here. You're using a proprietary plugin, which apparently is triggering a PCI bus error that leads to the previously discovered issue involving atheros rx counter increments, among other things (your dmesg log above indicates your system's hard drive also encounters several I/O errors at times). You should probably start with running the debug kernel so we can catch any software exceptions that lead to PCI device resets/shutdowns. If that doesn't catch anything, I don't see how this can be a problem in anything other than hardware.

Comment 16 AWF 2014-12-28 18:04:25 UTC
Thank you for your response; let me start by saying the system is flawless without the plugin, never an error even with heavy graphics (Steam games, video, Catalyst and Gallium), massive disk I/O (dd of TBs), and cache/memory stress (compression, mprime and memtest). If such a system is failing with the installation of a browser plugin, it is worrisome.


1) (Adobe) Flash is a global, de facto web browsing standard. At least for now. Recent security fixes in their last Flash update made it much more stable under F19; upgrading to F21 (especially mesa on 12/26/2014 or so) reverted to flaky behavior. Is no one interested in (likely) the number one plugin not functioning properly in their system, for any reason, even if it is a crazy user problem?

2) Why is a browser plugin, supposedly sandboxed, able to trigger a PCI bus error which cascades to a full system failure? Why should it be able to do so? And is this not a possible DDoS vector: a full system failure from the web?

3) I have run with debug and with self-compiled, vanilla, non-Red Hat kernel(s) (3.18). Same problem. However, no warnings or messages in logs, ever, just the aftermath.

4) I do not see this as necessarily my hardware problem, as other systems report similar issues (e.g. the TX/RX issues). I rather see my environment as a corner case for some reason, but one that exists in the real world.

I'll run the debug kernel again. Of course, I'll just throw the motherboard away; a huge effort by developers is not warranted for just me. My motivation in keeping this alive is that when I went to explore this issue, others had vaguely experienced the symptoms, but nowhere was it explicitly spelled out that this motherboard or Fedora may have these issues. No responses, just vague comments and can't reproduce. Thank You again.

Comment 17 Neil Horman 2014-12-29 13:10:53 UTC
1) "Flash is a global web browsing, defacto, standard"
Nope. After the 11.2 release, there are no more linux releases on Adobe's roadmap. Chromium has already implemented its own flash support, and other browsers are following suit. Adobe doesn't care about linux, and linux is abandoning adobe for flash needs.

2) "Why is a browser plugin, supposedly sandboxed..."
Because browser sandboxing can be circumvented. Not saying that's acceptable, just the state of affairs. As for exactly what the flash plugin is doing, I have no idea; it's proprietary, so I don't feel a large need to support it.

3) "Ran with debug, self compiled, vanilla, non rh-kernels...same problem"
And when a system crashes in a way that appears completely agnostic to the software that is commonly seen as affecting the errant behavior, what does one consider next? Typically hardware. Not saying that's the case here, but it's usually the next suspect.

4) "Other systems report similar issues..."
Other systems report similar issues without the use of flash that you point to. I would consider the atheros issue to be separate. It really seems like a read-clear register issue in that specific bit of hardware. IIRC I had a patch in the other bug you reference to try to catch that, if you'd like to run it.

I'm not saying this is impossible to fix, just that, given what you've reported here (lost PCI devices without any software alert of bus disconnects), I can't really hypothesize any other solution. If the debug kernel isn't giving you any indicator of what's going on, I'm not sure what else to try, as we have no visibility into the problem.

One thing that might be worth investigating would be running flash in a virtual guest on the system in question (or running the browser in a container with its own memory cgroup).  Seeing if that isolates the problem might be an interesting data point that would provide us with a next step.

Comment 18 Fedora Kernel Team 2015-02-24 16:15:21 UTC
*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in over 3 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.

Comment 19 Red Hat Bugzilla 2023-09-14 02:46:43 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

