Bug 847551 - Performance degradation with kernel 3.5 with handbrake / x264 video encode
Summary: Performance degradation with kernel 3.5 with handbrake / x264 video encode
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 17
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-08-12 22:25 UTC by Jeff Sheltren
Modified: 2013-02-01 00:13 UTC
CC List: 10 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-02-01 00:13:23 UTC
Type: Bug
Embargoed:



Description Jeff Sheltren 2012-08-12 22:25:03 UTC
Handbrake x264 encodes are drastically slower when using F17 with kernel 3.5.x than they were with kernel 3.4.x.

I use Handbrake to encode HD video with fairly high x264 settings (the x264 options shown in Handbrake's 'advanced' tab are:
'mixed-refs=0:weightb=0:me=umh:ref=6:subme=8:merange=24:b-adapt=2:bframes=5').
I do a 2-pass encode with an average bitrate of 10,000 kbps. My machine is a quad-core Core i5 running Fedora 17. With these settings I normally get somewhere between 7-15 fps, typically averaging around 9-10 fps.
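For reference, a roughly equivalent HandBrakeCLI invocation would be something like the following (a sketch only -- I drive the encode through the GUI, so the file names here are placeholders and the GUI may add options beyond these):

HandBrakeCLI -i source.mkv -o encode.mkv -e x264 -b 10000 -2 \
  -x mixed-refs=0:weightb=0:me=umh:ref=6:subme=8:merange=24:b-adapt=2:bframes=5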

I've noticed significant slowdowns with encoding ever since Fedora updated to the 3.5.x kernel -- it may just be a coincidence, but I never had such slowdowns before. An encode with the same settings now runs at about 1 fps. I was running Handbrake 0.9.6 and have since updated to 0.9.8, but I still see the slow performance when running kernel 3.5.x. When I boot into 3.4.6-2.fc17.x86_64, performance is back to normal.

The ghb process uses ~390% of the CPU (4 cores) in both cases, which is expected.

I've had this problem with two different 3.5 kernels:
kernel-3.5.0-2.fc17.x86_64
kernel-3.5.1-1.fc17.x86_64

I understand that Handbrake isn't part of Fedora, but it seems to be a very reproducible method for demonstrating a performance regression.

Comment 1 mus.svz 2012-08-30 16:01:06 UTC
I'm not on Fedora but on Arch Linux x86_64; since this is the only relevant bug report I could find, here goes:

I can confirm this with compressing videos directly with ffmpeg:

ffmpeg -i input.mkv -threads 0 -c:v libx264 -preset medium -crf 20 -c:a copy -c:s copy -map 0 output.mkv

With a DVD rip as the input source, this gives me around 94 fps on kernel 3.4.9 but only about 12 fps on kernel 3.5.3.

Everything else seems normal: ffmpeg uses around 390% CPU, and all cores are running at full frequency (3.3 GHz).

I'm using an Intel Core i5 2500.

This is a pretty serious regression, so I'm quite surprised that I can't find any other reports about this... !?

Comment 2 Jeff Sheltren 2012-09-03 21:18:57 UTC
I'm still seeing this on the latest f17 kernel, kernel-3.5.3-1.fc17.x86_64

An encode actually started out at "normal" speeds (around 10 fps), and then dropped to 1 fps and stayed at that rate for quite a while before I stopped it.

Comment 3 Michael Hanson 2012-09-11 19:22:27 UTC
(In reply to comment #1)
> I'm not on Fedora but Arch Linux x64, but since this is the only appropriate
> bug report I could find, here it goes:
<snip>
> I'm using an Intel Core i5 2500.
> 
> This is a pretty serious regression, so I'm quite surprised that I can't
> find any other reports about this... !?

I see this issue on Arch x86_64 as well.  H.264 encodes with HandBrake slow to 1-2 fps on the 2nd pass when using a 3.5 series kernel.  When I roll back to a 3.4 series kernel, I get speeds in the 20-25 fps range.  This issue only seems to affect my i5-3450S (Ivy Bridge).  I get the same encode speeds regardless of the kernel used on my i3-2105 (Sandy Bridge), which is about 10-15 fps on the 2nd pass.

Comment 4 Jeff Sheltren 2012-09-11 19:30:15 UTC
I didn't mention this in my initial report, but I'm running an Intel Core i5 2500 (the same CPU as in comment #1).

Comment 5 John Stebbins 2012-10-10 20:59:35 UTC
I wonder if it could be the same problem (or similar) as this
http://lists.libav.org/pipermail/libav-devel/2012-July/031698.html

This got fixed in this commit
http://git.libav.org/?p=libav.git;a=commit;h=30b45d9

We first started seeing this problem when some code was converted from inline asm to yasm.  But maybe it existed before this as well and was just not being provoked due to different timing.

Comment 6 mus.svz 2012-10-11 12:15:14 UTC
I tried libav, both the latest stable (0.8.3) and a build from git -- both show the same problem... :(

Comment 7 John Stebbins 2012-10-11 14:49:40 UTC
libav stable 0.8.3 is from before the inline asm code change that provoked the problem that we saw.  And git head has the fix that restored the performance for us.

So it's not the exact same problem, but I still wonder if it is similar in some way.  Perhaps something in the kernel is doing the same thing that this error in libav did.

Comment 8 John Stebbins 2012-11-09 08:10:13 UTC
I have a theory, but not really enough information to back it up.  So maybe someone can shoot this down to rule it out.

I have upgraded 2 physical machines and 1 VM to the latest Ubuntu 12.10 or Fedora 17, and I have not seen this problem on any of them.  So there must be some common factor other than the kernel version.  I'm wondering if this is a driver issue, specifically the Intel graphics driver.

So is there anyone who has this problem and is running non-Intel graphics?

And is there anyone that could pop an alternate graphics card in a machine that exhibits this problem (and change drivers) and confirm or rule out this theory?

Comment 9 Jeff Sheltren 2012-11-09 15:43:42 UTC
The machine I'm experiencing this on has an nvidia graphics card and runs F17.  Which kernel(s) were you using that didn't exhibit the behavior?

Comment 10 John Stebbins 2012-11-09 19:50:37 UTC
Fedora 17 Linux version 3.6.3-1.fc17.x86_64
Ubuntu 12.10 Linux version 3.5.0-17-generic

Comment 11 Jeff Sheltren 2012-11-16 17:04:26 UTC
I'm still able to reproduce on 3.6.6-1.fc17.x86_64. Interestingly (to me), the encode sometimes takes a while before it slows down -- my test was going for about 30 minutes before it dropped down to encode at ~1-2 fps (from about 11).

@John, what is your test case that you weren't able to reproduce on the newer 3.6 kernel?

Comment 12 John Stebbins 2012-11-20 22:24:32 UTC
(In reply to comment #11)
> I'm still able to reproduce on 3.6.6-1.fc17.x86_64. Interestingly (to me),
> the encode sometimes takes a while before it slows down -- my test was going
> for about 30 minutes before it dropped down to encode at ~1-2 fps (from
> about 11).
> 
> @John, what is your test case that you weren't able to reproduce on the
> newer 3.6 kernel?

Kernel is 3.6.6-1.fc17.x86_64

lsmod 
Module                  Size  Used by
udf                    89486  0 
crc_itu_t              12579  1 udf
des_generic            21343  0 
md4                    12595  0 
nls_utf8               12557  1 
cifs                  310441  0 
dns_resolver           13096  1 cifs
fscache                55155  1 cifs
fuse                   77993  3 
lockd                  93476  0 
sunrpc                255586  1 lockd
rfcomm                 68884  4 
bnep                   19584  2 
vboxpci                23157  0 
vboxnetadp             25670  0 
vboxnetflt             23442  0 
vboxdrv               296272  3 vboxnetadp,vboxnetflt,vboxpci
snd_hda_codec_hdmi     36232  4 
iTCO_wdt               13480  0 
iTCO_vendor_support    13419  1 iTCO_wdt
uvcvideo               80922  0 
videobuf2_vmalloc      12967  1 uvcvideo
videobuf2_memops       13354  1 videobuf2_vmalloc
videobuf2_core         34237  1 uvcvideo
videodev              119976  2 uvcvideo,videobuf2_core
media                  20444  2 uvcvideo,videodev
coretemp               13393  0 
kvm_intel             132762  0 
btusb                  23870  0 
bluetooth             317602  24 bnep,btusb,rfcomm
kvm                   419458  1 kvm_intel
arc4                   12615  2 
microcode              23446  0 
snd_hda_codec_realtek    78081  1 
snd_hda_intel          37618  2 
snd_hda_codec         126889  3 snd_hda_codec_realtek,snd_hda_codec_hdmi,snd_hda_intel
ath9k                 145017  0 
serio_raw              13375  0 
snd_hwdep              17650  1 snd_hda_codec
ath9k_common           13602  1 ath9k
snd_seq                64844  0 
ath9k_hw              381816  2 ath9k_common,ath9k
snd_seq_device         14136  1 snd_seq
ath                    23103  3 ath9k_common,ath9k,ath9k_hw
snd_pcm                97459  3 snd_hda_codec_hdmi,snd_hda_codec,snd_hda_intel
mac80211              550587  1 ath9k
i7core_edac            24220  0 
edac_core              56455  1 i7core_edac
lpc_ich                17061  0 
cfg80211              196846  3 ath,ath9k,mac80211
mfd_core               13145  1 lpc_ich
r8169                  67600  0 
mei                    75376  0 
mii                    13527  1 r8169
snd_page_alloc         18101  2 snd_pcm,snd_hda_intel
snd_timer              28690  2 snd_pcm,snd_seq
snd                    79242  13 snd_hda_codec_realtek,snd_hwdep,snd_timer,snd_hda_codec_hdmi,snd_pcm,snd_seq,snd_hda_codec,snd_hda_intel,snd_seq_device
soundcore              14491  1 snd
asus_laptop            24722  0 
sparse_keymap          13526  1 asus_laptop
rfkill                 21384  6 cfg80211,asus_laptop,bluetooth
input_polldev          13646  1 asus_laptop
uinput                 17673  0 
crc32c_intel           12901  0 
nouveau               891090  4 
mxm_wmi                12823  1 nouveau
wmi                    18697  2 mxm_wmi,nouveau
video                  18936  1 nouveau
i2c_algo_bit           13257  1 nouveau
drm_kms_helper         44701  1 nouveau
ttm                    79760  1 nouveau
drm                   255010  6 ttm,drm_kms_helper,nouveau
i2c_core               38314  5 drm,drm_kms_helper,i2c_algo_bit,nouveau,videodev

As test cases, I've encoded both a DVD and a BD ISO.  The source ISO is on a local HDD and the destination file is also on a local HDD.  I let the encodes complete, so ~20 minutes in the DVD case and ~2 hrs in the BD case.  I don't see any significant decrease in encoding speed during that time.  I'm using the "Normal" preset.  The DVD encodes at around 150 fps and the BD at around 20 fps.

I've run these same tests on an Ubuntu 12.10 installation with similar results (it's a much faster machine: 70 fps for the BD encode).
Kernel 3.5.0-17-generic.

Comment 13 mus.svz 2012-12-04 14:16:19 UTC
Is anyone else running a RAID? I'm running an encrypted RAID 5 with 4 x 2TB drives, and it seems like the slowdown only happens when the RAID is involved. If either the input or the output file (or both) is on the RAID, I get the slowdown. If, however, both the input and output files are on my SSD, everything seems to work fine!?

The raw speed of the RAID seems to be fine: "hdparm -t" on the LUKS mapper device gives me > 200 MB/s, and on the RAID device itself even > 300 MB/s.
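In case it helps anyone reproduce the I/O-isolation angle: "hdparm -t" only measures buffered sequential reads from the block device, so a filesystem-level read/write check would look something like this (a sketch only; it assumes the array is mounted at /mnt/raid and is run as root):

# sequential write through the filesystem, flushed to disk at the end
dd if=/dev/zero of=/mnt/raid/ddtest bs=1M count=4096 conv=fdatasync
# drop the page cache, then read the file back
echo 3 > /proc/sys/vm/drop_caches
dd if=/mnt/raid/ddtest of=/dev/null bs=1M
rm /mnt/raid/ddtest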

Comment 14 Jeff Sheltren 2012-12-04 15:27:36 UTC
Interesting.  I am running software RAID, and it is typically where both my source and destination files live when encoding.  I'll try a test isolated to a non-RAID device (an SSD as well, in my case).

Comment 15 Michael Hanson 2012-12-05 01:10:24 UTC
I second that this is an interesting development.  On my i5-3450S (Ivy Bridge) machine, both the source and destination files are on a software RAID 5 (mdadm) consisting of 4 x 3TB HDDs.  This machine sees the slowdown.

On my i3-2105 (Sandy Bridge) system, both the source and destination files reside on NFS shares.  The NFS shares are served by the i5-3450S (Ivy Bridge) system, and the i3-2105 (Sandy Bridge) does not see the slowdown.

I'll try to isolate and see what happens as well.

Comment 16 Jeff Sheltren 2012-12-05 01:32:00 UTC
I was able to narrow this down in a test: I also only see the slowdown when the RAID device is involved.

Running kernel-3.6.8-2.fc17.x86_64 on the same system, I copied the source video file to two locations:
1) SSD drive
2) RAID partition (software RAID 5)

I was able to encode the file with the source/destination on the SSD without any slowdowns.  Then, encoding the file on the RAID partition, it eventually slowed way down and stayed at that slow speed as described before.  I was encoding with both the source and destination on the RAID device, so I'm not sure if it's one or the other or both that's at fault.  I could test this further to try to narrow it down more if needed.

To clarify a bit on my initial post: I don't always see the slowdown right away. It sometimes takes 30 minutes, an hour, or more of encoding before the speed drops, but once it does, it never climbs back above ~1-2 fps.
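For anyone who wants to repeat the comparison, the test boiled down to something like this (a sketch; the mount points are placeholders, and the x264 options are the same ones listed in the original report):

cp /mnt/raid/source.mkv /ssd/source.mkv
# source and destination on the SSD -- stays at normal speed
HandBrakeCLI -i /ssd/source.mkv -o /ssd/out.mkv -e x264 -b 10000 -2 -x <x264 options>
# source and destination on the RAID 5 -- eventually drops to ~1-2 fps
HandBrakeCLI -i /mnt/raid/source.mkv -o /mnt/raid/out.mkv -e x264 -b 10000 -2 -x <x264 options>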

Comment 17 Michael Hanson 2012-12-05 17:54:10 UTC
In my testing on my i3-2105 (Sandy Bridge):

Linux desktop 3.6.8-1-ARCH #1 SMP PREEMPT Mon Nov 26 22:10:40 CET 2012 x86_64 GNU/Linux

Source file is a 30 minute MPEG2 recording (animated) made by MythTV 0.25-fixes. HandBrakeCLI command line:

HandBrakeCLI -i <mpeg2 file> -o speed_test.mkv -f mkv -m -e x264 --x264-tune animation -x b-adapt=2:rc-lookahead=50 -b 5000 -2 -T -w 1920 -X 1920 -Y 1080 -l 1080 -r 29.97 --cfr --crop=0:0:0:0 -d slow --strict-anamorphic -a 1 -E copy -s 1

1) Both source and destination on NFS share: no slowdown

2) Both source and destination on localhost software (mdadm) RAID 1 (2 x 1.5TB HDD): no slowdown

3) Both source and destination on localhost SSD (no RAID): no slowdown.

I have been unable to test on my i5-3450S (Ivy Bridge) system because it's in production, so I'm staying on an older "good" kernel (3.4.12) and can't update to a recent one.

Does this slowdown happen only on i5, or on both i5 and i7 CPUs?  Does anyone see the slowdown on i3 or AMD CPUs?  Here are the flags on my CPUs, as reported by /proc/cpuinfo:

Intel(R) Core(TM) i3-2105 CPU @ 3.10GHz:
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36
clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm
constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc
aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16
xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer xsave avx lahf_lm
arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid

Intel(R) Core(TM) i5-3450S CPU @ 2.80GHz:
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36
clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm
constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc
aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16
xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx
f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi
flexpriority ept vpid fsgsbase smep erms

The i5 has all the same flags as the i3, with the addition of:  aes f16c rdrand ida fsgsbase smep erms

What these mean can be found here: http://www.acsu.buffalo.edu/~charngda/x86cpufeatures_linux.html
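For anyone comparing their own machines, the flag difference can be computed rather than eyeballed -- a sketch, with the output file names as placeholders:

# on each machine: dump the sorted flag list
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | sort > flags-$(hostname).txt
# then, with both files copied to one place: show flags present only on the i5
comm -13 flags-i3.txt flags-i5.txt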

*IF* I'm on the right track, could it be Intel's Turbo Boost technology somehow mucking things up?  My i3 does not have it, but my i5 does.
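One way to check would be to watch the effective core clocks while an encode is in the slow state, e.g. (a sketch; turbostat comes from the kernel tools package and may need to be installed separately, and it needs root):

# effective per-core frequency, refreshed every second
watch -n1 'grep "cpu MHz" /proc/cpuinfo'
# or, with the kernel's turbostat tool, per-core frequency and C-state residency
turbostat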

Comment 18 Abraham Mara 2012-12-17 09:50:55 UTC
Sorry ahead of time if this doesn't meet the requirements for a good post, but I wanted to add some additional info:

I had the same machine on Ubuntu 12.04 and saw no issues; when I went to 12.10 I got the slowdown. It's slow right out of the gate and stays there.

I've been running the same software (mdadm) RAID *5* the whole time.

The two bits of info I see as relevant: I'm running an i3 (a 2100T, I believe -- the low-power one), so I don't think it's specific to i5/i7, and I'm running a RAID 5.

The other person who posted that he didn't see a slowdown was on a two-disk RAID, so it was unlikely to be a RAID 5.

Happy to post any specs anyone wants to see; just tell me how to make my computer spit them out. About the only interesting software I'm running:
HomerunHD tuner
Classic Menu Indicator
Pithos
Subsonic

All of which I was running on the old system as well.
              -Abe.

Comment 19 Abraham Mara 2012-12-18 19:08:00 UTC
Quick bit of additional info:
I ran compression all four ways:
RAID->RAID, SSD->RAID, RAID->SSD, SSD->SSD
All of them were slow.

I turned off power savings in the BIOS, turned off SpeedStep (which grayed out Turbo Boost), and turned off GPU Render Standby... then my computer stopped booting properly (outside the scope of this bug report, but pretty frustrating personally). I'd tried these settings in the past though, so the results remain valid. I also tried disabling something called "VMM", which sounds like a virtualization thing.



The point is, none of that made any significant difference in the compression speeds. On the 1st pass, one CPU is at ~90% and the others at ~30% each; on the 2nd pass, they are all right around 100%. John would know better, but I suspect the first pass doesn't lend itself to threading well.
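(If anyone wants to compare the per-core split on their own machine, a sketch -- assumes the sysstat package is installed:)

# per-core utilization, sampled every 5 seconds while the encode runs
mpstat -P ALL 5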

Basically, it all looks just how it did when it was running full speed, but pretty consistently a factor of 5-7 slower (7-9 fps versus 25).

I noticed the change immediately on upgrade from 12.04 to 12.10. I remember a similar slowdown years back which went away "on its own" likely due to a system update.

Comment 20 mus.svz 2013-01-10 10:48:21 UTC
I just upgraded to kernel 3.8-rc2, and an encode has been running for an hour without a slowdown, so I'm pretty sure this has been fixed. Can anyone confirm?

Comment 21 mus.svz 2013-01-19 23:48:08 UTC
By the way, kernel 3.7.3 seems to work fine as well. I thought I had already tested 3.7; maybe it was fixed in one of the 3.7.x point releases, or I'm just remembering it wrong... ;)

Comment 22 Michael Hanson 2013-01-31 23:53:46 UTC
I just installed Arch Linux's latest kernel on my problem i5 machine:

$ uname -a
Linux mbe 3.7.5-1-ARCH #1 SMP PREEMPT Mon Jan 28 10:03:32 CET 2013 x86_64 GNU/Linux

Intel(R) Core(TM) i5-3450S CPU @ 2.80GHz

and the problem no longer occurs for me either.

Mike

Comment 23 Justin M. Forbes 2013-02-01 00:13:23 UTC
If 3.7 fixes it, I am going to close this out. Please reopen if you are seeing the slowdown with 3.7.5 or newer.

