Bug 494047

Summary: Kernel GEM tiling Oops, after a couple hours the system starts swapping and X locks up.
Product: [Fedora] Fedora
Reporter: Nate <drag>
Component: kernel
Assignee: Jonathan Blandford <jrb>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: 11
CC: benl, bobpoljakov, ddumas, kernel-maint
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2010-06-28 11:41:08 UTC
Attachments:
  output of /proc/meminfo minutes before lock up. (flags: none)
  Output of free -m and /proc/meminfo before and after killing X (flags: none)

Description Nate 2009-04-03 20:49:26 UTC
Created attachment 338122
output of /proc/meminfo minutes before lock up.

Description of problem:

I am running Compiz on my Inspiron 1420n with X3100 intel graphics. Nothing special. 2GB of RAM.

I have no xorg.conf or anything like that. This is a fairly new install of Fedora 11 beta from your ISO images.

What happens is this:

I am running this with a Gnome desktop, which uses about 512MB idling. On top of that I am running a Debian VM in KVM (via virt-manager) with 512MB of RAM allocated to it. Typically with this setup I should have somewhere between 300 and 700 MB left over for cache and zero swap activity.

What happens is that I start getting slow responses from the system and nonsense coming from the 'free' command. For example, I just rebooted after forcing a shutdown when X locked up. Right now I have this from the 'free' command:

$ free -m
             total       used       free     shared    buffers     cached
Mem:          1982        874       1107          0        107        392
-/+ buffers/cache:        374       1607
Swap:         3903          0       3903
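
For reference, the "-/+ buffers/cache" row is derived from the "Mem:" row: "used" has buffers and cached subtracted, and "free" has them added back. A minimal sketch of that arithmetic using the figures above (the function name buffers_cache_row is made up for illustration; results can be off by 1 MB from what free prints, since free does the arithmetic on kB values before converting to MB):

```shell
# buffers_cache_row USED FREE BUFFERS CACHED
# All arguments are in MB, taken from the "Mem:" row of 'free -m'.
# Prints the "used" and "free" values of the "-/+ buffers/cache" row.
buffers_cache_row() {
    used=$1; free=$2; buffers=$3; cached=$4
    echo "$((used - buffers - cached)) $((free + buffers + cached))"
}

# Figures from the freshly booted system above.
buffers_cache_row 874 1107 107 392
```

This prints "375 1606", against the 374 and 1607 reported by free.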

This is just a newly booted up Gnome + Compiz system. After starting up the VM I've been working on and running X for a while I start seeing odd values like:

-/+ buffers/cache:   481   1501
Swap:     3903       601    3302

and stuff like that. I had that much swap used up just moments before X locked up. With Fedora 10, on my worst/busiest days, I might end up with 32 or so megs touched in swap at most. I very rarely hit it, and yet now with Fedora 11 I hit it every single time, and hit it hard.

I managed to grab a copy of /proc/meminfo a few minutes before I lost control of the machine. I attached it to this bug report.


Every boot up I get this message in my dmesg output. Every single time, sometimes multiple times:


[drm:i915_gem_object_unbind] *ERROR* Attempting to unbind pinned buffer
------------[ cut here ]------------
WARNING: at drivers/gpu/drm/i915/i915_gem_tiling.c:291 i915_gem_set_tiling+0x263/0x2c1 [i915]() (Not tainted)
Hardware name: Inspiron 1420                   
failed to unbind object for tiling switchModules linked in: fuse ipt_MASQUERADE iptable_nat nf_nat sco bridge stp llc bnep l2cap bluetooth sunrpc ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 cpufreq_ondemand acpi_cpufreq freq_table dm_multipath kvm_intel kvm uinput arc4 ecb iwl3945 snd_hda_codec_idt snd_usb_audio snd_usb_lib snd_hda_intel snd_rawmidi mac80211 snd_hda_codec dell_laptop firewire_ohci sdhci_pci sdhci rfkill mmc_core snd_seq_device lib80211 firewire_core ricoh_mmc dcdbas snd_pcm crc_itu_t pl2303 pcspkr i2c_i801 asix joydev tg3 iTCO_wdt usbnet iTCO_vendor_support snd_timer snd_hwdep snd usbserial cfg80211 mii usblp wmi soundcore snd_page_alloc i915 drm i2c_algo_bit i2c_core video output [last unloaded: microcode]
Pid: 2977, comm: Xorg Not tainted 2.6.29.1-46.fc11.x86_64 #1
Call Trace:
 [<ffffffff8104b013>] warn_slowpath+0xbc/0xf0
 [<ffffffff813a7d4b>] ? printk+0x41/0x46
 [<ffffffffa0061992>] i915_gem_set_tiling+0x263/0x2c1 [i915]
 [<ffffffffa006172f>] ? i915_gem_set_tiling+0x0/0x2c1 [i915]
 [<ffffffffa0023b54>] drm_ioctl+0x1e4/0x276 [drm]
 [<ffffffff810e402c>] vfs_ioctl+0x6f/0x87
 [<ffffffff810e44c7>] do_vfs_ioctl+0x462/0x4a3
 [<ffffffff8104f06a>] ? do_setitimer+0x19a/0x330
 [<ffffffff810e455e>] sys_ioctl+0x56/0x79
 [<ffffffff810113ba>] system_call_fastpath+0x16/0x1b
---[ end trace 2ad3b5b722e0c429 ]---



I get no errors in Xorg.*.log* and I have no xorg.conf.

I was able to recover once by using ssh from a separate machine to run kill -9 on Xorg. Immediately after that the machine stopped swapping (swap went from 500+ to 12) and I was able to log into it again, no problem, and continue using it. I only tried that once; I am on DHCP and it's hard to find my address, so I usually just use the power button to force a shutdown.

This is about the 3rd time I've been trying to write this message before X locks up. 


Thanks otherwise. This version of Fedora is very fast and I have no performance penalty that I can tell from running a composited desktop anymore, besides this of course.

I am going to disable tiling in a new xorg.conf and see if I can make the problem go away. I just wanted to make sure to get this bug report out before I tried anything. 

It swaps so hard that I can just sit there with 'watch free' running and see the swap getting used up, bit by bit, the system not doing anything otherwise. It's very bad. 
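
A rough sketch of how the swap growth could be recorded over time instead of watched by eye, assuming the SwapTotal and SwapFree lines of /proc/meminfo are what 'free' reports in its Swap row. The function names swap_used_kb and log_swap are made up for illustration:

```shell
# Read meminfo-format text on stdin and print swap in use, in kB.
swap_used_kb() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END           { print total - free }'
}

# Print a timestamped swap reading every INTERVAL seconds (default 60).
# Runs until interrupted, so it is only defined here, not called.
log_swap() {
    while :; do
        printf '%s %s kB\n' "$(date +%T)" "$(swap_used_kb < /proc/meminfo)"
        sleep "${1:-60}"
    done
}

# Take a single reading if /proc/meminfo is available on this system.
if [ -r /proc/meminfo ]; then
    swap_used_kb < /proc/meminfo
fi
```

Running something like `log_swap 30 >> swap.log` in a terminal or over ssh would leave a record of how fast swap fills up, which survives even if X locks up.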



Version-Release number of selected component (if applicable):

2.6.29.1-46.fc11.x86_64

Comment 1 Nate 2009-04-03 20:51:21 UTC
Oh, sorry. I forgot to mention that I am running Compiz this entire time, with UXA, KMS, DRI2 and all that working as far as I can tell. I don't know whether this would happen without running Compiz. I am guessing that it would probably still happen, but that it would take days to lock up instead of hours. Just my guess.

Comment 2 Nate 2009-04-04 00:30:14 UTC
Hrm. Running with:
Section "Device"
	Identifier	"Builtin Default intel Device 0"
	Driver	"intel"
	Option "Tiling" "0"
EndSection

Section "Screen"
	Identifier "Builtin Default intel Screen 0"
EndSection

Section "ServerLayout"
	Identifier	"Builtin Default Layout"
	Screen	"Builtin Default intel Screen 0"
EndSection

as my xorg.conf file seems to solve the problem with X locking up. It was very stable.

However I was still swapping like crazy. By the time I had to shut down my laptop to go home I had 700+ MB written out to swap. But when I tried to figure out what the problem was I couldn't find it. 

I tried stuff like this:
POO=0
ps h -eo %mem|while read i ; do POO=`echo $i+$POO|bc`; echo $POO ; done

or 
ps h -eo rss|while read i ; do POO=`echo $i+$POO|bc`; echo $POO ; done

Either way, I was only showing about 55% memory usage.

So I have no clue what was going on. 
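
A shorter way to get the same kind of per-process total is sketched below. It assumes one RSS value in kB per input line, as printed by `ps -eo rss=`, and the function name sum_rss_kb is made up for illustration. Note that a sum of RSS values counts shared pages once per process, and that memory not mapped by any process does not appear in it at all, so it is only a rough cross-check against 'free':

```shell
# Sum one number per line from stdin and print the total.
# Intended input: RSS values in kB, e.g. from 'ps -eo rss='.
sum_rss_kb() {
    awk '{ total += $1 } END { print total + 0 }'
}

# Typical use on a live system (left commented out here):
#   ps -eo rss= | sum_rss_kb

# Small demonstration with made-up values.
printf '1024\n2048\n512\n' | sum_rss_kb
```

The demonstration prints 3584.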

But disabling tiling in xorg.conf did stop the oops.

Comment 3 Nate 2009-04-09 00:32:27 UTC
After doing a 'yum clean all', then a 'yum upgrade', then rebooting, I am no longer experiencing the oops, even with my xorg.conf removed and using all the defaults for X Windows, which leave tiling enabled.

However I am still hitting the swap.

After a few hours of using Gnome+Compiz for little more than a web browser, a gnome-terminal, and an emacs session I am using up a full 2GB of RAM and nearly 300MB of swap. No big compiles or VMs running or anything. Just a pretty much default desktop install of Gnome with Compiz running.

However free -m still informs me that I have 1400MB free minus buffers/cache.

Once I killed X by running 'telinit 3' the amount of swap being used dropped down to 77MB and free -m says that I am only using 168MB of ram, which still seems excessive to me for running without any desktop.

I attached output from free and /proc/meminfo from before and after killing X. I have not run into issues with X locking up yet, but last time I had to be using 500-700MB of swap before X locked up.

I am now on kernel 2.6.29.1-54.fc11.x86_64

Thank you.

Comment 4 Nate 2009-04-09 00:34:46 UTC
Created attachment 338828
Output of free -m and /proc/meminfo before and after killing X

Comment 5 Nate 2009-04-14 18:44:42 UTC
Just an update.

The issues with memory leaking are still continuing with the latest updates.

~$ uname -r
2.6.29.1-68.fc11.x86_64


~$ rpm -qf /usr/lib64/xorg/modules/drivers/intel_drv.so
xorg-x11-drv-intel-2.6.99.902-2.fc11.x86_64

~$ rpm -qf /usr/lib64/dri/i965_dri.so 
mesa-dri-drivers-7.5-0.9.fc11.x86_64

A couple of times a day, while using Compiz, the system will gradually use up more and more RAM and start swapping, eventually locking up X. Not using Compiz slows it down considerably, but it still causes problems.

No more oopses; like I mentioned above, they have been gone for quite a while. But now stuff like this is showing up in dmesg:

Xorg:4996 freeing invalid memtype e18fb000-e18fc000
Xorg:4996 freeing invalid memtype e18fc000-e18fd000
Xorg:4996 freeing invalid memtype e18fd000-e18fe000
Xorg:4996 freeing invalid memtype e18fe000-e18ff000
Xorg:4996 freeing invalid memtype e18ff000-e1900000
Xorg:4996 freeing invalid memtype e1900000-e1901000
Xorg:4996 freeing invalid memtype e1901000-e1902000
Xorg:4996 freeing invalid memtype e1902000-e1903000
Xorg:4996 freeing invalid memtype e1903000-e1904000
Xorg:4996 freeing invalid memtype e1904000-e1905000
Xorg:4996 freeing invalid memtype e1905000-e1906000
Xorg:4996 freeing invalid memtype e1906000-e1907000
Xorg:4996 freeing invalid memtype e1907000-e1908000
Xorg:4996 freeing invalid memtype e1908000-e1909000
Xorg:4996 freeing invalid memtype e1909000-e190a000


Then when I tried the "Yo Frankie!" Blender video game I found this:
blenderplayer.b:22291 freeing invalid memtype e9523000-e9524000
blenderplayer.b:22291 freeing invalid memtype e9524000-e9525000
blenderplayer.b:22291 freeing invalid memtype e9525000-e9526000
blenderplayer.b:22291 freeing invalid memtype e9526000-e9527000
blenderplayer.b:22291 freeing invalid memtype e9527000-e9528000
blenderplayer.b:22291 freeing invalid memtype e9528000-e9529000
blenderplayer.b:22291 freeing invalid memtype e9529000-e952a000
blenderplayer.b:22291 freeing invalid memtype e952a000-e952b000
blenderplayer.b:22291 freeing invalid memtype e952b000-e952c000
blenderplayer.b:22291 freeing invalid memtype e952c000-e952d000
blenderplayer.b:22291 freeing invalid memtype e952d000-e952e000
blenderplayer.b:22291 freeing invalid memtype e952e000-e952f000
blenderplayer.b:22291 freeing invalid memtype e952f000-e9530000
blenderplayer.b:22291 freeing invalid memtype e9530000-e9531000

It repeats quite a bit:

~$ dmesg |grep Xorg|wc
   1958    9790  101816

~$ dmesg |grep blender|wc
    326    1630   20864
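
The two counts above can be produced in one pass, grouped by process name. This is only a sketch: it assumes the message format shown above, with "name:pid" as the first field of each line and no timestamp prefix, and the function name count_memtype_by_process is made up for illustration:

```shell
# Read kernel log text on stdin and print, per process name, how many
# "freeing invalid memtype" messages it produced, highest count first.
count_memtype_by_process() {
    awk '/freeing invalid memtype/ {
             split($1, parts, ":")   # first field looks like "Xorg:4996"
             count[parts[1]]++
         }
         END { for (name in count) print count[name], name }' |
        sort -rn
}

# Typical use on a live system (left commented out here):
#   dmesg | count_memtype_by_process

# Demonstration on a few lines taken from the output above.
count_memtype_by_process <<'EOF'
Xorg:4996 freeing invalid memtype e18fb000-e18fc000
Xorg:4996 freeing invalid memtype e18fc000-e18fd000
blenderplayer.b:22291 freeing invalid memtype e9523000-e9524000
EOF
```

On the three sample lines this prints "2 Xorg" followed by "1 blenderplayer.b".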


I don't mind compiling custom kernels or patching things or anything like that. While I don't know which part of the equation is the problem I would be happy to do anything I can to help resolve this. 

Or am I completely confused about what is going on? Should all of this be a different bug report or anything like that?

Comment 6 Ben Levenson 2009-05-14 17:09:23 UTC
seeing the same thing on my X61 -- happened 2 min after firing up the desktop

default settings (desktop effects disabled)

kernel-2.6.29.3-140.fc11.x86_64
xorg-x11-drv-intel-2.7.0-4.fc11.x86_64
xorg-x11-server-Xorg-1.6.1-11.fc11.x86_64

00:02.0 VGA compatible controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 0c) (prog-if 00 [VGA controller])
        Subsystem: Lenovo T61
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 27
        Region 0: Memory at f8000000 (64-bit, non-prefetchable) [size=1M]
        Region 2: Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Region 4: I/O ports at 1800 [size=8]
        Expansion ROM at <unassigned> [disabled]
        Capabilities: <access denied>
        Kernel driver in use: i915
        Kernel modules: i915

Comment 7 Bug Zapper 2009-06-09 13:13:52 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle.
Changing version to '11'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 9 Bug Zapper 2010-04-27 13:29:34 UTC
This message is a reminder that Fedora 11 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 11.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '11'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 11's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 11 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 10 Bug Zapper 2010-06-28 11:41:08 UTC
Fedora 11 changed to end-of-life (EOL) status on 2010-06-25. Fedora 11 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.