Bug 593660

Summary: Memory corruption with 2.6.34 nouveau module
Product: [Fedora] Fedora
Reporter: Milan Broz <mbroz>
Component: xorg-x11-drv-nouveau
Assignee: Ben Skeggs <bskeggs>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Docs Contact:
Priority: low
Version: rawhide
CC: airlied, ajax, bskeggs, pvrabec
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-06-06 23:25:32 UTC
Type: ---
Attachments:
Description: full kernel log
Flags: none

Description Milan Broz 2010-05-19 12:35:15 UTC
Created attachment 415096
full kernel log

Description of problem:

WARNING: at arch/x86/mm/ioremap.c:113 __ioremap_caller+0x174/0x31f()
Hardware name: Precision WorkStation 690    
Modules linked in: nouveau(+) ttm drm_kms_helper drm i2c_algo_bit i2c_core
Pid: 115, comm: modprobe Not tainted 2.6.34-2.fc14.x86_64 #1
Call Trace:
 [<ffffffff81050370>] warn_slowpath_common+0x7c/0x94
 [<ffffffff8105039c>] warn_slowpath_null+0x14/0x16
 [<ffffffff8103164c>] __ioremap_caller+0x174/0x31f
 [<ffffffff810318b6>] ioremap_wc+0x20/0x29
 [<ffffffffa0064468>] ttm_mem_reg_ioremap+0x82/0xaf [ttm]
 [<ffffffffa00648b5>] ttm_bo_move_memcpy+0x79/0x3d6 [ttm]
 [<ffffffff8107c71d>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffffa0021dc2>] ? drm_mm_kmalloc+0x74/0xb3 [drm]
 [<ffffffffa0021f71>] ? drm_mm_split_at_start+0x5d/0x77 [drm]
 [<ffffffffa00811f7>] nouveau_bo_move+0x390/0x41d [nouveau]
 [<ffffffffa0062fdb>] ? ttm_bo_mem_space+0x188/0x446 [ttm]
 [<ffffffffa0061d49>] ttm_bo_handle_move_mem+0x1ad/0x2b5 [ttm]
 [<ffffffffa0063cc3>] ttm_bo_move_buffer+0xbc/0x10c [ttm]
 [<ffffffffa0063dc1>] ttm_bo_validate+0xae/0xf7 [ttm]
 [<ffffffffa006414d>] ttm_bo_init+0x343/0x37c [ttm]
 [<ffffffffa00818ba>] nouveau_bo_new+0x2ef/0x349 [nouveau]
 [<ffffffffa008152d>] ? nouveau_bo_del_ttm+0x0/0x9e [nouveau]
 [<ffffffffa0079a6f>] nouveau_mem_init+0x243/0x3d4 [nouveau]
 [<ffffffffa0077832>] nouveau_card_init+0xaf6/0xe16 [nouveau]
 [<ffffffffa0078012>] nouveau_load+0x3e7/0x3f5 [nouveau]
 [<ffffffffa001e4be>] drm_get_dev+0x3e6/0x4e7 [drm]
 [<ffffffffa00b8205>] nouveau_pci_probe+0x15/0x17 [nouveau]
 [<ffffffff81245de9>] local_pci_probe+0x17/0x1b
 [<ffffffff81246d71>] pci_device_probe+0xcd/0xfd
 [<ffffffff812f0d23>] ? driver_sysfs_add+0x4c/0x71
 [<ffffffff812f0efb>] driver_probe_device+0xed/0x21a
 [<ffffffff812f1085>] __driver_attach+0x5d/0x81
 [<ffffffff812f1028>] ? __driver_attach+0x0/0x81
 [<ffffffff812f02c7>] bus_for_each_dev+0x59/0x8e
 [<ffffffff812f0c82>] driver_attach+0x1e/0x20
 [<ffffffff812f08a3>] bus_add_driver+0xfa/0x263
 [<ffffffff812f138c>] driver_register+0x9e/0x10f
 [<ffffffff81246fb7>] __pci_register_driver+0x68/0xd8
 [<ffffffffa00d8000>] ? nouveau_init+0x0/0x52 [nouveau]
 [<ffffffffa00191d7>] drm_init+0x75/0xdb [drm]
 [<ffffffffa00d8000>] ? nouveau_init+0x0/0x52 [nouveau]
 [<ffffffffa00d8050>] nouveau_init+0x50/0x52 [nouveau]
 [<ffffffff8100207d>] do_one_initcall+0x72/0x18a
 [<ffffffff8108a886>] sys_init_module+0xd8/0x23a
 [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b
---[ end trace 9a150a6711611cb2 ]---
[drm] nouveau 0000:07:00.0: failed to reserve VGA memory
[drm] nouveau 0000:07:00.0: 64 MiB GART (aperture)
mtrr: zero sized request
[drm] nouveau 0000:07:00.0: Allocating FIFO number 0
=============================================================================
async/2 used greatest stack depth: 4576 bytes left
BUG kmalloc-1024 (Tainted: G        W ): Poison overwritten
-----------------------------------------------------------------------------

INFO: 0xffff8801a39c1bd8-0xffff8801a39c1bdf. First byte 0x0 instead of 0x6b
INFO: Allocated in nouveau_bo_new+0x50/0x349 [nouveau] age=259 cpu=3 pid=115
INFO: Freed in nouveau_bo_del_ttm+0x96/0x9e [nouveau] age=25 cpu=3 pid=115
INFO: Slab 0xffffea0005bca200 objects=29 used=6 fp=0xffff8801a39c19b0 flags=0x400000000040c3
INFO: Object 0xffff8801a39c19b0 @offset=6576 fp=0xffff8801a39c1df8

Bytes b4 0xffff8801a39c19a0:  00 00 00 00 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ
  Object 0xffff8801a39c19b0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
  Object 0xffff8801a39c19c0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
  Object 0xffff8801a39c19d0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
async/0 used greatest stack depth: 4496 bytes left
  Object 0xffff8801a39c19e0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
  Object 0xffff8801a39c19f0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk

... (see attached full boot log)
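
One way to read the SLUB report above: 0x6b is the byte the kernel's slab poisoning writes over an object when it is freed, so "Poison overwritten" suggests that something wrote into the object after nouveau_bo_del_ttm released it. The Python sketch below is only an illustration, not a kernel tool. It uses the addresses quoted in the log to work out where in the freed kmalloc-1024 object the stray write landed, then mimics the check on a toy buffer. The function names are hypothetical, the toy buffer ignores the different end-of-object poison byte the real allocator uses, and zeroing all eight bytes is an assumption, since the log only shows the first one.

```python
import re

POISON_FREE = 0x6B  # byte written over a freed object when slab poisoning is on

# Two INFO lines copied from the report above.
REPORT = """\
INFO: 0xffff8801a39c1bd8-0xffff8801a39c1bdf. First byte 0x0 instead of 0x6b
INFO: Object 0xffff8801a39c19b0 @offset=6576 fp=0xffff8801a39c1df8
"""


def corrupted_range(report):
    """Return (offset_in_object, length) of the overwritten bytes."""
    first, last = re.search(
        r"INFO: (0x[0-9a-f]+)-(0x[0-9a-f]+)\.", report
    ).groups()
    obj = re.search(r"INFO: Object (0x[0-9a-f]+)", report).group(1)
    start, end, base = int(first, 16), int(last, 16), int(obj, 16)
    return start - base, end - start + 1


def find_poison_damage(buf):
    """Toy check: indices of the first and last byte that is not POISON_FREE."""
    bad = [i for i, b in enumerate(buf) if b != POISON_FREE]
    return (bad[0], bad[-1]) if bad else None


offset, length = corrupted_range(REPORT)
print(f"overwritten: {length} bytes at object offset {offset} (0x{offset:x})")

# Simulate the pattern: poison a freed 1024-byte object, then let a stale
# pointer zero `length` bytes at that offset (assumed; see the note above).
freed = bytearray([POISON_FREE] * 1024)
freed[offset:offset + length] = bytes(length)
print(find_poison_damage(freed))
```

With the addresses from this report the write covers 8 bytes at offset 552 (0x228) of the object, which is consistent with a single pointer-sized store through a stale reference on x86_64, although the log alone does not prove that.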

Version-Release number of selected component (if applicable):

kernel-2.6.34-2.fc14.x86_64

How reproducible:

Load the nouveau module on a Dell Precision 690 workstation with this graphics card:

07:00.0 VGA compatible controller: nVidia Corporation NV44 [Quadro NVS 285] (rev a1)

(lspci -n -v)
07:00.0 0300: 10de:0165 (rev a1) (prog-if 00 [VGA controller])
        Subsystem: 10de:0334
        Flags: fast devsel, IRQ 11
        Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
        Memory at d0000000 (64-bit, prefetchable) [size=256M]
        Memory at fb000000 (64-bit, non-prefetchable) [size=16M]
        Expansion ROM at fce00000 [disabled] [size=128K]
        Capabilities: [60] Power Management version 2
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [128] Power Budgeting <?>
        Kernel modules: nouveau, nvidiafb

The system usually crashes afterwards, and only blacklisting the nouveau module helps.

It works with 2.6.33 (and probably 2.6.34-rc; let me know if you need more information here).
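
The blacklisting workaround mentioned above is normally expressed as a `blacklist nouveau` line in a file under /etc/modprobe.d/. As a rough sketch, the Python helper below checks whether a given configuration text contains such a line. The file name in the example and the helper name are hypothetical, and the parsing is deliberately simplified (it also tolerates trailing comments, which is more lenient than modprobe itself).

```python
def is_blacklisted(config_text, module):
    """Return True if modprobe.d-style text has a 'blacklist <module>' line."""
    # modprobe treats '-' and '_' in module names as interchangeable.
    wanted = module.replace("-", "_")
    for raw in config_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments, then tokenize
        if (
            len(fields) == 2
            and fields[0] == "blacklist"
            and fields[1].replace("-", "_") == wanted
        ):
            return True
    return False


EXAMPLE_CONF = """\
# hypothetical /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
"""

print(is_blacklisted(EXAMPLE_CONF, "nouveau"))
```

Note that a `blacklist` entry only stops the module from being loaded automatically through its aliases; an explicit `modprobe nouveau` would still load it.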

Comment 1 Ben Skeggs 2010-05-19 22:15:30 UTC
The fix for the memory corruption bug is in the F13 kernel and has also been sent upstream for 2.6.35-rc1. It will make it into rawhide with the next kernel update there.

Comment 2 Ben Skeggs 2010-06-06 23:25:32 UTC
This bug should be resolved now that 2.6.35 is in rawhide. Closing.