Bug 1494191 - WARNING: CPU: 0 PID: 1300 at drivers/gpu/drm/nouveau/nouveau_bo.c:137 nouveau_bo_del_ttm+0x79/0x80 [nouveau] [NEEDINFO]
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: xorg-x11-drv-nouveau
Version: 7.7
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Lyude
QA Contact: Desktop QE
URL:
Whiteboard:
Depends On:
Blocks: 1547138
Reported: 2017-09-21 15:51 UTC by Alan Matsuoka
Modified: 2021-06-10 13:05 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-11 21:55:53 UTC
Target Upstream Version:
rclark: needinfo? (alanm)



Description Alan Matsuoka 2017-09-21 15:51:34 UTC
Description of problem:
I have tried a new NVIDIA NVS 510, which has 2 GB of DDR3 memory. Initially, as a proof of concept, I would like to run it with the nouveau driver.
<snip>
nouveau              1622010  32 
video                  24520  1 nouveau
mxm_wmi                13021  1 nouveau
i2c_algo_bit           13413  1 nouveau
drm_kms_helper        159169  1 nouveau
ttm                    99345  1 nouveau
drm                   370825  11 ttm,drm_kms_helper,nouveau
i2c_core               40756  5 drm,i2c_i801,drm_kms_helper,i2c_algo_bit,nouveau
wmi                    19070  3 dell_wmi,mxm_wmi,nouveau
<snip>

A few hours into running with the NVS 510, we got an abrt alert. I have attached the abrt logs along with a sosreport.
Xorg.0.log doesn't seem to indicate any memory pressure.


Version-Release number of selected component (if applicable):

xorg-x11-drv-nouveau-1.0.13-3.el7.x86_64                    Fri Sep  8 18:57:30 2017

How reproducible:
The abrt alert appears a few hours into running with the NVS 510.

Steps to Reproduce:
1. Install a new NVS 510 card.
2. Run an up-to-date RHEL 7.4 system.

Actual results:

A kernel warning is issued, eventually followed by a kernel oops.

Expected results:


Additional info:

Comment 2 Alan Matsuoka 2017-09-21 15:54:39 UTC
abrt has been picking up these backtraces:

bash-4.2$ cat oops-2017-09-15-17:51:12-56773-0/backtrace
WARNING: CPU: 0 PID: 1300 at drivers/gpu/drm/nouveau/nouveau_bo.c:137 nouveau_bo_del_ttm+0x79/0x80 [nouveau]
Modules linked in: fuse ipheth binfmt_misc nfsv3 nfs xt_CHECKSUM fscache iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter mvfs(OE) ip6_tables iptable_filter dm_mirror dm_region_hash dm_log dm_mod intel_powerclamp snd_hda_codec_analog snd_hda_codec_generic coretemp snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep kvm snd_seq snd_seq_device snd_pcm snd_timer irqbypass crc32_pclmul ghash_clmulni_intel snd iTCO_wdt aesni_intel dell_wmi dell_smbios sparse_keymap ppdev lrw gpio_ich iTCO_vendor_support sg soundcore pcspkr gf128mul glue_helper i7core_edac dcdbas ablk_helper i2c_i801 edac_core cryptd parport_pc
parport shpchp lpc_ich nfsd nfs_acl lockd grace auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 sd_mod sr_mod cdrom crc_t10dif crct10dif_generic nouveau video mxm_wmi i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci libahci drm libata tg3 crct10dif_pclmul crct10dif_common crc32c_intel serio_raw ptp pps_core i2c_core wmi
CPU: 0 PID: 1300 Comm: X Tainted: G           OE  ------------   3.10.0-693.2.1.el7.x86_64 #1
Hardware name: Dell Inc. Precision WorkStation T5500  /0CRH6C, BIOS A16 05/28/2013
0000000000000000 00000000b320290b ffff880036353b60 ffffffff816a3db1
ffff880036353ba0 ffffffff810879c8 0000008900000000 ffff8802ffb3b400
ffff88017a9dc000 ffff880306e5a1e8 ffff8802ffb3b400 ffff8802ffb3b400
Call Trace:
[<ffffffff816a3db1>] dump_stack+0x19/0x1b
[<ffffffff810879c8>] __warn+0xd8/0x100
[<ffffffff81087b0d>] warn_slowpath_null+0x1d/0x20
[<ffffffffc023dfe9>] nouveau_bo_del_ttm+0x79/0x80 [nouveau]
[<ffffffffc0142ebb>] ttm_bo_release_list+0xbb/0x1a0 [ttm]
[<ffffffffc01432bc>] ttm_bo_release+0xfc/0x220 [ttm]
[<ffffffffc0143409>] ttm_bo_unref+0x29/0x30 [ttm]
[<ffffffffc024192e>] nouveau_gem_object_del+0x8e/0xf0 [nouveau]
[<ffffffffc00e9869>] drm_gem_object_free+0x29/0x70 [drm]
[<ffffffffc00e9bd8>] drm_gem_object_unreference_unlocked+0x48/0xb0 [drm]
[<ffffffffc00e9cc9>] drm_gem_object_handle_unreference_unlocked+0x69/0xb0 [drm]
[<ffffffffc00e9d63>] drm_gem_object_release_handle+0x53/0x90 [drm]
[<ffffffffc00e9dff>] drm_gem_handle_delete+0x5f/0x90 [drm]
[<ffffffffc00ea5d5>] drm_gem_close_ioctl+0x25/0x30 [drm]
[<ffffffffc00eaedc>] drm_ioctl+0x20c/0x4b0 [drm]
[<ffffffffc00ea5b0>] ? drm_gem_handle_create+0x40/0x40 [drm]
[<ffffffff8109e922>] ? __set_current_blocked+0x42/0x70
[<ffffffff8103528e>] ? fpu_finit+0x1e/0x30
[<ffffffffc023a404>] nouveau_drm_ioctl+0x54/0xc0 [nouveau]
[<ffffffff8121524d>] do_vfs_ioctl+0x33d/0x540
[<ffffffff812b780f>] ? file_has_perm+0x9f/0xb0
[<ffffffff812154f1>] SyS_ioctl+0xa1/0xc0
[<ffffffff816b5009>] system_call_fastpath+0x16/0x1b
bash-4.2$ 



This appears to have been reported on other platforms elsewhere.

https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-nouveau/+bug/1698450
https://bugzilla.redhat.com/show_bug.cgi?id=1449961

Comment 3 Alan Matsuoka 2017-09-26 20:57:18 UTC
bash-4.2$ cat backtrace 
WARNING: CPU: 16 PID: 4286 at drivers/gpu/drm/nouveau/nouveau_bo.c:1212 nouveau_bo_move_ntfy+0xb8/0xc0 [nouveau]
Modules linked in: mvfs(OE) nfsv3 nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter binfmt_misc dm_mirror dm_region_hash dm_log dm_mod sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi crc32_pclmul ghash_clmulni_intel snd_hda_intel aesni_intel dcdbas lrw gf128mul glue_helper ablk_helper cryptd snd_hda_codec iTCO_wdt sg snd_hda_core mei_wdt snd_hwdep iTCO_vendor_support pcspkr snd_seq snd_seq_device snd_pcm snd_timer snd shpchp soundcore i2c_i801 lpc_ich
mei_me mei nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif crct10dif_generic nouveau video mxm_wmi i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ahci e1000e libahci libata crct10dif_pclmul crct10dif_common crc32c_intel serio_raw ptp i2c_core pps_core wmi
CPU: 16 PID: 4286 Comm: gnome-shell Tainted: G        W  OE  ------------   3.10.0-693.2.1.el7.x86_64 #1
Hardware name: Dell Inc. Precision T5610/0WN7Y6, BIOS A03 09/05/2013
0000000000000000 0000000098821002 ffff88043dbc38c0 ffffffff816a3db1
ffff88043dbc3900 ffffffff810879c8 000004bc506cbc00 ffff88085265cc00
ffff88043dbc3a00 ffff8808506cbc00 ffff8808506cbec8 ffff8808506cbc00
Call Trace:
[<ffffffff816a3db1>] dump_stack+0x19/0x1b
[<ffffffff810879c8>] __warn+0xd8/0x100
[<ffffffff81087b0d>] warn_slowpath_null+0x1d/0x20
[<ffffffffc026f508>] nouveau_bo_move_ntfy+0xb8/0xc0 [nouveau]
[<ffffffffc00b1b0e>] ttm_bo_handle_move_mem+0x22e/0x5a0 [ttm]
[<ffffffffc00b26f3>] ? ttm_bo_mem_space+0x3b3/0x460 [ttm]
[<ffffffff811df73c>] ? kmem_cache_alloc_trace+0x3c/0x200
[<ffffffffc00b1fc2>] ttm_bo_evict+0x142/0x2e0 [ttm]
[<ffffffff81460019>] ? dma_fence_wait_timeout+0x39/0xd0
[<ffffffffc00b22c6>] ttm_mem_evict_first+0x166/0x1e0 [ttm]
[<ffffffffc00b261d>] ttm_bo_mem_space+0x2dd/0x460 [ttm]
[<ffffffffc00b2bea>] ttm_bo_validate+0xda/0x160 [ttm]
[<ffffffffc00b2ea0>] ttm_bo_init+0x230/0x4b0 [ttm]
[<ffffffffc02704ec>] nouveau_bo_new+0x1fc/0x340 [nouveau]
[<ffffffffc026ef70>] ? nv10_bo_put_tile_region+0x80/0x80 [nouveau]
[<ffffffffc0272da2>] nouveau_gem_new+0x82/0x140 [nouveau]
[<ffffffffc0272ee9>] nouveau_gem_ioctl_new+0x89/0x160 [nouveau]
[<ffffffffc010eedc>] drm_ioctl+0x20c/0x4b0 [drm]
[<ffffffffc0272e60>] ? nouveau_gem_new+0x140/0x140 [nouveau]
[<ffffffffc026b404>] nouveau_drm_ioctl+0x54/0xc0 [nouveau]
[<ffffffff8121524d>] do_vfs_ioctl+0x33d/0x540
[<ffffffff812b780f>] ? file_has_perm+0x9f/0xb0
[<ffffffff812154f1>] SyS_ioctl+0xa1/0xc0
[<ffffffff816b5009>] system_call_fastpath+0x16/0x1b

Different backtrace, but quite possibly the same problem.

Comment 4 A. Fernando 2017-10-04 13:46:40 UTC
Similar behavior was seen with NVIDIA Quadro NVS 295, NVS 310, and FX 1800 cards running the nouveau driver on RHEL 7.4 (kernel 3.10.0-693.2.1.el7.x86_64).

Comment 5 Ben Skeggs 2018-04-13 10:56:45 UTC
Do you still see this in 7.5?

Comment 8 Lyude 2018-05-29 18:01:43 UTC
Actually, I'm going to dev_ack+ this bug, because I can reproduce it 100% of the time with Rob's latest DRM backport on the ThinkPad W530 I have here in bss (I can set you up with ssh credentials for it; it's already on the Red Hat intranet). Some other bugs on the RPL depend on this machine working: https://bugzilla.redhat.com/show_bug.cgi?id=1305618.

Comment 11 Rob Clark 2018-06-06 12:23:58 UTC
(In reply to Lyude from comment #8)
> Actually going to dev_ack+ this bug because I'm able to reproduce this 100%
> of the time with rob's latest DRM backport on the ThinkPad W530 I've got
> over here in bss (which I can set you up with ssh credentials for, it's
> already in the red hat intranet). We've got some other bugs that are
> depending on this machine being able to work that are on the RPL:
> https://bugzilla.redhat.com/show_bug.cgi?id=1305618.

I think that is yet a third issue: an inadvertent/bogus WARN_ON splat in the RHEL 7.6 backport ;-)

(The first two splats don't appear to be the same issue; the one in #c3 looks like it could be a userspace issue.)
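Whether two splats are the same issue can be checked from the backtraces themselves: each WARNING header names the file, line, and function whose WARN_ON fired (nouveau_bo.c:137 in nouveau_bo_del_ttm in comment 2, nouveau_bo.c:1212 in nouveau_bo_move_ntfy in comment 3), and the call traces show different entry paths (drm_gem_close_ioctl vs. nouveau_gem_ioctl_new with TTM eviction). Below is a minimal, hypothetical Python helper for comparing two abrt backtrace files. It is not part of abrt or the kernel, and its regexes assume the 3.10-era splat format pasted above.

```python
import re

# Header line of a kernel WARN_ON splat, e.g.
# "WARNING: CPU: 0 PID: 1300 at drivers/gpu/drm/nouveau/nouveau_bo.c:137 nouveau_bo_del_ttm+0x79/0x80 [nouveau]"
WARN_RE = re.compile(
    r"WARNING: CPU: \d+ PID: \d+ at (\S+):(\d+) (\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

# One call-trace frame, e.g. "[<ffffffffc023dfe9>] nouveau_bo_del_ttm+0x79/0x80 [nouveau]".
# Frames marked "? " are unreliable stack entries, so they are skipped below.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\] (\? )?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize(backtrace: str):
    """Return ((file, line, function), [reliable frame names]) for one splat."""
    m = WARN_RE.search(backtrace)
    site = (m.group(1), int(m.group(2)), m.group(3)) if m else None
    frames = [f.group(2) for f in FRAME_RE.finditer(backtrace) if not f.group(1)]
    return site, frames


def compare(a: str, b: str):
    """Return (same WARN_ON site?, frames shared by both traces in a's order)."""
    site_a, frames_a = summarize(a)
    site_b, frames_b = summarize(b)
    in_b = set(frames_b)
    shared = [name for name in frames_a if name in in_b]
    return site_a is not None and site_a == site_b, shared


# Usage with trimmed excerpts of the splats from comments 2 and 3.
C2 = """WARNING: CPU: 0 PID: 1300 at drivers/gpu/drm/nouveau/nouveau_bo.c:137 nouveau_bo_del_ttm+0x79/0x80 [nouveau]
[<ffffffff81087b0d>] warn_slowpath_null+0x1d/0x20
[<ffffffffc023dfe9>] nouveau_bo_del_ttm+0x79/0x80 [nouveau]
[<ffffffffc00ea5b0>] ? drm_gem_handle_create+0x40/0x40 [drm]
[<ffffffffc023a404>] nouveau_drm_ioctl+0x54/0xc0 [nouveau]
"""
C3 = """WARNING: CPU: 16 PID: 4286 at drivers/gpu/drm/nouveau/nouveau_bo.c:1212 nouveau_bo_move_ntfy+0xb8/0xc0 [nouveau]
[<ffffffff81087b0d>] warn_slowpath_null+0x1d/0x20
[<ffffffffc026f508>] nouveau_bo_move_ntfy+0xb8/0xc0 [nouveau]
[<ffffffffc026b404>] nouveau_drm_ioctl+0x54/0xc0 [nouveau]
"""

print(compare(C2, C3))  # (False, ['warn_slowpath_null', 'nouveau_drm_ioctl'])
```

Here the only shared frames are the generic WARN machinery and the nouveau ioctl entry point, which is consistent with the two splats coming from different WARN_ONs rather than one bug.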

Comment 14 Rob Clark 2018-11-28 16:24:17 UTC
(In reply to Ben Skeggs from comment #5)
> Do you still see this in 7.5?

Reinstating the needinfo, which was lost during the #c8 to #c11 confusion about an unrelated bug.

Comment 17 Chris Williams 2020-11-11 21:55:53 UTC
Red Hat Enterprise Linux 7 shipped its final minor release on September 29th, 2020. 7.9 was the last minor release scheduled for RHEL 7.
From initial triage, it does not appear that the remaining Bugzillas meet the inclusion criteria for Maintenance Support 2 Phase, so they will now be closed.

From the RHEL life cycle page:
https://access.redhat.com/support/policy/updates/errata#Maintenance_Support_2_Phase
"During Maintenance Support 2 Phase for Red Hat Enterprise Linux version 7, Red Hat defined Critical and Important impact Security Advisories (RHSAs) and selected (at Red Hat discretion) Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available."

If this BZ was closed in error and meets the above criteria, please reopen it, flag it for 7.9.z, provide suitable business and technical justifications, and follow the process for Accelerated Fixes:
https://source.redhat.com/groups/public/pnt-cxno/pnt_customer_experience_and_operations_wiki/support_delivery_accelerated_fix_release_handbook  

Feature Requests can be reopened and moved to RHEL 8 if the desired functionality is not already present in the product.

Please reach out to the applicable Product Experience Engineer[0] if you have any questions or concerns.  

[0] https://bugzilla.redhat.com/page.cgi?id=agile_component_mapping.html&product=Red+Hat+Enterprise+Linux+7

