Bug 1463157 - [GK106] GTX 660 freezes computer shortly after login
Summary: [GK106] GTX 660 freezes computer shortly after login
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: xorg-x11-drv-nouveau
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Ben Skeggs
QA Contact: Desktop QE
URL:
Whiteboard:
Depends On:
Blocks: 1547138
 
Reported: 2017-06-20 09:10 UTC by Tomas Pelka
Modified: 2021-06-10 12:27 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-11 21:47:24 UTC
Target Upstream Version:
Embargoed:



Description Tomas Pelka 2017-06-20 09:10:29 UTC
Description of problem:
I can see the following in the kernel log:

Jun 20 11:03:05 localhost.localdomain kernel: [drm:drm_atomic_helper_swap_state [drm_kms_helper]] *ERROR* [CRTC:40:head-1] hw_done timed out
Jun 20 11:03:15 localhost.localdomain kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:40:head-1] hw_done timed out
Jun 20 11:03:25 localhost.localdomain kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:40:head-1] flip_done timed out
Jun 20 11:03:35 localhost.localdomain kernel: [drm:drm_atomic_helper_swap_state [drm_kms_helper]] *ERROR* [CRTC:40:head-1] hw_done timed out
Jun 20 11:03:45 localhost.localdomain kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:40:head-1] hw_done timed out
Jun 20 11:03:55 localhost.localdomain kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:40:head-1] flip_done timed out
Jun 20 11:04:05 localhost.localdomain kernel: [drm:drm_atomic_helper_swap_state [drm_kms_helper]] *ERROR* [CRTC:40:head-1] hw_done timed out
Jun 20 11:04:15 localhost.localdomain kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:40:head-1] hw_done timed out
Jun 20 11:04:25 localhost.localdomain kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:40:head-1] flip_done timed out
Jun 20 11:04:35 localhost.localdomain kernel: [drm:drm_atomic_helper_swap_state [drm_kms_helper]] *ERROR* [CRTC:40:head-1] hw_done timed out
Jun 20 11:04:45 localhost.localdomain kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:40:head-1] hw_done timed out
Jun 20 11:04:55 localhost.localdomain kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:40:head-1] flip_done timed out
Jun 20 11:05:05 localhost.localdomain kernel: [drm:drm_atomic_helper_swap_state [drm_kms_helper]] *ERROR* [CRTC:40:head-1] hw_done timed out
Jun 20 11:05:15 localhost.localdomain kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:40:head-1] hw_done timed out
Jun 20 11:05:25 localhost.localdomain kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:40:head-1] flip_done timed out
Jun 20 11:05:31 localhost.localdomain kernel: INFO: task kworker/u16:3:339 blocked for more than 120 seconds.
Jun 20 11:05:31 localhost.localdomain kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

Version-Release number of selected component (if applicable):
kernel-3.10.0-680.el7.x86_64
xorg-x11-server-Xorg-1.19.3-7.el7.x86_64
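For reference, installed versions like these can be confirmed with rpm, e.g.:

  rpm -q kernel xorg-x11-server-Xorg xorg-x11-drv-nouveau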


How reproducible:
60%

Steps to Reproduce:
1. Boot the computer.

Actual results:
see above

Expected results:
no freeze

Additional info:
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK106 [GeForce GTX 660] [10de:11c0] (rev a1)
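That device line is lspci output; on an affected machine the same information can be gathered with, for example:

  # List VGA controllers with numeric vendor:device IDs.
  lspci -nn | grep -i vga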

Comment 1 Tomas Pelka 2017-06-20 09:11:26 UTC
This freeze is actually also followed by a crash:

Jun 20 11:05:31 localhost.localdomain kernel: kworker/u16:3   D 0000000000000246     0   339      2 0x00000000
Jun 20 11:05:31 localhost.localdomain kernel: Workqueue: events_unbound nv50_disp_atomic_commit_work [nouveau]
Jun 20 11:05:31 localhost.localdomain kernel:  ffff880506acfc00 0000000000000046 ffff880506ad0000 ffff880506acffd8
Jun 20 11:05:31 localhost.localdomain kernel:  ffff880506acffd8 ffff880506acffd8 ffff880506ad0000 0000000000000000
Jun 20 11:05:31 localhost.localdomain kernel:  ffff880506ad0000 7fffffffffffffff ffff8804eeafe540 0000000000000246
Jun 20 11:05:31 localhost.localdomain kernel: Call Trace:
Jun 20 11:05:31 localhost.localdomain kernel:  [<ffffffff816a6f09>] schedule+0x29/0x70
Jun 20 11:05:31 localhost.localdomain kernel:  [<ffffffff816a4a19>] schedule_timeout+0x239/0x2c0
Jun 20 11:05:31 localhost.localdomain kernel:  [<ffffffff811de381>] ? __slab_free+0x81/0x2f0
Jun 20 11:05:31 localhost.localdomain kernel:  [<ffffffff8145ec9f>] dma_fence_default_wait+0x1cf/0x230
Jun 20 11:05:31 localhost.localdomain kernel:  [<ffffffff8145e9a0>] ? dma_fence_free+0x20/0x20
Jun 20 11:05:31 localhost.localdomain kernel:  [<ffffffff8145e889>] dma_fence_wait_timeout+0x39/0xd0
Jun 20 11:05:31 localhost.localdomain kernel:  [<ffffffffc018cc0d>] drm_atomic_helper_wait_for_fences+0x7d/0x100 [drm_kms_helper]
Jun 20 11:05:31 localhost.localdomain kernel:  [<ffffffffc028e095>] nv50_disp_atomic_commit_tail+0x55/0x1180 [nouveau]
Jun 20 11:05:31 localhost.localdomain kernel:  [<ffffffffc028f1d2>] nv50_disp_atomic_commit_work+0x12/0x20 [nouveau]
Jun 20 11:05:31 localhost.localdomain kernel:  [<ffffffff810a87fa>] process_one_work+0x17a/0x440
Jun 20 11:05:31 localhost.localdomain kernel:  [<ffffffff810a94c6>] worker_thread+0x126/0x3c0
Jun 20 11:05:31 localhost.localdomain kernel:  [<ffffffff810a93a0>] ? manage_workers.isra.24+0x2a0/0x2a0
Jun 20 11:05:31 localhost.localdomain kernel:  [<ffffffff810b096f>] kthread+0xcf/0xe0
Jun 20 11:05:31 localhost.localdomain kernel:  [<ffffffff810b08a0>] ? insert_kthread_work+0x40/0x40
Jun 20 11:05:31 localhost.localdomain kernel:  [<ffffffff816b2958>] ret_from_fork+0x58/0x90
Jun 20 11:05:31 localhost.localdomain kernel:  [<ffffffff810b08a0>] ? insert_kthread_work+0x40/0x40

Comment 2 Tomas Pelka 2017-06-20 09:15:42 UTC
One more thing: it seems I can reproduce this 100% by logging in to a gnome-session and playing a video (the Big Buck Bunny trailer, ogv) in totem.

Kernel shows: 
nouveau 0000:01:00.0: gr: TRAP ch 2 [023fad6000 X[1330]]
Jun 20 11:14:00 localhost.localdomain kernel: nouveau 0000:01:00.0: gr: GPC0/PROP trap: 00000080 [ZETA_STORAGE_TYPE_MISMATCH] x = 80, y = 96, format = 0, storage type = fe
Jun 20 11:14:00 localhost.localdomain kernel: nouveau 0000:01:00.0: gr: TRAP ch 2 [023fad6000 X[1330]]
Jun 20 11:14:00 localhost.localdomain kernel: nouveau 0000:01:00.0: gr: GPC0/PROP trap: 00000080 [ZETA_STORAGE_TYPE_MISMATCH] x = 160, y = 320, format = 0, storage type = fe
Jun 20 11:14:04 localhost.localdomain kernel: nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
Jun 20 11:14:04 localhost.localdomain kernel: nouveau 0000:01:00.0: fifo: gr engine fault on channel 4, recovering...


and the desktop freezes
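
After a freeze, the nouveau/drm errors above can be filtered out of the kernel log with something like:

  # Kernel messages from the current boot, limited to nouveau/drm lines.
  journalctl -k -b | grep -E 'nouveau|drm'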

Comment 3 Tomas Pelka 2017-06-20 09:23:12 UTC
I was also able to trigger this issue with LibreOffice presentation mode.

Comment 4 Tomas Pelka 2017-06-20 14:52:52 UTC
I can reproduce it on

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK110 [GeForce GTX 780] [10de:1004] (rev a1)

too

Comment 7 Ben Skeggs 2018-04-13 10:55:53 UTC
Tomas,

Can you reproduce this on 7.5?

Thanks,
Ben.

Comment 8 Tomas Pelka 2018-04-13 11:12:09 UTC
(In reply to Ben Skeggs from comment #7)
> Tomas,
> 
> Can you reproduce this on 7.5?
> 
> Thanks,
> Ben.

Tomas, please have a look.

Thanks
-Tom

Comment 9 Tomas Hudziec 2018-04-17 14:10:06 UTC
I can reproduce it on 7.5 with kernel-3.10.0-862.el7.x86_64. The desktop froze while playing a video, installing libreoffice-impress, and moving the Impress window.

Kernel call trace from journalctl:
Apr 17 14:04:38 localhost.localdomain kernel: INFO: task kworker/u16:5:343 blocked for more than 120 seconds.
Apr 17 14:04:38 localhost.localdomain kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 17 14:04:38 localhost.localdomain kernel: kworker/u16:5   D ffff943407281fa0     0   343      2 0x00000000
Apr 17 14:04:38 localhost.localdomain kernel: Workqueue: events_unbound nv50_disp_atomic_commit_work [nouveau]
Apr 17 14:04:38 localhost.localdomain kernel: Call Trace:
Apr 17 14:04:38 localhost.localdomain kernel:  [<ffffffffc02fd307>] ? nvkm_client_notify_get+0x27/0x40 [nouveau]
Apr 17 14:04:38 localhost.localdomain kernel:  [<ffffffffc02feb5a>] ? nvkm_ioctl_ntfy_get+0x6a/0xc0 [nouveau]
Apr 17 14:04:38 localhost.localdomain kernel:  [<ffffffff86512f49>] schedule+0x29/0x70
Apr 17 14:04:38 localhost.localdomain kernel:  [<ffffffff865108b9>] schedule_timeout+0x239/0x2c0
Apr 17 14:04:38 localhost.localdomain kernel:  [<ffffffffc03af912>] ? nvkm_client_ioctl+0x12/0x20 [nouveau]
Apr 17 14:04:38 localhost.localdomain kernel:  [<ffffffffc02fc048>] ? nvif_object_ioctl+0x48/0x60 [nouveau]
Apr 17 14:04:38 localhost.localdomain kernel:  [<ffffffffc03b266c>] ? nouveau_bo_rd32+0x2c/0x30 [nouveau]
Apr 17 14:04:38 localhost.localdomain kernel:  [<ffffffffc03cea2e>] ? nv84_fence_read+0x2e/0x30 [nouveau]
Apr 17 14:04:38 localhost.localdomain kernel:  [<ffffffffc03ccbfc>] ? nouveau_fence_no_signaling+0x2c/0x90 [nouveau]
Apr 17 14:04:38 localhost.localdomain kernel:  [<ffffffff86295adc>] dma_fence_default_wait+0x1cc/0x220
Apr 17 14:04:38 localhost.localdomain kernel:  [<ffffffff862956a0>] ? dma_fence_release+0xa0/0xa0
Apr 17 14:04:38 localhost.localdomain kernel:  [<ffffffff862954df>] dma_fence_wait_timeout+0x3f/0xe0
Apr 17 14:04:38 localhost.localdomain kernel:  [<ffffffffc02dc869>] drm_atomic_helper_wait_for_fences+0x69/0xe0 [drm_kms_helper]
Apr 17 14:04:38 localhost.localdomain kernel:  [<ffffffffc03c27b5>] nv50_disp_atomic_commit_tail+0x55/0x1200 [nouveau]
Apr 17 14:04:38 localhost.localdomain kernel:  [<ffffffff8651291c>] ? __schedule+0x41c/0xa20
Apr 17 14:04:38 localhost.localdomain kernel:  [<ffffffffc03c3972>] nv50_disp_atomic_commit_work+0x12/0x20 [nouveau]
Apr 17 14:04:38 localhost.localdomain kernel:  [<ffffffff85eb2dff>] process_one_work+0x17f/0x440
Apr 17 14:04:38 localhost.localdomain kernel:  [<ffffffff85eb3ac6>] worker_thread+0x126/0x3c0
Apr 17 14:04:38 localhost.localdomain kernel:  [<ffffffff85eb39a0>] ? manage_workers.isra.24+0x2a0/0x2a0
Apr 17 14:04:38 localhost.localdomain kernel:  [<ffffffff85ebae31>] kthread+0xd1/0xe0
Apr 17 14:04:38 localhost.localdomain kernel:  [<ffffffff85ebad60>] ? insert_kthread_work+0x40/0x40
Apr 17 14:04:38 localhost.localdomain kernel:  [<ffffffff8651f637>] ret_from_fork_nospec_begin+0x21/0x21
Apr 17 14:04:38 localhost.localdomain kernel:  [<ffffffff85ebad60>] ? insert_kthread_work+0x40/0x40

Comment 16 Chris Williams 2020-11-11 21:47:24 UTC
Red Hat Enterprise Linux 7 shipped its final minor release on September 29th, 2020. 7.9 was the last minor release scheduled for RHEL 7.
From initial triage, it does not appear that the remaining Bugzillas meet the inclusion criteria for Maintenance Support 2 Phase, so they will now be closed.

From the RHEL life cycle page:
https://access.redhat.com/support/policy/updates/errata#Maintenance_Support_2_Phase
"During Maintenance Support 2 Phase for Red Hat Enterprise Linux version 7,Red Hat defined Critical and Important impact Security Advisories (RHSAs) and selected (at Red Hat discretion) Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available."

If this BZ was closed in error and meets the above criteria, please re-open it, flag it for 7.9.z, provide suitable business and technical justifications, and follow the process for Accelerated Fixes:
https://source.redhat.com/groups/public/pnt-cxno/pnt_customer_experience_and_operations_wiki/support_delivery_accelerated_fix_release_handbook  

Feature Requests can be re-opened and moved to RHEL 8 if the desired functionality is not already present in the product.

Please reach out to the applicable Product Experience Engineer[0] if you have any questions or concerns.  

[0] https://bugzilla.redhat.com/page.cgi?id=agile_component_mapping.html&product=Red+Hat+Enterprise+Linux+7

