Bug 1957758

Summary: [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
Product: Red Hat Enterprise Linux 9 Reporter: Michal Odehnal <modehnal>
Component: kernel Assignee: Lyude <lyude>
kernel sub component: Graphics QA Contact: Desktop QE <desktop-qa-list>
Status: CLOSED CURRENTRELEASE Docs Contact:
Severity: high    
Priority: unspecified CC: airlied, coolmathgamesx, csoriano, dgilbert, kraxel, lyude, mkrajnak, philipp, pvlasin, rduda, wadehamptoniv
Version: 9.0 Keywords: Regression, TestBlocker, Triaged
Target Milestone: beta   
Target Release: ---   
Hardware: x86_64   
OS: All   
Last Closed: 2021-09-10 09:49:28 UTC Type: Bug
Attachments:
Description Flags
dmesg none

Description Michal Odehnal 2021-05-06 12:20:13 UTC
Description of problem:
On RHEL 9 under Wayland, the session freezes from time to time and the journal shows kernel error messages.

Version-Release number of selected component (if applicable):
kernel-core-5.12.0-1.el9.x86_64
kernel-modules-5.12.0-1.el9.x86_64
kernel-tools-libs-5.12.0-1.el9.x86_64
kernel-tools-5.12.0-1.el9.x86_64
kernel-5.12.0-1.el9.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Log in to session using Wayland.
2. Use the session for a while.

Actual results:
The UI freezes and the journal fills with kernel error messages.

Expected results:
No UI freeze and no journal entries.

Additional info:
kernel: [drm:qxl_release_from_id_locked [qxl]] *ERROR* failed to find id in release_idr
kernel: f 4026531848#538: failed to wait on release 8 after spincount 301
kernel: f 4026531848#538: failed to wait on release 8 after spincount 301
kernel: f 4026531848#538: failed to wait on release 8 after spincount 301
kernel: f 4026531848#538: failed to wait on release 8 after spincount 301
kernel: f 4026531848#538: failed to wait on release 8 after spincount 301
kernel: f 4026531848#538: failed to wait on release 8 after spincount 301
kernel: f 4026531848#538: failed to wait on release 8 after spincount 301
kernel: [TTM] Buffer eviction failed
kernel: qxl 0000:00:01.0: object_init failed for (3149824, 0x00000001)
kernel: [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO

I can also reproduce this on Fedora 35.

After installing the RHEL 8.2 kernel (4.18.0-193.el8), I no longer saw these log messages and did not notice any freezes.

Comment 1 Radek Duda 2021-05-07 09:43:13 UTC
Reproduced with
xorg-x11-drv-qxl-0.1.5-21.el9.x86_64
kernel-5.12.0-1.el9.x86_64

VM guest is unusable after a while.

Comment 2 Radek Duda 2021-05-07 09:49:21 UTC
Created attachment 1780645 [details]
dmesg

Comment 3 Radek Duda 2021-05-07 12:26:03 UTC
This is not reproducible without spice graphics in guest qemu cmd-line

Comment 4 Radek Duda 2021-05-07 12:57:46 UTC
(In reply to Radek Duda from comment #3)
> This is not reproducible without spice graphics in guest qemu cmd-line

Not true; I can now reproduce it without SPICE as well.

Comment 5 Gerd Hoffmann 2021-05-10 12:12:34 UTC
Fix (landed upstream in 5.13-rc1):

commit 4fff19ae427548d8c37260c975a4b20d3c040ec6
Author: Gerd Hoffmann <kraxel>
Date:   Wed Feb 17 13:32:05 2021 +0100

    drm/qxl: use ttm bo priorities
    
    Allow to set priorities for buffer objects.  Use priority 1 for surface
    and cursor command releases.  Use priority 0 for drawing command
    releases.  That way the short-living drawing commands are first in line
    when it comes to eviction, making it *much* less likely that
    ttm_bo_mem_force_space() picks something which can't be evicted and
    throws an error after waiting a while without success.
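
To illustrate the idea in the commit message, here is a toy model in Python. It is only a sketch under simplifying assumptions and is not the kernel's TTM code: the `ToyEvictor` class, the `releasable` flag, and the `demo` helper are hypothetical names invented for this example. It assumes the evictor always tries the oldest buffer in the lowest-numbered non-empty priority list and gives up if that buffer cannot be released.

```python
from collections import deque


class ToyEvictor:
    """Toy stand-in for per-priority LRU lists (hypothetical, not TTM)."""

    def __init__(self, num_priorities):
        # One FIFO queue per priority; index 0 is evicted first.
        self.lru = [deque() for _ in range(num_priorities)]

    def add(self, name, priority, releasable):
        self.lru[priority].append((name, releasable))

    def evict_one(self):
        """Try the oldest buffer of the lowest non-empty priority list."""
        for queue in self.lru:
            if queue:
                name, releasable = queue[0]
                if not releasable:
                    # Picked something that cannot be evicted: failure.
                    return name, False
                queue.popleft()
                return name, True
        return None, False


def demo(use_priorities):
    evictor = ToyEvictor(2 if use_priorities else 1)
    # A long-lived surface release that cannot be evicted right now.
    evictor.add("surface", 1 if use_priorities else 0, releasable=False)
    # A short-lived drawing command release.
    evictor.add("draw-cmd", 0, releasable=True)
    return evictor.evict_one()


print(demo(use_priorities=False))
print(demo(use_priorities=True))
```

With a single list, the model picks the long-lived surface first and fails; with two priorities, the drawing command is first in line and eviction succeeds.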

Comment 6 Gerd Hoffmann 2021-05-10 12:40:40 UTC
> Fix (landed upstream in 5.13-rc1):

note/patch sent to stable@, so the fix should land in 5.{10,11,12} stable branches soon.

Comment 7 Michal Odehnal 2021-05-11 13:32:13 UTC
In latest RHEL-9 compose I am now seeing similar issue under Xorg, may I assume it is related or is this a different bug?

[ 2000.095786] f 4026531864#17936: failed to wait on release 24 after spincount 301
[ 2000.419825] f 4026531864#17936: failed to wait on release 24 after spincount 301
[ 2000.490606] [TTM] Buffer eviction failed
[ 2000.490882] qxl 0000:00:01.0: object_init failed for (262144, 0x00000001)
[ 2000.491346] [drm:qxl_gem_object_create [qxl]] *ERROR* Failed to allocate GEM object (258580, 1, 4096, -12)
[ 2000.491983] [drm:qxl_alloc_ioctl [qxl]] *ERROR* qxl_alloc_ioctl: failed to create gem ret=-12

Comment 8 Gerd Hoffmann 2021-05-11 14:41:12 UTC
(In reply to Michal Odehnal from comment #7)
> In latest RHEL-9 compose I am now seeing similar issue under Xorg, may I
> assume it is related or is this a different bug?
> 
> [ 2000.095786] f 4026531864#17936: failed to wait on release 24 after
> spincount 301
> [ 2000.419825] f 4026531864#17936: failed to wait on release 24 after
> spincount 301
> [ 2000.490606] [TTM] Buffer eviction failed
> [ 2000.490882] qxl 0000:00:01.0: object_init failed for (262144, 0x00000001)
> [ 2000.491346] [drm:qxl_gem_object_create [qxl]] *ERROR* Failed to allocate
> GEM object (258580, 1, 4096, -12)
> [ 2000.491983] [drm:qxl_alloc_ioctl [qxl]] *ERROR* qxl_alloc_ioctl: failed
> to create gem ret=-12

Most likely the same thing; this is a kernel issue affecting both Xorg and Wayland.

Comment 9 Gerd Hoffmann 2021-05-14 13:36:25 UTC
> note/patch sent to stable@, so the fix should land in 5.{10,11,12} stable
> branches soon.

v5.12.4 has the fix now (v5.11.21 too, v5.10.x should follow shortly).

Comment 10 Carlos Soriano 2021-05-18 08:58:51 UTC
I believe this will be part of the stable backport. Lyude, make sure you include this one.

Comment 11 Carlos Soriano 2021-05-18 09:13:49 UTC
My bad, this is RHEL 9. It should be fixed by any future kernel rebase that includes one of those kernel versions; I don't know when the next rebase is planned.

From the graphics team's perspective, we are not currently backporting fixes to the RHEL 9 beta kernel due to capacity limitations. We are relying on the regular kernel rebases.

Comment 13 Dr. David Alan Gilbert 2021-08-19 15:19:27 UTC
hmm, I'm seeing something similar in 5.14.0-0.rc4.35.el9 guest, just at the text console; just with lots of stuff scrolling past it'll pause
and get a Buffer eviction failed.
object_init failed for (3149824, 0x0....1)
qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO

Comment 16 Lyude 2021-09-07 20:49:06 UTC
Is this bug still being seen on the latest RHEL9 kernels?

Comment 17 Gerd Hoffmann 2021-09-08 05:10:16 UTC
(In reply to Lyude from comment #16)
> Is this bug still being seen on the latest RHEL9 kernels?

Seems there are some rare cases where the priority (comment #5) doesn't help.
Test case: run "for i in $(seq 1 1000); do dmesg; done" on fbcon.
Hangs now and then for a short time, logging an eviction and allocation
failure (see also comment #13):

[  582.893166] [TTM] Buffer eviction failed
[  582.899206] qxl 0000:00:01.0: object_init failed for (3149824, 0x00000001)
[  582.900706] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO

The good news: Error handling is solid now, can't see any bad effects on driver stability.
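
For anyone re-running the test case above, a small Python sketch like the following could tally the relevant messages in captured kernel log text. The function name `count_qxl_errors`, the pattern labels, and the regular expressions are assumptions made for this example; they are derived only from the messages quoted in this bug and are not an official list of qxl or TTM errors.

```python
import re
from collections import Counter

# Patterns taken from the log excerpts quoted in this bug report.
PATTERNS = {
    "eviction_failed": re.compile(r"\[TTM\] Buffer eviction failed"),
    "object_init_failed": re.compile(r"qxl .*object_init failed"),
    "vram_alloc_failed": re.compile(r"failed to allocate VRAM BO"),
    "release_wait_failed": re.compile(r"failed to wait on release \d+"),
}


def count_qxl_errors(log_text):
    """Count how many lines of log_text match each known message."""
    counts = Counter()
    for line in log_text.splitlines():
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts


SAMPLE = """\
[  582.893166] [TTM] Buffer eviction failed
[  582.899206] qxl 0000:00:01.0: object_init failed for (3149824, 0x00000001)
[  582.900706] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
"""

print(dict(count_qxl_errors(SAMPLE)))
```

To use it on a live guest, the output of `dmesg` could be captured (for example with `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`) and passed to the function.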

Comment 18 Michal Odehnal 2021-09-08 06:41:57 UTC
I have not seen this for quite a while so I would consider this solved from my point of view.

Comment 20 Radek Duda 2021-09-10 09:49:28 UTC
I did not hit the bug using the latest RHEL 9. Based on my findings and comment #18, I am closing this.

Comment 21 Philip Prindeville 2021-11-08 22:51:08 UTC
(In reply to Dr. David Alan Gilbert from comment #13)
> hmm, I'm seeing something similar in 5.14.0-0.rc4.35.el9 guest, just at the
> text console; just with lots of stuff scrolling past it'll pause
> and get a Buffer eviction failed.
> object_init failed for (3149824, 0x0....1)
> qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO

Yup, I'm seeing this on F34 guest running in Qemu/KVM on a CentOS 8 host.

5.14.16-201.fc34.x86_64 is my kernel.

Updated as of last Friday to latest on F34 repo.

Also on a text-only console for a server.

Please reopen.

Comment 22 Philip Prindeville 2021-11-08 22:53:17 UTC
(In reply to Philip Prindeville from comment #21)
> (In reply to Dr. David Alan Gilbert from comment #13)
> > hmm, I'm seeing something similar in 5.14.0-0.rc4.35.el9 guest, just at the
> > text console; just with lots of stuff scrolling past it'll pause
> > and get a Buffer eviction failed.
> > object_init failed for (3149824, 0x0....1)
> > qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
> 
> Yup, I'm seeing this on F34 guest running in Qemu/KVM on a CentOS 8 host.
> 
> 5.14.16-201.fc34.x86_64 is my kernel.
> 
> Updated as of last Friday to latest on F34 repo.
> 
> Also on a text-only console for a server.
> 
> Please reopen.

I should add, I'm *not* seeing this line anywhere in dmesg:

[TTM] Buffer eviction failed

Comment 23 Wade Hampton 2021-11-24 19:33:29 UTC
I am seeing this message on an AlmaLinux 8.5 guest running on an AlmaLinux 8.5 host, both fully updated (as of a few days ago).
