Bug 714069 - huge system load with nvidia graphic card
Summary: huge system load with nvidia graphic card
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: xorg-x11-drv-nouveau
Version: 6.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: beta
Target Release: ---
Assignee: Ben Skeggs
QA Contact: Desktop QE
URL:
Whiteboard:
Depends On:
Blocks: 842499 1318321 1360926 1438054
 
Reported: 2011-06-17 10:09 UTC by Levente Farkas
Modified: 2017-05-22 13:42 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-05-22 13:42:04 UTC
Target Upstream Version:


Attachments
an example screenshot (173.36 KB, image/png)
2012-05-16 11:12 UTC, Levente Farkas

Description Levente Farkas 2011-06-17 10:09:39 UTC
on rhel-6.0 and 6.1 we found a problem on systems with an nvidia graphics card, using this simple test command:
gst-launch videotestsrc ! xvimagesink videotestsrc ! xvimagesink
which opens 2 test video windows.
with this command the cpu load goes to 100%.
we also tested the following:

the problem DOES exist on:
- rhel-6.1 x86_64 nouveau
- rhel-6.1 x86_64 nvidia's proprietary driver

the problem DOES NOT exist on:
- rhel-6.1 x86_64 intel onboard graphics card
- rhel-6.1 x86_64 radeon graphics card
- fedora-15 x86_64 nouveau
- fedora-15 x86_64 nvidia's proprietary driver
- ubuntu-10.04 x86_64 nouveau
- ubuntu-10.04 x86_64 nvidia's proprietary driver

so it seems rhel-6 with an nvidia card is the only combination where something strange happens.

Comment 2 Ben Skeggs 2011-06-17 10:42:41 UTC
What chipset are we talking about here?  Your kernel log would be useful too.

Comment 3 Levente Farkas 2011-06-17 11:07:16 UTC
we tested with nvidia 8400GS, 8500GT and 6200TC (imho all nvidia cards are affected).
which part of the kernel log do you need?

Comment 4 Ben Skeggs 2011-06-17 11:57:34 UTC
Interesting..  I can't think of any good reason why there'd be a difference on any of those chipsets to what's in Fedora 15.

"dmesg | grep nouveau" output from any/all of the affected ones would be interesting to see.

Aside from the distro/gpu combos, there were no other differences in the test systems?

Comment 5 Levente Farkas 2011-06-17 19:57:58 UTC
we switched different video cards in the same machine, so the systems are otherwise exactly the same. maybe the xorg on f15 is newer...

anyway the log:

# dmesg | grep nouveau
nouveau 0000:01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
nouveau 0000:01:00.0: setting latency timer to 64
[drm] nouveau 0000:01:00.0: Detected an NV50 generation card (0x298200a2)
[drm] nouveau 0000:01:00.0: Attempting to load BIOS image from PRAMIN
[drm] nouveau 0000:01:00.0: ... appears to be valid
[drm] nouveau 0000:01:00.0: BIT BIOS found
[drm] nouveau 0000:01:00.0: Bios version 62.98.18.00
[drm] nouveau 0000:01:00.0: TMDS table revision 2.0 not currently supported
[drm] nouveau 0000:01:00.0: Found Display Configuration Block version 4.0
[drm] nouveau 0000:01:00.0: Raw DCB entry 0: 02000300 00000028
[drm] nouveau 0000:01:00.0: Raw DCB entry 1: 01000302 00020030
[drm] nouveau 0000:01:00.0: Raw DCB entry 2: 04022320 00000028
[drm] nouveau 0000:01:00.0: Raw DCB entry 3: 02011312 00c20090
[drm] nouveau 0000:01:00.0: DCB connector table: VHER 0x40 5 16 4
[drm] nouveau 0000:01:00.0:   0: 0x00001030: type 0x30 idx 0 tag 0x07
[drm] nouveau 0000:01:00.0:   1: 0x00002161: type 0x61 idx 1 tag 0x08
[drm] nouveau 0000:01:00.0:   2: 0x00000300: type 0x00 idx 2 tag 0xff
[drm] nouveau 0000:01:00.0:   6: 0x00000462: type 0x62 idx 6 tag 0xff
[drm] nouveau 0000:01:00.0: unknown type, using 0xff
[drm] nouveau 0000:01:00.0: Parsing VBIOS init table 0 at offset 0xD062
[drm] nouveau 0000:01:00.0: Parsing VBIOS init table 1 at offset 0xD3D6
[drm] nouveau 0000:01:00.0: Parsing VBIOS init table 2 at offset 0xDBD0
[drm] nouveau 0000:01:00.0: Parsing VBIOS init table 3 at offset 0xDCC2
[drm] nouveau 0000:01:00.0: Parsing VBIOS init table 4 at offset 0xDE7F
[drm] nouveau 0000:01:00.0: Parsing VBIOS init table at offset 0xDEE4
[drm] nouveau 0000:01:00.0: 0xDEE4: Condition still not met after 20ms, skipping following opcodes
[drm] nouveau 0000:01:00.0: 0xC086: parsing output script 0
[drm] nouveau 0000:01:00.0: 0xC086: parsing output script 0
[drm] nouveau 0000:01:00.0: Detected 512MiB VRAM
[drm] nouveau 0000:01:00.0: 512 MiB GART (aperture)
[drm] nouveau 0000:01:00.0: gpio tag 0xff not found
[drm] nouveau 0000:01:00.0: Allocating FIFO number 1
[drm] nouveau 0000:01:00.0: nouveau_channel_alloc: initialised FIFO 1
[drm] nouveau 0000:01:00.0: allocated 1024x768 fb: 0x40250000, bo ffff88014729fa00
fbcon: nouveaufb (fb0) is primary device
[drm] nouveau 0000:01:00.0: 0x111F: parsing clock script 0
fb0: nouveaufb frame buffer device
[drm] Initialized nouveau 0.0.16 20090420 for 0000:01:00.0 on minor 0
nouveau 0000:02:00.0: enabling device (0000 -> 0003)
nouveau 0000:02:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
nouveau 0000:02:00.0: setting latency timer to 64
[drm] nouveau 0000:02:00.0: Detected an NV50 generation card (0x298200a2)
[drm] nouveau 0000:02:00.0: Attempting to load BIOS image from PRAMIN
[drm] nouveau 0000:02:00.0: ... BIOS signature not found
[drm] nouveau 0000:02:00.0: Attempting to load BIOS image from PROM
[drm] nouveau 0000:02:00.0: ... appears to be valid
[drm] nouveau 0000:02:00.0: BIT BIOS found
[drm] nouveau 0000:02:00.0: Bios version 62.98.18.00
[drm] nouveau 0000:02:00.0: TMDS table revision 2.0 not currently supported
[drm] nouveau 0000:02:00.0: Found Display Configuration Block version 4.0
[drm] nouveau 0000:02:00.0: Raw DCB entry 0: 02000300 00000028
[drm] nouveau 0000:02:00.0: Raw DCB entry 1: 01000302 00020030
[drm] nouveau 0000:02:00.0: Raw DCB entry 2: 04022320 00000028
[drm] nouveau 0000:02:00.0: Raw DCB entry 3: 02011312 00c20090
[drm] nouveau 0000:02:00.0: DCB connector table: VHER 0x40 5 16 4
[drm] nouveau 0000:02:00.0:   0: 0x00001030: type 0x30 idx 0 tag 0x07
[drm] nouveau 0000:02:00.0:   1: 0x00002161: type 0x61 idx 1 tag 0x08
[drm] nouveau 0000:02:00.0:   2: 0x00000300: type 0x00 idx 2 tag 0xff
[drm] nouveau 0000:02:00.0:   6: 0x00000462: type 0x62 idx 6 tag 0xff
[drm] nouveau 0000:02:00.0: unknown type, using 0xff
[drm] nouveau 0000:02:00.0: Adaptor not initialised
[drm] nouveau 0000:02:00.0: Running VBIOS init tables
[drm] nouveau 0000:02:00.0: Parsing VBIOS init table 0 at offset 0xD062
[drm] nouveau 0000:02:00.0: Parsing VBIOS init table 1 at offset 0xD3D6
[drm] nouveau 0000:02:00.0: Parsing VBIOS init table 2 at offset 0xDBD0
[drm] nouveau 0000:02:00.0: Parsing VBIOS init table 3 at offset 0xDCC2
[drm] nouveau 0000:02:00.0: Parsing VBIOS init table 4 at offset 0xDE7F
[drm] nouveau 0000:02:00.0: Parsing VBIOS init table at offset 0xDEE4
[drm] nouveau 0000:02:00.0: 0xC086: parsing output script 0
[drm] nouveau 0000:02:00.0: 0xC086: parsing output script 0
[drm] nouveau 0000:02:00.0: Detected 512MiB VRAM
[drm] nouveau 0000:02:00.0: 512 MiB GART (aperture)
[drm] nouveau 0000:02:00.0: gpio tag 0xff not found
[drm] nouveau 0000:02:00.0: Allocating FIFO number 1
[drm] nouveau 0000:02:00.0: nouveau_channel_alloc: initialised FIFO 1
[drm] nouveau 0000:02:00.0: allocated 1024x768 fb: 0x40250000, bo ffff880146b6b800
fb1: nouveaufb frame buffer device
[drm] Initialized nouveau 0.0.16 20090420 for 0000:02:00.0 on minor 1
[drm] nouveau 0000:01:00.0: Allocating FIFO number 2
[drm] nouveau 0000:01:00.0: nouveau_channel_alloc: initialised FIFO 2
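
To answer the chipset question quickly from a log like the one above, the "Detected an NV.. generation card" lines can be pulled out per PCI device. The following is only a sketch; the function name `nouveau_chipsets` is made up for illustration, and it assumes the message format shown in this log (other driver versions may word it differently).

```shell
# Print "<pci address> <generation> <chipset id>" for every nouveau
# detection line read from stdin. Hypothetical helper; the pattern is
# based solely on the log format shown in this comment.
nouveau_chipsets() {
    sed -n 's/.*nouveau \([0-9a-f:.]*\): Detected an \(NV[0-9A-Za-z]*\) generation card (\(0x[0-9a-fA-F]*\)).*/\1 \2 \3/p'
}
```

It would be used as `dmesg | nouveau_chipsets`; lines that do not match the pattern are simply dropped.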

Comment 6 Levente Farkas 2011-06-17 20:00:45 UTC
anyway, you can simply test with the one-line gstreamer command above on any nvidia card on rhel-6.

Comment 7 Levente Farkas 2011-06-23 11:04:59 UTC
did you try this simple command:
gst-launch videotestsrc ! xvimagesink videotestsrc ! xvimagesink
and could you reproduce it?
i don't know whether it's an xorg bug or a kernel bug, but i'm sure it's a bug.

Comment 8 RHEL Program Management 2011-10-07 16:17:18 UTC
Since RHEL 6.2 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 9 Levente Farkas 2011-11-28 09:43:10 UTC
unfortunately it still exists in rhel-6.2 beta, i.e. nvidia cards are unusable with rhel-6 but work on all other distros (fedora, ubuntu) :-(

Comment 10 Levente Farkas 2011-12-13 11:33:36 UTC
imho it's a critical bug (at least for us). currently there is no usable video card on linux which can show more than 16 xv outputs.
ati has only 4
intel has only 16
nvidia has 32 but it's not working (after 2-3 outputs the cpu load goes to 100%). we have already tested it with many more nvidia cards.
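
The per-vendor port counts above can be checked on a given machine by summing the ports that `xvinfo` reports for each Xv adaptor. This is a rough sketch: `count_xv_ports` is a hypothetical helper name, and it assumes `xvinfo` prints one "number of ports: N" line per adaptor, which is the usual layout but is not confirmed anywhere in this report.

```shell
# Sum the "number of ports:" values from xvinfo output read on stdin.
# Assumption: each adaptor section contains exactly one such line.
count_xv_ports() {
    awk '/number of ports:/ { total += $NF } END { print total + 0 }'
}
```

Typical use would be `xvinfo | count_xv_ports`, run inside the X session under test. The result is the total over all adaptors, so a card exposing several adaptors will report more than any single adaptor offers.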

could anyone help us find out why it's working on fedora and ubuntu?
is it a bug in the kernel, xorg-x11-drv-nouveau or some other package?
how can we find the root of the problem?

we're willing to patch any package (kernel, xorg etc.) and build it ourselves, just to find the cause and a patch/fix for it.

thanks in advance.

Comment 11 Levente Farkas 2011-12-14 13:47:03 UTC
we debugged it further and tried the latest kernel from:
http://elrepo.org/tiki/kernel-ml
with this kernel:
http://elrepo.reloumirrors.net/kernel/el6/x86_64/RPMS/kernel-ml-2.6.39-4.1.el6.elrepo.x86_64.rpm
it's still not working:-(
neither with xorg-x11-drv-nouveau (which comes from rhel-6.1/6.2) nor with the latest
NVIDIA-Linux-x86_64-290.10.run.
so it seems to us that it's a generic xorg bug somewhere...

Comment 12 Levente Farkas 2012-02-10 12:41:52 UTC
is there any progress with this serious bug?

Comment 13 Levente Farkas 2012-05-11 09:33:33 UTC
the problem still exists in 6.3 beta:-((

any progress with this problem? since it's working in fedora, the fix could easily be ported to rhel 6.3...

Comment 14 Ben Skeggs 2012-05-16 00:50:54 UTC
I can *not* confirm this issue on at least NV98 (Quadro NVS 295) with the nouveau driver.

Using the gst-launch recipe above I'm seeing system load of ~5%...

Comment 15 Levente Farkas 2012-05-16 09:34:15 UTC
it doesn't happen with only one pipeline, but it does with 2-3...
just test with this command:

gst-launch videotestsrc ! xvimagesink videotestsrc ! xvimagesink videotestsrc ! xvimagesink videotestsrc ! xvimagesink

it's important that all windows are visible on the desktop!!! in this case the above pipeline uses 100% cpu, while the same command on an intel gpu or on fedora 16 uses almost no cpu.

could you retest it?
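
Since the number of simultaneous Xv windows matters here, it may help to generate the pipeline for an arbitrary window count instead of typing it out. A minimal sketch follows; `build_pipeline` is a made-up helper name, and it only repeats the `videotestsrc ! xvimagesink` branch used in this report.

```shell
# Print a gst-launch pipeline description with N independent
# "videotestsrc ! xvimagesink" branches (N defaults to 4).
build_pipeline() {
    n=${1:-4}
    pipeline=""
    i=0
    while [ "$i" -lt "$n" ]; do
        # Append one branch, separated from the previous one by a space.
        pipeline="${pipeline:+$pipeline }videotestsrc ! xvimagesink"
        i=$((i + 1))
    done
    printf '%s\n' "$pipeline"
}
```

It would be run as `gst-launch $(build_pipeline 4)`, with the substitution left unquoted so the shell splits it into separate arguments. The windows still have to be arranged by hand so that none of them overlap.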

Comment 16 Ben Skeggs 2012-05-16 10:45:54 UTC
I used the above command from the original report, which created two Xv windows.  I'll try with a couple more tomorrow morning anyway.

Comment 17 Levente Farkas 2012-05-16 10:53:26 UTC
just to repeat myself: it's important that _ALL_ xv windows are visible on the desktop. if they cover each other then the load drops!

Comment 18 Ben Skeggs 2012-05-16 11:02:59 UTC
Yep, they were uncovered.

Comment 19 Levente Farkas 2012-05-16 11:12:12 UTC
Created attachment 584930 [details]
an example screenshot

and the system becomes totally unusable. if you hide all the windows then everything works again.

Comment 20 Levente Farkas 2012-05-18 19:53:06 UTC
were you able to reproduce it?

Comment 21 Ben Skeggs 2012-05-21 05:48:45 UTC
(In reply to comment #20)
> were you able to reproduce it?

I'm not able to reproduce the high load you're seeing.  However, with 4 windows launched I do see the system become unresponsive soon after.  And, this didn't appear to happen in F17 (at least, not as quickly, I didn't leave it running for too long).

I'm still not sure we can blame the video driver though: what was released with 6.2 would have matched the Fedora release you'd already tested and reported working fine.

I did also try testing the current RHEL userspace with the kernel I used when I tested Fedora, and the issue still occurred.  Which probably points to something in userspace being the culprit.

Comment 22 Ben Skeggs 2012-05-21 05:49:50 UTC
(In reply to comment #21)
> (In reply to comment #20)
> > were you able to reproduce it?
> 
> I'm not able to reproduce the high load you're seeing.  However, with 4
> windows launched I do see the system become unresponsive soon after.
I should clarify: desktop became unresponsive, but top doesn't report any process taking a significant portion of CPU time.

Comment 23 Levente Farkas 2012-05-21 06:38:48 UTC
to be more precise, the load does not get high; only the Xorg process uses 99-100% cpu and the system becomes unusable.
anyway, as i wrote in #c11, we tested it with the latest upstream kernel (ok, that was almost half a year ago) and also with the nvidia binary driver, and got the same result. that's why we assume it's some kind of generic xorg bug in rhel (which is already fixed in fedora).
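
Because the two sides disagree on what `top` shows, it may be useful to capture the Xorg figure in a form that can be pasted into the bug. The sketch below is illustrative only: `xorg_cpu` is a hypothetical helper, and it assumes top's default batch-mode columns (with %CPU as the ninth field) and a process whose command name is exactly `Xorg`.

```shell
# Read "top -b" output on stdin and print the %CPU value of the last
# line whose command column is "Xorg". Prints nothing if no such line.
# Assumption: default top column layout, %CPU in field 9.
xorg_cpu() {
    awk '$NF == "Xorg" { cpu = $9 } END { if (cpu != "") print cpu }'
}
```

For example, `top -b -n 2 -d 1 | xorg_cpu` takes two samples one second apart and reports the second, since the first batch iteration is not measured over the sampling interval.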

Comment 24 Ben Skeggs 2012-05-21 11:38:37 UTC
(In reply to comment #23)
> to be more precise, the load does not get high; only the Xorg process uses
> 99-100% cpu and the system becomes unusable.
Yep, I got that part.  To clarify further, I see no process at all using a significant amount of the CPU.

> anyway, as i wrote in #c11, we tested it with the latest upstream kernel (ok,
> that was almost half a year ago) and also with the nvidia binary driver, and
> got the same result. that's why we assume it's some kind of generic xorg
> bug in rhel (which is already fixed in fedora).
I'm attempting to narrow down the appropriate component for this.  Once I rule out the Nouveau driver as the issue I'll see if I can't narrow it further.

Comment 25 Levente Farkas 2012-05-21 11:41:39 UTC
(In reply to comment #24)
> (In reply to comment #23)
> > to be more precise, the load does not get high; only the Xorg process uses
> > 99-100% cpu and the system becomes unusable.
> Yep, I got that part.  To clarify further, I see no process at all using a
> significant amount of the CPU.

if you look at my attached screenshot you can see 99% cpu usage by the Xorg process.

Comment 26 Levente Farkas 2012-06-05 04:28:48 UTC
any progress with it? i'd be happy to test any kind of src.rpm:-)

Comment 27 Levente Farkas 2012-06-21 17:56:00 UTC
as 6.3 is released, are there any post-6.3 fixes?

Comment 28 Levente Farkas 2012-07-19 14:16:20 UTC
any progress with it?

Comment 29 Tomas Pelka 2012-08-06 07:20:17 UTC
I can reproduce on Quadro NVS 290 [G86].

Comment 33 Levente Farkas 2012-09-13 15:22:02 UTC
as you can already reproduce it and fedora has a working version, would it be too difficult to fix it?
thanks

Comment 34 Levente Farkas 2012-11-12 08:45:35 UTC
any progress with this bug?

Comment 35 Levente Farkas 2012-12-07 11:45:45 UTC
any progress with it? dare i ask for a fix in 6.4?

Comment 36 RHEL Program Management 2012-12-14 08:49:43 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 37 Levente Farkas 2013-03-25 13:08:45 UTC
it's still broken in 6.4, which has an xorg rebase. since you can easily reproduce it, is there any chance of a fix? it has been fixed in fedora for almost 2 years:-(

Comment 38 Levente Farkas 2013-11-11 16:05:00 UTC
is there any progress with it? afaics it still exists in 6.5 beta.
we have to solve it asap, so i allocated one of our developers to this bug. can you help by giving us some direction? we already found that the problem is not in the:
- kernel
- libdrm
- gstreamer
- libxv
what else can be the problem?
- xorg from fedora is also not working...

can you help us somehow?
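
One way to narrow the list of suspects further is to compare the installed package versions of a working and a failing system. The following is a sketch under assumptions: `pkg_version_diff` is a hypothetical helper, and each input file is expected to hold "name version-release" lines, for example produced on each machine with `rpm -qa --qf '%{NAME} %{VERSION}-%{RELEASE}\n' | sort`.

```shell
# Print "name version_in_file1 version_in_file2" for every package that
# appears in both lists with different versions.
# Assumption: one "name version-release" pair per line; if a name is
# listed more than once (e.g. several kernels), the last entry wins.
pkg_version_diff() {
    awk 'NR == FNR { ver[$1] = $2; next }
         ($1 in ver) && ver[$1] != $2 { print $1, ver[$1], $2 }' "$1" "$2"
}
```

Called as `pkg_version_diff rhel6.txt fedora15.txt` (file names are placeholders), it lists only packages common to both systems, so packages present on just one side need a separate check.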

Comment 39 Levente Farkas 2013-11-25 15:19:03 UTC
we found another piece of info that may be useful.
on fedora 15, if we run gnome, and especially gnome-shell, then it works properly, BUT if we run only icewm then the same thing happens as on rhel-6! so it seems that running gnome-shell does something that makes it work properly, while with anything else it does not.

does this help? do you have any tips based on that?

Comment 41 Joseph Kachuck 2016-01-19 14:30:10 UTC
Hello,
Is this still an issue in RHEL 6.7 or above?

Thank You
Joe Kachuck

Comment 42 Levente Farkas 2016-01-19 19:59:51 UTC
Yes

Comment 45 Takuma Umeya 2017-05-22 13:42:00 UTC
Red Hat Enterprise Linux 6 transitioned to the Production 3 Phase on May 10, 2017.  During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available.

The official life cycle policy can be reviewed here:

	http://redhat.com/rhel/lifecycle

This issue does not appear to meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact your EPM to request a re-evaluation of the issue, citing a clear business justification.

