Bug 1661305

Summary: Firefox file browse window dies with Gdk-ERROR (BadValue) when scrollbar is needed
Product: Red Hat Enterprise Linux 6 Reporter: Brian Nelson <brinel+redhat>
Component: firefox Assignee: Martin Stransky <stransky>
Status: CLOSED WONTFIX QA Contact: Desktop QE <desktop-qa-list>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 6.10 CC: asoler, jhorak, toneata
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-05-29 12:54:36 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Backtrace from error none

Description Brian Nelson 2018-12-20 18:35:04 UTC
Description of problem:

This is a SunRay terminal server. The issue occurs only on the terminals (which are treated as local displays by the OS).

Popup app windows like the file browser and the printer picker cause the error when there are sufficient items in the list to warrant a scroll bar. When there are few items, the window works fine.


Version-Release number of selected component (if applicable):

Happens with all Firefox RPMs >= 60 (i.e. the GTK 3 builds). Tested with firefox-60.4.0-1.el6.x86_64

How reproducible:
Always

Steps to Reproduce:
Open firefox
Select File->Open File
Click a location with many files (e.g. Home)
firefox dies

Or

Open firefox
Select File->Print
Click 'print' button on a system with many printers
firefox dies


Actual results:

(firefox:20316): Gdk-ERROR **: The program 'firefox' received an X Window System error.
This probably reflects a bug in the program.
The error was 'BadValue (integer parameter out of range for operation)'.
  (Details: serial 14517 error_code 2 request_code 151 (RENDER) minor_code 34)
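
For reading the details line: request_code 151 is the major opcode this server assigned to the RENDER extension, and minor_code 34 identifies the Render request that failed. The sketch below decodes the gradient-related minor opcodes. It assumes the request numbering from the renderproto header render.h (the X_RenderCreate... constants). The enum names and the helper function are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Minor opcodes of the Render gradient requests. Assumption: these match
 * the X_RenderCreate* values defined in renderproto's render.h. */
enum {
    RENDER_CREATE_SOLID_FILL       = 33,
    RENDER_CREATE_LINEAR_GRADIENT  = 34,
    RENDER_CREATE_RADIAL_GRADIENT  = 35,
    RENDER_CREATE_CONICAL_GRADIENT = 36
};

/* Hypothetical helper: return the name of a gradient-related Render
 * request, or NULL if the minor opcode is not one of them. */
static const char *render_gradient_request_name(int minor_code)
{
    /* Minor opcodes are 8-bit values in the X protocol. */
    assert(minor_code >= 0 && minor_code <= 255);

    switch (minor_code) {
    case RENDER_CREATE_SOLID_FILL:       return "CreateSolidFill";
    case RENDER_CREATE_LINEAR_GRADIENT:  return "CreateLinearGradient";
    case RENDER_CREATE_RADIAL_GRADIENT:  return "CreateRadialGradient";
    case RENDER_CREATE_CONICAL_GRADIENT: return "CreateConicalGradient";
    default:                             return NULL;
    }
}
```

Under that assumption, minor_code 34 maps to CreateLinearGradient, which is consistent with the failing XRenderCreateLinearGradient call described under "Additional info".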


Expected results:

It doesn't die.

Additional info:

The firefox-debuginfo package doesn't seem to provide (working) debug info for all the GTK3 bundled libraries.

It's dying on a call to XRenderCreateLinearGradient where nStops == 2 and the two stop offsets are equal (stops[0] == stops[1]). The second condition is apparently not allowed by this X server, hence the error.

Comment 2 Brian Nelson 2018-12-20 19:43:15 UTC
Created attachment 1515955 [details]
Backtrace from error

Note: back trace generated using a custom-built version of gtk3-private-3.22.26-1.el6.src.rpm so that there is working debug info.

Comment 3 Martin Stransky 2018-12-21 08:00:41 UTC
That error comes from XRenderCreateLinearGradient(), which is part of the XRender X11 extension. It is called from cairo-xlib-source.c, which contains this comment about XRenderCreateLinearGradient:

#if 0
    /* For some weird reason the X server is sometimes getting
     * CreateGradient requests with bad length. So far I've only seen
     * XRenderCreateLinearGradient request with 4 stops sometime end up
     * with length field matching 0 stops at the server side. I've
     * looked at the libXrender code and I can't see anything that
     * could cause this behavior. However, for some reason having a
     * XSync call here seems to avoid the issue so I'll keep it here
     * until it's solved.
     */
    XSync (display->display, False);
#endif

We can try enabling this workaround in the gtk3-private-3.22.26-1.el6.src.rpm package. Jan, can you look at it please?

Comment 4 Brian Nelson 2018-12-21 15:50:28 UTC
Hi Martin,

I saw that comment/workaround in the code but completely missed the fact that it was if'd out. I tried enabling that code locally and there's no change in the problem.


At the point where that XSync is called, the data in 'gradient' is already bad:
gradient->n_stops == 2
gradient->stops[0].offset == 0.5
gradient->stops[1].offset == 0.5

I've been able to trace those values back to where they're being set in gtk_css_image_linear_draw() in gtkcssimagelinear.c. The issue at that point seems to be that this code is setting pos=0.5 for both i==0 and i==1 which ends up getting set as gradient->stops[i].offset further down the line:

stop = &g_array_index (linear->stops, GtkCssImageLinearColorStop, i);
...
pos = _gtk_css_number_value_get (stop->offset, length) / length;

I don't think the problem is there, but beyond that point I'm not able to comprehend the structure of the 'linear' and 'image' variables and all the G/GTK macros involved. So I can't tell what's right or wrong to be able to trace further back.

Thanks

Comment 5 Martin Stransky 2019-01-04 12:21:38 UTC
I think you pretty much got the point - the SunRay X render implementation just fails to process the values here. You can try a different Gtk+ theme, which may style the scrollbars with different CSS that does not use the gradient.

You can also remove and/or create custom CSS styles just for the scrollbars - see https://thegnomejournal.wordpress.com/2011/03/15/styling-gtk-with-css/ for how to deploy a custom ~/.config/gtk-3.0/gtk.css file.

Comment 6 Brian Nelson 2019-01-11 23:17:41 UTC
Ah. I hadn't thought about that. I assumed the gradient data was being generated wrong only in the SunRay case. However I confirmed that it's being generated the same way in both cases and the native xserver accepts it while the SunRay xserver does not. This is as you said.

SunRay uses xorg-xserver 1.3. The exact behavior seems to have changed in xserver 1.4 and beyond, but I can't find a specific reason why. From what I can find, I suspect this change, made as part of a 'gradient optimization' commit, is the culprit:
https://gitlab.freedesktop.org/xorg/xserver/commit/0a9239ec258828ec1da6c208634a55fc4053d7da#c47fd3a8917b0f55408f656aedaa74bb89d3e196_956_924
(picture.c lines 956/924)
https://lwn.net/Articles/243902/

The render protocol spec doesn't say explicitly whether multiple stops must have unique offsets, but it seems implied to me. So I'm not quite convinced that the current xserver behavior is 'correct'. For as long as it's been in place though I don't expect that it will be changed.
https://www.x.org/releases/current/doc/renderproto/renderproto.txt
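
To make the difference between the two behaviors concrete, here is a minimal sketch of a stop-offset check in a strict form (equal neighbouring offsets rejected, as the SunRay server appears to do) and a relaxed form (equal neighbours accepted, as current servers appear to do). This is not the actual xserver code. The type, macro, and function names are hypothetical, and it assumes stop offsets are 16.16 fixed-point values in the range 0 to 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 16.16 fixed-point value (assumed representation of a stop offset). */
typedef int32_t fixed_16_16;

#define FIXED_ONE ((fixed_16_16)1 << 16)

/* Hypothetical check: every offset must lie in [0, 1] and the sequence
 * must be ordered. With strict == true, an offset equal to its
 * predecessor is rejected; with strict == false it is accepted. */
static bool stops_are_valid(const fixed_16_16 *stops, size_t n, bool strict)
{
    assert(stops != NULL || n == 0);

    if (n == 0)
        return false;

    fixed_16_16 prev = -1; /* below any valid offset */
    for (size_t i = 0; i < n; i++) {
        if (stops[i] < 0 || stops[i] > FIXED_ONE)
            return false;

        bool out_of_order = strict ? (stops[i] <= prev) : (stops[i] < prev);
        if (out_of_order)
            return false;

        prev = stops[i];
    }
    return true;
}
```

With the two stops from the backtrace (both at offset 0.5, i.e. FIXED_ONE / 2), the strict form fails and the relaxed form passes, which matches the BadValue seen only on the SunRay server.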

So I still feel that GTK 3 generating a gradient that has overlapping stop offsets isn't correct behavior, even though it works as-is on current xservers. I'm feeling that isn't likely to be addressed here though as it's a bit of a corner case. I'll try to look into the CSS stuff that you mentioned. I also found I can disable Xrender support on the SunRays completely, but I don't know if that will cause other problems itself. Limited testing shows it does seem to fix the firefox issue at least.

Comment 7 Martin Stransky 2019-01-12 08:58:42 UTC
(In reply to Brian Nelson from comment #6)
> So I still feel that GTK 3 generating a gradient that has overlapping stop
> offsets isn't correct behavior, even though it works as-is on current
> xservers. I'm feeling that isn't likely to be addressed here though as it's
> a bit of a corner case. 

If it's fixed in Fedora it can be backported to RHEL, but the timeframe for that is uncertain. But I suspect that's not a Gtk+ but a theme issue - the scrollbar css styles are defined by Gtk themes. So it may be a bug in default Adwaita theme used by Gtk+.

> I'll try to look into the CSS stuff that you
> mentioned. I also found I can disable Xrender support on the SunRays
> completely, but I don't know if that will cause other problems itself.
> Limited testing shows it does seem to fix the firefox issue at least.

Yes, disabling Xrender should be a workaround here.

Comment 8 Brian Nelson 2019-01-18 20:54:31 UTC
(In reply to Martin Stransky from comment #7)
> But I suspect that's not a Gtk+ but a theme issue - the
> scrollbar css styles are defined by Gtk themes. So it may be a bug in
> default Adwaita theme used by Gtk+.

You are exactly right again! The 'bad' gradient data comes from the 'scrolledwindow undershoot' object/property in Adwaita. Upon further research though, it seems that (per css spec at least) the 'bad' data isn't bad at all and should be perfectly acceptable. This is perhaps one reason why xserver was changed to allow it in 1.4 as mentioned previously.

At any rate, I submitted a little patch upstream to cairo. That seems like the appropriate place for a workaround as they already have some in place for other X server versions.
https://gitlab.freedesktop.org/cairo/cairo/issues/356
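
The patch itself is in the linked issue. Purely as an illustration of the general idea, and not a copy of the submitted patch, a client-side workaround could nudge repeated offsets apart by the smallest representable step before the gradient is sent to an old server. The sketch assumes 16.16 fixed-point offsets that are already in non-decreasing order. The type, macro, and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 16.16 fixed-point value (assumed representation of a stop offset). */
typedef int32_t fixed_16_16;

#define FIXED_ONE ((fixed_16_16)1 << 16)

/* Hypothetical workaround: make the offsets strictly increasing by moving
 * each stop that does not exceed its predecessor one fixed-point step
 * (1/65536) past it. Returns false if a stop would have to move beyond
 * 1.0, in which case that stop and the ones after it are left unchanged. */
static bool separate_duplicate_stops(fixed_16_16 *stops, size_t n)
{
    assert(stops != NULL || n == 0);

    for (size_t i = 1; i < n; i++) {
        if (stops[i] <= stops[i - 1]) {
            if (stops[i - 1] >= FIXED_ONE)
                return false; /* no room left above the previous stop */
            stops[i] = stops[i - 1] + 1;
        }
    }
    return true;
}
```

A shift of 1/65536 of the gradient length should not be visible, so the hard colour edge that two equal offsets describe would be preserved in practice. Duplicates sitting exactly at 1.0 are not handled and would need a different approach.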

I don't know if/when they'll accept it, or if/when it could get pulled into RHEL 6. I won't hold my breath on either. Since I was able to track it down to something not SunRay-specific though, perhaps it won't be a big deal to implement.

Thanks for your help!

Comment 10 Alejandro 2019-08-14 11:08:19 UTC
Hello, I have the same issue in the same environment.

The same problem appears with Thunderbird.

Brian, can you solve this issue?

Thanks