Bug 1910092 - WebKitGTK should use posix_spawn() to launch subprocesses (requires changes in GSubprocess)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: glib2
Version: 9.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Assignee: Michael Catanzaro
QA Contact: Tomas Pelka
URL:
Whiteboard:
Depends On:
Blocks: 1970469
Reported: 2020-12-22 16:22 UTC by Simeon Andreev
Modified: 2024-04-04 17:15 UTC (History)

Fixed In Version: glib2-2.68.4-4.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-05-17 15:51:31 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
Test build (5.97 MB, application/x-xz), 2021-11-11 23:21 UTC, Michael Catanzaro
Test build for el9 (3.01 MB, application/x-xz), 2021-12-07 16:27 UTC, Michael Catanzaro


Links
System                             ID                              Private  Priority  Status  Summary  Last Updated
GNOME Gitlab                       GNOME glib merge_requests 1968  0        None      None    None     2021-02-25 18:34:09 UTC
Red Hat Knowledge Base (Solution)  7063248                         0        None      None    None     2024-04-04 17:15:51 UTC
Red Hat Product Errata             RHBA-2022:3931                  0        None      None    None     2022-05-17 15:51:38 UTC
WebKit Project                     220090                          0        None      None    None     2020-12-22 18:45:37 UTC
WebKit Project                     222049                          0        None      None    None     2021-02-17 18:28:22 UTC

Description Simeon Andreev 2020-12-22 16:22:19 UTC
Description of problem:

WebKit exits Eclipse JVM if process forking fails due to OOM.

See https://bugs.eclipse.org/bugs/show_bug.cgi?id=569878.

Version-Release number of selected component (if applicable):

We have: webkitgtk4-2.22.7-2.el7.x86_64

How reproducible:

Run the following SWT snippet, with high max memory heap, about 80-90% of the workstation total memory (we tested with -Xmx100g -Xms90g for a 128 GB workstation):

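import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.eclipse.swt.SWT;
import org.eclipse.swt.browser.Browser;
import org.eclipse.swt.layout.FillLayout;
import org.eclipse.swt.widgets.Composite;
import org.eclipse.swt.widgets.Display;
import org.eclipse.swt.widgets.Shell;

public class TestJep2425 {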

	public static final List<double[]> arrays = new ArrayList<>();
	
	public static void main(String[] args) {
		Display display = new Display();
		Shell shell = new Shell(display);
		shell.setSize(600, 400);
		shell.setLayout(new FillLayout());
		Composite composite = new Composite(shell, SWT.NONE);
		composite.setLayout(new FillLayout(SWT.VERTICAL));

		fillHeap();

		Browser browser = new Browser(composite, SWT.BORDER);
		browser.setText("<!DOCTYPE html><html><head></head><body>hello world</body></html>");
		shell.open();

		while (!shell.isDisposed()) {
			if (!display.readAndDispatch())
				display.sleep();
		}
		display.dispose();
	}
	
	private static void fillHeap() {
		try {
			int iterations = 10_000;
			for (int iteration = 0; iteration < iterations; ++iteration) {
				int n = 100 * 1024;
				int count = 512;
				for (int c = 0; c < count; ++c) {
					double[] array = new double[n];
					Arrays.fill(array, Math.PI);
					arrays.add(array);
				}
				if (iteration % 10 == 0) {
					long heapSize = Runtime.getRuntime().totalMemory();
					long heapMaxSize = Runtime.getRuntime().maxMemory();
					long heapFreeSize = Runtime.getRuntime().freeMemory();
					System.out.println("heapSize = " + heapSize);
					System.out.println("heapMaxSize = " + heapMaxSize);
					System.out.println("heapFreeSize = " + heapFreeSize);
					int freePercent = (int)(((double) heapFreeSize / (double) heapSize) * 100.0);
					System.out.println("" + freePercent + "% free");
					if (freePercent < 5) {
						return;
					}
				}
			}
		} catch (OutOfMemoryError e) {
			e.printStackTrace(System.out);
		}
	}
}



Actual results:

Eclipse JVM writes a core dump and exits. With the snippet above, we see:

** (SWT:104388): ERROR **: 16:28:01.541: Unable to fork a new child process: Failed to fork (Cannot allocate memory)

In our product not even that was seen (or we missed it in the GTK+ error spam). Backtrace from the core dump in our product crash:

#0  0x00007fff6cad8b11 in _g_log_abort () from /lib64/libglib-2.0.so.0
Missing separate debuginfos, use: debuginfo-install java-11-openjdk-headless-debug-11.0.8.10-0.el7.x86_64
(gdb) where
#0  0x00007fff6cad8b11 in _g_log_abort () at /lib64/libglib-2.0.so.0
#1  0x00007fff6cad9e32 in g_logv () at /lib64/libglib-2.0.so.0
#2  0x00007fff6cad9f9f in g_log () at /lib64/libglib-2.0.so.0
#3  0x00007fff2da4aa75 in WebKit::ProcessLauncher::launchProcess() () at /lib64/libwebkit2gtk-4.0.so.37
#4  0x00007fff2d8d8d10 in WebKit::AuxiliaryProcessProxy::connect() () at /lib64/libwebkit2gtk-4.0.so.37
#5  0x00007fff2d9383c8 in WebKit::WebProcessProxy::create(WebKit::WebProcessPool&, WebKit::WebsiteDataStore*, WebKit::WebProcessProxy::IsPrewarmed, WebKit::WebProcessProxy::ShouldLaunchProcess) () at /lib64/libwebkit2gtk-4.0.so.37
#6  0x00007fff2d956a0e in WebKit::WebProcessPool::createNewWebProcess(WebKit::WebsiteDataStore*, WebKit::WebProcessProxy::IsPrewarmed) () at /lib64/libwebkit2gtk-4.0.so.37
#7  0x00007fff2d957025 in WebKit::WebProcessPool::processForRegistrableDomain(WebKit::WebsiteDataStore&, WebKit::WebPageProxy*, WebCore::RegistrableDomain const&) ()
    at /lib64/libwebkit2gtk-4.0.so.37
#8  0x00007fff2d957171 in WebKit::WebPageProxy::launchProcess(WebCore::RegistrableDomain const&, WebKit::WebPageProxy::ProcessLaunchReason) ()
    at /lib64/libwebkit2gtk-4.0.so.37
#9  0x00007fff2d95a833 in WebKit::WebPageProxy::loadData(IPC::DataReference const&, WTF::String const&, WTF::String const&, WTF::String const&, API::Object*, WebCore::ShouldOpenExternalURLsPolicy) () at /lib64/libwebkit2gtk-4.0.so.37
#10 0x00007fff2d9f56be in webkit_web_view_load_html () at /lib64/libwebkit2gtk-4.0.so.37
#11 0x00007fff323f558a in  ()
#12 0x0000000700000008 in  ()
#13 0x00007fff354fddc0 in  ()
#14 0x00007ffff455eff0 in  ()
#15 0x0000000000000000 in  ()

Expected results:

WebKit does not kill the Eclipse JVM. Possibly by using a smarter forking mechanism that doesn't need as much memory as Eclipse is consuming.


Additional info:

See https://bugs.eclipse.org/bugs/show_bug.cgi?id=569878#c0 for more info.

Comment 2 Michael Catanzaro 2020-12-22 16:32:26 UTC
This is as designed. If you don't have enough memory to launch a new web process, the UI process is going to crash. WebKit just is not designed to cope with a missing web process. Sorry.

If you don't have enough memory to launch a web process, you're probably going to crash soon anyway, because all GLib memory allocation functions will crash on failure.

Comment 3 RHEL Program Management 2020-12-22 16:32:37 UTC
Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.

Comment 4 Simeon Andreev 2020-12-22 16:35:34 UTC
There *is* enough memory to comply with whatever webkit wants to do. There is *not* enough memory to clone the entire Eclipse JVM heap.

Comment 5 Andrey Loskutov 2020-12-22 17:04:56 UTC
(In reply to Michael Catanzaro from comment #2)
> This is as designed. If you don't have enough memory to launch a new web
> process, the UI process is going to crash. WebKit just is not designed to
> cope with a missing web process. Sorry.
> 
> If you don't have enough memory to launch a web process, you're probably
> going to crash soon anyway, because all GLib memory allocation functions
> will crash on failure.

There is enough memory, the problem is the fork() in webkit! 

See similar discussion in scons http://scons.1086193.n5.nabble.com/fork-exec-vs-posix-spawn-td13812.html, this bug in Java (fixed) https://bugs.java.com/bugdatabase/view_bug.do?bug_id=5049299 and for example this https://unix.stackexchange.com/questions/206823/when-a-process-forks-is-its-virtual-or-resident-memory-copied. 

Point #1 is that with fork() the child process duplicates memory from the parent (even if not needed), so here webkit duplicates the entire JVM heap just to fork a web process that may need only a fraction of the JVM process size. So webkit should try to avoid this by using posix_spawn() instead of fork().

Point #2 is: webkit embedded in other process should not terminate *both* processes.

So please reopen this bug.
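
As a rough illustration of point #1, the sketch below launches a child with posix_spawnp() instead of fork() followed by exec(). It assumes a POSIX system. `run_child` is a hypothetical helper name, not code from WebKit or GLib. On Linux with a recent glibc, posix_spawn() is typically implemented with a vfork-like clone() rather than a full fork(), which is why it is attractive for a parent with a very large heap.

```c
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: launch argv[0] (searched in PATH) with posix_spawnp()
 * and wait for it to finish.
 * Returns the child's exit status, or -1 if spawning or waiting failed. */
int run_child(char *const argv[])
{
    pid_t pid;
    int status;

    /* posix_spawnp() returns an error number directly; it does not set errno. */
    if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;

    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `run_child` with an argument vector such as `{ "true", NULL }` spawns the program and returns its exit status. No file actions or spawn attributes are used here, so the child inherits every descriptor that is not close-on-exec.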

Comment 6 Michael Catanzaro 2020-12-22 17:45:31 UTC
Hm, switching to posix_spawn() is a much lighter ask. That's something we could try to do, although it's too late for RHEL 7. Do you want me to reassign this to RHEL 8?

Anyway, let's see, it seems GSubprocess can use posix_spawn() since https://gitlab.gnome.org/GNOME/glib/-/commit/61f54591acdfe69315cef6d1aa6d3bf1ff763082, which is already in RHEL 8. Now, from the comment in that commit, posix_spawn() is only used subject to the following conditions:

 * 1. %G_SPAWN_DO_NOT_REAP_CHILD is set
 * 2. %G_SPAWN_LEAVE_DESCRIPTORS_OPEN is set
 * 3. %G_SPAWN_SEARCH_PATH_FROM_ENVP is not set
 * 4. @working_directory is %NULL
 * 5. @child_setup is %NULL
 * 6. The program is of a recognised binary format, or has a shebang. Otherwise, GLib will have to execute the program through the shell, which is not done using the optimized codepath.

WebKit's ProcessLauncherGLib.cpp, BubblewrapLauncher.cpp, and FlatpakLauncher.cpp are all currently using g_subprocess_launcher_spawnv(), using a GSubprocessLauncher created with G_SUBPROCESS_FLAGS_INHERIT_FDS. GSubprocess unconditionally sets G_SPAWN_DO_NOT_REAP_CHILD, so condition 1 is met. Condition 2 is met because G_SUBPROCESS_FLAGS_INHERIT_FDS corresponds to G_SPAWN_LEAVE_DESCRIPTORS_OPEN. I believe condition 3 is also met, because GSubprocess only sets G_SPAWN_SEARCH_PATH_FROM_ENVP if self->launcher->path_from_envp is TRUE, and that is never set anywhere in GLib; it seems to be dead code, oops. Then I believe condition 4 is also met, because WebKit does not call g_subprocess_launcher_set_cwd(). Condition 6 is probably also met.
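
The first five conditions can be read as a single predicate. The sketch below is only a model of that list, not GLib's code. The enum constants are hypothetical stand-ins for the GSpawnFlags named above (their numeric values are arbitrary), and `may_use_posix_spawn` is an invented name. Condition 6 is not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the GSpawnFlags named in the list above.
 * The numeric values are arbitrary and are NOT GLib's. */
enum {
    SPAWN_DO_NOT_REAP_CHILD      = 1 << 0,
    SPAWN_LEAVE_DESCRIPTORS_OPEN = 1 << 1,
    SPAWN_SEARCH_PATH_FROM_ENVP  = 1 << 2
};

typedef void (*child_setup_fn)(void *user_data);

/* Returns true when conditions 1-5 from the list hold.
 * Condition 6 (binary format or shebang) is not modelled here. */
bool may_use_posix_spawn(unsigned flags,
                         const char *working_directory,
                         child_setup_fn child_setup)
{
    return (flags & SPAWN_DO_NOT_REAP_CHILD) != 0        /* condition 1 */
        && (flags & SPAWN_LEAVE_DESCRIPTORS_OPEN) != 0   /* condition 2 */
        && (flags & SPAWN_SEARCH_PATH_FROM_ENVP) == 0    /* condition 3 */
        && working_directory == NULL                     /* condition 4 */
        && child_setup == NULL;                          /* condition 5 */
}
```

With this model, any non-NULL child setup callback forces the fork() path, however trivial the callback is.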

Condition 5 is not met, though, because WebKit sets this child setup function:

static void childSetupFunction(gpointer userData)
{
    int socket = GPOINTER_TO_INT(userData);
    close(socket);
}

OK, it's just closing a socket. And that should be easy to replace by just using CLOEXEC. So in theory, a fix might be pretty simple.
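
A minimal sketch of the CLOEXEC idea, assuming a POSIX system: instead of closing the socket in a child setup callback, the parent marks the descriptor close-on-exec so that it is closed automatically when the child calls exec(). `set_cloexec` and `is_cloexec` are hypothetical helper names, not WebKit functions.

```c
#include <fcntl.h>
#include <sys/socket.h>   /* only needed by callers that create the socket */
#include <unistd.h>

/* Mark fd close-on-exec so that it is closed in the child at exec() time.
 * Returns 0 on success, -1 on failure (errno is set by fcntl). */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0 ? -1 : 0;
}

/* Returns 1 if fd is close-on-exec, 0 if it is not, -1 on error. */
int is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

For example, after `socketpair(AF_UNIX, SOCK_STREAM, 0, sv)`, calling `set_cloexec(sv[0])` keeps that end out of any exec'd child without needing a callback between fork() and exec().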

If that's really all it takes, then I'll send a patch upstream, and I'm willing to reopen this against RHEL 8 just to make sure we don't miss it in the next WebKit update. (But if it turns out to be more complex than that, I'll insist it would need to be tracked upstream instead; even if it's a valid bug, we don't leave RHEL bugs open unless we intend to work on them in the near future.)

Comment 7 Michael Catanzaro 2020-12-22 17:47:01 UTC
Actually it looks like that socket already uses CLOEXEC, so the child setup function is probably redundant. I bet that close() call is always failing, and WebKit just doesn't notice because there's no error checking. Let me check to be sure....

Comment 8 Michael Catanzaro 2020-12-22 18:45:33 UTC
(In reply to Michael Catanzaro from comment #7)
> I bet that close() call is always failing,
> and WebKit just doesn't notice because there's no error checking. Let me
> check to be sure....

Nah, it succeeds because the child setup function is called before exec(), but it's really not needed. I checked for leaking fds, and I suspect there actually *is* a fd leak somewhere in WebKit, because I can see each new web process gets spawned using one more fd than the one before. But that's going to be very difficult to track down, and this particular socket does not seem to be leaking, so it's unrelated. I've created an upstream WebKit bug for the WebKit side of this: https://bugs.webkit.org/show_bug.cgi?id=220090.

Sadly, it turns out removing WebKit's child setup function is not enough. GSubprocess unconditionally sets its own child setup function, and that child setup calls WebKit's child setup. So GSubprocess itself would have to stop using a child setup function. I'm not sure if we can do that or not. I would start by creating an upstream issue with GLib explaining the problem and see where it goes. It *might* be possible to change when G_SUBPROCESS_FLAGS_INHERIT_FDS is used and none of the various GSubprocessLauncher functions that play with stdin/stdout/stderr have been used. WebKit doesn't use any of that. The right place to discuss this would be the GLib issue tracker; it's not the sort of issue that will be solved here, sorry.

Comment 9 Simeon Andreev 2020-12-23 07:21:12 UTC
Thanks for looking into this!

Comment 10 Simeon Andreev 2020-12-23 09:08:11 UTC
Looking at WebProcessPool::prewarmProcess(), is there some (more or less) guaranteed way to ensure WebKit starts its processes "early on", while the Eclipse heap is "small"? Unfortunately I don't know how many processes WebKit needs and when they are started; is the process starting fully internal to WebKit or can some exposed API also result in spawning a process (that could not be started "early on")?

Comment 11 Michael Catanzaro 2020-12-23 14:14:30 UTC
(In reply to Simeon Andreev from comment #10)
> Looking at WebProcessPool::prewarmProcess(), is there some (more or less)
> guaranteed way to ensure WebKit starts its processes "early on", while the
> Eclipse heap is "small"? 

I bet Eclipse doesn't get prewarmed processes at all because process-swap-on-cross-site-navigation-enabled is disabled by default. This setting is an important security feature, but we can't enable it for GTK 3 apps because process swapping will break applications that don't expect it. So it won't be enabled by default until GTK 4. (And Eclipse might not be able to use GTK 4, because GTK's foreign drawing API has been removed, so it's probably time for Eclipse to come up with a migration plan.)

I think it starts one process whenever all other processes are in use. E.g. create a WebKitWebContext, it should prewarm one web process. Load a page in a WebKitWebView, it should use that process and prewarm an additional process so it's ready for the next load. Load a page from a different website in the same view, it should swap to the prewarmed process and return the original process to the cache. Create a second web view, it should prewarm a third process. That is, there should always be one process more than is currently in use. And processes will never be reused by a different view. So if you turn that setting on, you should be able to create a bunch of extra WebKitWebViews, destroy them, and hopefully their prewarmed processes will stick around? You would need to test to verify that actually works.

This would work well if Eclipse only allows creating a limited number of views. But if it allows creating an arbitrary number of views in tabs -- e.g. if it uses a web view to display documentation and allows creating an arbitrary number of documentation tabs, like I would expect for an IDE -- then eventually the user will open one more tab than you have cached, and you'll wind up crashing when fork() fails, as before. So I think it can only delay the pain.

It might make more sense to focus on how to solve this in GSubprocessLauncher. 

> Unfortunately I don't know how many processes
> WebKit needs and when they are started; is the process starting fully
> internal to WebKit or can some exposed API also result in spawning a process
> (that could not be started "early on")?

This is an implementation detail that's too risky to expose in the API. (Every public API we have ever added related to process management has wound up broken when the internal process model changes to account for new security needs.)

Comment 16 Michael Catanzaro 2021-01-04 16:51:51 UTC
A couple notes on the upstream bug:

> It would be good to know why only 1 process is spawned with SWT (so that we know if we can rely on this for a workaround in our product). If I run a GTK+/webkit snippet with 2 browser widgets, I see 2 webkit processes (as expected, every view seems to be supported by its own process by design).

Expecting only one web process to be spawned might work for now, but it is not future-proof. You should assume multiple web processes will be spawned when you create your WebKitWebContext. There are also a network process, a storage process, and, in the future, a GPU process. In the future, process prewarming and web process cache will also be enabled.

> Also we noticed the webkit process are children of the systemd process (PID 1), instead of being children of the Eclipse process. Any idea why this is the case? Andrey suggests due to dbus; an explanation would be nice.

D-Bus is not involved in launching WebKit subprocesses. When %G_SPAWN_DO_NOT_REAP_CHILD is *not* set, then gspawn is going to do two forks(). The first child process is just an intermediate child that forks() the real subprocess and then exits immediately, causing the second child to be reparented by pid 1. The parent process then immediately reaps the intermediate child. This way, you don't have to manually remember to reap the "real" subprocess, because reaping will be handled by pid 1. However, WebKit is using GSubprocess, which always sets G_SPAWN_DO_NOT_REAP_CHILD. So if the sandbox is disabled (it is disabled by default, and not available in RHEL anyway), then there *should* be no intermediate process. That is, the subprocess *should* be parented by the UI process. And indeed, that is the behavior I see on Fedora 33: all subprocesses are direct descendants of the UI process. It's possible that behavior has changed since RHEL 7, but I don't see any suspicious commits in gspawn.c. (With bubblewrap sandbox or flatpak sandbox enabled, things are a little more complicated, but those cases do not apply to RHEL 7.) Anyway, it should have no effect on this bug.
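
The double-fork pattern described above can be sketched as follows. This is a simplified illustration, not gspawn's implementation: `spawn_detached` is a hypothetical name, and exec failures in the real child are not reported back to the caller here.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sketch of the double-fork pattern: the intermediate child forks the real
 * child and exits at once, so the real child is reparented (to pid 1 or to
 * a subreaper) and the caller only has to reap the intermediate child.
 * Returns 0 if the intermediate child was reaped and reported success,
 * -1 otherwise. */
int spawn_detached(char *const argv[])
{
    pid_t intermediate = fork();
    if (intermediate < 0)
        return -1;

    if (intermediate == 0) {
        /* Intermediate child: fork the real child, then exit immediately. */
        pid_t real = fork();
        if (real < 0)
            _exit(1);
        if (real == 0) {
            execvp(argv[0], argv);
            _exit(127);            /* only reached if exec failed */
        }
        _exit(0);
    }

    /* Parent: reap the intermediate child right away. */
    int status;
    if (waitpid(intermediate, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Because the caller never learns the real child's pid, it cannot wait for it, which is the trade-off that G_SPAWN_DO_NOT_REAP_CHILD avoids.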

Comment 19 Michael Catanzaro 2021-01-04 20:01:35 UTC
(In reply to Michael Catanzaro from comment #16)
> > Also we noticed the webkit process are children of the systemd process (PID 1), instead of being children of the Eclipse process. Any idea why this is the case? Andrey suggests due to dbus; an explanation would be nice.

Maybe you tested this on RHEL 7.8? There you have WebKitGTK 2.22, which used g_spawn_async() without G_SPAWN_DO_NOT_REAP_CHILD, so that would be expected there. It doesn't use GSubprocess until WebKitGTK 2.24. (RHEL 7.9 has 2.28.)

Comment 20 Andrey Loskutov 2021-01-04 20:21:48 UTC
(In reply to Michael Catanzaro from comment #19)
> (In reply to Michael Catanzaro from comment #16)
> > > Also we noticed the webkit process are children of the systemd process (PID 1), instead of being children of the Eclipse process. Any idea why this is the case? Andrey suggests due to dbus; an explanation would be nice.
> 
> Maybe you tested this on RHEL 7.8? There you have WebKitGTK 2.22, which used
> g_spawn_async() without G_SPAWN_DO_NOT_REAP_CHILD, so that would be expected
> there. It doesn't use GSubprocess until WebKitGTK 2.24. (RHEL 7.9 has 2.28.)

We have webkitgtk4-2.22.7-2.el7.x86_64 and run on RHEL 7.4. We can understand if the patch would be provided for 7.9+ / webkit 2.28+.

Comment 23 Michael Catanzaro 2021-01-05 20:41:44 UTC
Hi, before we go further, we want to check if you have overcommit disabled. Please check with:

$ sudo sysctl -a | grep vm.overcommit

Reference: https://www.kernel.org/doc/Documentation/vm/overcommit-accounting

Would be good to confirm that you have those knobs set to the default values. In particular, if you have vm.overcommit_memory = 2, try setting it back to 0 to see if that helps.

Comment 24 Owen Taylor 2021-01-05 20:42:41 UTC
I believe that it's possible to implement the no-child-setup-function path in GSubprocess completely in terms of posix_spawn() using  posix_spawn_file_actions - what it needs to do is:

 duplicate file descriptors
 close file descriptors
 unset the cloexec flag

The last isn't obvious, but there's a comment in the glibc source code:

              /* Austin Group issue #411 requires adddup2 action with source
                 and destination being equal to remove close-on-exec flag.  */

So one possible path here would be to add a special-cased path for GSubprocess that uses posix_spawn() directly rather than g_spawn_async(). That's quite a bit of work and would definitely need to be done upstream rather than as a RHEL patch - so more of a long-term thing.

There would also need to be evaluation of the performance impact in typical cases - is it worth the extra code and the chance of a regression? Using posix_spawn might save time by not having to copy the page table structures for the app and set up copy-on-write. [If posix_spawn was worthwhile for g_spawn_async(), then it would be nice if it worked for GSubprocess which is the more modern interface.]
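
As an illustration of the first two operations (duplicating and closing descriptors), here is a minimal sketch that redirects a child's stdout into a pipe using only posix_spawn file actions, with no child setup callback. `spawn_capture_stdout` is a hypothetical helper, not part of GLib. The third operation, clearing close-on-exec through adddup2 with equal source and destination, is not shown because, per the glibc comment quoted above, it relies on the C library implementing Austin Group issue #411.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: run argv with stdout redirected into a pipe.
 * Reads up to cap-1 bytes of the child's output into buf and
 * NUL-terminates it. Returns the byte count, or -1 on failure. */
ssize_t spawn_capture_stdout(char *const argv[], char *buf, size_t cap)
{
    int fds[2];
    pid_t pid;
    posix_spawn_file_actions_t actions;
    ssize_t total = 0;
    int rc;

    if (cap == 0 || pipe(fds) < 0)
        return -1;
    if (posix_spawn_file_actions_init(&actions) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* In the child: duplicate the pipe's write end onto stdout, then
     * close both original pipe descriptors. */
    rc = posix_spawn_file_actions_adddup2(&actions, fds[1], STDOUT_FILENO);
    if (rc == 0)
        rc = posix_spawn_file_actions_addclose(&actions, fds[0]);
    if (rc == 0)
        rc = posix_spawn_file_actions_addclose(&actions, fds[1]);
    if (rc == 0)
        rc = posix_spawnp(&pid, argv[0], &actions, NULL, argv, environ);

    posix_spawn_file_actions_destroy(&actions);
    close(fds[1]);                 /* the parent keeps only the read end */
    if (rc != 0) {
        close(fds[0]);
        return -1;
    }

    while ((size_t)total < cap - 1) {
        ssize_t n = read(fds[0], buf + total, cap - 1 - (size_t)total);
        if (n <= 0)
            break;
        total += n;
    }
    buf[total] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return total;
}
```

The file actions run in the child in the order they were added, which is why the dup2 comes before the two closes.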

Comment 25 Michael Catanzaro 2021-01-05 20:43:50 UTC
To clarify, we'd like to see all three overcommit settings, e.g.:

vm.overcommit_kbytes = 0
vm.overcommit_memory = 0
vm.overcommit_ratio = 50

Comment 26 Simeon Andreev 2021-01-06 08:18:26 UTC
socbm275:/home/sandreev$  sudo sysctl -a | grep vm.overcommit
sysctl: reading key "net.ipv6.conf.all.stable_secret"
sysctl: reading key "net.ipv6.conf.br-instruments.stable_secret"
sysctl: reading key "net.ipv6.conf.default.stable_secret"
sysctl: reading key "net.ipv6.conf.eno1.stable_secret"
sysctl: reading key "net.ipv6.conf.enp8s0.stable_secret"
sysctl: reading key "net.ipv6.conf.lo.stable_secret"
vm.overcommit_kbytes = 0
vm.overcommit_memory = 0
vm.overcommit_ratio = 50

Comment 27 Owen Taylor 2021-01-06 17:30:36 UTC
> socbm275:/home/sandreev$  sudo sysctl -a | grep vm.overcommit
> vm.overcommit_kbytes = 0
> vm.overcommit_memory = 0
> vm.overcommit_ratio = 50

Interesting - I would not generally expect fork() to fail in these conditions just because the forking process's heap size is large. Perhaps one of the following is the case:

 * The kernel's guess of maximum possible process size is not working properly (recent kernels simplify this a bunch - https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1977041.html)

 * The Eclipse process has a lot of writable-but-zero mappings (https://www.kernel.org/doc/Documentation/vm/overcommit-accounting mentions "classic example is code using sparse arrays and just relying on the virtual memory consisting almost entirely of zero pages.") which aren't actually taking up memory, but potentially could. You could check that by looking at VmSize in /proc/<pid>/status (VIRT column in top) - if that's bigger than the amount of physical memory+swap, then this is the issue.

Whether either of these is the problem or something else, setting overcommit_memory to 1 would likely resolve this problem. I wouldn't expect it to have a significant effect on overall system stability compared to the default of 0, since with the default of 0 only *obviously* unsatisfiable requests are supposed to be rejected - it does nothing if memory is overallocated in smaller chunks. (That is, with overcommit_memory set to 0, if the system has 32GB of total ram and swap, and you try to allocate 64GB, then the allocation will be rejected, but if you allocate 16GB 4 times, that's considered fine.)
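
The VmSize check suggested above can be sketched as follows, assuming the usual "Key: value kB" layout of /proc/<pid>/status and /proc/meminfo. `parse_kb_field` and `vmsize_exceeds_ram_plus_swap` are hypothetical helper names. Reading the files themselves is left to the caller.

```c
#include <stdio.h>
#include <string.h>

/* Find a "Key:   12345 kB" line in /proc-style text.
 * Returns the value in kB, or -1 if no line starts with "Key:". */
long parse_kb_field(const char *text, const char *key)
{
    size_t keylen = strlen(key);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, key, keylen) == 0 && line[keylen] == ':') {
            long value;
            if (sscanf(line + keylen + 1, "%ld", &value) == 1)
                return value;
            return -1;
        }
        line = strchr(line, '\n');   /* advance to the next line, if any */
        if (line != NULL)
            line++;
    }
    return -1;
}

/* The comparison described above: is the process's virtual size larger
 * than physical memory plus swap? All arguments are in kB. */
int vmsize_exceeds_ram_plus_swap(long vm_size_kb,
                                 long mem_total_kb,
                                 long swap_total_kb)
{
    return vm_size_kb > mem_total_kb + swap_total_kb;
}
```

A caller would pass the contents of /proc/<pid>/status with the key "VmSize", and the contents of /proc/meminfo with the keys "MemTotal" and "SwapTotal".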

Comment 28 Michael Catanzaro 2021-01-08 21:03:46 UTC
(In reply to Owen Taylor from comment #24)
> I believe that it's possible to implement the no-child-setup-function path
> in GSubprocess completely in terms of posix_spawn() using 
> posix_spawn_file_actions

Anyway, this seems like the path forward. Good discovery....

Comment 30 Simeon Andreev 2021-01-11 10:27:15 UTC
We will evaluate overcommit strategy 1, though this can take a while. The strategy does seem to solve our problem in Eclipse, but Eclipse is only a part of the application/environment; we'll need time to validate the strategy works for the rest of the environment. We were unable to "pre-start" WebKit, as already hinted at in comment 11 and comment 16.

Judging from the discussion so far, a fix will not be coming in "near future". While it's not great that in case of a failed fork(), a WebKit view will not work, our problem is the actual exit() call (due to logging a fatal with glib). Is it possible to provide an option to not exit the entire application if the fork() fails? Especially if this is a trivial patch for WebKit, controlled in the most trivial way (e.g. with an ENV variable). We realize this might not be an option (e.g. the code in question would require massive restructuring), but if it *is* possible, we would like such a patch as a short-term workaround (overcommit strategy 1 might be viable for us, but we don't know yet).

Comment 31 Michael Catanzaro 2021-01-11 15:05:19 UTC
I don't think that's going to be practical: WebKit has no mechanism for handling unexpected failure to launch a web process. Let's look into posix_spawn_file_actions....

Comment 34 Michael Catanzaro 2021-01-20 00:07:58 UTC
WebKit change has landed in https://trac.webkit.org/changeset/271610/webkit.

Still need to look into the required glib changes, which will be significantly more complex.

Comment 35 Andrey Loskutov 2021-01-25 20:07:20 UTC
@Mickael, Alexander: FYI. This issue affects all Eclipse based products on RHEL, the crash may happen at any time if Eclipse process memory size is higher than available system memory.

Comment 45 Michael Catanzaro 2021-02-10 14:24:06 UTC
Hi Simeon, is it OK if I reassign this bug to RHEL 8? The odds of it being fixed in RHEL 7 are quite low since it's in Maintenance Phase 2 now and updates are only permitted for the most serious bugs and security issues.

I still need to look into how hard it would be to make GSubprocess use posix_spawn(). Also, I found a few other places in WebKit that are using the child setup function, which fortunately I think can be fixed similarly. Specifically, the sandboxed process launcher (which is not used in RHEL 7/8) is still using child setups, so I need to change those too.

Comment 46 Simeon Andreev 2021-02-10 14:35:06 UTC
(In reply to Michael Catanzaro from comment #45)
> Hi Simeon, is it OK if I reassign this bug to RHEL 8? The odds of it being
> fixed in RHEL 7 are quite low since it's in Maintenance Phase 2 now and
> updates are only permitted for the most serious bugs and security issues.
> 
> I still need to look into how hard it would be to make GSubprocess use
> posix_spawn(). Also, I found a few other places in WebKit that are using the
> child setup function, which fortunately I think can be fixed similarly.
> Specifically, the sandboxed process launcher (which is not used in RHEL 7/8)
> is still using child setups, so I need to change those too.

I'll ask our sysadmins & stakeholders, though so far the only planned update is to RHEL 7.9 (as far as the product in question is concerned). What would a target RHEL 8 mean? Will WebKit builds with the fix be available for RHEL 7, despite the fix target being RHEL 8? I assume we can build WebKit on our own, for RHEL 7, but I would like to avoid this if possible.

Comment 47 Michael Catanzaro 2021-02-10 15:25:49 UTC
We're not planning to do any updates for this issue in RHEL 7 since it's Maintenance Support 2 phase and is restricted to Important or Critical security issues and Urgent-priority bugfixes. (It would be pretty difficult to argue this is Urgent, since we've already found an easy workaround.) In contrast, RHEL 8 is in Full Support phase and doesn't have these restrictions.

For RHEL 7, I think the best workaround is the overcommit knob you already have access to, since that's an easy solution that doesn't require changing any packages. If you don't want to use that, then you would need to build your own glib2 package with the upstream patch that teaches glib to use posix_spawn(), and you'd need to pin WebKit in yum to lock it to webkitgtk4-2.22.7-2.el7, the version you're already using, since that's the last RHEL 7 version that doesn't yet use GSubprocess. Those two changes should be sufficient to avoid this issue. But the overcommit knob is a lot easier.

Comment 48 Michael Catanzaro 2021-02-10 16:33:07 UTC
(In reply to Michael Catanzaro from comment #47)
> Those two changes should be sufficient to avoid this issue.

Well actually not: a comment in gspawn.c warns that glibc's posix_spawn was buggy prior to glibc 2.24, so it falls back to fork() unless you have glibc 2.24. RHEL 7 has glibc 2.17.

Comment 49 Michael Catanzaro 2021-02-24 20:22:51 UTC
Another progress update: a major refactor landed in https://gitlab.gnome.org/GNOME/glib/-/merge_requests/1958 and obsoleted my attempts to solve this in GSubprocess. But it's good news actually, because the refactored code takes a smarter approach than I did: I had been trying to make GSubprocess bypass gspawn and use posix_spawn directly, which in retrospect was not the best approach. The refactored code instead moves all of GSubprocessLauncher's fd-reassignment logic into gspawn, which worked out nicely.

Anyway, after this refactor, GSubprocess is no longer using a child setup function, and we no longer need any changes in GSubprocess. The only remaining upstream work is here in gspawn.c:

  /* FIXME: Handle @source_fds and @target_fds in do_posix_spawn() using the
   * file actions API. */
  if (!intermediate_child && working_directory == NULL && !close_descriptors &&
      !search_path_from_envp && child_setup == NULL && n_fds == 0)

The n_fds == 0 check is now the only condition stopping us from using posix_spawn, so this issue should be *much* easier to solve now: it's just a matter of fixing the FIXME and removing that check. I want to say this shouldn't be hard. I will try and find out if there are unexpected problems.
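
A much-simplified sketch of the general idea behind "fixing the FIXME": each source fd is mapped onto its target fd in the child with a dup2 file action. This is not the GLib patch. `spawn_with_fd_map` is a hypothetical name, and the sketch assumes no target fd is also used as a later source fd, a case that real code would have to handle.

```c
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>

extern char **environ;

/* Simplified sketch: in the child, make target_fds[i] refer to the same
 * open file as source_fds[i], using posix_spawn file actions only.
 * Assumes the source and target sets do not overlap in a conflicting way.
 * Returns 0 on success or an error number on failure. */
int spawn_with_fd_map(pid_t *pid, char *const argv[],
                      const int *source_fds, const int *target_fds,
                      size_t n_fds)
{
    posix_spawn_file_actions_t actions;
    int rc = posix_spawn_file_actions_init(&actions);
    if (rc != 0)
        return rc;

    /* Queue one dup2 per mapping; stop at the first failure. */
    for (size_t i = 0; i < n_fds && rc == 0; i++)
        rc = posix_spawn_file_actions_adddup2(&actions,
                                              source_fds[i], target_fds[i]);

    if (rc == 0)
        rc = posix_spawnp(pid, argv[0], &actions, NULL, argv, environ);

    posix_spawn_file_actions_destroy(&actions);
    return rc;
}
```

For instance, mapping the write end of a pipe to fd 5 lets a child run as `sh -c 'echo ok >&5'` write back to the parent without any child setup function.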

Comment 50 Michael Catanzaro 2021-02-24 20:39:00 UTC
(In reply to Michael Catanzaro from comment #49)
> Another progress update: a major refactor landed in
> https://gitlab.gnome.org/GNOME/glib/-/merge_requests/1958

Oops, I meant to link to https://gitlab.gnome.org/GNOME/glib/-/merge_requests/1690.

Comment 51 Michael Catanzaro 2021-02-25 19:23:26 UTC
(In reply to Michael Catanzaro from comment #49)
> The n_fds == 0 check is now the only condition stopping us from using
> posix_spawn, so this issue should be *much* easier to solve now: it's just a
> matter of fixing the FIXME and removing that check. I want to say this
> shouldn't be hard. I will try and find out if there are unexpected problems.

I have a draft implementation in https://gitlab.gnome.org/GNOME/glib/-/merge_requests/1968 that only needs a little more work and causes WebKit's subprocess launching to use posix_spawn(). We need, in total:

 * https://gitlab.gnome.org/GNOME/glib/-/merge_requests/1690
 * https://gitlab.gnome.org/GNOME/glib/-/merge_requests/1958 (regression fix for the previous MR)
 * https://gitlab.gnome.org/GNOME/glib/-/merge_requests/1968 (not quite ready yet)
 * https://trac.webkit.org/changeset/271610/webkit

The glib commits have fairly high potential for unexpected regressions that could affect a huge number of applications. We could fix this in RHEL 8 if requested -- so far I only see a request to fix RHEL 7! -- but if so, we should land the changes close to the beginning of a minor release cycle (e.g. 8.5 development is beginning soon), so that we have a full minor release cycle for testing to see if anything unexpectedly breaks. If there are further unexpected problems, I expect we should discover them fairly quickly in Fedora, so I think it will be safe enough for RHEL 8 as long as we leave plenty of time for testing.

If we need to fix RHEL 7 (unlikely to be approved), then we also need changes in glibc, but I'm not sure what. I would need to investigate what changes in glibc are necessary for glib's use of posix_spawn to be safe. We could avoid the need for the larger refactorings in gspawn/GSubprocess by hacking WebKit to go back to using gspawn instead of GSubprocess, which is OK for a hack downstream patch. But justifying a glibc update in ultra-stable RHEL 7 would be really tough. My opinion is this would be clearly outside the scope of the RHEL 7 Maintenance Support phase.

Finally, I also landed https://trac.webkit.org/changeset/273087/webkit to future-proof this as far as possible, but that is only needed if Eclipse opts in to the web process sandbox, and then only in RHEL 9+ because the sandbox is not built in RHEL 7 or RHEL 8.

Comment 52 Andrey Loskutov 2021-02-25 19:53:00 UTC
(In reply to Michael Catanzaro from comment #51)
> I have a draft implementation in
> https://gitlab.gnome.org/GNOME/glib/-/merge_requests/1968 that only needs a
> little more work and causes WebKit's subprocess launching to use
> posix_spawn(). We need, in total:
> 
>  * https://gitlab.gnome.org/GNOME/glib/-/merge_requests/1690
>  * https://gitlab.gnome.org/GNOME/glib/-/merge_requests/1958 (regression fix
> for the previous MR)
>  * https://gitlab.gnome.org/GNOME/glib/-/merge_requests/1968 (not quite
> ready yet)
>  * https://trac.webkit.org/changeset/271610/webkit

So that would switch to posix_spawn *and* avoid system exit on error?
 
> The glib commits have fairly high potential for unexpected regressions that
> could affect a huge number of applications. We could fix this in RHEL 8 if
> requested -- so far I only see a request to fix RHEL 7! 

As of today we are still evaluating (and have been for months) whether to switch from 7.x to 8 or jump to 9, but if it is possible without much effort, I would also request a RHEL 8 fix.
And of course the fix must be in RHEL 9.x, because even if we switch to 8.x, we will sooner or later land on 9.x.

> -- but if so, we
> should land the changes close to the beginning of a minor release cycle
> (e.g. 8.5 development is beginning soon), so that we have a full minor
> release cycle for testing to see if anything unexpectedly breaks. If there
> are further unexpected problems, I expect we should discover them fairly
> quickly in Fedora, so I think it will be safe enough for RHEL 8 as long as
> we leave plenty of time for testing.

Sounds reasonable. Crossing fingers :-)

Comment 53 Michael Catanzaro 2021-02-25 20:00:14 UTC
(In reply to Andrey Loskutov from comment #52)
> So that would switch to posix_spawn *and* avoid system exit on error?

No, WebKit will still exit if the subprocess fails to launch. Changing that is impractical.

I have verified that my changes cause WebKit to use posix_spawn rather than fork/exec to launch subprocesses. Haven't tested more than that.

Comment 61 Michael Catanzaro 2021-10-28 17:59:51 UTC
Hi, update on this issue: the required WebKit changes landed upstream a while back (comment #34), and will be included in RHEL 8.5. The required GLib changes have not landed yet. I've just updated my upstream merge request https://gitlab.gnome.org/GNOME/glib/-/merge_requests/1968, and it is now pending upstream review. Assuming that review goes well -- knock on wood -- I'm tentatively expecting the GLib changes to land in 8.6 and 9.0. (This is not a promise from Red Hat, it's just my current plan.)

I will test myself to ensure that WebKit really does take the posix_spawn() codepath in combination with these changes (it did last time I checked with an earlier version of my GLib changes, but I need to reverify with the latest version), but you might also want to test to see if that's really enough to fully resolve the Eclipse crashes you were experiencing. Would you like me to prepare a scratch build of GLib that you can test in the meantime? (Note that using scratch builds is entirely unsupported.) The problem here is I remember you were using RHEL 7 last we checked. I was thinking I could still prepare a scratch build you could use for testing, even though we won't actually release a fix for RHEL 7, but then I remembered that the GLib changes depend on a newer glibc than is available in RHEL 7. So I don't think there's any point in doing a RHEL 7 scratch build. I'm not sure if you'd be interested in a scratch build for RHEL 8 or not, but if so, let me know.

Comment 63 Simeon Andreev 2021-10-29 09:22:00 UTC
I can probably compile glibc locally, to try the change out. Though I assume the same applies also for WebKit, it would not hurt if you provide a build that we can use.

I believe we can also validate on RHEL 8; we did have a few test machines, and reproduction only requires an Eclipse SWT snippet. If it's too much work to provide a WebKit build for RHEL 7 for us to test with, we'll manage testing the fix without the build.

Comment 64 Michael Catanzaro 2021-10-29 17:09:28 UTC
Ack. I don't know how hard it will be to provide a RHEL 7 scratch build of WebKit. I'm guessing that upgrading to newer WebKit would be too much effort for just a scratch build, but maybe I can backport just the ProcessLauncherGLib changes to the current package version. (All unsupported, of course, but this ought to be useful for testing purposes.)

Comment 65 Andrey Loskutov 2021-10-29 17:12:52 UTC
Please don't waste your time. We won't update glib on our 7.x RHEL, the only interesting option is now RHEL 8+.

Comment 66 Michael Catanzaro 2021-10-29 17:39:34 UTC
Do you want a scratch build for RHEL 8?

Comment 67 Andrey Loskutov 2021-10-29 17:59:34 UTC
(In reply to Michael Catanzaro from comment #66)
> Do you want a scratch build for RHEL 8?

If it does not require much effort on your side, yes, please.

Comment 68 Michael Catanzaro 2021-11-02 21:06:43 UTC
(In reply to Andrey Loskutov from comment #67)
> If it does not require much effort on your side, yes, please.

After backporting the required changes to RHEL 8 (required to solve this bug anyway), a scratch build is little additional effort. The backporting is a bit tougher than expected, but I should finish it soon.

Comment 69 Michael Catanzaro 2021-11-11 23:21:46 UTC
Created attachment 1841295 [details]
Test build

OK, here is a test build for you to try. Only glib2-2.56.4-157.el8.gspawn1.x86_64.rpm is required: the subpackages are provided only in case you have corresponding packages installed already and need to replace them. Debug packages omitted to stay under Bugzilla's attachment size limit.

I've verified that it causes RHEL 8.5's WebKitGTK to launch subprocesses with posix_spawn() by running 'G_MESSAGES_DEBUG=GLib gnome-control-center' and observing the debug messages it prints, which will tell you whether it used posix_spawn() and if not, why. (Note this easy method to see whether it's working only works in this RHEL 8 build because I decided not to backport the upstream patch that switched to using systemtap tracepoints instead of logging. In RHEL 9, you'd need to use something like sysprof to check the tracepoints.)

Reminders:

 * Not expected to work with RHEL 8.4's WebKitGTK, really depends on 8.5
 * Test package is not supported by Red Hat and is for evaluation purposes only

Comment 70 Michael Catanzaro 2021-11-11 23:24:22 UTC
(In reply to Michael Catanzaro from comment #69)
> I've verified that it causes RHEL 8.5's WebKitGTK to launch subprocesses
> with posix_spawn() by running 'G_MESSAGES_DEBUG=GLib gnome-control-center'

Oh, I forgot the rest: open gnome-control-center, switch to Online Accounts panel, click on Google. It's just an easy way to test using only apps provided by the stock Workstation install. Anything that launches a web view would of course work just as well.

Comment 71 Simeon Andreev 2021-11-12 07:19:24 UTC
Thanks, I'll try to validate the fix next.

Comment 72 Michael Catanzaro 2021-12-06 13:53:03 UTC
(In reply to Simeon Andreev from comment #71)
> Thanks, I'll try to validate the fix next.

Any good news?

Comment 73 Simeon Andreev 2021-12-06 14:14:12 UTC
(In reply to Michael Catanzaro from comment #72)
> (In reply to Simeon Andreev from comment #71)
> > Thanks, I'll try to validate the fix next.
> 
> Any good news?

Sorry, I still don't have a RHEL 8 workstation available. I'll try to check on RHEL 9; I believe we have a few set up (though I will need to be granted access first). I'll also have to check (once my account has access to a RHEL 9 workstation) what glib2 version is installed, though I assume it's the required 2.56.4-157 one or higher.

Comment 74 Michael Catanzaro 2021-12-06 14:25:14 UTC
I'll prepare a scratch build that you can use to test for RHEL 9.

BTW if you're going to jump straight from 7 -> 9, then my preference would be to not update RHEL 8 at all, since the patchset is very intrusive.

Comment 75 Simeon Andreev 2021-12-06 14:30:06 UTC
The most recent plans that I know of are to skip RHEL 8. But just to be certain, Andrey, Advantest will skip RHEL 8 and update straight to RHEL 9?

Comment 76 Andrey Loskutov 2021-12-06 14:36:54 UTC
(In reply to Simeon Andreev from comment #75)
> The most recent plans that I know of are to skip RHEL 8. But just to be
> certain, Andrey, Advantest will skip RHEL 8 and update straight to RHEL 9?

I assume we will not update to 8.x anymore; there were some plans before, but I believe 9+ is set now.
Evaluation for 9 will start early next year, and even if that should fail, we will most likely NOT select 8 but wait, for example, for 9.1 or something like that.

Comment 77 Michael Catanzaro 2021-12-06 16:28:04 UTC
OK, in that case.

Comment 79 Michael Catanzaro 2021-12-06 16:28:35 UTC
*** Bug 1970469 has been marked as a duplicate of this bug. ***

Comment 80 Michael Catanzaro 2021-12-06 16:37:32 UTC
(In reply to Michael Catanzaro from comment #77)
> OK, in that case.

Oops. In that case, I'll reassign this bug to RHEL 9 and close bug #1970469, so we can continue in the bug with all the comments.

Comment 83 Simeon Andreev 2021-12-07 12:26:04 UTC
I'm not sure we have the required RPMs to test the fix. I find the following versions:

[sandreev@socvm342 ~]$ rpm -qa | grep glib2
glib2-2.68.4-1.el9.x86_64
pulseaudio-libs-glib2-15.0-2.el9.x86_64
glib2-devel-2.68.4-1.el9.x86_64

[sandreev@socvm342 ~]$ rpm -qa | grep webkit
webkit2gtk3-jsc-2.32.3-2.el9.x86_64
webkit2gtk3-2.32.3-2.el9.x86_64

Is this enough? Unfortunately I have no idea what the difference is between webkit2gtk3 and webkitgtk4.

Comment 84 Michael Catanzaro 2021-12-07 16:13:52 UTC
(In reply to Simeon Andreev from comment #83)
> I'm not sure we have the required RPMs to test the fix.

Sorry, it seems the scratch build is hidden behind our VPN. I had tested an incognito window to make sure the page was public, but that was not a good enough test. Let me attach the glib2 build.

(In reply to Simeon Andreev from comment #83)
> Unfortunately I have no idea what the difference is between
> webkit2gtk3 and webkitgtk4.

Absolutely no difference, webkitgtk4 was the old RHEL 7 name and webkit2gtk3 is the RHEL 8+ name for the same package.

Comment 85 Michael Catanzaro 2021-12-07 16:27:36 UTC
Created attachment 1845090 [details]
Test build for el9

Comment 86 Simeon Andreev 2021-12-08 07:21:12 UTC
(In reply to Michael Catanzaro from comment #85)
> Created attachment 1845090 [details]
> Test build for el9

Could you check the versions I listed in comment 83? I tried testing with those versions and didn't run into the WebKit crash. Unfortunately the RHEL 9 VM I'm using has 4 GB of RAM configured; I'm not confident in the results.

If the fix is not contained in the listed versions, and only in the RPMs you attached, then at least I'll be sure we need another VM to test the fix.

Comment 87 Michael Catanzaro 2021-12-08 16:51:26 UTC
(In reply to Simeon Andreev from comment #86)
> Could you check the versions I listed in comment 83? I tried testing with
> those versions and didn't run into the WebKit crash. Unfortunately the RHEL
> 9 VM I'm using has 4 GB of RAM configured; I'm not confident in the results.

You've got new enough WebKit, but it's still going to use fork()/exec() because we need the GLib changes that are not present yet. Only the scratch builds have the required GLib changes to use posix_spawn().

Remember you need to disable overcommit to force it to fail to allocate sufficient address space. Perhaps you forgot to do that?
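
For reference when checking this setting: the Linux kernel documents three values for vm.overcommit_memory, where 0 is heuristic overcommit, 1 is always overcommit, and 2 is strict accounting with no overcommit. The sketch below reads the current value from procfs and names the mode. The function names are hypothetical helpers for this example, and the code assumes a Linux system with /proc mounted.

```c
#include <stdio.h>

/* Name a vm.overcommit_memory value as described in the Linux kernel
 * documentation. */
static const char *overcommit_mode_name(int mode)
{
    switch (mode) {
    case 0:
        return "heuristic overcommit";
    case 1:
        return "always overcommit";
    case 2:
        return "strict accounting (no overcommit)";
    default:
        return "unknown";
    }
}

/* Read the current mode from procfs. Returns -1 if the file is missing or
 * cannot be parsed (for example, on a non-Linux system). */
static int read_overcommit_mode(void)
{
    FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
    int mode = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;
    fclose(f);
    return mode;
}
```

Printing overcommit_mode_name(read_overcommit_mode()) before running a reproducer shows which policy the test machine is actually using.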

Comment 88 Simeon Andreev 2021-12-09 07:12:40 UTC
(In reply to Michael Catanzaro from comment #87)
> Remember you need to disable overcommit to force it to fail to allocate
> sufficient address space. Perhaps you forgot to do that?

Overcommit is disabled. I've requested another RHEL 9 VM with more RAM.

Comment 89 Michael Catanzaro 2021-12-09 20:58:19 UTC
Wouldn't more RAM make it *harder* to reproduce this issue?

Comment 90 Simeon Andreev 2021-12-10 08:11:52 UTC
(In reply to Michael Catanzaro from comment #89)
> Wouldn't more RAM make it *harder* to reproduce this issue?

We need enough so that duplicating the Eclipse memory consumption is not possible. With only 4 GB (the RHEL 9 VM I'm currently using) this is difficult; already 2 GB are consumed by processes not under my control.

Comment 92 Michael Catanzaro 2022-01-19 14:37:47 UTC
Please note there are some deadlines coming up to include the fix in 9.0, so if you're able to test it soon, that would be ideal.

Comment 93 Simeon Andreev 2022-01-19 15:00:31 UTC
Still waiting on Advantest IT for a RHEL workstation with sufficient RAM, hopefully one is available soon... I'll comment here when I get an update.

Comment 94 Simeon Andreev 2022-01-21 14:19:28 UTC
(In reply to Michael Catanzaro from comment #92)
> Please note there are some deadlines coming up to include the fix in 9.0, so
> if you're able to test it soon, that would be ideal.

Alright, I got a RHEL 9 workstation with 64 GB RAM and was able to validate the fix. The WebKit browser view opens and can be interacted with, there is no JVM crash anymore.

Thank you very much!


Details:

vm.overcommit_memory is 0 (good for the test):

...$ sysctl -a 2>/dev/null | grep vm.overcommit_memory
vm.overcommit_memory = 0


WebKit version:

webkit2gtk3-2.34.2-1.el9.x86_64


glib version:

glib2-2.68.4-3.el9.x86_64


Arguments for the snippet: -Xmx60g -Xms50g

Snippet:

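import java.util.ArrayList;
import java.util.List;

import org.eclipse.swt.SWT;
import org.eclipse.swt.browser.Browser;
import org.eclipse.swt.layout.FillLayout;
import org.eclipse.swt.widgets.Composite;
import org.eclipse.swt.widgets.Display;
import org.eclipse.swt.widgets.Shell;

public class TestJep2425 {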

	public static final List<char[]> arrays = new ArrayList<>();
	
	public static void main(String[] args) {
		Display display = new Display();
		Shell shell = new Shell(display);
		shell.setSize(600, 400);
		shell.setLayout(new FillLayout());
		Composite composite = new Composite(shell, SWT.NONE);
		composite.setLayout(new FillLayout(SWT.VERTICAL));

		fillHeap();

		Browser browser = new Browser(composite, SWT.BORDER);
		browser.setText("<!DOCTYPE html><html><head></head><body>hello world</body></html>");
		shell.open();

		while (!shell.isDisposed()) {
			if (!display.readAndDispatch())
				display.sleep();
		}
		display.dispose();
	}
	
	private static void fillHeap() {
		try {
			int iterations = 10_000;
			for (int iteration = 0; iteration < iterations; ++iteration) {
				int n = 100 * 1024;
				int count = 512;
				for (int c = 0; c < count; ++c) {
					char[] array = new char[n];
					for (int i = 0; i < n; ++i) {
						array[i] = (char) (i % 100);
					}
					arrays.add(array);
				}
				if (iteration % 10 == 0) {
					long heapSize = Runtime.getRuntime().totalMemory();
					long heapMaxSize = Runtime.getRuntime().maxMemory();
					long heapFreeSize = Runtime.getRuntime().freeMemory();
					System.out.println("heapSize = " + heapSize);
					System.out.println("heapMaxSize = " + heapMaxSize);
					System.out.println("heapFreeSize = " + heapFreeSize);
					int freePercent = (int)(((double) heapFreeSize / (double) heapSize) * 100.0);
					System.out.println("" + freePercent + "% free");
					if (freePercent < 5) {
						return;
					}
				}
			}
		} catch (OutOfMemoryError e) {
			e.printStackTrace(System.out);
		}
	}
}

WebKit browser shows correct contents, no JVM crash.

Comment 95 Michael Catanzaro 2022-01-21 14:59:33 UTC
And just to confirm: your test is broken if you downgrade to the standard RHEL 9 glib package rather than the scratch build, so the scratch build really fixed your issue?

If so, I'll land this for 9.0.

Comment 96 Simeon Andreev 2022-01-21 15:11:44 UTC
(In reply to Michael Catanzaro from comment #95)
> And just to confirm: your test is broken if you downgrade to the standard
> RHEL 9 glib package rather than the scratch build, so the scratch build
> really fixed your issue?
> 
> If so, I'll land this for 9.0.

What is the standard RHEL 9 glib version? I've not installed the scratch build (https://bugzilla.redhat.com/attachment.cgi?id=1845090).

The glib version that I see is:


glib2-2.68.4-3.el9.x86_64

Comment 97 Michael Catanzaro 2022-01-21 15:16:01 UTC
Ah, see, that's why it's good to check. You do not have the fix at all, then! Your test case must not be enough to cause fork() to fail.

The build with the fix for this issue is comment #85, glib2-2.68.4-0testgspawn.el9.

Comment 98 Simeon Andreev 2022-01-21 15:37:06 UTC
(In reply to Michael Catanzaro from comment #97)
> Ah, see, that's why it's good to check. You do not have the fix at all,
> then! Your test case must not be enough to cause fork() to fail.
> 
> The build with the fix for this issue is comment #85,
> glib2-2.68.4-0testgspawn.el9.

Same snippet crashes on RHEL 7.9 though (vm.overcommit set to 0):

** (SWT:46175): ERROR **: 16:29:35.101: Unable to fork a new child process: Failed to fork (Cannot allocate memory)

How do I check what kind of fork() WebKit is using during the snippet?

Comment 99 Michael Catanzaro 2022-01-21 16:04:39 UTC
(In reply to Simeon Andreev from comment #98)
> Same snippet crashes on RHEL 7.9 though (vm.overcommit set to 0):

Heh, then I did a whole lot of work that wound up not being needed after all. :P Well, I fixed some things along the way, so it was still worth it....

> ** (SWT:46175): ERROR **: 16:29:35.101: Unable to fork a new child process:
> Failed to fork (Cannot allocate memory)
> 
> How do I check what kind of fork() WebKit is using during the snippet?

You can use a systemtap probe, though I don't know how to do that. (I think they can be inspected using sysprof?) That said, there's not much point because it's definitely doing fork(). There is no way the code could possibly take the posix_spawn() codepath unless you use the scratch build. It's just not hooked up.

There's a lot of time between 7 and 9... maybe something changed somewhere else (memory allocator?).

Comment 100 Simeon Andreev 2022-01-21 16:10:57 UTC
OK.

I'll check the attached build on RHEL 9 anyway, to be sure it also works. Assuming the results are good, are you proceeding with the change?

Comment 101 Michael Catanzaro 2022-01-21 16:18:41 UTC
I suppose so.

At least, two of the commits "gspawn: fix hangs when duping child_err_report_fd" and "gspawn: fix fd remapping conflation issue" are wanted regardless, because those fix regressions introduced between 8 and 9. And if those are going to land, might as well land the rest. It's probably not really needed if you can't reproduce your trouble with Eclipse on 9, but who knows, maybe they'll save you from trouble in the future.

Comment 104 Simeon Andreev 2022-01-24 07:34:30 UTC
I tested on RHEL 9 with the attached RPM, also no JVM crash. The WebKit browser view is shown as expected.

Comment 109 errata-xmlrpc 2022-05-17 15:51:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (new packages: glib2), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:3931

