Bug 1820095

Summary: kernel: execve synthesizes SIGSEGV when failing to load main program segments
Product: Red Hat Enterprise Linux 8
Component: kernel
kernel sub component: Process management
Version: 8.2
Status: CLOSED CANTFIX
Severity: unspecified
Priority: unspecified
Target Milestone: rc
Target Release: 8.0
Hardware: Unspecified
OS: Unspecified
Reporter: Florian Weimer <fweimer>
Assignee: core-kernel-bot <core-kernel-mgr>
QA Contact: Kernel General QE <kernel-general-qe>
CC: efuller, jbastian, onestero
Flags: pm-rhel: mirror+
Doc Type: If docs needed, set a value
Type: Bug
Last Closed: 2020-04-02 10:28:28 UTC

Description Florian Weimer 2020-04-02 09:02:57 UTC

Description of problem:

When execve cannot map the LOAD segments of the main executable, the process dies immediately with SIGSEGV, without running any code from the new program. This is very confusing to users, as can be seen from bug 1817106.

Version-Release number of selected component (if applicable):

kernel-4.18.0-193.el8.x86_64

How reproducible:

Compile this program and run it (this may need adjustment on very large systems that can successfully allocate 512 GiB of memory):

/* 512 GiB of zero-initialized data; too large to allocate on most systems. */
char large_data[1LL << 39];

int
main (void)
{
  return 0;
}

Steps to Reproduce:
1. gcc reproducer.c
2. ./a.out

Actual results:

The shell reports: Segmentation fault

Expected results:

The execve system call should fail with ENOMEM, resulting in a more understandable error message from the shell.
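For illustration, here is a minimal sketch of how a shell-like parent sees the two outcomes. The helper name run_and_classify and its return convention are hypothetical and not taken from any shell. If execve fails early, the call returns in the child and the child can report the error itself. If the child is killed by a signal, the parent only learns the signal number from the wait status.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] in a child and classify how it ended.
 * Returns the exit status (0..255) for a normal exit, the negated signal
 * number if a signal killed the child, or 256 if fork/waitpid failed.
 */
static int run_and_classify(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return 256;

    if (pid == 0) {
        execv(argv[0], argv);
        /* Reached only when execve itself returns an error to the caller. */
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return 256;

    if (WIFSIGNALED(status))
        return -WTERMSIG(status);   /* all the parent knows: which signal */
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return 256;
}
```

With the reproducer above, a call such as run_and_classify((char *[]){ "./a.out", NULL }) would be expected to return -SIGSEGV on an affected system, which is what the shell turns into "Segmentation fault".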

Comment 1 Oleg Nesterov 2020-04-02 10:28:28 UTC

(In reply to Florian Weimer from comment #0)
>
> char large_data[1LL << 39];
> 
> int
> main (void)
> {
> }
> 
> Steps to Reproduce:
> 1. gcc reproducer.c
> 2. ./a.out
> 
> Actual results:
> 
> The shell reports: Segmentation fault

Yes,
 
> Expected results:
> 
> The execve system call should fail with ENOMEM, resulting in a more
> understandable error message from the shell.

but this is impossible :/

The old VM (the process's previous address space) has already been destroyed
at that point, so the process simply cannot return to userspace.

Comment 2 Florian Weimer 2020-04-02 11:28:54 UTC

Yuck, I didn't consider this. Thanks.

Would it be possible to somehow communicate the cause of the SIGSEGV in the siginfo_t value provided by waitid?

Comment 3 Oleg Nesterov 2020-04-02 17:18:26 UTC

(In reply to Florian Weimer from comment #2)
> 
> Would it be possible to communicate somehow the cause of the SIGSEGV in the
> siginfo_t value provided by waitid?

Unfortunately, no. The only additional information you can get from this
siginfo_t (compared to wait(&status)) is the pid and uid.
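To make this concrete, here is a small sketch of collecting a child with waitid. The helper fork_and_waitid is hypothetical and exists only for illustration. The child either raises a signal or exits, and the parent inspects the resulting siginfo_t.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that raises `sig` (if nonzero) or
 * otherwise exits with `code`, then collect it with waitid().
 * Returns 0 on success and fills *info, or -1 on failure.
 */
static int fork_and_waitid(int code, int sig, siginfo_t *info)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        if (sig != 0)
            raise(sig);
        _exit(code);
    }

    /*
     * On success, *info carries:
     *   si_code   - CLD_EXITED, CLD_KILLED or CLD_DUMPED
     *   si_status - exit status or terminating signal number
     *   si_pid    - pid of the child
     *   si_uid    - real user ID of the child
     */
    return waitid(P_PID, (id_t)pid, info, WEXITED);
}
```

si_code and si_status carry the same information as the status word from wait. None of the fields describes why the signal was sent, so a SIGSEGV forced by a failed execve looks the same as any other SIGSEGV.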

Comment 4 Eirik Fuller 2020-04-02 18:39:09 UTC

The ENOMEM from execve is visible in strace output immediately before strace reports the SIGSEGV, so while the shell can't easily report it, it can be seen with suitable tool usage.

The SIGSEGV comes from search_binary_handler (the force_sigsegv(SIGSEGV, current) line), called by exec_binprm, called by __do_execve_file, in fs/exec.c; I have not yet found documentation for this behavior, unless https://lkml.org/lkml/2013/2/15/572 qualifies as such.

It appears there was already precedent for killing processes in response to execve failures, but the patch in that message (or something very similar like https://github.com/torvalds/linux/commit/19d860a140beac48a1377f179e693abe86a9dac9) seems mostly responsible for the present behavior.

Comment 5 Oleg Nesterov 2020-04-03 10:36:04 UTC

(In reply to Eirik Fuller from comment #4)
>
> The ENOMEM from execve is visible in strace output immediately before strace
> reports the SIGSEGV, so while the shell can't easily report it, it can be
> seen with suitable tool usage.

Yes,
 
> It appears there was already precedent for killing processes in response to
> execve failures, but the patch in that message (or something very similar
> like
> https://github.com/torvalds/linux/commit/19d860a140beac48a1377f179e693abe86a9dac9)
> seems mostly responsible for the present behavior.

Not really; this patch only cleans up the "suicide" logic we always had.

Once again, the execing task simply can't return to user mode after
de_thread/exec_mmap. sys_execve() does return the error code, and strace can
report it, but the task itself can't.

IOW, consider

        sys_execve(...);
        printf("exec failed");

If execve fails after the original VM has already been destroyed, the code above
no longer exists: its page has already been freed.
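The distinction can be sketched from the parent's side with the common close-on-exec pipe technique. This is only an illustration under stated assumptions: the helper spawn_report and its return convention are hypothetical and come from neither the kernel nor this report. The child writes errno into the pipe only if execve returns, which means the old address space and the code after the call still exist. If the failure happens later, nothing is written and the parent sees only end-of-file followed by the wait status.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper. Runs argv[0] in a child. Returns:
 *   > 0  the errno of an execve failure that returned to the caller,
 *   0    execve did not return in the child; *status holds the
 *        waitpid() status (which may still show death by SIGSEGV),
 *   -1   pipe/fcntl/fork/waitpid failure in the parent.
 */
static int spawn_report(char *const argv[], int *status)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    /* The write end is closed automatically if the exec goes through. */
    if (fcntl(fds[1], F_SETFD, FD_CLOEXEC) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        close(fds[0]);
        execv(argv[0], argv);
        /* Still in the old address space: this code exists and can run. */
        int child_err = errno;
        ssize_t written = write(fds[1], &child_err, sizeof child_err);
        (void)written;
        _exit(127);
    }

    close(fds[1]);
    int err = 0;
    ssize_t n = read(fds[0], &err, sizeof err);   /* 0 bytes means EOF */
    close(fds[0]);

    if (waitpid(pid, status, 0) != pid)
        return -1;
    return n == (ssize_t)sizeof err ? err : 0;
}
```

For a missing file this returns ENOENT. For the reproducer in comment #0, the expectation on an affected kernel is a return value of 0 with WIFSIGNALED(*status) true and WTERMSIG(*status) equal to SIGSEGV, because the child never gets to run the write.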