Bug 1761749

Summary: numactl crashes while using --touch option
Product: Red Hat Enterprise Linux 7
Reporter: Christophe Besson <cbesson>
Component: numactl
Assignee: Pingfan Liu <piliu>
Status: CLOSED ERRATA
QA Contact: Petr Dancak <pdancak>
Severity: high
Docs Contact:
Priority: urgent
Version: 7.7
CC: aquini, bhe, dshumake, fkrska, lwoodman, pdancak, piliu, psklenar, qe-baseos-daemons, ruyang
Target Milestone: rc
Keywords: Patch, Regression, Reproducer, TestCaseProvided
Target Release: 7.7
Flags: fkrska: needinfo+
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: numactl-2.0.12-5.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1761754 1762680 1763628 (view as bug list) Environment:
Last Closed: 2020-03-31 20:08:17 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1653509, 1711211, 1762680, 1763628    

Description Christophe Besson 2019-10-15 09:27:13 UTC
Description of problem:
numactl segfaults while using the --touch option.
The customer noticed a regression after upgrading to RHEL 7.7. I can't completely confirm this last point, since I managed to make it crash under RHEL 7.6 as well when using a length > 10M.


Version-Release number of selected component (if applicable):
numactl-2.0.12-3

How reproducible:
100%, just do:
# numactl --length=100M --file /dev/shm/data1 --interleave=all --touch

Actual results:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7bd48e3 in numa_police_memory (mem=0x7ffff11ec000, size=104857600) at libnuma.c:863
863	        ((volatile char*)mem)[i] = ((volatile char*)mem)[i];

Expected results:
Works without crashing with SIGSEGV or SIGBUS

Additional info:
The issue was fixed upstream a few days ago. I tried a backport and it works for me.

https://github.com/numactl/numactl/pull/80

Comment 6 Pingfan Liu 2019-10-17 08:41:51 UTC
I have backported it in a local branch, but I need qa_ack+ and rhel-7.7.z+ to push the code.

Can anyone help?

Thanks,
Pingfan

Comment 7 Pingfan Liu 2019-10-17 08:42:31 UTC
Hi,

Please help set qa_ack+.

Comment 8 Pingfan Liu 2019-10-17 08:53:51 UTC
I have cloned the bug against rhel-7.8: https://bugzilla.redhat.com/show_bug.cgi?id=1762680

Comment 17 Pingfan Liu 2019-11-01 01:29:25 UTC
After debugging and testing on several machines, I realized this is not a bug.

Refer to mm/shmem.c

static int shmem_fill_super(struct super_block *sb, struct fs_context *fc)
        /*
         * Per default we only allow half of the physical ram per
         * tmpfs instance, limiting inodes to one per page of lowmem;
         * but the internal instance is left unlimited.
         */
        if (!(sb->s_flags & SB_KERNMOUNT)) {
                if (!(ctx->seen & SHMEM_SEEN_BLOCKS))
                        ctx->blocks = shmem_default_max_blocks();
                if (!(ctx->seen & SHMEM_SEEN_INODES))
                        ctx->inodes = shmem_default_max_inodes();
        } else {
                sb->s_flags |= SB_NOUSER;
        }

Once the tmpfs instance has used up half of the system RAM, no more memory will be allocated for it, which causes "numactl --touch" to fail.
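
To make the default limit concrete, here is a minimal user-space sketch. It is not kernel or numactl code: both helper names are hypothetical, and it only approximates the kernel's default of half the physical RAM by asking sysconf() for the page count. A tmpfs mounted with an explicit size= option would have a different limit.

```c
#include <unistd.h>

/* Approximate default tmpfs size limit: half of physical RAM, in bytes.
 * Returns 0 if sysconf() cannot report the values.
 * _SC_PHYS_PAGES is a glibc/Linux extension. */
unsigned long long tmpfs_default_limit_bytes(void)
{
	long pages = sysconf(_SC_PHYS_PAGES);
	long page_size = sysconf(_SC_PAGESIZE);

	if (pages <= 0 || page_size <= 0)
		return 0;
	return (unsigned long long)pages * (unsigned long long)page_size / 2;
}

/* Nonzero if a requested --length (in bytes) is larger than that limit. */
int exceeds_tmpfs_default(unsigned long long length)
{
	unsigned long long limit = tmpfs_default_limit_bytes();

	return limit != 0 && length > limit;
}
```

A length above this value can still be set with ftruncate(), but touching all of it cannot succeed on a default-sized /dev/shm.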

Comment 18 Petr Sklenar 2019-11-01 10:04:25 UTC
This issue is exactly what comment 0 lists under expected results: no crash with 'SIGBUS'.


old:
numactl --length=10GB --file /dev/shm/data1 --interleave=all --touch
ftruncate: Invalid argument
Segmentation fault
######## ^^^^ Exactly described in https://access.redhat.com/solutions/4500811

and new says 'Bus error'

---

Interestingly, it returns different types of errors depending on the length:
No ERROR:
[root@ci-vm-10-0-136-119 ~]# numactl --length=100M --file /dev/shm/data1 --interleave=all --touch

ERROR:
[root@ci-vm-10-0-136-119 ~]# numactl --length=10000M --file /dev/shm/data1 --interleave=all --touch
Bus error

bigger ERROR:
[root@ci-vm-10-0-136-119 ~]# numactl --length=10000000G --file /dev/shm/data1 --interleave=all --touch
numactl: shm mmap: Cannot allocate memory

and the biggest ERROR:
[root@ci-vm-10-0-136-119 ~]# numactl --length=1000000000000000G --file /dev/shm/data1 --interleave=all --touch
ftruncate: Invalid argument
numactl: shm mmap: Cannot allocate memory

free
              total        used        free      shared  buff/cache   available
Mem:        1882036      149744      235092      949744     1497200      623104
Swap:             0           0           0

----
This bugfix does not solve the FULL customer issue; it addresses just part of it. I can't tell whether this bugfix is sufficient for the customer, so I filed https://bugzilla.redhat.com/show_bug.cgi?id=1767744, which should somehow be documented both in https://access.redhat.com/solutions/4500811 and in the docs of this bug.

Comment 20 Pingfan Liu 2019-11-04 08:17:42 UTC
Hi Petr,

Thank you for the careful testing. Please see my comments inline.

(In reply to Petr Sklenar ⛄ from comment #18)
> This issue is exactly what comment 0 lists under expected results: no
> crash with 'SIGBUS'.
> 
> 
> old:
> numactl --length=10GB --file /dev/shm/data1 --interleave=all --touch
> ftruncate: Invalid argument
> Segmentation fault
> ######## ^^^^ Exactly described in
> https://access.redhat.com/solutions/4500811
> 
> and new says 'Bus error'
> 
Yes, this depends on the kernel implementation.
> ---
> 
> Interestingly, it returns different types of errors depending on the
> length:
> No ERROR:
> [root@ci-vm-10-0-136-119 ~]# numactl --length=100M --file /dev/shm/data1
> --interleave=all --touch
> 
> ERROR:
> [root@ci-vm-10-0-136-119 ~]# numactl --length=10000M --file /dev/shm/data1
> --interleave=all --touch
> Bus error
This hits the limit enforced by the tmpfs policy.
> 
> bigger ERROR:
> [root@ci-vm-10-0-136-119 ~]# numactl --length=10000000G --file
This is beyond the available user-space address range.
Refer to linux/Documentation/x86/x86_64/mm.txt: user space has only 128 TB of virtual address space.
                    |            |                  |         |
   0000000000000000 |    0       | 00007fffffffffff |  128 TB | user-space virtual memory, different per mm

> /dev/shm/data1 --interleave=all --touch
> numactl: shm mmap: Cannot allocate memory
> 
> and the biggest ERROR:
> [root@ci-vm-10-0-136-119 ~]# numactl --length=1000000000000000G --file
This number is greater than 2^32 G, which is beyond what the filesystem can handle.
> /dev/shm/data1 --interleave=all --touch
> ftruncate: Invalid argument
The fs tries to extend the file beyond what it supports. The call fails, but numactl moves on.
> numactl: shm mmap: Cannot allocate memory
This hits the user-space limit and fails.
> 
> free
>               total        used        free      shared  buff/cache  
> available
> Mem:        1882036      149744      235092      949744     1497200     
> 623104
> Swap:             0           0           0
> 
> ----
> This bugfix is not solving FULL customer issue and bugfix is addressing just
> part of the customer issue. I can't recognized if this bugfix is sufficient
> for customer so I filled 
> https://bugzilla.redhat.com/show_bug.cgi?id=1767744 which should be somehow
> documented both in https://access.redhat.com/solutions/4500811 and in docs
> of this bug
Sorry, I cannot agree. The test cases are very extreme and run into the limits of different kernel components.

Here is the relevant code from numactl/shm.c for reference:

void attach_shared(char *name, char *opt)
{
[...]
        if (fstat64(shmfd, &st) < 0)
                err("shm stat"); 
        if (shmlen > st.st_size) { 
                if (ftruncate64(shmfd, shmlen) < 0) {   // filesystem limitation, which does not allow size > 2^64Bytes
                        /* XXX: we could do it by hand, but it would it
                           would be impossible to apply policy then.
                           need to fix that in the kernel. */
                        perror("ftruncate");
                }
        }

        shm_pagesize = st.st_blksize;

        check_region(opt);

        /* RED-PEN For shmlen > address space may need to map in pieces.
           Left for some poor 32bit soul. */
        shmptr = mmap64(NULL, shmlen, PROT_READ | PROT_WRITE, MAP_SHARED, shmfd, shmoffset);  //user space limitation, <128TB on x86
        if (shmptr == (char*)-1)
                err("shm mmap");

}

Thanks,
Pingfan

Comment 21 Pingfan Liu 2019-11-04 08:24:54 UTC
I am not sure who maintains mm/shmem.c; adding mem maintainer Larry for comment.

Comment 22 Rafael Aquini 2019-11-05 07:04:38 UTC
(In reply to Petr Sklenar ⛄ from comment #18)
> This issue is exactly what comment 0 lists under expected results: no
> crash with 'SIGBUS'.
> 
> 
> old:
> numactl --length=10GB --file /dev/shm/data1 --interleave=all --touch
> ftruncate: Invalid argument
> Segmentation fault
> ######## ^^^^ Exactly described in
> https://access.redhat.com/solutions/4500811
> 
> and new says 'Bus error'
> 
> ---
> 
> Interestingly, it returns different types of errors depending on the
> length:
> No ERROR:
> [root@ci-vm-10-0-136-119 ~]# numactl --length=100M --file /dev/shm/data1
> --interleave=all --touch
> 
> ERROR:
> [root@ci-vm-10-0-136-119 ~]# numactl --length=10000M --file /dev/shm/data1
> --interleave=all --touch
> Bus error
> 

So, it was broken before and it is still broken (IOW it has always been broken)

The fundamental problem is in numactl code which ignores that ftruncate() has failed, 
happily tries to keep on going, mmaps the tmpfs file, and tries to touch it.

It was crashing with SIGSEGV before the patch because it was attempting to write into a region that was write-protected.

It is crashing with SIGBUS after the patch (exactly) because ftruncate() has just failed and the program
ends up trying to access a portion of the buffer that does not correspond to the file.

The man page for mmap helps shed some light on this case:

---8<---
       Use of a mapped region can result in these signals:

       SIGSEGV
              Attempted write into a region mapped as read-only.

       SIGBUS Attempted  access to a portion of the buffer that does not correspond to the file (for example, beyond
              the end of the file, including the case where another process has truncated the file).

--->8---


Here's the code snippet that should abort (from comment #20), but instead just prints the error message and carries on.
...
        if (shmlen > st.st_size) { 
                if (ftruncate64(shmfd, shmlen) < 0) {   // filesystem limitation, which does not allow size > 2^64Bytes
                        /* XXX: we could do it by hand, but it would it
                           would be impossible to apply policy then.
                           need to fix that in the kernel. */
                        perror("ftruncate");
                }
        }
...

From this point onwards, we are in uncharted territory, and undefined behaviour should be expected.
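
A minimal sketch of the stricter handling argued for here. This is not numactl code: extend_file_or_fail() is a hypothetical helper that reports the ftruncate() failure to the caller, so the caller can stop before mmap() rather than print the error and continue.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: grow the file behind fd to at least len bytes.
 * Returns 0 on success, or -1 with errno set by fstat()/ftruncate(). */
int extend_file_or_fail(int fd, off_t len)
{
	struct stat st;

	if (fstat(fd, &st) < 0)
		return -1;
	if (len > st.st_size && ftruncate(fd, len) < 0)
		return -1;	/* do not fall through to mmap() */
	return 0;
}
```

A caller would treat -1 as fatal, for example by printing the error and exiting, which keeps the later touch loop from ever running on a short file.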


> bigger ERROR:
> [root@ci-vm-10-0-136-119 ~]# numactl --length=10000000G --file
> /dev/shm/data1 --interleave=all --touch
> numactl: shm mmap: Cannot allocate memory
> 
> and the biggest ERROR:
> [root@ci-vm-10-0-136-119 ~]# numactl --length=1000000000000000G --file
> /dev/shm/data1 --interleave=all --touch
> ftruncate: Invalid argument
> numactl: shm mmap: Cannot allocate memory
> 

These two are not actually errors, but mmap() telling you that the size requested 
cannot be satisfied due to system limits.

In the "bigger ERROR" case, you are asking for a map of 10^7 * 2^30 bytes (roughly 9.5 * 2^50), and in
"the biggest ERROR" case, your request was the incredible 10^15 * 2^30 bytes,
or approximately 0.89 * 2^80 bytes (about 1.07 * 10^24). Neither, unfortunately, is supported
by the RHEL-7 kernel, which is only capable of giving out virtual address space in the range of 2^46 bytes (64 terabytes).
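
As a sanity check on these sizes, here is a small arithmetic sketch. The helper names are hypothetical, and it assumes the "G" suffix means 2^30 bytes, as the figures in this comment do. It expresses a length given in G as a multiple of a power of two, which shows that both requests exceed either address-space figure quoted in this thread (64 TB = 2^46 bytes, 128 TB = 2^47 bytes).

```c
/* Express `count` G (count * 2^30 bytes) as a multiple of 2^exp bytes.
 * Only meaningful for exp >= 30. */
double gib_as_multiple_of_pow2(double count, int exp)
{
	int i;

	for (i = 30; i < exp; i++)
		count /= 2.0;
	return count;
}

/* Nonzero if `count` G is larger than an address space of 2^bits bytes. */
int exceeds_address_space(double count, int bits)
{
	return gib_as_multiple_of_pow2(count, bits) > 1.0;
}
```

With these helpers, gib_as_multiple_of_pow2(1e7, 50) is about 9.54 and gib_as_multiple_of_pow2(1e15, 80) is about 0.89.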


Apart from the poor error handling for that ftruncate case, in numactl code, I don't think there is a real bug
to be dealt with here.

These are my $ 0.02

-- Rafael

Comment 24 Petr Sklenar 2019-11-05 11:42:30 UTC
Ok, thanks all of you for the detailed explanation.

Comment 26 Pingfan Liu 2019-11-06 03:15:47 UTC
Hi Rafael,

Many thanks for your comment. But after revisiting the kernel code, I slightly disagree about the cause in some of the cases.
The name ftruncate64() is misleading; here it actually extends the file instead of truncating it.
[...]


> > ERROR:
> > [root@ci-vm-10-0-136-119 ~]# numactl --length=10000M --file /dev/shm/data1
> > --interleave=all --touch
> > Bus error
Here there is no error message from ftruncate64(), and SIGBUS should be raised by shmem_fault()->vmf_error() because of the tmpfs size limit.
> > 
> 
> So, it was broken before and it is still broken (IOW it has always been
> broken)
> 
> The fundamental problem is in numactl code which ignores that ftruncate()
> has failed, 
> happily tries to keep on going, mmaps the tmpfs file, and tries to touch it.
> 
> It was crashing with SIGSEGV before the patch because it was attempting to
> write into a region that was write-protected.
> 
> It is crashing with SIGBUS after the patch (exactly) because ftruncate() has
> just failed file truncation and the program
> ends up trying to access a portion of the buffer that do not correspond to
> the file.
As I said, the name ftruncate is misleading. Here it extends the file.
> 
> The man page for mmap helps shedding some light over this case:
> 
> ---8<---
>        Use of a mapped region can result in these signals:
> 
>        SIGSEGV
>               Attempted write into a region mapped as read-only.
> 
>        SIGBUS Attempted  access to a portion of the buffer that does not
> correspond to the file (for example, beyond
>               the end of the file, including the case where another process
> has truncated the file).
> 
> --->8---
> 
> 
> Here's the code snippet that should abort (from comment #20), but instead
> just print the error message and let go.
> ...
>         if (shmlen > st.st_size) { 
>                 if (ftruncate64(shmfd, shmlen) < 0) {   // filesystem
> limitation, which does not allow size > 2^64Bytes
>                         /* XXX: we could do it by hand, but it would it
>                            would be impossible to apply policy then.
>                            need to fix that in the kernel. */
>                         perror("ftruncate");
>                 }
>         }
> ...
> 
> From this point onwards, we are in uncharted territory, and undefined
> behaviour should be expected.
I have not figured out in which cases an opened shmfd that is smaller than the requested size can be extended successfully.
Nor do I know whether numactl can carry on in some cases after the extension fails.

But here, we just leave it to mmap() to catch the error, as in "the biggest ERROR" case.

> 
> 
> > bigger ERROR:
> > [root@ci-vm-10-0-136-119 ~]# numactl --length=10000000G --file
> > /dev/shm/data1 --interleave=all --touch
> > numactl: shm mmap: Cannot allocate memory
> > 
> > and the biggest ERROR:
> > [root@ci-vm-10-0-136-119 ~]# numactl --length=1000000000000000G --file
> > /dev/shm/data1 --interleave=all --touch
> > ftruncate: Invalid argument
> > numactl: shm mmap: Cannot allocate memory
> > 
> 
> These two are not actually errors, but mmap() telling you that the size
> requested 
> cannot be satisfied due to system limits.
> 
> In the "bigger ERROR" case, you are asking for a map of 10 * 2^50 bytes, and
> in 
> "the biggest ERROR" case, your request was the incredible 10^15 * 2^30
> bytes, 
> or approximately 1.07 * 2^80 bytes. In both cases, unfortunately, not
> supported
> by RHEL-7 kernel that only is capable of giving out virtual address space in
> the range of 64 * 2^46 bytes (64 terabytes)
> 
> 
> Apart from the poor error handling for that ftruncate case, in numactl code,
As explained, for now I have no idea whether there are cases where ftruncate() fails but numactl can still carry on successfully.

I have submitted a patch to numactl upstream, hoping the maintainer can explain the reason.
But anyway, the puzzle has no relation to this bug.

Thanks,
Pingfan
> I don't think there is a real bug
> to be dealt with here.
> 
> These are my $ 0.02
> 
> -- Rafael

Comment 27 Rafael Aquini 2019-11-06 14:21:11 UTC
(In reply to Pingfan Liu from comment #26)
> Hi Rafael,
> 
> Great thanks for your comment. But I have a slight disagreement with the
> cause of some cases after I revisit the kernel code.
> The name ftruncate64() is misleading, and indeed here it extends the size of
> file, instead of truncating.
> [...]



[1] numactl/libnuma.c:
...
void numa_police_memory(void *mem, size_t size)
{
        int pagesize = numa_pagesize_int();
        unsigned long i;
        for (i = 0; i < size; i += pagesize)
                ((volatile char*)mem)[i] = ((volatile char*)mem)[i];
}
...

# rpm -q numactl
numactl-2.0.9-7.el7.x86_64  (unpatched)

# rm -f /dev/shm/data1
# strace numactl --length=1G --file /dev/shm/data1 --interleave=all --touch
...
openat(AT_FDCWD, "/dev/shm/data1", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/dev/shm/data1", O_RDWR|O_CREAT, 0600) = 3
fstat(3, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
ftruncate(3, 1073741824)                = 0
mmap(NULL, 1073741824, PROT_READ, MAP_SHARED, 3, 0) = 0x7fe0c5174000
get_mempolicy(NULL, NULL, 0, NULL, 0)   = 0
mbind(0x7fe0c5174000, 1073741824, MPOL_INTERLEAVE, [0x000000000000001f, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000], 1025, 0) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_ACCERR, si_addr=0x7fe0c5174000} ---
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)


[1] tries to write into read-only (PROT_READ) memory, which causes a protection fault,
    thus SIGSEGV.
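
The protection fault can be reproduced in isolation. The sketch below is not numactl code: signal_from_write() is a hypothetical helper, and it uses an anonymous private mapping instead of the tmpfs file. It forks a child that performs the same read-then-write access as [1] on a page mapped with the given protection, and reports which signal, if any, killed the child.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the signal that killed the child, 0 if it exited normally,
 * or -1 if fork()/waitpid() failed. */
int signal_from_write(int prot)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		long page = sysconf(_SC_PAGESIZE);
		volatile char *p = mmap(NULL, (size_t)page, prot,
					MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (p == MAP_FAILED)
			_exit(2);
		p[0] = p[0];	/* read, then write the same byte back */
		_exit(0);
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On Linux, signal_from_write(PROT_READ) should report SIGSEGV, while PROT_READ | PROT_WRITE lets the child exit normally.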


# strace numactl --length=10G --file /dev/shm/data1 --interleave=all --touch
open("/dev/shm/data1", O_RDONLY)        = 3
fstat(3, {st_mode=S_IFREG|0600, st_size=1073741824, ...}) = 0
ftruncate(3, 10737418240)               = -1 EINVAL (Invalid argument)
[ ... dumping ftruncate error to stderr ...]
mmap(NULL, 10737418240, PROT_READ, MAP_SHARED, 3, 0) = 0x7f7e39cc0000
get_mempolicy(NULL, NULL, 0, NULL, 0)   = 0
mbind(0x7f7e39cc0000, 10737418240, MPOL_INTERLEAVE, [0x0000000000000003, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000, 000000000000000000], 1025, 0) = 0
--- SIGBUS {si_signo=SIGBUS, si_code=BUS_ADRERR, si_addr=0x7f7e73389000} ---
+++ killed by SIGBUS (core dumped) +++
Bus error (core dumped)

[1] fails because it attempts to access a portion of the mapped buffer that
     does not correspond to the file. Since ftruncate failed, the file in this
     example was still the 1GB file left over from the previous run. As soon as
     [1] touched the first byte past 1GB, it got its SIGBUS.
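
The same effect can be shown at a small scale. This sketch is not numactl code: signal_from_touch() is a hypothetical helper, and it uses a temporary file from tmpfile() rather than /dev/shm. It sizes the file to file_pages pages, maps map_pages pages of it, and touches one byte per page in a child process, as [1] does.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the signal that killed the child, 0 if every page could be
 * touched, or -1 on a setup failure. */
int signal_from_touch(long file_pages, long map_pages)
{
	long page = sysconf(_SC_PAGESIZE);
	FILE *f = tmpfile();
	int status = 0, result = -1;
	pid_t pid;

	if (f == NULL)
		return -1;
	if (ftruncate(fileno(f), (off_t)(file_pages * page)) == 0 &&
	    (pid = fork()) >= 0) {
		if (pid == 0) {
			volatile char *p = mmap(NULL, (size_t)(map_pages * page),
						PROT_READ | PROT_WRITE,
						MAP_SHARED, fileno(f), 0);
			long i;

			if (p == MAP_FAILED)
				_exit(2);
			for (i = 0; i < map_pages * page; i += page)
				p[i] = p[i];	/* one access per page */
			_exit(0);
		}
		if (waitpid(pid, &status, 0) == pid)
			result = WIFSIGNALED(status) ? WTERMSIG(status) : 0;
	}
	fclose(f);
	return result;
}
```

Mapping exactly the file size is harmless, whereas mapping one page more than the file holds should end in SIGBUS on the first access beyond the end of the file.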


Here, one might ask why ftruncate has failed for 10G, but has not failed for
1G. The answer might lie in the shm limits (too low kernel.shmmax) or just that
there is not enough space in the tmpfs mount point -- which is the case in this
example I ran above:

# df -h | grep shm
tmpfs                  919M  919M     0 100% /dev/shm


(This is a 2GB RAM VM, thus out-of-the-box I get a 1GB tmpfs mount there.)



So, as I mentioned earlier, (a) the SIGBUS error was always there. The patch for this BZ fixes the RD_ONLY mapping case only.
And (b) the problem is numactl's poor error handling. If aborting is not an option after ftruncate() returns an error code,
then numactl's attach_shared() should (re)fstat(shmfd) and force-update the global shmlen to keep things consistent (this
is a bad pattern, btw -- the best thing to do in such a condition is to just abort upon the error, IMO).



-- Rafael

Comment 28 Rafael Aquini 2019-11-06 14:24:49 UTC
(In reply to Pingfan Liu from comment #26)

> Here there is no error message from ftruncate64(), and SIGBUS should be raised by
> shmem_fault()->vmf_error() because of the tmpfs size limit.


I have not looked into this patched case, but I'm assuming it's a similar issue
as described on the 2nd example run of my previous comment.

-- Rafael

Comment 29 Petr Dancak 2019-11-07 10:06:16 UTC
QA:

Can I get a status update on this bug? Are there any other problems that need to be fixed?
For now, the status has been changed to ON_QA.

Comment 30 Pingfan Liu 2019-11-08 01:43:28 UTC
(In reply to Petr Dancak 🦁 from comment #29)
> QA:
> 
> Can I get a status about this bug ? Are there any other problems that need
> to be fixed ?
> For now status changed to ON_QA.

I discussed the root cause of the SIGBUS with Rafael, and how to handle the ftruncate failure more gracefully.
But it should not be considered a bug, as explained in comment#22.

Thanks,
Pingfan

Comment 31 Pingfan Liu 2019-11-08 01:55:01 UTC
(In reply to Rafael Aquini from comment #27)
> (In reply to Pingfan Liu from comment #26)
[...]
> 
> So, as I mentioned earlier (a) the SIGBUS error was always there. The patch
> for this BZ fixes the RD_ONLY mapping case only.
> And (b) the problem is with numactl poor error handling. If aborting is not
> an option, after ftruncate() returns an error code,
> then numactl's attach_shared() should (re)fstat(shmfd) and force update the
> global shmlen to keep things going consistently (this
> is bad pattern, btw -- best thing to do in such condition is just abort upon
> the error, IMO). 
OK, thank you for the more detailed explanation and the good advice. I will try to sync this with upstream, and I hope the maintainer can accept it.

Regards,
Pingfan

Comment 32 Rafael Aquini 2019-11-08 05:18:17 UTC
(In reply to Pingfan Liu from comment #26)
> Hi Rafael,
> 
> Great thanks for your comment. But I have a slight disagreement with the
> cause of some cases after I revisit the kernel code.
> The name ftruncate64() is misleading, and indeed here it extends the size of
> file, instead of truncating.
> [...]
> 
> 
> > > ERROR:
> > > [root@ci-vm-10-0-136-119 ~]# numactl --length=10000M --file /dev/shm/data1
> > > --interleave=all --touch
> > > Bus error
> Here there is no error message from ftruncate64(), and SIGBUS should be raised by
> shmem_fault()->vmf_error() because of the tmpfs size limit.

This was the last case to look at, so I did.

It's possible to have a file truncated to a size bigger than the capacity of the
file system that contains it (sparse file). The aftermath of this adventure, with a
code construction similar to numactl's attach_shared() routine (fstat() to query the file size + mmap),
is that one might end up attempting to write past the physical end of the file,
which in turn will cause a SIGBUS kill, even with a successful ftruncate() call.

Although numactl, and any other program for that matter, should work defensively
and check for such conditions before assuming the map is properly backed by the
underlying file, it's common practice to not write code defensively. This is
not a BUG, per se, but it surely is an annoyance.


This very simple C program demonstrates the rationale above:

[root@localhost ~]# cat tst.c 
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/resource.h>

#define handle_error_exit(msg) \
		do { perror(msg); exit(EXIT_FAILURE); } while (0)

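static inline void write_bytes(unsigned long length, void *addr)
{
	unsigned char *p = addr;	/* avoid arithmetic on void * (GNU extension) */
	unsigned long i;

	/* Write every byte; faults once past the pages the fs can back. */
	for (i = 0; i < length; i++)
		p[i] = 0xff;
}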

int main(int argc, char *argv[])
{
	struct stat sb;
	size_t len;
	void *addr;
	int fd;

	if (argc < 2)
		return 1;

	fd = open (argv[1], O_RDWR);
	if (fd == -1)
		handle_error_exit("open");

	if (fstat (fd, &sb) == -1)
		handle_error_exit("fstat");

	len = sb.st_size;

	addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (addr == MAP_FAILED)
		handle_error_exit("mmap");

	write_bytes(len, addr);

	close(fd);

	return 0;
}


On a system that only has a 1GB tmpfs mountpoint for /dev/shm, I'll create a 2GB file with truncate and run that tst program:

[root@localhost ~]# df -h | grep shm
tmpfs                  1.0G     0  1.0G   0% /dev/shm

[root@localhost ~]# truncate -s 2G /dev/shm/data
[root@localhost ~]# ls -lh /dev/shm/data 
-rw-r--r--. 1 root root 2.0G Nov  8 00:09 /dev/shm/data

[root@localhost ~]# make tst
cc     tst.c   -o tst
[root@localhost ~]# strace ./tst /dev/shm/data 
...
open("/dev/shm/data", O_RDWR)           = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=2147483648, ...}) = 0
mmap(NULL, 2147483648, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0x7fa894cc7000
--- SIGBUS {si_signo=SIGBUS, si_code=BUS_ADRERR, si_addr=0x7fa8d4cc7000} ---
+++ killed by SIGBUS (core dumped) +++


The signal reports the address where the fault occurred: si_addr=0x7fa8d4cc7000.
Subtracting the map base address from it gives the offset into the map at which the fault occurred:

0x7fa8d4cc7000 - 0x7fa894cc7000 = 0x40000000   (which is exactly 1GB, the hard limit of that tmpfs mountpoint)

$ echo "ibase=16;40000000" | bc
1073741824
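
One defensive option, shown here only as a sketch and not as what numactl does, is to ask the filesystem to actually reserve the blocks before mapping, using posix_fallocate() instead of relying on a sparse ftruncate(). map_backed() is a hypothetical helper. On a tmpfs that is too small, I would expect the reservation to fail up front with ENOSPC rather than the program dying later with SIGBUS, but I have not verified that on the RHEL-7 kernel.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>

/* Hypothetical helper: reserve len bytes of real backing store for fd,
 * then map them read/write. Returns the mapping, or NULL with *err set
 * to an errno value. */
void *map_backed(int fd, size_t len, int *err)
{
	void *p;

	/* posix_fallocate() returns the error number directly. */
	*err = posix_fallocate(fd, 0, (off_t)len);
	if (*err != 0)
		return NULL;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		*err = errno;
		return NULL;
	}
	return p;
}
```

The trade-off is that the space is consumed immediately, whereas the sparse file only consumes pages as they are touched.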


I hope this whole discussion helps clarify all the points raised in previous comments.

Cheers,
-- Rafael

Comment 33 Pingfan Liu 2019-11-08 05:25:33 UTC
*** Bug 1767744 has been marked as a duplicate of this bug. ***

Comment 35 Pingfan Liu 2019-11-12 06:12:41 UTC
(In reply to Rafael Aquini from comment #32)
> (In reply to Pingfan Liu from comment #26)
> > Hi Rafael,
> > 
> > Great thanks for your comment. But I have a slight disagreement with the
> > cause of some cases after I revisit the kernel code.
> > The name ftruncate64() is misleading, and indeed here it extends the size of
> > file, instead of truncating.
> > [...]
> > 
> > 
> > > > ERROR:
> > > > [root@ci-vm-10-0-136-119 ~]# numactl --length=10000M --file /dev/shm/data1
> > > > --interleave=all --touch
> > > > Bus error
> > Here there is no error message from ftruncate64(), and SIGBUS should be raised by
> > shmem_fault()->vmf_error() because of the tmpfs size limit.
> 
> This was the last case to look at, and I so did it.
> 
> It's possible to have a file truncated to a size bigger than the capacity of
> the
> file system that contains it (sparse file). The after-match of this
> adventure with
> code construction similar to numactl's attach_shared() routine (fstat() to
> query the file size + mmap)
> is that one might end up attempting to write pass the physical end of the
> file, 
> which in turn will cause a SIGBUS killing, even with a successful
> ftruncate() call. 
> 
> Although numactl, and any other program to that extent, should work
> defensively
> and check for such conditions before assuming the map is properly backed by
> the
> underlying file, it's common practice to not write code defensively. This is
> not a BUG, per se, but it surely is an annoyance.

I followed your suggestion, and the resulting patch has been merged: https://github.com/numactl/numactl/commit/3648aa5bf6e29bf618195c615ff2ced4bb995327

Thanks,
Pingfan

Comment 40 errata-xmlrpc 2020-03-31 20:08:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1163