Bug 1886816

Summary: [abrt] sssd-common: sss_mmap_cache_pw_store(): sssd_nss killed by SIGBUS
Product: [Fedora] Fedora Reporter: Kamil Páral <kparal>
Component: sssd Assignee: Sumit Bose <sbose>
Status: CLOSED EOL QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 33 CC: abokovoy, atikhono, jhrozek, kheine7, lslebodn, mzidek, pbrezina, sbose, ssorce, sssd-maintainers, tmihinto
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Unspecified   
URL: https://retrace.fedoraproject.org/faf/reports/bthash/41f41e27be638cfa97c4b8371a77be7203c319bb
Whiteboard: abrt_hash:cd13c552f85949e67c1f8b9066dc4393eee0ce85;VARIANT_ID=workstation;
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-11-30 16:08:52 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description              Flags
File: backtrace          none
File: core_backtrace     none
File: cpuinfo            none
File: dso_list           none
File: environ            none
File: exploitable        none
File: limits             none
File: maps               none
File: mountinfo          none
File: open_fds           none
File: proc_pid_status    none
File: var_log_messages   none
coredump                 none

Description Kamil Páral 2020-10-09 13:00:39 UTC
Version-Release number of selected component:
sssd-common-2.3.1-4.fc33

Additional info:
reporter:       libreport-2.14.0
backtrace_rating: 4
cgroup:         0::/system.slice/sssd.service
cmdline:        /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --logger=files
crash_function: sss_mmap_cache_pw_store
executable:     /usr/libexec/sssd/sssd_nss
journald_cursor: s=b2e7c0583c004ae8bd317a4440255076;i=2ee4a7;b=7c261537394f4d169971a59cd1761dfe;m=55626889;t=5b13c2149d0cf;x=8fed0b78d48ecd6c
kernel:         5.8.14-300.fc33.x86_64
rootdir:        /
runlevel:       N 5
type:           CCpp
uid:            0

Truncated backtrace:
Thread no. 1 (10 frames)
 #0 sss_mmap_cache_pw_store at src/responder/nss/nsssrv_mmap_cache.c:769
 #1 nss_protocol_fill_pwent at src/responder/nss/nss_protocol_pwent.c:308
 #2 nss_protocol_reply at src/responder/nss/nss_protocol.c:91
 #3 nss_getby_done at src/responder/nss/nss_cmd.c:626
 #4 tevent_common_invoke_immediate_handler at ../../tevent_immediate.c:166
 #5 tevent_common_loop_immediate at ../../tevent_immediate.c:203
 #6 epoll_event_loop_once at ../../tevent_epoll.c:917
 #7 std_event_loop_once at ../../tevent_standard.c:110
 #8 _tevent_loop_once at ../../tevent.c:772
 #9 tevent_common_loop_wait at ../../tevent.c:895

Comment 1 Kamil Páral 2020-10-09 13:00:44 UTC
Created attachment 1720250 [details]
File: backtrace

Comment 2 Kamil Páral 2020-10-09 13:00:46 UTC
Created attachment 1720251 [details]
File: core_backtrace

Comment 3 Kamil Páral 2020-10-09 13:00:48 UTC
Created attachment 1720252 [details]
File: cpuinfo

Comment 4 Kamil Páral 2020-10-09 13:00:49 UTC
Created attachment 1720253 [details]
File: dso_list

Comment 5 Kamil Páral 2020-10-09 13:00:51 UTC
Created attachment 1720254 [details]
File: environ

Comment 6 Kamil Páral 2020-10-09 13:00:52 UTC
Created attachment 1720255 [details]
File: exploitable

Comment 7 Kamil Páral 2020-10-09 13:00:53 UTC
Created attachment 1720256 [details]
File: limits

Comment 8 Kamil Páral 2020-10-09 13:00:56 UTC
Created attachment 1720257 [details]
File: maps

Comment 9 Kamil Páral 2020-10-09 13:00:58 UTC
Created attachment 1720258 [details]
File: mountinfo

Comment 10 Kamil Páral 2020-10-09 13:01:00 UTC
Created attachment 1720259 [details]
File: open_fds

Comment 11 Kamil Páral 2020-10-09 13:01:01 UTC
Created attachment 1720260 [details]
File: proc_pid_status

Comment 12 Kamil Páral 2020-10-09 13:01:03 UTC
Created attachment 1720261 [details]
File: var_log_messages

Comment 13 Alexey Tikhonov 2020-10-09 14:04:15 UTC
Hi Kamil,

do you have a coredump?

Comment 14 Alexey Tikhonov 2020-10-09 14:11:14 UTC
Program terminated with signal SIGBUS, Bus error.
#0  0x000055b728aceb15 in sss_mmap_cache_pw_store (_mcc=_mcc@entry=0x55b72a00dd28, name=0x55b72a030380, pw=pw@entry=0x7fff947313c0, uid=uid@entry=173, gid=gid@entry=173, gecos=gecos@entry=0x7fff947313d0, homedir=0x7fff947313e0, shell=0x7fff947313f0) at src/responder/nss/nsssrv_mmap_cache.c:769

This ^^ is `MC_RAISE_BARRIER(rec);`:
```
#define MC_RAISE_BARRIER(m) do { \
    m->b2 = MC_NEXT_BARRIER(m->b1); \
    __sync_synchronize(); \
} while (0)
```

In "open_fds" there is:
```
17:/var/lib/sss/mc/passwd (deleted)
pos:	0
flags:	02100000
mnt_id:	65

18:/var/lib/sss/mc/passwd
pos:	0
flags:	0100002
mnt_id:	65
lock:	1: POSIX  ADVISORY  WRITE 710 00:20:429251 0 0
```
It seems the SIGBUS was triggered by an attempt to access memory mapped from a deleted file.

Is there any chance this file ("/var/lib/sss/mc/passwd") was deleted by some outer process?

Comment 15 Kamil Páral 2020-10-09 14:12:17 UTC
Created attachment 1720291 [details]
coredump

Yes, here it is. I have no idea how it happened or how to reproduce it.

Comment 16 Kamil Páral 2020-10-09 14:14:20 UTC
> Is there any chance this file ("/var/lib/sss/mc/passwd") was deleted by some outer process?

I don't know, this is a fairly fresh F33 VM, and I wasn't doing anything particularly interesting, just playing with abrt. The file is currently there:

$ ll /var/lib/sss/mc/passwd
-rw-rw-r--. 1 root root 9253600 Oct  9 16:12 /var/lib/sss/mc/passwd

Comment 17 Alexey Tikhonov 2021-06-21 07:42:57 UTC
*** Bug 1974235 has been marked as a duplicate of this bug. ***

Comment 18 Ben Cotton 2021-11-04 13:44:56 UTC
This message is a reminder that Fedora 33 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 33 on 2021-11-30.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '33'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 33 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 21 Sumit Bose 2021-11-23 17:34:43 UTC
Hi,

I ran a couple of tests and was able to reproduce similar SIGBUS crashes only on files that were shortened, e.g. with the truncate command. I'm not sure it is worth trying to protect the memory-mapped files against this, and I'm also not sure how. Calling fstat() before every memory access would slow things down considerably. Maybe sssd_nss could set an inotify watch to detect such a change, but there would still be a chance that the truncation happens while sssd_nss is working on the files, before it handles the inotify event.
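
For reference, the truncation case can be sketched in a few lines of C. This is a minimal, Linux-oriented illustration and not SSSD code: the temporary file name and the function name `truncated_mmap_raises_sigbus` are hypothetical. It maps a scratch file with MAP_SHARED, shows that a write still works after unlink(), then truncates the file and catches the resulting SIGBUS with sigsetjmp()/siglongjmp().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf sigbus_jmp;

static void on_sigbus(int signo)
{
    (void)signo;
    siglongjmp(sigbus_jmp, 1);
}

/* Hypothetical demo helper: returns 1 if a write to the mapping raised
 * SIGBUS after the file was truncated, 0 if the write went through,
 * -1 on a setup error. */
int truncated_mmap_raises_sigbus(void)
{
    char path[] = "/tmp/mc_demo_XXXXXX";   /* scratch file, not an SSSD path */
    long page = sysconf(_SC_PAGESIZE);
    struct sigaction sa, old_sa;
    volatile unsigned char *map;
    volatile int result = -1;
    void *mem = MAP_FAILED;
    size_t len;
    int fd;

    assert(page > 0);
    len = 2 * (size_t)page;

    fd = mkstemp(path);
    if (fd == -1) return -1;
    unlink(path);                          /* deleting the name alone is harmless */

    if (ftruncate(fd, (off_t)len) == -1) goto done;
    mem = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED) goto done;
    map = mem;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, &old_sa) == -1) goto done;

    map[page] = 1;                         /* unlinked but full-sized: no fault */

    if (ftruncate(fd, 0) == 0) {           /* shrink the file under the mapping */
        if (sigsetjmp(sigbus_jmp, 1) == 0) {
            map[page] = 2;                 /* page is now beyond end of file */
            result = 0;
        } else {
            result = 1;                    /* arrived here via the SIGBUS handler */
        }
    }
    sigaction(SIGBUS, &old_sa, NULL);

done:
    if (mem != MAP_FAILED) munmap(mem, len);
    close(fd);
    return result;
}
```

On Linux this is expected to return 1: the unlinked file keeps its pages as long as it is open and mapped, while pages beyond the new end of file are dropped by the truncation.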

The most promising protection I found is F_SEAL_SHRINK, see man fcntl for details, but this requires an anonymous file in tmpfs, see man memfd_create for details.
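
A minimal sketch of the F_SEAL_SHRINK idea, assuming Linux with a glibc that exposes memfd_create() (2.27 or later). The memfd name "mc_demo" and the function name `sealed_memfd_refuses_shrink` are hypothetical, and this only illustrates the kernel behavior, not how SSSD would integrate it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical demo helper: returns 1 if shrinking a sealed memfd is
 * refused with EPERM, 0 if the shrink unexpectedly succeeded,
 * -1 on any other error. */
int sealed_memfd_refuses_shrink(void)
{
    int result = -1;
    int fd = memfd_create("mc_demo", MFD_CLOEXEC | MFD_ALLOW_SEALING);

    if (fd == -1) return -1;

    /* Give the file its size first, then forbid making it smaller. */
    if (ftruncate(fd, 4096) == 0 &&
        fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK) == 0) {
        /* Sanity check: the seal must now be reported on the fd. */
        assert(fcntl(fd, F_GET_SEALS) & F_SEAL_SHRINK);

        if (ftruncate(fd, 0) == 0) {
            result = 0;                /* shrink went through: seal ineffective */
        } else if (errno == EPERM) {
            result = 1;                /* shrink refused, as documented */
        }
    }

    close(fd);
    return result;
}
```

With the seal in place, any holder of the descriptor can still map the file, but no process can shrink it, so a mapping of the full size cannot end up pointing beyond end of file.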

bye,
Sumit

Comment 22 Simo Sorce 2021-11-23 19:31:25 UTC
The only good way to handle this is to change how we open the memory mapped files.
Instead of opening them directly from the client, we need to introduce a new command call over the pipe that will ask the parent to open them for us, and then pass the fd over the socket to the client.

This method has a few advantages:
- clients will not be allowed to directly open mmapped files which means the only thing to bind mount over (for stuff like container access) is the sockets.
- the server can simply create new files when needed and just *mark* the old files as obsolete before renaming them, or even unlink() them on the spot.
- the server can move the cache files at will. For example, it can decide to create them on tmpfs for speed on machines where repopulating the cache at every reboot is OK, while keeping them on long-lived storage for machines (like laptops) that are frequently rebooted in disconnected mode and need to preserve the caches.

Once this is done, a client encountering a truncated file generally indicates a server bug. The server should never change the size of the cache files; it should always mark and rename/unlink an old cache and create a new one instead.
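
The fd-passing step described above could look roughly like the following sketch, which uses the standard SCM_RIGHTS ancillary message over an AF_UNIX socket. The function names `send_fd`, `recv_fd` and `fd_passing_roundtrip` are hypothetical, the one-byte payload is an arbitrary choice, and no such command exists in the SSSD protocol as far as this report shows. A pipe stands in for the cache file in the round-trip check.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one file descriptor. */
union fd_ctl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send fd over a connected AF_UNIX socket; returns 0 on success. */
int send_fd(int sock, int fd)
{
    char byte = 'F';                       /* at least one data byte is needed */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    assert(cmsg != NULL);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor sent by send_fd(); returns the new fd or -1. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) != 1) return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Round trip: pass the read end of a pipe and read through the copy. */
int fd_passing_roundtrip(void)
{
    int sv[2], pfd[2], got = -1, ok = -1;
    char c = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) return -1;
    if (pipe(pfd) == -1) { close(sv[0]); close(sv[1]); return -1; }

    if (send_fd(sv[0], pfd[0]) == 0 && (got = recv_fd(sv[1])) >= 0 &&
        write(pfd[1], "x", 1) == 1 && read(got, &c, 1) == 1 && c == 'x') {
        ok = 0;
    }

    if (got >= 0) close(got);
    close(pfd[0]); close(pfd[1]); close(sv[0]); close(sv[1]);
    return ok;
}
```

Because the receiver gets its own descriptor for the same open file, the server stays free to rename or unlink the path afterwards without affecting clients that already hold the fd.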

Comment 23 Ben Cotton 2021-11-30 16:08:52 UTC
Fedora 33 changed to end-of-life (EOL) status on 2021-11-30. Fedora 33 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.