Bug 1423417 - RGW daemons can not dump core because PR_SET_DUMPABLE is set to 0 after setuid call
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RGW
Version: 2.2
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: 2.2
Assignee: Brad Hubbard
QA Contact: Vidushi Mishra
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-02-17 09:40 UTC by Vidushi Mishra
Modified: 2017-07-30 15:46 UTC
CC List: 11 users

Fixed In Version: RHEL: ceph-10.2.5-32.el7cp Ubuntu: ceph_10.2.5-24redhat1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-14 15:49:39 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Ceph Project Bug Tracker 19089 0 None None None 2017-02-27 06:21:15 UTC
Red Hat Bugzilla 1389159 0 high CLOSED RHCS 2 daemons can not dump core because PR_SET_DUMPABLE is set to 0 after setuid call 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 1427116 0 unspecified CLOSED RHCS 2 core dumps not getting generated as a ceph user on Ubuntu likely due clrearing away of the PR_SET_DUMPABLE flag ... 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHBA-2017:0514 0 normal SHIPPED_LIVE Red Hat Ceph Storage 2.2 bug fix and enhancement update 2017-03-21 07:24:26 UTC

Internal Links: 1389159 1427116

Description Vidushi Mishra 2017-02-17 09:40:13 UTC
Description of problem:

Core dumps are not generated for the radosgw daemon when it runs as the ceph user, most likely because the daemon's DUMPABLE flag is cleared as a side effect of the setuid call it makes at start-up (see the steps below).


Version-Release number of selected component (if applicable):
ceph version 10.2.5-26.el7cp (99b2480b95cab20252d1e06445208d5787da4699)

How reproducible:
100%

Steps to Reproduce:

Clearing the DUMPABLE flag also causes EPERM errors for anything that calls PTRACE_ATTACH on the daemon.

Set DefaultLimitCORE=infinity in /etc/systemd/system.conf, then switch to the ceph user and try to attach to the running radosgw process:

$ sudo su - ceph
$

$ ps aux | grep radosgw
ceph       46948  0.1  0.1 7579320 43300 ?       Ssl  Feb15   3:37 /usr/bin/radosgw -f --cluster ceph1 --name client.rgw.host --setuser ceph --setgroup ceph
ceph      462885  0.0  0.0 112652   960 pts/0    S+   09:38   0:00 grep --color=auto radosgw
$ 

$ strace -p 46948
strace: attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted
$ 

$ gstack 46948
$

$ gcore 46948
ptrace: Operation not permitted.
You can't do that without a process to debug.
The program is not being run.
gcore: failed to create core.46948
$ 

Expected results:
A core dump should be generated, and strace and gcore should work.

Additional info:
As root, core dumps are generated and strace/gcore work as expected.
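
For reference, the underlying kernel behaviour can be demonstrated with a small standalone C program (this is only an illustration, not Ceph code; uid 167 is 0xa7, the ceph user on these hosts): a setuid() away from root clears the per-process dumpable flag, and a subsequent prctl(PR_SET_DUMPABLE, 1) restores it.

#include <stdio.h>
#include <sys/prctl.h>
#include <unistd.h>

/* Build with "gcc -o dumpable-demo dumpable-demo.c" and run as root. */
int main(void)
{
    printf("dumpable before setuid: %d\n", prctl(PR_GET_DUMPABLE, 0, 0, 0, 0));

    if (setuid(167) != 0) {   /* 167 == 0xa7, the ceph uid */
        perror("setuid");
        return 1;
    }
    /* The credential change clears the dumpable flag (with fs.suid_dumpable=0). */
    printf("dumpable after setuid:  %d\n", prctl(PR_GET_DUMPABLE, 0, 0, 0, 0));

    /* Re-enable core dumps after dropping privileges. */
    prctl(PR_SET_DUMPABLE, 1, 0, 0, 0);
    printf("dumpable after prctl:   %d\n", prctl(PR_GET_DUMPABLE, 0, 0, 0, 0));

    return 0;
}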

Comment 3 Brad Hubbard 2017-02-17 23:25:11 UTC
I am currently working on this. The workaround is to set the sysctl fs.suid_dumpable to 1 or 2, as documented in https://bugzilla.redhat.com/show_bug.cgi?id=1389159#c6. Matt and I have discussed a possible patch, which I am testing. I will then check whether master is still affected by this and post upstream accordingly.

Comment 4 Brad Hubbard 2017-02-20 09:23:20 UTC
At this stage I've convinced myself that there is a real problem here. The essence of it is that, after doing the following, it should be possible to send the rgw daemon SIGSEGV and have a core show up in "sudo coredumpctl list".

sudo systemctl stop ceph-radosgw.target
sudo sysctl -w kernel.core_pattern='|/usr/lib/systemd/systemd-coredump %p %u %g s %t %e'
sudo sed -i -e 's/\[Service\]/\[Service\]\nLimitCORE=infinity/' /lib/systemd/system/ceph-radosgw\@.service
sudo systemctl daemon-reload
sudo systemctl start ceph-radosgw.target

This does *not* generate a core with 10.2.5-26.el7cp (or earlier). However, with the addition of this patch to 10.2.5-26.el7cp it is possible to dump core (thanks for the advice on placement, Matt).

http://pkgs.devel.redhat.com/cgit/rpms/ceph/commit/?h=private-bhubbard-wip-rgw-set-dumpable-flag-after-setuid-1&id=aeef05a4dd013aab8f3c52950229ffd90db0f8b9

The next step is to work out how this affects upstream.
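
For anyone following along, the general shape of the change is to re-assert the dumpable flag after the final privilege drop rather than before it. A minimal sketch of that ordering (the helper name and argument handling here are invented for illustration; the real change is in the patch linked above):

#include <grp.h>
#include <sys/prctl.h>
#include <unistd.h>

/* Hypothetical privilege-drop helper used only to illustrate the ordering. */
static int drop_privs_keep_dumpable(uid_t uid, gid_t gid)
{
    if (setgid(gid) != 0)
        return -1;
    if (setgroups(0, NULL) != 0)
        return -1;
    if (setuid(uid) != 0)
        return -1;

    /* setuid() cleared the dumpable flag; turn it back on so the daemon can
     * still dump core and be attached with ptrace. */
    if (prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) != 0)
        return -1;

    return 0;
}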

Comment 5 Brad Hubbard 2017-02-22 20:49:26 UTC
(In reply to Brad Hubbard from comment #4)                                  
> 
> sudo sysctl -w kernel.core_pattern='|/usr/lib/systemd/systemd-coredump %p %u
> %g s %t %e'

Should be:

$ sudo sysctl -w kernel.core_pattern='|/usr/lib/systemd/systemd-coredump %p %u %g %s %t %e'

Comment 6 Brad Hubbard 2017-02-23 00:42:41 UTC
The problem can be demonstrated with a systemtap script. With the patch for bz1389159 we see the following output for rgw (option=0x4 is PR_SET_DUMPABLE) when starting the daemon.

# stap -e 'probe syscall.prctl{if($option == 0x4) printf("%d - %s: %s\n", gettimeofday_ns(), probefunc(), $$parms)} probe syscall.setuid{printf("%d - %s: %s\n", gettimeofday_ns(), probefunc(), $$parms)} probe begin{print("Ready\n")}'
Ready
1487806337276844415 - SyS_prctl: option=0x4 arg2=0x1 arg3=0x7fbc237e0700 arg4=0x0 arg5=0x0
1487806337425109295 - sys_setuid: uid=0xa7
1487806337425127520 - sys_setuid: uid=0xa7
1487806337425135480 - sys_setuid: uid=0xa7
1487806337425143105 - sys_setuid: uid=0xa7
1487806337425149809 - sys_setuid: uid=0xa7
...
1487806337425421682 - sys_setuid: uid=0xa7
1487806337425430265 - sys_setuid: uid=0xa7
1487806337425437577 - sys_setuid: uid=0xa7
1487806337425444765 - sys_setuid: uid=0xa7
1487806337425451672 - sys_setuid: uid=0xa7
1487806337425458336 - sys_setuid: uid=0xa7
1487806337425464797 - sys_setuid: uid=0xa7
1487806337425471866 - sys_setuid: uid=0xa7
1487806337425478957 - sys_setuid: uid=0xa7
1487806337425485887 - sys_setuid: uid=0xa7
1487806337425492401 - sys_setuid: uid=0xa7
1487806337425499140 - sys_setuid: uid=0xa7
1487806337425505562 - sys_setuid: uid=0xa7
1487806337425512795 - sys_setuid: uid=0xa7
1487806337425520707 - sys_setuid: uid=0xa7
1487806337425533842 - sys_setuid: uid=0xa7

Whereas for the osd we see this.

# stap -e 'probe syscall.prctl{if($option == 0x4) printf("%d - %s: %s\n", gettimeofday_ns(), probefunc(), $$parms)} probe syscall.setuid{printf("%d - %s: %s\n", gettimeofday_ns(), probefunc(), $$parms)} probe begin{print("Ready\n")}'
Ready
1487806453801840303 - sys_setuid: uid=0xa7
1487806453801853293 - sys_setuid: uid=0xa7
1487806453801862885 - SyS_prctl: option=0x4 arg2=0x1 arg3=0x7fe04598a010 arg4=0x0 arg5=0x0

The call to prctl needs to come after the last call to setuid of course. With the new patch we see the following.

# stap -e 'probe syscall.prctl{if($option == 0x4) printf("%d - %s: %s\n", gettimeofday_ns(), probefunc(), $$parms)} probe syscall.setuid{printf("%d - %s: %s\n", gettimeofday_ns(), probefunc(), $$parms)} probe begin{print("Ready\n")}'
Ready
1487810070544291689 - SyS_prctl: option=0x4 arg2=0x1 arg3=0x7fb669a3c700 arg4=0x0 arg5=0x0
1487810070700573549 - sys_setuid: uid=0xa7
1487810070700590993 - sys_setuid: uid=0xa7
1487810070700600077 - sys_setuid: uid=0xa7
1487810070700609206 - sys_setuid: uid=0xa7
1487810070700617631 - sys_setuid: uid=0xa7
...
1487810070701084570 - sys_setuid: uid=0xa7
1487810070701092730 - sys_setuid: uid=0xa7
1487810070701100523 - sys_setuid: uid=0xa7
1487810070701108297 - sys_setuid: uid=0xa7
1487810070701118095 - sys_setuid: uid=0xa7
1487810070701135526 - sys_setuid: uid=0xa7
1487810070704708059 - SyS_prctl: option=0x4 arg2=0x1 arg3=0x1 arg4=0x7ffe1a76ebb8 arg5=0x0

So that should resolve the issue, and the osd results of course remain unchanged.

Comment 12 errata-xmlrpc 2017-03-14 15:49:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0514.html

