Bug 140013
Summary: | rt_sigqueueinfo send signal sometimes results in garbage in signal handler's siginfo_t | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Jakub Jelinek <jakub> | ||||||||||
Component: | kernel | Assignee: | David Woodhouse <dwmw2> | ||||||||||
Status: | CLOSED RAWHIDE | QA Contact: | Brian Brock <bbrock> | ||||||||||
Severity: | medium | Docs Contact: | |||||||||||
Priority: | medium | ||||||||||||
Version: | 4.0 | CC: | bugproxy, jlaska, jturner, riel | ||||||||||
Target Milestone: | --- | ||||||||||||
Target Release: | --- | ||||||||||||
Hardware: | powerpc | ||||||||||||
OS: | Linux | ||||||||||||
Whiteboard: | |||||||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||||||
Doc Text: | Story Points: | --- | |||||||||||
Clone Of: | Environment: | ||||||||||||
Last Closed: | 2005-01-14 12:48:26 UTC | Type: | --- | ||||||||||
Regression: | --- | Mount Type: | --- | ||||||||||
Documentation: | --- | CRM: | |||||||||||
Verified Versions: | Category: | --- | |||||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||||
Embargoed: | |||||||||||||
Bug Depends On: | |||||||||||||
Bug Blocks: | 135876 | ||||||||||||
Attachments: |
|
Description
Jakub Jelinek
2004-11-19 10:16:24 UTC
Created attachment 107046 [details]
Statically linked linuxthreads ppc64 tst-timer4 from glibc testsuite
2002/02/05 torvalds | /* Not even root can pretend to send signals from the kernel. 2002/02/05 torvalds | Nor can they impersonate a kill(), which adds source info. */ 2002/02/05 torvalds | if (info.si_code >= 0) 2002/02/05 torvalds | return -EPERM; 2002/02/05 torvalds | info.si_signo = sig; in sys_rt_sigqueuinfo even if the caller passed bogus siginfo_t down (which it doesn't) would ensure that at least si_signo is equal to the signal number. But I'm getting: 36 36 -2 13119 0 0xa333333333 36 941686912 1140850690 0 0 0x1ffffffef60 36 36 -2 13119 0 0xa333333333 36 36 -2 13119 0 0xa333333333 36 36 -2 13119 0 0xa333333333 36 36 -2 13119 0 0xa333333333 *** errors occurred in sig2 handler 6 where the debugging printouts are in order sig, info->si_signo, info->si_code, info->si_pid, info->si_uid, info->si_value.sival_ptr right at the beginning of signal handler for signal 36. Probably the same problem is causing e.g. tst-cancel4 and a bunch of other NPTL tests to fail. Under debugger, I see: sigcancel_handler (sig=32, si=0x8000bdd2d8, ctx=0x52004422) at init.c:147 147 { (gdb) p *si $10 = {si_signo = 941686912, si_errno = 939524268, si_code = 1140850690, _sifields = {_pad = {0, 0, 128, 12440320, 128, 12438824, 32, 0, -6, 1486384728, 13574, 0, 0, 0, -1073741824, 661228, 0 <repeats 12 times>}, _kill = {si_pid = 0, si_uid = 0}, _timer = {si_tid = 0, si_overrun = 0, si_sigval = {sival_int = 128, sival_ptr = 0x8000bdd300}}, _rt = { si_pid = 0, si_uid = 0, si_sigval = {sival_int = 128, sival_ptr = 0x8000bdd300}}, _sigchld = {si_pid = 0, si_uid = 0, si_status = 128, si_utime = 549768252712, si_stime = 137438953472}, _sigfault = {si_addr = 0x0}, _sigpoll = { si_band = 0, si_fd = 128}}} The SIGCANCEL (==32) signal is sent by val = INTERNAL_SYSCALL (tgkill, err, 3, THREAD_GETMEM (THREAD_SELF, pid), pd->tid, SIGCANCEL); if (INTERNAL_SYSCALL_ERROR_P (val, err) && INTERNAL_SYSCALL_ERRNO (val, err) == ENOSYS) val = INTERNAL_SYSCALL (tkill, err, 2, pd->tid, SIGCANCEL); Some more info: sigcancel_handler (sig=32, si=0x8000bdd2d8, ctx=0x52004422) at init.c:147 147 { (gdb) x/24w $r4 0x8000bdd2d8: 0x38210080 0x380000ac 0x44000002 0x00000000 0x8000bdd2e8: 0x00000000 0x00000000 0x00000080 0x00bdd300 0x8000bdd2f8: 0x00000080 0x00bdcd28 0x00000020 0x00000000 0x8000bdd308: 0xfffffffa 0x982092d8 0x00007286 0x00000000 0x8000bdd318: 0x00000000 0x00000000 0x00000000 0x00000021 0x8000bdd328: 0x00000080 0x00031008 0x00000000 0x100002d4 (gdb) p *(siginfo_t *)si $13 = {si_signo = 941686912, si_errno = 939524268, si_code = 1140850690, _sifields = {_pad = {0, 0, 128, 12440320, 128, 12438824, 32, 0, -6, -1742695720, 29318, 0, 0, 0, 0, 33, 128, 200712, 0, 268436180, 0, 1, -1073741820, 941653552, 0, 0, -1073741824, 294764}, _kill = {si_pid = 0, si_uid = 0}, _timer = {si_tid = 0, si_overrun = 0, si_sigval = { sival_int = 128, sival_ptr = 0x8000bdd300}}, _rt = {si_pid = 0, si_uid = 0, si_sigval = {sival_int = 128, sival_ptr = 0x8000bdd300}}, _sigchld = {si_pid = 0, si_uid = 0, si_status = 128, si_utime = 549768252712, si_stime = 137438953472}, _sigfault = {si_addr = 0x0}, _sigpoll = {si_band = 0, si_fd = 128}}} (gdb) p *(siginfo_t *)0x8000bdd300 $14 = {si_signo = 32, si_errno = 0, si_code = -6, _sifields = {_pad = {29318, 0, 0, 0, 0, 33, 128, 200712, 0, 268436180, 0, 1, -1073741820, 941653552, 0, 0, -1073741824, 294764, 0, 0, 0, 0, 0, 0, 0, 0, 0, 403}, _kill = {si_pid = 29318, si_uid = 0}, _timer = {si_tid = 29318, si_overrun = 0, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _rt = { si_pid = 29318, si_uid = 0, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _sigchld = {si_pid = 29318, si_uid = 0, si_status = 0, si_utime = 33, si_stime = 549756014600}, _sigfault = {si_addr = 0x728600000000}, _sigpoll = { si_band = 125919851184128, si_fd = 0}}} What $r4 (aka si) points to looks very much like struct rt_sigframe's tramp: 2003/03/25 bergner | /* addi r1, r1, __SIGNAL_FRAMESIZE # Pop the dummy stackframe */ 2003/03/25 bergner | err |= __put_user(0x38210000UL | (__SIGNAL_FRAMESIZE & 0xffff), &tramp[0]); 2003/03/25 bergner | /* li r0, __NR_[rt_]sigreturn| */ 2003/03/25 bergner | err |= __put_user(0x38000000UL | (syscall & 0xffff), &tramp[1]); 2003/03/25 bergner | /* sc */ 2003/03/25 bergner | err |= __put_user(0x44000002UL, &tramp[2]); and clearly if $r4 points to &frame->tramp[0], frame->pinfo contains the expected value and siginfo_t there is sane. So it looks like the rt signal frame is setup correctly, just $r4 and $r5 passed to the routine are messed up. %r5 contains 0x52004422 which is even weirder. Other registers, e.g. $r1, seem to be set up correctly, $r1 is 0x8000bdcca8 which is frame->puc - 128 == &frame->uc - 128. The code does: 2003/03/25 bergner | err |= __put_user(&frame->info, &frame->pinfo); 2003/03/25 bergner | err |= __put_user(&frame->uc, &frame->puc); ... 2003/03/25 bergner | if (ka->sa.sa_flags & SA_SIGINFO) { 2004/05/31 torvalds | err |= get_user(regs->gpr[4], (unsigned long __user *)&frame->pinfo); 2004/05/31 torvalds | err |= get_user(regs->gpr[5], (unsigned long __user *)&frame->puc); 2003/03/25 bergner | regs->gpr[6] = (unsigned long) frame; 2003/03/25 bergner | } else { 2003/03/25 bergner | regs->gpr[4] = (unsigned long)&frame->uc.uc_mcontext; 2003/03/25 bergner | } First of all, this looks silly to me, as we know we saved there &frame->info resp. &frame->uc a few instructions before, so why can't it: regs->gpr[4] = (unsigned long) &frame->info; regs->grp[5] = (unsigned long) &frame->uc; and avoid the unnecessary read from userland. But, if a __put_user followed by get_user to the same location doesn't yield the value stored there, something is very wrong. This has clearly nothing to do with that put_user or get_user, as handle_rt_signal32 doesn't do this, yet there is the same problem for 32-bit binaries. The common thing I see on entering userland sigcancel_handler (i.e. right after kernel returns to userland so that userland can handle the signal) is that $r4 is mistakenly equal to $lr and $r5 is equal to $cr. Yes, it's not there -- it seems to be getting trashed _later_, in fact. See results of single-stepping. See the registers being fine when we first stop, and then getting stomped on later... Not entirely sure why we see pc=0x8000009dc8 twice. It may be related to the recent single-stepping changes, in which case it ought to be unrelated since this problem is happening even when untraced. I'll look into that too, though. Program received signal SIG32, Real-time event 32. [Switching to Thread 549768053376 (LWP 25550)] 0x0000008000087050 in *__GI___sigsuspend (set=0x8000bab5a0) at ../sysdeps/unix/sysv/linux/sigsuspend.c:73 warning: Source file is more recent than executable. 73 int result = INLINE_SYSCALL (rt_sigsuspend, 2, CHECK_SIGSET (set), 6: /x $sp = 0x8000bab4a0 5: /x $lr = 0x800008703c 4: /x $r5 = 0x0 3: /x $r4 = 0x8 2: x/i $pc 0x8000087050 <*__GI___sigsuspend+120>: mfcr r0 (gdb) si sigcancel_handler (sig=32, si=0x8000bab300, ctx=0x8000baad28) at init.c:148 warning: Source file is more recent than executable. 148 if ((char *)si - (char *)ctx != 0x5d8 || si->si_signo == 0x38210080) 6: /x $sp = 0x8000baaca8 5: /x $lr = 0x8000bab2d8 4: /x $r5 = 0x8000baad28 3: /x $r4 = 0x8000bab300 2: x/i $pc 0x8000009dc8: subf r5,r5,r4 (gdb) 148 if ((char *)si - (char *)ctx != 0x5d8 || si->si_signo == 0x38210080) 6: /x $sp = 0x8000baaca8 5: /x $lr = 0x8000bab2d8 4: /x $r5 = 0x8000baad28 3: /x $r4 = 0x8000bab300 2: x/i $pc 0x8000009dc8: subf r5,r5,r4 (gdb) 147 { 6: /x $sp = 0x8000baaca8 5: /x $lr = 0x8000bab2d8 4: /x $r5 = 0x7faeba6eb6 3: /x $r4 = 0x8000bab2d8 2: x/i $pc 0x8000009dcc: mflr r0 (gdb) c The syscall_exit_trace path is stomping on r4 and r5 and not restoring them. Something like this appears to fix the problem at hand, but at first glance it looks like we're also stomping on other registers. Will double-check. --- entry.S~ 2004-11-22 15:40:42.000000000 +0000 +++ entry.S 2004-11-22 16:07:52.000000000 +0000 @@ -185,12 +185,14 @@ beq- 1f /* only restore r13 if */ ld r13,GPR13(r1) /* returning to usermode */ 1: ld r2,GPR2(r1) - ld r1,GPR1(r1) li r12,MSR_RI andc r10,r10,r12 mtmsrd r10,1 /* clear MSR.RI */ mtlr r4 mtcr r5 + ld r4,GPR4(r1) + ld r5,GPR4(r1) + ld r1,GPR1(r1) mtspr SRR0,r7 mtspr SRR1,r8 rfid Created attachment 107188 [details]
Better fix.
Jakub points out that it would be more appropriate just to use ret_from_except
for this instead of syscall_exit which _deliberately_ doesn't restore regs it
doesn't normally need to.
Created attachment 107536 [details]
Updated (and hopefully final) patch.
Created attachment 107537 [details]
Updated (and hopefully final) patch.
*** Bug 140102 has been marked as a duplicate of this bug. *** Patch confirmed in 2.6.9-1.860_EL. Anyone want to do some testing and toss some feedback in here? Thanks. Closing out on the assumption that no news is good news. Please reopen if that's not the case. |