660204 – ptrace: watchpoint-zeroaddr: EINVAL on PTRACE_POKEUSER of DR7

Bug 660204 - ptrace: watchpoint-zeroaddr: EINVAL on PTRACE_POKEUSER of DR7

Summary: ptrace: watchpoint-zeroaddr: EINVAL on PTRACE_POKEUSER of DR7

Keywords:
Status:	CLOSED UPSTREAM
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	19
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	low
Target Milestone:	---
Assignee:	Oleg Nesterov
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2010-12-06 03:43 UTC by Jan Kratochvil
Modified:	2013-05-22 16:54 UTC (History)
CC List:	9 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2013-05-21 19:00:48 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments	(Terms of Use)

Description Jan Kratochvil 2010-12-06 03:43:18 UTC

Description of problem:
When you set first DR7 and then DR0 - the setting of DR7 fails, because DR0 has still address 0 that time.  The process is still stopped so there is no reason for a sanity check that time.
This is a regression.

Version-Release number of selected component (if applicable):
PASS kernel-2.6.32-71.7.1.el6.x86_64
FAIL kernel-2.6.35.6-48.fc14.x86_64
FAIL kernel-2.6.35.6-48.noutrace.fc14.x86_64

How reproducible:
Always.

Steps to Reproduce:
wget -O watchpoint-zeroaddr.c http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/watchpoint-zeroaddr.c?cvsroot=systemtap; gcc -o watchpoint-zeroaddr watchpoint-zeroaddr.c -Wall -g; ./watchpoint-zeroaddr; echo $?

Actual results:
1

Expected results:
0

Additional info:

Comment 1 Oleg Nesterov 2010-12-06 10:37:38 UTC

(In reply to comment #0)
>
> When you set first DR7 and then DR0 - the setting of DR7 fails,
> because DR0 has still address 0 that time.

More precisely, it fails because ptrace_write_dr7() never tries
to allocate the perf_event connected to this bpnum, it blindly
calls ptrace_modify_breakpoint(bp => NULL)  which obviously fails
with EINVAL.

And it is not trivial to change this, despite its name
ptrace_write_dr7() doesn't write dr7, we do not have  debugreg7
any longer. We really need bp != NULL to store the necessary
info.

Another problem is that this code has other bugs, and those
who wrote this code do not reply at all.

So. This is upstream problem. Unlike the previous regression
(watchpoints && fork), this one looks like a bug to me.

But. I do not know how long will it take to fix.


Just in case, to clarify once again... "zeroaddr" is fine.
The problem is, with the recent changes there is no "adrress"
at all before the tracer writes to u_debugreg[0]. But if you
try to read u_debugreg[0], get_watchpoint() will happily return
zero.

Isn't this nice...

Comment 2 Josh Boyer 2011-08-31 17:34:34 UTC

Did this ever get fixed?

Comment 3 Oleg Nesterov 2011-08-31 17:48:52 UTC

(In reply to comment #2)
>
> Did this ever get fixed?

argh.... thanks for reminding. Should be fixed upstream iirc.

Damn. I'll _try_ to look further this week. I forgot the details,
but afaics we already fixed other (more serious) problems which
delayed this problem...

Comment 4 Jan Kratochvil 2011-09-16 21:52:08 UTC

Still FAILs:
kernel-3.1.0-0.rc6.git0.0.fc17.x86_64

Comment 5 Fedora End Of Life 2013-04-03 18:09:53 UTC

This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could affect also pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19

Comment 6 Justin M. Forbes 2013-04-05 16:58:00 UTC

Is this still a problem with 3.9 based F19 kernels?

Comment 7 Jan Kratochvil 2013-04-05 17:04:18 UTC

Yes; BTW the reproducer in Comment 0 is really simple to run.

kernel-3.8.4-202.fc18.x86_64

Comment 8 Oleg Nesterov 2013-04-13 18:40:44 UTC

(In reply to comment #7)
> Yes; BTW the reproducer in Comment 0 is really simple to run.

All I can say is: sorry for delay.

I constantly forgot about this problem. Fortunataly, it seems that
today the fix should be simple. Still it is complicated by the fact
that this code needs cleanups, in particular we should kill
ptrace_get_breakpoints() and ->ptrace_bp_refcnt, and this needs
per-arch changes.

I am working on the patches.

Comment 9 Oleg Nesterov 2013-04-14 19:22:43 UTC

Cough. Jan, suddenly I started to doubt this should be fixed.

Lets discuss this upstream, you were cc'ed. I already sent the
preparation patches, the actual fix should be trivial.

But. Could you please tell me if gdb really wants this fix?

Comment 10 Jan Kratochvil 2013-04-14 19:39:06 UTC

Recent GDB versions are fully compatible with recent Linux kernels, there was a special care about it:
  http://sourceware.org/git/?p=gdb.git;a=commitdiff;h=2bfeecf636f5790a7e850e11c7936c4d9814d8ef

Old GDBs were also compatible, although just accidentally:
  /* Finally, actually pass the info to the inferior.  */
  i386_dr_low.set_addr (i, addr);
  i386_dr_low.set_control (dr_control_mirror);

Unfortunately I really do not remember which specific use case had a problem with it.  I was heavily patching this GDB code before so some Fedora/RHEL GDB could be dependent on it, or maybe just some my work-in-progress patches that time.

I do not see any Red Hat business reason for it.  I still believe it is a needless ABI backward compatibility breakage but I do not mind if the kernel developers do not agree and WONTFIX it.

Comment 11 Oleg Nesterov 2013-04-14 19:48:02 UTC

(In reply to comment #10)
> I still believe it is a
> needless ABI backward compatibility breakage

Yes, yes, I agree. But given that this was broken a long ago...

> but I do not mind if the kernel
> developers do not agree and WONTFIX it.

OK. Thanks Jan. Lets wait for other upstream comments, then
we will see.

Comment 12 Jan Kratochvil 2013-04-14 19:56:01 UTC

(In reply to comment #11)
> Yes, yes, I agree. But given that this was broken a long ago...

The kernel ABI got broken ~3 years ago. Most of the "recent" users run Ubuntu 10.04 which I believe was still before the breakage.  Other users run even more old GNU/Linux releases.  The users still may not have noticed the change.

Comment 13 Oleg Nesterov 2013-05-17 18:06:22 UTC

(In reply to comment #12)
> (In reply to comment #11)
> > Yes, yes, I agree. But given that this was broken a long ago...
> 
> The kernel ABI got broken ~3 years ago. Most of the "recent" users run
> Ubuntu 10.04 which I believe was still before the breakage.  Other users run
> even more old GNU/Linux releases.  The users still may not have noticed the
> change.

OK.

The patches are already in -mm, and they were acked. Can we close
this bug?

Comment 14 Jan Kratochvil 2013-05-21 19:00:48 UTC

If the testcase really PASSes there then OK, CLOSED-UPSTREAM.

Comment 15 Oleg Nesterov 2013-05-22 16:54:34 UTC

(In reply to Jan Kratochvil from comment #14)
>
> If the testcase really PASSes there then OK, CLOSED-UPSTREAM.

Yes, yes, I even wrote the new test-case (the old one passes too).

Thanks!

Note You need to log in before you can comment on or make changes to this bug.