Bug 595424

Summary: ia32el instruction translator doesn't flush instruction cache after gdb inserts breakpoint, which leads to not hitting a breakpoint
Product: Red Hat Enterprise Linux 5 Reporter: Martin Osvald 🛹 <mosvald>
Component: ia32el Assignee: Petr Machata <pmachata>
Status: CLOSED NOTABUG QA Contact: qe-baseos-tools-bugs
Severity: medium Docs Contact:
Priority: medium    
Version: 5.5 CC: eric.lin, jane.lv, jvillalo, jwest, jwilleford, luyu, mnewsome, rdoty, rpacheco, xiaolan.huang, yihua.jin, zhongjian.xiong
Target Milestone: rc   
Target Release: 5.7   
Hardware: ia64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-06-30 15:59:35 UTC Type: ---
Bug Depends On:    
Bug Blocks: 629795    
Attachments:
Description                  Flags
requested binary             none
the requested gdb package    none

Description Martin Osvald 🛹 2010-05-24 15:16:51 UTC
Description of problem:

When 32-bit gdb tries to write a breakpoint into the traced child using ptrace(PTRACE_POKEDATA), the instruction translator (so-called btgeneric) that translates gdb itself doesn't notify the traced process's instruction translator to flush and re-translate the instruction at the given address. It doesn't do that because it relies on or implements an internal routine (internal, because btgeneric's source code is closed) that detects SMC (self-modifying code) by checking for changes only in memory pages flagged as writable. But the pages containing the code segment are mapped read and execute only, so the inserted breakpoint appears to go unnoticed. See the 'Additional info' section for more info.
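For readers unfamiliar with how a debugger plants a software breakpoint, here is a minimal sketch, assuming a little-endian x86 Linux tracee that is already stopped under ptrace. The helper names `patch_int3` and `insert_breakpoint` are made up for illustration and are not taken from gdb or ia32el; only the `ptrace` requests themselves are real.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>

#define INT3_OPCODE 0xccUL  /* x86 one-byte breakpoint instruction */

/* Replace the lowest-addressed byte of a little-endian word with int3. */
unsigned long patch_int3(unsigned long word)
{
    return (word & ~0xffUL) | INT3_OPCODE;
}

/*
 * Hypothetical helper: plant a breakpoint at addr in a stopped tracee.
 * The original word is stored in *saved so it can be restored later.
 * Returns 0 on success, -1 on failure (errno is set by ptrace).
 */
int insert_breakpoint(pid_t pid, uintptr_t addr, unsigned long *saved)
{
    assert(saved != NULL);

    /* PEEKTEXT returns the word itself, so errors are reported via errno. */
    errno = 0;
    long word = ptrace(PTRACE_PEEKTEXT, pid, (void *)addr, NULL);
    if (word == -1 && errno != 0)
        return -1;

    *saved = (unsigned long)word;
    if (ptrace(PTRACE_POKEDATA, pid, (void *)addr,
               (void *)patch_int3(*saved)) == -1)
        return -1;
    return 0;
}
```

The poke changes the tracee's memory even though the page is not writable from the tracee's own point of view, which is why a change detector that only watches writable pages could miss it.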


Version-Release number of selected component (if applicable):

RHEL5.5 ia32el-1.7-5 and all earlier versions (including ia32el versions shipped with RHEL4)


How reproducible:

Always


Steps to Reproduce:

1. Compile the following code to be 32bit:

=== <snip repro.c> ===
int main(void) {
  return 0;
}
=== </snip> ===

  $ gcc -m32 -g -o repro repro.c

2. Install ia32el and 32bit gdb on ia64 and test the ia32 execution layer:

  $ yum -y install ia32el gdb.i386

note: to install ia32el on ia64, you will probably need to subscribe your system to the Supplementary Software Channel, which contains the ia32el package, through Satellite/RHN Hosted.

  $ service ia32el status

3. Run the reproducer on ia64 using the ia32el execution layer under 32bit gdb, set a breakpoint and run it:

  $ /emul/ia32-linux/usr/bin/gdb ./repro

  (gdb) b main
  (gdb) r

 
Actual results:

Breakpoint is not being hit.


Expected results:

Breakpoint should be hit.


Additional info:

According to the ia32el source code responsible for emulating the ptrace(PTRACE_POKEDATA) syscall, it seems there was a plan to use some inter-process communication mechanism to notify the child process's translator to flush instructions remotely, but it was not implemented because btgeneric runs in "SMC mode" when debugged. You can find a mention of it in the source code below (see the unused BTGENERIC_FLUSH_IA32_INSTRUCTION_CACHE_REMOTE macro):

RHEL-5/ia32el-build/ia32el_7042_7022/src/ia32x/btlib_ptemu.c:ptemu_pokedata():
=== <snip> ===
static long
ptemu_pokedata (int pid, unsigned addr, unsigned data)
{
       long retval;
       unsigned long p = addr;
       int is_hi = 0;
       union {
               unsigned long l;
               struct {
                       unsigned lo;
                       unsigned hi;
               } s;
       } ldata;

       DBPRINT (btl_prt_pt, "\n");

try_peek:

       /* read 8 bytes from the debuggee */
       retval = ia64_peekdata (pid, p, &ldata.l);
       if (retval < 0) {
               /* handle a case where the user tries to read the last 4 
                * bytes in a page and the next page is not readable */
               if (is_hi == 0) {
                       int is_on_page_boundry =
                           (PAGESTART (addr, btl_page_size) !=
                            PAGESTART (addr + 4, btl_page_size)) ? 1 : 0;
                       if ((retval == -EIO) && is_on_page_boundry) {
                               p -= 4;
                               is_hi = 1;
                               goto try_peek;
                       }
               }
       }
       DBPRINT (btl_prt_pt, "ia64_peekdata=%ld ldata=%lx\n", retval, ldata.l);

       if (retval < 0) {
               retval = -EIO;
               goto out;
       }

       /* insert the 4 bytes requested by the user */
       if (is_hi) {
               ldata.s.hi = data;
       } else {
               ldata.s.lo = data;
       }

       DBPRINT (btl_prt_pt, "ldata=%lx\n", ldata.l);

       /* write the resulted 8 bytes back to the debuggee */
       retval = SYSCALL (ptrace, (PTRACE_POKEDATA, pid, p, ldata.l));

       DBPRINT (btl_prt_pt, "poke_data=%ld\n", retval);

       if (retval < 0) {
               goto out;
       }

/* remote-flush is not implemented yet in btgeneric. but - except
* of attaching to running processes and detaching from them -
* we dont need to flush instruction cache because we work
* in SMC mode when debugged ( see BtlMemoryQueryPermissions )
*
               
       if (retval == 0) {
               BTGENERIC_FLUSH_IA32_INSTRUCTION_CACHE_REMOTE (     <<<---
                       (void*)(long)pid, 
                       (void*)(uintptr_t)addr, 
                       sizeof(int), 
                       BT_FLUSH_FORCE);
       }
*/
out:
       return retval;
}
=== </snip> ===
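The read-modify-write logic in ptemu_pokedata() can be isolated as pure functions. This is only a sketch: the names `page_start`, `crosses_page` and `merge_poke` are hypothetical, PAGESTART is assumed to round an address down to its page boundary, and the lo/hi layout assumes a little-endian target such as ia64 Linux.

```c
#include <assert.h>
#include <stdint.h>

/* Start address of the page containing addr (page_size must be a power of two). */
uint64_t page_start(uint64_t addr, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return addr & ~(page_size - 1);
}

/*
 * Nonzero if addr and addr + 4 lie on different pages, i.e. the user's
 * 4 bytes are the last 4 bytes of a page and an 8-byte read starting at
 * addr would reach into the next page.
 */
int crosses_page(uint64_t addr, uint64_t page_size)
{
    return page_start(addr, page_size) != page_start(addr + 4, page_size);
}

/*
 * Merge the 32-bit value into the 64-bit word that was read back.
 * is_hi == 0: word was read at addr,     data replaces the low half.
 * is_hi == 1: word was read at addr - 4, data replaces the high half.
 */
uint64_t merge_poke(uint64_t word, uint32_t data, int is_hi)
{
    if (is_hi)
        return (word & 0x00000000ffffffffULL) | ((uint64_t)data << 32);
    return (word & 0xffffffff00000000ULL) | (uint64_t)data;
}
```

Nothing in this path touches the debuggee's translation cache; the commented-out remote flush is the only place where that would have happened.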

Intel's documentation for btgeneric states:

"Writable page translations include code for detecting possible changes from the code used for translation."

You can find the documentation here: http://software.intel.com/file/21727
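To illustrate what "detecting possible changes from the code used for translation" could mean, here is a conceptual sketch. btgeneric is closed source, so this is not its implementation; the struct and function names are invented, and a real translator would likely use something cheaper than a full byte comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SRC_BYTES 16

/* Hypothetical translation-cache entry; not btgeneric's real layout. */
struct translation {
    const uint8_t *src;               /* guest code this translation was built from */
    size_t len;                       /* number of guest bytes covered */
    uint8_t snapshot[MAX_SRC_BYTES];  /* copy of those bytes at translation time */
};

/* Record which guest bytes the translation was derived from. */
void translation_init(struct translation *t, const uint8_t *src, size_t len)
{
    assert(t != NULL && src != NULL && len <= MAX_SRC_BYTES);
    t->src = src;
    t->len = len;
    memcpy(t->snapshot, src, len);
}

/* Nonzero if the guest code changed since it was translated. */
int translation_is_stale(const struct translation *t)
{
    return memcmp(t->snapshot, t->src, t->len) != 0;
}
```

If such a check is emitted only for translations of writable pages, a byte poked into a read/execute-only page by ptrace would never trigger it.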

I tried setting the writable flag on the LOAD segment that contains the .text section in the ELF program header of the reproducer binary, but when this binary was run under gdb the breakpoint still was not hit.

From the above, it seems that either there is a bug in btgeneric or there is no support for this kind of notification to force re-translation of the code at the address of an inserted breakpoint. A possible solution would be to implement such IPC between the parent and child instruction translators, or to fix the possible bug in btgeneric that prevents it from correctly detecting changes made in the code segment.
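As a rough illustration of the first option, the receiving side of such a notification might look like the sketch below. Everything here is hypothetical: `flush_request`, `cached_block` and `handle_flush` are invented names, the transport between the two translators is left out, and btgeneric's real cache structure is unknown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message the debugger-side translator could send after a poke. */
struct flush_request {
    uint32_t addr;  /* IA-32 address that was modified */
    uint32_t len;   /* number of modified bytes */
};

/* Hypothetical cached translation covering guest bytes [start, start + len). */
struct cached_block {
    uint32_t start;
    uint32_t len;
    int valid;
};

/* Invalidate every valid block overlapping the request; returns how many were dropped. */
size_t handle_flush(struct cached_block *blocks, size_t n,
                    const struct flush_request *req)
{
    assert(blocks != NULL && req != NULL);

    size_t dropped = 0;
    /* Use 64-bit ends so ranges near the top of the 32-bit space do not wrap. */
    uint64_t req_end = (uint64_t)req->addr + req->len;

    for (size_t i = 0; i < n; i++) {
        uint64_t blk_end = (uint64_t)blocks[i].start + blocks[i].len;
        if (blocks[i].valid && blocks[i].start < req_end && req->addr < blk_end) {
            blocks[i].valid = 0;  /* forces re-translation on next execution */
            dropped++;
        }
    }
    return dropped;
}
```

A 4-byte request matches the `sizeof(int)` length passed to the commented-out BTGENERIC_FLUSH_IA32_INSTRUCTION_CACHE_REMOTE call.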

Comment 1 Eric Lin 2010-05-25 00:58:59 UTC
Martin, 

Thanks for the analysis and detailed description. Actually, IA-32 EL does consider the debug scenario and will treat all the code in a ptraced process as potential self-modifying code. Your case may fail for another reason. We will investigate. 


Eric.

Comment 2 Eric Lin 2010-05-26 03:58:22 UTC
Hi Martin,
we have set up RHEL5.5 (Tikanga) and copied the entire /emul tree from RHEL5.4, with 32-bit gdb version 6.8-37.el5, and it works well without encountering the issue you described.
Can you tell us which 32-bit gdb version you tested?

Comment 4 Martin Osvald 🛹 2010-07-07 11:06:27 UTC
Hello,

I am sorry for the delay, I must have overlooked the email with your reply. :(

Sending the results from RHEL4.8, RHEL5.4, RHEL5.5 + appropriate gdb and ia32el versions:

RHEL4.8:

$ rpm -qa ia32el --qf "%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n"
ia32el-1.6-14.EL4.ia64
$ rpm -qf /emul/ia32-linux/usr/bin/gdb --qf "%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n"
gdb-6.3.0.0-1.162.el4.i386
$

$ /emul/ia32-linux/usr/bin/gdb -q ./reproducer/repro
Using host libthread_db library "/emul/ia32-linux/lib/tls/libthread_db.so.1".
(gdb) b main
Breakpoint 1 at 0x8048350: file repro.c, line 2.
(gdb) r
Starting program: /root/reproducer/repro 
Program exited normally.
You can't do that without a process to debug.
(gdb)

RHEL5.4:

$ rpm -qa ia32el --qf "%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n"
ia32el-1.7-3.el5.ia64
$ rpm -qf /emul/ia32-linux/usr/bin/gdb --qf "%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n"
gdb-6.8-37.el5.i386
$

$ /emul/ia32-linux/usr/bin/gdb -q ./reproducer/repro
(gdb) b main
Breakpoint 1 at 0x8048382: file repro.c, line 2.
(gdb) r
Starting program: /root/reproducer/repro 

Program exited normally.
You can't do that without a process to debug.
(gdb)

RHEL5.5:

$ rpm -qa ia32el --qf "%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n"
ia32el-1.7-5.el5.ia64
$ rpm -qf /emul/ia32-linux/usr/bin/gdb --qf "%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n"
gdb-7.0.1-23.el5_5.1.i386
$

$ /emul/ia32-linux/usr/bin/gdb -q ./reproducer/repro
Reading symbols from /root/reproducer/repro...done.
(gdb) b main
Breakpoint 1 at 0x8048382: file repro.c, line 2.
(gdb) r
Starting program: /root/reproducer/repro 
During startup program exited normally.
(gdb)

Best regards,
-Martin

Comment 5 Eric Lin 2010-07-08 08:20:38 UTC
Hi Martin, 
I configured exactly the same RHEL5.4 setup (same versions of ia32el and i386 gdb) as you listed, but it still runs correctly for me.
If possible, can you package the repro binary and attach it here?

Comment 7 Eric Lin 2010-07-22 04:22:13 UTC
Hi Martin,
Any updates?

Comment 8 Martin Osvald 🛹 2010-07-22 16:52:26 UTC
Created attachment 433760 [details]
requested binary

Comment 10 Xiaolan 2010-08-03 01:16:54 UTC
Hi Martin,

We failed to reproduce the bug with your binary. Could you help double-check your ia32el installation?

For example, on RHEL5.5:

[root@tiger38-mad ~]# md5sum /usr/lib/ia32el/*
9ff41cdf1c87e0f2851a3c729a78653e  /usr/lib/ia32el/auxapp
bc31b4cefc063e4aef83562caaadae48  /usr/lib/ia32el/ia32exec.bin
c59e8d21a20400f9c0668c3551b4a76f  /usr/lib/ia32el/ia32x_loader
75555f72574a452e3328905f21cdaa4c  /usr/lib/ia32el/is_ia32el
427a5213463076f8efa2d453fc874e16  /usr/lib/ia32el/libia32x.so
8307057973182405e3f9e9cd9caef689  /usr/lib/ia32el/suid_ia32x_loader
[root@tiger38-mad ~]# cat /proc/sys/fs/binfmt_misc/ia32el
enabled
interpreter /usr/lib/ia32el/ia32x_loader
flags: POC
offset 0
magic 7f454c4601010100000000000000000002000300
[root@tiger38-mad ~]# /usr/lib/ia32el/ia32x_loader -v
IA32X Loader:           version=01.release
IA32X Generic   :       version=7,2,7042,0      path=/usr/lib/ia32el/ia32exec.bin
IA32X OS-Wrapper:       version=7.2.7030.13.12.release

Thanks
Xiaolan
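For reference, the `magic` line in the binfmt_misc output above is the start of an ELF header, which is how 32-bit x86 executables get routed to ia32x_loader. A small sketch of the same comparison, decoding each byte; the function name `matches_ia32el_magic` is hypothetical, and the byte-exact match assumes no mask is registered, as the output shows none.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The 20 bytes from the binfmt_misc "magic" line, matched at offset 0. */
static const unsigned char IA32_ELF_MAGIC[20] = {
    0x7f, 'E', 'L', 'F',     /* ELF signature */
    0x01,                    /* EI_CLASS: ELFCLASS32 */
    0x01,                    /* EI_DATA: little-endian */
    0x01,                    /* EI_VERSION: current */
    0x00,                    /* EI_OSABI: System V */
    0, 0, 0, 0, 0, 0, 0, 0,  /* ABI version and padding */
    0x02, 0x00,              /* e_type: ET_EXEC */
    0x03, 0x00               /* e_machine: EM_386 */
};

/* Nonzero if the first bytes of a file header equal the registered magic. */
int matches_ia32el_magic(const unsigned char *header, size_t len)
{
    assert(header != NULL);
    return len >= sizeof IA32_ELF_MAGIC &&
           memcmp(header, IA32_ELF_MAGIC, sizeof IA32_ELF_MAGIC) == 0;
}
```

Identical output on both machines suggests the same loader handles the reproducer in both environments.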

Comment 11 Martin Osvald 🛹 2010-08-05 13:35:36 UTC
Hello,

that is strange, I can reproduce it every time. :-( The version you are using is the same:

[root@planck repro]# md5sum /usr/lib/ia32el/*
9ff41cdf1c87e0f2851a3c729a78653e  /usr/lib/ia32el/auxapp
bc31b4cefc063e4aef83562caaadae48  /usr/lib/ia32el/ia32exec.bin
c59e8d21a20400f9c0668c3551b4a76f  /usr/lib/ia32el/ia32x_loader
75555f72574a452e3328905f21cdaa4c  /usr/lib/ia32el/is_ia32el
427a5213463076f8efa2d453fc874e16  /usr/lib/ia32el/libia32x.so
8307057973182405e3f9e9cd9caef689  /usr/lib/ia32el/suid_ia32x_loader
[root@planck repro]# cat /proc/sys/fs/binfmt_misc/ia32el
enabled
interpreter /usr/lib/ia32el/ia32x_loader
flags: POC
offset 0
magic 7f454c4601010100000000000000000002000300
[root@planck repro]# /usr/lib/ia32el/ia32x_loader -v
IA32X Loader:           version=01.release
IA32X Generic   :       version=7,2,7042,0      path=/usr/lib/ia32el/ia32exec.bin
IA32X OS-Wrapper:       version=7.2.7030.13.12.release
[root@planck repro]#

I will ask whether we can arrange some machine for you to /access/test/reproduce/ it at our site and will let you know.

Best regards,
-Martin

Comment 12 RHEL Program Management 2010-08-09 19:28:01 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated in the
current release, Red Hat is unfortunately unable to address this
request at this time. Red Hat invites you to ask your support
representative to propose this request, if appropriate and relevant,
in the next release of Red Hat Enterprise Linux.

Comment 13 Gary Case 2010-08-20 18:42:08 UTC
Jane and Luming,

Red Hat would like your help debugging this issue using a machine inside the Red Hat firewall. Other colleagues at Intel haven't been able to reproduce the issue using what seems to be an identical setup on a machine at Intel, so we thought we would ask for your help. Because you are on-site and have access to our VPN, you would be able to connect to the machine within our firewall that we are using to reproduce the issue. Martin Osvald (mosvald) would be the contact at Red Hat for getting the machine set up. Martin, could you set up a reproducer machine and send the access information to Jane and Luming so that they would be able to connect to it? Jane Lv (jlv) and Luming Yu (luyu) are their email addresses.

-Gary Case

Comment 14 Luming Yu 2010-08-23 01:57:47 UTC
Gary,

Thanks for letting us know about the problem. Although I'm not an IA32 EL guy at all, I can help if the issue is first reproduced with a native ia64 test case.

Btw, would Red Hat permit Intel's IA32 EL experts to access the system that Martin would prepare for me and Jane to debug?


Thanks,
Luming

Comment 15 Gary Case 2010-08-23 14:57:49 UTC
Luming,

Red Hat would definitely permit the IA32 EL experts to access the system. I asked for you and Jane to be put on the CC list as I knew that you two had access already and could assist other members of your group to get to the machine at Red Hat. 

-Gary

Comment 16 Jane Lv 2010-08-24 02:33:51 UTC
Luming,

Will you be working with Eric or Xiaolan on this issue?  Or I can do that.


-Jane

Comment 17 Luming Yu 2010-08-24 02:54:03 UTC
Jane,

I will leave it to you unless you have other urgent things to do.

-Luming

Comment 18 Jane Lv 2010-08-24 06:38:03 UTC
OK.  I will handle this.

Martin,

Could you send me (jlv) the access information of the reproducer machine?

Thanks.


-Jane

Comment 19 Xiaolan 2010-08-27 06:55:03 UTC
Hi all, 

Thanks a lot to Luming and Jane for their help. We found that the version of x86 gdb on Red Hat's box is slightly different from that on Intel's box. 

Gary, could you help check the version of x86 gdb and send us the install package?


on Red Hat's box:
Version GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5_5.2)
0575c6d831c818091fa995cea9d5f302  /emul/ia32-linux/usr/bin/gdb
0c7bf6807fdde7d646a4290ea7921e7b  /emul/ia32-linux/usr/bin/gdbserver
0575c6d831c818091fa995cea9d5f302  /emul/ia32-linux/usr/bin/gdbtui

on Intel's box:
Version GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5)
a5af2b485a3844a999af9a759ee25ed3  /emul/ia32-linux/usr/bin/gdb
e6c48c579abfd8b9be568268955ec243  /emul/ia32-linux/usr/bin/gdbserver
a5af2b485a3844a999af9a759ee25ed3  /emul/ia32-linux/usr/bin/gdbtui

thanks
xiaolan

Comment 20 Ronald Pacheco 2010-09-20 15:22:56 UTC
What is the status of this bug?

Comment 21 Xiaolan 2010-09-21 06:50:40 UTC
We found that Red Hat uses a different gdb from ours. Could we get your gdb package?  

Thanks
Xiaolan

Comment 22 Martin Osvald 🛹 2010-09-28 07:07:58 UTC
Created attachment 450108 [details]
the requested gdb package

Comment 23 Xiaolan 2010-09-30 06:01:25 UTC
We will have our national holidays from 10.1 to 10.7. We will continue to work on it after that.

thanks
Xiaolan

Comment 24 Xiaolan 2010-10-11 03:18:54 UTC
Hi Gary,

Unfortunately, we failed to reproduce the failure with the new gdb binary.
Could you help prepare the system so that we can continue to debug it in Red Hat's environment?

Thanks
Xiaolan

Comment 25 Gary Case 2010-10-12 19:43:15 UTC
Xiaolan,

Would you like Martin to put the old gdb package on the system at Red Hat so that you can use the same versions?

-Gary

Comment 26 Xiaolan 2010-10-13 02:07:01 UTC
Gary,

Is the old package the one you sent us? If so, it's fine to use that version. Thanks a lot!

-Xiaolan

Comment 27 Luming Yu 2010-11-08 08:18:28 UTC
Any updates? 

If this is supposed to be a package version problem, I suppose I need an ia64 system in Beaker that I can reserve, so that I can help with this kind of issue in the future.

Gary, 
could you help me?

Thanks,
Luming

Comment 30 Gary Case 2010-11-15 18:38:05 UTC
I emailed the login details for the machine that Martin set up to Luming last week, but forgot to update the BZ to explain that.

Comment 31 Luming Yu 2010-11-17 12:57:56 UTC
Xiaolan,

The system is ready for you to use; please let me know when you need it so that I can prepare it for you. Or you can just post here the versions of all the libraries that work for you, and I can help check whether a mismatched library is installed in the current RHEL 5.5 release.

Thanks,
Luming

Comment 32 Xiaolan 2010-11-22 08:04:16 UTC
Hi Case,

We found that the x86 version of the bash package is not installed on the box. Could you help update the system so that we can try again?

Thanks
Xiaolan

Comment 33 Luming Yu 2010-11-22 13:51:59 UTC
Gary,

Could you also let me know if we can just run yum to install ia32el and the related packages? So Xiaolan could pull in all required packages from repo by herself on the nec-nx2-1.rhts.eng.bos.redhat.com.

I assume it's fine that Xiaolan can do that on nec-nx2-1.rhts.eng.bos.redhat.com.
Please correct me if I'm wrong.


Thanks,
Luming

Comment 34 Gary Case 2010-11-23 16:08:20 UTC
Luming,

The system was already registered with RHN. All you need to do to install packages is run "yum install packagename". I've put bash.i386 on and ia32el was already installed, so you should be ready to go.

-Gary

Comment 37 Eric Lin 2011-03-30 02:37:39 UTC
Hi Case,
Is the system still available for us to look into the case? I heard from Xiaolan that it has been re-installed.

thanks.

Comment 38 RHEL Program Management 2011-06-20 21:52:01 UTC
This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.7 and Red Hat does not plan to fix this issue in the currently developed update.

Contact your manager or support representative in case you need to escalate this bug.

Comment 43 Jeremy West 2011-06-30 15:59:35 UTC
Red Hat does not plan to address this bug within the RHEL5 lifecycle, as there is not yet a planned fix upstream for this.  If this bug continues to impact newer deployments of Red Hat Enterprise Linux 5 or 6, please file a request against that product and we'll work with the upstream community to try and address this problem.