Bug 202096 - ia64: unaligned accesses during dmraid execution at startup
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: dmraid
Version: rawhide
Hardware: ia64 Linux
Priority: medium
Severity: medium
Assigned To: Heinz Mauelshagen

Reported: 2006-08-10 15:24 EDT by Prarit Bhargava
Modified: 2007-11-30 17:11 EST (History)
Doc Type: Bug Fix
Last Closed: 2007-09-12 11:47:03 EDT


Attachments
Patch to fix it in glibc (1.03 KB, patch)
2006-12-07 23:36 EST, Zhang Yanmin

Description Prarit Bhargava 2006-08-10 15:24:11 EDT
Description of problem:

During system startup, unaligned access messages are generated from dmraid
execution.

Version-Release number of selected component (if applicable): FC6-test2,
dmraid-1.0.0.rc11-FC6.2


How reproducible: 100%


Steps to Reproduce:
1. Install FC6 Test2 from oss.sgi.com/projects/fedora (or the rawhide equivalent)
  
Actual results:

During boot, I see:

Setting hostname localhost.localdomain:  [  OK  ]
dmraid(1037): unaligned access to 0x20000000000b6a74, ip=0x2000000000017d90
dmraid(1037): unaligned access to 0x20000000000b6a74, ip=0x2000000000017da0
dmraid(1037): unaligned access to 0x20000000000b6a8c, ip=0x2000000000017d90
dmraid(1037): unaligned access to 0x20000000000b6a8c, ip=0x2000000000017da0
dmraid(1037): unaligned access to 0x20000000000b6aa4, ip=0x2000000000017d90
Setting up Logical Volume Management:   2 logical volume(s) in volume group
"VolGroup00" now active

Expected results:

No unaligned accesses should be seen.
Comment 1 Aron Griffis 2006-08-17 16:40:34 EDT
FWIW, I'm seeing the same thing updating the kernel rpm:

$ sudo rpm -Uvh *
Preparing...                ########################################### [100%]
   1:kernel-debuginfo-common########################################### [ 14%]
   2:kernel                 ########################################### [ 29%]
dmraid(13293): unaligned access to 0x200000000009aa74, ip=0x2000000000017e80
dmraid(13293): unaligned access to 0x200000000009aa74, ip=0x2000000000017e90
dmraid(13293): unaligned access to 0x200000000009aa8c, ip=0x2000000000017e80
dmraid(13293): unaligned access to 0x200000000009aa8c, ip=0x2000000000017e90
dmraid(13293): unaligned access to 0x200000000009aaa4, ip=0x2000000000017e80
   3:kernel-debuginfo       ########################################### [ 43%]
   4:kernel-devel           ########################################### [ 57%]
   5:kernel-xen             ########################################### [ 71%]
dmraid(14164): unaligned access to 0x200000000009aa74, ip=0x2000000000017e80
dmraid(14164): unaligned access to 0x200000000009aa74, ip=0x2000000000017e90
dmraid(14164): unaligned access to 0x200000000009aa8c, ip=0x2000000000017e80
dmraid(14164): unaligned access to 0x200000000009aa8c, ip=0x2000000000017e90
dmraid(14164): unaligned access to 0x200000000009aaa4, ip=0x2000000000017e80
   6:kernel-xen-debuginfo   ########################################### [ 86%]
   7:kernel-xen-devel       ########################################### [100%]
Comment 2 Erik Jacobson 2006-09-14 10:17:52 EDT
We're seeing this too.  
Comment 4 Luming Yu 2006-10-08 21:18:51 EDT
I saw it too.  It is caused by the format_member array, whose struct is
packed. I don't know why it needs to be packed. The following patch just
comments out that attribute. I don't know whether it will cause any run-time
problem, but it solves the alignment issue.


--- dmraid/1.0.0.rc12/lib/format/format.c       2006-09-21 06:17:05.000000000 -0400
+++ /tmp/format.c       2006-10-10 13:03:35.000000000 -0400
@@ -37,7 +37,7 @@
        const unsigned short offset;
        const unsigned char flags;
        const char *msg;
-} __attribute__ ((packed));
+} /*__attribute__ ((packed))*/;
 
 enum { FMT_ALL = 0x01, FMT_METHOD = 0x02 } format_flags;
 #define        IS_FMT_ALL(member)      (member->flags & FMT_ALL)
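
To see what the packed attribute does to the layout, here is a minimal sketch.
It is not code from dmraid: the struct names packed_demo and natural_demo and
the helper functions are hypothetical, and only the three members visible in
the diff context are used. It assumes a GCC-compatible compiler, and the
concrete numbers depend on the pointer size of the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical copy of the three members, with the packed attribute. */
struct packed_demo {
    const unsigned short offset;
    const unsigned char flags;
    const char *msg;
} __attribute__ ((packed));

/* The same members with the attribute removed, as the patch does. */
struct natural_demo {
    const unsigned short offset;
    const unsigned char flags;
    const char *msg;
};

/* Byte offset of msg when the struct is packed: no padding is inserted. */
size_t packed_msg_offset(void)
{
    return offsetof(struct packed_demo, msg);
}

/* Byte offset of msg with the default layout: padded for the pointer. */
size_t natural_msg_offset(void)
{
    return offsetof(struct natural_demo, msg);
}

/* Sanity check: packing leaves no gap between flags and msg. */
void check_packed_layout(void)
{
    assert(packed_msg_offset()
           == sizeof(unsigned short) + sizeof(unsigned char));
}

/* Print both layouts for the target this is compiled for. */
void print_layouts(void)
{
    printf("packed:  msg at offset %zu, sizeof %zu\n",
           packed_msg_offset(), sizeof(struct packed_demo));
    printf("natural: msg at offset %zu, sizeof %zu\n",
           natural_msg_offset(), sizeof(struct natural_demo));
}
```

On a target with 8-byte pointers the packed variant puts msg at offset 3 and
is 11 bytes long, so most msg slots in an array of such structs cannot sit on
an 8-byte boundary. This sketch has not been run on ia64.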
Comment 5 Zhang Yanmin 2006-10-11 04:27:47 EDT
I checked and found that the ip is in the function _dl_relocate_object in glibc.

I recompiled glibc with some workarounds and located the source line in the
function elf_machine_rela_relative in the file sysdeps/ia64/dl-machine.h.

elf_machine_rela_relative (Elf64_Addr l_addr, const Elf64_Rela *reloc,
                           void *const reloc_addr_arg)
{
  Elf64_Addr *const reloc_addr = reloc_addr_arg;
  /* ??? Ignore MSB and Instruction format for now.  */
  assert (ELF64_R_TYPE (reloc->r_info) == R_IA64_REL64LSB);

  *reloc_addr += l_addr;  /* This line caused the unaligned access.  */
}

Comment 6 Doug Chapman 2006-10-11 09:32:45 EDT
Yes, the unaligned access happens during dynamic linking.  The
/sbin/dmraid.static version does not have this problem.  The only difference
between dmraid and dmraid.static is dmraid has libdmraid.so dynamically linked.

Why do we need libdmraid as a dynamic lib?  I am willing to bet money that
nobody else links to it.  Looking at the source, it appears that linking this
dynamically is a bit of a hack.  Seems we should just be using the static version.
Comment 7 Heinz Mauelshagen 2006-10-11 11:35:49 EDT
Hrm, this looks more like a general architecture specific flaw in the dynamic
loader to me. If so, that needs fixing in order to avoid potential problems with
other DSOs.
Comment 8 Luming Yu 2006-10-11 11:45:13 EDT
I don't know whether the dmraid.static version really has no unaligned access
problem, or whether it just hasn't been seen or reported. In any case, any
memory access to the 64-bit pointer format_member[i].msg should cause an
unaligned access in both the dynamically and the statically linked version,
because it is NOT aligned on an 8-byte boundary.

The visible problem in the dynamically linked version is just a side effect of
the packed structure, because the address of the relocation table entry for
format_member[i].msg does not seem to be aligned on an 8-byte boundary either.
But that would be a separate bug, if it is a bug.
Comment 9 Zhang Yanmin 2006-10-12 05:04:13 EDT
I debugged it further by instrumenting glibc's ld.so.

1) When the first unaligned access happens, the Elf64_Rela's r_offset is equal
to 0x4aa74, which points to format_member[0].msg in lib/format/format.c.
2) When ld.so loads dmraid, it loads dmraid.so. dmraid.so contains relocation
entries that point to format_member[XXX].msg, because msg points to read-only
strings. When dmraid.so is relocated, the read-only strings end up at a
different address, so the values of format_member[XXX].msg must be relocated
by adding the new start address of dmraid.so.
3) When format_member is packed, the address of format_member[XXX].msg is not
8-byte aligned, so ld.so hits an unaligned access when it reads and updates
the values of format_member[XXX].msg.

dmraid.static has no dynamic relocations, so ld.so never triggers the problem
there.

What Luming described is that dmraid itself might access format_member[XXX].msg
and trigger an unaligned access. What we see is ld.so triggering it before
dmraid uses the pointer.

Conclusion:
1) It's not a general architecture-specific flaw in the dynamic loader;
2) Luming's patch is correct.

Comment 10 Zhang Yanmin 2006-10-12 05:07:19 EDT
I suggest deleting the packed attribute instead of just commenting it out as
in Luming's patch.
Comment 11 Luming Yu 2006-10-23 02:59:33 EDT
Prarit,
Please verify if that patch works for you. And I'm going to change the status 
to NEEDINFO.

Thanks,
Luming

Comment 12 Heinz Mauelshagen 2006-10-23 04:22:46 EDT
Luming's patch breaks runtime.
Again: this happens on this arch and not just for dmraid.
Why not fix the ld problem?

With respect to comment #6: yes, we need the dmraid DSO for online
reconfiguration capabilities to link it to libdevmapper's dmeventd.
Comment 13 Luming Yu 2006-10-23 04:41:49 EDT
Thanks for testing. I will try to fix the ld problem.
Comment 14 Zhang Yanmin 2006-10-23 05:14:43 EDT
What is runtime? Why does the patch break runtime? With Luming's patch, all 
packages related to dmraid need to be recompiled. Perhaps you forgot to 
compile one package?

I don't think it's a good idea to fix it in ld. It might hurt ld performance.

Comment 15 Heinz Mauelshagen 2006-10-23 07:28:45 EDT
runtime = dmraid

Applications need to access unaligned structure members intentionally (e.g. for
on-disk structures that are portable across architectures).

I don't like to see that restricted, because such portability is mandatory.
If ld takes a, presumably minor, performance hit, that's the price which
needs to be paid then.
Comment 16 Luming Yu 2006-11-06 03:35:42 EST
OK, please try the following patch and rebuild ld-linux-ia64.so. Please note
that this patch only fixes the unaligned access at dynamic link time.

For the unaligned access in actual use of that 4-byte aligned pointer, please
consider my previous patch. It requires a change to dmraid.

Signed-off-by: Luming Yu <luming.yu@intel.com>

--- glibc-20060815T2033/sysdeps/ia64/dl-machine.h.orig  2006-11-06 17:07:00.000000000 +0800
+++ glibc-20060815T2033/sysdeps/ia64/dl-machine.h       2006-11-06 16:59:18.000000000 +0800
@@ -482,11 +482,24 @@
 elf_machine_rela_relative (Elf64_Addr l_addr, const Elf64_Rela *reloc,
                           void *const reloc_addr_arg)
 {
-  Elf64_Addr *const reloc_addr = reloc_addr_arg;
+  Elf64_Addr *const reloc_addr_64 = reloc_addr_arg;
+  Elf32_Addr *reloc_addr_32 = reloc_addr_arg;
+  uint64_t tmp4 = (uint64_t) reloc_addr_arg;
+
   /* ??? Ignore MSB and Instruction format for now.  */
   assert (ELF64_R_TYPE (reloc->r_info) == R_IA64_REL64LSB);

-  *reloc_addr += l_addr;
+  if(tmp4 & 0x7) {
+       uint32_t tmp1, tmp2;
+       uint64_t tmp3;
+       tmp1 = (uint32_t) *reloc_addr_32;
+       tmp2 = (uint32_t) *(reloc_addr_32+1);
+       tmp3 = (((uint64_t) tmp2 << 32) | tmp1);
+       tmp3 += l_addr;
+       *reloc_addr_32 = (uint32_t) tmp3;
+       *(reloc_addr_32+1) = (uint32_t) (tmp3 >> 32);
+  }else
+       *reloc_addr_64 += l_addr;
 }

 /* Perform a RELATIVE reloc on the .got entry that transfers to the .plt.  */


Comment 17 Zhang Yanmin 2006-11-13 00:31:45 EST
With the latest FC6 GA, anaconda shows a similar unaligned access.

*******************Log*********************
anaconda(13240): unaligned access to 0x2000000004ebe0b3, ip=0x2000000000018080
anaconda(13240): unaligned access to 0x2000000004ebe0b3, ip=0x2000000000018090
anaconda(13240): unaligned access to 0x2000000004ebe0be, ip=0x2000000000018080
anaconda(13240): unaligned access to 0x2000000004ebe0be, ip=0x2000000000018090
anaconda(13240): unaligned access to 0x2000000004ebe0c9, ip=0x2000000000018080
Comment 18 Zhang Yanmin 2006-11-13 00:48:02 EST
The patch in #16 is incorrect, because a 4-byte int access also causes an
unaligned fault if the 4 bytes do not lie within a single 8-byte word that is
aligned on an 8-byte boundary.

For example, if reloc_addr_32 is at 0x2000000004ebe0b3, the statement below
will cause an unaligned access:

tmp2 = (uint32_t) *(reloc_addr_32+1);
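
The rule stated in this comment can be written down as a small check. This is
a sketch of the rule as described here, not a statement about what the ia64
hardware or kernel actually does; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if an access of `width` bytes starting at `addr` spans two
   different 8-byte-aligned words (the condition described in the comment). */
int crosses_8byte_word(uint64_t addr, unsigned width)
{
    assert(width >= 1 && width <= 8);
    return (addr & 0x7) + width > 8;
}

/* Nonzero if either 32-bit half access made by the patch in comment #16
   (one at addr, one at addr + 4) would span two 8-byte-aligned words. */
int split32_crosses(uint64_t addr)
{
    return crosses_8byte_word(addr, 4) || crosses_8byte_word(addr + 4, 4);
}
```

For the anaconda address 0x2000000004ebe0b3, the first half at ...b3 stays
inside one word, but the second half at ...b7 spans two, which is the statement
singled out above. For a 4-byte aligned address such as 0x20000000000b6a74
from the original report, neither half crosses a word boundary.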
Comment 19 Zhang Yanmin 2006-11-13 01:03:43 EST
I think it's a bad idea to fix it in glibc, because that would only fix the
issue caused by relocation. Later on, when the application accesses the
variable directly, the unaligned access will be triggered again, and the glibc
fix cannot help with that.


Heinz Mauelshagen,

As for the dmraid case, I don't think it affects the on-disk storage. The
definition of format_member and its uses are confined to the file
lib/format/format.c. The unaligned access is caused by relocating the static
read-only string. Please read #8 and #9. Did you really test the patch in #4?

Comment 20 Luming Yu 2006-11-13 01:04:31 EST
To comment #18,
Yanmin,

I don't understand what you are talking about.
0x2000000004ebe0b3: this is NOT an address aligned to 4 bytes.

It is aligned to 1 byte. This is not related to the issue we are talking about.



Comment 22 Zhang Yanmin 2006-11-13 01:17:52 EST
I got the new log below when using kernel 2.6.18-1.2798.fc6. The data addresses
are not 4-byte aligned.

**************************************log*************************************
dmraid(2377): unaligned access to 0x20000000000ca0b3, ip=0x2000000000018080
dmraid(2377): unaligned access to 0x20000000000ca0b3, ip=0x2000000000018090
dmraid(2377): unaligned access to 0x20000000000ca0be, ip=0x2000000000018080
dmraid(2377): unaligned access to 0x20000000000ca0be, ip=0x2000000000018090
dmraid(2377): unaligned access to 0x20000000000ca0c9, ip=0x2000000000018080
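
The three distinct addresses in this log are 11 bytes apart, which is what a
packed struct of a 2-byte short, a 1-byte char and an 8-byte pointer would
give. A minimal sketch of that arithmetic follows. The struct name
format_member_model, the helper names and the array base address
0x20000000000ca0b0 are assumptions made for illustration (the base is inferred
as the first logged address minus 3); uint64_t stands in for the pointer so
the layout does not depend on the host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Models the packed struct on ia64; uint64_t replaces `const char *msg`. */
struct format_member_model {
    uint16_t offset;
    uint8_t flags;
    uint64_t msg;
} __attribute__ ((packed));

/* Address of element i's msg slot, given the start address of the array. */
uint64_t msg_slot_addr(uint64_t array_base, size_t i)
{
    return array_base + i * sizeof(struct format_member_model)
           + offsetof(struct format_member_model, msg);
}

/* Nonzero when addr is a multiple of align (align must be a power of two). */
int is_aligned(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}

/* Print the msg slot address and its alignment for the first count elements. */
void print_slots(uint64_t array_base, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint64_t a = msg_slot_addr(array_base, i);
        printf("[%zu].msg at 0x%llx  4-byte aligned: %d  8-byte aligned: %d\n",
               i, (unsigned long long) a, is_aligned(a, 4), is_aligned(a, 8));
    }
}
```

With the assumed base, elements 0 to 2 land on ...ca0b3, ...ca0be and
...ca0c9, the same addresses as in the log, and none of them is even 4-byte
aligned.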
Comment 23 Luming Yu 2006-11-13 12:24:33 EST
If I interpret it correctly, the new definition of format_member shown below
should cause an alignment problem on 32-bit machines too. So please fix it on
32-bit machines first (it sounds like a generic problem). Then the patch in
comment #16 should fix the alignment problem on ia64, provided "msg" is 32-bit
aligned. So I don't understand why the patch in comment #16 is incorrect for
solving this alignment issue. Please feel free to correct me if I'm wrong.
Also, please note that a side effect of the patch in comment #16 is that it
would crash an application that breaks the alignment requirement, as dmraid
does. This sounds bad. We probably need some balance.

struct format_member {
        const unsigned short offset;
        const unsigned char flags;
        const char *msg;
} __attribute__ ((packed));

So, if you really want to test my patch, please make sure format_member looks
like this:
struct format_member {
        const unsigned short offset;
        const unsigned char all;
        const unsigned char method;
        const char *msg;
} __attribute__ ((packed));
Comment 24 Luming Yu 2006-11-13 20:04:02 EST
> Also please note, the side effect of the patch at comment# 16 would crash 
>the application which breaks the alignment requirement like dmraid.
Probably, I can update that patch a little bit to avoid the bad crash.
Comment 25 Heinz Mauelshagen 2006-11-20 08:23:59 EST
what about:
struct format_member {
        const char *msg;
        const unsigned short offset;
        const unsigned short flags;
} __attribute__((packed));
?

with respect to comment #4 + #19: yes I did. It broke dmraid.
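
For reference, here is a rough sketch of the layout this proposal would produce
on a target with 8-byte pointers. It is an illustration, not a tested dmraid
change: the struct and function names are hypothetical, and uint64_t stands in
for `const char *msg`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Proposed member order, still packed: 8 + 2 + 2 = 12 bytes per element. */
struct reordered_packed_model {
    uint64_t msg;
    uint16_t offset;
    uint16_t flags;
} __attribute__ ((packed));

/* Same order without the attribute: padded to a multiple of its alignment. */
struct reordered_natural_model {
    uint64_t msg;
    uint16_t offset;
    uint16_t flags;
};

/* Sanity check: packing leaves no padding in the reordered struct. */
void check_packed_size(void)
{
    assert(sizeof(struct reordered_packed_model) == 12);
}

/* Offset of element i's msg slot from the start of a packed array. */
size_t packed_slot_offset(size_t i)
{
    return i * sizeof(struct reordered_packed_model)
           + offsetof(struct reordered_packed_model, msg);
}

/* Offset of element i's msg slot from the start of an unpacked array. */
size_t natural_slot_offset(size_t i)
{
    return i * sizeof(struct reordered_natural_model)
           + offsetof(struct reordered_natural_model, msg);
}

/* Nonzero if the offset is a multiple of 8. */
int offset_is_8byte_aligned(size_t off)
{
    return (off & 0x7) == 0;
}
```

With an 8-byte aligned array base, even-numbered elements would have an 8-byte
aligned msg and odd-numbered ones only a 4-byte aligned msg (offsets 0, 12, 24
and so on). A packed struct also carries no guarantee that the array itself
starts on an 8-byte boundary.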
Comment 27 Zhang Yanmin 2006-12-07 23:36:20 EST
Created attachment 143115 [details]
Patch to fix it in glibc

Here is the patch to fix it in glibc. It assumes the pointer is only 1-byte
aligned.
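
The attachment itself is not reproduced in this report. As a rough illustration
of one way to update a 64-bit value at an address with only 1-byte alignment,
the sketch below copies the value through memcpy instead of dereferencing a
64-bit pointer. It is not the attached patch; the function names are
hypothetical, and whether memcpy can be used at this point inside ld.so is not
something this sketch addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 64-bit value from an address of any alignment. */
uint64_t load_u64_any_align(const void *slot)
{
    uint64_t value;
    assert(slot != NULL);
    memcpy(&value, slot, sizeof value);
    return value;
}

/* Add l_addr to the 64-bit value stored at slot, whatever its alignment.
   This mirrors `*reloc_addr += l_addr` without a direct 64-bit access. */
void relocate_any_align(void *slot, uint64_t l_addr)
{
    uint64_t value = load_u64_any_align(slot);
    value += l_addr;
    memcpy(slot, &value, sizeof value);
}

/* Demo: store `initial` at a deliberately misaligned position inside a
   buffer, relocate it by l_addr, and return the value read back. */
uint64_t demo_relocate_at_offset(size_t misalign, uint64_t initial,
                                 uint64_t l_addr)
{
    unsigned char buf[32] = { 0 };
    assert(misalign + sizeof initial <= sizeof buf);
    memcpy(buf + misalign, &initial, sizeof initial);
    relocate_any_align(buf + misalign, l_addr);
    return load_u64_any_align(buf + misalign);
}
```

For example, demo_relocate_at_offset(3, 0x4aa74, 0x2000000000000000) models a
slot at offset 3 from an aligned base and returns the sum of the two values.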
Comment 28 Zhang Yanmin 2006-12-07 23:38:03 EST
Heinz Mauelshagen,

Would you like to test the patch at #27? I did initial testing and it works well.

Thanks,
Yanmin
Comment 29 Bryn M. Reeves 2006-12-08 05:09:47 EST
Fixing this in the dynamic linker was already rejected by the maintainers - see
bug 214440

Thanks,

Bryn.
