651861 – hang in readdir64_r on i686 in EC2

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	14
Hardware:	i686
OS:	Linux
Priority:	low
Severity:	high
Target Milestone:	---
Assignee:	Justin M. Forbes
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	658894

Reported:	2010-11-10 13:51 UTC by Robert de Bock
Modified:	2013-01-02 14:04 UTC
CC List:	26 users
Fixed In Version:
Clone Of:
Clones:	658894
Environment:
Last Closed:	2011-08-30 20:04:45 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
minimal readdir64_r test (482 bytes, text/plain) 2010-11-12 16:34 UTC, Joe Orton

Description Robert de Bock 2010-11-10 13:51:48 UTC

Description of problem: When trying to start httpd, the process hangs indefinitely.

Version-Release number of selected component (if applicable):
- Apache: httpd-2.2.17-1.fc14.i686
- Fedora: Fedora release 14 (Laughlin)
- Kernel: 2.6.35.6-48.fc14.i686.PAE
- Amazon EC2 AMI: ami-ac281dd8 (fedora-images-eu-west-1/fedora-14-i386-S3.ec2.manifest.xml)
- SELinux is disabled by default on the Fedora 14 AMIs. (SELinux status: disabled)

How reproducible:
Launch an Amazon EC2 AMI, log in, install httpd and try to start it.

Steps to Reproduce:
1. Launch one of the provided Fedora 14 AMIs. I used ami-ac281dd8 (EU West).
2. Log in, run "yum update" and install httpd (yum -y install httpd).
3. Try to run "apachectl configtest". It will hang.
  
Actual results:
A hanging httpd.

Expected results:
A running httpd.

Additional info:
I've tried to debug this problem with strace; the tail of the output is:
# strace -f httpd
<skipped many lines>
mmap2(NULL, 24656, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 4, 0) = 0xc14000
mmap2(0xc19000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 4, 0x4) = 0xc19000
close(4)                                = 0
mprotect(0xc19000, 4096, PROT_READ)     = 0
open("/etc/httpd/modules/mod_version.so", O_RDONLY) = 4
read(4, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0p\6\0\0004\0\0\0"..., 512) = 512
fstat64(4, {st_mode=S_IFREG|0755, st_size=9540, ...}) = 0
mmap2(NULL, 12368, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 4, 0) = 0x713000
mprotect(0x714000, 4096, PROT_NONE)     = 0
mmap2(0x715000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 4, 0x1) = 0x715000
close(4)                                = 0
mprotect(0x715000, 4096, PROT_READ)     = 0
read(3, "cern_meta.so\n#LoadModule cgid_mo"..., 4096) = 4096
stat64("/etc/httpd/conf.d", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
open("/etc/httpd/conf.d", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_CLOEXEC) = 4
getdents64(4, /* 6 entries */, 32768)   = 184
getdents64(4, /* 0 entries */, 32768)   = 0
<here httpd is hanging>

Comment 1 Joe Orton 2010-11-10 14:02:42 UTC

Can you install the debuginfo and run it under gdb?

# debuginfo-install httpd
# gdb --args /usr/sbin/httpd -X
...
(gdb) run
...hang...
<CTRL-C>
(gdb) bt

Comment 2 Robert de Bock 2010-11-10 14:47:23 UTC

Okay:
# yum -y install yum-utils gdb 
# debuginfo-install httpd
# debuginfo-install cyrus-sasl-lib libuuid nspr nss nss-softokn-freebl nss-util
# gdb --args /usr/sbin/httpd -X
GNU gdb (GDB) Fedora (7.2-23.fc14)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/httpd...Reading symbols from /usr/lib/debug/usr/sbin/httpd.debug...done.
done.
(gdb) run
Starting program: /usr/sbin/httpd -X
[Thread debugging using libthread_db enabled]
^C
Program received signal SIGINT, Interrupt.
0xb7c146d0 in __readdir64_r (dirp=0x1c7360, entry=0x1c5188, result=0xbfffdfe8)
    at ../sysdeps/unix/readdir_r.c:132
132	  return dp != NULL ? 0 : reclen ? errno : 0;
(gdb) bt
#0  0xb7c146d0 in __readdir64_r (dirp=0x1c7360, entry=0x1c5188, 
    result=0xbfffdfe8) at ../sysdeps/unix/readdir_r.c:132
#1  0xb7d37865 in apr_dir_read (finfo=<value optimized out>, 
    wanted=<value optimized out>, thedir=<value optimized out>)
    at file_io/unix/dir.c:157
#2  0x001395d8 in ap_process_resource_config (s=0x172ad8, 
    fname=0x1c5120 "/etc/httpd/conf.d/*.conf", conftree=0xbffff18c, 
    p=0x16d0b8, ptemp=0x19d178)
    at /usr/src/debug/httpd-2.2.17/server/config.c:1712
#3  0x0012aa8b in include_config (cmd=0xbffff46c, dummy=0xbffff344, 
    name=0x1c5088 "conf.d/*.conf")
    at /usr/src/debug/httpd-2.2.17/server/core.c:2605
#4  0x0013673c in invoke_cmd (cmd=0x1624d8, parms=0xbffff46c, 
    mconfig=0xbffff344, args=0x1a41a5 "")
    at /usr/src/debug/httpd-2.2.17/server/config.c:895
#5  0x0013822b in execute_now (p=<value optimized out>, 
    temp_pool=<value optimized out>, l=0x1a4190 "Include conf.d/*.conf", 
    parms=0xbffff46c, current=0xbffff3ac, curr_parent=0xbffff3a8, 
    conftree=0x163d54) at /usr/src/debug/httpd-2.2.17/server/config.c:1441
#6  ap_build_config_sub (p=<value optimized out>, 
    temp_pool=<value optimized out>, l=0x1a4190 "Include conf.d/*.conf", 
    parms=0xbffff46c, current=0xbffff3ac, curr_parent=0xbffff3a8, 
    conftree=0x163d54) at /usr/src/debug/httpd-2.2.17/server/config.c:1012
#7  0x0013884d in ap_build_config (parms=0xbffff46c, p=0x16d0b8, 
    temp_pool=0x19d178, conftree=0x163d54)
    at /usr/src/debug/httpd-2.2.17/server/config.c:1224
#8  0x00138cec in process_resource_config_nofnmatch (s=0x172ad8, 
    fname=<value optimized out>, conftree=0x163d54, p=0x16d0b8, 
    ptemp=0x19d178, depth=0)
    at /usr/src/debug/httpd-2.2.17/server/config.c:1634
#9  0x00139534 in ap_process_resource_config (s=0x172ad8, 
    fname=0x19fc58 "/etc/httpd/conf/httpd.conf", conftree=0x163d54, 
    p=0x16d0b8, ptemp=0x19d178)
    at /usr/src/debug/httpd-2.2.17/server/config.c:1666
#10 0x0013a1e1 in ap_read_config (process=0x16b140, ptemp=0x19d178, 
    filename=0x14f1cc "conf/httpd.conf", conftree=0x163d54)
    at /usr/src/debug/httpd-2.2.17/server/config.c:2026
#11 0x00120688 in main (argc=2, argv=0xbffff844)
    at /usr/src/debug/httpd-2.2.17/server/main.c:632

Also worth knowing: httpd consumes 99.something% of the CPU while it is hanging.

Regards, Robert de Bock.

Comment 3 Joe Orton 2010-11-10 15:32:32 UTC

1) What are the contents of:

   /etc/httpd/conf.d/

2) Can you step a few times in gdb to see where the hang occurs?

(enter "s" in gdb and see if it steps or simply hangs again)

Comment 4 Robert de Bock 2010-11-10 15:54:24 UTC

Hi,

The contents of the /etc/httpd/conf.d directory:
---
README  welcome.conf
---
(This is just the package httpd, no configuration changes have been done.)

I have stepped through in gdb, pressing CTRL+C roughly every 10 seconds, followed by "s" and RETURN. (If this is not correct, please let me know; I am not very familiar with gdb.)

Here is the output:
---
# gdb --args httpd -X
GNU gdb (GDB) Fedora (7.2-23.fc14)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/httpd...Reading symbols from /usr/lib/debug/usr/sbin/httpd.debug...done.
done.
(gdb) run
Starting program: /usr/sbin/httpd -X
[Thread debugging using libthread_db enabled]
^C
Program received signal SIGINT, Interrupt.
0xb7c146d0 in __readdir64_r (dirp=0x1c7360, entry=0x1c5188, 
    result=0xbfffdfd8) at ../sysdeps/unix/readdir_r.c:132
132	  return dp != NULL ? 0 : reclen ? errno : 0;
(gdb) s
^C
Program received signal SIGINT, Interrupt.
0xb7c146d0 in __readdir64_r (dirp=0x1c7360, entry=0x1c5188, 
    result=0xbfffdfd8) at ../sysdeps/unix/readdir_r.c:132
132	  return dp != NULL ? 0 : reclen ? errno : 0;
(gdb) s
^C
Program received signal SIGINT, Interrupt.
0xb7c146d0 in __readdir64_r (dirp=0x1c7360, entry=0x1c5188, 
    result=0xbfffdfd8) at ../sysdeps/unix/readdir_r.c:132
132	  return dp != NULL ? 0 : reclen ? errno : 0;
(gdb) s
^C
Program received signal SIGINT, Interrupt.
0xb7c146d0 in __readdir64_r (dirp=0x1c7360, entry=0x1c5188, 
    result=0xbfffdfd8) at ../sysdeps/unix/readdir_r.c:132
132	  return dp != NULL ? 0 : reclen ? errno : 0;
(gdb) bt
#0  0xb7c146d0 in __readdir64_r (dirp=0x1c7360, entry=0x1c5188, 
    result=0xbfffdfd8) at ../sysdeps/unix/readdir_r.c:132
#1  0xb7d37865 in apr_dir_read (finfo=<value optimized out>, 
    wanted=<value optimized out>, thedir=<value optimized out>)
    at file_io/unix/dir.c:157
#2  0x001395d8 in ap_process_resource_config (s=0x172ad8, 
    fname=0x1c5120 "/etc/httpd/conf.d/*.conf", conftree=0xbffff17c, 
    p=0x16d0b8, ptemp=0x19d178)
    at /usr/src/debug/httpd-2.2.17/server/config.c:1712
#3  0x0012aa8b in include_config (cmd=0xbffff45c, dummy=0xbffff334, 
    name=0x1c5088 "conf.d/*.conf")
    at /usr/src/debug/httpd-2.2.17/server/core.c:2605
#4  0x0013673c in invoke_cmd (cmd=0x1624d8, parms=0xbffff45c, 
    mconfig=0xbffff334, args=0x1a41a5 "")
    at /usr/src/debug/httpd-2.2.17/server/config.c:895
#5  0x0013822b in execute_now (p=<value optimized out>, 
    temp_pool=<value optimized out>, 
    l=0x1a4190 "Include conf.d/*.conf", parms=0xbffff45c, 
    current=0xbffff39c, curr_parent=0xbffff398, conftree=0x163d54)
    at /usr/src/debug/httpd-2.2.17/server/config.c:1441
#6  ap_build_config_sub (p=<value optimized out>, 
    temp_pool=<value optimized out>, 
    l=0x1a4190 "Include conf.d/*.conf", parms=0xbffff45c, 
    current=0xbffff39c, curr_parent=0xbffff398, conftree=0x163d54)
    at /usr/src/debug/httpd-2.2.17/server/config.c:1012
#7  0x0013884d in ap_build_config (parms=0xbffff45c, p=0x16d0b8, 
    temp_pool=0x19d178, conftree=0x163d54)
    at /usr/src/debug/httpd-2.2.17/server/config.c:1224
#8  0x00138cec in process_resource_config_nofnmatch (s=0x172ad8, 
    fname=<value optimized out>, conftree=0x163d54, p=0x16d0b8, 
    ptemp=0x19d178, depth=0)
    at /usr/src/debug/httpd-2.2.17/server/config.c:1634
#9  0x00139534 in ap_process_resource_config (s=0x172ad8, 
    fname=0x19fc58 "/etc/httpd/conf/httpd.conf", conftree=0x163d54, 
    p=0x16d0b8, ptemp=0x19d178)
    at /usr/src/debug/httpd-2.2.17/server/config.c:1666
#10 0x0013a1e1 in ap_read_config (process=0x16b140, ptemp=0x19d178, 
    filename=0x14f1cc "conf/httpd.conf", conftree=0x163d54)
    at /usr/src/debug/httpd-2.2.17/server/config.c:2026
#11 0x00120688 in main (argc=2, argv=0xbffff834)
    at /usr/src/debug/httpd-2.2.17/server/main.c:632
(gdb) 
---

Regards, Robert de Bock.

Comment 5 Marek Goldmann 2010-11-11 21:47:03 UTC

I have the same issue.

If you comment this line:

	Include conf.d/*.conf

in /etc/httpd/conf/httpd.conf, Apache will at least start. Unfortunately it still hangs on any request and is therefore unusable. Strace:

https://gist.github.com/664565

I tried to track down the issue with the #fedora-devel guys, but without success :(

Comment 6 Joe Orton 2010-11-12 09:41:30 UTC

Marek, are you seeing this in EC2 or on plain i686?  Please give uname -a output.

Comment 7 Marek Goldmann 2010-11-12 10:41:07 UTC

Joe,

I saw this on EC2 *only*. On my local boxes (both bare metal and KVM/VMware VMs) it runs great.

I used ami-669f680f listed here: https://fedoraproject.org/wiki/Cloud_SIG/EC2_Images#Fedora_14

[ec2-user@ip-10-112-19-176 ~]$ uname -a
Linux ip-10-112-19-176 2.6.35.6-48.fc14.i686.PAE #1 SMP Fri Oct 22 15:27:53 UTC 2010 i686 i686 i386 GNU/Linux

--Marek

Comment 8 Joe Orton 2010-11-12 12:12:07 UTC

OK, thanks.

Is there anything special about the affected installs other than that they run in EC2?  Is it an extN filesystem?

Comment 9 Joe Orton 2010-11-12 12:13:51 UTC

Whether this is reproducible with the x86_64 F14 AMI would be a useful data point.

Comment 10 Marek Goldmann 2010-11-12 12:28:41 UTC

I tried the 64-bit appliance: ami-e291668b.

[ec2-user@ip-10-204-33-212 ~]$ uname -a
Linux ip-10-204-33-212 2.6.35.6-48.fc14.x86_64 #1 SMP Fri Oct 22 15:36:08 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux

And httpd *starts normally*. Both AMIs use an ext3 filesystem.

Comment 11 Marek Goldmann 2010-11-12 12:29:56 UTC

Forgot to add: I tried creating an AMI with an ext4 filesystem (using BoxGrinder), but it had the same issue; everything else worked great.

Comment 12 Robert de Bock 2010-11-12 12:47:58 UTC

Confirmed; httpd starts normally on 64-bit: "Linux ip-10-227-158-79 2.6.35.6-48.fc14.x86_64 #1 SMP Fri Oct 22 15:36:08 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux" (AMI: ami-a8281ddc)

Comment 13 Simon Smith 2010-11-12 16:07:34 UTC

I see similar behavior when installing a Perl module with "perl Makefile.PL" (the example strace output below is from Bloom-Filter-1.0, but I see it with other Perl modules as well).  It's hanging on getdents64.

getcwd("/root/Bloom-Filter-1.0", 4095)  = 23
lstat64(".", {st_mode=S_IFDIR|0777, st_size=4096, ...}) = 0
stat64(".", {st_mode=S_IFDIR|0777, st_size=4096, ...}) = 0
open(".", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_CLOEXEC) = 3
getdents64(3, /* 9 entries */, 32768)   = 264
getdents64(3, /* 0 entries */, 32768)   = 0


When I run the same command on my Rackspace Fedora 14 instance, which happens to be 64-bit, it uses getdents, NOT getdents64. (Knowing absolutely nothing about getdents, it seems strange to me that the 64-bit instance uses getdents and the 32-bit instance uses getdents64, but maybe that's normal.)

If I have some time I'll try running the perl command under the debugger as requested for httpd.

Comment 14 Simon Smith 2010-11-12 16:10:32 UTC

Sorry, here is the uname output from the instance that the strace came from:


[ec2-user@ip-10-212-118-242 Bloom-Filter-1.0]$ uname -a
Linux ip-10-212-118-242 2.6.35.6-48.fc14.i686.PAE #1 SMP Fri Oct 22 15:27:53 UTC 2010 i686 i686 i386 GNU/Linux


The ami is: ami-669f680f, m1.small

I installed the following software:

yum -y install wget
yum -y install perl
yum -y install perl-ExtUtils-MakeMaker

and then after the problem occurred, 

yum -y install strace

Comment 15 Joe Orton 2010-11-12 16:34:42 UTC

Created attachment 460093
minimal readdir64_r test

Can those seeing this issue try this minimal test case for readdir64_r?

gcc -Wall -O2 -Werror readdir64.c -o readdir64
./readdir64

should list current directory contents.  In case this is specific to some directory try 

cd /etc/httpd/conf.d
/path/to/readdir64

to see whether that makes a difference.
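
The attachment itself is not reproduced in this report. As a rough illustration only, a test of this kind could look like the sketch below, which is modelled on the loop that the gdb listing in comment 26 shows. The helper name list_directory, its return value and the pragma are assumptions added here and are not taken from the attachment.

```c
#define _LARGEFILE64_SOURCE 1   /* exposes struct dirent64 and readdir64_r on glibc */
#include <assert.h>
#include <dirent.h>
#include <stddef.h>
#include <stdio.h>

/* readdir64_r is marked deprecated in newer glibc releases; silence that
   warning so the sketch still builds with -Wall -Werror. */
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"

/* Hypothetical helper (not from the attachment): print every entry of
   `path` and return how many were seen, or -1 if it cannot be opened. */
int list_directory(const char *path)
{
    struct dirent64 entry;
    struct dirent64 *result = NULL;
    int count = 0;
    DIR *d;

    assert(path != NULL);

    d = opendir(path);
    if (d == NULL) {
        perror("opendir");
        return -1;
    }

    /* The loop ends when readdir64_r reports end of directory by setting
       result to NULL. On the affected instances it is this final call
       that never returns. */
    while (readdir64_r(d, &entry, &result) == 0 && result != NULL) {
        printf("entry: %s\n", entry.d_name);
        count++;
    }

    closedir(d);
    return count;
}
```

Calling list_directory(".") from a small main() should print one "entry:" line per directory entry and then return on an unaffected system.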

Comment 16 Marek Goldmann 2010-11-12 16:49:53 UTC

(In reply to comment #15)

> should list current directory contents.  In case this is specific to some
> directory try 

Here is my output:

[ec2-user@ip-10-244-190-103 ~]$ ls
readdir64.c
[ec2-user@ip-10-244-190-103 ~]$ gcc -Wall -O2 -Werror readdir64.c -o readdir64
[ec2-user@ip-10-244-190-103 ~]$ ls
readdir64  readdir64.c
[ec2-user@ip-10-244-190-103 ~]$ ./readdir64 
entry: .bash_logout
entry: .bash_profile
entry: .ssh
entry: ..
entry: .bashrc
entry: readdir64
entry: .
entry: readdir64.c
^C
[ec2-user@ip-10-244-190-103 ~]$ strace ./readdir64 
execve("./readdir64", ["./readdir64"], [/* 20 vars */]) = 0
brk(0)                                  = 0x869c000
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7835000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=15248, ...}) = 0
mmap2(NULL, 15248, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7831000
close(3)                                = 0
open("/lib/libc.so.6", O_RDONLY)        = 3
read(3, "\177ELF\1\1\1\3\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0po\1\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1886052, ...}) = 0
mmap2(NULL, 1649160, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x9b0000
mmap2(0xb3d000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x18c) = 0xb3d000
mmap2(0xb40000, 10760, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb40000
close(3)                                = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7830000
set_thread_area({entry_number:-1 -> 6, base_addr:0xb78306c0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mprotect(0xb3d000, 8192, PROT_READ)     = 0
mprotect(0xea3000, 4096, PROT_READ)     = 0
munmap(0xb7831000, 15248)               = 0
open(".", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_CLOEXEC) = 3
brk(0)                                  = 0x869c000
brk(0x86c5000)                          = 0x86c5000
brk(0)                                  = 0x86c5000
getdents64(3, /* 8 entries */, 32768)   = 240
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7834000
write(1, "entry: .bash_logout\n", 20entry: .bash_logout
)   = 20
write(1, "entry: .bash_profile\n", 21entry: .bash_profile
)  = 21
write(1, "entry: .ssh\n", 12entry: .ssh
)           = 12
write(1, "entry: ..\n", 10entry: ..
)             = 10
write(1, "entry: .bashrc\n", 15entry: .bashrc
)        = 15
write(1, "entry: readdir64\n", 17entry: readdir64
)      = 17
write(1, "entry: .\n", 9entry: .
)               = 9
write(1, "entry: readdir64.c\n", 19entry: readdir64.c
)    = 19
getdents64(3, /* 0 entries */, 32768)   = 0
^C--- SIGINT (Interrupt) @ 0 (0) ---

Comment 17 Joe Orton 2010-11-12 16:59:49 UTC

Reassigning to glibc since it appears the hang is there (though it may be a kernel bug.

Comment 18 Joe Orton 2010-11-12 17:00:55 UTC

... or even something caused by the Amazon environment)...

Comment 19 Simon Smith 2010-11-12 17:10:28 UTC

From the EC2 32-bit instance; the AMI is: ami-669f680f, m1.small

uname -a
Linux ip-10-212-118-242 2.6.35.6-48.fc14.i686.PAE #1 SMP Fri Oct 22 15:27:53 UTC 2010 i686 i686 i386 GNU/Linux


Here is the output of the readdir64 test you asked for:




[root@ip-10-212-118-242 tmp]# strace ./readdir64 
execve("./readdir64", ["./readdir64"], [/* 18 vars */]) = 0
brk(0)                                  = 0x93a0000
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7705000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=15248, ...}) = 0
mmap2(NULL, 15248, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7701000
close(3)                                = 0
open("/lib/libc.so.6", O_RDONLY)        = 3
read(3, "\177ELF\1\1\1\3\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0po\1\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1886052, ...}) = 0
mmap2(NULL, 1649160, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb41000
mmap2(0xcce000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x18c) = 0xcce000
mmap2(0xcd1000, 10760, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xcd1000
close(3)                                = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7700000
set_thread_area({entry_number:-1 -> 6, base_addr:0xb77006c0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mprotect(0xcce000, 8192, PROT_READ)     = 0
mprotect(0xd7e000, 4096, PROT_READ)     = 0
munmap(0xb7701000, 15248)               = 0
open(".", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_CLOEXEC) = 3
brk(0)                                  = 0x93a0000
brk(0x93c9000)                          = 0x93c9000
brk(0)                                  = 0x93c9000
getdents64(3, /* 7 entries */, 32768)   = 232
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7704000
write(1, "entry: Bloom-Filter-1.0\n", 24entry: Bloom-Filter-1.0
) = 24
write(1, "entry: .ICE-unix\n", 17entry: .ICE-unix
)      = 17
write(1, "entry: Bloom-Filter-1.0.tar.gz\n", 31entry: Bloom-Filter-1.0.tar.gz
) = 31
write(1, "entry: ..\n", 10entry: ..
)             = 10
write(1, "entry: readdir64\n", 17entry: readdir64
)      = 17
write(1, "entry: .\n", 9entry: .
)               = 9
write(1, "entry: readdir64.c\n", 19entry: readdir64.c
)    = 19
getdents64(3, /* 0 entries */, 32768)   = 0

Comment 20 Kent 2010-11-12 23:47:45 UTC

I experienced the same hang on getdents64 when trying to install the latest Oracle/Sun JRE RPM on Fedora 14 EC2 32-bit AMI.  It invokes the java runtime and that's the one that stalls out.  Seems like plenty of strace logs have been posted but let me know if you'd like me to generate another one for this specific circumstance.

Comment 21 Andreas Schwab 2010-11-15 11:10:48 UTC

What are the contents of *dirp and *entry?  What is the returned value?

Comment 22 Marek Goldmann 2010-11-15 11:22:40 UTC

(In reply to comment #21)
> What are the contents of *dirp and *entry?  What is the returned value?

I'm not familiar with C. Could you please give step-by-step instructions for what I should do? Or better, an attachment with instructions?

Thanks!

Comment 23 Andreas Schwab 2010-11-15 11:33:18 UTC

Just print them.

Comment 24 tcrawley 2010-11-15 16:50:24 UTC

Andreas:

What do you specifically want to see about *d and *entry? The names? The full struct contents? It may be easier if you modify readdir64.c (https://bugzilla.redhat.com/attachment.cgi?id=460093) to printf what you would like to see - that would save some back and forth.

Comment 25 Andreas Schwab 2010-11-15 17:10:37 UTC

Just print it in the debugger.

Comment 26 tcrawley 2010-11-15 17:51:28 UTC

Here is my gdb session. Let me know if there is anything else you want me to try. 

[ec2-user@ip-10-243-15-194 ~]$ gcc -Wall -g -Werror readdir64.c -o readdir64
[ec2-user@ip-10-243-15-194 ~]$ gdb readdir64
GNU gdb (GDB) Fedora (7.2-23.fc14)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/ec2-user/readdir64...done.
(gdb) run
Starting program: /home/ec2-user/readdir64 
entry: .bash_logout
result: .bash_logout
entry: .bash_profile
result: .bash_profile
entry: .bash_history
result: .bash_history
entry: .ssh
result: .ssh
entry: .readdir64.c.swp
result: .readdir64.c.swp
entry: typescript
result: typescript
entry: ..
result: ..
entry: .bashrc
result: .bashrc
entry: readdir64
result: readdir64
entry: .
result: .
entry: readdir64.c
result: readdir64.c
^C
Program received signal SIGINT, Interrupt.
0x001cf6d0 in __readdir64_r (dirp=0x804a008, entry=0xbffff578, result=0xbffff574) at ../sysdeps/unix/readdir_r.c:132
132	  return dp != NULL ? 0 : reclen ? errno : 0;
(gdb) list
127	  else
128	    *result = NULL;
129	
130	  __libc_lock_unlock (dirp->lock);
131	
132	  return dp != NULL ? 0 : reclen ? errno : 0;
133	}
134	
135	#ifdef __READDIR_R_ALIAS
136	weak_alias (__readdir_r, readdir_r)
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x001cf6d0 in __readdir64_r (dirp=0x804a008, entry=0xbffff578, result=0xbffff574) at ../sysdeps/unix/readdir_r.c:132
132	  return dp != NULL ? 0 : reclen ? errno : 0;
(gdb) s
^C
Program received signal SIGINT, Interrupt.
0x001cf6d0 in __readdir64_r (dirp=0x804a008, entry=0xbffff578, result=0xbffff574) at ../sysdeps/unix/readdir_r.c:132
132	  return dp != NULL ? 0 : reclen ? errno : 0;
(gdb) bt
#0  0x001cf6d0 in __readdir64_r (dirp=0x804a008, entry=0xbffff578, result=0xbffff574) at ../sysdeps/unix/readdir_r.c:132
#1  0x08048532 in main (argc=1, argv=0xbffff744) at readdir64.c:20
(gdb) up
#1  0x08048532 in main (argc=1, argv=0xbffff744) at readdir64.c:20
20	    while (readdir64_r(d, &entry, &result) == 0
(gdb) print *d
$1 = {fd = 7, lock = 0, allocation = 32768, size = 352, offset = 352, filepos = 2147483647, data = 0x804a008 "\a"}
(gdb) print entry
$2 = {d_ino = 524299, d_off = 2147483647, d_reclen = 32, d_type = 8 '\b', 
  d_name = "readdir64.c\000\000swp\000\000\000\000\000\364d\033\000\344\367\023\000q\352\261\a\363\003\000\000\b\000\000\000.N=\366\000\371\377\267\003\000\000\000\000\031\023\000\000\000\000\000\000\000\000\000\001\000\000\000\237\b\000\000\060\371\377\267P\366\377\267\220\202\004\b\004\000\024\000́\004\b\001\000\000\000\274\017\023\000\360\366\377\277\270\032\023\000\300\366\377\277\241\247\021\000\260\366\377\277́\004\b\244\366\377\277\\\032\023\000\000\000\000\000\060\371\377\267\001\000\000\000\000\000\000\000\001\000\000\000\000\031\023\000\000\000`\000\000\b\004\002\000\000`\000\001\000\000\000\000\200\000\000\364\037,\000\344\001,\000D\367\377\277X\366\377\277\000\000\000\000\360\366\377\277ȗ\004\bh\366\377\277d\203\004\b\225/\026\000ȗ\004\b\230\366\377\277y\205\004\b\220\202\004\b\240,,\000\240,,\000\364\037,\000`\205\004\b\360\203\004\bk\205\004"}
(gdb) list
15	    if (d == NULL) {
16	        perror("opendir64");
17	        return 1;
18	    }
19	
20	    while (readdir64_r(d, &entry, &result) == 0
21	           && result != NULL) {
22	      printf("entry: %s\n", entry.d_name);
23	      printf("result: %s\n", result->d_name);
24	    }
(gdb) quit

Comment 27 tcrawley 2010-11-15 17:59:18 UTC

Just realized I pasted the version with the prints in main, instead of in __readdir64_r. Here are the *dirp and *entry values at that point in the call stack: 

[ec2-user@ip-10-243-15-194 ~]$ gdb readdir64
GNU gdb (GDB) Fedora (7.2-23.fc14)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/ec2-user/readdir64...done.
(gdb) run
Starting program: /home/ec2-user/readdir64 
entry: .bash_logout
result: .bash_logout
entry: .bash_profile
result: .bash_profile
entry: .bash_history
result: .bash_history
entry: .ssh
result: .ssh
entry: .readdir64.c.swp
result: .readdir64.c.swp
entry: typescript
result: typescript
entry: ..
result: ..
entry: .bashrc
result: .bashrc
entry: readdir64
result: readdir64
entry: .
result: .
entry: readdir64.c
result: readdir64.c
^C
Program received signal SIGINT, Interrupt.
0x001cf6d0 in __readdir64_r (dirp=0x804a008, entry=0xbffff578, result=0xbffff574) at ../sysdeps/unix/readdir_r.c:132
132	  return dp != NULL ? 0 : reclen ? errno : 0;
(gdb) print dirp
$1 = (DIR *) 0x804a008
(gdb) print *dirp
$2 = {fd = 7, lock = 0, allocation = 32768, size = 352, offset = 352, filepos = 2147483647, data = 0x804a008 "\a"}
(gdb) print entry
$3 = (struct dirent64 *) 0xbffff578
(gdb) print *entry
$4 = {d_ino = 524299, d_off = 2147483647, d_reclen = 32, d_type = 8 '\b', 
  d_name = "readdir64.c\000\000swp\000\000\000\000\000\364d\033\000\344\367\023\000q\352\261\a\363\003\000\000\b\000\000\000.N=\366\000\371\377\267\003\000\000\000\000\031\023\000\000\000\000\000\000\000\000\000\001\000\000\000\237\b\000\000\060\371\377\267P\366\377\267\220\202\004\b\004\000\024\000́\004\b\001\000\000\000\274\017\023\000\360\366\377\277\270\032\023\000\300\366\377\277\241\247\021\000\260\366\377\277́\004\b\244\366\377\277\\\032\023\000\000\000\000\000\060\371\377\267\001\000\000\000\000\000\000\000\001\000\000\000\000\031\023\000\000\000`\000\000\b\004\001\000\000`\000\001\000\000\000\000\200\000\000\364\037,\000\344\001,\000D\367\377\277X\366\377\277\000\000\000\000\360\366\377\277ȗ\004\bh\366\377\277d\203\004\b\225/\026\000ȗ\004\b\230\366\377\277y\205\004\b\220\202\004\b\240,,\000\240,,\000\364\037,\000`\205\004\b\360\203\004\bk\205\004"}
(gdb) bt
#0  0x001cf6d0 in __readdir64_r (dirp=0x804a008, entry=0xbffff578, result=0xbffff574) at ../sysdeps/unix/readdir_r.c:132
#1  0x08048532 in main (argc=1, argv=0xbffff744) at readdir64.c:20

Comment 28 Andreas Schwab 2010-11-17 12:48:54 UTC

Where does it hang?

Comment 29 tcrawley 2010-11-17 13:01:26 UTC

It appears to hang at ../sysdeps/unix/readdir_r.c:132. In my previous two comments I issued the interrupt when it hung.

If you need access to an ec2 instance to debug, I can provide that. Just let me know.

Comment 30 Andreas Schwab 2010-11-17 13:11:18 UTC

Where does it hang _exactly_?

Comment 31 tcrawley 2010-11-17 13:34:51 UTC

Everything I know about where it is hanging is in the two gdb sessions above. If you have any tips on gathering more information, I would be happy to follow them - I am not a C developer. I would also be happy to give you access to an instance for you to debug on, or share an AMI that you can launch if you would prefer.

Comment 32 Andreas Schwab 2010-11-19 10:29:34 UTC

disp/i $pc
si

Comment 33 tcrawley 2010-11-19 12:02:46 UTC

It hangs on the si call, requiring an interrupt. I also printed the registers.

Could it be related to this issue? http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1432 The xensource issue is a segfault instead of a hang, but is on the same operation. 

xen-detect on the instance reports:
Running in PV context on Xen v3.0.


[ec2-user@ip-10-243-15-194 ~]$ gdb readdir64
GNU gdb (GDB) Fedora (7.2-23.fc14)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/ec2-user/readdir64...done.
(gdb) disp/i $pc
(gdb) run
Starting program: /home/ec2-user/readdir64 
entry: .bash_logout
result: .bash_logout
entry: .bash_profile
result: .bash_profile
entry: .bash_history
result: .bash_history
entry: .ssh
result: .ssh
entry: .readdir64.c.swp
result: .readdir64.c.swp
entry: typescript
result: typescript
entry: ..
result: ..
entry: .bashrc
result: .bashrc
entry: readdir64
result: readdir64
entry: .
result: .
entry: readdir64.c
result: readdir64.c
^C
Program received signal SIGINT, Interrupt.
0x001cf6d0 in __readdir64_r (dirp=0x804a008, entry=0xbffff578, result=0xbffff574)
    at ../sysdeps/unix/readdir_r.c:132
132	  return dp != NULL ? 0 : reclen ? errno : 0;
1: x/i $pc
=> 0x1cf6d0 <__readdir64_r+240>:	cmovne %gs:(%edx),%eax
(gdb) si
^C
Program received signal SIGINT, Interrupt.
0x001cf6d0 in __readdir64_r (dirp=0x804a008, entry=0xbffff578, result=0xbffff574)
    at ../sysdeps/unix/readdir_r.c:132
132	  return dp != NULL ? 0 : reclen ? errno : 0;
1: x/i $pc
=> 0x1cf6d0 <__readdir64_r+240>:	cmovne %gs:(%edx),%eax
(gdb) si
^C
Program received signal SIGINT, Interrupt.
0x001cf6d0 in __readdir64_r (dirp=0x804a008, entry=0xbffff578, result=0xbffff574)
    at ../sysdeps/unix/readdir_r.c:132
132	  return dp != NULL ? 0 : reclen ? errno : 0;
1: x/i $pc
=> 0x1cf6d0 <__readdir64_r+240>:	cmovne %gs:(%edx),%eax
(gdb) info registers
eax            0x0	0
ecx            0x0	0
edx            0xffffffc8	-56
ebx            0x2c1ff4	2891764
esp            0xbffff530	0xbffff530
ebp            0xbffff558	0xbffff558
esi            0x804a008	134520840
edi            0x0	0
eip            0x1cf6d0	0x1cf6d0 <__readdir64_r+240>
eflags         0x10246	[ PF ZF IF RF ]
cs             0x73	115
ss             0x7b	123
ds             0x7b	123
es             0x7b	123
fs             0x0	0
gs             0x33	51
(gdb)

Comment 34 Andreas Schwab 2010-11-19 12:31:51 UTC

Tell amazon.

Comment 35 Ben Browning 2010-11-19 13:01:20 UTC

Andreas,

Why was this marked as CLOSED NOTABUG? So many apps failing to run on the 32-bit F14 AMI definitely seems like a bug. Please describe why you believe it is not a bug and what you think we need to tell Amazon.

Thanks!

Comment 36 Garrett Holmstrom 2010-11-22 18:09:16 UTC

Regardless of the reason, what do you recommend we tell Amazon in the bug report we send them?  I'm afraid it's too low-level for most of us to know how to interpret the info you found.

Comment 37 Marek Goldmann 2010-11-26 15:50:11 UTC

I started a thread on AWS forums: https://forums.aws.amazon.com/thread.jspa?threadID=55419

Comment 38 Brian LaMere 2010-11-29 19:39:19 UTC

Amazon is listening; what is it that we should tell them?

Closing in on 3 weeks now since this was reported; apache not starting certainly seems to qualify as a bug in many people's opinion...?

Comment 39 Jeff Darcy 2010-11-29 21:33:37 UTC

I investigated this a little.  First I noted that changing register values so that the cmovne instruction would do something different didn't help, but pounding $eip to skip that instruction did.  This led me to believe that the problem did indeed have something to do with Xen trapping this instruction, which led me to several references (e.g. [1] [2] [3]) about segfaults and other problems with negative gs offsets - which are being used in this case.  They also had some references to a "nosegneg" hardware capability in the dynamic-loading machinery to use library versions which avoid such offsets.  I'm very far from being an expert in any of these areas, but I'm a curious kind of guy so I started experimenting.  Sure enough, the following commands allowed the previously failing test to run:

    echo "hwcap 1 nosegneg" > /etc/ld.so.conf.d/libc6-xen.conf
    ldconfig


I don't know whether this will hold up as a more general fix/workaround, or whether this hwcap is supposed to be set as part of how we build the images.  Maybe someone with more relevant knowledge will chip in, but it seems like an important data point.

[1] http://xen.1045712.n5.nabble.com/quot-hwcap-0-nosegneg-quot-doesnt-work-with-paravirt-ops-xen-as-of-2-6-23-9-td2513579.html

[2] http://web.archiveorange.com/archive/v/tXSRNylI9Pu2xPy0T9k5

[3] http://www.mail-archive.com/blag-devel@lists.blagblagblag.org/msg00008.html
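
For anyone applying the workaround above by hand, a slightly more careful version might look like the following. This is only a sketch: the function name apply_nosegneg and its optional root-directory argument are made up for illustration (the argument exists so the function can be tried against a scratch directory first), while the file name and the hwcap line are the ones from the two commands above.

```shell
#!/bin/sh
# Sketch: write the "hwcap 1 nosegneg" line once, then refresh the loader cache.
# apply_nosegneg and its argument are hypothetical names, not part of any package.
apply_nosegneg() {
    root="${1:-}"                                  # empty means the live system
    conf="$root/etc/ld.so.conf.d/libc6-xen.conf"
    line="hwcap 1 nosegneg"

    mkdir -p "$(dirname "$conf")" || return 1

    # Append the line only if an identical line is not already present.
    if ! grep -qxF "$line" "$conf" 2>/dev/null; then
        printf '%s\n' "$line" >> "$conf" || return 1
    fi

    # Rebuild the loader cache only when acting on the real system.
    if [ -z "$root" ]; then
        ldconfig
    fi
}
```

Calling apply_nosegneg with no argument (as root) acts on the instance itself; calling it with a directory such as a temporary path only writes the file under that directory and skips ldconfig. Unlike the plain echo above, it appends rather than overwrites, so running it twice leaves a single line.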

Comment 40 Brian LaMere 2010-11-29 23:58:33 UTC

Good catch; was going to jump in myself and try to do this, but being at work I couldn't really do it until I had a free moment or so.  From everything I'm reading, that should be the default setting anyway:

    testhost$ pwd
    /root/linux-2.6.35/arch/x86/xen
    testhost$ cat vdso.h
    /* Bit used for the pseudo-hwcap for non-negative segments.  We use
       bit 1 to avoid bugs in some versions of glibc when bit 0 is
       used; the choice is otherwise arbitrary. */
    #define VDSO_NOTE_NONEGSEG_BIT	1

That's the source from "yumdownloader --source kernel" - why is it 0, when it should be 1?

It's set to 1 on Ubuntu images, though it seems to have been 0 on older Fedora images?

http://web.archiveorange.com/archive/v/bUZ6Keh9JWAEGccFUebP shows this fun line:  

    NOTE_KERNELCAP(1, "nosegneg")  /* Change 1 back to 0 when glibc is fixed! */

At http://wiki.xensource.com/xenwiki/XenSpecificGlibc  we find a suggestion to compile glibc with a particular flag.  But, looking in the glibc source file, we find these lines:

    %ifarch i686
    BuildFlags="-march=i686 -mtune=generic"
    %endif
    %ifarch i386 i486 i586
    BuildFlags="$BuildFlags -mno-tls-direct-seg-refs"

So, here are my questions:

1) Which should it be? If the choice is truly "arbitrary", as the kernel source comment suggests, and only matters when it's needed (and when it's needed, it should be 1), then why is it even an option?
2) Was glibc "fixed" after the above info, making 0 the right choice, and then "broken" again recently?
3) Shouldn't "-mno-tls-direct-seg-refs" be a build option for i686 too, not just i[345]86?

Comment 41 Brian LaMere 2010-11-30 00:00:23 UTC

err...note, that's the glibc source spec file; from "yumdownloader --source glibc", and then ~/rpmbuild/SPECS/glibc.spec

Comment 42 Nick Bebout 2010-11-30 01:27:24 UTC

Reassigned to Fedora/kernel

Comment 43 Jeff Darcy 2010-11-30 13:45:42 UTC

Having talked to some folks who know a lot more about this stuff than I do, I think the picture has become clearer.  Apparently this issue of negative segment offsets is (or was once) fairly well known, which is why the whole "nosegneg" thing exists.  This was apparently set by default in RHEL/Fedora until fairly recently, but it does have a fairly serious performance impact for bare metal so it is not set in RHEL6/F14.  However, *some* of the machines at Amazon - apparently not all - are running a version of Xen that's buggy with respect to emulation of the cmovxx instruction with negative offsets.  Even without such bugs, using nosegneg in the guest performs better than relying on emulation in the host, so for EC2 and other Xen-based infrastructures this option should still be set.  To answer Brian's questions as best I can, then:

(1) We *should* enable nosegneg for EC2 to avoid the faulty emulation.

(2) I think the fix was to Xen, but we should set the option for performance reasons even where the fix is applicable.  We should *not* set it for bare metal, though.

(3) No idea.  We'd probably need to investigate more to know whether it's even relevant.

Nick, does "kernel" in this case include ld.so, or (considering that the bug might not apply to current Fedora kernels with KVM) should that go to glibc instead?
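
To make point (1) and the bare-metal caveat in (2) concrete, a guard along these lines could decide whether a guest should get the workaround at all. It is a rough sketch under stated assumptions: the function name and its two optional arguments are hypothetical, and it assumes the guest kernel exposes /sys/hypervisor/type, which reads "xen" on Xen guests that have the sys-hypervisor interface enabled. It cannot tell EC2 apart from any other Xen host.

```shell
#!/bin/sh
# Sketch: succeed only for a 32-bit x86 guest that reports Xen as its hypervisor.
# should_enable_nosegneg is a hypothetical helper; both arguments are optional
# and exist mainly so the check can be exercised without a real Xen guest.
should_enable_nosegneg() {
    arch="${1:-$(uname -m)}"
    hv_file="${2:-/sys/hypervisor/type}"

    # The negative %gs offsets discussed here only concern 32-bit x86.
    case "$arch" in
        i?86) ;;
        *) return 1 ;;
    esac

    # Bare metal (or a non-Xen hypervisor): leave the default libraries alone.
    [ -r "$hv_file" ] && [ "$(cat "$hv_file")" = "xen" ]
}
```

A caller would run it as "if should_enable_nosegneg; then ...; fi" before writing the hwcap line, so that bare-metal installs keep the faster default libraries.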

Comment 44 Andrew Jones 2010-12-01 14:51:01 UTC

(In reply to comment #43)
> (2) I think the fix was to Xen,

Hi Jeff,

Can you expand on this a bit? Did you come across any particular threads with respect to fixing it in xen that you can point me to?

Thanks,
Drew

Comment 45 Jeff Darcy 2010-12-01 15:04:00 UTC

I'm afraid I don't have a more specific reference, Andrew, but I think the fix would have to be in Xen.  All of the references I can find refer to negative offsets having a big performance impact, but this bug report exists because the emulation was not working *at all* on EC2.  Maybe it's more a case of Amazon's proprietary patches breaking it than of anyone fixing it, but when stepping over a single instruction in the guest hangs it's hard to reach any conclusion other than that the bug is in the hypervisor emulation of that instruction.

Comment 46 Andrew Jones 2010-12-01 15:14:16 UTC

Hmm, this sounds like something I should look into then. I think I'll clone this bug over to kernel-xen to see if the reproducer for it also reproduces on RHEL as well as on EC2.

Comment 47 Chris Lalancette 2010-12-01 16:38:24 UTC

(In reply to comment #44)
> (In reply to comment #43)
> > (2) I think the fix was to Xen,
> 
> Hi Jeff,
> 
> Can you expand on this a bit? Did you come across any particular threads with
> respect to fixing it in xen that you can point me to?

I'm the one who talked to Jeff about this.  To be clear, I didn't mean to imply that the bug had definitely been fixed, just that I know between 5.0 and now we have fixed several emulation bugs in the Xen hypervisor.

That being said, the performance impact of *not* using nosegneg on Xen is pretty horrendous, so we should definitely turn nosegneg on for F-14 guests running under Xen (irrespective of the emulation bug).

Chris Lalancette

Comment 48 Paolo Bonzini 2010-12-01 17:47:28 UTC

This is a problem in Amazon's hypervisor.  Some EC2 machines are running a very old version of Xen (based on RHEL5.0).

This particular bug has been fixed in RHEL5.2.  In order to backport the fix, Amazon would only need the Xensource patch linked above; other relevant hypervisor fixes appeared in the RHEL kernels 2.6.18-133 and 2.6.18-170.

Until Amazon fixes the hypervisor, no fix is possible.  glibc is fine since it provides the nosegneg files; there is no need to compile everything with -mno-tls-direct-seg-refs:

%define xenarches i686 athlon
%ifarch %{xenarches}
%define buildxen 1
%define xenpackage 0
...
%endif
...
%if %{buildxen}
build_nptl linuxnptl-nosegneg -mno-tls-direct-seg-refs
%endif

etc.

The kernel package is "guilty" since its /etc/ld.so.conf.d/*.conf file should include the hwcap.  From a quick look at the spec, an ldconfig-kernel.conf file is missing.
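
A quick way to see whether an image already carries such a file is to look for an hwcap line naming nosegneg under /etc/ld.so.conf.d. The helper below is a sketch only: has_nosegneg_hwcap is a made-up name, the directory argument is there so it can be pointed at an unpacked image, and the pattern accepts any bit number since both 0 and 1 come up in this thread.

```shell
#!/bin/sh
# Sketch: report whether any *.conf file in the given directory declares
# "hwcap <bit> nosegneg". has_nosegneg_hwcap is a hypothetical helper name.
has_nosegneg_hwcap() {
    confdir="${1:-/etc/ld.so.conf.d}"
    # -q: no output, -s: stay quiet if the glob matches no file, -E: extended regex
    grep -qsE '^[[:space:]]*hwcap[[:space:]]+[0-9]+[[:space:]]+nosegneg' \
        "$confdir"/*.conf
}
```

It returns success when a matching line exists and failure otherwise, so it can be used directly in an if statement before deciding whether the image needs the extra file.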

Comment 49 Justin M. Forbes 2010-12-02 21:14:20 UTC

So the least invasive fix is to run echo "hwcap 1 nosegneg" > /etc/ld.so.conf.d/libc6-xen.conf in the post step of the 32-bit image.  The next images will have this done.

Comment 50 Brian LaMere 2010-12-02 21:45:55 UTC

For what it's worth, I would concur; it seems that Amazon needs to patch their old servers running the 32bit machines, but until that's completed the way to fix it is for the 32bit AMI to have that setting.

Comment 51 Matthew Miller 2013-01-02 14:04:26 UTC

Just a note that Amazon is still carrying a similar workaround in their Amazon Linux images as of January 2013.
