Description of problem:
I type "make check" on my code, and the kernel dies :)

Version-Release number of selected component (if applicable):
kernel(0:2.6.11-1.1369_FC4).i686

How reproducible:
100%

Steps to Reproduce:
1. Download http://www.and.org/vstr/vstr-1.0.14james.tar.bz2
2. Untar it, and cd inside.
3. mkdir j; ../scripts/b/def-tst.sh

Actual results:
Kernel death at the start of the third configuration of ex_httpd tests in the examples directory.

Expected results:
A happy, working kernel :)

Additional info:
You'll need socket_poll-1.0.1 and timer_q-1.0.5 installed, or the httpd example won't be compiled. I have RPMs.

It's not obvious to me whether it's happening because of the start of the third configuration of tests, or at the end of the second (the cleanup tells the configuration to exit, but that just closes the listen sockets and waits for the other connections to die).

You can test any configuration manually by running...

./ex_httpd -P0 --cntl-file=ex_httpd_cntl <printed options>

...and then in another window...

SRCDIR=../../examples ../../examples/tst/tst_httpd.pl ex_httpd_cntl setup vhosts cleanup

...also note that the second configuration uses mmap(), so you'll want to add a "trunc" option before vhosts (or it'll run truncate tests on some of the files and SEGV).

Feel free to ask for any more info.
I just tried 2.6.12-1.1385_FC4 from updates-testing; it also dies in the same way.
It's the third configuration that kills it. I assume it has something to do with socket filters, because that's one of the main differences in that configuration, and if I remove that option the kernel doesn't die. Here is the end of the strace output to the screen, over ssh:

writev(1, [{"[", 1}, {"Jul 5 19:50:54", 15}, {"]: ", 3}, {"READY [0.0.0.0@32781]: ex_httpd_"..., 38}], 4[Jul 5 19:50:54]: READY [0.0.0.0@32781]: ex_httpd_root/
) = 57
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}], 2, 0) = 0
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN, revents=POLLIN}], 2, -1) = 1
gettimeofday({1120607465, 601546}, NULL) = 0
accept(4, {sa_family=AF_FILE, path=@}, [2]) = 5
gettimeofday({1120607465, 602636}, NULL) = 0
fcntl64(5, F_SETFD, FD_CLOEXEC) = 0
fcntl64(5, F_GETFL) = 0x2 (flags O_RDWR)
fcntl64(5, F_SETFL, O_RDWR|O_NONBLOCK) = 0
time(NULL) = 1120607465
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1267, ...}) = 0
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1267, ...}) = 0
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1267, ...}) = 0
writev(1, [{"[", 1}, {"Jul 5 19:51:05", 15}, {"]: ", 3}, {"CNTL CONNECT from[local]\n", 25}], 4[Jul 5 19:51:05]: CNTL CONNECT from[local]
) = 44
accept(4, 0xbfddb9f4, [2]) = -1 EAGAIN (Resource temporarily unavailable)
readv(5, 0xa0991f0, 6) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 3, 0) = 0
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN, revents=POLLIN}], 3, -1) = 1
gettimeofday({1120607465, 608434}, NULL) = 0
readv(5, [{"6:STATUS,should be std. in /etc/"..., 120}, {" zsync\nimage/x-icon "..., 120}, {"text/plain MIME type\ntext/plain "..., 120}, {"d because\n# we don\'t default to "..., 120}, {"xt/plain ./Makefile ./Makefile."..., 120}, {"tension, allow files to be filte"..., 120}], 6) = 9
getpid() = 2251
writev(5, [{"64:6:", 5}, {"STATUS", 6}, {",11:from[local],24:ctime[1120607"..., 62}, {"STATUS", 6}, {",19:from[0.0.0.0@32781],24:ctime"..., 71}], 5) = 150
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 3, 0) = 0
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN, revents=POLLIN|POLLHUP}], 3, -1) = 1
gettimeofday({1120607465, 615069}, NULL) = 0
readv(5, [{"6:STATUS,should be std. in /etc/"..., 120}, {",19:from[0.0.0.0@32781],24:ctime"..., 120}, {",11:from[local],24:ctime[1120607"..., 120}, {"4294967264:6:94967295:on "..., 120}, {"xt/plain ./Makefile ./Makefile."..., 120}, {"tension, allow files to be filte"..., 120}], 6) = 0
close(5) = 0
time(NULL) = 1120607465
writev(1, [{"[", 1}, {"Jul 5 19:51:05", 15}, {"]: ", 3}, {"CNTL FREE from[local] req_got[1:"..., 48}, {"recv[9B:9] send[150B:150]\n", 26}], 5[Jul 5 19:51:05]: CNTL FREE from[local] req_got[1:1] req_put[1:1] recv[9B:9] send[150B:150]
) = 93
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}], 2, 0) = 0
poll(Read from remote host sophia.and.org: Connection reset by peer

...I'm also uploading the full strace output that was written to disk (it ends much sooner). I've tried to create a small test case that uses the same socket filter, but that doesn't fail.
Created attachment 116398
strace of process that kills kernel (server)

I assume it's something on the server side; the readv() just before the FILTER_ADD is the data for the filter. As I said before, all the code can be downloaded, and this is 100% reproducible.
Is this related to bug 162388? I think a couple of other people have reported iptables problems with the 2.6.11 kernel, but I couldn't find the bug numbers right off the bat.
One of the machines I've run this on (it's killed every FC4 machine I've tried) has zero iptables rules, although the ip_tables module is loaded. I'll probably test again tonight with the module unloaded, but I'd hope the module wouldn't cause this unless at least one rule was loaded.
Do you get a kernel oops dump at all when it crashes?
Bug 162388 is a different problem: when that is triggered, traffic simply doesn't behave as one desires, but it doesn't crash the box.

What is the exact socket filter being loaded? It's probably just some logic error in the socket filter code. Patrick McHardy found a bunch of errors there recently; I will attach those patches if I get a chance.
Created attachment 116553
Potential fix for socket filter hang
Can someone try out the test case with the patch attached in comment #8 applied and see if that fixes it? Thanks.
davej: There is no oops; I run the test and everything stops (the mouse pointer doesn't even move).

davem: Well, I haven't tried it, but I don't see how that can be the problem:

1. The only code in the socket filter is two instructions:

   load LEN into A
   return A

...and as part of trying to make a simple test, I tried loading the same socket filter onto a listening socket... but couldn't trigger it.

2. The absolute address would have to be in the range [INT_MAX-len+1..INT_MAX], so even if the socket filter were compiled wrong (I don't think it is, but I had to write my own compiler), hexdump shows no numbers in that range.
Please try out what I've asked you to test, even if it may seem unlikely that the fix I'm pointing out is relevant.
[This comment has been added as a mass update for all FC4 kernel bugs. If you have migrated this bug from an FC3 bug today, ignore this comment.]

Please retest your problem with today's 2.6.12-1.1398_FC4 update.

If your problem involved being unable to boot, or some hardware not being detected correctly, please make sure your /etc/modprobe.conf is correct *BEFORE* installing any kernel updates. If in doubt, you can recreate this file using:

mv /etc/sysconfig/hwconf /etc/sysconfig/hwconf.bak
mv /etc/modprobe.conf /etc/modprobe.conf.bak
kudzu

Thank you.
kernel(0:2.6.12-1.1398_FC4).i686 still dies 100% of the time.
OK, I've tracked it down and have a small reproducer. The problem happens if you:

1) have a socket filter installed on an accept (listening) socket, and
2) connect and send large amounts of data.

...I'm uploading a client (in Perl) and a server (in C). Run the server, note the port, then run the client with the port as the argument... the kernel will be dead. Again there is no oops; nothing works.

% wc -l fc4_kern_die_client.pl fc4_kern_die_serv.c
  110 fc4_kern_die_client.pl
  173 fc4_kern_die_serv.c
  283 total
Created attachment 116954
Server: creates an accept socket (prints the bound port) and adds a socket filter

To compile, use:
gcc -Wall -W -o fc4_kern_die_serv fc4_kern_die_serv.c
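The attachment itself is not reproduced here. As a rough sketch of the "creates accept socket (prints bound port)" part of its description, one common approach is to bind a TCP socket to port 0 so the kernel picks a free port, then ask getsockname() which port was chosen. This assumes IPv4 on the loopback address, and the helper name open_listener is hypothetical. The socket filter would then be attached to the returned descriptor with setsockopt(SO_ATTACH_FILTER).

```c
#include <arpa/inet.h>   /* htonl, htons, ntohs */
#include <netinet/in.h>  /* struct sockaddr_in, INADDR_LOOPBACK */
#include <stdint.h>      /* uint16_t */
#include <string.h>      /* memset */
#include <sys/socket.h>  /* socket, bind, listen, getsockname */
#include <unistd.h>      /* close */

/* Hypothetical helper: create a TCP listening socket on 127.0.0.1 and let
 * the kernel choose the port. On success returns the fd and stores the
 * port (host byte order) in *port_out; on failure returns -1. */
int open_listener(uint16_t *port_out)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);                 /* 0 means "any free port" */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 16) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }

    *port_out = ntohs(addr.sin_port);         /* the port the kernel picked */
    return fd;
}
```

A server would print *port_out so that the client can be pointed at it, as the reproduction steps describe.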
Created attachment 116955
Connects to the port and sends data

To run, use:
./fc4_kern_die_client.pl <port>
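The attached client is written in Perl and is not shown here. A C approximation of what its description says it does (connect to the given port and send data) might look like the sketch below. It assumes the server listens on 127.0.0.1. The function name send_bulk and the 'x' filler bytes are hypothetical and are not taken from the attachment.

```c
#include <arpa/inet.h>   /* htonl, htons */
#include <errno.h>       /* errno, EINTR */
#include <netinet/in.h>  /* struct sockaddr_in, INADDR_LOOPBACK */
#include <stddef.h>      /* size_t */
#include <stdint.h>      /* uint16_t */
#include <string.h>      /* memset */
#include <sys/socket.h>  /* socket, connect */
#include <sys/types.h>   /* ssize_t */
#include <unistd.h>      /* write, close */

/* Hypothetical helper: connect to 127.0.0.1:port and write nbytes of
 * filler data. Returns the number of bytes written, or -1 on error. */
long send_bulk(uint16_t port, size_t nbytes)
{
    char buf[4096];
    struct sockaddr_in addr;
    size_t sent = 0;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }

    memset(buf, 'x', sizeof(buf));            /* arbitrary filler bytes */
    while (sent < nbytes) {
        size_t chunk = nbytes - sent < sizeof(buf) ? nbytes - sent : sizeof(buf);
        ssize_t n = write(fd, buf, chunk);
        if (n < 0) {
            if (errno == EINTR)
                continue;                     /* interrupted by a signal: retry */
            close(fd);
            return -1;
        }
        sent += (size_t)n;
    }

    close(fd);
    return (long)sent;
}
```

The writes are blocking. If the server never reads, the call stalls once the socket buffers fill. A "large amounts of data" run therefore needs a server that keeps reading, like the attached one.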
It's possibly worth noting that the patch in comment #8 isn't in kernel(0:2.6.12-1.1398_FC4).i686. But I have to hack the spec file just to install the .src.rpm, and I don't have any test machines to keep rebooting for fixes I'm 99% sure aren't the problem... so I'm unlikely to test it in the near future.
That patch is in the latest errata kernel in updates-testing fwiw.
Mass update to all FC4 bugs: An update has been released (2.6.13-1.1526_FC4) which rebases to a new upstream kernel (2.6.13.2). As there were ~3500 changes upstream between this and the previous kernel, it's possible your bug has been fixed already. Please retest with this update, and update this bug if necessary. Thanks.
2.6.13-1.1526_FC4 seems to fix it.