Escalated to Bugzilla from IssueTracker
(1) Category
Defect Report

(2) Abstract
Even though a process has received data, schedule() called from select() does not return.

(3) Symptom
In an application product made by Hitachi, even when data is transmitted from the server process while the client process is waiting for data in select(), the client process does not wake up.

    Server process                Client process
      readv()                       select()
      writev()  -------------->     does not return from select()

(4) Environment
RHEL4.5, kernel 2.6.9-55.0.12.ELsmp (EM64T)

(5) Recreation Steps
The problem occurs when local data delivery is repeated many times by the application. We made a simple reproducer and are trying to reproduce the problem, but so far it has not reproduced.

(6) Investigation
We investigated a system on which the phenomenon occurred, and found that the process waiting in select() was linked into the wait queue, and that received data was stored in the process's receive queue. Details are as follows.

* server process: pdfes
* client process: pdbes
* The client process with PID 16812 did not return from select().

[Backtrace of PID 16812]
crash> bt 16812
PID: 16812  TASK: 1020cbd97f0  CPU: 2  COMMAND: "pdbes"
 #0 [1001e38dca8] schedule at ffffffff8030c89e
 #1 [1001e38dd80] schedule_timeout at ffffffff8030d331
 #2 [1001e38dde0] do_select at ffffffff8018cabf
 #3 [1001e38ded0] sys_select at ffffffff8018ce3e
 #4 [1001e38df80] system_call at ffffffff8011026a
    RIP: 0000003df2ec0176  RSP: 0000002b1ec27000  RFLAGS: 00010246
    RAX: 0000000000000017  RBX: ffffffff8011026a  RCX: 0000002b0aec9570
    RDX: 0000000000000000  RSI: 00000000005588b8  RDI: 0000000000000007
    RBP: 0000000000000000  R8:  0000000000000000  R9:  000000000000000b
    R10: 0000000000000000  R11: 0000000000000202  R12: 0000000000000000
    R13: 0000007fbffffc10  R14: 0000000000406c70  R15: 0000007fbfffd0c0
    ORIG_RAX: 0000000000000017  CS: 0033  SS: 002b

[Status of the WAIT queue]
crash> net -s 16812
PID: 16812  TASK: 1020cbd97f0  CPU: 2  COMMAND: "pdbes"
FD  SOCKET       SOCK         FAMILY:TYPE  SOURCE-PORT  DESTINATION-PORT
 3  1016c9118c0  10110e6a0c0  INET:STREAM  0.0.0.0-0    0.0.0.0-0
 4  10145904680  100253e4040  UNIX:STREAM
 6  10066d22400  1016f4d8700  INET:STREAM  0.0.0.0-0    0.0.0.0-2768
crash> struct sock 0x100253e4040 | grep sk_sleep
  sk_sleep = 0x101459046b0,
crash> waitq 0x101459046b0
PID: 16812  TASK: 1020cbd97f0  CPU: 2  COMMAND: "pdbes"

[Result of netstat]
# netstat -anp | grep 16812
-------------------------------------------------------------------
tcp        0      0 0.0.0.0:57192          0.0.0.0:*              LISTEN      16812/pdbes
tcp    13572      0 10.208.131.224:54096   10.208.131.227:57147   ESTABLISHED 16812/pdbes
unix  2      [ ACC ]     STREAM     LISTENING     2464034988 16812/pdbes /dev/HiRDB/pth/tk26847
-------------------------------------------------------------------
* There are 13572 bytes of data in the receive queue of PID 16812.

[Collection of system info by systemtap]
Based on the above survey results, we collected information with systemtap and found that the server process did not call try_to_wake_up(). The wait queue and the netstat output showed the same situation as in the survey of PID 16812 above.

* The client process with PID 17519 did not return from select().
----------------------------------------------------------------------
...
pdbes : do_select(pid:17519)
pdbes : add_wait_queue(pid:17519)
pdbes : add_wait_queue(pid:17519)
pdbes : add_wait_queue(pid:17519)
pdfes : sock_def_readable(sock:0x101CEEAF840)  // pdbes : PID 17519
pdfes : try_to_wake_up(17519)
pdbes : do_select(pid:17519)
pdbes : add_wait_queue(pid:17519)
pdbes : add_wait_queue(pid:17519)
pdbes : add_wait_queue(pid:17519)
pdfes : sock_def_readable(sock:0x101CEEAF840)  // pdbes : PID 17519
----------------------------------------------------------------------
=> The trace of the client process (PID 17519) is shown above. After the last sock_def_readable(), try_to_wake_up() was not called.

From this investigation we can state the following points.
- try_to_wake_up() was not called: from pdfes's point of view, the task was not yet on the WAIT queue at the moment it tried to wake it (i.e. when it called sock_def_readable()).
- After the phenomenon occurred, the process was on the WAIT queue.
- After the phenomenon occurred, tp->rcv_nxt had been updated and the data had been stored in the receive queue.
  * The receive queue size reported by "netstat -anp" is calculated using tp->rcv_nxt.

We think the cause of this problem might be that try_to_wake_up() was not called when the data was received, because the local-delivery path of the server process raced with the select() path of the client process.

(7) Related Documentation / Related Bugzilla
Not found.

(8) Attachments
sysreport

(9) Business Impacts
The application in which the problem occurred is an important product for Hitachi's business. From our investigation, we think Linux has a fundamental problem here. Three months have passed since the problem first occurred. It occurs frequently, and the customer is seriously troubled.

(10) Requests
Please let us know if you know of similar issues.

This event sent from IssueTracker by fleitner [SEG - Kernel]
issue 233481
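To illustrate the suspected race, here is a simplified two-CPU timeline (a sketch reconstructed from the findings above; the exact interleaving is an assumption, not something traced directly):

----------------------------------------------------------------------
CPU0: client (pdbes) in select()         CPU1: server (pdfes) delivering data
----------------------------------------------------------------------
add_wait_queue(sk->sk_sleep, ...)        skb_queue_tail(&sk->sk_receive_queue, skb)
  /* store: wait-queue list */             /* store: receive-queue list */
check sk->sk_receive_queue               sock_def_readable():
  -> still looks empty                     waitqueue_active(sk->sk_sleep)
     (CPU1's store not visible yet)          -> still looks empty
                                                (CPU0's store not visible yet),
                                                so try_to_wake_up() is skipped
schedule_timeout()
  -> sleeps forever: the data is queued,
     but nobody will ever wake the task
----------------------------------------------------------------------

On a sequentially consistent machine at least one side would see the other's store; without ordering guarantees (or a common lock) both loads can observe stale values, which matches what was observed: data in the receive queue, the task on the wait queue, and no try_to_wake_up() call.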
File uploaded: sysreport-root.HiRDBtest.tar.bz2

This event sent from IssueTracker by fleitner [SEG - Kernel]
issue 233481
it_file 167929
Hi, SEG, allow me to escalate in advance while I try to reproduce this issue.

1. Provide time and date of the problem
N/A. The customer's site hit this. It has been hitting for 3 months with high frequency, but apparently the vendor cannot reproduce the behavior.

2. Indicate the platform(s) (architectures) the problem is being reported against.
x86_64, RHEL4.5

3. Provide clear and concise problem description as it is understood at the time of escalation
* Observed behavior
(According to the report) after a number of local data deliveries, the client process stops reacting to the server, apparently stuck in select().
* Desired behavior
It should not stay in select().

4. State specific action requested of SEG
I'll get more on this, not to mention that I'll try reproducing it. Before all, are you aware of such an issue? If so, please indicate the BZ. Otherwise, I'll try reproducing this in-shop and see. Offer me any suggestions and hints if there's anything you can think of.

5. State whether or not a defect in the product is suspected
* Provide Bugzilla if one already exists
N/A

Issue escalated to Support Engineering Group by: tumeya.
Internal Status set to 'Waiting on SEG'

This event sent from IssueTracker by fleitner [SEG - Kernel]
issue 233481
File uploaded: select-tp.tgz

This event sent from IssueTracker by fleitner [SEG - Kernel]
issue 233481
it_file 201560
We have reproduced the phenomenon. The reproduction steps are as follows.

(1) Extract select-tp.tgz
    # tar zxvpf select-tp.tgz

(2) Build the test program.
    # cd tp
    # make

(3) Set two IP aliases on eth0, e.g.:
    # ifconfig eth0:1 10.208.173.184
    # ifconfig eth0:2 10.208.173.188

(4) Edit tp3.sh.
    FIRST_PORT, LAST_PORT
        Specify the first/last port allocated to the test program.
        The number of test programs (servers) is LAST_PORT - FIRST_PORT + 1.
        Set the values so that the number of test programs is about 8.
    S_IPADDR
        Specify the IP address on which the test program (server) listens.
        It should be one of the IP aliases set in (3).
    C_IPADDR
        Specify the IP address which the test program (client) uses.
        It should be one of the IP aliases set in (3), different from S_IPADDR.

(5) Run the test program (server).
    # ./tp3.sh server start

(6) Run the test program (client).
    # ./tp3.sh client start

(7) Confirm the test programs are running.
    # top
    -> The CPU usage of tp3-server and tp3-client will be high.

The phenomenon occurred in about 5 hours on Hitachi's environment (Quad-Core Xeon E5345 2.33GHz * 2). When it occurs, both the server and the client are waiting in select(2); you can see with top that these programs no longer use any CPU. The output of netstat looks as follows.

# netstat -atnp | grep tp3
tcp     0  0 10.208.173.188:10001 0.0.0.0:*            LISTEN      12737/tp3-server
tcp     0  0 10.208.173.188:10002 0.0.0.0:*            LISTEN      12738/tp3-server
tcp     0  0 10.208.173.188:10003 0.0.0.0:*            LISTEN      12739/tp3-server
tcp     0  0 10.208.173.188:10004 0.0.0.0:*            LISTEN      12740/tp3-server
tcp     0  0 10.208.173.188:10005 0.0.0.0:*            LISTEN      12741/tp3-server
tcp     0  0 10.208.173.188:10006 0.0.0.0:*            LISTEN      12742/tp3-server
tcp     0  0 10.208.173.188:10007 0.0.0.0:*            LISTEN      12743/tp3-server
tcp     0  0 10.208.173.188:10008 0.0.0.0:*            LISTEN      12744/tp3-server
tcp     0  0 10.208.173.188:10007 10.208.173.184:32803 ESTABLISHED 12743/tp3-server
tcp     0  0 10.208.173.188:10006 10.208.173.184:32802 ESTABLISHED 12742/tp3-server
tcp     0  0 10.208.173.188:10005 10.208.173.184:32801 ESTABLISHED 12741/tp3-server
tcp     0  0 10.208.173.188:10004 10.208.173.184:32800 ESTABLISHED 12740/tp3-server
tcp  5508  0 10.208.173.188:10008 10.208.173.184:32804 ESTABLISHED 12744/tp3-server  <== here
tcp     0  0 10.208.173.188:10003 10.208.173.184:32799 ESTABLISHED 12739/tp3-server
tcp     0  0 10.208.173.188:10002 10.208.173.184:32798 ESTABLISHED 12738/tp3-server
tcp     0  0 10.208.173.188:10001 10.208.173.184:32797 ESTABLISHED 12737/tp3-server
tcp 21892  0 10.208.173.184:32800 10.208.173.188:10004 ESTABLISHED 12750/tp3-client
tcp 21892  0 10.208.173.184:32801 10.208.173.188:10005 ESTABLISHED 12750/tp3-client
tcp 21892  0 10.208.173.184:32802 10.208.173.188:10006 ESTABLISHED 12750/tp3-client
tcp 21892  0 10.208.173.184:32803 10.208.173.188:10007 ESTABLISHED 12750/tp3-client
tcp 21892  0 10.208.173.184:32797 10.208.173.188:10001 ESTABLISHED 12750/tp3-client
tcp 21892  0 10.208.173.184:32798 10.208.173.188:10002 ESTABLISHED 12750/tp3-client
tcp 21892  0 10.208.173.184:32799 10.208.173.188:10003 ESTABLISHED 12750/tp3-client
tcp     0  0 10.208.173.184:32804 10.208.173.188:10008 ESTABLISHED 12750/tp3-client

* Though PID 12744 received data (Recv-Q is not 0), the process is waiting in select(2).

Use the crash command if you want to confirm that the processes are waiting in select(2). Don't use strace or gdb, because these will wake the test programs up.

* The test programs work as stated below.
Server:
1) Send data to the client using writev(2).
2) Receive data from the client using readv(2).
3) Call select(2) if readv(2) returns EAGAIN.
4) Repeat 1) to 3).
Client:
1) Receive data from the server using readv(2).
2) Call select(2) if readv(2) returns EAGAIN.
3) Send data to the server using writev(2).
4) Do 1) to 3) against the next server.
5) Repeat 1) to 4).

Please investigate using this test program.

We found the phenomenon looks like this issue reported upstream:
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.28.y.git;a=commit;h=6cb2a21049b8990df4576c5fce4d48d0206c22d5
We think our issue is also a CPU cache synchronization problem like this one.

This event sent from IssueTracker by fleitner [SEG - Kernel]
issue 233481
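For reference, the client's main loop described above has roughly the following shape (a minimal sketch of the documented behavior, not the actual tp3-client source; all names are illustrative and error handling is omitted):

/* sketch of the tp3 client loop described above */
#include <errno.h>
#include <sys/select.h>
#include <sys/uio.h>

static void client_loop(int *socks, int nsocks, struct iovec *iov, int iovcnt)
{
        int i = 0;

        for (;;) {
                /* 1)-2): read; if no data is available yet, block in select() */
                while (readv(socks[i], iov, iovcnt) < 0 && errno == EAGAIN) {
                        fd_set readfds;

                        FD_ZERO(&readfds);
                        FD_SET(socks[i], &readfds);
                        /* NULL timeout: block until readable -- this is the
                         * call that never returns when the wakeup is lost */
                        select(socks[i] + 1, &readfds, NULL, NULL, NULL);
                }
                /* 3): send data back to the server */
                writev(socks[i], iov, iovcnt);
                /* 4)-5): move on to the next server and repeat */
                i = (i + 1) % nsocks;
        }
}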
Hi Umeya-san,

> did you test the suggested memory barrier and it fixed the issue?

No. At first we will try to reproduce with the latest original kernel (2.6.9-78.0.13.EL), and then with a patched kernel.

miki

Internal Status set to 'Waiting on Support'
Status set to: Waiting on Tech

This event sent from IssueTracker by fleitner [SEG - Kernel]
issue 233481
Hmm. I've run kernel-smp on .55 and .78 for hours, but neither reproduces the issue. I'll keep running this longer, and if it doesn't reoccur I'll reconsider.

This event sent from IssueTracker by fleitner [SEG - Kernel]
issue 233481
I've also tried an FV guest with two CPUs (AMD), but it didn't occur. I may need to try another box.

This event sent from IssueTracker by fleitner [SEG - Kernel]
issue 233481
Hi,

Recently this issue has been occurring once a week at the customer's site. A countermeasure is required immediately. Could you inform us of your current status?

This event sent from IssueTracker by fleitner [SEG - Kernel]
issue 233481
Hi, sorry for the delay.

I got hold of an E5345 box for 24 hours and ran the test with the -55 kernel for the maximum of the given time, but unfortunately it didn't reproduce and I had to give the leased machine back.

Did this reproduce with your configuration? If so, could you test -55, the latest kernel, and, if the latest doesn't fix the issue, the patched kernel, and report the results to us?

One of my boxes that has been available hasn't hit this yet, by the way. Trying to see what's missing.

Severity set to: High
Priority set to: 2

This event sent from IssueTracker by fleitner [SEG - Kernel]
issue 233481
Hi Umeya-san,

We tested it on two boxes equipped with Xeon E5345 * 2. The phenomenon occurred on both boxes: in one case within one hour, in another case after three days.

The following are the CPUs and kernel versions we tested.

Xeon E5345 * 2
  -55.0.12  => occurred
  -67       => occurred
  -78.0.13  => did not occur (over 9 days)

Xeon MV * 2
  -55.0.12  => did not occur (over 7 days)

Internal Status set to 'Waiting on Support'
Status set to: Waiting on Tech

This event sent from IssueTracker by fleitner [SEG - Kernel]
issue 233481
SEG,

I've discussed the matter with the vendor and they think this is somewhat hardware-specific. Is it possible to suggest anything, or otherwise reproduce this? I've tried on my AMD machine but wasn't able to reproduce it.

Quoting the vendor:

  We tested it on two boxes equipped with Xeon E5345 * 2. The phenomenon
  occurred on both boxes: in one case within one hour, in another case
  after three days.

  The following are the CPUs and kernel versions we tested.

  Xeon E5345 * 2
    -55.0.12  => occurred
    -67       => occurred
    -78.0.13  => did not occur (over 9 days)

  Xeon MV * 2
    -55.0.12  => did not occur (over 7 days)

Internal Status set to 'Waiting on SEG'

This event sent from IssueTracker by fleitner [SEG - Kernel]
issue 233481
Hi,

Are we sure that -78.0.13 didn't reproduce? The sosreport shows an X5355 but they mentioned E5345; I'm not sure about the differences between those two processors. I'm just making sure I'm looking at the correct sysreport.

It would be very helpful to reproduce this in-house, so do you remember the system name/lab with that processor? I'd like to give it a try.

In the meanwhile, can you ask them to provide a vmcore showing data in Recv-Q and the process stuck in select()?

thanks,
Flavio

Internal Status set to 'Waiting on Support'

This event sent from IssueTracker by fleitner [SEG - Kernel]
issue 233481
Hi, on the sysreport, I'll let you know once it's ready. Meanwhile we have some questions, too.

- Are we sure that -78.0.13 didn't reproduce? The sosreport shows an X5355 and they said E5345; we're not sure about the differences between those two processors. We're just making sure we're seeing the correct sysreport.

- Can you provide a vmcore showing data in Recv-Q and the process stuck in select()?

If possible, can you help us collect this information?

Internal Status set to 'Waiting on Customer'
Status set to: Waiting on Client

This event sent from IssueTracker by fleitner [SEG - Kernel]
issue 233481
Hi Umeya-san,

> - Are we sure that -78.0.13 didn't reproduce?
> The sosreport shows X5355 and they said E5345; not sure about the
> differences between those two processors. I'm just making sure I'm seeing
> the correct sysreport.

At this time, -78.0.13 still hasn't reproduced it. However, we are not sure whether that version resolved the problem or whether it is simply harder to reproduce there. I think the only differences between the X5355 and the E5345 are the clock frequency and the TDP. Please use the same CPU as ours as much as possible, since I'm not sure how the differences influence reproduction.

> - Can you provide a vmcore showing data in Recv-Q and the process stuck in
>   select()?
>
> If possible, can you help us in collecting this information?

I uploaded the vmcore to dropbox.

# md5sum IT233481-vmcore.bz2
64a7a48996b4274a680aba1a97750ab5  IT233481-vmcore.bz2

And I attach a sosreport. The vmcore and sosreport were collected in an environment running kernel-smp-2.6.9-67.0.4.EL.

Internal Status set to 'Waiting on Support'
Status set to: Waiting on Tech

This event sent from IssueTracker by fleitner [SEG - Kernel]
issue 233481
Info: Current status of our tests.

kernel-smp-2.6.9-55.0.12.EL  => occurred
kernel-smp-2.6.9-67.EL       => occurred
kernel-smp-2.6.9-67.0.1.EL   => not tested
kernel-smp-2.6.9-67.0.4.EL   => occurred
kernel-smp-2.6.9-67.0.7.EL   => occurred
kernel-smp-2.6.9-67.0.15.EL  => now testing (#1: 10 days, #2: 6 hours)
kernel-smp-2.6.9-67.0.20.EL  => not tested
kernel-smp-2.6.9-67.0.22.EL  => not tested
kernel-smp-2.6.9-78.EL       => didn't occur (7 days)
kernel-smp-2.6.9-78.0.13.EL  => didn't occur (9 days)

This event sent from IssueTracker by fleitner [SEG - Kernel]
issue 233481
Flavio, I re-did it and now it's working. You can get hold of it at:

  Your corefile is ready for you.
  You may view it at megatron.gsslab.rdu.redhat.com
  Login with kerberos name/password
  $ cd /cores/20090402224803/work
  /cores/20090402224803/work$ ./crash

Can you look in and see if you can find anything soon? Thanks!

This event sent from IssueTracker by fleitner [SEG - Kernel]
issue 233481
crash> ps | grep client
   9975      1   7  10226899030  IN   0.0    2404    360  tp31-client
crash> net -s 9975
PID: 9975  TASK: 10226899030  CPU: 7  COMMAND: "tp31-client"
FD  SOCKET       SOCK         FAMILY:TYPE  SOURCE-PORT  DESTINATION-PORT
 3  1021b5ce200  1021ad2c040  INET:STREAM  0.0.0.0-0    0.0.0.0-2768
 4  1021b4b99c0  1021a9447c0  INET:STREAM  0.0.0.0-0    0.0.0.0-2768
 5  1021b4b9700  1021acd32c0  INET:STREAM  0.0.0.0-0    0.0.0.0-2768
 6  1021b4b9440  1021acd26c0  INET:STREAM  0.0.0.0-0    0.0.0.0-2768
 7  1021e0301c0  1021b2b1900  INET:STREAM  0.0.0.0-0    0.0.0.0-2768
 8  1021e030480  1021b2b0d00  INET:STREAM  0.0.0.0-0    0.0.0.0-2768
 9  1021a9cbb80  1021b2b0100  INET:STREAM  0.0.0.0-0    0.0.0.0-2768
10  1021a9cb8c0  1021b2b3340  INET:STREAM  0.0.0.0-0    0.0.0.0-2768
11  1021a9cb600  1021b2b2740  INET:STREAM  0.0.0.0-0    0.0.0.0-2768
crash> socket.sk 1021b5ce200
  sk = 0x1021ad2c040,
crash> sock 0x1021ad2c040
  sk_receive_queue = {
    next = 0x1021a4a2540,
    prev = 0x1021aa192c0,
    qlen = 0x2,            <------------
    lock = {
      lock = 0x1,
      magic = 0xdead4ead
    }
  },
crash> sock.sk_sleep 0x1021ad2c040
  sk_sleep = 0x1021b5ce230
crash> __wait_queue_head 0x1021b5ce230
struct __wait_queue_head {
  lock = {
    lock = 0x1,
    magic = 0xdead4ead
  },
  task_list = {
    next = 0x1021b5ce238,   <--- empty
    prev = 0x1021b5ce238
  }
}

There are two entries in the receive queue and no task waiting for them. That would explain the symptom we are seeing in this issue. That wait queue is protected by the sk->sk_callback_lock rwlock. I'll check upstream whether there was any patch around rwlocks for this processor.

Flavio

This event sent from IssueTracker by fleitner [SEG - Kernel]
issue 233481
Ah, no flags on struct sock:

crash> sock.sk_flags 0x1021ad2c040
  sk_flags = 0x0,

I'm still checking the upstream code, but it seems to me that there is no locking tying the wakeup list and the receive queue together. It usually does:

    ...
    skb_queue_tail(&sk->sk_receive_queue, skb);  <--- queue data
    sk->sk_data_ready(sk, skb->len);             <--- wake up waiting tasks
    ...

That indicates this issue could happen on any processor. I'm not sure yet that I understand this issue fully, still working on it, but it seems a simple workaround would be to not let select() sleep forever and to set a timeout instead. The test program does:

     79                 if (errno == EAGAIN) {
     80                         ret = select(dstsocks[i] + 1,
     81                                      &readfds, NULL, NULL, NULL);
                                                                 ^^^^------- timeout

SELECT(2):
    timeout is an upper bound on the amount of time elapsed before
    select() returns. If both fields of the timeval structure are zero,
    then select() returns immediately. (This is useful for polling.) If
    timeout is NULL (no timeout), select() can block indefinitely.

Providing a timeout value to select() would work around this issue, because another select() would take place and the queued data would then be seen.

Flavio

This event sent from IssueTracker by fleitner [SEG - Kernel]
issue 233481
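Concretely, the workaround would change the quoted test-program fragment to something like the following (a sketch; the 1-second value is arbitrary, and readfds/tv must be re-initialized before every call since Linux modifies them):

        if (errno == EAGAIN) {
                struct timeval tv;

                tv.tv_sec = 1;   /* wake up at least once per second */
                tv.tv_usec = 0;
                ret = select(dstsocks[i] + 1,
                             &readfds, NULL, NULL, &tv);
                /* even if the wakeup is lost, select() returns after the
                 * timeout; the loop retries readv() and sees the queued data */
        }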
Hi Umeya-san,

> Hi. Uploading kernel-smp. This contains the following patches from 446409 and 433685:
> 4c803516a9150ea8ae949071f3640b0d28ed0369
> c8b5e8bd9f625a1266b337d20508d7e2290d8c34

These patches are included in 4.7, aren't they? Unfortunately, this problem also occurred on -78.0.13, so these patches are likely ineffective.

Internal Status set to 'Waiting on Support'
Status set to: Waiting on Tech

This event sent from IssueTracker by Y.Sonoda
issue 233481
> Let me double-check here. The last comment from Hitachi indicates
> this didn't occur on the -78* kernel. Is your indication correct?

Yes. The problem occurred on -78.0.13 after that comment. Current status:

kernel-smp-2.6.9-55.0.12.EL  => occurred
kernel-smp-2.6.9-67.EL       => occurred
kernel-smp-2.6.9-67.0.1.EL   => not tested
kernel-smp-2.6.9-67.0.4.EL   => occurred
kernel-smp-2.6.9-67.0.7.EL   => occurred
kernel-smp-2.6.9-67.0.15.EL  => occurred   <== Updated!!
kernel-smp-2.6.9-67.0.20.EL  => not tested
kernel-smp-2.6.9-67.0.22.EL  => not tested
kernel-smp-2.6.9-78.EL       => didn't occur (7 days)   <---
kernel-smp-2.6.9-78.0.13.EL  => occurred   <== Updated!!

Internal Status set to 'Waiting on Support'
Status set to: Waiting on Tech

This event sent from IssueTracker by Y.Sonoda
issue 233481
We summarized the mechanism of this phenomenon in select-issue-en.pdf. The attached patch, select-forever.patch, is a solution to this issue. We are testing this patch now. Please review this patch and test it.

This event sent from IssueTracker by Y.Sonoda
issue 233481
it_file 214576
I'm wondering whether the patch below would also fix this issue, without adding locking overhead.

diff --git a/net/core/sock.c b/net/core/sock.c
index 4bb1732..2106029 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1529,7 +1529,7 @@ static void sock_def_error_report(struct sock *sk)
 static void sock_def_readable(struct sock *sk, int len)
 {
 	read_lock(&sk->sk_callback_lock);
-	if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
+	if (sk->sk_sleep)
 		wake_up_interruptible(sk->sk_sleep);
 	sk_wake_async(sk,1,POLL_IN);
 	read_unlock(&sk->sk_callback_lock);

Flavio
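For context, waitqueue_active() in 2.6.9-era kernels is just an unlocked, unordered read of the wait-queue list head, approximately:

/* include/linux/wait.h (2.6.9-era, approximately) */
static inline int waitqueue_active(wait_queue_head_t *q)
{
        return !list_empty(&q->task_list);
}

Nothing orders this read against a concurrent add_wait_queue() on another CPU, so the check can see a stale "empty" list. Dropping the check makes wake_up_interruptible() unconditional; wake_up takes the wait-queue spinlock, which provides the needed ordering, at the cost of taking that lock even when no one is waiting.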
Created attachment 341229 [details] Alternative patch
Hi Umeya-san,

At the last teleconference, you mentioned Red Hat's own patch. Our engineer is interested in it, so could you kindly upload the patch? He wants to compare his idea with Red Hat's idea.

miki

This event sent from IssueTracker by T.Miki
issue 233481
(In reply to comment #26)
> Hi Umeya-san,
> At the last teleconference, you mentioned Red Hat's own patch.
> Our engineer is interested in it, so could you kindly upload the patch?
> He wants to compare his idea with Red Hat's idea.
>
> miki

(In reply to comment #24)
> I'm wondering whether the patch below would also fix this issue,
> without adding locking overhead.
>
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 4bb1732..2106029 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1529,7 +1529,7 @@ static void sock_def_error_report(struct sock *sk)
>  static void sock_def_readable(struct sock *sk, int len)
>  {
>  	read_lock(&sk->sk_callback_lock);
> -	if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
> +	if (sk->sk_sleep)
>  		wake_up_interruptible(sk->sk_sleep);
>  	sk_wake_async(sk,1,POLL_IN);
>  	read_unlock(&sk->sk_callback_lock);
>
> Flavio

I was able to recreate the issue in a 4-day run of the test program. Now I plan to run the test again with your fix, and then with the fix that has the locking overhead.

Please let me know if you have different test/patch priorities, or if there's another fix available.

jirka
(In reply to comment #27)
> I was able to recreate the issue in a 4-day run of the test program.
> Now I plan to run the test again with your fix, and then with the fix
> that has the locking overhead.

I forgot to paste the kernel/server info:
- 8 processors
- Intel(R) Xeon(R) CPU E5320 @ 1.86GHz
- kernel 2.6.9-78.ELsmp

jirka
> > diff --git a/net/core/sock.c b/net/core/sock.c
> > index 4bb1732..2106029 100644
> > --- a/net/core/sock.c
> > +++ b/net/core/sock.c
> > @@ -1529,7 +1529,7 @@ static void sock_def_error_report(struct sock *sk)
> >  static void sock_def_readable(struct sock *sk, int len)
> >  {
> >  	read_lock(&sk->sk_callback_lock);
> > -	if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
> > +	if (sk->sk_sleep)
> >  		wake_up_interruptible(sk->sk_sleep);
> >  	sk_wake_async(sk,1,POLL_IN);
> >  	read_unlock(&sk->sk_callback_lock);
> >
> > Flavio

OK, a 6-day run with the above patch and I still cannot recreate it. Looks promising. Any news on the customer side?
Hi,

We think your patch is a good idea. We are testing it on -55.0.12.EL and -78.0.13.EL, and the issue has not occurred yet:

-55.0.12.EL: 5 days
-78.0.13.EL: 3 days

We ask you to release a fixed kernel immediately.

This event sent from IssueTracker by Y.Sonoda
issue 233481
Hi,

as neither of those changes is in upstream, I'm trying to reproduce the issue there. I still don't follow the change and how it helps in this case.

jirka
So far I'm not able to reproduce it upstream.

I also revisited the original patch and the PDF describing the issue, but I'd need you to clarify what you mean by 'The data which are still not reflected to memory'. The only case close to this that comes to my mind is CPU memory-operation ordering. That could be solved by memory barriers, but so far I'm not sure it applies to this issue.

Is 'CPU memory operation ordering' what you are referring to? Please let me know, thanks.

jirka
The customer would like to know when the errata will be released. Could you clarify the schedule?

This event sent from IssueTracker by Y.Sonoda
issue 233481
We might have another solution, one that does not use locks. Instead of the write_lock usage, memory barriers are placed at certain points, ensuring that if two CPUs meet in the incriminated part of the code, they will synchronize their views of memory and thus avoid the issue.

I prepared two patches (attached), as I'm not sure how invasive I can be:

v1) stays at the TCP source level
v2) puts the barrier into the wait queue code

I built both kernels and published them at http://people.redhat.com/jolsa/494404/

As we are not able to reproduce the issue again (neither on RHEL nor upstream), it would be great if you could test them to see whether they help. The memory-barrier addition should be more easily acceptable upstream than the write-lock addition (also given how hard reproduction is).

Let me know if you think the barrier should be at another level/source.

I forwarded your question about the errata schedule to my manager.

jirka
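The intent of the barrier pairing, as a sketch (illustrative fragments, not the attached patches; the exact placement is what differs between v1 and v2):

/* producer side, e.g. sock_def_readable() on local delivery */
	skb_queue_tail(&sk->sk_receive_queue, skb);
	smp_mb();	/* make the queued data visible before we read the
			   wait-queue list */
	if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
		wake_up_interruptible(sk->sk_sleep);

/* consumer side, e.g. tcp_poll() called from do_select() */
	poll_wait(file, sk->sk_sleep, wait);	/* add_wait_queue() */
	smp_mb();	/* make our wait-queue entry visible before we check
			   for data */
	if (!skb_queue_empty(&sk->sk_receive_queue))
		mask |= POLLIN | POLLRDNORM;

With both barriers in place, at least one side must observe the other's store: either the producer sees the waiter and wakes it, or the waiter sees the data and doesn't sleep.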
Created attachment 345885 [details] memory barriers in the TCP layer
Created attachment 345886 [details] memory barriers in the wait queue layer
After discussing it with Flavio, we will go with version 2, but slightly modified. Please check the third patch version attached and the new kernel alongside the others at http://people.redhat.com/jolsa/494404/

thanks,
jirka
Created attachment 345916 [details] memory barriers in the wait queue layer - only in the waitqueue_active function
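The attachment itself is not inlined here, but based on its description ("only in the waitqueue_active function") the v3 change presumably has roughly this shape (a guess; the barrier strength and exact placement are assumptions):

static inline int waitqueue_active(wait_queue_head_t *q)
{
	smp_mb();	/* order the caller's preceding stores (e.g. queued
			   data) against the read of the waiter list */
	return !list_empty(&q->task_list);
}

Putting the barrier inside waitqueue_active() covers every call site at once, at the cost of a full barrier even for callers that don't need it.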
After discussing the possible fix on the mailing list, it came up that this could be a CPU bug; see the following specification update:

http://www.intel.com/Assets/PDF/specupdate/315338.pdf

especially #AJ39 and #AJ18. #AJ39 has a BIOS workaround.

Quoting Dave Miller:
> But if you look at those errata, all of them can be worked around
> in the BIOS, so it would be interesting to know if 1) a BIOS update
> exists for the customer's system and 2) such an upgrade makes the
> problem go away.

Let me know if you can try the BIOS update or if you need more info.

jirka
Please see the attached patch tcp-layer-barriers.patch as the latest fix proposal. Let me know if you can try it, though the BIOS update seems to have higher priority.

jirka
Created attachment 346358 [details] adding smp_mb calls to the tcp layer
(In reply to comment #49)
> Created an attachment (id=346358) [details]
> adding smp_mb calls to the tcp layer

You can use the following kernel with the latest change:
http://people.redhat.com/jolsa/494404/kernel-smp-2.6.9-89.2.EL_jolsa_494404_v4.x86_64.rpm

jirka
Hi Umeya-san,

> Let me know if you can try the BIOS update and

We'll check with our hardware division whether there is a BIOS that contains the workaround for AJ39.

> 2) that all said, there is a patched kernel. Would this help
> resolve your issue?

The patched kernel is running on two machines now. We'll tell you the result next week.

This event sent from IssueTracker by Y.Sonoda
issue 233481
Event posted on 2009-06-08 22:13 JST by Y.Sonoda

Hi Umeya-san,

> Let me know if you can try the BIOS update and

We found that our test machines already have a BIOS that contains the workaround for AJ39. Therefore AJ39 is not related to this issue.

This event sent from IssueTracker by Y.Sonoda
issue 233481
Event posted on 2009-06-11 17:02 JST by Y.Sonoda

Hi Umeya-san,

> 2) that all said, there is a patched kernel. Would this help
> resolve your issue?

At this time, the patched kernel has been working well for 4 days. We are going to continue the test a little longer.

BTW, as we discussed on the conference call, Hitachi would like Red Hat to release this fix in 4.8.z by the middle of August.

Regards,

Internal Status set to 'Waiting on Support'
Status set to: Waiting on Tech

This event sent from IssueTracker by Y.Sonoda
issue 233481
Committed in 89.8.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/
There were some errors/confusion while adding the BZ to the errata tool. Returning the BZ to MODIFIED state so that it can be added to the errata.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: Due to insufficient memory barriers in the network code, a process sleeping in select() may have missed notifications about new data. In rare cases, this bug may have caused a process to sleep forever.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0263.html