Escalated to Bugzilla from IssueTracker
(1) Category
Defect Report

(2) Abstract
Even though a process has received data, schedule() called from select() does not return.

(3) Symptom
In an application product made by Hitachi, even when data is transmitted from the server process while the client process is waiting for data in select(), the client process does not wake up.

    Server process                Client process
      readv()                       select()
      writev()  -------------->     does not return from select()

(4) Environment
RHEL4.5, kernel 2.6.9-55.0.12.ELsmp (EM64T)

(5) Recreation Steps
The problem occurs when local data delivery is repeated many times by the application. We made a simple reproducer and are trying to reproduce the problem, but so far it has not reproduced.

(6) Investigation
We investigated a system on which the phenomenon occurred, and found that the process waiting in select() was linked into the wait queue, and that received data was stored in the process's receive queue. Details are as follows.

* server process: pdfes
* client process: pdbes
* The client process with PID 16812 did not return from select().

[Backtrace of PID 16812]
crash> bt 16812
PID: 16812  TASK: 1020cbd97f0  CPU: 2  COMMAND: "pdbes"
 #0 [1001e38dca8] schedule at ffffffff8030c89e
 #1 [1001e38dd80] schedule_timeout at ffffffff8030d331
 #2 [1001e38dde0] do_select at ffffffff8018cabf
 #3 [1001e38ded0] sys_select at ffffffff8018ce3e
 #4 [1001e38df80] system_call at ffffffff8011026a
    RIP: 0000003df2ec0176  RSP: 0000002b1ec27000  RFLAGS: 00010246
    RAX: 0000000000000017  RBX: ffffffff8011026a  RCX: 0000002b0aec9570
    RDX: 0000000000000000  RSI: 00000000005588b8  RDI: 0000000000000007
    RBP: 0000000000000000  R8:  0000000000000000  R9:  000000000000000b
    R10: 0000000000000000  R11: 0000000000000202  R12: 0000000000000000
    R13: 0000007fbffffc10  R14: 0000000000406c70  R15: 0000007fbfffd0c0
    ORIG_RAX: 0000000000000017  CS: 0033  SS: 002b

[Status of the WAIT queue]
crash> net -s 16812
PID: 16812  TASK: 1020cbd97f0  CPU: 2  COMMAND: "pdbes"
FD  SOCKET       SOCK         FAMILY:TYPE  SOURCE-PORT  DESTINATION-PORT
 3  1016c9118c0  10110e6a0c0  INET:STREAM  0.0.0.0-0    0.0.0.0-0
 4  10145904680  100253e4040  UNIX:STREAM
 6  10066d22400  1016f4d8700  INET:STREAM  0.0.0.0-0    0.0.0.0-2768
crash> struct sock 0x100253e4040 | grep sk_sleep
  sk_sleep = 0x101459046b0,
crash> waitq 0x101459046b0
PID: 16812  TASK: 1020cbd97f0  CPU: 2  COMMAND: "pdbes"

[Result of netstat]
# netstat -anp | grep 16812
-------------------------------------------------------------------
tcp        0      0 0.0.0.0:57192          0.0.0.0:*              LISTEN      16812/pdbes
tcp    13572      0 10.208.131.224:54096   10.208.131.227:57147   ESTABLISHED 16812/pdbes
unix  2      [ ACC ]     STREAM     LISTENING     2464034988 16812/pdbes /dev/HiRDB/pth/tk26847
-------------------------------------------------------------------
* There are 13572 bytes of data in the receive queue of PID 16812.

[Collection of system info by systemtap]
Based on the above survey results, we collected information with systemtap and found that the server process did not call try_to_wake_up(). The wait queue and the netstat output showed the same situation as in the survey of PID 16812 above.

* The client process with PID 17519 did not return from select().
----------------------------------------------------------------------
...
pdbes : do_select(pid:17519)
pdbes : add_wait_queue(pid:17519)
pdbes : add_wait_queue(pid:17519)
pdbes : add_wait_queue(pid:17519)
pdfes : sock_def_readable(sock:0x101CEEAF840)  // pdbes : PID 17519
pdfes : try_to_wake_up(17519)
pdbes : do_select(pid:17519)
pdbes : add_wait_queue(pid:17519)
pdbes : add_wait_queue(pid:17519)
pdbes : add_wait_queue(pid:17519)
pdfes : sock_def_readable(sock:0x101CEEAF840)  // pdbes : PID 17519
----------------------------------------------------------------------
=> The trace of the client process (PID 17519) is shown above. After the last sock_def_readable(), try_to_wake_up() was not called.

From this investigation we can state the following points.
- try_to_wake_up() was not called: from pdfes's point of view, the task was not yet on the WAIT queue at the moment it tried to wake it (i.e. when it called sock_def_readable()).
- After the phenomenon occurred, the process was on the WAIT queue.
- After the phenomenon occurred, tp->rcv_nxt had been updated and the data had been stored in the receive queue.
  * The receive queue size reported by "netstat -anp" is calculated using tp->rcv_nxt.

We think the cause of this problem might be that try_to_wake_up() was not called when the data was received, because the local-delivery path of the server process raced with the select() path of the client process.

(7) Related Documentation / Related Bugzilla
Not found.

(8) Attachments
sysreport

(9) Business Impacts
The application in which the problem occurred is an important product for Hitachi's business. From our investigation, we think Linux has a fundamental problem here. Three months have passed since the problem first occurred. It occurs frequently, and the customer is seriously troubled.

(10) Requests
Please let us know if you know of similar issues.

This event sent from IssueTracker by fleitner [SEG - Kernel]
issue 233481
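To illustrate the suspected race, here is a simplified two-CPU timeline (a sketch reconstructed from the findings above; the exact interleaving is an assumption, not something traced directly):

----------------------------------------------------------------------
CPU0: client (pdbes) in select()         CPU1: server (pdfes) delivering data
----------------------------------------------------------------------
add_wait_queue(sk->sk_sleep, ...)        skb_queue_tail(&sk->sk_receive_queue, skb)
  /* store: wait-queue list */             /* store: receive-queue list */
check sk->sk_receive_queue               sock_def_readable():
  -> still looks empty                     waitqueue_active(sk->sk_sleep)
     (CPU1's store not visible yet)          -> still looks empty
                                                (CPU0's store not visible yet),
                                                so try_to_wake_up() is skipped
schedule_timeout()
  -> sleeps forever: the data is queued,
     but nobody will ever wake the task
----------------------------------------------------------------------

On a sequentially consistent machine at least one side would see the other's store; without ordering guarantees (or a common lock) both loads can observe stale values, which matches what was observed: data in the receive queue, the task on the wait queue, and no try_to_wake_up() call.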
File uploaded: sysreport-root.HiRDBtest.tar.bz2

This event sent from IssueTracker by fleitner [SEG - Kernel]
issue 233481
it_file 167929
Hi, SEG, allow me to escalate in advance while I try to reproduce this issue.

1. Provide time and date of the problem
N/A. The customer's site hit this. It has been hitting for 3 months with high frequency, but apparently the vendor cannot reproduce the behavior.

2. Indicate the platform(s) (architectures) the problem is being reported against.
x86_64, RHEL4.5

3. Provide clear and concise problem description as it is understood at the time of escalation
* Observed behavior
(According to the report) after a number of local data deliveries, the client process stops reacting to the server, apparently stuck in select().
* Desired behavior
It should not stay in select().

4. State specific action requested of SEG
I'll get more on this, not to mention that I'll try reproducing it. Before all, are you aware of such an issue? If so, please indicate the BZ. Otherwise, I'll try reproducing this in-shop and see. Offer me any suggestions and hints if there's anything you can think of.

5. State whether or not a defect in the product is suspected
* Provide Bugzilla if one already exists
N/A

Issue escalated to Support Engineering Group by: tumeya.
Internal Status set to 'Waiting on SEG'

This event sent from IssueTracker by fleitner [SEG - Kernel]
issue 233481
File uploaded: select-tp.tgz

This event sent from IssueTracker by fleitner [SEG - Kernel]
issue 233481
it_file 201560
We have reproduced the phenomenon. The reproduction steps are as follows.

(1) Extract select-tp.tgz
    # tar zxvpf select-tp.tgz

(2) Build the test program.
    # cd tp
    # make

(3) Set two IP aliases on eth0, e.g.:
    # ifconfig eth0:1 10.208.173.184
    # ifconfig eth0:2 10.208.173.188

(4) Edit tp3.sh.
    FIRST_PORT, LAST_PORT
        Specify the first/last port allocated to the test program.
        The number of test programs (servers) is LAST_PORT - FIRST_PORT + 1.
        Set the values so that the number of test programs is about 8.
    S_IPADDR
        Specify the IP address on which the test program (server) listens.
        It should be one of the IP aliases set in (3).
    C_IPADDR
        Specify the IP address which the test program (client) uses.
        It should be one of the IP aliases set in (3), different from S_IPADDR.

(5) Run the test program (server).
    # ./tp3.sh server start

(6) Run the test program (client).
    # ./tp3.sh client start

(7) Confirm the test programs are running.
    # top
    -> The CPU usage of tp3-server and tp3-client will be high.

The phenomenon occurred in about 5 hours on Hitachi's environment (Quad-Core Xeon E5345 2.33GHz * 2). When it occurs, both the server and the client are waiting in select(2); you can see with top that these programs no longer use any CPU. The output of netstat looks as follows.

# netstat -atnp | grep tp3
tcp     0  0 10.208.173.188:10001 0.0.0.0:*            LISTEN      12737/tp3-server
tcp     0  0 10.208.173.188:10002 0.0.0.0:*            LISTEN      12738/tp3-server
tcp     0  0 10.208.173.188:10003 0.0.0.0:*            LISTEN      12739/tp3-server
tcp     0  0 10.208.173.188:10004 0.0.0.0:*            LISTEN      12740/tp3-server
tcp     0  0 10.208.173.188:10005 0.0.0.0:*            LISTEN      12741/tp3-server
tcp     0  0 10.208.173.188:10006 0.0.0.0:*            LISTEN      12742/tp3-server
tcp     0  0 10.208.173.188:10007 0.0.0.0:*            LISTEN      12743/tp3-server
tcp     0  0 10.208.173.188:10008 0.0.0.0:*            LISTEN      12744/tp3-server
tcp     0  0 10.208.173.188:10007 10.208.173.184:32803 ESTABLISHED 12743/tp3-server
tcp     0  0 10.208.173.188:10006 10.208.173.184:32802 ESTABLISHED 12742/tp3-server
tcp     0  0 10.208.173.188:10005 10.208.173.184:32801 ESTABLISHED 12741/tp3-server
tcp     0  0 10.208.173.188:10004 10.208.173.184:32800 ESTABLISHED 12740/tp3-server
tcp  5508  0 10.208.173.188:10008 10.208.173.184:32804 ESTABLISHED 12744/tp3-server  <== here
tcp     0  0 10.208.173.188:10003 10.208.173.184:32799 ESTABLISHED 12739/tp3-server
tcp     0  0 10.208.173.188:10002 10.208.173.184:32798 ESTABLISHED 12738/tp3-server
tcp     0  0 10.208.173.188:10001 10.208.173.184:32797 ESTABLISHED 12737/tp3-server
tcp 21892  0 10.208.173.184:32800 10.208.173.188:10004 ESTABLISHED 12750/tp3-client
tcp 21892  0 10.208.173.184:32801 10.208.173.188:10005 ESTABLISHED 12750/tp3-client
tcp 21892  0 10.208.173.184:32802 10.208.173.188:10006 ESTABLISHED 12750/tp3-client
tcp 21892  0 10.208.173.184:32803 10.208.173.188:10007 ESTABLISHED 12750/tp3-client
tcp 21892  0 10.208.173.184:32797 10.208.173.188:10001 ESTABLISHED 12750/tp3-client
tcp 21892  0 10.208.173.184:32798 10.208.173.188:10002 ESTABLISHED 12750/tp3-client
tcp 21892  0 10.208.173.184:32799 10.208.173.188:10003 ESTABLISHED 12750/tp3-client
tcp     0  0 10.208.173.184:32804 10.208.173.188:10008 ESTABLISHED 12750/tp3-client

* Though PID 12744 received data (Recv-Q is not 0), the process is waiting in select(2).

Use the crash command if you want to confirm that the processes are waiting in select(2). Don't use strace or gdb, because these will wake the test programs up.

* The test programs work as stated below.
Server:
1) Send data to the client using writev(2).
2) Receive data from the client using readv(2).
3) Call select(2) if readv(2) returns EAGAIN.
4) Repeat 1) to 3).
Client:
1) Receive data from the server using readv(2).
2) Call select(2) if readv(2) returns EAGAIN.
3) Send data to the server using writev(2).
4) Do 1) to 3) against the next server.
5) Repeat 1) to 4).

Please investigate using this test program.

We found the phenomenon looks like this issue reported upstream:
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.28.y.git;a=commit;h=6cb2a21049b8990df4576c5fce4d48d0206c22d5
We think our issue is also a CPU cache synchronization problem like this one.

This event sent from IssueTracker by fleitner [SEG - Kernel]
issue 233481
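For reference, the client's main loop described above has roughly the following shape (a minimal sketch of the documented behavior, not the actual tp3-client source; all names are illustrative and error handling is omitted):

/* sketch of the tp3 client loop described above */
#include <errno.h>
#include <sys/select.h>
#include <sys/uio.h>

static void client_loop(int *socks, int nsocks, struct iovec *iov, int iovcnt)
{
        int i = 0;

        for (;;) {
                /* 1)-2): read; if no data is available yet, block in select() */
                while (readv(socks[i], iov, iovcnt) < 0 && errno == EAGAIN) {
                        fd_set readfds;

                        FD_ZERO(&readfds);
                        FD_SET(socks[i], &readfds);
                        /* NULL timeout: block until readable -- this is the
                         * call that never returns when the wakeup is lost */
                        select(socks[i] + 1, &readfds, NULL, NULL, NULL);
                }
                /* 3): send data back to the server */
                writev(socks[i], iov, iovcnt);
                /* 4)-5): move on to the next server and repeat */
                i = (i + 1) % nsocks;
        }
}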
Hi Umeya-san,

> did you test the suggested memory barrier and it fixed the issue?

No. At first we will try to reproduce with the latest original kernel (2.6.9-78.0.13.EL), and then with a patched kernel.

miki

Internal Status set to 'Waiting on Support'
Status set to: Waiting on Tech

This event sent from IssueTracker by fleitner [SEG - Kernel]
issue 233481
Hmm. I've run kernel-smp on .55 and .78 for hours, but neither reproduces the issue. I'll keep running this longer, and if it doesn't reoccur I'll reconsider.

This event sent from IssueTracker by fleitner [SEG - Kernel]
issue 233481
I've also tried an FV guest with two CPUs (AMD), but it didn't occur. I may need to try another box.

This event sent from IssueTracker by fleitner [SEG - Kernel]
issue 233481
Hi,

Recently this issue has been occurring once a week at the customer's site. A countermeasure is required immediately. Could you inform us of your current status?

This event sent from IssueTracker by fleitner [SEG - Kernel]
issue 233481
Hi, sorry for the delay.

I got hold of an E5345 box for 24 hours and ran the test with the -55 kernel for the maximum of the given time, but unfortunately it didn't reproduce and I had to give the leased machine back.

Did this reproduce with your configuration? If so, could you test -55, the latest kernel, and, if the latest doesn't fix the issue, the patched kernel, and report the results to us?

One of my boxes that has been available hasn't hit this yet, by the way. Trying to see what's missing.

Severity set to: High
Priority set to: 2

This event sent from IssueTracker by fleitner [SEG - Kernel]
issue 233481
Hi Umeya-san,

We tested it on two boxes equipped with Xeon E5345 * 2. The phenomenon occurred on both boxes: in one case within one hour, in another case after three days.

The following are the CPUs and kernel versions we tested.

Xeon E5345 * 2
  -55.0.12  => occurred
  -67       => occurred
  -78.0.13  => did not occur (over 9 days)

Xeon MV * 2
  -55.0.12  => did not occur (over 7 days)

Internal Status set to 'Waiting on Support'
Status set to: Waiting on Tech

This event sent from IssueTracker by fleitner [SEG - Kernel]
issue 233481
SEG,

I've discussed the matter with the vendor and they think this is somewhat hardware-specific. Is it possible to suggest anything, or otherwise reproduce this? I've tried on my AMD machine but wasn't able to reproduce it.

Quoting the vendor:

  We tested it on two boxes equipped with Xeon E5345 * 2. The phenomenon
  occurred on both boxes: in one case within one hour, in another case
  after three days.

  The following are the CPUs and kernel versions we tested.

  Xeon E5345 * 2
    -55.0.12  => occurred
    -67       => occurred
    -78.0.13  => did not occur (over 9 days)

  Xeon MV * 2
    -55.0.12  => did not occur (over 7 days)

Internal Status set to 'Waiting on SEG'

This event sent from IssueTracker by fleitner [SEG - Kernel]
issue 233481
Hi,

Are we sure that -78.0.13 didn't reproduce? The sosreport shows an X5355 but they mentioned E5345; I'm not sure about the differences between those two processors. I'm just making sure I'm looking at the correct sysreport.

It would be very helpful to reproduce this in-house, so do you remember the system name/lab with that processor? I'd like to give it a try.

In the meanwhile, can you ask them to provide a vmcore showing data in Recv-Q and the process stuck in select()?

thanks,
Flavio

Internal Status set to 'Waiting on Support'

This event sent from IssueTracker by fleitner [SEG - Kernel]
issue 233481
Hi, on the sysreport, I'll let you know once it's ready. Meanwhile we have some questions, too.

- Are we sure that -78.0.13 didn't reproduce? The sosreport shows an X5355 and they said E5345; we're not sure about the differences between those two processors. We're just making sure we're seeing the correct sysreport.

- Can you provide a vmcore showing data in Recv-Q and the process stuck in select()?

If possible, can you help us collect this information?

Internal Status set to 'Waiting on Customer'
Status set to: Waiting on Client

This event sent from IssueTracker by fleitner [SEG - Kernel]
issue 233481
Hi Umeya-san,

> - Are we sure that -78.0.13 didn't reproduce?
> The sosreport shows X5355 and they said E5345; not sure about the
> differences between those two processors. I'm just making sure I'm seeing
> the correct sysreport.

At this time, -78.0.13 still hasn't reproduced it. However, we are not sure whether that version resolved the problem or whether it is simply harder to reproduce there. I think the only differences between the X5355 and the E5345 are the clock frequency and the TDP. Please use the same CPU as ours as much as possible, since I'm not sure how the differences influence reproduction.

> - Can you provide a vmcore showing data in Recv-Q and the process stuck in
>   select()?
>
> If possible, can you help us in collecting this information?

I uploaded the vmcore to dropbox.

# md5sum IT233481-vmcore.bz2
64a7a48996b4274a680aba1a97750ab5  IT233481-vmcore.bz2

And I attach a sosreport. The vmcore and sosreport were collected in an environment running kernel-smp-2.6.9-67.0.4.EL.

Internal Status set to 'Waiting on Support'
Status set to: Waiting on Tech

This event sent from IssueTracker by fleitner [SEG - Kernel]
issue 233481
Info: Current status of our tests.

kernel-smp-2.6.9-55.0.12.EL  => occurred
kernel-smp-2.6.9-67.EL       => occurred
kernel-smp-2.6.9-67.0.1.EL   => not tested
kernel-smp-2.6.9-67.0.4.EL   => occurred
kernel-smp-2.6.9-67.0.7.EL   => occurred
kernel-smp-2.6.9-67.0.15.EL  => now testing (#1: 10 days, #2: 6 hours)
kernel-smp-2.6.9-67.0.20.EL  => not tested
kernel-smp-2.6.9-67.0.22.EL  => not tested
kernel-smp-2.6.9-78.EL       => didn't occur (7 days)
kernel-smp-2.6.9-78.0.13.EL  => didn't occur (9 days)

This event sent from IssueTracker by fleitner [SEG - Kernel]
issue 233481
Flavio, I re-did it and now it's working. You can get hold of it at:

  Your corefile is ready for you.
  You may view it at megatron.gsslab.rdu.redhat.com
  Login with kerberos name/password
  $ cd /cores/20090402224803/work
  /cores/20090402224803/work$ ./crash

Can you look in and see if you can find anything soon? Thanks!

This event sent from IssueTracker by fleitner [SEG - Kernel]
issue 233481
crash> ps | grep client
   9975      1   7  10226899030  IN   0.0    2404    360  tp31-client
crash> net -s 9975
PID: 9975  TASK: 10226899030  CPU: 7  COMMAND: "tp31-client"
FD  SOCKET       SOCK         FAMILY:TYPE  SOURCE-PORT  DESTINATION-PORT
 3  1021b5ce200  1021ad2c040  INET:STREAM  0.0.0.0-0    0.0.0.0-2768
 4  1021b4b99c0  1021a9447c0  INET:STREAM  0.0.0.0-0    0.0.0.0-2768
 5  1021b4b9700  1021acd32c0  INET:STREAM  0.0.0.0-0    0.0.0.0-2768
 6  1021b4b9440  1021acd26c0  INET:STREAM  0.0.0.0-0    0.0.0.0-2768
 7  1021e0301c0  1021b2b1900  INET:STREAM  0.0.0.0-0    0.0.0.0-2768
 8  1021e030480  1021b2b0d00  INET:STREAM  0.0.0.0-0    0.0.0.0-2768
 9  1021a9cbb80  1021b2b0100  INET:STREAM  0.0.0.0-0    0.0.0.0-2768
10  1021a9cb8c0  1021b2b3340  INET:STREAM  0.0.0.0-0    0.0.0.0-2768
11  1021a9cb600  1021b2b2740  INET:STREAM  0.0.0.0-0    0.0.0.0-2768
crash> socket.sk 1021b5ce200
  sk = 0x1021ad2c040,
crash> sock 0x1021ad2c040
  sk_receive_queue = {
    next = 0x1021a4a2540,
    prev = 0x1021aa192c0,
    qlen = 0x2,            <------------
    lock = {
      lock = 0x1,
      magic = 0xdead4ead
    }
  },
crash> sock.sk_sleep 0x1021ad2c040
  sk_sleep = 0x1021b5ce230
crash> __wait_queue_head 0x1021b5ce230
struct __wait_queue_head {
  lock = {
    lock = 0x1,
    magic = 0xdead4ead
  },
  task_list = {
    next = 0x1021b5ce238,   <--- empty
    prev = 0x1021b5ce238
  }
}

There are two entries in the receive queue and no task waiting for them. That would explain the symptom we are seeing in this issue. That wait queue is protected by the sk->sk_callback_lock rwlock. I'll check upstream whether there was any patch around rwlocks for this processor.

Flavio

This event sent from IssueTracker by fleitner [SEG - Kernel]
issue 233481
Ah, no flags on struct sock:

crash> sock.sk_flags 0x1021ad2c040
  sk_flags = 0x0,

I'm still checking the upstream code, but it seems to me that there is no locking tying the wakeup list and the receive queue together. It usually does:

    ...
    skb_queue_tail(&sk->sk_receive_queue, skb);  <--- queue data
    sk->sk_data_ready(sk, skb->len);             <--- wake up waiting tasks
    ...

That indicates this issue could happen on any processor. I'm not sure yet that I understand this issue fully, still working on it, but it seems a simple workaround would be to not let select() sleep forever and to set a timeout instead. The test program does:

     79                 if (errno == EAGAIN) {
     80                         ret = select(dstsocks[i] + 1,
     81                                      &readfds, NULL, NULL, NULL);
                                                                 ^^^^------- timeout

SELECT(2):
    timeout is an upper bound on the amount of time elapsed before
    select() returns. If both fields of the timeval structure are zero,
    then select() returns immediately. (This is useful for polling.) If
    timeout is NULL (no timeout), select() can block indefinitely.

Providing a timeout value to select() would work around this issue, because another select() would take place and the queued data would then be seen.

Flavio

This event sent from IssueTracker by fleitner [SEG - Kernel]
issue 233481
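Concretely, the workaround would change the quoted test-program fragment to something like the following (a sketch; the 1-second value is arbitrary, and readfds/tv must be re-initialized before every call since Linux modifies them):

        if (errno == EAGAIN) {
                struct timeval tv;

                tv.tv_sec = 1;   /* wake up at least once per second */
                tv.tv_usec = 0;
                ret = select(dstsocks[i] + 1,
                             &readfds, NULL, NULL, &tv);
                /* even if the wakeup is lost, select() returns after the
                 * timeout; the loop retries readv() and sees the queued data */
        }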
Hi Umeya-san,

> Hi. Uploading kernel-smp. This contains the following patches from 446409 and 433685:
> 4c803516a9150ea8ae949071f3640b0d28ed0369
> c8b5e8bd9f625a1266b337d20508d7e2290d8c34

These patches are included in 4.7, aren't they? Unfortunately, this problem also occurred on -78.0.13, so these patches are likely ineffective.

Internal Status set to 'Waiting on Support'
Status set to: Waiting on Tech

This event sent from IssueTracker by Y.Sonoda
issue 233481
> Let me double-check here. The last comment from Hitachi indicates
> this didn't occur on the -78* kernel. Is your indication correct?

Yes. The problem occurred on -78.0.13 after that comment. Current status:

kernel-smp-2.6.9-55.0.12.EL  => occurred
kernel-smp-2.6.9-67.EL       => occurred
kernel-smp-2.6.9-67.0.1.EL   => not tested
kernel-smp-2.6.9-67.0.4.EL   => occurred
kernel-smp-2.6.9-67.0.7.EL   => occurred
kernel-smp-2.6.9-67.0.15.EL  => occurred   <== Updated!!
kernel-smp-2.6.9-67.0.20.EL  => not tested
kernel-smp-2.6.9-67.0.22.EL  => not tested
kernel-smp-2.6.9-78.EL       => didn't occur (7 days)   <---
kernel-smp-2.6.9-78.0.13.EL  => occurred   <== Updated!!

Internal Status set to 'Waiting on Support'
Status set to: Waiting on Tech

This event sent from IssueTracker by Y.Sonoda
issue 233481
We summarized the mechanism of this phenomenon in select-issue-en.pdf. The attached patch, select-forever.patch, is a solution to this issue. We are testing this patch now. Please review this patch and test it.

This event sent from IssueTracker by Y.Sonoda
issue 233481
it_file 214576
I'm wondering whether the patch below would also fix this issue, without adding locking overhead.

diff --git a/net/core/sock.c b/net/core/sock.c
index 4bb1732..2106029 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1529,7 +1529,7 @@ static void sock_def_error_report(struct sock *sk)
 static void sock_def_readable(struct sock *sk, int len)
 {
 	read_lock(&sk->sk_callback_lock);
-	if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
+	if (sk->sk_sleep)
 		wake_up_interruptible(sk->sk_sleep);
 	sk_wake_async(sk,1,POLL_IN);
 	read_unlock(&sk->sk_callback_lock);

Flavio
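For context, waitqueue_active() in 2.6.9-era kernels is just an unlocked, unordered read of the wait-queue list head, approximately:

/* include/linux/wait.h (2.6.9-era, approximately) */
static inline int waitqueue_active(wait_queue_head_t *q)
{
        return !list_empty(&q->task_list);
}

Nothing orders this read against a concurrent add_wait_queue() on another CPU, so the check can see a stale "empty" list. Dropping the check makes wake_up_interruptible() unconditional; wake_up takes the wait-queue spinlock, which provides the needed ordering, at the cost of taking that lock even when no one is waiting.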
Created attachment 341229 [details] Alternative patch
Hi Umeya-san,

At the last teleconference, you mentioned Red Hat's own patch. Our engineer is interested in it, so could you kindly upload the patch? He wants to compare his idea with Red Hat's idea.

miki

This event sent from IssueTracker by T.Miki
issue 233481
(In reply to comment #26)
> Hi Umeya-san,
> At the last teleconference, you mentioned Red Hat's own patch.
> Our engineer is interested in it, so could you kindly upload the patch?
> He wants to compare his idea with Red Hat's idea.
>
> miki

(In reply to comment #24)
> I'm wondering whether the patch below would also fix this issue,
> without adding locking overhead.
>
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 4bb1732..2106029 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1529,7 +1529,7 @@ static void sock_def_error_report(struct sock *sk)
>  static void sock_def_readable(struct sock *sk, int len)
>  {
>  	read_lock(&sk->sk_callback_lock);
> -	if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
> +	if (sk->sk_sleep)
>  		wake_up_interruptible(sk->sk_sleep);
>  	sk_wake_async(sk,1,POLL_IN);
>  	read_unlock(&sk->sk_callback_lock);
>
> Flavio

I was able to recreate the issue in a 4-day run of the test program. Now I plan to run the test again with your fix, and then with the fix that has the locking overhead.

Please let me know if you have different test/patch priorities, or if there's another fix available.

jirka
(In reply to comment #27)
> I was able to recreate the issue in a 4-day run of the test program.
> Now I plan to run the test again with your fix, and then with the fix
> that has the locking overhead.

I forgot to paste the kernel/server info:
- 8 processors
- Intel(R) Xeon(R) CPU E5320 @ 1.86GHz
- kernel 2.6.9-78.ELsmp

jirka
> > diff --git a/net/core/sock.c b/net/core/sock.c
> > index 4bb1732..2106029 100644
> > --- a/net/core/sock.c
> > +++ b/net/core/sock.c
> > @@ -1529,7 +1529,7 @@ static void sock_def_error_report(struct sock *sk)
> >  static void sock_def_readable(struct sock *sk, int len)
> >  {
> >  	read_lock(&sk->sk_callback_lock);
> > -	if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
> > +	if (sk->sk_sleep)
> >  		wake_up_interruptible(sk->sk_sleep);
> >  	sk_wake_async(sk,1,POLL_IN);
> >  	read_unlock(&sk->sk_callback_lock);
> >
> > Flavio

OK, a 6-day run with the above patch and I still cannot recreate it. Looks promising. Any news on the customer side?
Hi,

We think your patch is a good idea. We are testing it on -55.0.12.EL and -78.0.13.EL, and the issue has not occurred yet:

-55.0.12.EL: 5 days
-78.0.13.EL: 3 days

We ask you to release a fixed kernel immediately.

This event sent from IssueTracker by Y.Sonoda
issue 233481
Hi,

as neither of those changes is in upstream, I'm trying to reproduce the issue there. I still don't follow the change and how it helps in this case.

jirka
So far I'm not able to reproduce it upstream.

I also revisited the original patch and the PDF describing the issue, but I'd need you to clarify what you mean by 'The data which are still not reflected to memory'. The only case close to this that comes to my mind is CPU memory-operation ordering. That could be solved by memory barriers, but so far I'm not sure it applies to this issue.

Is 'CPU memory operation ordering' what you are referring to? Please let me know, thanks.

jirka
The customer would like to know when the errata will be released. Could you clarify the schedule?

This event sent from IssueTracker by Y.Sonoda
issue 233481
We might have another solution, one that does not use locks. Instead of the write_lock usage, memory barriers are placed at certain points, ensuring that if two CPUs meet in the incriminated part of the code, they will synchronize their views of memory and thus avoid the issue.

I prepared two patches (attached), as I'm not sure how invasive I can be:

v1) stays at the TCP source level
v2) puts the barrier into the wait queue code

I built both kernels and published them at http://people.redhat.com/jolsa/494404/

As we are not able to reproduce the issue again (neither on RHEL nor upstream), it would be great if you could test them to see whether they help. The memory-barrier addition should be more easily acceptable upstream than the write-lock addition (also given how hard reproduction is).

Let me know if you think the barrier should be at another level/source.

I forwarded your question about the errata schedule to my manager.

jirka
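The intent of the barrier pairing, as a sketch (illustrative fragments, not the attached patches; the exact placement is what differs between v1 and v2):

/* producer side, e.g. sock_def_readable() on local delivery */
	skb_queue_tail(&sk->sk_receive_queue, skb);
	smp_mb();	/* make the queued data visible before we read the
			   wait-queue list */
	if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
		wake_up_interruptible(sk->sk_sleep);

/* consumer side, e.g. tcp_poll() called from do_select() */
	poll_wait(file, sk->sk_sleep, wait);	/* add_wait_queue() */
	smp_mb();	/* make our wait-queue entry visible before we check
			   for data */
	if (!skb_queue_empty(&sk->sk_receive_queue))
		mask |= POLLIN | POLLRDNORM;

With both barriers in place, at least one side must observe the other's store: either the producer sees the waiter and wakes it, or the waiter sees the data and doesn't sleep.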
Created attachment 345885 [details] memory barriers in the TCP layer
Created attachment 345886 [details] memory barriers in the wait queue layer
After discussing it with Flavio, we will go with version 2, but slightly modified. Please check the third patch version attached and the new kernel alongside the others at http://people.redhat.com/jolsa/494404/

thanks,
jirka
Created attachment 345916 [details] memory barriers in the wait queue layer - only in the waitqueue_active function
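The attachment itself is not inlined here, but based on its description ("only in the waitqueue_active function") the v3 change presumably has roughly this shape (a guess; the barrier strength and exact placement are assumptions):

static inline int waitqueue_active(wait_queue_head_t *q)
{
	smp_mb();	/* order the caller's preceding stores (e.g. queued
			   data) against the read of the waiter list */
	return !list_empty(&q->task_list);
}

Putting the barrier inside waitqueue_active() covers every call site at once, at the cost of a full barrier even for callers that don't need it.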
After discussing the possible fix on the mailing list, it came up that this could be a CPU bug; see the following specification update:

http://www.intel.com/Assets/PDF/specupdate/315338.pdf

especially #AJ39 and #AJ18. #AJ39 has a BIOS workaround.

Quoting Dave Miller:
> But if you look at those errata, all of them can be worked around
> in the BIOS, so it would be interesting to know if 1) a BIOS update
> exists for the customer's system and 2) such an upgrade makes the
> problem go away.

Let me know if you can try the BIOS update or if you need more info.

jirka
Please see the attached patch tcp-layer-barriers.patch as the latest fix proposal. Let me know if you can try it, though the BIOS update seems to have higher priority.

jirka
Created attachment 346358 [details] adding smp_mb calls to the tcp layer
(In reply to comment #49)
> Created an attachment (id=346358) [details]
> adding smp_mb calls to the tcp layer

You can use the following kernel with the latest change:
http://people.redhat.com/jolsa/494404/kernel-smp-2.6.9-89.2.EL_jolsa_494404_v4.x86_64.rpm

jirka
Hi Umeya-san,

> Let me know if you can try the BIOS update and

We'll check with our hardware division whether there is a BIOS that contains the workaround for AJ39.

> 2) that all said, there is a patched kernel. Would this help
> resolve your issue?

The patched kernel is running on two machines now. We'll tell you the result next week.

This event sent from IssueTracker by Y.Sonoda
issue 233481
Event posted on 2009-06-08 22:13 JST by Y.Sonoda

Hi Umeya-san,

> Let me know if you can try the BIOS update and

We found that our test machines already have a BIOS that contains the workaround for AJ39. Therefore AJ39 is not related to this issue.

This event sent from IssueTracker by Y.Sonoda
issue 233481
Event posted on 2009-06-11 17:02 JST by Y.Sonoda

Hi Umeya-san,

> 2) that all said, there is a patched kernel. Would this help
> resolve your issue?

At this time, the patched kernel has been working well for 4 days. We are going to continue the test a little longer.

BTW, as we discussed on the conference call, Hitachi would like Red Hat to release this fix in 4.8.z by the middle of August.

Regards,

Internal Status set to 'Waiting on Support'
Status set to: Waiting on Tech

This event sent from IssueTracker by Y.Sonoda
issue 233481
Committed in 89.8.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/
There were some errors/confusion while adding the BZ to the errata tool. Returning the BZ to MODIFIED state so that it can be added to the errata.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: Due to insufficient memory barriers in the network code, a process sleeping in select() may have missed notifications about new data. In rare cases, this bug may have caused a process to sleep forever.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0263.html