Red Hat Bugzilla – Bug 33590
Kernel panic (read/write lock error) occurred under high network workload
Last modified: 2005-10-31 17:00:50 EST
(originally reported by Hitachi)
Test environment: ia64 8-way server CF-1
When they ran a high network workload test on the ia64 server (I have asked them for details of the test procedure), they found a read/write lock error.
1. Using ITP to investigate the error, they found errors in the function
(1) The second (next) address in the socket list is wrong.
They believe the fault occurred at this line:
if (!ptype->dev || ptype->dev == skb->dev)
(2) A kernel panic occurred while net_rx_action() was running:
1) On CPU0, net_rx_action() checks the elements of the socket list one by one.
2) On CPU1, release of a socket begins.
3) The memory released on CPU1 is reused to allocate a buffer for a socket.
4) On CPU0, as net_rx_action() continues checking the socket list one by one, it follows the now-invalid address and the kernel panics.
During socket operations, the following code paths are presumed to be controlled by the read/write lock:
- checking the elements of the socket list in net_rx_action()
- releasing the socket in dev_remove_pack()
2. Based on the investigation, the following patches are considered useful:
(1) The re-distribution interval for the IPI (IPI_FLUSH_TLB) is set to 10 times, and the received re-distribution message is invalidated.
(2) Changed the number of IDE scans from 10 to 2
(3) Fix for the brlock bug in kernel-2.4-x
(4) Updated driver versions:
aic7xxx: 6.0.8 BETA SCSI (adaptec)
DAC960: 2.4.10 RAID (Mylex)
e100: 1.5.0 NIC (Intel)
e1000: 3.0.1 GigaNIC(Intel)
qla2x00: 4.15 beta FC (QLogic)
The relevant URL is given below:
Created attachment 13924
Created attachment 13925
Created attachment 13926
Dave: is this something we should worry about?
David thinks this is probably legit, but wants Ingo's feedback.
Created attachment 14064
Created attachment 14065
More info: machine information
The patch is 100% legit. (Perhaps the only reason we didn't see problems with this code earlier is that the write path is so rarely used.) The bug does not corrupt memory; only the write-locking semantics were violated.
I have asked Hitachi to send us the test tools they used for the high network workload test.
This patch should be in the current CVS tree; please test.
Closing as resolved.