Bug 33590 - kernel panic (read/write lock error) occurred under high network workload
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel   
Version: 7.3
Hardware: ia64 Linux
Target Milestone: ---
Assignee: Ingo Molnar
QA Contact: Brock Organ
Depends On:
Reported: 2001-03-28 11:15 UTC by Bill Huang
Modified: 2005-10-31 22:00 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2001-04-16 09:13:36 UTC

Attachments
patch_ipi_tlb_resend_do_nothing_k240.diff (1.17 KB, patch)
2001-03-28 11:17 UTC, Bill Huang
patch_ide_scan-k241.diff (310 bytes, patch)
2001-03-28 11:18 UTC, Bill Huang
br_writelock.diff (731 bytes, patch)
2001-03-28 11:19 UTC, Bill Huang
more info:panic_msg (995 bytes, patch)
2001-03-29 03:59 UTC, Bill Huang
more info:machine information (4.41 KB, patch)
2001-03-29 04:00 UTC, Bill Huang

Description Bill Huang 2001-03-28 11:15:53 UTC
(originally reported by Hitachi)
Test environment: ia64 8-way server CF-1
When they tested a high network workload on the ia64 server (I have asked them
for the details of the test procedure),
they found a read/write lock error.

1. Using ITP to investigate the error, they found the following errors in the function:
 (1) The second (next) address in the socket list is wrong.
     They think the line where the fault occurred is
       if (!ptype->dev || ptype->dev == skb->dev)
 (2) A kernel panic occurred when net_rx_action() was run:
     1) On CPU0, the elements of the socket list in net_rx_action()
        are checked one by one.
     2) On CPU1, a socket release is started.
     3) The addresses released on CPU1 are reused for buffers allocated to a socket.
     4) On CPU0, while the socket list in net_rx_action() is still being checked
        one by one, the stale address is used and the kernel panics.
    During socket handling, the following code paths are presumed to rely on
    the read/write lock:
     - walking the elements of the socket list in net_rx_action()
     - releasing the socket in dev_remove_pack()
2. Based on the investigation, the following patches are considered useful:

(1) The interval for resending the IPI (IPI_FLUSH_TLB) is set to 10 times
    the original value, and handling of the received resend message is
    disabled (it does nothing).
(2) Changed the number of IDE scan attempts from 10 to 2.
(3) Fixed the brlock bug in kernel-2.4-x.
(4) Updated all of the driver versions:
    aic7xxx:  6.0.8 BETA  SCSI    (Adaptec)
    DAC960:   2.4.10      RAID    (Mylex)
    e100:     1.5.0       NIC     (Intel)
    e1000:    3.0.1       GigaNIC (Intel)
    qla2x00:  4.15 beta   FC      (QLogic)

the useful URL is described below:

Comment 1 Bill Huang 2001-03-28 11:17:11 UTC
Created attachment 13924 [details]

Comment 2 Bill Huang 2001-03-28 11:18:10 UTC
Created attachment 13925 [details]

Comment 3 Bill Huang 2001-03-28 11:19:53 UTC
Created attachment 13926 [details]

Comment 4 Arjan van de Ven 2001-03-28 16:18:08 UTC
Dave: is this something we should worry about?

Comment 5 Michael K. Johnson 2001-03-28 23:10:06 UTC
David thinks this is probably legit, but wants Ingo's feedback.

Comment 6 Bill Huang 2001-03-29 03:59:46 UTC
Created attachment 14064 [details]
more info:panic_msg

Comment 7 Bill Huang 2001-03-29 04:00:53 UTC
Created attachment 14065 [details]
more info:machine information

Comment 8 Ingo Molnar 2001-03-29 07:10:40 UTC
The patch is 100% legit. (It's perhaps only because the write path
is so rarely used that we didn't see any problems with this code
earlier.) The bug does not corrupt memory, it's only the write-locking
semantics that were violated.

Comment 9 Bill Huang 2001-03-29 07:46:14 UTC
I have asked Hitachi to send us the test tools they used for network high

Comment 10 Ingo Molnar 2001-04-16 09:13:31 UTC
this patch should be in the current CVS tree, please test.

Comment 11 Bill Nottingham 2001-05-29 18:00:52 UTC
closing as resolved.
