
Bug 852384

Summary: kernel crash in tcp_mark_head_lost()
Product: Red Hat Enterprise Linux 6
Reporter: Vasily Averin <vvs>
Component: kernel
Assignee: Red Hat Kernel Manager <kernel-mgr>
Status: CLOSED DUPLICATE
QA Contact: Network QE <network-qe>
Severity: unspecified
Priority: unspecified
Version: 6.3
CC: haliu, khorenko, kzhang
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2012-09-24 01:08:48 UTC
Type: Bug
Attachments:
dmesg buffer extracted from core dump of crashed kernel (flags: none)

Description Vasily Averin 2012-08-28 10:42:29 UTC
Created attachment 607473
dmesg buffer extracted from core dump of crashed kernel

Description of problem:
An OpenVZ customer reports kernel crashes in tcp_mark_head_lost().
They never observed such crashes before upgrading to a RHEL 6.3-based kernel
(OpenVZ kernel 042stab059.7 is based on the RHEL 6.3 kernel 2.6.32-279.1.1.el6).

tcp_ack() ->
  if (tcp_ack_is_dubious(sk, flag))
    tcp_fastretrans_alert() ->
      tcp_update_scoreboard() -> 
        if (tcp_is_fack(tp)) ->
          tcp_mark_head_lost() ->
            tcp_for_write_queue_from(skb, sk) ->
                 skb = skb->next ===> OOPS: skb becomes NULL and is then dereferenced

The OpenVZ kernel team believes the problem happens because
struct tcp_sock *tp->lost_skb_hint refers to a stale skb that is no longer on the write queue:

struct sock *sk == struct tcp_sock *tp == %RDI = 0xffff8804786c3600

tcp_mark_head_lost ()
...
        if (tp->lost_skb_hint) {
                skb = tp->lost_skb_hint;
                cnt = tp->lost_cnt_hint;

crash>  struct tcp_sock.lost_skb_hint 0xffff8804786c3600
   lost_skb_hint = 0xffff880716c29000
crash>  struct tcp_sock.lost_cnt_hint 0xffff8804786c3600
   lost_cnt_hint = 0x0
crash>  sk_buff 0xffff880716c29000
struct sk_buff {
   next = 0x0,
   prev = 0x0,
   sk = 0xffff8804786c3600,
...

Version-Release number of selected component (if applicable):
OpenVZ kernel 2.6.32-042stab059.7 (based on RHEL kernel 2.6.32-279.1.1.el6)

How reproducible:
Crashed 2 times so far (out of an unknown number of attempts)

Steps to Reproduce:
Unknown; evidently caused by stale NFS.

Actual results:
crash in tcp_mark_head_lost()

Expected results:
no crashes

Additional info:
The OpenVZ kernel team expects this issue to be fixed by the following mainline patch:

commit 8818a9d884e3a589899be3303958fff182e98e55
Author: Ilpo Järvinen <ilpo.jarvinen>
Date:   Wed Dec 2 22:24:02 2009 -0800

    tcp: clear hints to avoid a stale one (nfs only affected?)

    Eric Dumazet mentioned in a context of another problem:

    "Well, it seems NFS reuses its socket, so maybe we miss some
    cleaning as spotted in this old patch"

    I've not check under which conditions that actually happens but
    if true, we need to make sure we don't accidently leave stale
    hints behind when the write queue had to be purged (whether reusing
    with NFS can actually happen if purging took place is something I'm
    not sure of).

    ...At least it compiles.

    Signed-off-by: Ilpo Järvinen <ilpo.jarvinen>
    Signed-off-by: David S. Miller <davem>

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 28b04ff..e2d2ca2 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1229,6 +1229,7 @@ static inline void tcp_write_queue_purge(struct sock *sk)
        while ((skb = __skb_dequeue(&sk->sk_write_queue)) != NULL)
                sk_wmem_free_skb(sk, skb);
        sk_mem_reclaim(sk);
+       tcp_clear_all_retrans_hints(tcp_sk(sk));
 }

 static inline struct sk_buff *tcp_write_queue_head(struct sock *sk)

Comment 2 Hangbin Liu 2012-09-24 01:08:48 UTC

*** This bug has been marked as a duplicate of bug 807704 ***