Created attachment 607473
dmesg buffer extracted from the core dump of the crashed kernel
Description of problem:
An OpenVZ customer reports kernel crashes in tcp_mark_head_lost().
They never observed such crashes before upgrading to a RHEL 6.3-based kernel
(OpenVZ kernel 042stab059.7 is based on the RHEL 6.3 kernel 2.6.32-279.1.1.el6).
tcp_ack() ->
  if (tcp_ack_is_dubious(sk, flag))
    tcp_fastretrans_alert() ->
      tcp_update_scoreboard() ->
        if (tcp_is_fack(tp)) ->
          tcp_mark_head_lost() ->
            tcp_for_write_queue_from(skb, sk) ->
              skb = skb->next ===> OOPS because skb is NULL
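The last step is where the oops happens. As far as we can tell from the 2.6.32 sources, tcp_for_write_queue_from() is a wrapper around skb_queue_walk_from(), a for loop that keeps following skb->next until it arrives back at the write-queue head, which acts as the sentinel of a circular doubly-linked list. The userspace model below is a sketch of that walk under this assumption; struct node, struct queue, queue_tail() and walk_from_checked() are hypothetical names, not kernel code. Unlike the real macro, the model stops and returns -1 when it meets a NULL next pointer instead of dereferencing it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an sk_buff_head-style queue: a circular
 * doubly-linked list whose head is a sentinel node. */
struct node {
	struct node *next;
	struct node *prev;
};

struct queue {
	struct node head;	/* sentinel: head.next is the first element */
};

static void queue_init(struct queue *q)
{
	q->head.next = &q->head;
	q->head.prev = &q->head;
}

/* Append n at the tail; n must not already be linked anywhere. */
static void queue_tail(struct queue *q, struct node *n)
{
	assert(n->next == NULL && n->prev == NULL);
	n->prev = q->head.prev;
	n->next = &q->head;
	q->head.prev->next = n;
	q->head.prev = n;
}

/* Walk from 'start' until the sentinel is reached, in the spirit of
 * skb_queue_walk_from(). Returns the number of nodes visited, or -1
 * if a NULL next pointer is met (the real macro would dereference it
 * and oops at that point). */
static int walk_from_checked(struct queue *q, struct node *start)
{
	int visited = 0;
	struct node *n = start;

	while (n != &q->head) {
		if (n == NULL)
			return -1;
		visited++;
		n = n->next;
	}
	return visited;
}
```

Starting from a node that is still linked, the walk ends cleanly at the sentinel. Starting from a node whose next is NULL, the second iteration has nothing valid to follow, which is the situation the backtrace points to.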
The OpenVZ kernel team believes the problem happened because
struct tcp_sock *tp->lost_skb_hint refers to an incorrect skb:
struct sock *sk == struct tcp_sock *tp == %RDI = 0xffff8804786c3600
tcp_mark_head_lost()
...
	if (tp->lost_skb_hint) {
		skb = tp->lost_skb_hint;
		cnt = tp->lost_cnt_hint;
crash> struct tcp_sock.lost_skb_hint 0xffff8804786c3600
  lost_skb_hint = 0xffff880716c29000
crash> struct tcp_sock.lost_cnt_hint 0xffff8804786c3600
  lost_cnt_hint = 0x0
crash> sk_buff 0xffff880716c29000
struct sk_buff {
  next = 0x0,
  prev = 0x0,
  sk = 0xffff8804786c3600,
  ...
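The dump is consistent with an skb that has already been taken off the write queue: as far as we can tell from the 2.6.32 sources, __skb_dequeue() unlinks the buffer through __skb_unlink(), which sets skb->next and skb->prev to NULL and leaves skb->sk untouched. A hint stored in the tcp_sock before such a purge keeps pointing at that buffer. The sketch below models this under those assumptions; struct buf, struct wqueue, struct owner, wq_dequeue() and wq_purge() are hypothetical names. Unlike the kernel, the model does not free the dequeued buffers, so the stale pointer can be inspected safely.

```c
#include <assert.h>
#include <stddef.h>

struct owner;

/* Hypothetical model of the sk_buff fields seen in the dump. */
struct buf {
	struct buf *next;
	struct buf *prev;
	struct owner *sk;	/* models skb->sk */
};

struct wqueue {
	struct buf head;	/* sentinel of a circular list */
	int len;
};

struct owner {
	struct wqueue write_queue;	/* models sk->sk_write_queue */
	struct buf *lost_hint;		/* models tp->lost_skb_hint */
};

static void wq_init(struct wqueue *q)
{
	q->head.next = &q->head;
	q->head.prev = &q->head;
	q->head.sk = NULL;
	q->len = 0;
}

static void wq_tail(struct wqueue *q, struct buf *b)
{
	assert(b->next == NULL && b->prev == NULL);
	b->prev = q->head.prev;
	b->next = &q->head;
	q->head.prev->next = b;
	q->head.prev = b;
	q->len++;
}

/* Detach the first buffer and clear its list pointers, as assumed for
 * __skb_dequeue()/__skb_unlink(). Returns NULL if the queue is empty. */
static struct buf *wq_dequeue(struct wqueue *q)
{
	struct buf *b = q->head.next;

	if (b == &q->head)
		return NULL;
	b->next->prev = b->prev;
	b->prev->next = b->next;
	b->next = NULL;
	b->prev = NULL;
	q->len--;
	return b;
}

/* Purge without touching lost_hint (the pre-fix behaviour). The kernel
 * frees each buffer here; the model only detaches it. */
static void wq_purge(struct wqueue *q)
{
	while (wq_dequeue(q) != NULL)
		;
}
```

After the purge the hint still refers to a buffer whose next and prev are NULL while its owner pointer still names the socket, which matches the three fields shown by crash> sk_buff above.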
Version-Release number of selected component (if applicable):
2.6.32-279.1.1.el6-based openVZ kernel 2.6.32-042stab059.7
How reproducible:
Observed twice so far; reproduction rate unknown (2/?).
Steps to Reproduce:
Unknown; apparently triggered by stale NFS.
Actual results:
crash in tcp_mark_head_lost()
Expected results:
no crashes
Additional info:
The OpenVZ kernel team expects this issue to be fixed by the following mainline patch:
commit 8818a9d884e3a589899be3303958fff182e98e55
Author: Ilpo Järvinen <ilpo.jarvinen>
Date: Wed Dec 2 22:24:02 2009 -0800
tcp: clear hints to avoid a stale one (nfs only affected?)
Eric Dumazet mentioned in a context of another problem:
"Well, it seems NFS reuses its socket, so maybe we miss some
cleaning as spotted in this old patch"
I've not check under which conditions that actually happens but
if true, we need to make sure we don't accidently leave stale
hints behind when the write queue had to be purged (whether reusing
with NFS can actually happen if purging took place is something I'm
not sure of).
...At least it compiles.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen>
Signed-off-by: David S. Miller <davem>
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 28b04ff..e2d2ca2 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1229,6 +1229,7 @@ static inline void tcp_write_queue_purge(struct sock *sk)
 	while ((skb = __skb_dequeue(&sk->sk_write_queue)) != NULL)
 		sk_wmem_free_skb(sk, skb);
 	sk_mem_reclaim(sk);
+	tcp_clear_all_retrans_hints(tcp_sk(sk));
 }
 
 static inline struct sk_buff *tcp_write_queue_head(struct sock *sk)
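The patch makes tcp_write_queue_purge() also call tcp_clear_all_retrans_hints(), which in kernels of this era appears to reset tp->lost_skb_hint along with the other retransmit hints. Combined with the start selection quoted from tcp_mark_head_lost() above, a cleared hint means the next walk no longer starts from a buffer that has left the queue. The sketch below models only that interaction. struct sock_model, enqueue(), purge(), clear_hints() and walk_start() are hypothetical names, the queue is reduced to a NULL-terminated singly linked list, and falling back to the queue head when no hint is set is an assumption about the part of tcp_mark_head_lost() elided with "..." in the report.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of the fields involved. */
struct buf {
	struct buf *next;
};

struct sock_model {
	struct buf *write_queue;	/* first queued buffer, NULL if empty */
	struct buf *lost_skb_hint;	/* models tp->lost_skb_hint */
};

/* Returns 1 if b is currently linked on sk's write queue. */
static int on_queue(const struct sock_model *sk, const struct buf *b)
{
	for (const struct buf *p = sk->write_queue; p != NULL; p = p->next)
		if (p == b)
			return 1;
	return 0;
}

static void enqueue(struct sock_model *sk, struct buf *b)
{
	struct buf **pp = &sk->write_queue;

	assert(!on_queue(sk, b));	/* a buffer is queued at most once */
	while (*pp != NULL)
		pp = &(*pp)->next;
	b->next = NULL;
	*pp = b;
}

/* Models tcp_clear_all_retrans_hints(), reduced to the lost hint. */
static void clear_hints(struct sock_model *sk)
{
	sk->lost_skb_hint = NULL;
}

/* Purge the queue; with 'fixed' set, also clear the hint as the
 * mainline patch does. Buffers are detached but not freed. */
static void purge(struct sock_model *sk, int fixed)
{
	while (sk->write_queue != NULL) {
		struct buf *b = sk->write_queue;

		sk->write_queue = b->next;
		b->next = NULL;
	}
	if (fixed)
		clear_hints(sk);
}

/* Start selection: use the hint if set, else (assumed) the queue head. */
static struct buf *walk_start(const struct sock_model *sk)
{
	return sk->lost_skb_hint != NULL ? sk->lost_skb_hint : sk->write_queue;
}
```

In a scenario where the socket is purged and then reused for new data, the unpatched purge leaves walk_start() returning a buffer that is no longer on the queue, while the patched purge makes it return the freshly queued buffer.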