Bug 1372647

Summary: Async cross data center replication leads to OOM error under load
Product: [JBoss] JBoss Data Grid 6
Reporter: Martin Gencur <mgencur>
Component: Performance
Assignee: Tristan Tarrant <ttarrant>
Status: NEW
QA Contact: Martin Gencur <mgencur>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 6.6.1
CC: afield
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Attachments: Logs and config files (flags: none)

Description Martin Gencur 2016-09-02 09:37:06 UTC
Created attachment 1197102
Logs and config files

During a write-heavy load test, JDG throws an OutOfMemoryError in the backup data center. This happens after roughly 10 minutes of heavy load (6 HotRod clients writing without any delay between requests; the overall load is about 1300 requests/s with 33 kB values, write-only).
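For reference, the client-side load was generated roughly along these lines on each driver node (a minimal sketch only, not the actual test driver; the server host name, HotRod port 11222, cache name and key naming are assumptions):

import org.infinispan.client.hotrod.RemoteCache;
import org.infinispan.client.hotrod.RemoteCacheManager;
import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;

import java.util.concurrent.ThreadLocalRandom;

public class WriteLoadClient {
    public static void main(String[] args) {
        // Connect to one of the LON servers (host name and HotRod port are hypothetical)
        ConfigurationBuilder cb = new ConfigurationBuilder();
        cb.addServer().host("edg-perf02").port(11222);
        RemoteCacheManager rcm = new RemoteCacheManager(cb.build());
        RemoteCache<String, byte[]> cache = rcm.getCache("testCache");

        byte[] value = new byte[33 * 1024];   // 33 kB payload, as in the test
        ThreadLocalRandom rnd = ThreadLocalRandom.current();

        // Write as fast as possible, with no delay between requests,
        // over a fixed key space of ten thousand entries
        while (true) {
            cache.put("key-" + rnd.nextInt(10000), value);
        }
    }
}

Six such clients were run in parallel against the LON servers.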

Description of test scenario:
* two data centers with two JDG servers each: LON with nodes A and B, NYC with nodes C and D
* six HotRod clients writing data only to LON (33 kB values, writing as quickly as possible)
* ASYNC replication between the DCs (see the configuration sketch after this list)
* JGroups RELAY2 uses multiple site masters, set to 2 (all nodes are site masters)
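For context, the ASYNC backup from LON to NYC corresponds roughly to the following cross-site setup, shown here with Infinispan's programmatic API rather than the actual JDG server XML (a sketch only; the cache mode and names are assumptions):

import org.infinispan.configuration.cache.BackupConfiguration.BackupStrategy;
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.configuration.global.GlobalConfiguration;
import org.infinispan.configuration.global.GlobalConfigurationBuilder;

public class CrossSiteConfigSketch {
    public static void main(String[] args) {
        // The local site name must match the site announced by JGroups RELAY2 (LON here)
        GlobalConfiguration global = new GlobalConfigurationBuilder()
                .site().localSite("LON")
                .transport().defaultTransport()
                .build();

        // Every write to this cache is also forwarded to the NYC site, asynchronously
        Configuration cacheConfig = new ConfigurationBuilder()
                .clustering().cacheMode(CacheMode.DIST_SYNC)  // cache mode is an assumption
                .sites().addBackup()
                    .site("NYC")
                    .strategy(BackupStrategy.ASYNC)           // ASYNC cross-DC replication
                .build();

        // global and cacheConfig would then be passed to a DefaultCacheManager;
        // in the actual test the equivalent settings live in the JDG server configuration
    }
}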

The logs from individual nodes show the following pattern:
1) node C (in receiving data center NYC): [GC (Allocation Failure) [PSYoungGen: 1048576K->56409K(1223168K)] 1133628K->141469K(4019712K), 0.1291044 secs] [Times: user=0.46 sys=0.01, real=0.13 secs]
2) node C: java.lang.OutOfMemoryError: Java heap space
3) node A (in sending data center LON): WARN  [org.jgroups.protocols.TCP] (HotRodServerWorker-3) Discarding message because TCP send_queue is full and hasn't been releasing for 300 ms
4) node A: WARN  [org.jgroups.protocols.TCP] (ConnectionMap.Acceptor [172.18.1.4:7610]) JGRP000006: failed accepting connection from peer: java.net.SocketTimeoutException: Read timed out
5) node A: ERROR [org.jgroups.protocols.relay.RELAY2] (HotRodServerWorker-3) node0/LON: no route to NYC: dropping message

We also created a heap dump on the OOM error on node C, and the interesting part is the following:

Class Name                                                   | Shallow Heap | Retained Heap
--------------------------------------------------------------------------------------------
org.jgroups.util.Table @ 0x7002337c0                         |          112 | 3,043,681,288
org.infinispan.container.DefaultDataContainer @ 0x700190b50  |           56 |   331,759,576
--------------------------------------------------------------------------------------------

Note: the overall heap is 4 GB. We keep writing to only ten thousand entries of 33 kB each, i.e. about 330 MB overall (which corresponds to the data container value above).

Attaching logs and config files. Nodes edg-perf02 and edg-perf03 are nodes A and B from the description above; edg-perf04 and edg-perf05 are nodes C and D; edg-perf06 and edg-perf07 are the nodes running the HotRod clients (but only edg-perf06 writes data).