Bug 1277329 - Core dump when running openshift for several days
Summary: Core dump when running openshift for several days
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.0.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Dan Mace
QA Contact: Jianwei Hou
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-11-03 04:57 UTC by Anping Li
Modified: 2016-01-26 19:16 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-01-26 19:16:38 UTC
Target Upstream Version:
Embargoed:


Links
System: Red Hat Product Errata
ID: RHSA-2016:0070
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat OpenShift Enterprise 3.1.1 bug fix and enhancement update
Last Updated: 2016-01-27 00:12:41 UTC

Description Anping Li 2015-11-03 04:57:41 UTC
Description:
During system testing, which runs a lot of 'oc login' and 'oc new-project' commands, a core dump appeared on the master.


Version-Release number of selected component (if applicable):
openshift version
openshift v3.0.2.903-114-g2849767
kubernetes v1.2.0-alpha.1-1107-g4c8e6f4
etcd 2.1.2

Environment:
  Linux 10.66.79.249 3.10.0-326.el7.x86_64
  8GB RAM | 2 VCPU | 40.0GB Disk

How reproducible:
Occurred twice during testing.

Steps to Reproduce:

1. Set up the OpenShift environment.
2. Run system testing:
   About 500 users log in one by one.
   Create some new projects and add new applications.
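
The load in step 2 could be driven by a small script along the following lines. This is only a sketch under stated assumptions, not the test harness that was actually used: the user and project naming scheme, the password, the server URL and the application source are all hypothetical placeholders. Only the `oc login`, `oc new-project` and `oc new-app` subcommands are taken from the report.

```python
import subprocess
from typing import Iterator, List


def build_commands(server: str, users: int, app_source: str) -> Iterator[List[str]]:
    """Yield the oc invocations for each simulated user, one user at a time."""
    for i in range(1, users + 1):
        user = f"testuser{i}"        # hypothetical naming scheme
        project = f"{user}-project"  # hypothetical naming scheme
        # Placeholder password; a real run would use real credentials.
        yield ["oc", "login", server, "-u", user, "-p", "placeholder-password"]
        yield ["oc", "new-project", project]
        yield ["oc", "new-app", app_source]


def run_load(server: str, users: int, app_source: str, dry_run: bool = True) -> int:
    """Print (dry run) or execute each command; return how many were issued."""
    count = 0
    for cmd in build_commands(server, users, app_source):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the run at the first failing oc command.
            subprocess.run(cmd, check=True)
        count += 1
    return count
```

Calling `run_load("https://master.example.com:8443", users=500, app_source="example/image")` only prints the commands. Passing `dry_run=False` would execute them against the given (assumed) master URL.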

Actual Result:
The master dumps core:
-rw-------. 1 root root 9.3G Nov  3 06:34 /var/lib/origin/core.7271

Expected Result:
No core dump appears.

Comment 6 Paul Weil 2015-11-09 15:48:06 UTC
Based on https://github.com/openshift/origin/issues/5737#issuecomment-154767531 I am marking this for the upcoming release.

https://github.com/openshift/origin/pull/5760 should help and https://github.com/openshift/origin/pull/5791 is being reviewed.

Comment 7 Anping Li 2015-11-09 23:56:28 UTC
Just adding a note:
The master process was restarted during the longevity run. Sometimes a core dump was created, sometimes not.

Comment 8 Anping Li 2015-11-16 06:19:02 UTC
Core files were also found on the nodes; the file names are listed here.

Node1:
-rw-------. 1 root root 104706048 Nov  2 01:03 core.7407
-rw-------. 1 root root 105684992 Nov  2 01:09 core.7582

Node2:
[root@10 origin]# ll
total 4767976
-rw-------. 1 root root 431923200 Nov  7 23:22 core.103285
-rw-------. 1 root root 342179840 Nov  8 02:51 core.27960
-rw-------. 1 root root 326107136 Nov  8 05:30 core.37090
-rw-------. 1 root root 298287104 Nov  8 06:50 core.43346
-rw-------. 1 root root 294760448 Nov  8 08:10 core.46545
-rw-------. 1 root root 266563584 Nov  8 09:32 core.48717
-rw-------. 1 root root 280465408 Nov  8 11:33 core.50656
-rw-------. 1 root root 238465024 Nov  8 12:01 core.53502
-rw-------. 1 root root 265490432 Nov  8 12:26 core.54257
-rw-------. 1 root root 277458944 Nov  8 13:14 core.54986
-rw-------. 1 root root 243896320 Nov  8 13:56 core.56204
-rw-------. 1 root root 238051328 Nov  8 14:10 core.57402
-rw-------. 1 root root 338337792 Nov  8 20:17 core.57745
-rw-------. 1 root root 248844288 Nov  8 21:00 core.71455
-rw-------. 1 root root 309309440 Nov  8 23:32 core.72688
-rw-------. 1 root root 770785280 Nov  7 04:43 core.976
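
A listing like the one above can be gathered on each host with a short script. The following is a minimal sketch, assuming the core files are named `core.<pid>` and sit directly in one directory (`/var/lib/origin` in the master's case, per the description). The function names are made up for illustration.

```python
from pathlib import Path
from typing import List, Tuple


def list_core_files(directory: str) -> List[Tuple[str, int, float]]:
    """Return (name, size_bytes, mtime) for each core.* file, oldest first."""
    entries = []
    for path in Path(directory).glob("core.*"):
        if path.is_file():
            st = path.stat()
            entries.append((path.name, st.st_size, st.st_mtime))
    # Sorting by modification time shows how often the process crashed.
    return sorted(entries, key=lambda entry: entry[2])


def total_size(entries: List[Tuple[str, int, float]]) -> int:
    """Sum the apparent sizes in bytes of the listed core files."""
    return sum(size for _, size, _ in entries)
```

Note that `st_size` is the apparent file size. Core files are often sparse, so the disk usage reported on the `total` line of `ls -l` can be smaller than the sum of the sizes.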

Comment 9 Anping Li 2015-11-18 01:26:19 UTC
I have stored all the core dump files; leave me a message if anyone needs them.

Comment 11 Dan Mace 2016-01-07 14:02:11 UTC
Let's re-test to see if the core dumps are still occurring since the referenced PRs have been merged. Several memory leaks have been plugged since this issue was filed which could have been responsible for the crashes.

Comment 12 Anping Li 2016-01-08 07:32:32 UTC
I will run the tests for about 3 days and update with the result.

Comment 13 Anping Li 2016-01-12 03:04:17 UTC
Ran reliability testing for 4 days with no core dump, so moving the bug to VERIFIED.

Comment 15 errata-xmlrpc 2016-01-26 19:16:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:0070

