Bug 1277329

Summary: Core dump when running openshift for several days
Product: OpenShift Container Platform
Reporter: Anping Li <anli>
Component: Node
Assignee: Dan Mace <dmace>
Status: CLOSED ERRATA
QA Contact: Jianwei Hou <jhou>
Severity: medium
Priority: medium
Version: 3.0.0
CC: aos-bugs, bleanhar, jeder, jokerman, mmccomas, pweil, xtian
Hardware: Unspecified   
OS: Unspecified   
Doc Type: Bug Fix
Last Closed: 2016-01-26 19:16:38 UTC
Type: Bug

Description Anping Li 2015-11-03 04:57:41 UTC
Description:
During system testing, a large number of 'oc login' and 'oc new-project' commands were run. A core dump appeared on the master.


Version-Release number of selected component (if applicable):
openshift version
openshift v3.0.2.903-114-g2849767
kubernetes v1.2.0-alpha.1-1107-g4c8e6f4
etcd 2.1.2

Environment:
  Linux 10.66.79.249 3.10.0-326.el7.x86_64
  8GB RAM | 2 VCPU | 40.0GB Disk

How reproducible:
Twice during testing.

Steps to Reproduce:

1. Set up the OpenShift environment
2. Run system testing
   About 500 users log in one by one
   Create some new projects and add new applications
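
For reference, the load pattern in step 2 could be sketched roughly as follows. This is not the test harness that was actually used; the server URL, user naming scheme, password, and image name are hypothetical placeholders. By default the script only builds the `oc` command lines and does not contact a cluster.

```python
import subprocess

# Hypothetical values: none of these come from the bug report.
SERVER = "https://master.example.com:8443"
PASSWORD = "redhat"


def build_commands(user_index, app_image="openshift/hello-openshift"):
    """Return the oc command lines for one simulated user."""
    user = f"testuser-{user_index}"
    project = f"{user}-proj"
    return [
        ["oc", "login", SERVER, "-u", user, "-p", PASSWORD],
        ["oc", "new-project", project],
        ["oc", "new-app", app_image],
    ]


def run_load(num_users=500, run=False):
    """Walk through the users one by one; execute oc only when run=True."""
    issued = []
    for i in range(num_users):
        for cmd in build_commands(i):
            if run:
                # check=True stops the loop on the first failing command.
                subprocess.run(cmd, check=True)
            issued.append(cmd)
    return issued


if __name__ == "__main__":
    # Dry run: print the first user's commands without touching a cluster.
    for cmd in run_load(num_users=1):
        print(" ".join(cmd))
```

Calling `run_load(run=True)` against a test cluster would issue the commands sequentially, which matches the "one by one" login pattern described above.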

Actual Result:
The master dumps core:
-rw-------. 1 root root 9.3G Nov  3 06:34 /var/lib/origin/core.7271

Expected Result:
No core dump appears.

Comment 6 Paul Weil 2015-11-09 15:48:06 UTC
Based on https://github.com/openshift/origin/issues/5737#issuecomment-154767531 I am marking this for the upcoming release.

https://github.com/openshift/origin/pull/5760 should help and https://github.com/openshift/origin/pull/5791 is being reviewed.

Comment 7 Anping Li 2015-11-09 23:56:28 UTC
Just adding a note:
The master process was restarted during a longevity run. Sometimes a core dump was created, and sometimes not.

Comment 8 Anping Li 2015-11-16 06:19:02 UTC
Core files were also found on the nodes; the file names are listed here.

Node1:
-rw-------. 1 root root 104706048 Nov  2 01:03 core.7407
-rw-------. 1 root root 105684992 Nov  2 01:09 core.7582

Node2:
[root@10 origin]# ll
total 4767976
-rw-------. 1 root root 431923200 Nov  7 23:22 core.103285
-rw-------. 1 root root 342179840 Nov  8 02:51 core.27960
-rw-------. 1 root root 326107136 Nov  8 05:30 core.37090
-rw-------. 1 root root 298287104 Nov  8 06:50 core.43346
-rw-------. 1 root root 294760448 Nov  8 08:10 core.46545
-rw-------. 1 root root 266563584 Nov  8 09:32 core.48717
-rw-------. 1 root root 280465408 Nov  8 11:33 core.50656
-rw-------. 1 root root 238465024 Nov  8 12:01 core.53502
-rw-------. 1 root root 265490432 Nov  8 12:26 core.54257
-rw-------. 1 root root 277458944 Nov  8 13:14 core.54986
-rw-------. 1 root root 243896320 Nov  8 13:56 core.56204
-rw-------. 1 root root 238051328 Nov  8 14:10 core.57402
-rw-------. 1 root root 338337792 Nov  8 20:17 core.57745
-rw-------. 1 root root 248844288 Nov  8 21:00 core.71455
-rw-------. 1 root root 309309440 Nov  8 23:32 core.72688
-rw-------. 1 root root 770785280 Nov  7 04:43 core.976
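
A listing like the one above can also be collected with a small script. The sketch below is an illustration only, not part of the original test setup; it assumes the core files follow the `core.<pid>` naming seen here and that the directory (for example /var/lib/origin on the master) is passed in by the caller.

```python
import glob
import os
import time


def list_core_files(directory):
    """Return (name, size_bytes, mtime) for each core.* file, oldest first."""
    entries = []
    for path in glob.glob(os.path.join(directory, "core.*")):
        st = os.stat(path)
        entries.append((os.path.basename(path), st.st_size, st.st_mtime))
    return sorted(entries, key=lambda e: e[2])


def total_size(entries):
    """Sum the sizes (in bytes) of all listed core files."""
    return sum(size for _, size, _ in entries)


if __name__ == "__main__":
    # /var/lib/origin is the location reported for the master core file.
    found = list_core_files("/var/lib/origin")
    for name, size, mtime in found:
        stamp = time.strftime("%b %d %H:%M", time.localtime(mtime))
        print(f"{stamp}  {size:>12}  {name}")
    print(f"{len(found)} core files, {total_size(found)} bytes in total")
```

Sorting by modification time makes it easier to see how frequently the process crashed over the course of a run.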

Comment 9 Anping Li 2015-11-18 01:26:19 UTC
I have stored all the core dump files; leave me a message if anyone needs them.

Comment 11 Dan Mace 2016-01-07 14:02:11 UTC
Let's re-test to see if the core dumps are still occurring since the referenced PRs have been merged. Several memory leaks have been plugged since this issue was filed which could have been responsible for the crashes.

Comment 12 Anping Li 2016-01-08 07:32:32 UTC
I will run testing for about 3 days and update the bug with the result.

Comment 13 Anping Li 2016-01-12 03:04:17 UTC
Ran reliability testing for 4 days with no core dump, so moving the bug to verified.

Comment 15 errata-xmlrpc 2016-01-26 19:16:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:0070