Description:
During system testing, there are a lot of 'oc login' and 'oc new-project' operations. A core dump appears on the master.

Version-Release number of selected component (if applicable):
openshift version
openshift v3.0.2.903-114-g2849767
kubernetes v1.2.0-alpha.1-1107-g4c8e6f4
etcd 2.1.2

Environment:
Linux 10.66.79.249 3.10.0-326.el7.x86_64
8GB RAM | 2 VCPU | 40.0GB Disk

How reproducible:
Twice during testing.

Steps to Reproduce:
1. Set up the OpenShift environment.
2. Run system testing:
   - About 500 users log in one by one.
   - Create some new projects and add new applications.

Actual Result:
The master dumps core:
-rw-------. 1 root root 9.3G Nov 3 06:34 /var/lib/origin/core.7271

Expected Result:
No core dump appears.
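For anyone trying to reproduce this, the load pattern in the steps above could be scripted roughly as follows. This is only a sketch: the master URL, the user naming scheme, the password, the project names, and the application image are all assumptions and not taken from the actual test suite.

```shell
#!/usr/bin/env bash
# Sketch of the reported load pattern: users log in one by one, and each
# creates a project and adds an application.
# ASSUMPTIONS (hypothetical, not from the report): MASTER_URL, the
# "testuserN" accounts, the password, the "proj-N" names, and the image.
set -u

MASTER_URL="${MASTER_URL:-https://master.example.com:8443}"
USER_COUNT="${USER_COUNT:-500}"
OC="${OC:-oc}"   # can be pointed at a stub for a dry run

run_load() {
  local i user
  for i in $(seq 1 "$USER_COUNT"); do
    user="testuser${i}"
    # Log in as the next user, then create a project and an app in it.
    "$OC" login "$MASTER_URL" -u "$user" -p "changeme" || return 1
    "$OC" new-project "proj-${i}" || return 1
    "$OC" new-app "openshift/hello-openshift" || return 1
  done
}
```

Calling `run_load` after sourcing the script starts the loop; it stops at the first failing `oc` command so a master restart shows up as an early exit.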
Based on https://github.com/openshift/origin/issues/5737#issuecomment-154767531 I am marking this for the upcoming release. https://github.com/openshift/origin/pull/5760 should help and https://github.com/openshift/origin/pull/5791 is being reviewed.
Just to add a note: the master process was restarted during a longevity run. Sometimes a core dump was created, and sometimes it was not.
Also found core files on the nodes; the file names are listed here.

Node1:
-rw-------. 1 root root 104706048 Nov 2 01:03 core.7407
-rw-------. 1 root root 105684992 Nov 2 01:09 core.7582

Node2:
[root@10 origin]# ll
total 4767976
-rw-------. 1 root root 431923200 Nov 7 23:22 core.103285
-rw-------. 1 root root 342179840 Nov 8 02:51 core.27960
-rw-------. 1 root root 326107136 Nov 8 05:30 core.37090
-rw-------. 1 root root 298287104 Nov 8 06:50 core.43346
-rw-------. 1 root root 294760448 Nov 8 08:10 core.46545
-rw-------. 1 root root 266563584 Nov 8 09:32 core.48717
-rw-------. 1 root root 280465408 Nov 8 11:33 core.50656
-rw-------. 1 root root 238465024 Nov 8 12:01 core.53502
-rw-------. 1 root root 265490432 Nov 8 12:26 core.54257
-rw-------. 1 root root 277458944 Nov 8 13:14 core.54986
-rw-------. 1 root root 243896320 Nov 8 13:56 core.56204
-rw-------. 1 root root 238051328 Nov 8 14:10 core.57402
-rw-------. 1 root root 338337792 Nov 8 20:17 core.57745
-rw-------. 1 root root 248844288 Nov 8 21:00 core.71455
-rw-------. 1 root root 309309440 Nov 8 23:32 core.72688
-rw-------. 1 root root 770785280 Nov 7 04:43 core.976
I have stored all the core dump files; leave me a message if anyone needs them.
Let's re-test to see if the core dumps are still occurring since the referenced PRs have been merged. Several memory leaks have been plugged since this issue was filed which could have been responsible for the crashes.
I will run the tests for about 3 days and update with the result.
Ran reliability testing for 4 days and no core dump appeared, so moving the bug to verified.
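A check like the one below is one way to confirm that no core files were left behind after a run. It is a minimal sketch, not the script used in the actual verification; the default directory is the master path from the original description, and the function names are made up for illustration.

```shell
#!/usr/bin/env bash
# List core dump files (core.<pid>) directly inside a directory.
# The default path is the master location from the report; pass another
# directory as the first argument to check a different host layout.
list_cores() {
  local dir="${1:-/var/lib/origin}"
  find "$dir" -maxdepth 1 -type f -name 'core.*' 2>/dev/null
}

# Print how many core files were found.
count_cores() {
  list_cores "${1:-}" | wc -l
}
```

Running `count_cores` on the master and `count_cores <dir>` on each node before and after the test run gives a quick comparison.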
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2016:0070