Bug 1884739 - Node process segfaulted
Summary: Node process segfaulted
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.7.0
Assignee: Kelvin Fan
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-10-02 18:24 UTC by Michael McCune
Modified: 2021-02-24 15:22 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: gensnippet_if contained a recursive call that was overflowing the stack.
Consequence: The process segfaulted.
Fix: Corrected the logic error in the recursive call.
Result: No more segfaults.
Clone Of:
Environment:
Node process segfaulted
Last Closed: 2021-02-24 15:22:23 UTC
Target Upstream Version:
Embargoed:


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHSA-2020:5633  0        None      None    None     2021-02-24 15:22:55 UTC

Description Michael McCune 2020-10-02 18:24:22 UTC
test:
Node process segfaulted 

is failing frequently in CI, see search results:
https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=Node+process+segfaulted

https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-ovn-4.6/1312061952540807168

nodes/ip-10-0-249-199.us-east-2.compute.internal/journal.gz:Oct 02 16:21:44.182453 ip-10-0-249-199 kernel: gensnippet_if[1443]: segfault at 7ffc7fdbefc8 ip 00007f720a676af5 sp 00007ffc7fdbefb0 error 6 in libc-2.28.so[7f720a64f000+1b9000]
nodes/ip-10-0-249-199.us-east-2.compute.internal/journal.gz:Oct 02 16:22:06.595875 ip-10-0-249-199 kernel: gensnippet_if[6038]: segfault at 7ffc600a0ff8 ip 00007f62c38fcadf sp 00007ffc600a1000 error 6 in libc-2.28.so[7f62c38d5000+1b9000]
nodes/ip-10-0-249-199.us-east-2.compute.internal/journal.gz:Oct 02 16:22:31.868243 ip-10-0-249-199 kernel: gensnippet_if[10458]: segfault at 7ffee4a93f10 ip 000055a05a1b25d9 sp 00007ffee4a93ec0 error 6 in bash[55a05a179000+108000]
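A quick way to read the kernel lines above: in each one the faulting address (the value after "segfault at") sits within a few dozen bytes of the stack pointer (sp), and "error 6" on x86 decodes to a user-mode write to a page that is not present. That pattern is typical of a stack that has run out of room. The following is a minimal bash sketch for pulling that distance out of a journal line; the function name fault_sp_distance is hypothetical and not part of any tool mentioned in this bug.

```shell
#!/usr/bin/env bash
# Sketch only: fault_sp_distance is a hypothetical helper, not an existing tool.
# Given one kernel "segfault at ... ip ... sp ... error N" line, print the
# absolute distance in bytes between the faulting address and the stack pointer.
fault_sp_distance() {
  local line=$1
  local re='segfault at ([0-9a-f]+) ip ([0-9a-f]+) sp ([0-9a-f]+) error ([0-9]+)'
  if [[ $line =~ $re ]]; then
    local at=$(( 16#${BASH_REMATCH[1]} ))   # faulting address
    local sp=$(( 16#${BASH_REMATCH[3]} ))   # stack pointer at fault time
    local d=$(( at - sp ))
    echo "${d#-}"                           # strip the sign to get |at - sp|
  else
    return 1                                # not a segfault line
  fi
}
```

Feeding the three lines above through this function gives 24, 8 and 80 bytes respectively. A small distance is only a hint, not proof, of stack exhaustion.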

Comment 1 Tom Sweeney 2020-10-02 18:43:58 UTC
Seth, this is on the OCP 4.6 list; can you please evaluate?

Comment 3 Kelvin Fan 2020-10-02 19:11:26 UTC
This is indeed caused by the same bug as https://bugzilla.redhat.com/show_bug.cgi?id=1884236.
gensnippet_if contains a recursive call that is overflowing the stack.

The fix is in console-login-helper-messages v0.19-3 and has been sync'ed into the 4.6 plashets.
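For readers unfamiliar with this failure mode: a shell function that calls itself without ever reaching its base case keeps adding stack frames until the process crashes. The sketch below is illustrative only and is NOT the gensnippet_if source or the actual fix; the names countdown and MAX_DEPTH are hypothetical. It shows a recursive bash function with an explicit depth guard, so that a logic error ends in a clean failure rather than a segfault.

```shell
#!/usr/bin/env bash
# Illustrative only: this is NOT the real gensnippet_if code or its fix.
# MAX_DEPTH and countdown are hypothetical names chosen for this sketch.
MAX_DEPTH=50

# Recurse until n reaches zero, but refuse to go deeper than MAX_DEPTH frames.
countdown() {
  local n=$1 depth=${2:-0}
  if (( depth > MAX_DEPTH )); then
    echo "recursion limit exceeded" >&2
    return 1                      # fail cleanly instead of exhausting the stack
  fi
  if (( n <= 0 )); then
    return 0                      # base case reached
  fi
  countdown $(( n - 1 )) $(( depth + 1 ))
}
```

With these values, `countdown 10` succeeds, while `countdown 1000` stops at the guard and returns a non-zero status.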

Comment 4 Micah Abbott 2020-10-05 13:13:06 UTC
The new version of `console-login-helper-messages` landed in 46.82.202010021940-0, which is available in the latest nightly-4.6 payloads.

Comment 5 David Eads 2020-10-07 20:24:38 UTC
Node process segfaulted

Seeing if that helps the sippy test linking.

Comment 7 Michael Nguyen 2020-10-27 01:55:19 UTC
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2020-10-26-124513   True        False         5h56m   Cluster version is 4.7.0-0.nightly-2020-10-26-124513
$ oc get nodes
NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-0-136-111.us-west-2.compute.internal   Ready    worker   6h11m   v1.19.0+e67f5dc
ip-10-0-138-139.us-west-2.compute.internal   Ready    master   6h16m   v1.19.0+e67f5dc
ip-10-0-166-15.us-west-2.compute.internal    Ready    worker   6h11m   v1.19.0+e67f5dc
ip-10-0-187-209.us-west-2.compute.internal   Ready    master   6h16m   v1.19.0+e67f5dc
ip-10-0-218-114.us-west-2.compute.internal   Ready    worker   6h6m    v1.19.0+e67f5dc
ip-10-0-219-131.us-west-2.compute.internal   Ready    master   6h17m   v1.19.0+e67f5dc

$ oc debug node/ip-10-0-218-114.us-west-2.compute.internal -- chroot /host rpm -q console-login-helper-messages
Starting pod/ip-10-0-218-114us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
console-login-helper-messages-0.19-3.rhaos4.6.el8.noarch

Removing debug pod ...
$
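The check above covers a single worker. To repeat it on every node, a loop along the following lines could be used. This is a sketch under assumptions: the function name check_clhm_version is hypothetical, and it presumes a logged-in `oc` client with permission to start debug pods. It uses only the `oc get nodes` and `oc debug` commands shown above.

```shell
#!/usr/bin/env bash
# Sketch only: check_clhm_version is a hypothetical helper name.
# Assumes `oc` is logged in to the cluster and may create debug pods.
check_clhm_version() {
  local node
  # `oc get nodes -o name` prints one "node/<name>" entry per line.
  for node in $(oc get nodes -o name); do
    printf '%s: ' "$node"
    # Discard stderr to keep the output to one line per node.
    oc debug "$node" -- chroot /host rpm -q console-login-helper-messages 2>/dev/null \
      || echo "query failed"
  done
}
```

Calling `check_clhm_version` prints one line per node; on a fixed cluster each line would be expected to show the 0.19-3 package or newer.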

Comment 10 errata-xmlrpc 2021-02-24 15:22:23 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

