Bug 1933792 - ceph crashes with gperftools 2.8
Summary: ceph crashes with gperftools 2.8
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora EPEL
Classification: Fedora
Component: gperftools
Version: epel8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Ken Dreyer (Red Hat)
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-03-01 17:56 UTC by Ken Dreyer (Red Hat)
Modified: 2021-03-17 00:35 UTC (History)

Fixed In Version: gperftools-2.7-9.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-17 00:35:08 UTC
Type: Bug
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Ceph Project Bug Tracker | 49240 | 0 | None | None | None | 2021-03-01 18:02:25 UTC
Ceph Project Bug Tracker | 49387 | 0 | None | None | None | 2021-03-01 18:02:33 UTC
Red Hat Bugzilla | 1933756 | 1 | unspecified | CLOSED | ceph crashes due to a bug in tcmalloc's malloc; it's fixed in gperftools-2.8.1 | 2021-03-10 00:42:03 UTC

Internal Links: 1933756

Description Ken Dreyer (Red Hat) 2021-03-01 17:56:45 UTC
Description of problem:
The upstream Ceph project developers have found that Ceph crashes semi-consistently with the recent update of gperftools-libs from 2.7 to 2.8 in EPEL 8.

gperftools-libs-2.7-6.el8 has been stable in EPEL 8 for a long time. When we updated to 2.8 in https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2021-dd6932436d , we started seeing crashes in gperftools in Ceph's automated testing framework.

Version-Release number of selected component (if applicable):
gperftools-2.8-1.el8

How reproducible:
fairly reproducible given enough stress-testing over time

Steps to Reproduce:
See the bug reports at https://tracker.ceph.com/issues/49240 and https://tracker.ceph.com/issues/49387

Actual results:
Ceph crashes in gperftools

Expected results:
Ceph does not crash in gperftools

Additional info:
I'm planning to reset back to gperftools-libs-2.7-6.el8 and bump the Epoch and Release in order to resolve this.
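Why the Epoch bump matters: RPM compares Epoch first, then Version, then Release, so a 2.7 build only supersedes the installed 2.8-1.el8 if its Epoch is higher. The following is a minimal sketch of that ordering, not RPM's actual rpmvercmp implementation (it ignores tilde and caret handling, among other things). The Epoch value 1 and the 2.7-9.el8 release are used purely for illustration, and the helper names are hypothetical.

```python
import re
from itertools import zip_longest


def _segments(s):
    # Split into runs of digits or letters; separators such as "." are dropped.
    return re.findall(r"\d+|[A-Za-z]+", s)


def _cmp_str(a, b):
    """Simplified segment-wise comparison, loosely modeled on rpmvercmp."""
    for x, y in zip_longest(_segments(a), _segments(b)):
        if x is None:
            return -1  # a ran out of segments first, so a is older
        if y is None:
            return 1
        if x.isdigit() and y.isdigit():
            xi, yi = int(x), int(y)
            if xi != yi:
                return -1 if xi < yi else 1
        elif x.isdigit():
            return 1   # a numeric segment sorts newer than an alphabetic one
        elif y.isdigit():
            return -1
        elif x != y:
            return -1 if x < y else 1
    return 0


def compare_evr(a, b):
    """Compare (epoch, version, release) tuples; returns -1, 0, or 1."""
    if a[0] != b[0]:
        return -1 if a[0] < b[0] else 1  # Epoch wins before anything else
    return _cmp_str(a[1], b[1]) or _cmp_str(a[2], b[2])


broken = (0, "2.8", "1.el8")           # the update that crashes Ceph
revert_no_epoch = (0, "2.7", "9.el8")  # hypothetical: same Epoch, lower Version
revert_epoch = (1, "2.7", "9.el8")     # hypothetical: Epoch bumped to 1

print(compare_evr(revert_no_epoch, broken))  # older than 2.8, never installed as an upgrade
print(compare_evr(revert_epoch, broken))     # newer than 2.8 because of the Epoch
```

Without the Epoch bump, hosts that already installed 2.8-1.el8 would treat the 2.7 rebuild as a downgrade and stay on the crashing version.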

Note Fedora 33 or 34 is likely to be the default branching point for EPEL 9. Fedora 34 has gperftools 2.9 (bz 1931259).

Comment 1 Ken Dreyer (Red Hat) 2021-03-01 18:05:43 UTC
In bug 1933756, Casey mentioned this is resolved in 2.8.1, so we could update EPEL 8 to that version instead.

Comment 2 Ken Dreyer (Red Hat) 2021-03-01 18:24:44 UTC
After discussion in #ceph-devel, I'm going to revert epel8 to 2.7 with an epoch bump. Given that no distros other than Fedora have updated to 2.8, along with some other upstream maintenance factors, we need wider testing at this point.

Comment 3 Tom "spot" Callaway 2021-03-01 19:15:29 UTC
I would prefer that we not go backwards here. If this is fixed in 2.8.1, that would be an appropriate update.

Comment 4 Kaleb KEITHLEY 2021-03-01 19:38:41 UTC
Debian sid/bullseye currently has 2.8.1
openSUSE Tumbleweed also has 2.8.1 at present

Comment 5 Kaleb KEITHLEY 2021-03-01 19:46:53 UTC
Debian sid/bullseye currently has 2.8.1,
Ubuntu hirsute (21.04) currently has 2.8.1,
openSUSE Tumbleweed also has 2.8.1 at present.

Comment 6 Josh Durgin 2021-03-01 20:53:02 UTC
I'm concerned with going to 2.8.1 since there may be further bugs that haven't appeared in ceph's stress testing yet. In the past we've seen several issues with updates to tcmalloc in ceph.

All the distros using 2.8.1 are bleeding edge/unreleased, so it has little real-world exposure. What benefit is there to updating to 2.8.1? To me it seems high risk for little reward.

Comment 7 Yaakov Selkowitz 2021-03-01 21:03:02 UTC
What's odd is that 2.8 is in F33, but AFAICS no such bugs have been raised against it.

If we revert to 2.7, then the P&Z (ppc64le and s390x) fixes will need to be backported, and ceph.spec updated yet again to reflect that.

Comment 8 Ken Dreyer (Red Hat) 2021-03-01 21:55:44 UTC
Yes, we would need to backport P&Z fixes to 2.7-6 to avoid regressing on those arches.

Several engineers from multiple teams tested gperftools 2.8 and reported their results in Bodhi at https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2021-dd6932436d . The problem is that this testing was not heavy enough to shake out this particular issue, and in the absence of further information, we're now stuck wondering about unknown-unknowns.

Next time we have to update gperftools in EPEL, we'll coordinate longer testing with Teuthology in the main upstream ceph lab.

Comment 9 Yaakov Selkowitz 2021-03-01 22:32:21 UTC
Filed: https://src.fedoraproject.org/rpms/gperftools/pull-request/3

Comment 10 Fedora Update System 2021-03-02 20:53:09 UTC
FEDORA-EPEL-2021-0eda4297eb has been submitted as an update to Fedora EPEL 8. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2021-0eda4297eb

Comment 11 Fedora Update System 2021-03-03 23:24:38 UTC
FEDORA-EPEL-2021-0eda4297eb has been pushed to the Fedora EPEL 8 testing repository.

You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2021-0eda4297eb

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 12 Fedora Update System 2021-03-12 09:59:02 UTC
FEDORA-EPEL-2021-0eda4297eb has been submitted as an update to Fedora EPEL 8. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2021-0eda4297eb

Comment 13 Fedora Update System 2021-03-12 20:28:41 UTC
FEDORA-EPEL-2021-0eda4297eb has been pushed to the Fedora EPEL 8 testing repository.

You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2021-0eda4297eb

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 14 Fedora Update System 2021-03-17 00:35:08 UTC
FEDORA-EPEL-2021-0eda4297eb has been pushed to the Fedora EPEL 8 stable repository.
If the problem still persists, please make a note of it in this bug report.

