Bug 1454355

Summary: FAILED assert(0) in OSD::shutdown(), wrong ref count when snap trimming
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Harald Klein <hklein>
Component: RADOS Assignee: Greg Farnum <gfarnum>
Status: CLOSED ERRATA QA Contact: Vasishta <vashastr>
Severity: high Docs Contact:
Priority: urgent    
Version: 2.2 CC: ceph-eng-bugs, ceph-qe-bugs, dzafman, gfarnum, hnallurv, kchai, kdreyer, tserlin, vumrao
Target Milestone: rc   
Target Release: 2.3   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: RHEL: ceph-10.2.7-23.el7cp Ubuntu: ceph_10.2.7-25redhat1 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-06-19 13:33:29 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Comment 9 Vikhyat Umrao 2017-05-23 12:58:50 UTC
OSD log with debug_osd = 20 and debug_ms = 1
------------------------------------------------------------------------------

    -6> 2017-05-22 11:05:33.388819 7f19afe3f700 30 osd.1 pg_epoch: 99671 pg[118.5dds1( v 99666'2885404 (99663'2882372,99666'2885404] local-les=97801 n=64075 ec=88336 les/c/f 97801/97803/0 97800/97800/93925) [19,1,38,60,91,72,129,113,143] r=1 lpr=97800 pi=93920-97799/17 luod=0'0 crt=99663'2885402 active NIBBLEWISE] lock

    -5> 2017-05-22 11:05:33.400876 7f19afe3f700 20 osd.1 99671  kicking pg 118.5fas6

    -4> 2017-05-22 11:05:33.400884 7f19afe3f700 30 osd.1 pg_epoch: 99671 pg[118.5fas6( v 99666'2877560 (99663'2874491,99666'2877560] local-les=97801 n=63833 ec=88336 les/c/f 97801/97803/0 97800/97800/94024) [52,89,63,143,118,124,1,36,21] r=6 lpr=97800 pi=43320-97799/1083 luod=0'0 crt=99663'2877558 active NIBBLEWISE] lock

    -3> 2017-05-22 11:05:33.412726 7f19afe3f700 20 osd.1 99671  kicking pg 118.5ffs0

    -2> 2017-05-22 11:05:33.412732 7f19afe3f700 30 osd.1 pg_epoch: 99671 pg[118.5ffs0( v 99666'2881222 (99663'2878149,99666'2881222] local-les=97804 n=64090 ec=88336 les/c/f 97804/97821/0 97800/97800/97800) [1,39,15,124,142,111,66,79,53] r=0 lpr=97800 luod=0'0 crt=99663'2881220 lcod 99666'2881221 mlcod 0'0 active+clean+snaptrim_wait NIBBLEWISE] lock

    -1> 2017-05-22 11:05:33.412741 7f19afe3f700 -1 osd.1 99671 pgid 118.5ffs0 has ref count of 2

     0> 2017-05-22 11:05:33.426260 7f19afe3f700 -1 osd/OSD.cc: In function 'int OSD::shutdown()' thread 7f19afe3f700 time 2017-05-22 11:05:33.412745
osd/OSD.cc: 2738: FAILED assert(0)

 ceph version 10.2.5-37.0.hotfix.bz1436752.el7cp (c0c8ee4a0dc2b9c639d5688b144b47623e2505a2)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f19f88460c5]
 2: (OSD::shutdown()+0x194e) [0x7f19f81a322e]
 3: (OSD::handle_signal(int)+0x126) [0x7f19f81a36f6]
 4: (SignalHandler::entry()+0x127) [0x7f19f8749c87]
 5: (()+0x7dc5) [0x7f19f6765dc5]
 6: (clone()+0x6d) [0x7f19f4df173d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
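
For triage across several OSD logs, the "has ref count of" line logged just before the assert can be pulled out mechanically. The following is a minimal sketch and not part of Ceph: the function name find_leaked_pg_refs and the regular expression are hypothetical, and the pattern assumes the message format seen in the excerpt above.

```python
import re

# Assumed message format, taken from the log excerpt above:
#   "osd.1 99671 pgid 118.5ffs0 has ref count of 2"
REF_COUNT_RE = re.compile(
    r"osd\.(?P<osd>\d+) (?P<epoch>\d+) "
    r"pgid (?P<pgid>\S+) has ref count of (?P<refs>\d+)"
)


def find_leaked_pg_refs(log_text):
    """Return (osd_id, epoch, pgid, ref_count) for each ref-count complaint."""
    hits = []
    for line in log_text.splitlines():
        m = REF_COUNT_RE.search(line)
        if m:
            hits.append(
                (int(m["osd"]), int(m["epoch"]), m["pgid"], int(m["refs"]))
            )
    return hits


# The "-1>" entry from the log above, used as sample input.
sample = (
    "    -1> 2017-05-22 11:05:33.412741 7f19afe3f700 -1 "
    "osd.1 99671 pgid 118.5ffs0 has ref count of 2"
)
print(find_leaked_pg_refs(sample))  # prints [(1, 99671, '118.5ffs0', 2)]
```

Run against the full log text, this lists every PG reported with an unexpected reference count at shutdown, which can then be matched against the corresponding "lock" entry to see the PG state (here active+clean+snaptrim_wait).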

Comment 21 errata-xmlrpc 2017-06-19 13:33:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1497