Bug 1255830

Summary: FAILED assert(m_seed < old_pg_num) in librbd when increasing placement groups
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Jason Dillaman <jdillama>
Component: RBD
Assignee: Josh Durgin <jdurgin>
Status: CLOSED ERRATA
QA Contact: ceph-qe-bugs <ceph-qe-bugs>
Severity: high
Priority: high
Version: 1.3.0
CC: bhubbard, ceph-eng-bugs, ceph-qe-bugs, flucifre, jdillama, jdurgin, kdreyer, tganguly, tmuthami, vumrao
Target Milestone: rc
Target Release: 1.3.1
Hardware: Unspecified
OS: Unspecified
Fixed In Version: ceph-0.94.2-8.el7cp
Doc Type: Bug Fix
Clone Of: 1255827
Clones: 1258625
Type: Bug
Last Closed: 2015-11-23 20:22:15 UTC

Comment 5 Tanay Ganguly 2015-10-28 10:33:21 UTC
Jason,

I tried the following and it worked fine. Please let me know if any further validation is required.

Below are the two use cases I used to verify the fix.

First use case:

1. Created an rbd image, took a snapshot, and cloned it (see the command sketch below).
2. Mapped it to a VM and installed an OS on top of it.
3. Started writing some IO using dd, then abruptly shut down the VM.
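For reference, steps 1 and 3 could look roughly like this; only the pool and clone names are taken from the test above, while the parent image name, size, and guest disk are hypothetical:

rbd create --pool Tanay-RBD --size 10240 baseimage    # hypothetical 10 GB parent
rbd snap create Tanay-RBD/baseimage@snap1
rbd snap protect Tanay-RBD/baseimage@snap1            # clones require a protected snapshot
rbd clone Tanay-RBD/baseimage@snap1 Tanay-RBD/testingClone_new1
# step 3, run inside the guest (hypothetical target disk):
dd if=/dev/zero of=/dev/vdb bs=1M count=1024 oflag=direct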

Meanwhile, I opened a watch instance on that image while shutting down the VM:
rbd watch --pool Tanay-RBD testingClone_new1

While all this was happening, I incremented the pg_num and pgp_num for that pool in a loop (50 times).
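A minimal sketch of such a loop; the starting pg_num of 64 is an assumption, since the actual starting values aren't recorded here:

pg=64    # assumed starting pg_num for the pool
for i in $(seq 1 50); do
    pg=$((pg + 1))
    ceph osd pool set Tanay-RBD pg_num $pg
    ceph osd pool set Tanay-RBD pgp_num $pg    # pgp_num must not exceed pg_num, so set pg_num first
    sleep 5                                    # give PG creation a moment to settle
done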

I didn't see any FAILED assert error message in the watch output or the logs.

The only thing I saw in the watch output was:

[root@cephqe3 ceph-config]# rbd watch --pool Tanay-RBD testingClone_new1
press enter to exit...
testingClone_new1 received notification: notify_id=3435973836800, cookie=48502592, notifier_id=1467375, bl.length=26
testingClone_new1 received notification: notify_id=3435973836801, cookie=48502592, notifier_id=1468015, bl.length=26
testingClone_new1 received notification: notify_id=3453153705986, cookie=48502592, notifier_id=1469010, bl.length=26


Second use case:

1. Created an rbd image, took a snapshot, and cloned it.
2. Exercised librbd API calls, image.write and image.read (a Python sketch follows below).


Meanwhile, I opened a watch instance on that image:
rbd watch --pool Tanay-RBD testingClone_new3

While all this was happening, I incremented the pg_num and pgp_num for that pool in a loop (50 times).
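A minimal sketch of that kind of exercise using the Python librbd bindings; the pool and image names are from above, while the config path, offset, and data size are assumptions:

import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')   # assumed config path
cluster.connect()
ioctx = cluster.open_ioctx('Tanay-RBD')
image = rbd.Image(ioctx, 'testingClone_new3')
try:
    data = b'x' * 4096
    image.write(data, 0)                    # write 4 KiB at offset 0
    assert image.read(0, 4096) == data      # read it back and compare
finally:
    image.close()
    ioctx.close()
    cluster.shutdown()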


Thanks.

Comment 6 Jason Dillaman 2015-10-28 14:10:35 UTC
Your first test showed that the watch was still established, since you received notifications.  For future reference, you can just use 'rados -p <pool> listwatchers <image name>.rbd' (assuming RBD image format 1, for ease of header object lookup) to verify that the image still has a valid watch after resizing the PGs.
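For example, against the image from the first test (assuming a format 1 image, per the note above):

rados -p Tanay-RBD listwatchers testingClone_new1.rbd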

Comment 7 Tanay Ganguly 2015-10-28 14:36:52 UTC
Marking this Bug as Verified.

Comment 9 errata-xmlrpc 2015-11-23 20:22:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:2512

Comment 10 Siddharth Sharma 2015-11-23 21:53:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:2066