Bug 1253378 - Cloned bugs have wrong X-Bugzilla-Who in mail header
Status: CLOSED DUPLICATE of bug 1145117
Product: Bugzilla
Classification: Community
Component: Email Notifications
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assigned To: PnT DevOps Devs
QA Contact: tools-bugs
Depends On:
Blocks:
Reported: 2015-08-13 11:21 EDT by John Poelstra
Modified: 2015-08-13 19:35 EDT
CC List: 3 users

Doc Type: Bug Fix
Story Points: ---
Last Closed: 2015-08-13 19:35:47 EDT
Type: Bug

Attachments: None
Description John Poelstra 2015-08-13 11:21:01 EDT
Description of problem:

Cloned bugs have wrong X-Bugzilla-Who in mail header

Below is the full email notification for a bug that was cloned. For this cloned bug, the header says

X-Bugzilla-Who: rule-engine@redhat.com

Since cloned bugs aren't the result of the Bugzilla rules engine, I believe this value is wrong.

I want to filter out all email notifications from the Bugzilla rules engine, and I can't in this situation.
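As an aside on the filtering goal: once X-Bugzilla-Who is trustworthy, a filter on that header is a one-liner. A minimal sketch (not from the report), assuming the notification has been saved to a hypothetical file msg.eml:

```shell
# Classify a saved Bugzilla notification by its X-Bugzilla-Who header.
# "msg.eml" is a hypothetical saved copy of a notification like the one below.
printf 'X-Bugzilla-Who: rule-engine@redhat.com\nSubject: test\n\nbody\n' > msg.eml
if grep -qi '^X-Bugzilla-Who: *rule-engine@redhat\.com' msg.eml; then
    echo "rules-engine mail"
else
    echo "human-generated mail"
fi
```

The same pattern drops into a procmail or Sieve rule; the point of this report is that the header value itself is wrong for cloned bugs, so such a filter misfires.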

-----------------------------


Return-Path: ceph-eng-bugs-bounces@redhat.com
Received: from zmta03.collab.prod.int.phx2.redhat.com (LHLO
 zmta03.collab.prod.int.phx2.redhat.com) (10.5.81.10) by
 zmail12.collab.prod.int.phx2.redhat.com with LMTP; Thu, 13 Aug 2015
 09:35:05 -0400 (EDT)
Received: from int-mx11.intmail.prod.int.phx2.redhat.com (int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.24])
	by zmta03.collab.prod.int.phx2.redhat.com (Postfix) with ESMTP id 0F00342497;
	Thu, 13 Aug 2015 09:35:05 -0400 (EDT)
Received: from lists01.intranet.prod.int.phx2.redhat.com (lists01.intranet.prod.int.phx2.redhat.com [10.5.30.73])
	by int-mx11.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id t7DDZ4am021121;
	Thu, 13 Aug 2015 09:35:04 -0400
Received: from lists01.intranet.prod.int.phx2.redhat.com (localhost.localdomain [127.0.0.1])
	by lists01.intranet.prod.int.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id t7DDZ4Up010438;
	Thu, 13 Aug 2015 09:35:04 -0400
Received: from int-mx09.intmail.prod.int.phx2.redhat.com
	(int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22])
	by lists01.intranet.prod.int.phx2.redhat.com (8.13.8/8.13.8) with ESMTP
	id t7DDZ3N6010411 for <ceph-eng-bugs@post-office.corp.redhat.com>;
	Thu, 13 Aug 2015 09:35:03 -0400
Received: from mx1-phx2.redhat.com (mx01.util.phx2.redhat.com [10.5.26.26])
	by int-mx09.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP
	id t7DDZ38e023431
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256
	verify=FAIL)
	for <ceph-eng-bugs@redhat.com>; Thu, 13 Aug 2015 09:35:03 -0400
Received: from bugweb-02.app.bz.hst.phx2.redhat.com
	(bugweb-02.app.bz.hst.phx2.redhat.com [10.5.117.32])
	by mx1-phx2.redhat.com (8.14.4/8.14.4) with ESMTP id t7DDZ3BC021874
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256
	verify=NO)
	for <ceph-eng-bugs@redhat.com>; Thu, 13 Aug 2015 09:35:03 -0400
Received: from bugweb-02.app.bz.hst.phx2.redhat.com (localhost [127.0.0.1])
	by bugweb-02.app.bz.hst.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id
	t7DDZ2wd035452
	for <ceph-eng-bugs@redhat.com>; Thu, 13 Aug 2015 09:35:02 -0400
Received: (from apache@localhost)
	by bugweb-02.app.bz.hst.phx2.redhat.com (8.14.4/8.14.4/Submit) id
	t7DDZ2eT035451; Thu, 13 Aug 2015 09:35:02 -0400
X-Authentication-Warning: bugweb-02.app.bz.hst.phx2.redhat.com: apache set
	sender to bugzilla@redhat.com using -f
From: bugzilla@redhat.com
To: ceph-eng-bugs@redhat.com
Subject: [Bug 1253327] New: [1.3.0 Ubuntu] Installing OS on a rbd Image is
	getting crashed with Mandatory Exclusive Lock enabled in Ubuntu Cluster
Date: Thu, 13 Aug 2015 13:35:00 +0000
Precedence: bulk
Auto-Submitted: auto-generated
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Classification: Red Hat
X-Bugzilla-Product: Red Hat Ceph Storage
X-Bugzilla-Component: Ceph
X-Bugzilla-Sub-Component: RBD
X-Bugzilla-Version: 1.3.0
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: high
X-Bugzilla-Who: rule-engine@redhat.com
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: high
X-Bugzilla-Assigned-To: jdurgin@redhat.com
X-Bugzilla-Target-Milestone: rc
X-Bugzilla-Changed-Fields: bug_id short_desc product version component
	rh_sub_components bug_severity priority assigned_to reporter qa_contact
	cc
	dependson blocked ext_bz_bug_map.ext_bz_bug_id rh_rule
X-Bugzilla-Comment: private
X-Loop: bugzilla@redhat.com
X-BeenThere: bugzilla@redhat.com
X-Bugzilla-Reporter: jdillama@redhat.com
Message-ID: <bug-1253327-371801@bugzilla.redhat.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: https://bugzilla.redhat.com/
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 2.68 on 10.5.11.24
X-Scanned-By: MIMEDefang 2.68 on 10.5.11.22
X-BeenThere: ceph-eng-bugs@redhat.com
X-Mailman-Version: 2.1.12
List-Id: <ceph-eng-bugs.redhat.com>
List-Unsubscribe: <https://post-office.corp.redhat.com/mailman/options/ceph-eng-bugs>,
	<mailto:ceph-eng-bugs-request@redhat.com?subject=unsubscribe>
List-Archive: <http://post-office.corp.redhat.com/archives/ceph-eng-bugs>
List-Post: <mailto:ceph-eng-bugs@redhat.com>
List-Help: <mailto:ceph-eng-bugs-request@redhat.com?subject=help>
List-Subscribe: <https://post-office.corp.redhat.com/mailman/listinfo/ceph-eng-bugs>,
	<mailto:ceph-eng-bugs-request@redhat.com?subject=subscribe>
Sender: ceph-eng-bugs-bounces@redhat.com
Errors-To: ceph-eng-bugs-bounces@redhat.com

https://bugzilla.redhat.com/show_bug.cgi?id=1253327

            Bug ID: 1253327
           Summary: [1.3.0 Ubuntu] Installing OS on a rbd Image is getting
                    crashed with Mandatory Exclusive Lock enabled in
                    Ubuntu Cluster
           Product: Red Hat Ceph Storage
           Version: 1.3.0
         Component: Ceph
     Sub Component: RBD
          Severity: high
          Priority: high
          Assignee: jdurgin@redhat.com
          Reporter: jdillama@redhat.com
        QA Contact: ceph-qe-bugs@redhat.com
                CC: ceph-eng-bugs@redhat.com, jdillama@redhat.com,
                    kdreyer@redhat.com, nlevine@redhat.com,
                    tganguly@redhat.com, tmuthami@redhat.com
        Depends On: 1249004
            Blocks: 1252084
   External Bug ID: Ceph Project Bug Tracker 12596,Ceph Project Bug
                    Tracker 11369
  Rule Engine Rule: Ceph: set proposed release flag for new bugs



+++ This bug was initially created as a clone of Bug #1249004 +++

Description of problem:
While trying to install an OS on a VM with an exported rbd image attached, the
OS installation crashes.


The cluster is based on Ubuntu 3.13.0.59.
Ceph version - 0.94.1.2

dpkg -l | grep librbd
ii  librbd1  0.94.1.4-1trusty  amd64  RADOS block device client library

The client is RHEL 7.1 (QEMU client).
I just enabled the tools repo on the client, i.e. rhel-7-server-ceph-1.3-tools-test.
The version is:

rpm -qf /usr/lib64/librbd.so.1
librbd1-0.94.1-15.el7cp.x86_64

Then I exported the rbd image to the client VM (the image was visible as a
VirtIO disk to the VM). But when I try to install an OS on top of it, the
VM installation crashes (PFA).

Note:
[The client was installed separately, as ceph-deploy install from Ubuntu
will fail because the client is RHEL 7.1.]

I hope Ceph allows this cross-platform configuration (Ubuntu cluster on
0.94.1.2 + RHEL client on 0.94.1-15).

ubuntu@magna103:~$ sudo rbd -p Tanay-RBD --image testingClone_new8 info
rbd image 'testingClone_new8':
        size 102400 MB in 25600 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.adc42eb141f2
        format: 2
        features: layering, exclusive, object map
        flags: 
        parent: Tanay-RBD/testing8@testingsnap8
        overlap: 102400 MB

Version-Release number of selected component (if applicable):
ceph version 0.94.1.4 (944951aae2783417100ff0f1078a20bcdcdb605d)

How reproducible:
100%

Steps to Reproduce:
1. Create a rbd image on your Ubuntu Cluster.

#!/bin/bash
for i in {1..10}
do
        rbd create Tanay-RBD/testing$i --size 10240 --image-format 2 \
                --image-features 13 --order 22
        rbd snap create Tanay-RBD/testing$i@testingsnap$i
        rbd snap protect Tanay-RBD/testing$i@testingsnap$i
        rbd clone Tanay-RBD/testing$i@testingsnap$i \
                Tanay-RBD/testingClone_new$i --image-features 13
done
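For reference (an editorial note, not part of the report): --image-features is a bitmask, and assuming the standard rbd feature bits (1 = layering, 4 = exclusive-lock, 8 = object-map), 13 enables exactly the three features shown in the rbd info output above:

```shell
# 13 = layering (1) + exclusive-lock (4) + object-map (8)
echo $(( 1 + 4 + 8 ))
```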

2. Map testingClone_new8 to the VM, i.e. attach the disk to install the OS
on top of it. (The disk becomes visible to the VM.)

3. When installation starts, it crashes (PFA).

Actual results:
OS installation fails, and I can see the flags showing "object map invalid"
(see the rbd info output below).

Expected results:
OS installation should succeed.

ubuntu@magna103:~$ sudo rbd -p Tanay-RBD --image testingClone_new8 info
rbd image 'testingClone_new8':
        size 102400 MB in 25600 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.adc42eb141f2
        format: 2  
        features: layering, exclusive, object map
        flags: object map invalid
        parent: Tanay-RBD/testing8@testingsnap8
        overlap: 102400 MB

Additional info:
The same test runs fine if I don't use the --image-features 13 flag.

Clones created with the following script work fine, and I am able to install
an OS on top of them (i.e. without --image-features 13):
#!/bin/bash
for i in {1..10}
do
        rbd create Tanay-RBD/testing$i --size 10240 --image-format 2 --order 22
        rbd snap create Tanay-RBD/testing$i@testingsnap$i
        rbd snap protect Tanay-RBD/testing$i@testingsnap$i
        rbd clone Tanay-RBD/testing$i@testingsnap$i Tanay-RBD/testingClone_new$i
done

--- Additional comment from Red Hat Bugzilla Rules Engine on 2015-07-31
06:06:21 EDT ---

This bug is automatically being proposed for Ceph 1.3.z by setting the
"ceph-1.3.z" release flag to "?"

--- Additional comment from Jason Dillaman on 2015-07-31 08:13:09 EDT ---

Can you provide the core backtrace or the RBD debug logs from the crash?

--- Additional comment from Tanay Ganguly on 2015-08-03 06:50:21 EDT ---

Actually the VM is not crashing; the installation stops when it tries to
create a partition on the disk (the RBD disk).

I tried to debug with gdb and the log files. This is the only message I got
from the vm.log file:

red_dispatcher_set_cursor_peer:
inputs_connect: inputs channel client create
block I/O error in device 'drive-virtio-disk0': Invalid argument (22)
block I/O error in device 'drive-virtio-disk0': Invalid argument (22)
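The "(22)" in these errors is an errno value. A quick sanity check (an editorial aside, assuming python3 is available on the client) confirms which error it maps to:

```shell
# Map errno 22 to its symbolic name and message (EINVAL / Invalid argument).
python3 -c 'import errno, os; print(errno.errorcode[22], "-", os.strerror(22))'
```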

There was no error on the monitor either, nor could I find any backtrace in gdb.

It is readily reproducible when I select an OS disk size of 70 GB or more;
installs on 10-60 GB disks worked perfectly.

Can you please log into my test-bed and check?

IP details:

10.8.128.103: Admin & Mon Node

Other OSD Nodes are:
sudo ceph osd tree
ID WEIGHT   TYPE NAME         UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-1 13.49991 root default                                        
-2  2.69998     host magna094                                   
 0  0.89999         osd.0          up  1.00000          1.00000 
 1  0.89999         osd.1          up  1.00000          1.00000 
 2  0.89999         osd.2          up  1.00000          1.00000 
-3  2.69998     host magna112                                   
 3  0.89999         osd.3          up  1.00000          1.00000 
 4  0.89999         osd.4          up  1.00000          1.00000 
 5  0.89999         osd.5          up  1.00000          1.00000 
-4  2.69998     host magna114                                   
 6  0.89999         osd.6          up  1.00000          1.00000 
 7  0.89999         osd.7          up  1.00000          1.00000 
 8  0.89999         osd.8          up  1.00000          1.00000 
-5  2.69998     host magna106                                   
 9  0.89999         osd.9          up  1.00000          1.00000 
10  0.89999         osd.10         up  1.00000          1.00000 
11  0.89999         osd.11         up  1.00000          1.00000 
-6  2.69998     host magna034                                   
12  0.89999         osd.12         up  1.00000          1.00000 
13  0.89999         osd.13         up  1.00000          1.00000 
14  0.89999         osd.14         up  1.00000          1.00000 

QEMU Client IP details are:
10.12.27.16 - QEMU1
10.12.27.17 - QEMU2

Credential: Cluster: ubuntu/redhat
            Client : root/redhat

You can connect to QEMU1 and launch both the VMs (Host16 and New). Currently
both are stuck at installation screen.

--- Additional comment from Tanay Ganguly on 2015-08-03 06:57:30 EDT ---

This is also reproducible with RHEL clients on Ceph 1.3.0.

rpm -qf /usr/lib64/librbd.so.1
librbd1-0.94.1-15.el7cp.x86_64

I get the same error message while installing the OS.

rpm -qa |grep qemu
qemu-img-1.5.3-86.el7.x86_64
libvirt-daemon-driver-qemu-1.2.8-16.el7.x86_64
ipxe-roms-qemu-20130517-6.gitc4bce43.el7.noarch
qemu-kvm-common-1.5.3-86.el7.x86_64
qemu-kvm-1.5.3-86.el7.x86_64

--- Additional comment from Josh Durgin on 2015-08-03 14:14:00 EDT ---

Can you enable ceph logging by putting this in ceph.conf on the client node
before starting the vm:

log file = /var/log/libvirt/qemu/rbd-$pid.$cctid.log
debug ms = 1
debug rbd = 20
debug objectcacher = 20

If files don't appear in /var/log/libvirt/qemu when the vm starts using rbd,
there may be some permissions issues or DAC you need to disable in libvirt.
With the logging in place, reproducing the problem should show us how the I/O
error is getting to qemu, and hopefully why as well.

I also see rbd_clone_copy_on_read = true for both clients. Does this still
reproduce with that disabled?

You may have uncovered another bug too. Looking at the qemu logs, I noticed a
backtrace in a couple of the existing VMs:

qemu: terminating on signal 15 from pid 28804
osdc/ObjectCacher.cc: In function 'ObjectCacher::~ObjectCacher()' thread
7fe8a3279a40 time 2015-07-31 04:45:16.194110
osdc/ObjectCacher.cc: 551: FAILED assert(i->empty())
 ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
 1: (()+0x160a15) [0x7fe89e22ba15]
 2: (()+0x3a3dd1) [0x7fe89e46edd1]
 3: (()+0x57346) [0x7fe89e122346]
 4: (()+0x7d859) [0x7fe89e148859]
 5: (rbd_close()+0x9) [0x7fe89e107719]
 6: (()+0xa709d) [0x7fe8a335909d]
 7: (()+0x8ace9) [0x7fe8a333cce9]
 8: (()+0x8aba8) [0x7fe8a333cba8]
 9: (()+0x8ad73) [0x7fe8a333cd73]
 10: (()+0x8ae18) [0x7fe8a333ce18]
 11: (()+0x75d0f) [0x7fe8a3327d0f]
 12: (__libc_start_main()+0xf5) [0x7fe8991a4af5]
 13: (()+0x7714d) [0x7fe8a332914d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'

Can you check for this to see how you produced it? It looks like it may just be
from shutting down the VM after doing a bunch of writes, but we shouldn't crash
there. Can you open a new bug for that?

--- Additional comment from Jason Dillaman on 2015-08-03 15:06:11 EDT ---

That ObjectCacher crash sounds like this commit upstream:
https://github.com/majianpeng/ceph/commit/35def5c81f7fc83d55d18320e4860c6a63d4c7f5

If an IO error blocks a cache writeback, on shutdown the cache will throw that
assertion.

--- Additional comment from Tanay Ganguly on 2015-08-04 06:32:14 EDT ---

Josh,

Please find the attached rbd.log (rbd.log.gz).
Yes, I am able to reproduce it after disabling 'rbd_clone_copy_on_read'.

Copying the relevant snippet:

2015-08-04 05:07:03.692901 7f717f7fe700 -1 librbd::ObjectMap: failed to update
object map: (22) Invalid argument
2015-08-04 05:07:03.692938 7f717f7fe700 20 librbd::AioRequest: write
0x7f7170004ee0 rbd_data.2132e79e2a9e3.0000000000004587 4182016~12288
should_complete: r = -22
2015-08-04 05:07:03.692954 7f717f7fe700 20 librbd::AioRequest: WRITE_PRE
2015-08-04 05:07:03.692959 7f717f7fe700 20 librbd::AioRequest: complete
0x7f7170004ee0
2015-08-04 05:07:03.692965 7f717f7fe700 20 librbdwriteback: C_OrderedWrite
completing 0x7f717006e360
2015-08-04 05:07:03.692975 7f717f7fe700 20 librbdwriteback: complete_writes()
oid rbd_data.2132e79e2a9e3.0000000000004587
2015-08-04 05:07:03.692986 7f717f7fe700 20 librbdwriteback: complete_writes()
completing 0x7f717006e360
2015-08-04 05:07:03.693014 7f717f7fe700 20 librbd::AioCompletion:
AioCompletion::complete_request() 0x7f71a632bf40 complete_cb=0x7f71a38e6290
pending 1
2015-08-04 05:07:03.693026 7f717f7fe700 20 librbd::AioCompletion:
AioCompletion::finalize() 0x7f71a632bf40 rval -22 read_buf 0 read_bl 0
2015-08-04 05:07:03.693039 7f717f7fe700 20 librbd::AsyncOperation:
0x7f71a632c070 finish_op
2015-08-04 05:07:03.693080 7f717f7fe700 20 librbdwriteback: C_OrderedWrite
finished 0x7f717006e360
2015-08-04 05:07:03.693211 7f71a3806a40 20 librbd: close_image 0x7f71a63650c0
2015-08-04 05:07:03.693244 7f71a3806a40 10 librbd::ImageCtx: canceling async
requests: count=0
2015-08-04 05:07:03.693725 7f71a3806a40 10 librbd::ImageCtx:
prune_parent_extents image overlap 74658611200, object overlap 4194304 from
image extents [74654416896,4194304]
2015-08-04 05:07:03.693776 7f71a3806a40 20 librbdwriteback: write will wait for
result 0x7f71a636ff10
2015-08-04 05:07:03.693791 7f71a3806a40 20 librbd::AioRequest: send
0x7f71a632c660 rbd_data.2132e79e2a9e3.0000000000004587 4182016~12288
2015-08-04 05:07:03.693804 7f71a3806a40 20 librbd::AioRequest: send_pre
0x7f71a632c660 rbd_data.2132e79e2a9e3.0000000000004587 4182016~12288
2015-08-04 05:07:03.693817 7f71a3806a40 20 librbd::ObjectMap: 0x7f71a63650c0
aio_update: start=17799, end=17800, new_state=1
2015-08-04 05:07:03.693832 7f71a3806a40 20 librbd::ObjectMap: 0x7f71a63650c0
updating on-disk object map: [17799,17800) = ->1
2015-08-04 05:07:03.696669 7f717f7fe700 20 librbd::ObjectMap: 0x7f71a63650c0
should_complete: r=-22
2015-08-04 05:07:03.696712 7f717f7fe700 -1 librbd::ObjectMap: failed to update
object map: (22) Invalid argument

--- Additional comment from Tanay Ganguly on 2015-08-04 06:37:35 EDT ---



--- Additional comment from Tanay Ganguly on 2015-08-04 06:38:53 EDT ---

Somehow the rbd.log.gz didn't get uploaded; I have uploaded the original as
text.

--- Additional comment from Tanay Ganguly on 2015-08-04 07:46:49 EDT ---

I ran the same test with the mandatory lock enabled, i.e. --image-features 5:

sudo rbd -p Tanay-RBD --image mandatorye3 info
rbd image 'mandatorye3':
        size 81200 MB in 20300 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.285f83d1b58ba
        format: 2
        features: layering, exclusive
        flags: 
        parent: Tanay-RBD/mandatory3@mandatorysnap3
        overlap: 81200 MB

I am unable to reproduce the same issue.

--- Additional comment from Tanay Ganguly on 2015-08-04 08:09:59 EDT ---

(In reply to Josh Durgin from comment #5)
> Can you enable ceph logging by putting this in ceph.conf on the client node
> before starting the vm:
> 
> log file = /var/log/libvirt/qemu/rbd-$pid.$cctid.log
> debug ms = 1
> debug rbd = 20
> debug objectcacher = 20
> 
> If files don't appear in /var/log/libvirt/qemu when the vm starts using rbd,
> there may be some permissions issues or DAC you need to disable in libvirt.
> With the logging in place, reproducing the problem should show us how the
> I/O error is getting to qemu, and hopefully why as well.
> 
> I also see rbd_clone_copy_on_read = true for both clients. Does this still
> reproduce with that disabled?
> 
> You may have uncovered another bug too. Looking at the qemu logs, I noticed
> a backtrace in a couple of the existing VM:
> 
> qemu: terminating on signal 15 from pid 28804
> osdc/ObjectCacher.cc: In function 'ObjectCacher::~ObjectCacher()' thread
> 7fe8a3279a40 time 2015-07-31 04:45:16.194110
> osdc/ObjectCacher.cc: 551: FAILED assert(i->empty())
>  ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
>  1: (()+0x160a15) [0x7fe89e22ba15]
>  2: (()+0x3a3dd1) [0x7fe89e46edd1]
>  3: (()+0x57346) [0x7fe89e122346]
>  4: (()+0x7d859) [0x7fe89e148859]
>  5: (rbd_close()+0x9) [0x7fe89e107719]
>  6: (()+0xa709d) [0x7fe8a335909d]
>  7: (()+0x8ace9) [0x7fe8a333cce9]
>  8: (()+0x8aba8) [0x7fe8a333cba8]
>  9: (()+0x8ad73) [0x7fe8a333cd73]
>  10: (()+0x8ae18) [0x7fe8a333ce18]
>  11: (()+0x75d0f) [0x7fe8a3327d0f]
>  12: (__libc_start_main()+0xf5) [0x7fe8991a4af5]
>  13: (()+0x7714d) [0x7fe8a332914d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> interpret this.
> terminate called after throwing an instance of 'ceph::FailedAssertion'
> 
> Can you check for this to see how you produced it? It looks like it may just
> be from shutting down the VM after doing a bunch of writes, but we shouldn't
> crash there. Can you open a new bug for that?

Hi Josh,

Yes, I am able to reproduce the failed-assertion bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1250042

After I hit this bug (1249004) and force off the VM, I see the crash dump in
the vm.log file.

--- Additional comment from Jason Dillaman on 2015-08-04 08:57:32 EDT ---

Can you attach the OSD logs as well?  The OSDs are sending an -EINVAL (-22)
error code back to the librbd client.  The reason for this error condition
should hopefully be visible in the OSD logs without the need to increase
OSD log levels.

I created an upstream tracker for how librbd incorrectly handles this
unexpected error from the OSD, but we will also need to fix the original source
of the issue.

--- Additional comment from Tanay Ganguly on 2015-08-04 12:29:41 EDT ---

Please find 2 OSD logs attached.

If you need other logs from different OSDs, please log into the OSD hosts
mentioned above.

--- Additional comment from Tanay Ganguly on 2015-08-04 12:34:06 EDT ---



--- Additional comment from Tanay Ganguly on 2015-08-04 12:35:09 EDT ---



--- Additional comment from Jason Dillaman on 2015-08-04 14:44:21 EDT ---

OSD error message: 

/var/log/ceph/ceph-osd.8.log.1.gz:2015-08-04 05:06:05.151676 7f837a876700  0
<cls> cls/rbd/cls_rbd.cc:2109: failed to decode data chunk [4096]:
buffer::end_of_buffer

--- Additional comment from Jason Dillaman on 2015-08-10 10:13:38 EDT ---

Upstream issue 12596 is already fixed in v0.94.2.  This fix will prevent the IO
errors but will still result in the object map being flagged as invalid.  The
fix for the corrupt object map is included in upstream issue 12611 and will
need to be backported to downstream.

Since the crash will be solved via v0.94.2, it might make sense to flag this
issue as fixed in RHCS v1.3.1 and open a new ticket for the object map becoming
corrupted (issue 12611).

https://code.engineering.redhat.com/gerrit/gitweb?p=ceph.git;a=commit;h=094405171bdead0ce09055d4acc6445274992a01

--- Additional comment from Jason Dillaman on 2015-08-10 12:53:30 EDT ---

Cloned BZ to track the root cause for the object map being invalidated.  This
issue will be used to track the IO errors that result after failing to update
the object map.

--- Additional comment from Tanay Ganguly on 2015-08-11 03:20:18 EDT ---

What is the resolution for this bug in 1.3.0 Ubuntu? Will it be fixed or
release-noted?

Also, as mentioned earlier in comment 4, this bug is also reproducible with
the RHEL 1.3.0 build, which is already deployed at customer sites. How are we
planning to communicate it to the existing 1.3.0 RHEL customers?

--- Additional comment from Ken Dreyer (Red Hat) on 2015-08-11 17:28:41 EDT ---

(In reply to Tanay Ganguly from comment #19)
> What is the resolution for this bug in 1.3.0 Ubuntu? Will it be fixed or
> Release noted? 

Since we're fixing this in the RHEL 1.3.1 release, my plan was to fix it in the
Ubuntu 1.3.1 release too.

> Also, as mentioned earlier in comment 4, this Bug is also reproducible with
> RHEL 1.3.0 Build, which is already in customer place. How are we planning to
> communicate it to the existing 1.3.0 RHEL customers?

I don't know the answer to this. Docs team?

--- Additional comment from Ken Dreyer (Red Hat) on 2015-08-12 17:27:42 EDT ---

Can you please set qa_ack so we can attach this bz to
https://errata.devel.redhat.com/advisory/21183?

--- Additional comment from Ken Dreyer (Red Hat) on 2015-08-12 19:57:47 EDT ---

In case we decide to fix this asynchronously, I've reverted all the 1.3.1
work in dist-git on the private-kdreyer-ceph-1.3-bz1249004 branch.

I'd still need Josh or Jason to push all the relevant patches to
rhcs-0.94.1-ubuntu in GitHub and ceph-1.3-rhel-patches-wip-1.3.0 in Gerrit.

--- Additional comment from Jason Dillaman on 2015-08-13 08:24:42 EDT ---

Retargeting for 1.3.0 async release.

Tentatively setting Tamil as the QE POC here. Please set qa_ack to "+" and
re-assign if needed.

--- Additional comment from Jason Dillaman on 2015-08-13 09:34:21 EDT ---

Moving back to ASSIGNED for inclusion in 1.3.0.

--- Comment #1 from Red Hat Bugzilla Rules Engine <rule-engine@redhat.com> ---
This bug is automatically being proposed for Ceph 1.3.z by setting the
"ceph-1.3.z" release flag to "?"


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1249004
[Bug 1249004] Installing OS on a rbd Image is getting crashed with
Mandatory Exclusive Lock enabled in Ubuntu Cluster
https://bugzilla.redhat.com/show_bug.cgi?id=1252084
[Bug 1252084] Writes beyond 64GB result in an invalidated RBD object map
-- 
You are receiving this mail because:
You are on the CC list for the bug.
Unsubscribe from this bug https://bugzilla.redhat.com/token.cgi?t=raz4fmGhqB&a=cc_unsubscribe
Comment 1 Jason McDonald 2015-08-13 19:35:47 EDT

*** This bug has been marked as a duplicate of bug 1145117 ***
