Bug 1473162 - [Gluster-block]: VM core generated, with gluster-block (failed) create
Summary: [Gluster-block]: VM core generated, with gluster-block (failed) create
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: tcmu-runner
Version: rhgs-3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.3.0
Assignee: Prasanna Kumar Kalever
QA Contact: Sweta Anandpara
URL:
Whiteboard:
Depends On: 1477959 1488610
Blocks: 1417151 1474188 1490350
 
Reported: 2017-07-20 07:14 UTC by Sweta Anandpara
Modified: 2017-09-21 04:20 UTC
CC List: 6 users

Fixed In Version: tcmu-runner-1.2.0-14.el7rhgs
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1477959 1490350
Environment:
Last Closed: 2017-09-21 04:20:54 UTC
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2017:2773 0 normal SHIPPED_LIVE new packages: gluster-block 2017-09-21 08:16:22 UTC

Description Sweta Anandpara 2017-07-20 07:14:25 UTC
Description of problem:
======================

Scenario: 

A 'gluster-block create' command is executed. It fails for some reason on one of the nodes and succeeds on the others. Such a partial success is treated as a failed create, so the code goes ahead with an internal block delete, undoing everything it had just done. That is when the VM crashes.
The block's meta file says 'CLEANUPINPROGRESS' but never progresses beyond that.

Backtraces are pasted below. Gluster-block logs and core files will be copied to http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/<bugnumber>/
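
For context, a minimal sketch of the scenario on the command line; the block name, size, mount point, and meta-file path below are illustrative assumptions, not values taken from the failed run:

# Create a block device with 3-way HA on volume 'nash' (name/size illustrative)
gluster-block create nash/nb56 ha 3 10.70.47.115,10.70.47.116,10.70.47.117 1GiB

# If the create fails on one node, gluster-block rolls the block back internally.
# The block's meta file on the hosting volume records the cleanup state; mount the
# volume to inspect it (path and field layout approximate):
mount -t glusterfs 10.70.47.115:/nash /mnt/nash
grep -i CLEANUP /mnt/nash/block-meta/nb56    # e.g. CLEANUPINPROGRESS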

Version-Release number of selected component (if applicable):
=============================================================
gluster-block-0.2.1-6 and glusterfs-3.8.4-33


How reproducible:
=================
Seen it twice


Additional info:
=================

BUG: unable to handle kernel paging request at 00000000db8d38b8
IP: [<ffffffffc0573080>] uio_poll+0x20/0x70 [uio]
PGD b64d0067 PUD 0 
Oops: 0000 [#1] SMP 
Modules linked in: target_core_pscsi target_core_file target_core_iblock iscsi_target_mod target_core_user target_core_mod crc_t10dif crct10dif_generic uio crct10dif_common fuse nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio pcspkr joydev sg ppdev i2c_piix4 virtio_balloon parport_pc parport nfsd auth_rpcgss nfs_acl lockd dm_multipath
 grace sunrpc ip_tables xfs libcrc32c sr_mod cdrom ata_generic pata_acpi cirrus virtio_blk drm_kms_helper syscopyarea sysfillrect serio_raw sysimgblt fb_sys_fops ttm ata_piix drm libata 8139too virtio_pci virtio_ring virtio floppy 8139cp i2c_core mii dm_mirror dm_region_hash dm_log dm_mod 8021q garp mrp bridge stp llc bonding
CPU: 1 PID: 19839 Comm: tcmu-runner Not tainted 3.10.0-693.el7.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
task: ffff88005339af70 ti: ffff880056338000 task.ti: ffff880056338000
RIP: 0010:[<ffffffffc0573080>]  [<ffffffffc0573080>] uio_poll+0x20/0x70 [uio]
RSP: 0018:ffff88005633bb08  EFLAGS: 00010202
RAX: 00000000fffffffb RBX: ffff880109a70780 RCX: 00000000db8d36e8
RDX: ffffffffc0573060 RSI: ffff88005633bc90 RDI: ffff880053326600
RBP: ffff88005633bb18 R08: 0000000000000001 R09: ffff88011fc16d40
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88011883c830
R13: 0000000000000000 R14: 0000000000000000 R15: ffff88005633bb9c
FS:  00007f91e5679700(0000) GS:ffff88011fd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000db8d38b8 CR3: 00000000cebf0000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 ffff88005633bba4 0000000000000000 ffff88005633bf38 ffffffff81217297
 00007f91e5678da0 ffff88005633bfd8 ffff88005339af70 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
 [<ffffffff81217297>] do_sys_poll+0x327/0x580
 [<ffffffff81215dd0>] ? poll_select_copy_remaining+0x150/0x150
 [<ffffffff8133d9dd>] ? list_del+0xd/0x30
 [<ffffffff810b1671>] ? remove_wait_queue+0x31/0x40
 [<ffffffffc057394d>] ? uio_read+0x11d/0x180 [uio]
 [<ffffffff810c4810>] ? wake_up_state+0x20/0x20
 [<ffffffff812175f4>] SyS_poll+0x74/0x110
 [<ffffffff8111f5c6>] ? __audit_syscall_exit+0x1e6/0x280
 [<ffffffff816b4fc9>] system_call_fastpath+0x16/0x1b
Code: ff ff c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 b8 fb ff ff ff 48 89 e5 41 54 53 4c 8b a7 a8 00 00 00 49 8b 1c 24 48 8b 4b 40 <48> 83 b9 d0 01 00 00 00 75 06 5b 41 5c 5d c3 90 48 85 f6 74 19 
RIP  [<ffffffffc0573080>] uio_poll+0x20/0x70 [uio]
 RSP <ffff88005633bb08>
[root@dhcp47-115 ~]#



BUG: unable to handle kernel NULL pointer dereference at 00000000000001d0
IP: [<ffffffffc0566080>] uio_poll+0x20/0x70 [uio]
PGD b5f66067 PUD 3660f067 PMD 0 
Oops: 0000 [#1] SMP 
Modules linked in: target_core_pscsi target_core_file target_core_iblock iscsi_target_mod target_core_user target_core_mod crc_t10dif crct10dif_generic uio crct10dif_common fuse nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio pcspkr sg ppdev joydev i2c_piix4 parport_pc virtio_balloon parport nfsd auth_rpcgss nfs_acl dm_multipath
 lockd grace sunrpc ip_tables xfs libcrc32c sr_mod cdrom ata_generic pata_acpi virtio_blk cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ata_piix drm serio_raw libata 8139too virtio_pci virtio_ring virtio 8139cp mii i2c_core floppy 8021q garp mrp bridge stp llc dm_mirror dm_region_hash dm_log bonding dm_mod
CPU: 0 PID: 18943 Comm: tcmu-runner Not tainted 3.10.0-693.el7.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
task: ffff8800657d0000 ti: ffff880060f80000 task.ti: ffff880060f80000
RIP: 0010:[<ffffffffc0566080>]  [<ffffffffc0566080>] uio_poll+0x20/0x70 [uio]
RSP: 0018:ffff880060f83b08  EFLAGS: 00010202
RAX: 00000000fffffffb RBX: ffff8800b610a9c0 RCX: 0000000000000000
RDX: ffffffffc0566060 RSI: ffff880060f83c90 RDI: ffff88008963e700
RBP: ffff880060f83b18 R08: 0000000000000001 R09: ffff88011fd16d40
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880060c17250
R13: 0000000000000000 R14: 0000000000000000 R15: ffff880060f83b9c
FS:  00007fc4086ac700(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000001d0 CR3: 00000000b5f7a000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 ffff880060f83ba4 0000000000000000 ffff880060f83f38 ffffffff81217297
 00007fc4086abda0 ffff880060f83fd8 ffff8800657d0000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
 [<ffffffff81217297>] do_sys_poll+0x327/0x580
 [<ffffffff81215dd0>] ? poll_select_copy_remaining+0x150/0x150
 [<ffffffff8133d9dd>] ? list_del+0xd/0x30
 [<ffffffff810b1671>] ? remove_wait_queue+0x31/0x40
 [<ffffffffc056694d>] ? uio_read+0x11d/0x180 [uio]
 [<ffffffff810c4810>] ? wake_up_state+0x20/0x20
 [<ffffffff812175f4>] SyS_poll+0x74/0x110
 [<ffffffff8111f5c6>] ? __audit_syscall_exit+0x1e6/0x280
 [<ffffffff816b4fc9>] system_call_fastpath+0x16/0x1b
Code: ff ff c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 b8 fb ff ff ff 48 89 e5 41 54 53 4c 8b a7 a8 00 00 00 49 8b 1c 24 48 8b 4b 40 <48> 83 b9 d0 01 00 00 00 75 06 5b 41 5c 5d c3 90 48 85 f6 74 19 
RIP  [<ffffffffc0566080>] uio_poll+0x20/0x70 [uio]
 RSP <ffff880060f83b08>
[root@dhcp47-117 abrt]#




[root@dhcp47-115 ~]# rpm -qa | grep gluster
glusterfs-cli-3.8.4-33.el7rhgs.x86_64
glusterfs-rdma-3.8.4-33.el7rhgs.x86_64
python-gluster-3.8.4-33.el7rhgs.noarch
vdsm-gluster-4.17.33-1.1.el7rhgs.noarch
glusterfs-client-xlators-3.8.4-33.el7rhgs.x86_64
glusterfs-fuse-3.8.4-33.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-events-3.8.4-33.el7rhgs.x86_64
gluster-block-0.2.1-6.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-3.2.0-14.el7.x86_64
gluster-nagios-addons-0.2.9-1.el7rhgs.x86_64
samba-vfs-glusterfs-4.6.3-3.el7rhgs.x86_64
glusterfs-3.8.4-33.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-26.el7rhgs.x86_64
glusterfs-api-3.8.4-33.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-33.el7rhgs.x86_64
glusterfs-libs-3.8.4-33.el7rhgs.x86_64
glusterfs-server-3.8.4-33.el7rhgs.x86_64
[root@dhcp47-115 ~]# 
[root@dhcp47-115 ~]# gluster peer status
Number of Peers: 5

Hostname: dhcp47-121.lab.eng.blr.redhat.com
Uuid: 49610061-1788-4cbc-9205-0e59fe91d842
State: Peer in Cluster (Connected)
Other names:
10.70.47.121

Hostname: dhcp47-113.lab.eng.blr.redhat.com
Uuid: a0557927-4e5e-4ff7-8dce-94873f867707
State: Peer in Cluster (Connected)

Hostname: dhcp47-114.lab.eng.blr.redhat.com
Uuid: c0dac197-5a4d-4db7-b709-dbf8b8eb0896
State: Peer in Cluster (Connected)
Other names:
10.70.47.114

Hostname: dhcp47-116.lab.eng.blr.redhat.com
Uuid: a96e0244-b5ce-4518-895c-8eb453c71ded
State: Peer in Cluster (Disconnected)
Other names:
10.70.47.116

Hostname: dhcp47-117.lab.eng.blr.redhat.com
Uuid: 17eb3cef-17e7-4249-954b-fc19ec608304
State: Peer in Cluster (Connected)
Other names:
10.70.47.117
[root@dhcp47-115 ~]# 
[root@dhcp47-115 ~]# 
[root@dhcp47-115 ~]# gluster v info nash
 
Volume Name: nash
Type: Replicate
Volume ID: f1ea3d3e-c536-4f36-b61f-cb9761b8a0a6
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.47.115:/bricks/brick4/nash0
Brick2: 10.70.47.116:/bricks/brick4/nash1
Brick3: 10.70.47.117:/bricks/brick4/nash2
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
performance.open-behind: off
performance.readdir-ahead: off
network.remote-dio: enable
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
server.allow-insecure: on
cluster.brick-multiplex: disable
cluster.enable-shared-storage: enable
[root@dhcp47-115 ~]# 
[root@dhcp47-115 ~]# 
[root@dhcp47-115 ~]# gluster-block list nash
nb21
nb22
nb23
nb24
nb25
nb26
nb27
nb28
nb29
nb30
nb31
nb32
nb33
nb34
nb35
nb36
nb37
nb38
nb39
nb40
nb41
nb42
nb43
nb44
nb45
nb46
nb47
nb48
nb49
nb50
nb51
nb52
nb54
nb55
[root@dhcp47-115 ~]#

Comment 3 Sweta Anandpara 2017-07-20 08:35:37 UTC
Seen twice, on two different peer nodes, but I don't have straightforward steps to reproduce.

I would like this bug to be discussed in a wider forum, as I am not completely sure of the likelihood and the repercussions of this happening in a CNS environment.

Hence, setting the blocker flag to '?'.

Comment 7 Sweta Anandpara 2017-07-31 08:55:38 UTC
I hit another VM crash today when the block-create command I issued failed. It was not a negative test; I was expecting the block to get created successfully. The bug title looks the same, though the backtrace is different. Please advise if this should be treated as a different issue.

BUG: unable to handle kernel NULL pointer dereference at 00000000000001d0
IP: [<ffffffffc0623080>] uio_poll+0x20/0x70 [uio]
PGD 7d462067 PUD ce7b0067 PMD 0 
Oops: 0000 [#1] SMP 
Modules linked in: target_core_pscsi target_core_file target_core_iblock iscsi_target_mod target_core_user target_core_mod crc_t10dif crct10dif_generic uio crct10dif_common sctp_diag sctp dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag binfmt_misc fuse nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter dm_thin_pool dm_persistent_data dm_bio_prison
 dm_bufio ppdev pcspkr joydev sg virtio_balloon parport_pc i2c_piix4 parport nfsd auth_rpcgss nfs_acl lockd grace sunrpc dm_multipath ip_tables xfs libcrc32c sr_mod cdrom ata_generic pata_acpi cirrus drm_kms_helper virtio_blk syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm serio_raw 8139too virtio_pci virtio_ring virtio ata_piix libata 8139cp mii i2c_core floppy dm_mirror dm_region_hash dm_log dm_mod 8021q garp mrp bridge stp llc bonding
CPU: 0 PID: 14320 Comm: tcmu-runner Not tainted 3.10.0-693.el7.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
task: ffff880049cf2f70 ti: ffff880117d94000 task.ti: ffff880117d94000
RIP: 0010:[<ffffffffc0623080>]  [<ffffffffc0623080>] uio_poll+0x20/0x70 [uio]
RSP: 0018:ffff880117d97b08  EFLAGS: 00010202
RAX: 00000000fffffffb RBX: ffff880049c781e0 RCX: 0000000000000000
RDX: ffffffffc0623060 RSI: ffff880117d97c90 RDI: ffff8800c34c8d00
RBP: ffff880117d97b18 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800c866f560
R13: 0000000000000000 R14: 0000000000000000 R15: ffff880117d97b9c
FS:  00007fb730e3b700(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000001d0 CR3: 000000009c696000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 ffff880117d97ba4 0000000000000000 ffff880117d97f38 ffffffff81217297
 00007fb730e3ada0 ffff880117d97fd8 ffff880049cf2f70 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
 [<ffffffff81217297>] do_sys_poll+0x327/0x580
 [<ffffffff810cd794>] ? update_curr+0x104/0x190
 [<ffffffff810c8f18>] ? __enqueue_entity+0x78/0x80
 [<ffffffff810cf90c>] ? enqueue_entity+0x26c/0xb60
 [<ffffffff810ce8d8>] ? check_preempt_wakeup+0x148/0x250
 [<ffffffff810c12d5>] ? check_preempt_curr+0x85/0xa0
 [<ffffffff81215dd0>] ? poll_select_copy_remaining+0x150/0x150
 [<ffffffff810cd794>] ? update_curr+0x104/0x190
 [<ffffffff810ca29e>] ? account_entity_dequeue+0xae/0xd0
 [<ffffffff810cdc7c>] ? dequeue_entity+0x11c/0x5d0
 [<ffffffff81062ede>] ? kvm_clock_read+0x1e/0x20
 [<ffffffff810ce54e>] ? dequeue_task_fair+0x41e/0x660
 [<ffffffff810cb62c>] ? set_next_entity+0x3c/0xe0
 [<ffffffff810cb72f>] ? pick_next_task_fair+0x5f/0x1b0
 [<ffffffff8133d9dd>] ? list_del+0xd/0x30
 [<ffffffff810b1671>] ? remove_wait_queue+0x31/0x40
 [<ffffffffc062394d>] ? uio_read+0x11d/0x180 [uio]
 [<ffffffff810c4810>] ? wake_up_state+0x20/0x20
 [<ffffffff812175f4>] SyS_poll+0x74/0x110
 [<ffffffff8111f5c6>] ? __audit_syscall_exit+0x1e6/0x280
 [<ffffffff816b4fc9>] system_call_fastpath+0x16/0x1b
Code: ff ff c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 b8 fb ff ff ff 48 89 e5 41 54 53 4c 8b a7 a8 00 00 00 49 8b 1c 24 48 8b 4b 40 <48> 83 b9 d0 01 00 00 00 75 06 5b 41 5c 5d c3 90 48 85 f6 74 19 
RIP  [<ffffffffc0623080>] uio_poll+0x20/0x70 [uio]
 RSP <ffff880117d97b08>

Comment 8 Sweta Anandpara 2017-07-31 08:56:24 UTC
The above trace is seen with glusterfs-3.8.4-35 and gluster-block-0.2.1-6.

Comment 12 krishnaram Karthick 2017-09-19 02:21:16 UTC
The corresponding CNS bug has been verified (https://bugzilla.redhat.com/show_bug.cgi?id=1490350#c3).

We are good from the CNS verification perspective.

Comment 13 Sweta Anandpara 2017-09-19 11:52:27 UTC
Tested and verified this on the build tcmu-runner-1.2.0-15 and gluster-block-0.2.1-13.

Executed multiple block creates and deletes, stopped the gluster-blockd service, and did node reboots. I did not see the mentioned VM crash in any of my attempts.
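
For reference, a rough sketch of the kind of create/delete loop run during this verification; the block names, size, and iteration count are illustrative assumptions, not the exact commands used:

# Repeated create/delete cycles against the 'nash' volume (illustrative)
for i in $(seq 1 20); do
    gluster-block create nash/vblk$i ha 3 10.70.47.115,10.70.47.116,10.70.47.117 1GiB
    gluster-block delete nash/vblk$i
done

# Interleaved with gluster-blockd service stops/starts and node reboots
systemctl stop gluster-blockd
systemctl start gluster-blockd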

I did see partially created blocks (on failed creates), for which bz 1490818 has been raised.

Moving this bug to VERIFIED in RHGS 3.3.0.

Comment 15 errata-xmlrpc 2017-09-21 04:20:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:2773

