Description of problem:
======================
While doing a round of block create/delete testing in a loop, with tcmu-runner restarts in the interim, and a few other things (which I don't remember), I happened to see a call trace in the dmesg output on 2 nodes of my 6-node cluster. The trace is seen multiple times, back to back: 8 times on one node and 10 times on the other.

[   82.389711] Rounding down aligned max_sectors from 4294967295 to 4294967288
[   82.408631] tcmu daemon: command reply support 1.
[   82.419220] target_core_register_fabric() trying autoload for iscsi
[  239.882153] tcmu daemon: command reply support 1.
[  240.045338] INFO: task targetctl:4262 blocked for more than 120 seconds.
[  240.045442] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  240.045547] targetctl       D ffff8800c6ec1468     0  4262      1 0x00000080
[  240.045556]  ffff880117db3c50 0000000000000086 ffff8800cacf8fd0 ffff880117db3fd8
[  240.045561]  ffff880117db3fd8 ffff880117db3fd8 ffff8800cacf8fd0 ffff8800c6ec1448
[  240.045565]  7fffffffffffffff ffff8800c6ec1440 ffff8800cacf8fd0 ffff8800c6ec1468
[  240.045570] Call Trace:
[  240.045587]  [<ffffffff816a94c9>] schedule+0x29/0x70
[  240.045598]  [<ffffffff816a6fd9>] schedule_timeout+0x239/0x2c0
[  240.045608]  [<ffffffff815ba55c>] ? netlink_broadcast_filtered+0x14c/0x3e0
[  240.045613]  [<ffffffff816a987d>] wait_for_completion+0xfd/0x140
[  240.045622]  [<ffffffff810c4810>] ? wake_up_state+0x20/0x20
[  240.045634]  [<ffffffffc0619f5a>] tcmu_netlink_event+0x26a/0x3a0 [target_core_user]
[  240.045642]  [<ffffffff810b1910>] ? wake_up_atomic_t+0x30/0x30
[  240.045649]  [<ffffffffc061a2c6>] tcmu_configure_device+0x236/0x350 [target_core_user]
[  240.045682]  [<ffffffffc05aa5df>] target_configure_device+0x3f/0x3b0 [target_core_mod]
[  240.045695]  [<ffffffffc05a4e7c>] target_core_store_dev_enable+0x2c/0x60 [target_core_mod]
[  240.045707]  [<ffffffffc05a3244>] target_core_dev_store+0x24/0x40 [target_core_mod]
[  240.045715]  [<ffffffff81287f44>] configfs_write_file+0xc4/0x130
[  240.045722]  [<ffffffff81200d2d>] vfs_write+0xbd/0x1e0
[  240.045726]  [<ffffffff81201b3f>] SyS_write+0x7f/0xe0
[  240.045732]  [<ffffffff816b4fc9>] system_call_fastpath+0x16/0x1b

Version-Release number of selected component (if applicable):
============================================================
glusterfs-3.8.4-33 and gluster-block-0.2.1-6

How reproducible:
===================
I don't have the exact steps to reproduce this. I will keep a watch on the further tests that I am doing, in case I hit it again.
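For reference, the workload was roughly of the following shape. This is only a minimal sketch of the create/delete loop described above, not a confirmed reproducer; the volume name, block names, host list, HA count, size, and restart cadence are all placeholder assumptions, not the exact values used.

#!/bin/bash
# Rough sketch of the test workload: create and delete gluster-block
# devices in a loop, restarting tcmu-runner in the interim.
# All names, hosts, and sizes below are hypothetical placeholders.
HOSTS="10.70.47.115,10.70.47.121,10.70.47.113"

for i in $(seq 1 100); do
    gluster-block create testvol/blk-$i ha 3 $HOSTS 1GiB
    gluster-block delete testvol/blk-$i
    # restart tcmu-runner on every 10th iteration (cadence is a guess)
    if (( i % 10 == 0 )); then
        systemctl restart tcmu-runner
    fi
done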
Additional info:
===============
[root@dhcp47-115 ~]# gluster peer status
Number of Peers: 5

Hostname: dhcp47-121.lab.eng.blr.redhat.com
Uuid: 49610061-1788-4cbc-9205-0e59fe91d842
State: Peer in Cluster (Connected)
Other names:
10.70.47.121

Hostname: dhcp47-113.lab.eng.blr.redhat.com
Uuid: a0557927-4e5e-4ff7-8dce-94873f867707
State: Peer in Cluster (Connected)

Hostname: dhcp47-114.lab.eng.blr.redhat.com
Uuid: c0dac197-5a4d-4db7-b709-dbf8b8eb0896
State: Peer in Cluster (Connected)
Other names:
10.70.47.114

Hostname: dhcp47-116.lab.eng.blr.redhat.com
Uuid: a96e0244-b5ce-4518-895c-8eb453c71ded
State: Peer in Cluster (Connected)
Other names:
10.70.47.116

Hostname: dhcp47-117.lab.eng.blr.redhat.com
Uuid: 17eb3cef-17e7-4249-954b-fc19ec608304
State: Peer in Cluster (Connected)
Other names:
10.70.47.117
[root@dhcp47-115 ~]#

[root@dhcp47-115 ~]# rpm -qa | grep gluster
glusterfs-3.8.4-35.el7rhgs.x86_64
glusterfs-api-3.8.4-35.el7rhgs.x86_64
glusterfs-server-3.8.4-35.el7rhgs.x86_64
glusterfs-rdma-3.8.4-35.el7rhgs.x86_64
gluster-block-0.2.1-6.el7rhgs.x86_64
samba-vfs-glusterfs-4.6.3-5.el7rhgs.x86_64
glusterfs-fuse-3.8.4-35.el7rhgs.x86_64
glusterfs-events-3.8.4-35.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.1.el7rhgs.noarch
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
libvirt-daemon-driver-storage-gluster-3.2.0-14.el7.x86_64
glusterfs-libs-3.8.4-35.el7rhgs.x86_64
glusterfs-cli-3.8.4-35.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-35.el7rhgs.x86_64
gluster-nagios-addons-0.2.9-1.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-26.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-35.el7rhgs.x86_64
python-gluster-3.8.4-35.el7rhgs.noarch
[root@dhcp47-115 ~]#

[root@dhcp47-115 ~]# gluster v list
ctdb
disp
gluster_shared_storage
testvol
vol0
vol1
vol10
vol11
vol12
vol13
vol14
vol15
vol16
vol17
vol18
vol19
vol2
vol20
vol21
vol22
vol23
vol24
vol25
vol26
vol27
vol28
vol29
vol3
vol30
vol31
vol32
vol33
vol34
vol35
vol36
vol37
vol38
vol39
vol4
vol40
vol5
vol6
vol7
vol8
vol9
[root@dhcp47-115 ~]#
Not proposing this as a blocker for RHGS 3.3, as I am unsure of the exact steps I was executing. I will keep a watch on this in my further testing.
*** This bug has been marked as a duplicate of bug 1476730 ***