Description of problem:
=======================
During verification of client-io-threads on a Fuse mount of an EC volume, observed the following hung-task warning in dmesg on the client:

[88440.551165] INFO: task crefi:1972 blocked for more than 120 seconds.
[88440.551223] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[88440.551283] crefi           D ffff8800aed7ccb0     0  1972   1968 0x00000080
[88440.551287] ffff88003ec57c70 0000000000000082 ffff880036bbb980 ffff88003ec57fd8
[88440.551291] ffff88003ec57fd8 ffff88003ec57fd8 ffff880036bbb980 ffff8800aed7cca8
[88440.551293] ffff8800aed7ccac ffff880036bbb980 00000000ffffffff ffff8800aed7ccb0
[88440.551296] Call Trace:
[88440.551305] [<ffffffff8163b9e9>] schedule_preempt_disabled+0x29/0x70
[88440.551309] [<ffffffff816396e5>] __mutex_lock_slowpath+0xc5/0x1c0
[88440.551312] [<ffffffff81638b4f>] mutex_lock+0x1f/0x2f
[88440.551316] [<ffffffff811eb9af>] do_last+0x28f/0x1270
[88440.551320] [<ffffffff811c11ce>] ? kmem_cache_alloc_trace+0x1ce/0x1f0
[88440.551323] [<ffffffff811ee672>] path_openat+0xc2/0x490
[88440.551362] [<ffffffffa01e7cd4>] ? xfs_iunlock+0xa4/0x130 [xfs]
[88440.551383] [<ffffffffa01d45fa>] ? xfs_free_eofblocks+0xda/0x270 [xfs]
[88440.551387] [<ffffffff811efe3b>] do_filp_open+0x4b/0xb0
[88440.551390] [<ffffffff811fc9c7>] ? __alloc_fd+0xa7/0x130
[88440.551394] [<ffffffff811dd7e3>] do_sys_open+0xf3/0x1f0
[88440.551397] [<ffffffff811dd8fe>] SyS_open+0x1e/0x20
[88440.551401] [<ffffffff81645909>] system_call_fastpath+0x16/0x1b

Scenario:
=========
1. Create an EC volume (3 x (8+3)) from an 11-node cluster (see the setup sketch after this report).
2. Set event-threads to 4.
3. Enable client-io-threads.
4. Mount the volume on the client via Fuse.
5. Open 10 sessions on the client (different terminals).
6. From 5 sessions, create IO on the same directory using crefi. Make sure to run crefi multi-threaded with -T 10:
   crefi -b 4 -d 4 -n 20 --multi --random --min 1K --max 100M -T 10 -t text --fop=create /mnt/fuse/multiple_threads/
7. From the other 5 sessions, stat all the contents of the directory every 30 seconds:
   for i in {1..100}; do echo "This is iteration $i" ; find * | xargs stat ; sleep 30 ; done
8. While steps 6 and 7 are in progress, bring down 1 brick from each subvolume. Wait for a while, then start the volume forcefully to bring the bricks back online (see the brick kill/force-start sketch below).
9. Repeat step 8 multiple times (3-4), each time after healing has completed.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.9-10.el7rhgs.x86_64

How reproducible:
=================
Hit the issue once; tried again but could not reproduce.

Additional info:
================
While the above use case was in progress, collected resource consumption (CPU and memory) every 10 seconds (see the sampling loop sketched below).

On the client, the glusterfs process peaked at 181% CPU with a minimum of 0%; for most of the run it stayed in the 50-181% range. Its memory usage stayed in the 2.7-6.7% range throughout. The primary server's CPU also spiked to 391% for fractions of a second, and its max memory usage was 0.7%.

Currently raising this bz under EC for initial analysis.
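For reference, a minimal sketch of the setup in steps 1-4, assuming 11 hosts named server1..server11, three bricks per host under /bricks, and a volume named testvol (hostnames, brick paths, and the volume name are illustrative assumptions, not taken from this report):

# 3 x (8+3) disperse volume: 33 bricks, 11 per subvolume, laid out so
# that each subvolume holds exactly one brick from each server.
gluster volume create testvol disperse-data 8 redundancy 3 \
    server{1..11}:/bricks/brick1/testvol \
    server{1..11}:/bricks/brick2/testvol \
    server{1..11}:/bricks/brick3/testvol

# Step 2: event-threads to 4 (the report does not say whether the
# client-side option, server-side option, or both were set).
gluster volume set testvol client.event-threads 4

# Step 3: enable client-io-threads.
gluster volume set testvol performance.client-io-threads on

gluster volume start testvol

# Step 4: Fuse-mount the volume on the client.
mount -t glusterfs server1:/testvol /mnt/fuse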
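Likewise, a hedged sketch of the brick-down/force-start cycle from steps 8-9. PID1..PID3 are placeholder variables to be filled in from the PID column of the volume status output; the volume name is the same assumption as above:

gluster volume status testvol        # note the PID of one brick per subvolume
kill -9 "$PID1" "$PID2" "$PID3"      # one brick down in each disperse subvolume
sleep 300                            # let the IO and stat loops run degraded
gluster volume start testvol force   # bring the killed bricks back online
gluster volume heal testvol info     # repeat only once this shows no pending entries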
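The exact method used to collect the resource numbers above is not recorded in this report; one plausible shape for the 10-second sampling is:

# Sample %CPU/%MEM of all glusterfs processes every 10 seconds,
# timestamping each sample and keeping a copy on disk.
while true; do
    date '+%F %T'
    ps -C glusterfs -o pid,%cpu,%mem,args
    sleep 10
done | tee /tmp/glusterfs-usage.log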
Keeping the bug in needinfo until further updates