Description of problem: ----------------------- There are few App VMs having their disk images on distributed-replicate gluster volumes. This VMs were performing some IO operation simulated with 'dd' command. While performing rebalance, after adding a pair of bricks to this volume, there are call traces present in /var/log/messages and in /var/log/dmesg Version-Release number of selected component (if applicable): ------------------------------------------------------------- glusterfs-3.4.0.70rhs-1.el6rhs How reproducible: ----------------- tried once, but definite that this would be consistently reproducible Steps to Reproduce: ------------------- 1. Create 2X2 distributed-replicate volume 2. Optimize the volume for virt-store by tagging the volume with group 'virt' 3. Use this volume as a 'DataStore' in RHEV-M Datacenter 4. Create few App VMs with its disk images on that 'DataStore' backed by gluster volume 5. Install RHEL 6.6 on that VM and reboot post installation 6. Run sample IO Load on that VM, simulated by 'dd' command. Use this to continuously create files. 7. Add a pair of bricks to this volume 8. Perform rebalance on the volume Actual results: --------------- Observed some call traces in /var/log/messages and in /var/log/dmesg Expected results: ----------------- Call traces should be observed
Additional Information ---------------------- Call traces as seen in AppVms ------------------------------ AppVM1 ------- Nov 10 11:53:28 appvm1 kernel: INFO: task sadc:22619 blocked for more than 120 seconds. Nov 10 11:53:28 appvm1 kernel: Not tainted 2.6.32-504.el6.x86_64 #1 Nov 10 11:53:28 appvm1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Nov 10 11:53:28 appvm1 kernel: sadc D 0000000000000000 0 22619 22615 0x00000084 Nov 10 11:53:28 appvm1 kernel: ffff88003a80fdc8 0000000000000082 ffff88003a80fe70 0000000000000001 Nov 10 11:53:28 appvm1 kernel: 000000000000000e ffff88003a80fde8 ffff88003a80fd88 ffffffff81123d80 Nov 10 11:53:28 appvm1 kernel: ffff88003e0dedd8 0000000101383693 ffff88003b39a5f8 ffff88003a80ffd8 Nov 10 11:53:28 appvm1 kernel: Call Trace: Nov 10 11:53:28 appvm1 kernel: [<ffffffff81123d80>] ? find_get_pages_tag+0x40/0x130 Nov 10 11:53:28 appvm1 kernel: [<ffffffff8109ee2e>] ? prepare_to_wait+0x4e/0x80 Nov 10 11:53:28 appvm1 kernel: [<ffffffffa00945a5>] jbd2_log_wait_commit+0xc5/0x140 [jbd2] Nov 10 11:53:28 appvm1 kernel: [<ffffffff8109eb00>] ? autoremove_wake_function+0x0/0x40 Nov 10 11:53:28 appvm1 kernel: [<ffffffffa0094938>] jbd2_complete_transaction+0x68/0xb0 [jbd2] Nov 10 11:53:28 appvm1 kernel: [<ffffffffa00b0061>] ext4_sync_file+0x121/0x1d0 [ext4] Nov 10 11:53:28 appvm1 kernel: [<ffffffff811c07f1>] vfs_fsync_range+0xa1/0x100 Nov 10 11:53:28 appvm1 kernel: [<ffffffff811c08bd>] vfs_fsync+0x1d/0x20 Nov 10 11:53:28 appvm1 kernel: [<ffffffff811c08fe>] do_fsync+0x3e/0x60 Nov 10 11:53:28 appvm1 kernel: [<ffffffff811c0933>] sys_fdatasync+0x13/0x20 Nov 10 11:53:28 appvm1 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b Nov 10 12:33:27 appvm1 kernel: INFO: task master:1446 blocked for more than 120 seconds. Nov 10 12:33:27 appvm1 kernel: Not tainted 2.6.32-504.el6.x86_64 #1 Nov 10 12:33:27 appvm1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Nov 10 12:33:27 appvm1 kernel: master D 0000000000000000 0 1446 1 0x00000084 Nov 10 12:33:27 appvm1 kernel: ffff88003a20ba48 0000000000000082 ffff88003a20b9c8 0000000300000001 Nov 10 12:33:27 appvm1 kernel: ffff88003a20b9c8 ffff88003bad5a40 ffff88003a20b9c8 ffffffff811240de Nov 10 12:33:27 appvm1 kernel: 0000000000013760 0000000000180087 ffff88003ae27af8 ffff88003a20bfd8 Nov 10 12:33:27 appvm1 kernel: Call Trace: Nov 10 12:33:27 appvm1 kernel: [<ffffffff811240de>] ? find_get_page+0x1e/0xa0 Nov 10 12:33:27 appvm1 kernel: [<ffffffff8109ee2e>] ? prepare_to_wait+0x4e/0x80 Nov 10 12:33:27 appvm1 kernel: [<ffffffffa008e09d>] do_get_write_access+0x29d/0x520 [jbd2] Nov 10 12:33:27 appvm1 kernel: [<ffffffff8109eb80>] ? wake_bit_function+0x0/0x50 Nov 10 12:33:27 appvm1 kernel: [<ffffffffa008e471>] jbd2_journal_get_write_access+0x31/0x50 [jbd2] Nov 10 12:33:27 appvm1 kernel: [<ffffffffa00dbd98>] __ext4_journal_get_write_access+0x38/0x80 [ext4] Nov 10 12:33:27 appvm1 kernel: [<ffffffffa00b5bb3>] ext4_reserve_inode_write+0x73/0xa0 [ext4] Nov 10 12:33:27 appvm1 kernel: [<ffffffffa00b5c2c>] ext4_mark_inode_dirty+0x4c/0x1d0 [ext4] Nov 10 12:33:27 appvm1 kernel: [<ffffffffa008d495>] ? jbd2_journal_start+0xb5/0x100 [jbd2] Nov 10 12:33:27 appvm1 kernel: [<ffffffffa00b5f20>] ext4_dirty_inode+0x40/0x60 [ext4] Nov 10 12:33:27 appvm1 kernel: [<ffffffff811ba68b>] __mark_inode_dirty+0x3b/0x160 Nov 10 12:33:27 appvm1 kernel: [<ffffffff811aad72>] file_update_time+0xf2/0x170 Nov 10 12:33:27 appvm1 kernel: [<ffffffff81198ec0>] pipe_write+0x300/0x6a0 Nov 10 12:33:27 appvm1 kernel: [<ffffffff8118dd5a>] do_sync_write+0xfa/0x140 Nov 10 12:33:27 appvm1 kernel: [<ffffffff8109eb00>] ? autoremove_wake_function+0x0/0x40 Nov 10 12:33:27 appvm1 kernel: [<ffffffff8123a5eb>] ? selinux_file_permission+0xfb/0x150 Nov 10 12:33:27 appvm1 kernel: [<ffffffff8122d446>] ? security_file_permission+0x16/0x20 Nov 10 12:33:27 appvm1 kernel: [<ffffffff8118e058>] vfs_write+0xb8/0x1a0 Nov 10 12:33:27 appvm1 kernel: [<ffffffff8118ea21>] sys_write+0x51/0x90 Nov 10 12:33:27 appvm1 kernel: [<ffffffff810e5a7e>] ? __audit_syscall_exit+0x25e/0x290 Nov 10 12:33:27 appvm1 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b Nov 10 12:33:27 appvm1 kernel: INFO: task qmgr:1460 blocked for more than 120 seconds. AppVM2 ------ Nov 10 13:44:08 appvm2 kernel: INFO: task sadc:30563 blocked for more than 120 seconds. Nov 10 13:44:08 appvm2 kernel: Not tainted 2.6.32-504.el6.x86_64 #1 Nov 10 13:44:08 appvm2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Nov 10 13:44:08 appvm2 kernel: sadc D 0000000000000000 0 30563 30559 0x00000084 Nov 10 13:44:08 appvm2 kernel: ffff88007d197dc8 0000000000000086 0000000000000000 0000000000000001 Nov 10 13:44:08 appvm2 kernel: 000000000000000e ffff88007d197de8 00001bd101703ffc ffffffff81123d80 Nov 10 13:44:08 appvm2 kernel: ffff88007dd25608 0000000101ce1aae ffff88007bc33098 ffff88007d197fd8 Nov 10 13:44:08 appvm2 kernel: Call Trace: Nov 10 13:44:08 appvm2 kernel: [<ffffffff81123d80>] ? find_get_pages_tag+0x40/0x130 Nov 10 13:44:08 appvm2 kernel: [<ffffffffa00945a5>] jbd2_log_wait_commit+0xc5/0x140 [jbd2] Nov 10 13:44:08 appvm2 kernel: [<ffffffff8109eb00>] ? autoremove_wake_function+0x0/0x40 Nov 10 13:44:08 appvm2 kernel: [<ffffffffa0094938>] jbd2_complete_transaction+0x68/0xb0 [jbd2] Nov 10 13:44:08 appvm2 kernel: [<ffffffffa00b0061>] ext4_sync_file+0x121/0x1d0 [ext4] Nov 10 13:44:08 appvm2 kernel: [<ffffffff811c07f1>] vfs_fsync_range+0xa1/0x100 Nov 10 13:44:08 appvm2 kernel: [<ffffffff811c08bd>] vfs_fsync+0x1d/0x20 Nov 10 13:44:08 appvm2 kernel: [<ffffffff811c08fe>] do_fsync+0x3e/0x60 Nov 10 13:44:08 appvm2 kernel: [<ffffffff811c0933>] sys_fdatasync+0x13/0x20 Nov 10 13:44:08 appvm2 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b Nov 10 14:54:08 appvm2 kernel: INFO: task sadc:30837 blocked for more than 120 seconds. Nov 10 14:54:08 appvm2 kernel: Not tainted 2.6.32-504.el6.x86_64 #1 Nov 10 14:54:08 appvm2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Nov 10 14:54:08 appvm2 kernel: sadc D 0000000000000000 0 30837 30833 0x00000084 Nov 10 14:54:08 appvm2 kernel: ffff88007b21ddc8 0000000000000086 0000000000000000 0000000000000001 Nov 10 14:54:08 appvm2 kernel: 000000000000000e ffff88007b21dde8 00001fa6594c9608 ffffffff81123d80 Nov 10 14:54:08 appvm2 kernel: ffff88007dd25608 00000001020e6a88 ffff88007b2ab098 ffff88007b21dfd8 Nov 10 14:54:08 appvm2 kernel: Call Trace: Nov 10 14:54:08 appvm2 kernel: [<ffffffff81123d80>] ? find_get_pages_tag+0x40/0x130 Nov 10 14:54:08 appvm2 kernel: [<ffffffffa00945a5>] jbd2_log_wait_commit+0xc5/0x140 [jbd2] Nov 10 14:54:08 appvm2 kernel: [<ffffffff8109eb00>] ? autoremove_wake_function+0x0/0x40 Nov 10 14:54:08 appvm2 kernel: [<ffffffffa0094938>] jbd2_complete_transaction+0x68/0xb0 [jbd2] Nov 10 14:54:08 appvm2 kernel: [<ffffffffa00b0061>] ext4_sync_file+0x121/0x1d0 [ext4] Nov 10 14:54:08 appvm2 kernel: [<ffffffff811c07f1>] vfs_fsync_range+0xa1/0x100 Nov 10 14:54:08 appvm2 kernel: [<ffffffff811c08bd>] vfs_fsync+0x1d/0x20 Nov 10 14:54:08 appvm2 kernel: [<ffffffff811c08fe>] do_fsync+0x3e/0x60 Nov 10 14:54:08 appvm2 kernel: [<ffffffff811c0933>] sys_fdatasync+0x13/0x20 Nov 10 14:54:08 appvm2 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Created attachment 956174 [details] /var/log/message as seen on appvm1 /var/log/messages file as seen on appvm1
Created attachment 956175 [details] /var/log/message as seen on appvm2 /var/log/message as seen on appvm2
Created attachment 956176 [details] sosreport from RHS Node 1 sosreport from rhs node1
Created attachment 956177 [details] sosreport from RHS Node 2 sosreport from rhs node 2
This issue was similar to the one seen with RHS 2.0 and was tracked with this bug - https://bugzilla.redhat.com/show_bug.cgi?id=979901 Not sure both this issues are the same. Apart from the error messages and a momentary VM hang, everthing resumes normal. I could reboot the VM and there were no other problems and hence this bug with medium severity. If the issue seen in this bug is same as BZ 979901, then this issue need to be added to known-issue
The comment, https://bugzilla.redhat.com/show_bug.cgi?id=859589#c41 also provides little information on this issue
The issue was already available in Release Notes as a 'known issue'. As RHS 2.0 has reached End Of Life ( EOL ),this bug could be used for tracking the issue with RHS 2.1.x
Cloning to 3.1. To be fixed in future release.