Bug 979901
Summary: | VM hangs for a brief period with "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" messages logged in VM console. | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Gowrishankar Rajaiyan <grajaiya> |
Component: | glusterfs | Assignee: | Bug Updates Notification Mailing List <rhs-bugs> |
Status: | CLOSED WONTFIX | QA Contact: | SATHEESARAN <sasundar> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 2.0 | CC: | divya, grajaiya, nsathyan, rhs-bugs, rwheeler, vbellur |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Known Issue | |
Doc Text: |
Cause: System calls respond slowly while a rebalance process is in progress.
Consequence: Because of the slow response time, the message "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" is logged in the VM console.
Workaround: Avoid heavy activity from the VMs in the cluster while the rebalance operation is running on the storage nodes.
Result: If the storage nodes are not under heavy load while the rebalance is running, the messages do not appear on the console.
|
Story Points: | --- |
Clone Of: | Environment: |
virt qemu integration
|
|
Last Closed: | 2015-03-23 07:38:12 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Gowrishankar Rajaiyan
2013-07-01 06:31:08 UTC
Comment copied from https://bugzilla.redhat.com/show_bug.cgi?id=859589#c41

<snip>
Here is the technical info on why, normally, at least one flush() may take extra time.

When a rebalance starts on the file, since we are handling 'open-fd-rebalancing', the clients will understand that the destination file is under migration. After knowing this, and before returning the 'flush()' call back to the application, the client does an 'open()' on the destination file and, upon success, does the same 'flush()' on the destination too. That flush in turn has to wait in the long queue of activities handled by the glusterfsd process. That is the reason why someone would see 'hung for > 120sec' logs in dmesg while a rebalance is happening.

As for the claim that it can happen only if there are few VMs or more VMs: it depends on the number of I/O requests waiting on the server side, and also on the normal system setup.

This is the exact reason why we recommend doing the 'rebalance' operation when the load on the system is low (read: weekends, holidays, etc.). Also, please note that 'rebalance' is advised every time an 'add-brick' is done (which by itself is a rare operation?).

I propose that we take the 'blocker' flag off this bug as long as none of the VMs are hung and we are not hitting a non-recoverable state.
</snip>

Amar,

Naga has identified this bug as a Known Issue for the Update 5 release. Could you provide the inputs in the Doc Text field?

Thanks,
Divya

The product version of Red Hat Storage on which this issue was reported has reached End Of Life (EOL) [1], hence this bug report is being closed. If the issue is still observed on a current version of Red Hat Storage, please file a new bug report on the current version.

[1] https://rhn.redhat.com/errata/RHSA-2014-0821.html
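
For readers who want a concrete picture of the flush() path described in the copied comment, here is a minimal toy model in Python. It is not GlusterFS code: the `Brick` class, the `client_flush` function, and the per-request service time are all hypothetical, chosen only to show how one extra open() plus flush() on a busy destination can push a single flush() past the 120-second threshold mentioned in the comment.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Threshold quoted in the bug ("hung for > 120sec").
HUNG_TASK_TIMEOUT_SECS = 120


@dataclass
class Brick:
    """Hypothetical stand-in for one glusterfsd process and its request queue."""
    name: str
    pending_ops: int = 0       # requests already queued ahead of the new one
    secs_per_op: float = 0.0   # assumed service time per request (illustrative)
    log: List[str] = field(default_factory=list)

    def handle(self, op: str) -> float:
        """Return how long the caller waits: the queued requests plus its own."""
        self.log.append(op)
        return (self.pending_ops + 1) * self.secs_per_op


def client_flush(source: Brick, destination: Optional[Brick] = None) -> float:
    """Model of the client-side flush described in the comment.

    If the file is under migration (destination is given), the client also
    opens the destination file and repeats the flush there before returning
    to the application.
    """
    waited = source.handle("flush")
    if destination is not None:
        waited += destination.handle("open")
        waited += destination.handle("flush")
    return waited


if __name__ == "__main__":
    src = Brick("source", pending_ops=10, secs_per_op=0.5)
    dst = Brick("destination", pending_ops=200, secs_per_op=0.5)
    total = client_flush(src, dst)
    print(f"flush() took {total:.1f}s; "
          f"over hung-task threshold: {total > HUNG_TASK_TIMEOUT_SECS}")
```

In this sketch the wait grows with the number of requests queued on the destination, which matches the comment's point that the symptom depends on server-side I/O load rather than on the number of VMs as such.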