Bug 1203442
Summary: | miq-system.rb causing top_output.log to fill the log filesystem | |||
---|---|---|---|---|
Product: | Red Hat CloudForms Management Engine | Reporter: | Josh Carter <jocarter> | |
Component: | Appliance | Assignee: | Joe Rafaniello <jrafanie> | |
Status: | CLOSED ERRATA | QA Contact: | Dave Johnson <dajohnso> | |
Severity: | high | Docs Contact: | ||
Priority: | high | |||
Version: | 5.3.0 | CC: | abellott, dajohnso, jfrey, jhardy, jkrocil, jrafanie, mfeifer, obarenbo, sean.myers, thenness | |
Target Milestone: | GA | |||
Target Release: | 5.4.0 | |||
Hardware: | All | |||
OS: | All | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1207860 (view as bug list) | Environment: | ||
Last Closed: | 2015-06-16 12:54:07 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1207860 |
Comment 4
Joe Rafaniello
2015-03-18 23:12:26 UTC
Another question is why this log is not rotating. Is this related to the upstream bug where the logs were not rotating? Could this be a NFS mount issue? to Jason's comment: we rotate logs on a daily schedule, not by size: https://github.com/jrafanie/manageiq/blob/ecda2a8c75225a3248adc011f7714d186c37cb0a/system/COPY/etc/logrotate.d/miq_logs.conf#L2 Following up on Tom's comment, a google search it's looks clear that df is known to hang when mount volumes are inaccessible. http://unix.stackexchange.com/questions/21199/why-is-df-hanging https://bugs.centos.org/view.php?id=6414 https://community.oracle.com/thread/2515836 https://darktraining.com/linux/67/ Many of the suggestions online are to limit our df command to only local volumes by passing -l: df -l Note, this bug seems to indicate the local option may not work with fedora20 https://bugzilla.redhat.com/show_bug.cgi?id=1199679 We'll have to recreate this in house and see if we can verify it on rhel 6.6. Dave, from the provided links, would you be able to try to recreate this? According to the googling I did, it seems like just mounting a nfs share on an appliance, shutting down the nfs server or disabling the nfs, and trying 'df -T -P' seems to do it. From Tom's log reviews, it looks like the top output was showing the SAME df command taking MINUTES to run since it was reported 3 or more times in top_output.log. This is the code we'd change to pass -l: https://github.com/jrafanie/manageiq/blob/19149b1e903a27e241e79dd174a82234344bfbe3/lib/util/miq-system.rb#L405 Jan, can you give this a shot and see if you can reproduce? Thanks! Dave, I was able to reproduce this in 5.4.0.0.12. After blocking access to a mounted NFS server, the "df" processes began stacking up as seen here (same as what Josh posted above): [root@my_host ~]# ps aux | grep df root 28588 0.0 0.0 108208 1280 ? SN 06:01 0:00 sh -c df -T -P 2>&1 root 28589 0.0 0.0 100980 800 ? DN 06:01 0:00 df -T -P root 29024 0.0 0.0 108208 1284 ? SN 06:11 0:00 sh -c df -T -P 2>&1 root 29025 0.0 0.0 100980 800 ? DN 06:11 0:00 df -T -P root 29443 0.0 0.0 108208 1284 ? SN 06:21 0:00 sh -c df -T -P 2>&1 root 29444 0.0 0.0 100980 800 ? DN 06:21 0:00 df -T -P root 29846 0.0 0.0 108208 1284 ? SN 06:31 0:00 sh -c df -T -P 2>&1 root 29847 0.0 0.0 100980 800 ? DN 06:31 0:00 df -T -P When I manually ran "df -T -P" just to check it, it did indeed hang as well. Running "df -T -P -l" gave me results right away though - it doesn't hang. So whatever df-related issue may be (have been) present in fedora, it doesn't seem to affect up-to-date rhel6.6-based appliances. [root@dhcp-8-59-242 ~]# rpm -qf /bin/df coreutils-8.4-37.el6.x86_64 Awesome Jan! Thanks. I'll prepare a fix. If you still have the appliance in this state, can you confirm if df/du need the -l option when passed a specific file on a local storage? In other words, does it bypass remote mounts when you ask for df/du for the storage a file resides? For example: logger_file=/var/www/miq/vmdb/log/appliance_console.log logvol_free_space=`df -k $file | awk '{ print $3 }' | tail -n 1` existing_log_size=`du -k $file | awk '{ print $1 }' | tail -n 1` https://github.com/jrafanie/manageiq/blob/c3418fd900ca1a1fd6c44852045291e907c53a58/system/logrotate_free_space_check.sh#L4-5 note, du, doesn't have a local option. -l is the same as --count-links. Upstream Pull request: https://github.com/ManageIQ/manageiq/pull/2277 Limits df to local filesystem even when we specify a local file path (paranoid???). Jan, I'm still curious if 'df -T -P /etc/hosts' also hangs when NFS is down and 'df -T -P' hangs (without a local file path). Thanks Tom, Brandon, Jan, Dave, and Josh... I wasn't sure we'd be able to recreate this easily. I'm much more confident we can verify this fix now. Joe, both "du" and "df" ran against local files (e.g. /etc/hosts) complete successfully. New commit detected on manageiq/master: https://github.com/ManageIQ/manageiq/commit/0e8c4373503fcd2bca48ae46bccd670657fc5632 commit 0e8c4373503fcd2bca48ae46bccd670657fc5632 Author: Joe Rafaniello <jrafanie> AuthorDate: Fri Mar 20 10:00:14 2015 -0400 Commit: Joe Rafaniello <jrafanie> CommitDate: Fri Mar 20 18:08:01 2015 -0400 To avoid a hung process, limit df to local fs if no file is provided Mounted NFS volumes that go bad/down can hang df so let's limit df to only the local filesystems we care about. Existing usage of MiqSystem.disk_usage has been: * where we pass no argument and log the output for server statistics from local filesystems * where we pass the Postgresql database disk location https://bugzilla.redhat.com/show_bug.cgi?id=1203442 lib/spec/util/miq-system_spec.rb | 30 ++++++++++++++++++++++++++---- lib/util/miq-system.rb | 10 ++++++++-- 2 files changed, 34 insertions(+), 6 deletions(-) New commit detected on manageiq/master: https://github.com/ManageIQ/manageiq/commit/9b4ab5a77e469b7e7ca8bf95a87e031b0e5db4f1 commit 9b4ab5a77e469b7e7ca8bf95a87e031b0e5db4f1 Author: Joe Rafaniello <jrafanie> AuthorDate: Fri Mar 20 10:16:59 2015 -0400 Commit: Joe Rafaniello <jrafanie> CommitDate: Fri Mar 20 18:08:04 2015 -0400 Force df to local filesystems even when a local path is provided. https://bugzilla.redhat.com/show_bug.cgi?id=1203442 lib/fs/MiqFS/modules/RealFS.rb | 2 +- system/logrotate_free_space_check.sh | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) Yes, Dave, will be backported as part of https://bugzilla.redhat.com/show_bug.cgi?id=1207860. New commit detected on cfme/5.3.z: https://code.engineering.redhat.com/gerrit/gitweb?p=cfme.git;a=commitdiff;h=f566d49712e2b63da35d4a076a3e17ea758bda66 commit f566d49712e2b63da35d4a076a3e17ea758bda66 Author: Joe Rafaniello <jrafanie> AuthorDate: Fri Mar 20 10:00:14 2015 -0400 Commit: Joe Rafaniello <jrafanie> CommitDate: Tue Mar 31 17:38:11 2015 -0400 To avoid a hung process, limit df to local fs if no file is provided Mounted NFS volumes that go bad/down can hang df so let's limit df to only the local filesystems we care about. Existing usage of MiqSystem.disk_usage has been: * where we pass no argument and log the output for server statistics from local filesystems * where we pass the Postgresql database disk location https://bugzilla.redhat.com/show_bug.cgi?id=1203442 Conflicts: lib/util/miq-system.rb lib/spec/util/miq-system_spec.rb | 30 ++++++++++++++++++++++++++---- lib/util/miq-system.rb | 10 ++++++++-- 2 files changed, 34 insertions(+), 6 deletions(-) New commit detected on cfme/5.3.z: https://code.engineering.redhat.com/gerrit/gitweb?p=cfme.git;a=commitdiff;h=8e7ee5d8cf5478c3325e27ec66b1ce87a8eeac09 commit 8e7ee5d8cf5478c3325e27ec66b1ce87a8eeac09 Author: Joe Rafaniello <jrafanie> AuthorDate: Fri Mar 20 10:16:59 2015 -0400 Commit: Joe Rafaniello <jrafanie> CommitDate: Tue Mar 31 17:41:14 2015 -0400 Force df to local filesystems even when a local path is provided. https://bugzilla.redhat.com/show_bug.cgi?id=1203442 lib/fs/MiqFS/modules/RealFS.rb | 2 +- system/logrotate_free_space_check.sh | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) New commit detected on cfme/5.3.z: https://code.engineering.redhat.com/gerrit/gitweb?p=cfme.git;a=commitdiff;h=6d017ab52f2e0779c25f17fc807b65f20fd170b0 commit 6d017ab52f2e0779c25f17fc807b65f20fd170b0 Merge: dede0f6 8e7ee5d Author: Brandon Dunne <bdunne> AuthorDate: Thu Apr 2 09:58:18 2015 -0400 Commit: Brandon Dunne <bdunne> CommitDate: Thu Apr 2 09:58:18 2015 -0400 Merge branch '53_df_hang_fix' into '5.3.z' 53 df hang fix * To avoid a hung process, limit df to local fs if no file is provided … Mounted NFS volumes that go bad/down can hang df so let's limit df to only the local filesystems we care about. Existing usage of MiqSystem.disk_usage has been: * where we pass no argument and log the output for server statistics from local filesystems * where we pass the Postgresql database disk location * Force df to local filesystems even when a local path is provided. https://bugzilla.redhat.com/show_bug.cgi?id=1203442 See merge request !17 lib/fs/MiqFS/modules/RealFS.rb | 2 +- lib/spec/util/miq-system_spec.rb | 30 ++++++++++++++++++++++++++---- lib/util/miq-system.rb | 10 ++++++++-- system/logrotate_free_space_check.sh | 2 +- 4 files changed, 36 insertions(+), 8 deletions(-) New commit detected on cfme/5.3.4: https://code.engineering.redhat.com/gerrit/gitweb?p=cfme.git;a=commitdiff;h=1cb8dc973655ab3b7c8f222091d92a7ffb3cd949 commit 1cb8dc973655ab3b7c8f222091d92a7ffb3cd949 Author: Joe Rafaniello <jrafanie> AuthorDate: Fri Mar 20 10:00:14 2015 -0400 Commit: Joe Rafaniello <jrafanie> CommitDate: Thu Apr 2 10:20:32 2015 -0400 To avoid a hung process, limit df to local fs if no file is provided Mounted NFS volumes that go bad/down can hang df so let's limit df to only the local filesystems we care about. Existing usage of MiqSystem.disk_usage has been: * where we pass no argument and log the output for server statistics from local filesystems * where we pass the Postgresql database disk location https://bugzilla.redhat.com/show_bug.cgi?id=1203442 Conflicts: lib/util/miq-system.rb lib/spec/util/miq-system_spec.rb | 30 ++++++++++++++++++++++++++---- lib/util/miq-system.rb | 10 ++++++++-- 2 files changed, 34 insertions(+), 6 deletions(-) New commit detected on cfme/5.3.4: https://code.engineering.redhat.com/gerrit/gitweb?p=cfme.git;a=commitdiff;h=3e0d55ebbf17a418342fe9e9e6473cdf9e0d7b92 commit 3e0d55ebbf17a418342fe9e9e6473cdf9e0d7b92 Author: Joe Rafaniello <jrafanie> AuthorDate: Fri Mar 20 10:16:59 2015 -0400 Commit: Joe Rafaniello <jrafanie> CommitDate: Thu Apr 2 10:20:36 2015 -0400 Force df to local filesystems even when a local path is provided. https://bugzilla.redhat.com/show_bug.cgi?id=1203442 lib/fs/MiqFS/modules/RealFS.rb | 2 +- system/logrotate_free_space_check.sh | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) Verified on 5.4.0.0.24 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-1100.html |