Bug 844758

Summary: excessive logging in self-heal daemon should be reduced
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Vijay Bellur <vbellur>
Component: fuse
Assignee: vsomyaju
Status: CLOSED ERRATA
QA Contact: Rahul Hinduja <rhinduja>
Severity: high
Docs Contact:
Priority: unspecified
Version: 2.0
CC: aavati, amarts, csaba, nsathyan, rabhat, sdharane, shaines
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 844695
Clones: 844804
Environment:
Last Closed: 2013-09-23 22:36:14 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 844695
Bug Blocks: 844804
Attachments: Comparison of analyzer result between two releases (flags: none)

Description Vijay Bellur 2012-07-31 16:26:39 UTC
+++ This bug was initially created as a clone of Bug #844695 +++

Created attachment 601512 [details]
log analysing script which parses the log file and prints the number of times each function has logged

Description of problem:

The self-heal daemon logs heavily, growing its log file to several GBs and consuming a large amount of disk space. Logging should be minimized by preventing messages from being repeated, avoiding unimportant messages, and printing the remaining messages at a lower log level such as DEBUG or TRACE.

A script has been attached to this bug which analyzes the glusterfs log file passed as an argument and prints each function together with the number of times it has logged.
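
The attached log_analyzer.sh is the authoritative version; purely as an illustration, a minimal sketch of this kind of counting could look like the following. The log line format ("[timestamp] LEVEL [file.c:line:function] domain: message") and the exact extraction logic are assumptions, not taken from the attached script.

#!/bin/bash
# Rough sketch only, NOT the attached log_analyzer.sh. Assumes glusterfs log lines
# of the form: [timestamp] LEVEL [source-file.c:line:function] domain: message
LOG="${1:?usage: $0 <glusterfs-log-file>}"

# Count how many times each function appears in any log line.
grep -oE '\[[^]]+\.c:[0-9]+:[A-Za-z_][A-Za-z0-9_]*\]' "$LOG" |
    awk -F: '{gsub(/\]/, "", $3); count[$3]++}
             END {for (f in count) printf "%s:%d\n", f, count[f]}' | sort

echo "========= Error Functions ========"
# Count again, restricted to lines logged at E (error) or W (warning) level,
# which is why WARNING-level functions also show up in this section.
grep -E '^\[[^]]+\] +[EW] ' "$LOG" |
    grep -oE '\[[^]]+\.c:[0-9]+:[A-Za-z_][A-Za-z0-9_]*\]' |
    awk -F: '{gsub(/\]/, "", $3); count[$3]++}
             END {for (f in count) printf "%s:%d\n", f, count[f]}' | sort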

This is the output of the log analyzer script run on a self-heal daemon log file which had grown to 15-20 MB.

afr_detect_self_heal_by_iatt:7353
afr_detect_self_heal_by_lookup_status:243
afr_dir_exclusive_crawl:20
afr_launch_self_heal:19993
afr_notify:15
afr_self_heal_completion_cbk:19993
afr_sh_common_lookup_resp_handler:5745
afr_sh_data_open_cbk:240
afr_sh_entry_expunge_entry_cbk:1956
afr_sh_entry_fix:1
afr_sh_metadata_post_nonblocking_inodelk_cbk:486
afr_sh_post_nonblocking_entry_cbk:3
cleanup_and_exit:2
client3_1_entrylk_cbk:3
client3_1_getxattr_cbk:3
client3_1_inodelk_cbk:243
client3_1_open_cbk:240
client3_1_unlink_cbk:4
client_ping_cbk:3
client_rpc_notify:42
client_set_lk_version_cbk:54
client_setvolume_cbk:108
_crawl_proceed:12
gf_add_cmdline_options:12
main:2
mgmt_cbk_spec:4
mgmt_getspec_cbk:5
notify:81
reincarnate:1
_remove_stale_index:9
rpc_client_ping_timer_expired:3
rpc_clnt_reconfig:24
saved_frames_unwind:6
select_server_supported_programs:54
sh_loop_driver_done:18008
socket_connect_finish:24
__socket_keepalive:2
__socket_proto_state_machine:41
__socket_rwv:18
socket_server_event_handler:2
========= Error Functions ========
afr_notify:3
afr_self_heal_completion_cbk:243
afr_sh_common_lookup_resp_handler:5745
afr_sh_data_open_cbk:240
afr_sh_metadata_post_nonblocking_inodelk_cbk:486
afr_sh_post_nonblocking_entry_cbk:3
cleanup_and_exit:2
client3_1_entrylk_cbk:3
client3_1_getxattr_cbk:3
client3_1_inodelk_cbk:243
client3_1_open_cbk:240
client3_1_unlink_cbk:4
client_ping_cbk:3
_crawl_proceed:12
_remove_stale_index:4
saved_frames_unwind:6
socket_connect_finish:24
__socket_keepalive:2
__socket_proto_state_machine:41
__socket_rwv:18
socket_server_event_handler:2


"Error function"  section lists also functions which have logged in WARNING level.



Comment 1 Amar Tumballi 2012-08-23 06:45:24 UTC
This bug is not seen in the current master branch (which will soon be branched as RHS 2.1.0). To consider it for fixing, we want to make sure this bug still exists on RHS servers. If it cannot be reproduced, we would like to close this bug.

Comment 2 Rahul Hinduja 2012-12-11 07:12:06 UTC
Ran the following test case with builds glusterfs-3.3.0rhs-25.el6rhs.x86_64 and glusterfs-3.3.0.5rhs-37.el6rhs.x86_64.


Test Case:
==========

1. Create a 1x2 replica setup.
2. Mount the volume on the client.
3. Bring down one of the bricks from the replica pair (kill its PID).
4. Add a large number of directories and files to the mount. I used the following:

for i in {1..500}; do cp -rf /etc etc.$i; done

5. Check the indices directory to confirm that a large number of files need to be healed.
6. Bring the brick back (gluster volume start <vol-name> force).
7. Self-heal should start.
8. Self-heal should complete at the end.
9. Use the log_analyzer.sh script to analyze the glustershd log file (a rough consolidated sketch of these steps follows after the command below).

./log_analyzer.sh /var/log/glusterfs/glustershd.log 
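
For reference, a rough consolidation of the steps above; the volume name, hosts, brick paths, mount point, and brick PID below are placeholders, not the values used in this run, and the index directory location is an assumption about the usual brick layout.

VOL=repvol                     # placeholder volume name
MNT=/mnt/$VOL                  # placeholder mount point

# 1-2. Create a 1x2 replica volume and mount it (hosts/brick paths are placeholders).
gluster volume create $VOL replica 2 server1:/bricks/b1 server2:/bricks/b2
gluster volume start $VOL
mount -t glusterfs server1:/$VOL $MNT

# 3. Bring down one brick of the replica pair (PID taken from 'gluster volume status').
kill -9 <brick-pid>

# 4. Create a large number of directories and files on the mount.
cd $MNT && for i in {1..500}; do cp -rf /etc etc.$i; done

# 5. Confirm that many entries are queued for self-heal (the index directory is
#    normally under <brick-path>/.glusterfs/indices/xattrop; adjust if needed).
ls <brick-path>/.glusterfs/indices/xattrop | wc -l

# 6-8. Bring the brick back; self-heal should start and eventually complete.
gluster volume start $VOL force

# 9. Analyze the glustershd log with the attached script.
./log_analyzer.sh /var/log/glusterfs/glustershd.log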


Test Result Comparison:
=======================

Size of glustershd.log comparison:

With build glusterfs-3.3.0rhs-25.el6rhs.x86_64 : 1.8 G

With build glusterfs-3.3.0.5rhs-37.el6rhs.x86_64 : 401 M
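
For reference, the log size can be checked directly on the node, e.g.:

du -h /var/log/glusterfs/glustershd.log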



log_analyzer.sh result:
======================


With build glusterfs-3.3.0rhs-25.el6rhs.x86_64
-----------------------------------------------

[12/11/12 - 06:32:44 root@dhcp159-57 ~]# ./log_analyzer.sh /var/log/glusterfs/glustershd.log 
afr_detect_self_heal_by_iatt:498528
afr_detect_self_heal_by_lookup_status:613008
afr_dir_exclusive_crawl:70
afr_launch_self_heal:1489040
afr_notify:2
afr_self_heal_completion_cbk:1489040
afr_sh_common_lookup_resp_handler:777924
afr_sh_data_open_cbk:585268
afr_sh_metadata_post_nonblocking_inodelk_cbk:1226016
afr_sh_post_nonblocking_entry_cbk:27740
cleanup_and_exit:1
client3_1_entrylk_cbk:27740
client3_1_inodelk_cbk:613008
client3_1_open_cbk:585268
client3_1_readlink_cbk:249532
client_query_portmap_cbk:2
client_rpc_notify:3
client_set_lk_version_cbk:4
client_setvolume_cbk:8
_crawl_proceed:11
gf_add_cmdline_options:6
_log_if_unknown_option:4
main:2
notify:9
rpc_clnt_reconfig:4
select_server_supported_programs:4
sh_loop_driver_done:507030
socket_connect_finish:1
__socket_keepalive:3
__socket_proto_state_machine:1
__socket_rwv:2
socket_server_event_handler:3
========= Error Functions ========
afr_self_heal_completion_cbk:613008
afr_sh_common_lookup_resp_handler:777924
afr_sh_data_open_cbk:585268
afr_sh_metadata_post_nonblocking_inodelk_cbk:1226016
afr_sh_post_nonblocking_entry_cbk:27740
cleanup_and_exit:1
client3_1_entrylk_cbk:27740
client3_1_inodelk_cbk:613008
client3_1_open_cbk:585268
client3_1_readlink_cbk:249532
client_query_portmap_cbk:2
_crawl_proceed:11
_log_if_unknown_option:4
socket_connect_finish:1
__socket_keepalive:3
__socket_proto_state_machine:1
__socket_rwv:2
socket_server_event_handler:3
[12/11/12 - 06:42:40 root@dhcp159-57 ~]# 





With build glusterfs-3.3.0.5rhs-37.el6rhs.x86_64
------------------------------------------------

[12/11/12 - 06:32:44 root@dhcp159-108 ~]# ./log_analyzer.sh /var/log/glusterfs/glustershd.log 
afr_dir_exclusive_crawl:40
afr_notify:2
afr_sh_data_open_cbk:582563
afr_sh_post_nonblocking_entry_cbk:27330
cleanup_and_exit:1
client3_1_entrylk_cbk:27330
client3_1_inodelk_cbk:609893
client3_1_open_cbk:582563
client3_1_readlink_cbk:249500
client_query_portmap_cbk:2
client_rpc_notify:3
client_set_lk_version_cbk:4
client_setvolume_cbk:8
_crawl_proceed:11
gf_add_cmdline_options:6
_log_if_unknown_option:4
main:2
notify:9
rpc_clnt_reconfig:4
select_server_supported_programs:4
socket_connect_finish:1
__socket_keepalive:3
__socket_proto_state_machine:1
__socket_rwv:2
socket_server_event_handler:3
========= Error Functions ========
afr_sh_data_open_cbk:582563
afr_sh_post_nonblocking_entry_cbk:27330
cleanup_and_exit:1
client3_1_entrylk_cbk:27330
client3_1_inodelk_cbk:609893
client3_1_open_cbk:582563
client3_1_readlink_cbk:249500
client_query_portmap_cbk:2
_crawl_proceed:11
_log_if_unknown_option:4
socket_connect_finish:1
__socket_keepalive:3
__socket_proto_state_machine:1
__socket_rwv:2
socket_server_event_handler:3
[12/11/12 - 06:35:40 root@dhcp159-108 ~]# 




Comparing the results between the two releases, the excessive logging has been addressed in the latest release. However, there are still log messages which are reported in huge numbers. Is this all we had targeted for reducing the excessive logging?

Comment 3 Rahul Hinduja 2012-12-11 07:14:05 UTC
Created attachment 661298 [details]
Comparison of analyzer result between two releases

The attached table contains a column-wise comparison of the strings in the glustershd log file for both releases.

Comment 4 Rahul Hinduja 2013-01-07 06:32:57 UTC
Reassigning as per comment #2 and the needinfo request.

Comment 5 vsomyaju 2013-01-08 10:10:10 UTC
For this release, the reduction in excessive logging that has been obtained is acceptable. Feel free to open a new bug for later releases if logging is still considered excessive.

Comment 6 Rahul Hinduja 2013-04-10 13:42:27 UTC
Marking the bug verified as per comments 2 and 5.

Comment 7 Scott Haines 2013-09-23 22:36:14 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html