+++ This bug was initially created as a clone of Bug #1209843 +++ Description of problem: Had a distribute volume 'nash' across 4 bricks in 2 nodes (2bricks in each node). Did a glusterfind create session and tried to create multiple sessions for the same volume. A crash was observed while creating the 20th session. The location /var/lib/glusterd/glusterfind/ shows the mention of all 20 sessions, but the log of the 20th session does not seem to be complete. Version-Release number of selected component (if applicable): Glusterfs 3.7 nightly glusterfs-3.7dev-0.910.git17827de.el6.x86_64 Steps to Reproduce: 1. Create a gluster volume 2. Try to create >20 sessions for the same volume using 'glusterfind create' command Actual results: Step2 results in a crash. Expected results: Step2 should either be successful in creating multiple sessions, or should fail gracefully stating the maximum allowed sessions for a single volume Additional info: [root@dhcp43-140 ~]# glusterfind create --force sess*nash nash [root@dhcp43-140 ~]# [root@dhcp43-140 ~]# vi /var/lib/glusterd/glusterfind glusterfind/ glusterfind.secret.pem glusterfind.secret.pem.pub [root@dhcp43-140 ~]# cat /var/lib/glusterd/glusterfind.secret.pem -----BEGIN RSA PRIVATE KEY----- MIIEoQIBAAKCAQEA2918osxpjW/GwdY2hdAbLBLqofdggFjAWWPsbaTfMDgqif6R b6kUipg4htxz820i8oryHo7g8qe/MoO3pyFM+Av+iZKpGcA47SlmuBB5ho/X37aO UDJ7JVlUUsGssgd0CrH3Hm++HQUlf4PoGGX997w4UeyI9oEpmVmnkgVkO0G0ayMF zaCVLl8BlMOb3uzOoUIBHZIoMYKFFpuCi0k9I6C4DUVEyYo1LgwwIcopOUdwXX5p AeRwFmeLdEtfq3Fomh/B/aTIt9IRdjPvBGZXmxCPFfHMnWINCslTYdd4CrUfB2Jz WPGRlEvjzLAjPlcEQCGeTHDgDqFbOwFOJ0ELQQIBIwKCAQAlsPDLc3/PGnnYFhf5 rqWShuZk6JQzQmod899jQNXNwHxSK6PnQY5+KLk7sMNrjw1OJnKmJx8/iniMURgr R4mJm6f6U6f1yS5UisEm3kC3+2bV5Mfwfa618g50ljrcsNIQdkee0VPKdekrzXg+ smYNNjWKYxAqQgchmlc9moY2CvIosV+9aJ0fjCnW/hz5Q3ltgHZa8YEHoZgHxnPQ vCK+RAa8RyzpzeF1onZqQc/ZcjDv5VChZZ+KGlyF+Du5P2QErQRzxVj3au72hnP1 Bqo1Hwf2DUNTZ5L8JjK1Ft1NgM0htKhfchn4EoqjRQCorzEkb72/xg5b9DPK4yrE Y7ozAoGBAO1IQ564nNMT4mviVcTNeaJqp1f7YyViI9mE46eqeg3oKyYkYiMFvHrc i90atBCKnHyZxAlcX6mXaeCPaNDidyQ7vIZf9QAzip9SxZJn0rrQ9JioGY3AAN9k 2N9SOXKlOtoL8KEIAIVPc6vV905iSxdoPwgha1AB3yD65Tejfx6DAoGBAO01f2NE IYLidaGho4NjaZKNLUwPrjzNhV9kXOH0hbnlk2qemUZv8hcjTKBF64bk5iuhq16i s/kuuTnXZTJlsyeYcfb6W0GLrQAwCLgx7LvpkZ1SiuDML6mRvExEmKUhYEMMKVF5 gh9zFrno+6ZZY04uAW33AjxJsbbLREe00YPrAoGAFFamtdVPRUrY5K0ACY30e6K9 4vhKU6lTh6xOBw6cv10LA0TyeAfOU6yBBFK++hqCcRR+hHWhzLU09f2p5gTIYjEB h9xW1CGsy9PlDIyOZ8jEgh0JgS227pOdj3wTjXvgeRb3XkKDs6e5dSD38MaYuN0M t43V/4shwP+P/XRp+00CgYEA36d4G8PlFQFnmGUzxQX101Ho4VCc93/yx6fFShJv dMKDrat6kt6TzKxA7uLWwQO0ckf5WTpR6vGKEfb5A6G3i7uepwlOuiSbzPoW2Y4o WWcxhbQ5zKNC4bVLI1aejRDXGqURA60USYnE+G30lYd640/yuCNogfxXGhAbzpSL B1kCgYBxWaHoaHQ/peupGu5+dd+wN00qj5VNE18sQWEkL1dIAwSD5tL0zrBY8Btw sN9/GlDvHXtiT92Yo+j90IB0tScm7p+itPKDTnkUT52A9jzmf+e1utCTykTwI0cg XYS7DzEsU3Pzz59UuImONPaW/CMd8zZZJbCD3zT6Po4PJ+jlJA== -----END RSA PRIVATE KEY----- [root@dhcp43-140 ~]# cat /var/lib/glusterd/glusterfind.secret.pem.pub ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA2918osxpjW/GwdY2hdAbLBLqofdggFjAWWPsbaTfMDgqif6Rb6kUipg4htxz820i8oryHo7g8qe/MoO3pyFM+Av+iZKpGcA47SlmuBB5ho/X37aOUDJ7JVlUUsGssgd0CrH3Hm++HQUlf4PoGGX997w4UeyI9oEpmVmnkgVkO0G0ayMFzaCVLl8BlMOb3uzOoUIBHZIoMYKFFpuCi0k9I6C4DUVEyYo1LgwwIcopOUdwXX5pAeRwFmeLdEtfq3Fomh/B/aTIt9IRdjPvBGZXmxCPFfHMnWINCslTYdd4CrUfB2JzWPGRlEvjzLAjPlcEQCGeTHDgDqFbOwFOJ0ELQQ== root.eng.blr.redhat.com [root@dhcp43-140 ~]# [root@dhcp43-140 ~]# glusterfind pre sess*nash nash /tmp/outt Generated output file /tmp/outt [root@dhcp43-140 ~]# cat /tmp/outt 44_55 b*c +dir/ file1 .trashcan/ [root@dhcp43-140 ~]# glusterfind post sess*nash nash [root@dhcp43-140 ~]# cat /var/lib/glusterd/glusterfind/sess sess/ sess*nash/ [root@dhcp43-140 ~]# cat /var/lib/glusterd/glusterfind/sess\*nash/nash/status 1428506824[root@dhcp43-140 ~]# [root@dhcp43-140 ~]# [root@dhcp43-140 ~]# [root@dhcp43-140 ~]# [root@dhcp43-140 ~]# glusterfind create usage: glusterfind create [-h] [--debug] [--force] session volume glusterfind create: error: too few arguments [root@dhcp43-140 ~]# glusterfind create sess2 --debug nash [root@dhcp43-140 ~]# vi /var/lib/glusterd/ bitd/ glusterfind.secret.pem hooks/ quotad/ geo-replication/ glusterfind.secret.pem.pub nfs/ scrub/ glusterd.info glustershd/ options snaps/ glusterfind/ groups/ peers/ vols/ [root@dhcp43-140 ~]# vi /var/lib/glusterd/glusterfind glusterfind/ glusterfind.secret.pem glusterfind.secret.pem.pub [root@dhcp43-140 ~]# vi /var/lib/glusterd/glusterfind/^C [root@dhcp43-140 ~]# [root@dhcp43-140 ~]# [root@dhcp43-140 ~]# glusterfind list SESSION VOLUME SESSION TIME --------------------------------------------------------------------------- sess ozone 2015-03-31 21:32:49 sess2 nash 2015-04-08 21:09:49 sess*nash nash 2015-04-08 20:57:04 [root@dhcp43-140 ~]# for i in {1..100}; do > glusterfind create sess^$i nash; > done ^CTraceback (most recent call last): File "/usr/bin/glusterfind", line 17, in <module> main() File "/usr/libexec/glusterfs/glusterfind/main.py", line 480, in main globals()["mode_" + args.mode](session_dir, args) File "/usr/libexec/glusterfs/glusterfind/main.py", line 323, in mode_create ssh_setup() File "/usr/libexec/glusterfs/glusterfind/main.py", line 288, in ssh_setup execute(cmd, exit_msg="Failed to distribute ssh keys", logger=logger) File "/usr/libexec/glusterfs/glusterfind/utils.py", line 127, in execute (out, err) = p.communicate() File "/usr/lib64/python2.6/subprocess.py", line 732, in communicate stdout, stderr = self._communicate(input, endtime) File "/usr/lib64/python2.6/subprocess.py", line 1316, in _communicate stdout, stderr = self._communicate_with_poll(input, endtime) File "/usr/lib64/python2.6/subprocess.py", line 1388, in _communicate_with_poll ready = poller.poll(self._remaining_time(endtime)) KeyboardInterrupt ^CTraceback (most recent call last): File "/usr/bin/glusterfind", line 17, in <module> main() File "/usr/libexec/glusterfs/glusterfind/main.py", line 480, in main globals()["mode_" + args.mode](session_dir, args) File "/usr/libexec/glusterfs/glusterfind/main.py", line 323, in mode_create ssh_setup() File "/usr/libexec/glusterfs/glusterfind/main.py", line 301, in ssh_setup logger=logger) File "/usr/libexec/glusterfs/glusterfind/utils.py", line 127, in execute (out, err) = p.communicate() File "/usr/lib64/python2.6/subprocess.py", line 732, in communicate stdout, stderr = self._communicate(input, endtime) File "/usr/lib64/python2.6/subprocess.py", line 1316, in _communicate stdout, stderr = self._communicate_with_poll(input, endtime) File "/usr/lib64/python2.6/subprocess.py", line 1388, in _communicate_with_poll ready = poller.poll(self._remaining_time(endtime)) KeyboardInterrupt ^CTraceback (most recent call last): File "/usr/bin/glusterfind", line 17, in <module> main() File "/usr/libexec/glusterfs/glusterfind/main.py", line 480, in main globals()["mode_" + args.mode](session_dir, args) File "/usr/libexec/glusterfs/glusterfind/main.py", line 323, in mode_create ssh_setup() File "/usr/libexec/glusterfs/glusterfind/main.py", line 301, in ssh_setup logger=logger) File "/usr/libexec/glusterfs/glusterfind/utils.py", line 127, in execute (out, err) = p.communicate() File "/usr/lib64/python2.6/subprocess.py", line 732, in communicate stdout, stderr = self._communicate(input, endtime) File "/usr/lib64/python2.6/subprocess.py", line 1316, in _communicate stdout, stderr = self._communicate_with_poll(input, endtime) File "/usr/lib64/python2.6/subprocess.py", line 1388, in _communicate_with_poll ready = poller.poll(self._remaining_time(endtime)) KeyboardInterrupt ^Z [1]+ Stopped glusterfind create sess^$i nash [root@dhcp43-140 ~]# glusterfind list SESSION VOLUME SESSION TIME --------------------------------------------------------------------------- Traceback (most recent call last): File "/usr/bin/glusterfind", line 17, in <module> main() File "/usr/libexec/glusterfs/glusterfind/main.py", line 480, in main globals()["mode_" + args.mode](session_dir, args) File "/usr/libexec/glusterfs/glusterfind/main.py", line 454, in mode_list human_time(last_processed).ljust(25))) File "/usr/libexec/glusterfs/glusterfind/utils.py", line 54, in human_time return datetime.fromtimestamp(float(ts)).strftime("%Y-%m-%d %H:%M:%S") TypeError: float() argument must be a string or a number [root@dhcp43-140 ~]# vi /var/log/glusterfs/glusterfind/ cli.log sess^1/ sess^11/ sess^13/ sess^15/ sess^17/ sess2/ sess^3/ sess^5/ sess^7/ sess^9/ session2/ sess/ sess^10/ sess^12/ sess^14/ sess^16/ sess^18/ sess^2/ sess^4/ sess^6/ sess^8/ session1/ sess*nash/ [root@dhcp43-140 ~]# vi /var/log/glusterfs/glusterfind/cli.log [root@dhcp43-140 ~]# vi /var/log/glusterfs/glusterfind/sess^18/nash/cli.log [root@dhcp43-140 ~]# vi /var/log/glusterfs/glusterfind/sess^12/nash/cli.log [root@dhcp43-140 ~]# vi /var/log/glusterfs/glusterfind/sess^14/nash/cli.log [root@dhcp43-140 ~]# vi /var/log/glusterfs/glusterfind/sess^17/nash/cli.log [root@dhcp43-140 ~]# vi /var/log/glusterfs/glusterfind/sess^18/nash/cli.log [root@dhcp43-140 ~]# gluster v i nash Volume Name: nash Type: Distribute Volume ID: d5c4752d-a37f-44fb-8e22-11f5cc427bea Status: Started Number of Bricks: 4 Transport-type: tcp Bricks: Brick1: 10.70.43.140:/rhs/thinbrick1/nash/dd Brick2: 10.70.43.140:/rhs/thinbrick2/nash/dd Brick3: 10.70.42.75:/rhs/thinbrick1/nash/dd Brick4: 10.70.42.75:/rhs/thinbrick2/nash/dd Options Reconfigured: changelog.changelog: on storage.build-pgfid: on [root@dhcp43-140 ~]# [root@dhcp43-140 ~]# rpm -qa | grep gluster glusterfs-fuse-3.7dev-0.910.git17827de.el6.x86_64 glusterfs-devel-3.7dev-0.910.git17827de.el6.x86_64 glusterfs-rdma-3.7dev-0.910.git17827de.el6.x86_64 glusterfs-libs-3.7dev-0.910.git17827de.el6.x86_64 glusterfs-api-3.7dev-0.910.git17827de.el6.x86_64 glusterfs-3.7dev-0.910.git17827de.el6.x86_64 glusterfs-extra-xlators-3.7dev-0.910.git17827de.el6.x86_64 glusterfs-api-devel-3.7dev-0.910.git17827de.el6.x86_64 glusterfs-resource-agents-3.7dev-0.910.git17827de.el6.noarch glusterfs-debuginfo-3.7dev-0.910.git17827de.el6.x86_64 glusterfs-cli-3.7dev-0.910.git17827de.el6.x86_64 glusterfs-server-3.7dev-0.910.git17827de.el6.x86_64 glusterfs-geo-replication-3.7dev-0.910.git17827de.el6.x86_64 [root@dhcp43-140 ~]# --- Additional comment from Aravinda VK on 2015-04-08 06:43:05 EDT --- I see KeyboardInterrupt during session creation, Looks like create in for loop is interpreted. Due to this partial session is created. That is why `glusterfind list` is failing. --- Additional comment from Sweta Anandpara on 2015-04-08 07:38:41 EDT --- I reran the command, and all the 100 sessions were created. But why is it that 'glusterfind list' lists only 3 (random) sessions and not all 100 of them? [root@dhcp43-140 ~]# glusterfind list SESSION VOLUME SESSION TIME --------------------------------------------------------------------------- sess^88 nash 2015-04-08 22:40:43 sess^24 nash 2015-04-08 22:38:26 sess^32 nash 2015-04-08 22:38:43 Traceback (most recent call last): File "/usr/bin/glusterfind", line 17, in <module> main() File "/usr/libexec/glusterfs/glusterfind/main.py", line 480, in main globals()["mode_" + args.mode](session_dir, args) File "/usr/libexec/glusterfs/glusterfind/main.py", line 454, in mode_list human_time(last_processed).ljust(25))) File "/usr/libexec/glusterfs/glusterfind/utils.py", line 54, in human_time return datetime.fromtimestamp(float(ts)).strftime("%Y-%m-%d %H:%M:%S") TypeError: float() argument must be a string or a number [root@dhcp43-140 ~]# How do I get the output back to normal after a failed 'glusterfind list'? --- Additional comment from Aravinda VK on 2015-04-08 07:41:44 EDT --- (In reply to Sweta Anandpara from comment #2) > I reran the command, and all the 100 sessions were created. But why is it > that 'glusterfind list' lists only 3 (random) sessions and not all 100 of > them? > > [root@dhcp43-140 ~]# glusterfind list > SESSION VOLUME SESSION TIME > --------------------------------------------------------------------------- > sess^88 nash 2015-04-08 22:40:43 > sess^24 nash 2015-04-08 22:38:26 > sess^32 nash 2015-04-08 22:38:43 > Traceback (most recent call last): > File "/usr/bin/glusterfind", line 17, in <module> > main() > File "/usr/libexec/glusterfs/glusterfind/main.py", line 480, in main > globals()["mode_" + args.mode](session_dir, args) > File "/usr/libexec/glusterfs/glusterfind/main.py", line 454, in mode_list > human_time(last_processed).ljust(25))) > File "/usr/libexec/glusterfs/glusterfind/utils.py", line 54, in human_time > return datetime.fromtimestamp(float(ts)).strftime("%Y-%m-%d %H:%M:%S") > TypeError: float() argument must be a string or a number > [root@dhcp43-140 ~]# > > How do I get the output back to normal after a failed 'glusterfind list'? For some session session file doesn't exist or invalid value. tree /var/lib/glusterd/glusterfind to see any session directory without status file. I will fix the list command not to fail for invalid value. --- Additional comment from Anand Avati on 2015-04-08 08:15:11 EDT --- REVIEW: http://review.gluster.org/10164 (tools/glusterfind: Do not fail to list if Session is invalid) posted (#1) for review on master by Aravinda VK (avishwan) --- Additional comment from Aravinda VK on 2015-04-08 08:17:36 EDT --- Sent patch to prevent crash in `glusterfind list` http://review.gluster.org/10164 --- Additional comment from Nagaprasad Sathyanarayana on 2015-04-09 02:36:53 EDT --- Not clear about this specific use case. Was a session create followed by close before opening the new session? If not, is glusterfind expected to have keep multiple open sessions? --- Additional comment from Aravinda VK on 2015-04-09 02:39:39 EDT --- (In reply to Nagaprasad Sathyanarayana from comment #6) > Not clear about this specific use case. Was a session create followed by > close before opening the new session? If not, is glusterfind expected to > have keep multiple open sessions? Sessions are per consumers. For example, Backup utility and Antivirus scanner. Both consumer can be scheduled to run in different timings, Using same session will not make sence. --- Additional comment from Anand Avati on 2015-04-09 06:32:54 EDT --- REVIEW: http://review.gluster.org/10164 (tools/glusterfind: Do not fail to list if Session is invalid) posted (#2) for review on master by Aravinda VK (avishwan) --- Additional comment from Anand Avati on 2015-04-13 08:14:43 EDT --- REVIEW: http://review.gluster.org/10164 (tools/glusterfind: Do not fail to list if Session is invalid) posted (#3) for review on master by Aravinda VK (avishwan) --- Additional comment from Anand Avati on 2015-04-23 06:32:55 EDT --- REVIEW: http://review.gluster.org/10164 (tools/glusterfind: Do not fail to list if Session is invalid) posted (#4) for review on master by Aravinda VK (avishwan) --- Additional comment from Anand Avati on 2015-04-23 09:56:09 EDT --- COMMIT: http://review.gluster.org/10164 committed in master by Vijay Bellur (vbellur) ------ commit 0fc310accb6bf86e42e09e0876b51b8860075fda Author: Aravinda VK <avishwan> Date: Wed Apr 8 17:39:56 2015 +0530 tools/glusterfind: Do not fail to list if Session is invalid If session is invalid, Show the session time as "Session Corrupted" BUG: 1209843 Change-Id: I10f7fb79f6a3a2cabab3f21bce86f2a2f424e8b6 Signed-off-by: Aravinda VK <avishwan> Reviewed-on: http://review.gluster.org/10164 Reviewed-by: Kotresh HR <khiremat> Tested-by: NetBSD Build System
REVIEW: http://review.gluster.org/10497 (tools/glusterfind: Do not fail to list if Session is invalid) posted (#1) for review on release-3.7 by Aravinda VK (avishwan)
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.0, please open a new bug report. glusterfs-3.7.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution. [1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939 [2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user