Bug 1217927 - [Backup]: Crash observed when multiple sessions were created for the same volume
Summary: [Backup]: Crash observed when multiple sessions were created for the same volume
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterfind
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Aravinda VK
QA Contact: bugs@gluster.org
URL:
Whiteboard:
Depends On: 1209843 1224071
Blocks: qe_tracker_everglades glusterfs-3.7.0
TreeView+ depends on / blocked
 
Reported: 2015-05-03 04:55 UTC by Aravinda VK
Modified: 2015-05-22 07:32 UTC (History)
6 users (show)

Fixed In Version: glusterfs-3.7.0
Doc Type: Bug Fix
Doc Text:
Clone Of: 1209843
Environment:
Last Closed: 2015-05-14 17:29:33 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments (Terms of Use)

Description Aravinda VK 2015-05-03 04:55:52 UTC
+++ This bug was initially created as a clone of Bug #1209843 +++

Description of problem:

Had a distribute volume 'nash' across 4 bricks in 2 nodes (2bricks in each node). Did a glusterfind create session and tried to create multiple sessions for the same volume. A crash was observed while creating the 20th session. The location /var/lib/glusterd/glusterfind/ shows the mention of all 20 sessions, but the log of the 20th session does not seem to be complete. 

Version-Release number of selected component (if applicable):
Glusterfs 3.7 nightly glusterfs-3.7dev-0.910.git17827de.el6.x86_64


Steps to Reproduce:
1. Create a gluster volume
2. Try to create >20 sessions for the same volume using 'glusterfind create' command

Actual results:

Step2 results in a crash. 

Expected results:

Step2 should either be successful in creating multiple sessions, or should fail gracefully stating the maximum allowed sessions for a single volume

Additional info:


[root@dhcp43-140 ~]# glusterfind create --force sess*nash nash 
[root@dhcp43-140 ~]# 
[root@dhcp43-140 ~]# vi /var/lib/glusterd/glusterfind
glusterfind/                glusterfind.secret.pem      glusterfind.secret.pem.pub  
[root@dhcp43-140 ~]# cat /var/lib/glusterd/glusterfind.secret.pem
-----BEGIN RSA PRIVATE KEY-----
MIIEoQIBAAKCAQEA2918osxpjW/GwdY2hdAbLBLqofdggFjAWWPsbaTfMDgqif6R
b6kUipg4htxz820i8oryHo7g8qe/MoO3pyFM+Av+iZKpGcA47SlmuBB5ho/X37aO
UDJ7JVlUUsGssgd0CrH3Hm++HQUlf4PoGGX997w4UeyI9oEpmVmnkgVkO0G0ayMF
zaCVLl8BlMOb3uzOoUIBHZIoMYKFFpuCi0k9I6C4DUVEyYo1LgwwIcopOUdwXX5p
AeRwFmeLdEtfq3Fomh/B/aTIt9IRdjPvBGZXmxCPFfHMnWINCslTYdd4CrUfB2Jz
WPGRlEvjzLAjPlcEQCGeTHDgDqFbOwFOJ0ELQQIBIwKCAQAlsPDLc3/PGnnYFhf5
rqWShuZk6JQzQmod899jQNXNwHxSK6PnQY5+KLk7sMNrjw1OJnKmJx8/iniMURgr
R4mJm6f6U6f1yS5UisEm3kC3+2bV5Mfwfa618g50ljrcsNIQdkee0VPKdekrzXg+
smYNNjWKYxAqQgchmlc9moY2CvIosV+9aJ0fjCnW/hz5Q3ltgHZa8YEHoZgHxnPQ
vCK+RAa8RyzpzeF1onZqQc/ZcjDv5VChZZ+KGlyF+Du5P2QErQRzxVj3au72hnP1
Bqo1Hwf2DUNTZ5L8JjK1Ft1NgM0htKhfchn4EoqjRQCorzEkb72/xg5b9DPK4yrE
Y7ozAoGBAO1IQ564nNMT4mviVcTNeaJqp1f7YyViI9mE46eqeg3oKyYkYiMFvHrc
i90atBCKnHyZxAlcX6mXaeCPaNDidyQ7vIZf9QAzip9SxZJn0rrQ9JioGY3AAN9k
2N9SOXKlOtoL8KEIAIVPc6vV905iSxdoPwgha1AB3yD65Tejfx6DAoGBAO01f2NE
IYLidaGho4NjaZKNLUwPrjzNhV9kXOH0hbnlk2qemUZv8hcjTKBF64bk5iuhq16i
s/kuuTnXZTJlsyeYcfb6W0GLrQAwCLgx7LvpkZ1SiuDML6mRvExEmKUhYEMMKVF5
gh9zFrno+6ZZY04uAW33AjxJsbbLREe00YPrAoGAFFamtdVPRUrY5K0ACY30e6K9
4vhKU6lTh6xOBw6cv10LA0TyeAfOU6yBBFK++hqCcRR+hHWhzLU09f2p5gTIYjEB
h9xW1CGsy9PlDIyOZ8jEgh0JgS227pOdj3wTjXvgeRb3XkKDs6e5dSD38MaYuN0M
t43V/4shwP+P/XRp+00CgYEA36d4G8PlFQFnmGUzxQX101Ho4VCc93/yx6fFShJv
dMKDrat6kt6TzKxA7uLWwQO0ckf5WTpR6vGKEfb5A6G3i7uepwlOuiSbzPoW2Y4o
WWcxhbQ5zKNC4bVLI1aejRDXGqURA60USYnE+G30lYd640/yuCNogfxXGhAbzpSL
B1kCgYBxWaHoaHQ/peupGu5+dd+wN00qj5VNE18sQWEkL1dIAwSD5tL0zrBY8Btw
sN9/GlDvHXtiT92Yo+j90IB0tScm7p+itPKDTnkUT52A9jzmf+e1utCTykTwI0cg
XYS7DzEsU3Pzz59UuImONPaW/CMd8zZZJbCD3zT6Po4PJ+jlJA==
-----END RSA PRIVATE KEY-----
[root@dhcp43-140 ~]# cat /var/lib/glusterd/glusterfind.secret.pem.pub 
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA2918osxpjW/GwdY2hdAbLBLqofdggFjAWWPsbaTfMDgqif6Rb6kUipg4htxz820i8oryHo7g8qe/MoO3pyFM+Av+iZKpGcA47SlmuBB5ho/X37aOUDJ7JVlUUsGssgd0CrH3Hm++HQUlf4PoGGX997w4UeyI9oEpmVmnkgVkO0G0ayMFzaCVLl8BlMOb3uzOoUIBHZIoMYKFFpuCi0k9I6C4DUVEyYo1LgwwIcopOUdwXX5pAeRwFmeLdEtfq3Fomh/B/aTIt9IRdjPvBGZXmxCPFfHMnWINCslTYdd4CrUfB2JzWPGRlEvjzLAjPlcEQCGeTHDgDqFbOwFOJ0ELQQ== root.eng.blr.redhat.com
[root@dhcp43-140 ~]# 
[root@dhcp43-140 ~]# glusterfind pre sess*nash nash /tmp/outt
Generated output file /tmp/outt
[root@dhcp43-140 ~]# cat /tmp/outt
44_55
b*c
+dir/
file1
.trashcan/
[root@dhcp43-140 ~]# glusterfind post sess*nash nash
[root@dhcp43-140 ~]# cat /var/lib/glusterd/glusterfind/sess
sess/      sess*nash/ 
[root@dhcp43-140 ~]# cat /var/lib/glusterd/glusterfind/sess\*nash/nash/status 
1428506824[root@dhcp43-140 ~]# 
[root@dhcp43-140 ~]# 
[root@dhcp43-140 ~]# 
[root@dhcp43-140 ~]# 
[root@dhcp43-140 ~]# glusterfind create 
usage: glusterfind create [-h] [--debug] [--force] session volume
glusterfind create: error: too few arguments
[root@dhcp43-140 ~]# glusterfind create sess2 --debug nash 
[root@dhcp43-140 ~]# vi /var/lib/glusterd/
bitd/                       glusterfind.secret.pem      hooks/                      quotad/
geo-replication/            glusterfind.secret.pem.pub  nfs/                        scrub/
glusterd.info               glustershd/                 options                     snaps/
glusterfind/                groups/                     peers/                      vols/
[root@dhcp43-140 ~]# vi /var/lib/glusterd/glusterfind
glusterfind/                glusterfind.secret.pem      glusterfind.secret.pem.pub  
[root@dhcp43-140 ~]# vi /var/lib/glusterd/glusterfind/^C
[root@dhcp43-140 ~]# 
[root@dhcp43-140 ~]# 
[root@dhcp43-140 ~]# glusterfind list
SESSION                   VOLUME                    SESSION TIME             
---------------------------------------------------------------------------
sess                      ozone                     2015-03-31 21:32:49      
sess2                     nash                      2015-04-08 21:09:49      
sess*nash                 nash                      2015-04-08 20:57:04      
[root@dhcp43-140 ~]# for i in {1..100}; do
> glusterfind create sess^$i nash;
> done
^CTraceback (most recent call last):
  File "/usr/bin/glusterfind", line 17, in <module>
    main()
  File "/usr/libexec/glusterfs/glusterfind/main.py", line 480, in main
    globals()["mode_" + args.mode](session_dir, args)
  File "/usr/libexec/glusterfs/glusterfind/main.py", line 323, in mode_create
    ssh_setup()
  File "/usr/libexec/glusterfs/glusterfind/main.py", line 288, in ssh_setup
    execute(cmd, exit_msg="Failed to distribute ssh keys", logger=logger)
  File "/usr/libexec/glusterfs/glusterfind/utils.py", line 127, in execute
    (out, err) = p.communicate()
  File "/usr/lib64/python2.6/subprocess.py", line 732, in communicate
    stdout, stderr = self._communicate(input, endtime)
  File "/usr/lib64/python2.6/subprocess.py", line 1316, in _communicate
    stdout, stderr = self._communicate_with_poll(input, endtime)
  File "/usr/lib64/python2.6/subprocess.py", line 1388, in _communicate_with_poll
    ready = poller.poll(self._remaining_time(endtime))
KeyboardInterrupt

^CTraceback (most recent call last):
  File "/usr/bin/glusterfind", line 17, in <module>
    main()
  File "/usr/libexec/glusterfs/glusterfind/main.py", line 480, in main
    globals()["mode_" + args.mode](session_dir, args)
  File "/usr/libexec/glusterfs/glusterfind/main.py", line 323, in mode_create
    ssh_setup()
  File "/usr/libexec/glusterfs/glusterfind/main.py", line 301, in ssh_setup
    logger=logger)
  File "/usr/libexec/glusterfs/glusterfind/utils.py", line 127, in execute
    (out, err) = p.communicate()
  File "/usr/lib64/python2.6/subprocess.py", line 732, in communicate
    stdout, stderr = self._communicate(input, endtime)
  File "/usr/lib64/python2.6/subprocess.py", line 1316, in _communicate
    stdout, stderr = self._communicate_with_poll(input, endtime)
  File "/usr/lib64/python2.6/subprocess.py", line 1388, in _communicate_with_poll
    ready = poller.poll(self._remaining_time(endtime))
KeyboardInterrupt
^CTraceback (most recent call last):
  File "/usr/bin/glusterfind", line 17, in <module>
    main()
  File "/usr/libexec/glusterfs/glusterfind/main.py", line 480, in main
    globals()["mode_" + args.mode](session_dir, args)
  File "/usr/libexec/glusterfs/glusterfind/main.py", line 323, in mode_create
    ssh_setup()
  File "/usr/libexec/glusterfs/glusterfind/main.py", line 301, in ssh_setup
    logger=logger)
  File "/usr/libexec/glusterfs/glusterfind/utils.py", line 127, in execute
    (out, err) = p.communicate()
  File "/usr/lib64/python2.6/subprocess.py", line 732, in communicate
    stdout, stderr = self._communicate(input, endtime)
  File "/usr/lib64/python2.6/subprocess.py", line 1316, in _communicate
    stdout, stderr = self._communicate_with_poll(input, endtime)
  File "/usr/lib64/python2.6/subprocess.py", line 1388, in _communicate_with_poll
    ready = poller.poll(self._remaining_time(endtime))
KeyboardInterrupt
^Z
[1]+  Stopped                 glusterfind create sess^$i nash
[root@dhcp43-140 ~]# glusterfind list
SESSION                   VOLUME                    SESSION TIME             
---------------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/bin/glusterfind", line 17, in <module>
    main()
  File "/usr/libexec/glusterfs/glusterfind/main.py", line 480, in main
    globals()["mode_" + args.mode](session_dir, args)
  File "/usr/libexec/glusterfs/glusterfind/main.py", line 454, in mode_list
    human_time(last_processed).ljust(25)))
  File "/usr/libexec/glusterfs/glusterfind/utils.py", line 54, in human_time
    return datetime.fromtimestamp(float(ts)).strftime("%Y-%m-%d %H:%M:%S")
TypeError: float() argument must be a string or a number
[root@dhcp43-140 ~]# vi /var/log/glusterfs/glusterfind/
cli.log    sess^1/    sess^11/   sess^13/   sess^15/   sess^17/   sess2/     sess^3/    sess^5/    sess^7/    sess^9/    session2/
sess/      sess^10/   sess^12/   sess^14/   sess^16/   sess^18/   sess^2/    sess^4/    sess^6/    sess^8/    session1/  sess*nash/
[root@dhcp43-140 ~]# vi /var/log/glusterfs/glusterfind/cli.log 
[root@dhcp43-140 ~]# vi /var/log/glusterfs/glusterfind/sess^18/nash/cli.log 
[root@dhcp43-140 ~]# vi /var/log/glusterfs/glusterfind/sess^12/nash/cli.log 
[root@dhcp43-140 ~]# vi /var/log/glusterfs/glusterfind/sess^14/nash/cli.log 
[root@dhcp43-140 ~]# vi /var/log/glusterfs/glusterfind/sess^17/nash/cli.log 
[root@dhcp43-140 ~]# vi /var/log/glusterfs/glusterfind/sess^18/nash/cli.log 
[root@dhcp43-140 ~]# gluster v i nash
 
Volume Name: nash
Type: Distribute
Volume ID: d5c4752d-a37f-44fb-8e22-11f5cc427bea
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: 10.70.43.140:/rhs/thinbrick1/nash/dd
Brick2: 10.70.43.140:/rhs/thinbrick2/nash/dd
Brick3: 10.70.42.75:/rhs/thinbrick1/nash/dd
Brick4: 10.70.42.75:/rhs/thinbrick2/nash/dd
Options Reconfigured:
changelog.changelog: on
storage.build-pgfid: on
[root@dhcp43-140 ~]# 
[root@dhcp43-140 ~]# rpm -qa | grep gluster
glusterfs-fuse-3.7dev-0.910.git17827de.el6.x86_64
glusterfs-devel-3.7dev-0.910.git17827de.el6.x86_64
glusterfs-rdma-3.7dev-0.910.git17827de.el6.x86_64
glusterfs-libs-3.7dev-0.910.git17827de.el6.x86_64
glusterfs-api-3.7dev-0.910.git17827de.el6.x86_64
glusterfs-3.7dev-0.910.git17827de.el6.x86_64
glusterfs-extra-xlators-3.7dev-0.910.git17827de.el6.x86_64
glusterfs-api-devel-3.7dev-0.910.git17827de.el6.x86_64
glusterfs-resource-agents-3.7dev-0.910.git17827de.el6.noarch
glusterfs-debuginfo-3.7dev-0.910.git17827de.el6.x86_64
glusterfs-cli-3.7dev-0.910.git17827de.el6.x86_64
glusterfs-server-3.7dev-0.910.git17827de.el6.x86_64
glusterfs-geo-replication-3.7dev-0.910.git17827de.el6.x86_64
[root@dhcp43-140 ~]#

--- Additional comment from Aravinda VK on 2015-04-08 06:43:05 EDT ---

I see KeyboardInterrupt during session creation, Looks like create in for loop is interpreted. Due to this partial session is created. That is why `glusterfind list` is failing.

--- Additional comment from Sweta Anandpara on 2015-04-08 07:38:41 EDT ---

I reran the command, and all the 100 sessions were created. But why is it that 'glusterfind list' lists only 3 (random) sessions and not all 100 of them?

[root@dhcp43-140 ~]# glusterfind list
SESSION                   VOLUME                    SESSION TIME             
---------------------------------------------------------------------------
sess^88                   nash                      2015-04-08 22:40:43      
sess^24                   nash                      2015-04-08 22:38:26      
sess^32                   nash                      2015-04-08 22:38:43      
Traceback (most recent call last):
  File "/usr/bin/glusterfind", line 17, in <module>
    main()
  File "/usr/libexec/glusterfs/glusterfind/main.py", line 480, in main
    globals()["mode_" + args.mode](session_dir, args)
  File "/usr/libexec/glusterfs/glusterfind/main.py", line 454, in mode_list
    human_time(last_processed).ljust(25)))
  File "/usr/libexec/glusterfs/glusterfind/utils.py", line 54, in human_time
    return datetime.fromtimestamp(float(ts)).strftime("%Y-%m-%d %H:%M:%S")
TypeError: float() argument must be a string or a number
[root@dhcp43-140 ~]# 

How do I get the output back to normal after a failed 'glusterfind list'?

--- Additional comment from Aravinda VK on 2015-04-08 07:41:44 EDT ---

(In reply to Sweta Anandpara from comment #2)
> I reran the command, and all the 100 sessions were created. But why is it
> that 'glusterfind list' lists only 3 (random) sessions and not all 100 of
> them?
> 
> [root@dhcp43-140 ~]# glusterfind list
> SESSION                   VOLUME                    SESSION TIME             
> ---------------------------------------------------------------------------
> sess^88                   nash                      2015-04-08 22:40:43      
> sess^24                   nash                      2015-04-08 22:38:26      
> sess^32                   nash                      2015-04-08 22:38:43      
> Traceback (most recent call last):
>   File "/usr/bin/glusterfind", line 17, in <module>
>     main()
>   File "/usr/libexec/glusterfs/glusterfind/main.py", line 480, in main
>     globals()["mode_" + args.mode](session_dir, args)
>   File "/usr/libexec/glusterfs/glusterfind/main.py", line 454, in mode_list
>     human_time(last_processed).ljust(25)))
>   File "/usr/libexec/glusterfs/glusterfind/utils.py", line 54, in human_time
>     return datetime.fromtimestamp(float(ts)).strftime("%Y-%m-%d %H:%M:%S")
> TypeError: float() argument must be a string or a number
> [root@dhcp43-140 ~]# 
> 
> How do I get the output back to normal after a failed 'glusterfind list'?

For some session session file doesn't exist or invalid value.

tree /var/lib/glusterd/glusterfind

to see any session directory without status file. I will fix the list command not to fail for invalid value.

--- Additional comment from Anand Avati on 2015-04-08 08:15:11 EDT ---

REVIEW: http://review.gluster.org/10164 (tools/glusterfind: Do not fail to list if Session is invalid) posted (#1) for review on master by Aravinda VK (avishwan)

--- Additional comment from Aravinda VK on 2015-04-08 08:17:36 EDT ---

Sent patch to prevent crash in `glusterfind list` http://review.gluster.org/10164

--- Additional comment from Nagaprasad Sathyanarayana on 2015-04-09 02:36:53 EDT ---

Not clear about this specific use case.  Was a session create followed by close before opening the new session? If not, is glusterfind expected to have keep multiple open sessions?

--- Additional comment from Aravinda VK on 2015-04-09 02:39:39 EDT ---

(In reply to Nagaprasad Sathyanarayana from comment #6)
> Not clear about this specific use case.  Was a session create followed by
> close before opening the new session? If not, is glusterfind expected to
> have keep multiple open sessions?

Sessions are per consumers. For example, Backup utility and Antivirus scanner. Both consumer can be scheduled to run in different timings, Using same session will not make sence.

--- Additional comment from Anand Avati on 2015-04-09 06:32:54 EDT ---

REVIEW: http://review.gluster.org/10164 (tools/glusterfind: Do not fail to list if Session is invalid) posted (#2) for review on master by Aravinda VK (avishwan)

--- Additional comment from Anand Avati on 2015-04-13 08:14:43 EDT ---

REVIEW: http://review.gluster.org/10164 (tools/glusterfind: Do not fail to list if Session is invalid) posted (#3) for review on master by Aravinda VK (avishwan)

--- Additional comment from Anand Avati on 2015-04-23 06:32:55 EDT ---

REVIEW: http://review.gluster.org/10164 (tools/glusterfind: Do not fail to list if Session is invalid) posted (#4) for review on master by Aravinda VK (avishwan)

--- Additional comment from Anand Avati on 2015-04-23 09:56:09 EDT ---

COMMIT: http://review.gluster.org/10164 committed in master by Vijay Bellur (vbellur) 
------
commit 0fc310accb6bf86e42e09e0876b51b8860075fda
Author: Aravinda VK <avishwan>
Date:   Wed Apr 8 17:39:56 2015 +0530

    tools/glusterfind: Do not fail to list if Session is invalid
    
    If session is invalid, Show the session time as "Session Corrupted"
    
    BUG: 1209843
    Change-Id: I10f7fb79f6a3a2cabab3f21bce86f2a2f424e8b6
    Signed-off-by: Aravinda VK <avishwan>
    Reviewed-on: http://review.gluster.org/10164
    Reviewed-by: Kotresh HR <khiremat>
    Tested-by: NetBSD Build System

Comment 1 Anand Avati 2015-05-03 05:03:01 UTC
REVIEW: http://review.gluster.org/10497 (tools/glusterfind: Do not fail to list if Session is invalid) posted (#1) for review on release-3.7 by Aravinda VK (avishwan)

Comment 2 Niels de Vos 2015-05-14 17:29:33 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

Comment 3 Niels de Vos 2015-05-14 17:35:58 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

Comment 4 Niels de Vos 2015-05-14 17:38:19 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

Comment 5 Niels de Vos 2015-05-14 17:47:00 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user


Note You need to log in before you can comment on or make changes to this bug.