Bug 1294594 - [Tier]: Killing glusterfs tier process doesn't reflect as failed/faulty in tier status
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: tier
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.1.2
Assignee: Mohammed Rafi KC
QA Contact: Rahul Hinduja
URL:
Whiteboard:
Depends On:
Blocks: 1276245 1294600 1295365
Reported: 2015-12-29 06:33 UTC by Rahul Hinduja
Modified: 2016-09-17 15:41 UTC (History)

Fixed In Version: glusterfs-3.7.5-14
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1294600 (view as bug list)
Environment:
Last Closed: 2016-03-01 06:06:14 UTC
Embargoed:


Links
System                  ID              Private  Priority  Status        Summary                               Last Updated
Red Hat Product Errata  RHBA-2016:0193  0        normal    SHIPPED_LIVE  Red Hat Gluster Storage 3.1 update 2  2016-03-01 10:20:36 UTC

Description Rahul Hinduja 2015-12-29 06:33:35 UTC
Description of problem:
=======================

If the glusterfs tier process (tierd) is killed, the tier status should show Faulty/Failed, but with the latest build it still shows "in progress".

Restarting tier with "tier start force" brings the process back, and the run time is reset correctly.

glusterfs tier process in the system:
=====================================

[root@dhcp37-165 ~]# ps -eaf | grep tier 
root     11709     1  5 Dec27 ?        01:00:08 /usr/sbin/glusterfsd -s 10.70.37.165 --volfile-id tiervolume.10.70.37.165.rhs-brick2-ct-1 -p /var/lib/glusterd/vols/tiervolume/run/10.70.37.165-rhs-brick2-ct-1.pid -S /var/run/gluster/788c178e21431d325f98c68c2cc5cb32.socket --brick-name /rhs/brick2/ct-1 -l /var/log/glusterfs/bricks/rhs-brick2-ct-1.log --xlator-option *-posix.glusterd-uuid=fda547f4-eaf4-44ad-b281-978c1306f75a --brick-port 49152 --xlator-option tiervolume-server.listen-port=49152
root     11728     1  5 Dec27 ?        01:07:53 /usr/sbin/glusterfsd -s 10.70.37.165 --volfile-id tiervolume.10.70.37.165.rhs-brick2-ct-7 -p /var/lib/glusterd/vols/tiervolume/run/10.70.37.165-rhs-brick2-ct-7.pid -S /var/run/gluster/3875a1138ebc5a4987b5c2d49ef5cade.socket --brick-name /rhs/brick2/ct-7 -l /var/log/glusterfs/bricks/rhs-brick2-ct-7.log --xlator-option *-posix.glusterd-uuid=fda547f4-eaf4-44ad-b281-978c1306f75a --brick-port 49153 --xlator-option tiervolume-server.listen-port=49153
root     11882     1  1 Dec27 ?        00:13:10 /usr/sbin/glusterfsd -s 10.70.37.165 --volfile-id tiervolume.10.70.37.165.rhs-brick3-tiervolume_hot -p /var/lib/glusterd/vols/tiervolume/run/10.70.37.165-rhs-brick3-tiervolume_hot.pid -S /var/run/gluster/4f46770e383fab1ee7789ff7a656a342.socket --brick-name /rhs/brick3/tiervolume_hot -l /var/log/glusterfs/bricks/rhs-brick3-tiervolume_hot.log --xlator-option *-posix.glusterd-uuid=fda547f4-eaf4-44ad-b281-978c1306f75a --brick-port 49154 --xlator-option tiervolume-server.listen-port=49154
root     13339     1  1 17:50 ?        00:00:39 /usr/sbin/glusterfs -s localhost --volfile-id rebalance/tiervolume --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *dht.readdir-optimize=on --xlator-option *tier-dht.xattr-name=trusted.tier.tier-dht --xlator-option *dht.rebalance-cmd=6 --xlator-option *dht.node-uuid=fda547f4-eaf4-44ad-b281-978c1306f75a --xlator-option *dht.commit-hash=3020665580 --socket-file /var/run/gluster/gluster-tier-cf53869f-8994-40b5-a14b-65c107792595.sock --pid-file /var/lib/glusterd/vols/tiervolume/tier/fda547f4-eaf4-44ad-b281-978c1306f75a.pid -l /var/log/glusterfs/tiervolume-tier.log
root     13447 13047  0 18:25 pts/0    00:00:00 grep --color=auto tier
[root@dhcp37-165 ~]#

Rebalance tier status shown as "Inprogress" with run time 2108:
===============================================================

[root@dhcp37-165 ~]# gluster volume rebalance tiervolume status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes             0             0             0          in progress            2108.00
                            10.70.37.133              886        0Bytes         21554             0             0          in progress           70333.00
                            10.70.37.160                0        0Bytes         41287             0             0          in progress           70334.00
                            10.70.37.158              929        0Bytes         25117             0             0          in progress           70334.00
                            10.70.37.110                0        0Bytes         37237             0             0          in progress           70333.00
                            10.70.37.155              994        0Bytes         20551             0             0          in progress           70334.00
                             10.70.37.99                0        0Bytes         41083             0             0          in progress           70334.00
                             10.70.37.88              797        0Bytes         24030             0             0          in progress           70334.00
                            10.70.37.112                0        0Bytes         42078             0             0          in progress           70333.00
                            10.70.37.199             1061        0Bytes         26635             0             0          in progress           70334.00
                            10.70.37.162                0        0Bytes         38218             0             0          in progress           70334.00
                             10.70.37.87             1036        0Bytes         22035             0             0          in progress           70333.00
volume rebalance: tiervolume: success
[root@dhcp37-165 ~]#

[root@dhcp37-165 ~]# gluster volume rebalance tiervolume tier status
Node                 Promoted files       Demoted files        Status              
---------            ---------            ---------            ---------           
localhost            0                    0                    in progress         
10.70.37.133         0                    886                  in progress         
10.70.37.160         0                    0                    in progress         
10.70.37.158         0                    936                  in progress         
10.70.37.110         0                    0                    in progress         
10.70.37.155         0                    1004                 in progress         
10.70.37.99          0                    0                    in progress         
10.70.37.88          0                    803                  in progress         
10.70.37.112         0                    0                    in progress         
10.70.37.199         0                    1064                 in progress         
10.70.37.162         0                    0                    in progress         
10.70.37.87          0                    1043                 in progress         
Tiering Migration Functionality: tiervolume: success
[root@dhcp37-165 ~]# 
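
For scripted checks, the tier status table above can be reduced to node/status pairs. The following is only a sketch: tier_status_rows is a hypothetical helper name, and it assumes the four-column layout shown in this report (Node, Promoted files, Demoted files, Status), which may differ in other builds.

```shell
# Hypothetical helper: read "tier status" output on stdin and print
# one "node<TAB>status" line per data row.
tier_status_rows() {
    awk '
        # Data rows carry numeric promoted/demoted counts in fields 2 and 3;
        # the header, the separator, and the trailing message line do not.
        NF >= 4 && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ {
            # The status may contain spaces ("in progress"), so rejoin
            # everything from field 4 onwards.
            status = $4
            for (i = 5; i <= NF; i++) status = status " " $i
            printf "%s\t%s\n", $1, status
        }
    '
}
```

Usage, with the volume name from this report: gluster volume rebalance tiervolume tier status | tier_status_rows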

Kill glusterfs process:
=======================

[root@dhcp37-165 ~]# kill -9 13339

Check for tierd glusterfs process:
==================================

[root@dhcp37-165 ~]# ps -eaf | grep tier
root     11709     1  5 Dec27 ?        01:00:45 /usr/sbin/glusterfsd -s 10.70.37.165 --volfile-id tiervolume.10.70.37.165.rhs-brick2-ct-1 -p /var/lib/glusterd/vols/tiervolume/run/10.70.37.165-rhs-brick2-ct-1.pid -S /var/run/gluster/788c178e21431d325f98c68c2cc5cb32.socket --brick-name /rhs/brick2/ct-1 -l /var/log/glusterfs/bricks/rhs-brick2-ct-1.log --xlator-option *-posix.glusterd-uuid=fda547f4-eaf4-44ad-b281-978c1306f75a --brick-port 49152 --xlator-option tiervolume-server.listen-port=49152
root     11728     1  5 Dec27 ?        01:08:22 /usr/sbin/glusterfsd -s 10.70.37.165 --volfile-id tiervolume.10.70.37.165.rhs-brick2-ct-7 -p /var/lib/glusterd/vols/tiervolume/run/10.70.37.165-rhs-brick2-ct-7.pid -S /var/run/gluster/3875a1138ebc5a4987b5c2d49ef5cade.socket --brick-name /rhs/brick2/ct-7 -l /var/log/glusterfs/bricks/rhs-brick2-ct-7.log --xlator-option *-posix.glusterd-uuid=fda547f4-eaf4-44ad-b281-978c1306f75a --brick-port 49153 --xlator-option tiervolume-server.listen-port=49153
root     11882     1  1 Dec27 ?        00:13:10 /usr/sbin/glusterfsd -s 10.70.37.165 --volfile-id tiervolume.10.70.37.165.rhs-brick3-tiervolume_hot -p /var/lib/glusterd/vols/tiervolume/run/10.70.37.165-rhs-brick3-tiervolume_hot.pid -S /var/run/gluster/4f46770e383fab1ee7789ff7a656a342.socket --brick-name /rhs/brick3/tiervolume_hot -l /var/log/glusterfs/bricks/rhs-brick3-tiervolume_hot.log --xlator-option *-posix.glusterd-uuid=fda547f4-eaf4-44ad-b281-978c1306f75a --brick-port 49154 --xlator-option tiervolume-server.listen-port=49154
root     13494 13047  0 18:26 pts/0    00:00:00 grep --color=auto tier
[root@dhcp37-165 ~]#
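
The ps output above confirms that PID 13339 is gone. The same check can be scripted against the --pid-file path visible in the earlier ps output. This is a minimal sketch and not how glusterd itself tracks the process; pid_file_alive is a hypothetical helper, and it assumes it is run as root, as in this session (kill -0 also fails for processes owned by another user).

```shell
# Hypothetical helper: succeed if the process recorded in a PID file is alive.
# Usage: pid_file_alive /path/to/file.pid
pid_file_alive() {
    pid_file=$1
    # A missing or unreadable PID file is treated as "not running".
    [ -r "$pid_file" ] || return 1
    pid=$(head -n 1 "$pid_file" | tr -d '[:space:]')
    # Reject empty or non-numeric content before calling kill.
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Signal 0 only tests whether the process can be signalled;
    # nothing is actually delivered to it.
    kill -0 "$pid" 2>/dev/null
}
```

Usage, with the PID file path from this report: pid_file_alive /var/lib/glusterd/vols/tiervolume/tier/fda547f4-eaf4-44ad-b281-978c1306f75a.pid || echo "tier process is not running"

A PID file can be left behind when the process is killed with kill -9, which is why the helper checks the process itself and not just the existence of the file.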


Rebalance tier status and tier status both show "Inprogress"
============================================================

[root@dhcp37-165 ~]# gluster volume rebalance tiervolume status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes             0             0             0          in progress            2117.00
                            10.70.37.133              907        0Bytes         21554             0             0          in progress           70370.00
                            10.70.37.160                0        0Bytes         41793             0             0          in progress           70370.00
                            10.70.37.158              955        0Bytes         25117             0             0          in progress           70371.00
                            10.70.37.110                0        0Bytes         37688             0             0          in progress           70370.00
                            10.70.37.155             1022        0Bytes         20551             0             0          in progress           70370.00
                             10.70.37.99                0        0Bytes         41775             0             0          in progress           70370.00
                             10.70.37.88              824        0Bytes         24030             0             0          in progress           70370.00
                            10.70.37.112                0        0Bytes         42078             0             0          in progress           70369.00
                            10.70.37.199             1064        0Bytes         26635             0             0          in progress           70370.00
                            10.70.37.162                0        0Bytes         38218             0             0          in progress           70370.00
                             10.70.37.87             1066        0Bytes         22035             0             0          in progress           70370.00
volume rebalance: tiervolume: success
[root@dhcp37-165 ~]#

Restart tierd using tier start force and check for tierd glusterfs process and rebalance run time:
===============================================================================

[root@dhcp37-165 ~]# gluster volume tier tiervolume start force
Tiering Migration Functionality: tiervolume: success: Attach tier is successful on tiervolume. use tier status to check the status.
ID: 2be4b831-d96a-4077-8919-5aa6a272163e

[root@dhcp37-165 ~]# ps -eaf | grep tier
root     11709     1  5 Dec27 ?        01:01:03 /usr/sbin/glusterfsd -s 10.70.37.165 --volfile-id tiervolume.10.70.37.165.rhs-brick2-ct-1 -p /var/lib/glusterd/vols/tiervolume/run/10.70.37.165-rhs-brick2-ct-1.pid -S /var/run/gluster/788c178e21431d325f98c68c2cc5cb32.socket --brick-name /rhs/brick2/ct-1 -l /var/log/glusterfs/bricks/rhs-brick2-ct-1.log --xlator-option *-posix.glusterd-uuid=fda547f4-eaf4-44ad-b281-978c1306f75a --brick-port 49152 --xlator-option tiervolume-server.listen-port=49152
root     11728     1  5 Dec27 ?        01:09:00 /usr/sbin/glusterfsd -s 10.70.37.165 --volfile-id tiervolume.10.70.37.165.rhs-brick2-ct-7 -p /var/lib/glusterd/vols/tiervolume/run/10.70.37.165-rhs-brick2-ct-7.pid -S /var/run/gluster/3875a1138ebc5a4987b5c2d49ef5cade.socket --brick-name /rhs/brick2/ct-7 -l /var/log/glusterfs/bricks/rhs-brick2-ct-7.log --xlator-option *-posix.glusterd-uuid=fda547f4-eaf4-44ad-b281-978c1306f75a --brick-port 49153 --xlator-option tiervolume-server.listen-port=49153
root     11882     1  1 Dec27 ?        00:13:10 /usr/sbin/glusterfsd -s 10.70.37.165 --volfile-id tiervolume.10.70.37.165.rhs-brick3-tiervolume_hot -p /var/lib/glusterd/vols/tiervolume/run/10.70.37.165-rhs-brick3-tiervolume_hot.pid -S /var/run/gluster/4f46770e383fab1ee7789ff7a656a342.socket --brick-name /rhs/brick3/tiervolume_hot -l /var/log/glusterfs/bricks/rhs-brick3-tiervolume_hot.log --xlator-option *-posix.glusterd-uuid=fda547f4-eaf4-44ad-b281-978c1306f75a --brick-port 49154 --xlator-option tiervolume-server.listen-port=49154
root     13506     1  1 18:26 ?        00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id rebalance/tiervolume --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *dht.readdir-optimize=on --xlator-option *tier-dht.xattr-name=trusted.tier.tier-dht --xlator-option *dht.rebalance-cmd=6 --xlator-option *dht.node-uuid=fda547f4-eaf4-44ad-b281-978c1306f75a --xlator-option *dht.commit-hash=3020682906 --socket-file /var/run/gluster/gluster-tier-cf53869f-8994-40b5-a14b-65c107792595.sock --pid-file /var/lib/glusterd/vols/tiervolume/tier/fda547f4-eaf4-44ad-b281-978c1306f75a.pid -l /var/log/glusterfs/tiervolume-tier.log
root     13515 13047  0 18:26 pts/0    00:00:00 grep --color=auto tier
[root@dhcp37-165 ~]# gluster volume rebalance tiervolume status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes             0             0             0          in progress             138.00
                            10.70.37.133              956        0Bytes         21554             0             0          in progress           70529.00
                            10.70.37.160                0        0Bytes         42256             0             0          in progress           70529.00
                            10.70.37.158             1005        0Bytes         25117             0             0          in progress           70529.00
                            10.70.37.110                0        0Bytes         38131             0             0          in progress           70529.00
                            10.70.37.155             1022        0Bytes         20551             0             0          in progress           70529.00
                             10.70.37.99                0        0Bytes         42448             0             0          in progress           70529.00
                             10.70.37.88              848        0Bytes         24030             0             0          in progress           70529.00
                            10.70.37.112                0        0Bytes         43026             0             0          in progress           70528.00
                            10.70.37.199             1067        0Bytes         26635             0             0          in progress           70529.00
                            10.70.37.162                0        0Bytes         39062             0             0          in progress           70529.00
                             10.70.37.87             1113        0Bytes         22035             0             0          in progress           70529.00
volume rebalance: tiervolume: success
[root@dhcp37-165 ~]# 


Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.7.5-13.el7rhgs.x86_64


How reproducible:
=================

Always


Steps to Reproduce:
===================
1. Create a tiered volume setup
2. Kill the glusterfs tier process (tierd) on a node with kill -9
3. Check the tier status

Actual results:
===============

The tier status for the node still shows "in progress" after its tier process is killed.


Expected results:
=================

The status should show Faulty/Failed for the node whose tier process was killed.
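
The expected result can be turned into a pass/fail check for the local node. This is a sketch under assumptions: local_tier_not_in_progress is a hypothetical helper, it relies on the column layout of the tier status output shown in the description, and it only tests that the localhost row no longer reads "in progress" because the exact wording of the failed state is not quoted in this report.

```shell
# Hypothetical check: succeed only if the localhost row of the
# "tier status" output (read on stdin) is no longer "in progress".
local_tier_not_in_progress() {
    awk '
        $1 == "localhost" && NF >= 4 && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ {
            found = 1
            # Rejoin the status, which may span several fields.
            status = $4
            for (i = 5; i <= NF; i++) status = status " " $i
        }
        END {
            # Fail if the row is missing or still reads "in progress".
            if (found && status != "in progress") exit 0
            exit 1
        }
    '
}
```

Usage after killing the tier process: gluster volume rebalance tiervolume tier status | local_tier_not_in_progress && echo "failure is reported"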

Comment 5 Rahul Hinduja 2016-01-06 12:10:39 UTC
Verified with build: glusterfs-3.7.5-14.el7rhgs.x86_64

Killing the glusterfs tier (tierd) process now shows as failed in tier status. Moving this bug to verified state.

Comment 8 errata-xmlrpc 2016-03-01 06:06:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0193.html

