| Summary: | file promotion/demotion stuck or hung when one brick in each set of hot replica pair was brought down | ||
|---|---|---|---|
| Product: | Red Hat Gluster Storage | Reporter: | Nag Pavan Chilakam <nchilaka> |
| Component: | tier | Assignee: | Bug Updates Notification Mailing List <rhs-bugs> |
| Status: | CLOSED WONTFIX | QA Contact: | Nag Pavan Chilakam <nchilaka> |
| Severity: | high | Docs Contact: | |
| Priority: | urgent | ||
| Version: | rhgs-3.1 | CC: | rgowdapp, rhs-bugs |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | tier-migration | ||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2018-02-06 17:43:07 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
|
Description
Nag Pavan Chilakam
2016-02-01 11:03:52 UTC
refer to this node:
[root@dhcp37-101 glusterfs]# gluster v status nagvol
Status of volume: nagvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Hot Bricks:
Brick 10.70.37.120:/rhs/brick7/nagvol_hot   49156     0          Y       32513
Brick 10.70.37.60:/rhs/brick7/nagvol_hot    N/A       N/A        N       N/A
Brick 10.70.37.69:/rhs/brick7/nagvol_hot    49156     0          Y       32442
Brick 10.70.37.101:/rhs/brick7/nagvol_hot   N/A       N/A        N       N/A
Brick 10.70.35.163:/rhs/brick7/nagvol_hot   49156     0          Y       617
Brick 10.70.35.173:/rhs/brick7/nagvol_hot   N/A       N/A        N       N/A
Brick 10.70.35.232:/rhs/brick7/nagvol_hot   49156     0          Y       32361
Brick 10.70.35.176:/rhs/brick7/nagvol_hot   N/A       N/A        N       N/A
Brick 10.70.35.222:/rhs/brick7/nagvol_hot   49155     0          Y       22713
Brick 10.70.35.155:/rhs/brick7/nagvol_hot   N/A       N/A        N       N/A
Brick 10.70.37.195:/rhs/brick7/nagvol_hot   N/A       N/A        N       N/A
Brick 10.70.37.202:/rhs/brick7/nagvol_hot   49156     0          Y       26275
Cold Bricks:
Brick 10.70.37.202:/rhs/brick1/nagvol       49152     0          Y       16950
Brick 10.70.37.195:/rhs/brick1/nagvol       49152     0          Y       16702
Brick 10.70.35.155:/rhs/brick1/nagvol       49152     0          Y       13578
Brick 10.70.35.222:/rhs/brick1/nagvol       49152     0          Y       13546
Brick 10.70.35.108:/rhs/brick1/nagvol       49152     0          Y       4675
Brick 10.70.35.44:/rhs/brick1/nagvol        49152     0          Y       12288
Brick 10.70.35.89:/rhs/brick1/nagvol        49152     0          Y       2668
Brick 10.70.35.231:/rhs/brick1/nagvol       49152     0          Y       22810
Brick 10.70.35.176:/rhs/brick1/nagvol       49152     0          Y       22781
Brick 10.70.35.232:/rhs/brick1/nagvol       49152     0          Y       22783
Brick 10.70.35.173:/rhs/brick1/nagvol       49152     0          Y       22795
Brick 10.70.35.163:/rhs/brick1/nagvol       49152     0          Y       22805
Brick 10.70.37.101:/rhs/brick1/nagvol       49152     0          Y       22847
Brick 10.70.37.69:/rhs/brick1/nagvol        49152     0          Y       22847
Brick 10.70.37.60:/rhs/brick1/nagvol        49152     0          Y       22895
Brick 10.70.37.120:/rhs/brick1/nagvol       49152     0          Y       22916
Brick 10.70.37.202:/rhs/brick2/nagvol       49153     0          Y       16969
Brick 10.70.37.195:/rhs/brick2/nagvol       49153     0          Y       16721
Brick 10.70.35.155:/rhs/brick2/nagvol       49153     0          Y       13597
Brick 10.70.35.222:/rhs/brick2/nagvol       49153     0          Y       13565
Brick 10.70.35.108:/rhs/brick2/nagvol       49153     0          Y       4694
Brick 10.70.35.44:/rhs/brick2/nagvol        49153     0          Y       12307
Brick 10.70.35.89:/rhs/brick2/nagvol        49153     0          Y       2683
Brick 10.70.35.231:/rhs/brick2/nagvol       49153     0          Y       22829
NFS Server on localhost                     2049      0          Y       6099
Self-heal Daemon on localhost               N/A       N/A        Y       6107
Quota Daemon on localhost                   N/A       N/A        Y       6115
NFS Server on 10.70.37.69                   2049      0          Y       26129
Self-heal Daemon on 10.70.37.69             N/A       N/A        Y       26137
Quota Daemon on 10.70.37.69                 N/A       N/A        Y       26145
NFS Server on 10.70.37.195                  2049      0          Y       19012
Self-heal Daemon on 10.70.37.195            N/A       N/A        Y       19020
Quota Daemon on 10.70.37.195                N/A       N/A        Y       19028
NFS Server on 10.70.37.60                   2049      0          Y       31342
Self-heal Daemon on 10.70.37.60             N/A       N/A        Y       31350
Quota Daemon on 10.70.37.60                 N/A       N/A        Y       31358
NFS Server on 10.70.37.120                  2049      0          Y       26317
Self-heal Daemon on 10.70.37.120            N/A       N/A        Y       26325
Quota Daemon on 10.70.37.120                N/A       N/A        Y       26333
NFS Server on dhcp37-202.lab.eng.blr.redhat
.com                                        2049      0          Y       20158
Self-heal Daemon on dhcp37-202.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       20166
Quota Daemon on dhcp37-202.lab.eng.blr.redh
at.com                                      N/A       N/A        Y       20174
NFS Server on 10.70.35.108                  2049      0          Y       7435
Self-heal Daemon on 10.70.35.108            N/A       N/A        Y       7443
Quota Daemon on 10.70.35.108                N/A       N/A        Y       7451
NFS Server on 10.70.35.232                  2049      0          Y       26640
Self-heal Daemon on 10.70.35.232            N/A       N/A        Y       26648
Quota Daemon on 10.70.35.232                N/A       N/A        Y       26656
NFS Server on 10.70.35.176                  2049      0          Y       26492
Self-heal Daemon on 10.70.35.176            N/A       N/A        Y       26500
Quota Daemon on 10.70.35.176                N/A       N/A        Y       26508
NFS Server on 10.70.35.173                  2049      0          Y       26912
Self-heal Daemon on 10.70.35.173            N/A       N/A        Y       26920
Quota Daemon on 10.70.35.173                N/A       N/A        Y       26928
NFS Server on 10.70.35.44                   2049      0          Y       15159
Self-heal Daemon on 10.70.35.44             N/A       N/A        Y       15167
Quota Daemon on 10.70.35.44                 N/A       N/A        Y       15175
NFS Server on 10.70.35.155                  2049      0          Y       16144
Self-heal Daemon on 10.70.35.155            N/A       N/A        Y       16152
Quota Daemon on 10.70.35.155                N/A       N/A        Y       16160
NFS Server on 10.70.35.89                   2049      0          Y       6458
Self-heal Daemon on 10.70.35.89             N/A       N/A        Y       6466
Quota Daemon on 10.70.35.89                 N/A       N/A        Y       6474
NFS Server on 10.70.35.231                  2049      0          Y       25739
Self-heal Daemon on 10.70.35.231            N/A       N/A        Y       25747
Quota Daemon on 10.70.35.231                N/A       N/A        Y       25755
NFS Server on 10.70.35.222                  2049      0          Y       16478
Self-heal Daemon on 10.70.35.222            N/A       N/A        Y       16486
Quota Daemon on 10.70.35.222                N/A       N/A        Y       16494
NFS Server on 10.70.35.163                  2049      0          Y       26759
Self-heal Daemon on 10.70.35.163            N/A       N/A        Y       26767
Quota Daemon on 10.70.35.163                N/A       N/A        Y       26775
Task Status of Volume nagvol
------------------------------------------------------------------------------
Task                 : Tier migration
ID                   : 0870550a-70ba-4cd1-98da-b456059bd6cc
Status               : in progress
[root@dhcp37-101 glusterfs]# less /var/log/glusterfs/nagvol-tier.log
[root@dhcp37-101 glusterfs]#
[root@dhcp37-101 glusterfs]#
[root@dhcp37-101 glusterfs]# md5sum ddfile.21
md5sum: ddfile.21: No such file or directory
[root@dhcp37-101 glusterfs]# cd /rhs/brick1/nagvol/ddcommand
[root@dhcp37-101 ddcommand]# md5sum ddfile.21
md5sum: ddfile.21: No such file or directory
[root@dhcp37-101 ddcommand]# ls
ddfile.1    ddfile.13  ddfile.18  ddfile.22  ddfile.26  ddfile.3  ddlogme2.txt
ddfile.10   ddfile.15  ddfile.2   ddfile.23  ddfile.27  ddfile.8  ddlogme.txt
ddfile.102  ddfile.16  ddfile.20  ddfile.25  ddfile.29  ddfile.9
[root@dhcp37-101 ddcommand]# ls -l
total 2250020
-rw-r--r--. 2 root root 192000000 Jan 27 11:01 ddfile.1
-rw-r--r--. 2 root root 192000000 Jan 27 11:24 ddfile.10
-rw-r--r--. 2 root root 192000000 Jan 25 19:38 ddfile.102
-rw-r-Sr-T. 2 root root 192000000 Jan 27 11:32 ddfile.13
-rw-r--r--. 2 root root 192000000 Jan 27 11:37 ddfile.15
---------T. 2 root root         0 Jan 27 11:37 ddfile.16
---------T. 2 root root         0 Jan 27 11:43 ddfile.18
---------T. 2 root root         0 Jan 27 11:01 ddfile.2
-rw-r--r--. 2 root root 192000000 Jan 27 11:52 ddfile.20
-rw-r--r--. 2 root root 192000000 Jan 27 11:57 ddfile.22
---------T. 2 root root         0 Jan 27 11:57 ddfile.23
-rw-r--r--. 2 root root 192000000 Jan 27 12:05 ddfile.25
-rw-r--r--. 2 root root 192000000 Jan 27 12:08 ddfile.26
---------T. 2 root root         0 Jan 27 12:08 ddfile.27
-rw-r--r--. 2 root root 192000000 Jan 27 12:15 ddfile.29
---------T. 2 root root         0 Jan 27 11:03 ddfile.3
-rw-r--r--. 2 root root 192000000 Jan 27 11:19 ddfile.8
-rw-r--r--. 2 root root 192000000 Jan 27 11:22 ddfile.9
---------T. 2 root root         0 Jan 27 10:58 ddlogme2.txt
-rw-r--r--. 2 root root       512 Jan 25 20:51 ddlogme.txt
[root@dhcp37-101 ddcommand]# ls -l ^C
[root@dhcp37-101 ddcommand]# ls -l
total 2250020
-rw-r--r--. 2 root root 192000000 Jan 27 11:01 ddfile.1
-rw-r--r--. 2 root root 192000000 Jan 27 11:24 ddfile.10
-rw-r--r--. 2 root root 192000000 Jan 25 19:38 ddfile.102
-rw-r-Sr-T. 2 root root 192000000 Jan 27 11:32 ddfile.13
-rw-r--r--. 2 root root 192000000 Jan 27 11:37 ddfile.15
---------T. 2 root root         0 Jan 27 11:37 ddfile.16
---------T. 2 root root         0 Jan 27 11:43 ddfile.18
---------T. 2 root root         0 Jan 27 11:01 ddfile.2
-rw-r--r--. 2 root root 192000000 Jan 27 11:52 ddfile.20
-rw-r--r--. 2 root root 192000000 Jan 27 11:57 ddfile.22
---------T. 2 root root         0 Jan 27 11:57 ddfile.23
-rw-r--r--. 2 root root 192000000 Jan 27 12:05 ddfile.25
-rw-r--r--. 2 root root 192000000 Jan 27 12:08 ddfile.26
---------T. 2 root root         0 Jan 27 12:08 ddfile.27
-rw-r--r--. 2 root root 192000000 Jan 27 12:15 ddfile.29
---------T. 2 root root         0 Jan 27 11:03 ddfile.3
-rw-r--r--. 2 root root 192000000 Jan 27 11:19 ddfile.8
-rw-r--r--. 2 root root 192000000 Jan 27 11:22 ddfile.9
---------T. 2 root root         0 Jan 27 10:58 ddlogme2.txt
-rw-r--r--. 2 root root       512 Jan 25 20:51 ddlogme.txt
[root@dhcp37-101 ddcommand]# ls -lrth /rhs/brick*/nagvol*/ddcommand/ddfile.21
-rw-r-Sr-T. 2 root root 1.5G Jan 27 11:54 /rhs/brick7/nagvol_hot/ddcommand/ddfile.21
[root@dhcp37-101 ddcommand]# ls -l
total 2250020
-rw-r--r--. 2 root root 192000000 Jan 27 11:01 ddfile.1
-rw-r--r--. 2 root root 192000000 Jan 27 11:24 ddfile.10
-rw-r--r--. 2 root root 192000000 Jan 25 19:38 ddfile.102
-rw-r-Sr-T. 2 root root 192000000 Jan 27 11:32 ddfile.13
-rw-r--r--. 2 root root 192000000 Jan 27 11:37 ddfile.15
---------T. 2 root root         0 Jan 27 11:37 ddfile.16
---------T. 2 root root         0 Jan 27 11:43 ddfile.18
---------T. 2 root root         0 Jan 27 11:01 ddfile.2
-rw-r--r--. 2 root root 192000000 Jan 27 11:52 ddfile.20
-rw-r--r--. 2 root root 192000000 Jan 27 11:57 ddfile.22
---------T. 2 root root         0 Jan 27 11:57 ddfile.23
-rw-r--r--. 2 root root 192000000 Jan 27 12:05 ddfile.25
-rw-r--r--. 2 root root 192000000 Jan 27 12:08 ddfile.26
---------T. 2 root root         0 Jan 27 12:08 ddfile.27
-rw-r--r--. 2 root root 192000000 Jan 27 12:15 ddfile.29
---------T. 2 root root         0 Jan 27 11:03 ddfile.3
-rw-r--r--. 2 root root 192000000 Jan 27 11:19 ddfile.8
-rw-r--r--. 2 root root 192000000 Jan 27 11:22 ddfile.9
---------T. 2 root root         0 Jan 27 10:58 ddlogme2.txt
-rw-r--r--. 2 root root       512 Jan 25 20:51 ddlogme.txt
[root@dhcp37-101 ddcommand]# ls -lrth /rhs/brick*/nagvol*/ddcommand/ddfile.21
-rw-r-Sr-T. 2 root root 1.5G Jan 27 11:54 /rhs/brick7/nagvol_hot/ddcommand/ddfile.21
[root@dhcp37-101 ddcommand]# pwd
/rhs/brick1/nagvol/ddcommand
[root@dhcp37-101 ddcommand]# ls -lrth /rhs/brick*/nagvol*/ddcommand/ddfile.21
-rw-r-Sr-T. 2 root root 1.5G Jan 27 11:54 /rhs/brick7/nagvol_hot/ddcommand/ddfile.21
[root@dhcp37-101 ddcommand]#
[root@dhcp37-101 ddcommand]#
[root@dhcp37-101 ddcommand]# cd /rhs/brick7/nagvol_hot/ddcommand/
[root@dhcp37-101 ddcommand]# md5sum ddfile.21
0d8ea8feb6d830fed9ee0703d73aea1f  ddfile.21
[root@dhcp37-101 ddcommand]# #sosreport
[root@dhcp37-101 ddcommand]# sosreport
sosreport (version 3.2)
This command will collect diagnostic and configuration information from this Red Hat Enterprise Linux system and installed applications.
An archive containing the collected information will be generated in /var/tmp and may be provided to a Red Hat support representative.
Any information provided to Red Hat will be treated in accordance with the published support policies at:
https://access.redhat.com/support/
The generated archive may contain data considered sensitive and its content should be reviewed by the originating organization before being passed to any third party.
No changes will be made to system configuration.
Press ENTER to continue, or CTRL-C to quit.
Please enter your first initial and last name [dhcp37-101.lab.eng.blr.redhat.com]:
Please enter the case id that you are generating this report for []: brickdown_file_demote_stuck
Setting up archive ...
Setting up plugins ...
Running plugins. Please wait ...
Running 89/89: yum...
Creating compressed archive...
Your sosreport has been generated and saved in:
/var/tmp/sosreport-dhcp37-101.lab.eng.blr.redhat.com.brickdownfiledemotestuck-20160129190259.tar.xz
The checksum is: 98401ea77116695635c5ff51cad3b808
Please send this file to your support representative.
[root@dhcp37-101 ddcommand]# exit
logout
Connection to 10.70.37.101 closed.
bash-4.3$

(In reply to nchilaka from comment #0)
> Description of problem:
> =====================
> I brought down one brick of each replica pair in hot tier, and found that a
> file which was already promoting/demotion didn't proceed further. It was
> stuck in that state for more than half-an-hour.

Most likely the disconnect was not identified until a call-bail happened (note that the call-bail timeout is 30 minutes). This looks more like an issue with rpc/transport.

If the logs are still available, we can grep through the log files to check whether call_bail happened for any fops.

Thank you for your bug report. We are no longer working on any improvements for Tier. This bug will be set to CLOSED WONTFIX to reflect this. Please reopen if the RFE is deemed critical. |
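
The log check suggested in the reply above (grepping the log files for call_bail) could be scripted roughly as follows. This is only a sketch under stated assumptions, not a GlusterFS tool: it assumes a bailed-out frame is logged on a line that contains the function name `call_bail`, and the helper names `find_call_bail_lines` and `count_by_file` as well as the default glob `/var/log/glusterfs/*.log` are illustrative choices that do not come from the report.

```python
import glob
from collections import Counter

# Assumption: a bailed-out RPC frame is logged on a line that mentions the
# function name "call_bail"; adjust MARKER if your log format differs.
MARKER = "call_bail"


def find_call_bail_lines(paths, marker=MARKER):
    """Return (path, line_number, text) for every log line containing marker."""
    hits = []
    for path in paths:
        # errors="replace" keeps the scan going if a log contains stray bytes.
        with open(path, "r", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if marker in line:
                    hits.append((path, number, line.rstrip("\n")))
    return hits


def count_by_file(hits):
    """Count matching lines per log file."""
    return Counter(path for path, _, _ in hits)


if __name__ == "__main__":
    # Hypothetical default location; the tier log viewed in this report was
    # /var/log/glusterfs/nagvol-tier.log.
    log_paths = sorted(glob.glob("/var/log/glusterfs/*.log"))
    matches = find_call_bail_lines(log_paths)
    for path, number, text in matches:
        print(f"{path}:{number}: {text}")
    for path, total in count_by_file(matches).items():
        print(f"{path}: {total} call_bail line(s)")
```

If matching lines turn up, comparing their timestamps with the time the bricks were brought down would show whether the stall lasted roughly as long as the 30 minute call-bail timeout mentioned in the reply.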