Bug 1444086
| Summary: | Brick Multiplexing: Different brick processes pointing to the same socket, process file and volfile-id must not lead to IO loss when one of the volumes is down | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Nag Pavan Chilakam <nchilaka> |
| Component: | core | Assignee: | Mohit Agrawal <moagrawa> |
| Status: | CLOSED ERRATA | QA Contact: | Nag Pavan Chilakam <nchilaka> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rhgs-3.3 | CC: | amukherj, moagrawa, rhs-bugs, storage-qa-internal |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | RHGS 3.3.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | brick-multiplexing | | |
| Fixed In Version: | glusterfs-3.8.4-25 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-09-21 04:39:40 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1417151 | | |
Description
Nag Pavan Chilakam
2017-04-20 14:21:03 UTC
As discussed later, this is a bug, hence moving it to the right state.

Upstream patches: https://review.gluster.org/#/q/topic:bug-1444596

Downstream patches:
https://code.engineering.redhat.com/gerrit/#/c/105595/
https://code.engineering.redhat.com/gerrit/#/c/105596/

On 3.8.4-25 the problem still exists, hence this will have to move to failed_qa. Performed the same steps:

```
[root@dhcp35-45 ~]# ps -ef|grep glusterfsd
root     29765     1 11 13:07 ?        00:00:22 /usr/sbin/glusterfsd -s 10.70.35.45 --volfile-id bali-1.10.70.35.45.rhs-brick1-bali-1 -p /var/lib/glusterd/vols/bali-1/run/10.70.35.45-rhs-brick1-bali-1.pid -S /var/run/gluster/0725de02cc65e3bd10bbccdbf07631e6.socket --brick-name /rhs/brick1/bali-1 -l /var/log/glusterfs/bricks/rhs-brick1-bali-1.log --xlator-option *-posix.glusterd-uuid=e4f737cd-59a2-4392-aa3d-4230f698f128 --brick-port 49152 --xlator-option bali-1-server.listen-port=49152
root     30083     1  0 13:10 ?        00:00:00 /usr/sbin/glusterfsd -s 10.70.35.45 --volfile-id bali-1.10.70.35.45.rhs-brick1-bali-1 -p /var/lib/glusterd/vols/bali-1/run/10.70.35.45-rhs-brick1-bali-1.pid -S /var/run/gluster/0725de02cc65e3bd10bbccdbf07631e6.socket --brick-name /rhs/brick1/bali-1 -l /var/log/glusterfs/bricks/rhs-brick1-bali-1.log --xlator-option *-posix.glusterd-uuid=e4f737cd-59a2-4392-aa3d-4230f698f128 --brick-port 49153 --xlator-option bali-1-server.listen-port=49153
root     30103     1  0 13:10 ?        00:00:00 /usr/sbin/glusterfsd -s localhost --volfile-id snapd/bali-1 -p /var/lib/glusterd/vols/bali-1/run/bali-1-snapd.pid -l /var/log/glusterfs/snaps/bali-1/snapd.log --brick-name snapd-bali-1 -S /var/run/gluster/b0c28a9b87c703e8435212615395783b.socket --brick-port 49154 --xlator-option bali-1-server.listen-port=49154 --no-mem-accounting
root     30157 28566  0 13:10 pts/0    00:00:00 grep --color=auto glusterfsd

[root@dhcp35-45 ~]# gluster v status
Status of volume: bali-1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.45:/rhs/brick1/bali-1        49153     0          Y       30083
Brick 10.70.35.130:/rhs/brick1/bali-1       49153     0          Y       2132
Brick 10.70.35.122:/rhs/brick1/bali-1       49153     0          Y       1441
Snapshot Daemon on localhost                49154     0          Y       30103
Self-heal Daemon on localhost               N/A       N/A        Y       30112
Snapshot Daemon on 10.70.35.23              49152     0          Y       30666
Self-heal Daemon on 10.70.35.23             N/A       N/A        Y       30675
Snapshot Daemon on 10.70.35.122             49154     0          Y       1461
Self-heal Daemon on 10.70.35.122            N/A       N/A        Y       1470
Snapshot Daemon on 10.70.35.130             49154     0          Y       2173
Self-heal Daemon on 10.70.35.130            N/A       N/A        Y       2199

Task Status of Volume bali-1
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: bali-2
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.45:/rhs/brick2/bali-2        49152     0          Y       29765
Brick 10.70.35.130:/rhs/brick2/bali-2       49152     0          Y       1794
Brick 10.70.35.122:/rhs/brick2/bali-2       49152     0          Y       1225
Self-heal Daemon on localhost               N/A       N/A        Y       30112
Self-heal Daemon on 10.70.35.23             N/A       N/A        Y       30675
Self-heal Daemon on 10.70.35.122            N/A       N/A        Y       1470
Self-heal Daemon on 10.70.35.130            N/A       N/A        Y       2199

Task Status of Volume bali-2
------------------------------------------------------------------------------
There are no active volume tasks

[root@dhcp35-45 ~]#
```
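As an aside (not part of the original verification steps), the duplicate-socket state visible in the ps output above can be spotted directly from the brick command lines instead of reading them by eye. A minimal sketch, assuming only standard ps/grep/awk and the -S socket argument shown above:

```bash
# List the -S socket path of every running glusterfsd and print any path
# used by more than one brick process (the symptom reported in this bug).
ps -eo args | grep '[g]lusterfsd' | grep -o -- '-S [^ ]*' | awk '{print $2}' | sort | uniq -d
```

Any path printed by `uniq -d` is a socket file shared by two or more glusterfsd processes, which is exactly the condition seen for PIDs 29765 and 30083 above.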
On_qa validation: While the problem mentioned in the description is still present, i.e. two different brick processes (glusterfsd) pointing to the same socket and volfile, I am moving this to verified only because I don't see any IO impact (as suggested in comment#12 and comment#11), i.e. when I stop vol_1 (where USS was enabled later), IO to vol_2 still continues. Hence moving to verified. Verified version: 3.8.4-33. I am also changing the title and creating a new bug separately to track the same-socket-file issue (per my comment in comment#13).

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774
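For completeness, the IO-continuity check referred to in the verification note above can be sketched roughly as follows. This is only an illustrative outline, not the recorded test plan: vol_1 and vol_2 follow the naming used in the comment, /mnt/vol_2 is a hypothetical client mount point of the volume that stays up, and the dd parameters are arbitrary.

```bash
# Stop the volume where USS was enabled later (--mode=script skips the
# interactive confirmation prompt of the gluster CLI).
gluster --mode=script volume stop vol_1

# Confirm that writes to the other multiplexed volume keep flowing through
# its client mount while vol_1 is down.
dd if=/dev/zero of=/mnt/vol_2/io-continuity-check bs=1M count=100 && \
  echo "IO to vol_2 unaffected while vol_1 is down"
```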