Bug 838332 - Gluster: it looks like some exception is masked (volume cannot be created)
Status: CLOSED NOTABUG
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: rhsc
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assigned To: Shireesh
QA Contact: Sudhir D
Depends On:
Blocks:
 
Reported: 2012-07-08 09:23 EDT by Ilia Meerovich
Modified: 2016-07-04 20:06 EDT
CC List: 8 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-09-25 10:07:22 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Ilia Meerovich 2012-07-08 09:23:49 EDT
Description of problem:
It looks like some exception is being masked (the volume cannot be created).

Version-Release number of selected component (if applicable):

Here are the RPMs installed on my system:
[root@auto-gluster ~]# rpm -qa | grep rhs
rhsc-config-2.0.techpreview1-0.scratch.el6ev.noarch
rhsc-dbscripts-2.0.techpreview1-0.scratch.el6ev.noarch
rhsc-userportal-2.0.techpreview1-0.scratch.el6ev.noarch
rhsc-restapi-2.0.techpreview1-0.scratch.el6ev.noarch
rhsc-sdk-3.1.0.1-1alpha.el6.noarch
python-rhsm-0.99.8-1.el6.noarch
rhsc-jboss-deps-2.0.techpreview1-0.scratch.el6ev.noarch
rhsc-notification-service-2.0.techpreview1-0.scratch.el6ev.noarch
rhsc-2.0.techpreview1-0.scratch.el6ev.noarch
rhsc-genericapi-2.0.techpreview1-0.scratch.el6ev.noarch
rhsc-webadmin-portal-2.0.techpreview1-0.scratch.el6ev.noarch
rhsc-log-collector-3.1.0_0001-.el6.noarch
rhsc-setup-2.0.techpreview1-0.scratch.el6ev.noarch
rhsc-backend-2.0.techpreview1-0.scratch.el6ev.noarch
rhsc-tools-common-2.0.techpreview1-0.scratch.el6ev.noarch

How reproducible:


Steps to Reproduce:
Unfortunately I don't know the exact steps.
It looks like the gluster daemon on the host somehow still sees a volume that was erased from the RHSC database. Volume creation fails, and this is what I see in the JBoss AS log:

  
Actual results:
2012-07-08 14:59:57,768 WARN  [org.ovirt.engine.core.dal.job.ExecutionMessageDirector] (ajp--0.0.0.0-8009-6) [1435e0ef] The message key CreateGlusterVolume is missing from bundles/ExecutionMessages
2012-07-08 14:59:57,926 INFO  [org.ovirt.engine.core.bll.gluster.CreateGlusterVolumeCommand] (ajp--0.0.0.0-8009-6) [1435e0ef] Running command: CreateGlusterVolumeCommand internal: false. Entities affected :  ID: 5667473c-c8f4-11e1-aa19-b7c5e0c2ed13 Type: VdsGroups
2012-07-08 14:59:57,939 INFO  [org.ovirt.engine.core.vdsbroker.gluster.CreateGlusterVolumeVDSCommand] (ajp--0.0.0.0-8009-6) [1435e0ef] START, CreateGlusterVolumeVDSCommand(vdsId = 68594d5a-c8f4-11e1-9212-0f296cfed87f), log id: 531daf46
2012-07-08 14:59:58,036 INFO  [org.ovirt.engine.core.vdsbroker.gluster.CreateGlusterVolumeVDSCommand] (ajp--0.0.0.0-8009-6) [1435e0ef] FINISH, CreateGlusterVolumeVDSCommand, log id: 531daf46
2012-07-08 14:59:58,085 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (ajp--0.0.0.0-8009-6) Operation Failed: [volume create failed
error: volume boo already exists
return code: 255]


Expected results:


Additional info:
Comment 2 Shireesh 2012-07-09 00:08:37 EDT
The information provided in "Actual results" is a log extract, and I don't see any bug there. The logged error suggests that we are trying to create a volume which already exists in the gluster cluster.

Are you suggesting that the error message (volume boo already exists) is not being sent back in the response to the POST request? If so, please provide the actual and expected response body.
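
(For reference, one way to capture the actual response body is to issue the create request directly with curl and inspect the returned fault. This is only a sketch - the host, port, credentials, UUIDs, brick path, and even the exact resource path and XML element names are assumptions here and would need to be adapted to the installed RHSC version:)

# sketch only - adjust host, port, credentials, UUIDs and brick path to your setup
curl -k -u "admin@internal:PASSWORD" \
     -H "Content-Type: application/xml" \
     -X POST \
     -d '<gluster_volume>
           <name>boo</name>
           <volume_type>DISTRIBUTE</volume_type>
           <bricks>
             <brick>
               <server_id>HOST-UUID</server_id>
               <brick_dir>/bricks/boo_1</brick_dir>
             </brick>
           </bricks>
         </gluster_volume>' \
     https://rhsc.example.com:8443/api/clusters/CLUSTER-UUID/glustervolumes

Adding -i prints the HTTP status line and headers along with the fault body.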
Comment 3 Ilia Meerovich 2012-07-09 00:49:36 EDT
I will elaborate.
Volume boo was created and erased in the past, and its brick directories were also removed. But it looks (according to the host's logs) like it still exists on the host, while it no longer appears in the management DB. I will try to retrieve logs from the host.

It looks like "The message key CreateGlusterVolume is missing from bundles/ExecutionMessages" is masking something.
Comment 4 Shireesh 2012-07-09 01:24:22 EDT
If you performed all activities from the UI / REST API and did not use the gluster CLI directly for anything, this should not happen under normal circumstances. The entry is deleted from the database only if the VDSM verb (which executes the gluster CLI to delete the volume) succeeds.
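
(For context, a rough sketch of what that verb boils down to on the gluster side - illustrative only, with "boo" standing in for the volume name; a started volume has to be stopped before it can be deleted:)

# illustrative only; both commands prompt for confirmation
gluster volume stop boo
gluster volume delete boo

Only if this sequence succeeds does the engine remove the corresponding volume entry from its database.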

If the above is true, I see only two possibilities:

1) The gluster cluster already existed. Engine was newly installed and you tried to create a volume that was already present on the gluster cluster.

2) There could be a bug in GlusterFS wherein a "gluster volume delete" succeeds, but subsequent creation of a volume with the same name (possibly from a different node of the cluster) fails with the error "volume <volname> already exists".

I think both of the above possibilities are highly unlikely, but possible.

Please check along the above lines and provide more details on how this issue can be reproduced consistently.

The message "The message key CreateGlusterVolume is missing from bundles/ExecutionMessages" in the log is not masking anything. It indicates that there is no entry for the key "CreateGlusterVolume" in the resource bundle(s), and so it will not be localized if used in any log message.
Comment 5 Ilia Meerovich 2012-07-09 03:31:03 EDT
Hi, it looks like option 2.
I can see that the database is clean on the manager, but the host "thinks" it still has a volume:

This is from the host's log:

[2012-07-08 17:13:33.234081] I [glusterd-brick-ops.c:380:glusterd_handle_add_brick] 0-glusterd: Received add brick req
[2012-07-08 17:13:33.234181] I [glusterd-utils.c:285:glusterd_lock] 0-glusterd: Cluster lock held by a25ff1e8-f8c1-4888-9bde-95120c62e6c8
[2012-07-08 17:13:33.234193] I [glusterd-handler.c:458:glusterd_op_txn_begin] 0-management: Acquired local lock
[2012-07-08 17:13:33.234291] I [glusterd-utils.c:857:glusterd_volume_brickinfo_get_by_brick] 0-: brick: 10.35.97.162:/foo_1
[2012-07-08 17:13:33.234315] E [glusterd-utils.c:4301:glusterd_new_brick_validate] 0-management: Host 10.35.97.162 not connected
[2012-07-08 17:13:33.234327] E [glusterd-op-sm.c:1999:glusterd_op_ac_send_stage_op] 0-: Staging failed
[2012-07-08 17:13:33.234337] I [glusterd-op-sm.c:2039:glusterd_op_ac_send_stage_op] 0-glusterd: Sent op req to 0 peers
[2012-07-08 17:13:33.234352] I [glusterd-op-sm.c:2653:glusterd_op_txn_complete] 0-glusterd: Cleared local lock
[2012-07-09 07:21:27.249199] I [glusterd-volume-ops.c:83:glusterd_handle_create_volume] 0-glusterd: Received create volume req
[2012-07-09 07:21:27.249300] E [glusterd-volume-ops.c:116:glusterd_handle_create_volume] 0-glusterd: Volume foo already exists

I can also see that the host is holding all the info about the removed volumes:
[root@localhost ~]# ls -ls  /var/lib/glusterd/vols/
total 60
4 drwxr-xr-x 3 root root 4096 Jul  1 06:38 bo2
4 drwxr-xr-x 3 root root 4096 Jul  1 06:39 bo23
4 drwxr-xr-x 3 root root 4096 Jul  1 06:58 bo23e
4 drwxr-xr-x 3 root root 4096 Jul  1 07:01 bo23e3
4 drwxr-xr-x 3 root root 4096 Jul  1 07:02 bo23e34
4 drwxr-xr-x 3 root root 4096 Jul  1 07:46 bo23e345
4 drwxr-xr-x 3 root root 4096 Jul  1 07:47 bo23e345w
4 drwxr-xr-x 4 root root 4096 Jul  8 17:10 boo
4 drwxr-xr-x 3 root root 4096 Jul  1 06:37 booo
4 drwxr-xr-x 3 root root 4096 Jul  8 11:37 fffffoo
4 drwxr-xr-x 3 root root 4096 Jun 17 09:54 foo
4 drwxr-xr-x 4 root root 4096 Jun 26 14:35 myVol2
4 drwxr-xr-x 3 root root 4096 Jun 17 09:54 vol16
4 drwxr-xr-x 3 root root 4096 Jun 17 09:54 vol17
4 drwxr-xr-x 4 root root 4096 Jun 27 10:07 volumes
[root@localhost ~]#
Comment 6 Ilia Meerovich 2012-07-09 03:32:18 EDT
Maybe it would be a good idea to do a VNC session - it would be much easier to explain this issue.
Comment 7 Ilia Meerovich 2012-07-09 03:46:08 EDT
Hi, 

I double-checked my tests.
I didn't stop the volume before trying to remove it. The remove failed, but I removed the brick directories anyway (I do this because you have a bug where brick directories are not removed after removing a volume).
Now it looks like the system is out of sync...
Comment 8 Shireesh 2012-07-09 04:46:14 EDT
(In reply to comment #5)
> Hi, it looks like option 2.
> I can see that the database is clean on the manager, but the host "thinks" it
> still has a volume:

This suggests that the volume actually exists on the gluster cluster. You can check this by running the command "gluster volume info" on any of the nodes.
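
For example (using "boo", the volume from the error above, as an illustration):

# shows the volume details if glusterd still knows about it;
# run without a volume name to list all volumes the cluster knows about
gluster volume info boo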

> [remainder of comment 5 - host log extract and /var/lib/glusterd/vols/ listing - snipped; quoted in full in comment 5 above]
Comment 9 Shireesh 2012-07-09 04:56:54 EDT
(In reply to comment #7)
> Hi, 
> 
> I double-checked my tests.
> I didn't stop the volume before trying to remove it. The remove failed, but I
> removed the brick directories anyway

If the remove failed, how come the volume doesn't exist in the DB? Did you manually delete the entries from the DB as well?

> (I do this because you have a bug where brick
> directories are not removed after removing a volume).
> Now it looks like the system is out of sync...

One way to fix this would be to create those directories again, and then stop and delete the volume.
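
Roughly along these lines (a sketch only - substitute the actual volume name and the brick path(s) that were removed):

# recreate the brick directories that were deleted manually
mkdir -p /path/to/brick_dir
# then stop and delete the stale volume from any node of the cluster
# (both commands prompt for confirmation)
gluster volume stop <VOLNAME>
gluster volume delete <VOLNAME>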
