Bug 1540868 - Volume start commit failed, Commit failed for operation Start on local node [NEEDINFO]
Summary: Volume start commit failed, Commit failed for operation Start on local node
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: 3.10
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: ---
Assignee: Sanju
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-02-01 08:40 UTC by zhudongmei
Modified: 2018-06-20 18:29 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-20 18:29:01 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
srakonde: needinfo? (zhu.dongmei20)


Attachments (Terms of Use)
gluster node2 log (811.09 KB, application/x-gzip)
2018-02-02 00:56 UTC, zhudongmei
no flags Details
gluser node 1 log (664.77 KB, application/x-gzip)
2018-02-02 00:57 UTC, zhudongmei
no flags Details
gluser node 2 log-new (163.25 KB, application/zip)
2018-02-24 03:30 UTC, zhudongmei
no flags Details
gluser node 1 log-new (174.12 KB, application/zip)
2018-02-24 03:31 UTC, zhudongmei
no flags Details

Description zhudongmei 2018-02-01 08:40:25 UTC
Description of problem:
sometimes volume start failed.


Version-Release number of selected component (if applicable):
glusterfs 3.10.3

How reproducible:
sometimes

Steps to Reproduce:
gluster cluster has 2 nodes, and want to create a replica volume,but start failed
log as follows:

1.cmd history:
[2018-01-31 15:27:29.310233]  : volume delete test0019_component_volume : SUCCESS
[2018-01-31 15:27:31.018337]  : volume create test0021-component_volume replica 2 188.188.188.84:/root/data/platform/test0021-component/test0021-component_volume 188.188.188.83:/root/data/platform/test0021-component/test0021-component_volume force : SUCCESS
[2018-01-31 15:27:31.495940]  : volume set test0021-component_volume nfs.disable off : SUCCESS
[2018-01-31 15:27:31.574767]  : volume set test0021-component_volume nfs.rpc-auth-allow 188.188.0.0/16,172.17.0.0/16 : SUCCESS
[2018-01-31 15:27:31.651966]  : volume set test0021-component_volume diagnostics.brick-log-level WARNING : SUCCESS
[2018-01-31 15:27:31.728427]  : volume set test0021-component_volume diagnostics.brick-sys-log-level WARNING : SUCCESS
[2018-01-31 15:27:31.805802]  : volume set test0021-component_volume diagnostics.client-log-level ERROR : SUCCESS
[2018-01-31 15:27:31.885078]  : volume set test0021-component_volume diagnostics.client-sys-log-level ERROR : SUCCESS
[2018-01-31 15:27:31.963931]  : volume set test0021-component_volume cluster.server-quorum-type server : SUCCESS
[2018-01-31 15:27:31.987742]  : volume set all cluster.server-quorum-ratio 60% : SUCCESS
[2018-01-31 15:27:32.065940]  : volume set test0021-component_volume cluster.quorum-count 2 : SUCCESS
[2018-01-31 15:27:32.146743]  : volume set test0021-component_volume cluster.quorum-type fixed : SUCCESS
[2018-01-31 15:27:32.186824]  : volume start test0021-component_volume : FAILED : Commit failed on localhost. Please check log file for details.

2.glusterd.log
[2018-01-31 15:25:10.892912] E [MSGID: 101042] [compat.c:569:gf_umount_lazy] 0-management: Lazy unmount of /var/run/gluster/test0019_component_volume_quota_list/
The message "E [MSGID: 101042] [compat.c:569:gf_umount_lazy] 0-management: Lazy unmount of /var/run/gluster/test0019_component_volume_quota_list/" repeated 5 times between [2018-01-31 15:25:10.892912] and [2018-01-31 15:26:38.510580]
[2018-01-31 15:27:11.417708] E [MSGID: 101042] [compat.c:569:gf_umount_lazy] 0-management: Lazy unmount of /var/run/gluster/test0019_component_volume_quota_list/
[2018-01-31 15:27:20.847870] E [run.c:191:runner_log] (-->/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x36ba5) [0x7f406ea5cba5] -->/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0xdb79b) [0x7f406eb0179b] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f407403e385] ) 0-management: Failed to execute script: /var/lib/glusterd/hooks/1/stop/pre/S30samba-stop.sh --volname=test0019_component_volume --last=no
[2018-01-31 15:27:24.918501] W [socket.c:593:__socket_rwv] 0-management: readv on /var/run/gluster/f4add0eba7cd3f12c9f6faad73592c1a.socket failed (No data available)
[2018-01-31 15:27:24.918783] W [glusterd-handler.c:5687:__glusterd_brick_rpc_notify] 0-management: got disconnect from stale rpc on /root/data/platform/test0019_component/test0019_component_volume
[2018-01-31 15:27:24.919171] W [socket.c:593:__socket_rwv] 0-socket.management: writev on 188.188.188.84:49151 failed (Broken pipe)
[2018-01-31 15:27:32.186287] E [MSGID: 106005] [glusterd-utils.c:5525:glusterd_brick_start] 0-management: Unable to start brick 188.188.188.84:/root/data/platform/test0021-component/test0021-component_volume
[2018-01-31 15:27:32.186378] E [MSGID: 106123] [glusterd-mgmt.c:323:gd_mgmt_v3_commit_fn] 0-management: Volume start commit failed.
[2018-01-31 15:27:32.186395] E [MSGID: 106123] [glusterd-mgmt.c:1532:glusterd_mgmt_v3_commit] 0-management: Commit failed for operation Start on local node
[2018-01-31 15:27:32.186409] E [MSGID: 106123] [glusterd-mgmt.c:2145:glusterd_mgmt_v3_initiate_all_phases] 0-management: Commit Op Failed
[2018-01-31 15:27:32.260834] E [run.c:191:runner_log] (-->/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x36ba5) [0x7f406ea5cba5] -->/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0xdb79b) [0x7f406eb0179b] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f407403e385] ) 0-management: Failed to execute script: /var/lib/glusterd/hooks/1/stop/pre/S30samba-stop.sh --volname=test0021-component_volume --last=yes

3.cli.log
[2018-01-31 15:27:31.018453] I [cli-rpc-ops.c:1163:gf_cli_create_volume_cbk] 0-cli: Received resp to create volume
[2018-01-31 15:27:31.018602] I [input.c:31:cli_batch] 0-: Exiting with: 0
[2018-01-31 15:27:31.030488] I [cli.c:759:main] 0-cli: Started running gluster with version 3.10.3
[2018-01-31 15:27:31.037326] I [cli-cmd-volume.c:2320:cli_check_gsync_present] 0-: geo-replication not installed
[2018-01-31 15:27:31.037671] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-01-31 15:27:31.496040] I [cli-rpc-ops.c:2329:gf_cli_set_volume_cbk] 0-cli: Received resp to set
[2018-01-31 15:27:31.496214] I [input.c:31:cli_batch] 0-: Exiting with: 0
[2018-01-31 15:27:31.506834] I [cli.c:759:main] 0-cli: Started running gluster with version 3.10.3
[2018-01-31 15:27:31.513681] I [cli-cmd-volume.c:2320:cli_check_gsync_present] 0-: geo-replication not installed
[2018-01-31 15:27:31.514030] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-01-31 15:27:31.574845] I [cli-rpc-ops.c:2329:gf_cli_set_volume_cbk] 0-cli: Received resp to set
[2018-01-31 15:27:31.574971] I [input.c:31:cli_batch] 0-: Exiting with: 0
[2018-01-31 15:27:31.584757] I [cli.c:759:main] 0-cli: Started running gluster with version 3.10.3
[2018-01-31 15:27:31.591421] I [cli-cmd-volume.c:2320:cli_check_gsync_present] 0-: geo-replication not installed
[2018-01-31 15:27:31.591755] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-01-31 15:27:31.652050] I [cli-rpc-ops.c:2329:gf_cli_set_volume_cbk] 0-cli: Received resp to set
[2018-01-31 15:27:31.652182] I [input.c:31:cli_batch] 0-: Exiting with: 0
[2018-01-31 15:27:31.661223] I [cli.c:759:main] 0-cli: Started running gluster with version 3.10.3
[2018-01-31 15:27:31.667749] I [cli-cmd-volume.c:2320:cli_check_gsync_present] 0-: geo-replication not installed
[2018-01-31 15:27:31.668525] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-01-31 15:27:31.728504] I [cli-rpc-ops.c:2329:gf_cli_set_volume_cbk] 0-cli: Received resp to set
[2018-01-31 15:27:31.728634] I [input.c:31:cli_batch] 0-: Exiting with: 0
[2018-01-31 15:27:31.738082] I [cli.c:759:main] 0-cli: Started running gluster with version 3.10.3
[2018-01-31 15:27:31.744711] I [cli-cmd-volume.c:2320:cli_check_gsync_present] 0-: geo-replication not installed
[2018-01-31 15:27:31.745048] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-01-31 15:27:31.805869] I [cli-rpc-ops.c:2329:gf_cli_set_volume_cbk] 0-cli: Received resp to set
[2018-01-31 15:27:31.806021] I [input.c:31:cli_batch] 0-: Exiting with: 0
[2018-01-31 15:27:31.815240] I [cli.c:759:main] 0-cli: Started running gluster with version 3.10.3
[2018-01-31 15:27:31.821861] I [cli-cmd-volume.c:2320:cli_check_gsync_present] 0-: geo-replication not installed
[2018-01-31 15:27:31.822207] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-01-31 15:27:31.885159] I [cli-rpc-ops.c:2329:gf_cli_set_volume_cbk] 0-cli: Received resp to set
[2018-01-31 15:27:31.885299] I [input.c:31:cli_batch] 0-: Exiting with: 0
[2018-01-31 15:27:31.894963] I [cli.c:759:main] 0-cli: Started running gluster with version 3.10.3
[2018-01-31 15:27:31.901576] I [cli-cmd-volume.c:2320:cli_check_gsync_present] 0-: geo-replication not installed
[2018-01-31 15:27:31.901921] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-01-31 15:27:31.964013] I [cli-rpc-ops.c:2329:gf_cli_set_volume_cbk] 0-cli: Received resp to set
[2018-01-31 15:27:31.964153] I [input.c:31:cli_batch] 0-: Exiting with: 0
[2018-01-31 15:27:31.973426] I [cli.c:759:main] 0-cli: Started running gluster with version 3.10.3
[2018-01-31 15:27:31.979980] I [cli-cmd-volume.c:2320:cli_check_gsync_present] 0-: geo-replication not installed
[2018-01-31 15:27:31.980316] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-01-31 15:27:31.987820] I [cli-rpc-ops.c:2329:gf_cli_set_volume_cbk] 0-cli: Received resp to set
[2018-01-31 15:27:31.987948] I [input.c:31:cli_batch] 0-: Exiting with: 0
[2018-01-31 15:27:31.996924] I [cli.c:759:main] 0-cli: Started running gluster with version 3.10.3
[2018-01-31 15:27:32.003351] I [cli-cmd-volume.c:2320:cli_check_gsync_present] 0-: geo-replication not installed
[2018-01-31 15:27:32.003694] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-01-31 15:27:32.066022] I [cli-rpc-ops.c:2329:gf_cli_set_volume_cbk] 0-cli: Received resp to set
[2018-01-31 15:27:32.066174] I [input.c:31:cli_batch] 0-: Exiting with: 0
[2018-01-31 15:27:32.075567] I [cli.c:759:main] 0-cli: Started running gluster with version 3.10.3
[2018-01-31 15:27:32.082204] I [cli-cmd-volume.c:2320:cli_check_gsync_present] 0-: geo-replication not installed
[2018-01-31 15:27:32.082536] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-01-31 15:27:32.146825] I [cli-rpc-ops.c:2329:gf_cli_set_volume_cbk] 0-cli: Received resp to set
[2018-01-31 15:27:32.146972] I [input.c:31:cli_batch] 0-: Exiting with: 0
[2018-01-31 15:27:32.156844] I [cli.c:759:main] 0-cli: Started running gluster with version 3.10.3
[2018-01-31 15:27:32.163480] I [cli-cmd-volume.c:2320:cli_check_gsync_present] 0-: geo-replication not installed
[2018-01-31 15:27:32.163825] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-01-31 15:27:32.186925] I [cli-rpc-ops.c:1469:gf_cli_start_volume_cbk] 0-cli: Received resp to start volume
[2018-01-31 15:27:32.186973] I [input.c:31:cli_batch] 0-: Exiting with: -1

Actual results:
volume start fail

Expected results:
volume start ok

Additional info:

Comment 1 Atin Mukherjee 2018-02-01 10:19:18 UTC
Could you attach the glusterd and the respective brick logs from the node where this command failed?

Comment 2 zhudongmei 2018-02-02 00:56:45 UTC
Created attachment 1389836 [details]
gluster node2 log

gluster node2 log

Comment 3 zhudongmei 2018-02-02 00:57:26 UTC
Created attachment 1389837 [details]
gluser node 1 log

gluser node 1 log

Comment 4 Atin Mukherjee 2018-02-02 04:55:37 UTC
Sanju - Can you please take a look at the logs and check why the volume start commit failed?

Comment 5 Sanju 2018-02-05 05:42:40 UTC
I can't find the respective brick logs on the attachment you provided. Can you please provide the brick logs of the volume which is failed to start.

Comment 6 Sanju 2018-02-05 06:18:59 UTC
Can you please run the glusterd in debug mode and provide the logs

Comment 7 zhudongmei 2018-02-24 03:29:42 UTC
 Hi Sanju
 
  modify gluserd debug mode (/usr/lib/systemd/system/glusterd.service : Environment="LOG_LEVEL=DEBUG")

  then get the log: gluster1.zip gluster2.zip

   please have a look

Comment 8 zhudongmei 2018-02-24 03:30:48 UTC
Created attachment 1400087 [details]
gluser node 2 log-new

Comment 9 zhudongmei 2018-02-24 03:31:15 UTC
Created attachment 1400088 [details]
gluser node 1 log-new

Comment 10 zhudongmei 2018-02-28 00:53:58 UTC
add debug gluserd log,please have a look

Comment 11 Sanju 2018-03-05 07:48:09 UTC
(In reply to Sanju from comment #5)
> I can't find the respective brick logs on the attachment you provided. Can
> you please provide the brick logs of the volume which is failed to start.

I'm waiting for brick logs

Comment 12 Shyamsundar 2018-06-20 18:29:01 UTC
This bug reported is against a version of Gluster that is no longer maintained
(or has been EOL'd). See https://www.gluster.org/release-schedule/ for the
versions currently maintained.

As a result this bug is being closed.

If the bug persists on a maintained version of gluster or against the mainline
gluster repository, request that it be reopened and the Version field be marked
appropriately.


Note You need to log in before you can comment on or make changes to this bug.