Bug 1540868

Summary: Volume start commit failed, Commit failed for operation Start on local node
Product: [Community] GlusterFS Reporter: zhudongmei <zhu.dongmei20>
Component: glusterdAssignee: Sanju <srakonde>
Status: CLOSED EOL QA Contact:
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.10CC: amukherj, bugs, srakonde, zhu.dongmei20
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-06-20 18:29:01 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
gluster node2 log
none
gluser node 1 log
none
gluser node 2 log-new
none
gluser node 1 log-new none

Description zhudongmei 2018-02-01 08:40:25 UTC
Description of problem:
sometimes volume start failed.


Version-Release number of selected component (if applicable):
glusterfs 3.10.3

How reproducible:
sometimes

Steps to Reproduce:
gluster cluster has 2 nodes, and want to create a replica volume,but start failed
log as follows:

1.cmd history:
[2018-01-31 15:27:29.310233]  : volume delete test0019_component_volume : SUCCESS
[2018-01-31 15:27:31.018337]  : volume create test0021-component_volume replica 2 188.188.188.84:/root/data/platform/test0021-component/test0021-component_volume 188.188.188.83:/root/data/platform/test0021-component/test0021-component_volume force : SUCCESS
[2018-01-31 15:27:31.495940]  : volume set test0021-component_volume nfs.disable off : SUCCESS
[2018-01-31 15:27:31.574767]  : volume set test0021-component_volume nfs.rpc-auth-allow 188.188.0.0/16,172.17.0.0/16 : SUCCESS
[2018-01-31 15:27:31.651966]  : volume set test0021-component_volume diagnostics.brick-log-level WARNING : SUCCESS
[2018-01-31 15:27:31.728427]  : volume set test0021-component_volume diagnostics.brick-sys-log-level WARNING : SUCCESS
[2018-01-31 15:27:31.805802]  : volume set test0021-component_volume diagnostics.client-log-level ERROR : SUCCESS
[2018-01-31 15:27:31.885078]  : volume set test0021-component_volume diagnostics.client-sys-log-level ERROR : SUCCESS
[2018-01-31 15:27:31.963931]  : volume set test0021-component_volume cluster.server-quorum-type server : SUCCESS
[2018-01-31 15:27:31.987742]  : volume set all cluster.server-quorum-ratio 60% : SUCCESS
[2018-01-31 15:27:32.065940]  : volume set test0021-component_volume cluster.quorum-count 2 : SUCCESS
[2018-01-31 15:27:32.146743]  : volume set test0021-component_volume cluster.quorum-type fixed : SUCCESS
[2018-01-31 15:27:32.186824]  : volume start test0021-component_volume : FAILED : Commit failed on localhost. Please check log file for details.

2.glusterd.log
[2018-01-31 15:25:10.892912] E [MSGID: 101042] [compat.c:569:gf_umount_lazy] 0-management: Lazy unmount of /var/run/gluster/test0019_component_volume_quota_list/
The message "E [MSGID: 101042] [compat.c:569:gf_umount_lazy] 0-management: Lazy unmount of /var/run/gluster/test0019_component_volume_quota_list/" repeated 5 times between [2018-01-31 15:25:10.892912] and [2018-01-31 15:26:38.510580]
[2018-01-31 15:27:11.417708] E [MSGID: 101042] [compat.c:569:gf_umount_lazy] 0-management: Lazy unmount of /var/run/gluster/test0019_component_volume_quota_list/
[2018-01-31 15:27:20.847870] E [run.c:191:runner_log] (-->/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x36ba5) [0x7f406ea5cba5] -->/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0xdb79b) [0x7f406eb0179b] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f407403e385] ) 0-management: Failed to execute script: /var/lib/glusterd/hooks/1/stop/pre/S30samba-stop.sh --volname=test0019_component_volume --last=no
[2018-01-31 15:27:24.918501] W [socket.c:593:__socket_rwv] 0-management: readv on /var/run/gluster/f4add0eba7cd3f12c9f6faad73592c1a.socket failed (No data available)
[2018-01-31 15:27:24.918783] W [glusterd-handler.c:5687:__glusterd_brick_rpc_notify] 0-management: got disconnect from stale rpc on /root/data/platform/test0019_component/test0019_component_volume
[2018-01-31 15:27:24.919171] W [socket.c:593:__socket_rwv] 0-socket.management: writev on 188.188.188.84:49151 failed (Broken pipe)
[2018-01-31 15:27:32.186287] E [MSGID: 106005] [glusterd-utils.c:5525:glusterd_brick_start] 0-management: Unable to start brick 188.188.188.84:/root/data/platform/test0021-component/test0021-component_volume
[2018-01-31 15:27:32.186378] E [MSGID: 106123] [glusterd-mgmt.c:323:gd_mgmt_v3_commit_fn] 0-management: Volume start commit failed.
[2018-01-31 15:27:32.186395] E [MSGID: 106123] [glusterd-mgmt.c:1532:glusterd_mgmt_v3_commit] 0-management: Commit failed for operation Start on local node
[2018-01-31 15:27:32.186409] E [MSGID: 106123] [glusterd-mgmt.c:2145:glusterd_mgmt_v3_initiate_all_phases] 0-management: Commit Op Failed
[2018-01-31 15:27:32.260834] E [run.c:191:runner_log] (-->/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x36ba5) [0x7f406ea5cba5] -->/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0xdb79b) [0x7f406eb0179b] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f407403e385] ) 0-management: Failed to execute script: /var/lib/glusterd/hooks/1/stop/pre/S30samba-stop.sh --volname=test0021-component_volume --last=yes

3.cli.log
[2018-01-31 15:27:31.018453] I [cli-rpc-ops.c:1163:gf_cli_create_volume_cbk] 0-cli: Received resp to create volume
[2018-01-31 15:27:31.018602] I [input.c:31:cli_batch] 0-: Exiting with: 0
[2018-01-31 15:27:31.030488] I [cli.c:759:main] 0-cli: Started running gluster with version 3.10.3
[2018-01-31 15:27:31.037326] I [cli-cmd-volume.c:2320:cli_check_gsync_present] 0-: geo-replication not installed
[2018-01-31 15:27:31.037671] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-01-31 15:27:31.496040] I [cli-rpc-ops.c:2329:gf_cli_set_volume_cbk] 0-cli: Received resp to set
[2018-01-31 15:27:31.496214] I [input.c:31:cli_batch] 0-: Exiting with: 0
[2018-01-31 15:27:31.506834] I [cli.c:759:main] 0-cli: Started running gluster with version 3.10.3
[2018-01-31 15:27:31.513681] I [cli-cmd-volume.c:2320:cli_check_gsync_present] 0-: geo-replication not installed
[2018-01-31 15:27:31.514030] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-01-31 15:27:31.574845] I [cli-rpc-ops.c:2329:gf_cli_set_volume_cbk] 0-cli: Received resp to set
[2018-01-31 15:27:31.574971] I [input.c:31:cli_batch] 0-: Exiting with: 0
[2018-01-31 15:27:31.584757] I [cli.c:759:main] 0-cli: Started running gluster with version 3.10.3
[2018-01-31 15:27:31.591421] I [cli-cmd-volume.c:2320:cli_check_gsync_present] 0-: geo-replication not installed
[2018-01-31 15:27:31.591755] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-01-31 15:27:31.652050] I [cli-rpc-ops.c:2329:gf_cli_set_volume_cbk] 0-cli: Received resp to set
[2018-01-31 15:27:31.652182] I [input.c:31:cli_batch] 0-: Exiting with: 0
[2018-01-31 15:27:31.661223] I [cli.c:759:main] 0-cli: Started running gluster with version 3.10.3
[2018-01-31 15:27:31.667749] I [cli-cmd-volume.c:2320:cli_check_gsync_present] 0-: geo-replication not installed
[2018-01-31 15:27:31.668525] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-01-31 15:27:31.728504] I [cli-rpc-ops.c:2329:gf_cli_set_volume_cbk] 0-cli: Received resp to set
[2018-01-31 15:27:31.728634] I [input.c:31:cli_batch] 0-: Exiting with: 0
[2018-01-31 15:27:31.738082] I [cli.c:759:main] 0-cli: Started running gluster with version 3.10.3
[2018-01-31 15:27:31.744711] I [cli-cmd-volume.c:2320:cli_check_gsync_present] 0-: geo-replication not installed
[2018-01-31 15:27:31.745048] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-01-31 15:27:31.805869] I [cli-rpc-ops.c:2329:gf_cli_set_volume_cbk] 0-cli: Received resp to set
[2018-01-31 15:27:31.806021] I [input.c:31:cli_batch] 0-: Exiting with: 0
[2018-01-31 15:27:31.815240] I [cli.c:759:main] 0-cli: Started running gluster with version 3.10.3
[2018-01-31 15:27:31.821861] I [cli-cmd-volume.c:2320:cli_check_gsync_present] 0-: geo-replication not installed
[2018-01-31 15:27:31.822207] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-01-31 15:27:31.885159] I [cli-rpc-ops.c:2329:gf_cli_set_volume_cbk] 0-cli: Received resp to set
[2018-01-31 15:27:31.885299] I [input.c:31:cli_batch] 0-: Exiting with: 0
[2018-01-31 15:27:31.894963] I [cli.c:759:main] 0-cli: Started running gluster with version 3.10.3
[2018-01-31 15:27:31.901576] I [cli-cmd-volume.c:2320:cli_check_gsync_present] 0-: geo-replication not installed
[2018-01-31 15:27:31.901921] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-01-31 15:27:31.964013] I [cli-rpc-ops.c:2329:gf_cli_set_volume_cbk] 0-cli: Received resp to set
[2018-01-31 15:27:31.964153] I [input.c:31:cli_batch] 0-: Exiting with: 0
[2018-01-31 15:27:31.973426] I [cli.c:759:main] 0-cli: Started running gluster with version 3.10.3
[2018-01-31 15:27:31.979980] I [cli-cmd-volume.c:2320:cli_check_gsync_present] 0-: geo-replication not installed
[2018-01-31 15:27:31.980316] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-01-31 15:27:31.987820] I [cli-rpc-ops.c:2329:gf_cli_set_volume_cbk] 0-cli: Received resp to set
[2018-01-31 15:27:31.987948] I [input.c:31:cli_batch] 0-: Exiting with: 0
[2018-01-31 15:27:31.996924] I [cli.c:759:main] 0-cli: Started running gluster with version 3.10.3
[2018-01-31 15:27:32.003351] I [cli-cmd-volume.c:2320:cli_check_gsync_present] 0-: geo-replication not installed
[2018-01-31 15:27:32.003694] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-01-31 15:27:32.066022] I [cli-rpc-ops.c:2329:gf_cli_set_volume_cbk] 0-cli: Received resp to set
[2018-01-31 15:27:32.066174] I [input.c:31:cli_batch] 0-: Exiting with: 0
[2018-01-31 15:27:32.075567] I [cli.c:759:main] 0-cli: Started running gluster with version 3.10.3
[2018-01-31 15:27:32.082204] I [cli-cmd-volume.c:2320:cli_check_gsync_present] 0-: geo-replication not installed
[2018-01-31 15:27:32.082536] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-01-31 15:27:32.146825] I [cli-rpc-ops.c:2329:gf_cli_set_volume_cbk] 0-cli: Received resp to set
[2018-01-31 15:27:32.146972] I [input.c:31:cli_batch] 0-: Exiting with: 0
[2018-01-31 15:27:32.156844] I [cli.c:759:main] 0-cli: Started running gluster with version 3.10.3
[2018-01-31 15:27:32.163480] I [cli-cmd-volume.c:2320:cli_check_gsync_present] 0-: geo-replication not installed
[2018-01-31 15:27:32.163825] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-01-31 15:27:32.186925] I [cli-rpc-ops.c:1469:gf_cli_start_volume_cbk] 0-cli: Received resp to start volume
[2018-01-31 15:27:32.186973] I [input.c:31:cli_batch] 0-: Exiting with: -1

Actual results:
volume start fail

Expected results:
volume start ok

Additional info:

Comment 1 Atin Mukherjee 2018-02-01 10:19:18 UTC
Could you attach the glusterd and the respective brick logs from the node where this command failed?

Comment 2 zhudongmei 2018-02-02 00:56:45 UTC
Created attachment 1389836 [details]
gluster node2 log

gluster node2 log

Comment 3 zhudongmei 2018-02-02 00:57:26 UTC
Created attachment 1389837 [details]
gluser node 1 log

gluser node 1 log

Comment 4 Atin Mukherjee 2018-02-02 04:55:37 UTC
Sanju - Can you please take a look at the logs and check why the volume start commit failed?

Comment 5 Sanju 2018-02-05 05:42:40 UTC
I can't find the respective brick logs on the attachment you provided. Can you please provide the brick logs of the volume which is failed to start.

Comment 6 Sanju 2018-02-05 06:18:59 UTC
Can you please run the glusterd in debug mode and provide the logs

Comment 7 zhudongmei 2018-02-24 03:29:42 UTC
 Hi Sanju
 
  modify gluserd debug mode (/usr/lib/systemd/system/glusterd.service : Environment="LOG_LEVEL=DEBUG")

  then get the log: gluster1.zip gluster2.zip

   please have a look

Comment 8 zhudongmei 2018-02-24 03:30:48 UTC
Created attachment 1400087 [details]
gluser node 2 log-new

Comment 9 zhudongmei 2018-02-24 03:31:15 UTC
Created attachment 1400088 [details]
gluser node 1 log-new

Comment 10 zhudongmei 2018-02-28 00:53:58 UTC
add debug gluserd log,please have a look

Comment 11 Sanju 2018-03-05 07:48:09 UTC
(In reply to Sanju from comment #5)
> I can't find the respective brick logs on the attachment you provided. Can
> you please provide the brick logs of the volume which is failed to start.

I'm waiting for brick logs

Comment 12 Shyamsundar 2018-06-20 18:29:01 UTC
This bug reported is against a version of Gluster that is no longer maintained
(or has been EOL'd). See https://www.gluster.org/release-schedule/ for the
versions currently maintained.

As a result this bug is being closed.

If the bug persists on a maintained version of gluster or against the mainline
gluster repository, request that it be reopened and the Version field be marked
appropriately.

Comment 13 Red Hat Bugzilla 2023-09-14 04:16:13 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days