Bug 990397
Summary: | VMs are moving to paused state because file descriptors going bad | |
---|---|---|---
Product: | [Community] GlusterFS | Reporter: | Pranith Kumar K <pkarampu>
Component: | replicate | Assignee: | Pranith Kumar K <pkarampu>
Status: | CLOSED DUPLICATE | QA Contact: |
Severity: | unspecified | Docs Contact: |
Priority: | unspecified | |
Version: | mainline | CC: | cdhouch, gluster-bugs, ndevos, ppquant, samppah, tis
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2014-07-11 18:18:47 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | 998352 | |
Bug Blocks: | | |
Attachments: | | |

Description
Pranith Kumar K
2013-07-31 06:29:21 UTC
Created attachment 780939
Logs which lead to paused vms.

We are seeing this symptom as well with the latest oVirt 3.3 builds and glusterfs-3.4 on CentOS 6.4. Any attempt at a volume rebalance leads to paused VMs that need to be force powered off and then restarted to regain functionality. They do not recover from paused status. Will provide logs if requested.

This is using: distributed replicated gluster volume

glusterfs-cli-3.4.0-8.el6.x86_64
glusterfs-3.4.0-8.el6.x86_64
glusterfs-fuse-3.4.0-8.el6.x86_64
glusterfs-server-3.4.0-8.el6.x86_64
glusterfs-libs-3.4.0-8.el6.x86_64
glusterfs-geo-replication-3.4.0-8.el6.x86_64
ovirt-engine-websocket-proxy-3.3.0-0.3.beta1.el6.noarch
ovirt-engine-setup-3.3.0-0.2.master.20130814135834.gitb3a5fe3.el6.noarch
ovirt-engine-sdk-python-3.3.0.5-1.20130814.git988a6d3.el6.noarch
ovirt-engine-3.3.0-0.2.master.20130814135834.gitb3a5fe3.el6.noarch
ovirt-engine-backend-3.3.0-0.2.master.20130814135834.gitb3a5fe3.el6.noarch
ovirt-image-uploader-3.3.0-0.2.master.20130715.git7674462.el6.noarch
ovirt-host-deploy-1.1.0-0.2.master.20130813.gitd813ae3.el6.noarch
ovirt-engine-userportal-3.3.0-0.2.master.20130814135834.gitb3a5fe3.el6.noarch
ovirt-engine-restapi-3.3.0-0.2.master.20130814135834.gitb3a5fe3.el6.noarch
ovirt-engine-cli-3.3.0.4-1.20130718.gite0f993f.el6.noarch
ovirt-engine-webadmin-portal-3.3.0-0.2.master.20130814135834.gitb3a5fe3.el6.noarch
ovirt-engine-dbscripts-3.3.0-0.2.master.20130814135834.gitb3a5fe3.el6.noarch
ovirt-host-deploy-java-1.1.0-0.2.master.20130813.gitd813ae3.el6.noarch
ovirt-engine-tools-3.3.0-0.2.master.20130814135834.gitb3a5fe3.el6.noarch
ovirt-engine-lib-3.3.0-0.3.beta1.el6.noarch
ovirt-log-collector-3.3.0-0.2.master.20130723.git77829a0.el6.noarch
ovirt-iso-uploader-3.3.0-0.2.master.20130813.git0067d55.el6.noarch

Gluster volume info:

Volume Name: vmstorage
Type: Distributed-Replicate
Volume ID: cb521898-912c-491f-adc7-3373b8b7d9a5
Status: Started
Number of Bricks: 15 x 2 = 30
Transport-type: tcp
Bricks:
Brick1: 192.168.12.108:/bricks/brick1
Brick2: 192.168.12.109:/bricks/brick1
Brick3: 192.168.12.108:/bricks/brick2
Brick4: 192.168.12.109:/bricks/brick2
Brick5: 192.168.12.110:/bricks/brick2
Brick6: 192.168.12.112:/bricks/brick2
Brick7: 192.168.12.108:/bricks/brick3
Brick8: 192.168.12.109:/bricks/brick3
Brick9: 192.168.12.110:/bricks/brick3
Brick10: 192.168.12.112:/bricks/brick3
Brick11: 192.168.12.108:/bricks/brick4
Brick12: 192.168.12.109:/bricks/brick4
Brick13: 192.168.12.110:/bricks/brick4
Brick14: 192.168.12.112:/bricks/brick4
Brick15: 192.168.12.108:/bricks/brick5
Brick16: 192.168.12.109:/bricks/brick5
Brick17: 192.168.12.110:/bricks/brick5
Brick18: 192.168.12.112:/bricks/brick5
Brick19: 192.168.12.108:/bricks/brick6
Brick20: 192.168.12.109:/bricks/brick6
Brick21: 192.168.12.110:/bricks/brick6
Brick22: 192.168.12.112:/bricks/brick6
Brick23: 192.168.12.108:/bricks/brick7
Brick24: 192.168.12.109:/bricks/brick7
Brick25: 192.168.12.110:/bricks/brick7
Brick26: 192.168.12.112:/bricks/brick7
Brick27: 192.168.12.108:/bricks/brick8
Brick28: 192.168.12.109:/bricks/brick8
Brick29: 192.168.12.110:/bricks/brick8
Brick30: 192.168.12.112:/bricks/brick8
Options Reconfigured:
performance.open-behind: off
network.remote-dio: on
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
server.allow-insecure: on
storage.owner-gid: 36
storage.owner-uid: 36

From Ovirt engine.log

2013-08-15 11:37:51,345 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-61) VM spwalkp1m2 52137001-54d0-47ba-b096-a84f1c9457e6 moved from Up --> Paused
2013-08-15 11:37:51,354 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-61) Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM spwalkp1m2 has paused due to unknown storage error.
2013-08-15 11:40:04,307 INFO [org.ovirt.engine.core.bll.RunVmCommand] (ajp--127.0.0.1-8702-1) [214acbfc] Lock Acquired to object EngineLock [exclusiveLocks= key: 52137001-54d0-47ba-b096-a84f1c9457e6 value: VM , sharedLocks= ]

From gluster node nfsp14m2 bricks-brick8.log

bricks/bricks-brick8.log:[2013-08-15 16:38:05.819315] E [posix.c:2135:posix_writev] 0-vmstorage-posix: write failed: offset 749867008, Bad file descriptor
bricks/bricks-brick8.log:[2013-08-15 16:38:05.819431] I [server-rpc-fops.c:1439:server_writev_cbk] 0-vmstorage-server: 1019: WRITEV 0 (4ed70355-1ef5-4283-bfc9-d75399f08b0f) ==> (Bad file descriptor)
bricks/bricks-brick8.log:[2013-08-15 16:38:05.819530] E [posix.c:2135:posix_writev] 0-vmstorage-posix: write failed: offset 352321536, Bad file descriptor
bricks/bricks-brick8.log:[2013-08-15 16:38:05.819565] I [server-rpc-fops.c:1439:server_writev_cbk] 0-vmstorage-server: 1020: WRITEV 0 (4ed70355-1ef5-4283-bfc9-d75399f08b0f) ==> (Bad file descriptor)
bricks/bricks-brick8.log:[2013-08-15 16:38:05.819629] E [posix.c:2135:posix_writev] 0-vmstorage-posix: write failed: offset 352329728, Bad file descriptor
bricks/bricks-brick8.log:[2013-08-15 16:38:05.819659] I [server-rpc-fops.c:1439:server_writev_cbk] 0-vmstorage-server: 1021: WRITEV 0 (4ed70355-1ef5-4283-bfc9-d75399f08b0f) ==> (Bad file descriptor)

From gluster node nfsp15m2 bricks-brick8.log

bricks/bricks-brick8.log:[2013-08-15 16:38:05.181461] E [posix.c:2135:posix_writev] 0-vmstorage-posix: write failed: offset 749867008, Bad file descriptor
bricks/bricks-brick8.log:[2013-08-15 16:38:05.181548] I [server-rpc-fops.c:1439:server_writev_cbk] 0-vmstorage-server: 984: WRITEV 0 (4ed70355-1ef5-4283-bfc9-d75399f08b0f) ==> (Bad file descriptor)
bricks/bricks-brick8.log:[2013-08-15 16:38:05.181657] E [posix.c:2135:posix_writev] 0-vmstorage-posix: write failed: offset 352321536, Bad file descriptor
bricks/bricks-brick8.log:[2013-08-15 16:38:05.181691] I [server-rpc-fops.c:1439:server_writev_cbk] 0-vmstorage-server: 985: WRITEV 0 (4ed70355-1ef5-4283-bfc9-d75399f08b0f) ==> (Bad file descriptor)
bricks/bricks-brick8.log:[2013-08-15 16:38:05.181756] E [posix.c:2135:posix_writev] 0-vmstorage-posix: write failed: offset 352329728, Bad file descriptor
bricks/bricks-brick8.log:[2013-08-15 16:38:05.181785] I [server-rpc-fops.c:1439:server_writev_cbk] 0-vmstorage-server: 986: WRITEV 0 (4ed70355-1ef5-4283-bfc9-d75399f08b0f) ==> (Bad file descriptor)

COMMIT: http://review.gluster.org/5601 committed in master by Anand Avati (avati)
------
commit 41fa8da33435b8ba05a7eddbccddd96cde1aa762
Author: Raghavendra Bhat <raghavendra>
Date: Tue Aug 13 19:47:01 2013 +0530

mount/fuse: save the basefd flags in the new fd

Upon graph switch, the basefd's flags were not saved in the new fd created for the new graph upon which all the further requests for the open file would come. Thus posix was treating the fd as a read-only fd and was denying the write on the fds.

Change-Id: I781b62b376a85d1a938c091559270c3f242f1a2a
BUG: 998352
Signed-off-by: Raghavendra Bhat <raghavendra>
Reviewed-on: http://review.gluster.org/5601
Reviewed-by: Amar Tumballi <amarts>
Tested-by: Gluster Build System <jenkins.com>
Reviewed-by: Anand Avati <avati>

Hi,
The patch above, which fixes bug 998352 (a bug with similar symptoms), should address this issue as well. Moving this bug to the MODIFIED state for now. Please feel free to re-open if the issue still persists.
Pranith.

*** This bug has been marked as a duplicate of bug 998352 ***
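
The failure mode described in the commit message above can be reproduced outside GlusterFS with plain POSIX file descriptors. The following is only a minimal sketch, not GlusterFS code: the helpers `reopen`, `try_write`, and `demo` are hypothetical names made up for illustration, and the "graph switch" is modelled simply as closing one descriptor and opening a replacement. It shows that a replacement descriptor opened without the original flags falls back to read-only and rejects writes with "Bad file descriptor" (EBADF), while one opened with the saved flags accepts them.

```python
import errno
import os
import tempfile


def reopen(path, saved_flags=None):
    """Open a replacement descriptor for `path` (hypothetical helper).

    If `saved_flags` is None, the original open flags are treated as lost
    and the new descriptor defaults to read-only, mimicking the bug.
    """
    flags = os.O_RDONLY if saved_flags is None else saved_flags
    return os.open(path, flags)


def try_write(fd, data=b"x"):
    """Return None if the write succeeds, else the symbolic errno name."""
    try:
        os.write(fd, data)
        return None
    except OSError as exc:
        return errno.errorcode[exc.errno]


def demo():
    # Create an empty scratch file to stand in for a VM image.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        path = tmp.name
    try:
        base_flags = os.O_RDWR               # flags of the original ("base") fd
        base_fd = os.open(path, base_flags)
        os.close(base_fd)                    # stand-in for the switch to a new fd

        lost = reopen(path)                  # flags not carried over
        kept = reopen(path, base_flags)      # flags carried over
        try:
            return try_write(lost), try_write(kept)
        finally:
            os.close(lost)
            os.close(kept)
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```

The first element of the returned pair corresponds to the situation in the brick logs (a write on a descriptor that is not open for writing), the second to the behaviour after the flags are preserved.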