Bug 1108176
| Summary: | Sanlock issues after upgrading to 3.4 |
|---|---|
| Product: | [Retired] oVirt |
| Component: | vdsm |
| Version: | 3.4 |
| Status: | CLOSED WORKSFORME |
| Severity: | high |
| Priority: | unspecified |
| Reporter: | Jairo Rizzo <jairo.rizzo> |
| Assignee: | Federico Simoncelli <fsimonce> |
| QA Contact: | Gil Klein <gklein> |
| CC: | acathrow, amureini, bazulay, bugs, gklein, iheim, mgoldboi, yeylon |
| Target Milestone: | --- |
| Target Release: | 3.4.3 |
| Hardware: | x86_64 |
| OS: | Linux |
| Whiteboard: | storage |
| oVirt Team: | Storage |
| Type: | Bug |
| Doc Type: | Bug Fix |
| Story Points: | --- |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| Category: | --- |
| Cloudforms Team: | --- |
| Last Closed: | 2014-06-18 11:39:11 UTC |
Description
Jairo Rizzo
2014-06-11 14:02:48 UTC
A more recent issue happened about 2 hours ago on HV2:

```
Jun 8 04:59:51 hv2 sanlock[2163]: 2014-06-08 04:59:51-0400 40884 [2163]: s1 check_our_lease warning 71 last_success 40813
Jun 8 04:59:52 hv2 sanlock[2163]: 2014-06-08 04:59:52-0400 40885 [2163]: s1 check_our_lease warning 72 last_success 40813
Jun 8 04:59:53 hv2 sanlock[2163]: 2014-06-08 04:59:53-0400 40886 [2163]: s1 check_our_lease warning 73 last_success 40813
Jun 8 04:59:58 hv2 wdmd[2152]: test warning now 40891 ping 40881 close 0 renewal 40813 expire 40893 client 2163 sanlock_84975a07-3abb-4c9a-bf37-f64920298b0c:2
Jun 8 04:59:58 hv2 wdmd[2152]: /dev/watchdog closed unclean
Jun 8 04:59:58 hv2 kernel: SoftDog: Unexpected close, not stopping watchdog!
Jun 8 04:59:59 hv2 sanlock[2163]: 2014-06-08 04:59:59-0400 40892 [2163]: s1 check_our_lease warning 79 last_success 40813
Jun 8 04:59:59 hv2 wdmd[2152]: test warning now 40892 ping 40881 close 40891 renewal 40813 expire 40893 client 2163 sanlock_84975a07-3abb-4c9a-bf37-f64920298b0c:2
Jun 8 05:00:00 hv2 sanlock[2163]: 2014-06-08 05:00:00-0400 40893 [2163]: s1 check_our_lease failed 80
Jun 8 05:00:00 hv2 wdmd[2152]: test failed rem 58 now 40893 ping 40881 close 40891 renewal 40813 expire 40893 client 2163 sanlock_84975a07-3abb-4c9a-bf37-f64920298b0c:2
Jun 8 05:00:00 hv2 sanlock[2163]: 2014-06-08 05:00:00-0400 40893 [2163]: s1 all pids clear
Jun 8 05:00:01 hv2 wdmd[2152]: /dev/watchdog reopen
Jun 8 05:01:20 hv2 sanlock[2163]: 2014-06-08 05:01:20-0400 40973 [3038]: s1 delta_renew write error -202
Jun 8 05:01:20 hv2 sanlock[2163]: 2014-06-08 05:01:20-0400 40973 [3038]: s1 renewal error -202 delta_length 140 last_success 40813
Jun 8 05:01:30 hv2 sanlock[2163]: 2014-06-08 05:01:30-0400 40983 [3038]: 84975a07 close_task_aio 0 0x7ff5a80008c0 busy
Jun 8 05:02:20 hv2 sanlock[2163]: 2014-06-08 05:02:20-0400 41033 [3038]: 84975a07 close_task_aio 0 0x7ff5a80008c0 busy
Jun 8 05:02:27 hv2 kernel: INFO: task python:3013 blocked for more than 120 seconds.
Jun 8 05:02:27 hv2 kernel: Not tainted 2.6.32-431.3.1.el6.x86_64 #1
Jun 8 05:02:27 hv2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 8 05:02:27 hv2 kernel: python D 0000000000000002 0 3013 2852 0x00000084
Jun 8 05:02:27 hv2 kernel: ffff8807f62a7a78 0000000000000082 ffff8807f62a7a18 ffffffff81054839
Jun 8 05:02:27 hv2 kernel: ffff8807f62a7a08 0000000300000000 ffff8807f62a7a18 ffff88080f65dc40
Jun 8 05:02:27 hv2 kernel: ffff880812a805f8 ffff8807f62a7fd8 000000000000fbc8 ffff880812a805f8
Jun 8 05:02:27 hv2 kernel: Call Trace:
Jun 8 05:02:27 hv2 kernel: [<ffffffff81054839>] ? __wake_up_common+0x59/0x90
```

I reset the GlusterFS volume options (for the two replicated volumes I have) back to the recommended values below:

| Option Key | Option Value |
|---|---|
| network.remote-dio | enable |
| performance.io-cache | off |
| cluster.quorum-type | auto |
| performance.stat-prefetch | off |
| cluster.server-quorum-type | server |
| performance.quick-read | off |
| cluster.eager-lock | enable |
| performance.read-ahead | off |
| storage.owner-uid | 36 |
| storage.owner-gid | 36 |

The above settings work perfectly and my cluster has been stable for the past 48 hours. I believe the issue arises when some of the performance options above are enabled, but I don't have a test lab to verify this.

Thanks,
--Rizzo
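The recommended options quoted above can be applied with the gluster CLI. A minimal sketch, assuming a replicated volume named `data` (the volume name is illustrative; repeat for each volume backing a storage domain):

```sh
# Illustrative volume name; substitute the real volume(s).
VOL=data

# Quorum and locking behaviour recommended for oVirt storage domains.
gluster volume set $VOL network.remote-dio enable
gluster volume set $VOL cluster.quorum-type auto
gluster volume set $VOL cluster.server-quorum-type server
gluster volume set $VOL cluster.eager-lock enable

# Disable the client-side performance translators.
gluster volume set $VOL performance.io-cache off
gluster volume set $VOL performance.stat-prefetch off
gluster volume set $VOL performance.quick-read off
gluster volume set $VOL performance.read-ahead off

# vdsm runs as uid/gid 36, so the bricks must be owned accordingly.
gluster volume set $VOL storage.owner-uid 36
gluster volume set $VOL storage.owner-gid 36

# Confirm the options took effect.
gluster volume info $VOL
```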
Hi Jairo, please also consider reading: http://www.ovirt.org/Gluster_Storage_Domain_Reference

I am closing this now; the best guess is that the previous volume configuration was overloading your kernel, as per the message:

```
Jun 8 05:02:27 hv2 kernel: INFO: task python:3013 blocked for more than 120 seconds.
Jun 8 05:02:27 hv2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
```

Feel free to reopen in case you hit this problem again. Thanks.
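For readers hitting the same symptoms, a short triage sketch (the commands below are assumptions about a stock EL6 hypervisor, not part of this report): the log above shows the sanlock delta lease last renewing at 40813 and expiring at 40893, i.e. 80 seconds without a successful renewal (sanlock's limit with its default 10 s I/O timeout), after which wdmd closes the watchdog and the host risks being fenced. Checking whether renewals are stalling before that point looks roughly like this:

```sh
# Show sanlock lockspaces/resources; a healthy lockspace keeps renewing its
# delta lease long before the 80 s expiry seen in the log above
# (expire 40893 - last_success 40813 = 80 s).
sanlock client status

# Follow sanlock/wdmd messages for renewal errors and watchdog warnings.
tail -f /var/log/messages | grep -E 'sanlock|wdmd'

# The kernel hung-task warning threshold (120 s by default); setting it to 0
# only silences the report, it does not fix the blocked storage I/O.
sysctl kernel.hung_task_timeout_secs
```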