Bug 1303571
Summary: | Bricks used by glusterfs get unmounted from their respective nodes while attempting to stress. | |
---|---|---|---
Product: | Red Hat Enterprise Linux 7 | Reporter: | Ambarish <asoman>
Component: | lvm2 | Assignee: | LVM and device-mapper development team <lvm-team>
lvm2 sub component: | Other | QA Contact: | cluster-qe <cluster-qe>
Status: | CLOSED NOTABUG | Docs Contact: |
Severity: | urgent | |
Priority: | unspecified | CC: | agk, heinzm, jbrassow, msnitzer, prajnoha, prockai, zkabelac
Version: | 7.2 | |
Target Milestone: | rc | |
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2016-02-01 11:02:51 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description Ambarish 2016-02-01 10:44:07 UTC
Tier logs, brick logs and sosreports copied here: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1303571/
dmesg copied to the same location. The environment is preserved for further debugging by the LVM team.

(In reply to Ambarish from comment #0)
> Description of problem:
>
> The volumes are created using RHGS (Red Hat Gluster Storage), which uses
> the underlying LVM volumes as its building blocks (bricks).
> While running parallel I/O, the bricks get unmounted from the nodes (after
> ~20 minutes of starting the workload).
> This looks like an LVM issue (outside the scope of RHGS).
> The VGs, PVs and LVs are intact.
>
> One of the problematic LVs is RHS_vg6/RHS_lv6 (from server 10.70.37.134).
>
> *SNIPPET FROM LOGS*:
>
> Jan 30 19:01:36 dhcp37-134 lvm[16942]: WARNING: Device for PV
> Kc8B3r-1Qg1-kfVy-VU6c-1WlR-rczA-0q0eRO not found or rejected by a filter.
> Jan 30 19:01:36 dhcp37-134 lvm[16942]: Cannot change VG RHS_vg4 while PVs
> are missing.
> Jan 30 19:01:36 dhcp37-134 lvm[16942]: Consider vgreduce --removemissing.
> Jan 30 19:01:36 dhcp37-134 lvm[16942]: Failed to extend thin
> RHS_vg4-RHS_pool4-tpool.
> Jan 30 19:01:36 dhcp37-134 lvm[16942]: Unmounting thin volume
> RHS_vg4-RHS_pool4-tpool from /rhs/brick4.

The log output shows clearly what happened, and it is intended lvm2 behavior: a failed thin-pool extension is currently answered by unmounting every related thin volume, to avoid a bigger disaster (overfilling the pool). lvm2 drops the thin volumes from use so they cannot generate further load on the pool; if you want higher occupancy of the thin pool, raise the autoextend threshold. Note that the WARNING above also shows a PV went missing from RHS_vg4, which is why the extension failed in the first place; the diagnostic sketch below looks at that.

As a fix:
- Provide more space in the VG so the thin-pool resize does not fail (see the vgextend sketch below).
- Use a higher threshold percentage (up to 95%) for the thin-pool resize.
- Use a smaller resize step (down to 1%), though more resize operations will occur and may slow down thin-pool usage a bit more. (The lvm.conf sketch below shows where both knobs live.)

lvm2 currently does not provide configurable options for dmeventd behavior. If you want lvm2 behavior other than what is described here, please open an RFE. We do plan to provide some more fine-grained policy modes.

No need for further debugging. It works as designed, thus closing this BZ.
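The log's own suggestion about the missing PV can be followed up with standard lvm2 commands. A minimal diagnostic sketch; the UUID is the one from the snippet above, and everything else is generic:

```
# Check whether the PV from the WARNING is visible to lvm2 at all.
pvs -o pv_name,pv_uuid,vg_name | grep Kc8B3r || echo "PV not visible"

# A VG with a missing PV shows a 'p' (partial) flag in its attributes.
vgs -o vg_name,vg_attr RHS_vg4

# Only if the device is permanently gone (this abandons LV data on it):
# vgreduce --removemissing RHS_vg4
```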
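A minimal sketch of the first fix, growing the VG so the automatic extension has free extents to work with; /dev/sdX is a placeholder device, not one from this report:

```
# Initialize a new disk as a PV and add it to the affected VG so
# dmeventd's automatic thin-pool extension no longer runs out of space.
pvcreate /dev/sdX
vgextend RHS_vg4 /dev/sdX

# Verify the VG now has free space for the next autoextend step.
vgs -o vg_name,vg_size,vg_free RHS_vg4
```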
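The two knobs mentioned in the fix list live in the activation section of /etc/lvm/lvm.conf. The values below are illustrative, chosen to match the limits suggested in the comment, not taken from this host's configuration:

```
# /etc/lvm/lvm.conf (activation section) -- illustrative values
activation {
    # dmeventd triggers an automatic extension once the thin pool is
    # this % full; raising it toward 95 allows higher pool occupancy
    # before a resize is attempted (100 disables autoextension).
    thin_pool_autoextend_threshold = 95

    # Each extension grows the pool by this % of its current size;
    # a smaller step (down to 1) consumes free VG space more slowly,
    # at the cost of more frequent resize operations.
    thin_pool_autoextend_percent = 1
}
```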