Bug 1420839 - Process /sbin/lvm was killed by signal 11 (SIGSEGV)
Summary: Process /sbin/lvm was killed by signal 11 (SIGSEGV)
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.9
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
Assignee: LVM and device-mapper development team
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-02-09 15:40 UTC by michal novacek
Modified: 2017-12-06 10:38 UTC
8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-06 10:38:52 UTC
Target Upstream Version:


Attachments (Terms of Use)
tarred abrt catch directory (2.40 MB, application/x-gzip)
2017-02-09 15:40 UTC, michal novacek
no flags Details
lvmdump -a -m (225.79 KB, application/x-gzip)
2017-02-09 17:31 UTC, michal novacek
no flags Details

Description michal novacek 2017-02-09 15:40:44 UTC
Created attachment 1248841 [details]
tarred abrt catch directory

Description of problem:

Not sure about a reproducer. I have been playing with a pacemaker cluster and the HA-LVM resource agent with tagged volumes and different types of LVM RAID.

The scenarios I'm running have one VG with 6 PVs, on top of which I create a RAID LV. During the scenario I take some of the PVs offline to see whether the partial VG is noticed by the resource agent. Then I bring the PVs back online, restore the missing PVs, delete the LV, and repeat the test with another RAID type.
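
For reference, a rough sketch of one iteration of that loop (device names, PV count layout, and the --restoremissing recovery step are assumptions on my part; this is not the exact sequence that triggered the crash and needs root plus real devices to run):

```
# Sketch of the test loop (assumed names; not the exact reproducer)
vgcreate raidvg /dev/sd[b-g]1                       # one VG on 6 PVs
lvcreate --type raid5 --stripes 2 --extents 100%VG \
         --name raidlv raidvg                       # create the raid LV
echo offline > /sys/block/sdb/device/state          # take one PV's disk offline
lvs -a                                              # is the partial VG noticed?
echo running > /sys/block/sdb/device/state          # bring the disk back
vgextend --restoremissing raidvg /dev/sdb1          # restore the missing PV
lvremove -f raidvg/raidlv                           # delete, repeat with another raid type
```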

I remember sometimes seeing 'kernel: device-mapper: ioctl: error adding target to table' and 'dmsetup ls' showing devices that should not be present.

I'm attaching the abrt report in case you are able to find something in it.

Comment 2 michal novacek 2017-02-09 17:31:43 UTC
Created attachment 1248864 [details]
lvmdump -a -m

Comment 4 Zdenek Kabelac 2017-02-12 16:22:58 UTC
From the SOS report, the system went into an inconsistent state:

raidvg-raidlv_rimage_1-missing_0_0: 0 10461184 error
raidvg-raidlv_rmeta_1-missing_0_0: 0 8192 error
raidvg-raidlv_rimage_0: 0 10461184 linear

So lvm2 is in a position which should not happen - but because of our incorrectly working 'degraded/partial' activation, it's likely not fixable at the moment in RHEL 6.

-

So is the system flipping/losing devices while scanning PVs?
There are some issues already fixed upstream for this.

Comment 5 michal novacek 2017-02-15 14:01:35 UTC
I'm not sure whether this is exactly the problem I hit when creating the bug
but the things I do are definitely very similar to what must have caused the
problem.

VG raidvg is not clustered and there is a 'volume_list' line on each node
(simulating pacemaker cluster exclusive activation using tagging).
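
The --config override used in the commands below corresponds to what would otherwise live in lvm.conf on each node; a sketch of the equivalent permanent setting (assumed from the override, not copied from the cluster in question):

```
# /etc/lvm/lvm.conf (sketch): only LVs carrying the "pacemaker" tag
# may be activated on this node
activation {
    volume_list = [ "@pacemaker" ]
}
```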

[root@virt-002 ~]# export tag='--addtag pacemaker \
--config activation{volume_list=["@pacemaker"]}'

[root@virt-002 ~]# lvcreate -ay -v $tag \
    --name raidlv \
    --type raid5 \
    --extents 100%VG \
    --stripes 2 \
    --nosync \
    raidvg

[root@virt-002 ~]# echo offline > /sys/block/sda/device/state
[root@virt-002 ~]# lvs -a
  /dev/sda1: read failed after 0 of 2048 at 0: Input/output error
  /dev/sda1: read failed after 0 of 512 at 1069154304: Input/output error
  /dev/sda1: read failed after 0 of 512 at 1069244416: Input/output error
  /dev/sda1: read failed after 0 of 512 at 0: Input/output error
  /dev/sda1: read failed after 0 of 512 at 4096: Input/output error
  Couldn't find device with uuid rkIG43-24L6-95BN-s5ui-3lCw-DkyD-S09zcE.
  Couldn't find device for segment belonging to raidvg/raidlv_rimage_2 while checking used and assumed devices.
  LV                VG         Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  raidlv            raidvg     Rwi-a-r-p-   3.96g                                    100.00          
  [raidlv_rimage_0] raidvg     iwi-aor---   1.98g                                                    
  [raidlv_rimage_1] raidvg     iwi-aor---   1.98g                                                    
  [raidlv_rimage_2] raidvg     iwi-aor-p-   1.98g                                                    
  [raidlv_rmeta_0]  raidvg     ewi-aor---   4.00m                                                    
  [raidlv_rmeta_1]  raidvg     ewi-aor---   4.00m                                                    
  [raidlv_rmeta_2]  raidvg     ewi-aor-p-   4.00m                                                    
  lv_root           vg_virt002 -wi-ao----   7.00g                                                    
  lv_swap           vg_virt002 -wi-ao---- 852.00m                                                    

[root@virt-002 ~]# lvchange -an $tag raidvg/raidlv
  /dev/sda1: read failed after 0 of 512 at 1069154304: Input/output error
  /dev/sda1: read failed after 0 of 512 at 1069244416: Input/output error
  /dev/sda1: read failed after 0 of 512 at 0: Input/output error
  /dev/sda1: read failed after 0 of 512 at 4096: Input/output error
  /dev/sda1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid rkIG43-24L6-95BN-s5ui-3lCw-DkyD-S09zcE.
  Couldn't find device for segment belonging to raidvg/raidlv_rimage_2 while checking used and assumed devices.
  Logical volume "raidlv" changed.
[root@virt-002 ~]# echo $?
0

[root@virt-002 ~]# lvs -a
  /dev/sda1: open failed: No such device or address
  Couldn't find device with uuid rkIG43-24L6-95BN-s5ui-3lCw-DkyD-S09zcE.
  LV                VG         Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  raidlv            raidvg     Rwi---r-p-   3.96g                                                    
  [raidlv_rimage_0] raidvg     Iwi---r---   1.98g                                                    
  [raidlv_rimage_1] raidvg     Iwi---r---   1.98g                                                    
  [raidlv_rimage_2] raidvg     Iwi---r-p-   1.98g                                                    
  [raidlv_rmeta_0]  raidvg     ewi---r---   4.00m                                                    
  [raidlv_rmeta_1]  raidvg     ewi---r---   4.00m                                                    
  [raidlv_rmeta_2]  raidvg     ewi---r-p-   4.00m                                                    
  lv_root           vg_virt002 -wi-ao----   7.00g                                                    
  lv_swap           vg_virt002 -wi-ao---- 852.00m


[root@virt-002 ~]# ssh virt-003
[root@virt-003 ~]# export tag='--addtag pacemaker --config activation{volume_list=["@pacemaker"]}'
[root@virt-003 ~]# lvchange -ay $tag raidvg/raidlv
  Logical volume "raidlv" changed.
  device-mapper: create ioctl on raidvg-raidlv_rmeta_0LVM-qQFP8dgOEdKibSkI4HWhACi0gWxrXsNipxtc1y3EqHSL25Kzi5qRBtDssMf3hYXg failed: Device or resource busy
[root@virt-003 ~]# lvs -a
  LV                VG         Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  raidlv            raidvg     Rwi---r-p-   3.96g                                                    
  [raidlv_rimage_0] raidvg     Iwi---r---   1.98g                                                    
  [raidlv_rimage_1] raidvg     Iwi---r---   1.98g                                                    
  [raidlv_rimage_2] raidvg     Iwi---r-p-   1.98g                                                    
  [raidlv_rmeta_0]  raidvg     ewi---r---   4.00m                                                    
  [raidlv_rmeta_1]  raidvg     ewi---r---   4.00m                                                    
  [raidlv_rmeta_2]  raidvg     ewi---r-p-   4.00m                                                    
  lv_root           vg_virt003 -wi-ao----   7.00g                                                    
  lv_swap           vg_virt003 -wi-ao---- 852.00m                                                    
[root@virt-003 ~]# vgchange -ay $tag raidvg
  Volume group "raidvg" successfully changed
  device-mapper: create ioctl on raidvg-raidlv_rmeta_0LVM-qQFP8dgOEdKibSkI4HWhACi0gWxrXsNipxtc1y3EqHSL25Kzi5qRBtDssMf3hYXg failed: Device or resource busy
  0 logical volume(s) in volume group "raidvg" now active

Comment 6 Zdenek Kabelac 2017-02-15 14:33:58 UTC
Well, since lvm.conf suggests that just the 'warn' policy is used for 'raid_fault_policy', lvm2 is currently effectively blind and cannot resolve any sort of double failure.
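
For context, the lvm.conf knob in question; with 'warn' lvm2 only logs the device failure, while 'allocate' tells it to try replacing a failed RAID image from spare space in the VG (sketch; comments paraphrase the documented behavior, and the shown value matches the policy referred to above):

```
# /etc/lvm/lvm.conf (sketch)
activation {
    # "warn"     - log the failure and take no automatic action
    # "allocate" - attempt to replace the failed image using free extents in the VG
    raid_fault_policy = "warn"
}
```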

We know we need to improve this - but it's not solved upstream at all yet.

Comment 7 Jan Kurik 2017-12-06 10:38:52 UTC
Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available.

The official life cycle policy can be reviewed here:

http://redhat.com/rhel/lifecycle

This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL:

https://access.redhat.com/

