Bug 741212

Summary: hung task while running test_resize_io(PoolResizeTests)
Product: Red Hat Enterprise Linux 6 Reporter: Qian Cai <qcai>
Component: device-mapperAssignee: Heinz Mauelshagen <heinzm>
Status: CLOSED CURRENTRELEASE QA Contact: Cluster QE <mspqa-list>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 6.3CC: agk, bmarzins, coughlan, dwysocha, heinzm, jbrassow, mbroz, msnitzer, prajnoha, prockai, thornber, zkabelac
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-01-04 17:06:33 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 756082    

Description Qian Cai 2011-09-26 09:45:09 UTC
Description of problem:
INFO: task dd:23088 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
dd              D c2547d74     0 23088  23083 0x00000080
 e718ea70 00000082 00000002 c2547d74 00000000 00000000 ed4fe570 edb47580
 c0af3580 00120000 00000000 c0af3580 c0af3580 c0af3580 c0af3580 00000001
 00000000 c05bcafb 1fb05000 00000004 e6032000 00000030 000f0000 1fb05000
Call Trace:
 [<c05bcafb>] ? submit_bio+0x6b/0x100
 [<c047d31b>] ? ktime_get_ts+0xbb/0xf0
 [<c083398e>] ? io_schedule+0x6e/0xb0
 [<c054cf42>] ? dio_await_completion+0x42/0xb0
 [<c054d3e6>] ? direct_io_worker+0x316/0x400
 [<c054b213>] ? blkdev_direct_IO+0x63/0x80
 [<c054a490>] ? block_llseek+0x100/0x100
 [<c04d9899>] ? generic_file_direct_write+0x119/0x240
 [<c04db0a0>] ? __generic_file_aio_write+0x330/0x4f0
 [<c0483a89>] ? tick_program_event+0x19/0x20
 [<c047764d>] ? hrtimer_interrupt+0x1ad/0x290
 [<c054a76b>] ? blkdev_aio_write+0x3b/0xa0
 [<c051dcf5>] ? do_sync_write+0xb5/0xf0
 [<c051de86>] ? rw_verify_area+0x66/0x120
 [<c051dc40>] ? do_sync_readv_writev+0xf0/0xf0
 [<c051e2e0>] ? vfs_write+0xa0/0x170
 [<c051e481>] ? sys_write+0x41/0x80
 [<c083c35f>] ? sysenter_do_call+0x12/0x28

- checked that the pool was not full yet.
- it can be reproduced using the default volume size.
- changed volume_size in config.rb to 1097152 or 1097151, failed in another way.

  1) Error:
test_resize_io(PoolResizeTests):
RuntimeError: wipe sub process failed
    ./pool_resize_tests.rb:75:in `resize_io_many'
    ./lib/thinp-test.rb:56:in `with_thin'
    ./lib/dm.rb:164:in `with_dev'
    ./lib/thinp-test.rb:55:in `with_thin'
    ./lib/thinp-test.rb:62:in `with_new_thin'
    ./pool_resize_tests.rb:58:in `resize_io_many'
    ./lib/thinp-test.rb:50:in `with_standard_pool'
    ./lib/dm.rb:164:in `with_dev'
    ./lib/thinp-test.rb:49:in `with_standard_pool'
    ./pool_resize_tests.rb:57:in `resize_io_many'
    ./pool_resize_tests.rb:82:in `test_resize_io'

Version-Release number of selected component (if applicable):
upstream thinp git tree/branch

How reproducible:
always

Steps to Reproduce:
config.rb looks like this.

    { :metadata_dev => '/dev/mapper/tgroup-lvol0',
      :data_dev => '/dev/mapper/tgroup-lvol1',

# lvdisplay 
  --- Logical volume ---
  LV Name                /dev/tgroup/lvol0
  VG Name                tgroup
  LV UUID                GQBp3l-4TpW-RSnj-2CG3-YlXw-pOrM-v20scG
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                320.00 MiB
  Current LE             80
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:2
   
  --- Logical volume ---
  LV Name                /dev/tgroup/lvol1
  VG Name                tgroup
  LV UUID                3XaqtZ-j2gy-w0jJ-mQeN-R33o-H3BN-U5QXA3
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                16.00 GiB
  Current LE             4096
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:3

Comment 2 Joe Thornber 2011-09-26 10:55:22 UTC
This sounds very similar to an issue I fixed last week.  Could someone (Mike, CAI) tell me what the git commit reference was for this build of thinp please?

Comment 3 Joe Thornber 2011-09-26 10:56:45 UTC
I'm also wondering if the status line should show how many bios are queued on the 'retry' list awaiting resizing.  This makes diagnosing these faults a lot easier (that's how I ascertained what was happening on the fusion-io machine last week).

Comment 4 Qian Cai 2011-09-27 01:12:05 UTC
(In reply to comment #2)
> This sounds very similar to an issue I fixed last week.  Could someone (Mike,
> CAI) tell me what the git commit reference was for this build of thinp please?
linux-2.6-thinp git tree thin-dev branch, topmost commit is,
37804cda8cb7fe3a5c5962b34bf2d884f08e0a56

If running this test through Mike's latest 6.2-based brew build, it generates those in the console,

device-mapper: thin: 253:4: reached low water mark, sending event.
device-mapper: thin: 253:4: reached low water mark, sending event.
device-mapper: thin: 253:4: reached low water mark, sending event.
device-mapper: thin: 253:4: no free space available.

# dmsetup status
mpathb: 0 41943040 multipath 2 0 1 0 1 1 A 0 2 0 8:32 A 0 8:64 A 0 
mpatha: 0 41943040 multipath 2 0 1 0 1 1 A 0 2 0 8:16 A 0 8:48 A 0 
test-dev-400967: 0 1048576 thin-pool 1 654744 131200 -
test-dev-192738: 0 2097152 thin 917376 917503
tgroup-lvol1: 0 33554432 linear 
tgroup-lvol0: 0 655360 linear

Comment 5 Joe Thornber 2011-09-27 11:14:56 UTC
ok, wait for a more recent build.  I believe that's been fixed in latest git.

Comment 6 Heinz Mauelshagen 2011-09-27 11:41:05 UTC
Not in 1b2b78a1b62180c7e58ca834517a8713dd08bc01 dm-thin-pool and fb735be974caaacd512b117f47c936890617444b test suite as tested by me recently.

Comment 7 Mike Snitzer 2011-09-28 18:26:19 UTC
(In reply to comment #2)
> This sounds very similar to an issue I fixed last week.  Could someone (Mike,
> CAI) tell me what the git commit reference was for this build of thinp please?

The build I did last week was sync'd to thin-dev commit 481a50d727eea

I've started a new scratch brew build, that is sync'd to thin-dev commit f1898e2c98a (mikulas' bufio update), here:

http://brewweb.devel.redhat.com/brew/taskinfo?taskID=3670851

Comment 8 Qian Cai 2011-10-26 09:04:20 UTC
Unable to reproduce this anymore with the latest upstream thinp git, thin-dev branch, top commit c265f154c9.

Comment 10 Tom Coughlan 2012-01-04 17:06:33 UTC
(In reply to comment #8)
> Unable to reproduce this anymore with the latest upstream thinp git, thin-dev
> branch, top commit c265f154c9.

Okay, thanks for testing. Closing.