Bug 1505942

Summary: runner/tcmu: returns incorrect response sizes for non RW commands
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Mike Christie <mchristi>
Component: iSCSI Assignee: Mike Christie <mchristi>
Status: CLOSED WORKSFORME QA Contact: Tejas <tchandra>
Severity: urgent Docs Contact: Bara Ancincova <bancinco>
Priority: high    
Version: 3.0 CC: bniver, ceph-eng-bugs, ceph-qe-bugs, flucifre, hnallurv, jdillama, mchristi
Target Milestone: rc   
Target Release: 3.*   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Known Issue
Doc Text:
.The iSCSI gateway can fail to scan or set up LUNs
When using the iSCSI gateway, Linux initiators can report `kzalloc` failures because the requested buffers are too large. In addition, VMware ESX initiators can report `READ_CAP` failures because they are unable to copy the data. As a consequence, the iSCSI gateway fails to scan or set up Logical Unit Numbers (LUNs), find or rediscover devices, and add the devices back after path failures.
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-02-15 22:34:20 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1494421    

Description Mike Christie 2017-10-24 15:40:29 UTC
Description of problem:

On the Linux initiator side, you might see kzalloc failures due to buffers being too large, and on ESX you might see READ_CAP failures due to it not being able to copy the data. LUN scanning/setup might then fail, so devices are not found, or are not rediscovered and added back after path failures.
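
Illustration (not taken from tcmu-runner or the kernel source): a minimal Python sketch of one way an incorrect response size could turn into an oversized allocation on the initiator. It assumes the affected command is REPORT TARGET PORT GROUPS (RTPG), whose response starts with a 4-byte big-endian RETURN DATA LENGTH field, and that the initiator grows its buffer to that length plus the 4-byte header when the current buffer is too small. The function names, the 128-byte starting buffer, and the sample payloads are hypothetical.

```python
import struct

RTPG_HEADER_LEN = 4  # RETURN DATA LENGTH field, big-endian, at offset 0


def rtpg_required_len(response: bytes) -> int:
    """Return the total buffer size the response claims to need."""
    if len(response) < RTPG_HEADER_LEN:
        raise ValueError("RTPG response shorter than its length header")
    (return_data_length,) = struct.unpack_from(">I", response, 0)
    # The length field counts only the bytes that follow the header.
    return return_data_length + RTPG_HEADER_LEN


def needs_realloc(response: bytes, buflen: int) -> bool:
    """True if the initiator would reallocate a larger buffer and retry."""
    return rtpg_required_len(response) > buflen


# Hypothetical well-formed response: 12 bytes of descriptor data.
good = struct.pack(">I", 12) + bytes(12)

# Hypothetical bad response: same payload, but the length field is inflated.
bad = struct.pack(">I", 0x00FFFFFC) + bytes(12)

if __name__ == "__main__":
    for name, resp in (("good", good), ("bad", bad)):
        print(name, rtpg_required_len(resp), needs_realloc(resp, 128))
```

With a 128-byte starting buffer, the well-formed response fits, while the inflated one asks for 16 MiB, the kind of request that a kernel allocator may refuse.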

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 10 Jason Dillaman 2017-10-26 19:48:46 UTC
Oct 10 18:16:09 localhost kernel: WARNING: CPU: 12 PID: 9299 at mm/page_alloc.c:2902 __alloc_pages_slowpath+0x6f/0x724
Oct 10 18:16:09 localhost kernel: Modules linked in: ext4 mbcache jbd2 dm_queue_length iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd mei_me iTCO_wdt sg iTCO_vendor_support joydev mxm_wmi lpc_ich mei pcspkr i2c_i801 shpchp ipmi_ssif ipmi_si ipmi_devintf wmi ipmi_msghandler acpi_power_meter nfsd auth_rpcgss nfs_acl lockd dm_multipath grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ixgbe drm ahci libahci libata crct10dif_pclmul be2net megaraid_sas crct10dif_common crc32c_intel mdio ptp i2c_core pps_core dca dm_mirror dm_region_hash
Oct 10 18:16:09 localhost kernel: dm_log dm_mod
Oct 10 18:16:09 localhost kernel: CPU: 12 PID: 9299 Comm: kworker/u40:2 Not tainted 3.10.0-693.el7.x86_64 #1
Oct 10 18:16:09 localhost kernel: Hardware name: FUJITSU PRIMERGY RX2530 M2/D3279-B1, BIOS V5.0.0.11 R1.7.0 for D3279-B1x                     04/21/2016
Oct 10 18:16:09 localhost kernel: Workqueue: kmpath_handlerd activate_path [dm_multipath]
Oct 10 18:16:09 localhost kernel: 0000000000000000 000000001dde5f0b ffff8807766379e0 ffffffff816a3d91
Oct 10 18:16:09 localhost kernel: ffff880776637a20 ffffffff810879c8 00000b5681033619 0000000000100010
Oct 10 18:16:09 localhost kernel: 0000000000104010 ffff88087ffd7000 0000000000000000 0000000000124010
Oct 10 18:16:09 localhost kernel: Call Trace:
Oct 10 18:16:09 localhost kernel: [<ffffffff816a3d91>] dump_stack+0x19/0x1b
Oct 10 18:16:09 localhost kernel: [<ffffffff810879c8>] __warn+0xd8/0x100
Oct 10 18:16:09 localhost kernel: [<ffffffff81087b0d>] warn_slowpath_null+0x1d/0x20
Oct 10 18:16:09 localhost kernel: [<ffffffff8169f723>] __alloc_pages_slowpath+0x6f/0x724
Oct 10 18:16:09 localhost kernel: [<ffffffff8109927e>] ? try_to_del_timer_sync+0x5e/0x90
Oct 10 18:16:09 localhost kernel: [<ffffffff8118cd85>] __alloc_pages_nodemask+0x405/0x420
Oct 10 18:16:09 localhost kernel: [<ffffffff811d1108>] alloc_pages_current+0x98/0x110
Oct 10 18:16:09 localhost kernel: [<ffffffff8118760e>] __get_free_pages+0xe/0x40
Oct 10 18:16:09 localhost kernel: [<ffffffff811dcaae>] kmalloc_order_trace+0x2e/0xa0
Oct 10 18:16:09 localhost kernel: [<ffffffff811e0641>] __kmalloc+0x211/0x230
Oct 10 18:16:09 localhost kernel: [<ffffffff8147b376>] realloc_buffer+0x36/0x70
Oct 10 18:16:09 localhost kernel: [<ffffffff8147b8bb>] alua_rtpg+0x50b/0x630
Oct 10 18:16:09 localhost kernel: [<ffffffff810cd794>] ? update_curr+0x104/0x190
Oct 10 18:16:09 localhost kernel: [<ffffffff810ca29e>] ? account_entity_dequeue+0xae/0xd0
Oct 10 18:16:09 localhost kernel: [<ffffffff810cdc7c>] ? dequeue_entity+0x11c/0x5d0
Oct 10 18:16:09 localhost kernel: [<ffffffffc00dfc20>] ? reinstate_path+0x180/0x180 [dm_multipath]
Oct 10 18:16:09 localhost kernel: [<ffffffff8147ba17>] alua_activate+0x37/0x2a0
Oct 10 18:16:09 localhost kernel: [<ffffffffc00dfc20>] ? reinstate_path+0x180/0x180 [dm_multipath]
Oct 10 18:16:09 localhost kernel: [<ffffffff81477793>] scsi_dh_activate+0xc3/0x160
Oct 10 18:16:09 localhost kernel: [<ffffffffc00dfeaa>] activate_path+0x5a/0x60 [dm_multipath]
Oct 10 18:16:09 localhost kernel: [<ffffffff810a881a>] process_one_work+0x17a/0x440
Oct 10 18:16:09 localhost kernel: [<ffffffff810a94e6>] worker_thread+0x126/0x3c0
Oct 10 18:16:09 localhost kernel: [<ffffffff810a93c0>] ? manage_workers.isra.24+0x2a0/0x2a0
Oct 10 18:16:09 localhost kernel: [<ffffffff810b098f>] kthread+0xcf/0xe0
Oct 10 18:16:09 localhost kernel: [<ffffffff8108ddeb>] ? do_exit+0x6bb/0xa40
Oct 10 18:16:09 localhost kernel: [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
Oct 10 18:16:09 localhost kernel: [<ffffffff816b4f18>] ret_from_fork+0x58/0x90
Oct 10 18:16:09 localhost kernel: [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
Oct 10 18:16:09 localhost kernel: ---[ end trace 560d82aec7daf644 ]---
Oct 10 18:16:09 localhost kernel: sd 11:0:0:40: alua_rtpg: kmalloc buffer failed
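
Note on reading the trace above: the call chain goes from alua_rtpg through realloc_buffer and __kmalloc into kmalloc_order_trace and __get_free_pages, which suggests the requested size was large enough to be served by the page allocator, and the warning in __alloc_pages_slowpath is consistent with the request exceeding the largest supported page order. The Python sketch below mirrors that size-to-order arithmetic. The 4 KiB page size and MAX_ORDER of 11 are assumptions about this x86_64 kernel, not values taken from the log, and the helper names are hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages on x86_64
MAX_ORDER = 11    # assumption: orders 0..10 are valid, so the largest block is 4 MiB


def get_order(size: int) -> int:
    """Smallest order such that PAGE_SIZE * 2**order >= size."""
    if size <= 0:
        raise ValueError("size must be positive")
    pages = (size + PAGE_SIZE - 1) // PAGE_SIZE  # round up to whole pages
    return (pages - 1).bit_length()              # ceil(log2(pages))


def exceeds_max_order(size: int) -> bool:
    """True if a contiguous allocation of this size cannot be satisfied."""
    return get_order(size) >= MAX_ORDER


if __name__ == "__main__":
    for size in (4096, 4 * 1024 * 1024, 4 * 1024 * 1024 + 1, 16 * 1024 * 1024):
        print(size, get_order(size), exceeds_max_order(size))
```

Under these assumptions, any single buffer larger than 4 MiB would fail in this way regardless of how much memory is free.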

Comment 18 Brett Niver 2017-10-30 13:49:14 UTC
Moving to 3.1

Comment 24 Mike Christie 2019-02-15 22:34:20 UTC
Closing for now. We have not been able to replicate it and it's been over a year.