
Bug 1505942

Summary: runner/tcmu: returns incorrect response sizes for non RW commands
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Mike Christie <mchristi>
Component: iSCSI Assignee: Mike Christie <mchristi>
Status: CLOSED WORKSFORME QA Contact: Tejas <tchandra>
Severity: urgent Docs Contact: Bara Ancincova <bancinco>
Priority: high    
Version: 3.0 CC: bniver, ceph-eng-bugs, ceph-qe-bugs, flucifre, hnallurv, jdillama, mchristi
Target Milestone: rc   
Target Release: 3.*   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Known Issue
Doc Text:
.The iSCSI gateway can fail to scan or set up LUNs
When using the iSCSI gateway, Linux initiators can return `kzalloc` failures because buffers are too large. In addition, VMware ESX initiators can return `READ_CAP` failures because they cannot copy the data. As a consequence, the iSCSI gateway fails to scan or set up Logical Unit Numbers (LUNs), to find or rediscover devices, and to add the devices back after path failures.
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-02-15 22:34:20 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1494421    

Description Mike Christie 2017-10-24 15:40:29 UTC
Description of problem:

On the Linux initiator side, you might see kzalloc failures because buffers are too large, and on ESX you might see READ_CAP failures because it cannot copy the data. LUN scanning or setup might then fail, so devices are not found, or are not rediscovered and added back after path failures.
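
For illustration only, here is a minimal Python sketch of the size bookkeeping a target normally does for a non-read/write command: transfer no more than the initiator's allocation length, and report how much of the buffer was left unused. This is not tcmu-runner code. The function name `fill_response` and the sample payload are hypothetical and chosen only for this sketch.

```python
from typing import Tuple


def fill_response(payload: bytes, alloc_len: int) -> Tuple[bytes, int]:
    """Return (bytes to transfer, residual) for a non-read/write command.

    payload   -- the full response the target built (hypothetical input)
    alloc_len -- the allocation length the initiator put in the CDB
    """
    # Never send more than the initiator said it can accept.
    transferred = payload[:alloc_len]
    # Residual is the part of the initiator's buffer left unfilled.
    residual = alloc_len - len(transferred)
    return transferred, residual


# A 32-byte response requested with an 8-byte buffer is truncated.
data, resid = fill_response(bytes(32), 8)
print(len(data), resid)

# The same response requested with a 64-byte buffer leaves 32 bytes unused.
data, resid = fill_response(bytes(32), 64)
print(len(data), resid)
```

If the reported size disagrees with the bytes actually placed in the buffer, the initiator can end up acting on a length that has nothing to do with the real response.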

Comment 10 Jason Dillaman 2017-10-26 19:48:46 UTC
Oct 10 18:16:09 localhost kernel: WARNING: CPU: 12 PID: 9299 at mm/page_alloc.c:2902 __alloc_pages_slowpath+0x6f/0x724
Oct 10 18:16:09 localhost kernel: Modules linked in: ext4 mbcache jbd2 dm_queue_length iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd mei_me iTCO_wdt sg iTCO_vendor_support joydev mxm_wmi lpc_ich mei pcspkr i2c_i801 shpchp ipmi_ssif ipmi_si ipmi_devintf wmi ipmi_msghandler acpi_power_meter nfsd auth_rpcgss nfs_acl lockd dm_multipath grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ixgbe drm ahci libahci libata crct10dif_pclmul be2net megaraid_sas crct10dif_common crc32c_intel mdio ptp i2c_core pps_core dca dm_mirror dm_region_hash
Oct 10 18:16:09 localhost kernel: dm_log dm_mod
Oct 10 18:16:09 localhost kernel: CPU: 12 PID: 9299 Comm: kworker/u40:2 Not tainted 3.10.0-693.el7.x86_64 #1
Oct 10 18:16:09 localhost kernel: Hardware name: FUJITSU PRIMERGY RX2530 M2/D3279-B1, BIOS V5.0.0.11 R1.7.0 for D3279-B1x                     04/21/2016
Oct 10 18:16:09 localhost kernel: Workqueue: kmpath_handlerd activate_path [dm_multipath]
Oct 10 18:16:09 localhost kernel: 0000000000000000 000000001dde5f0b ffff8807766379e0 ffffffff816a3d91
Oct 10 18:16:09 localhost kernel: ffff880776637a20 ffffffff810879c8 00000b5681033619 0000000000100010
Oct 10 18:16:09 localhost kernel: 0000000000104010 ffff88087ffd7000 0000000000000000 0000000000124010
Oct 10 18:16:09 localhost kernel: Call Trace:
Oct 10 18:16:09 localhost kernel: [<ffffffff816a3d91>] dump_stack+0x19/0x1b
Oct 10 18:16:09 localhost kernel: [<ffffffff810879c8>] __warn+0xd8/0x100
Oct 10 18:16:09 localhost kernel: [<ffffffff81087b0d>] warn_slowpath_null+0x1d/0x20
Oct 10 18:16:09 localhost kernel: [<ffffffff8169f723>] __alloc_pages_slowpath+0x6f/0x724
Oct 10 18:16:09 localhost kernel: [<ffffffff8109927e>] ? try_to_del_timer_sync+0x5e/0x90
Oct 10 18:16:09 localhost kernel: [<ffffffff8118cd85>] __alloc_pages_nodemask+0x405/0x420
Oct 10 18:16:09 localhost kernel: [<ffffffff811d1108>] alloc_pages_current+0x98/0x110
Oct 10 18:16:09 localhost kernel: [<ffffffff8118760e>] __get_free_pages+0xe/0x40
Oct 10 18:16:09 localhost kernel: [<ffffffff811dcaae>] kmalloc_order_trace+0x2e/0xa0
Oct 10 18:16:09 localhost kernel: [<ffffffff811e0641>] __kmalloc+0x211/0x230
Oct 10 18:16:09 localhost kernel: [<ffffffff8147b376>] realloc_buffer+0x36/0x70
Oct 10 18:16:09 localhost kernel: [<ffffffff8147b8bb>] alua_rtpg+0x50b/0x630
Oct 10 18:16:09 localhost kernel: [<ffffffff810cd794>] ? update_curr+0x104/0x190
Oct 10 18:16:09 localhost kernel: [<ffffffff810ca29e>] ? account_entity_dequeue+0xae/0xd0
Oct 10 18:16:09 localhost kernel: [<ffffffff810cdc7c>] ? dequeue_entity+0x11c/0x5d0
Oct 10 18:16:09 localhost kernel: [<ffffffffc00dfc20>] ? reinstate_path+0x180/0x180 [dm_multipath]
Oct 10 18:16:09 localhost kernel: [<ffffffff8147ba17>] alua_activate+0x37/0x2a0
Oct 10 18:16:09 localhost kernel: [<ffffffffc00dfc20>] ? reinstate_path+0x180/0x180 [dm_multipath]
Oct 10 18:16:09 localhost kernel: [<ffffffff81477793>] scsi_dh_activate+0xc3/0x160
Oct 10 18:16:09 localhost kernel: [<ffffffffc00dfeaa>] activate_path+0x5a/0x60 [dm_multipath]
Oct 10 18:16:09 localhost kernel: [<ffffffff810a881a>] process_one_work+0x17a/0x440
Oct 10 18:16:09 localhost kernel: [<ffffffff810a94e6>] worker_thread+0x126/0x3c0
Oct 10 18:16:09 localhost kernel: [<ffffffff810a93c0>] ? manage_workers.isra.24+0x2a0/0x2a0
Oct 10 18:16:09 localhost kernel: [<ffffffff810b098f>] kthread+0xcf/0xe0
Oct 10 18:16:09 localhost kernel: [<ffffffff8108ddeb>] ? do_exit+0x6bb/0xa40
Oct 10 18:16:09 localhost kernel: [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
Oct 10 18:16:09 localhost kernel: [<ffffffff816b4f18>] ret_from_fork+0x58/0x90
Oct 10 18:16:09 localhost kernel: [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
Oct 10 18:16:09 localhost kernel: ---[ end trace 560d82aec7daf644 ]---
Oct 10 18:16:09 localhost kernel: sd 11:0:0:40: alua_rtpg: kmalloc buffer failed
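
The trace above goes through alua_rtpg and realloc_buffer, which is where the initiator grows its buffer to fit a REPORT TARGET PORT GROUPS response. The Python sketch below illustrates that step under stated assumptions. The response starts with a 4-byte big-endian RETURN DATA LENGTH, and the initiator needs that many bytes plus the 4-byte header. The names `rtpg_required_len`, `next_buffer_size`, `INITIAL_BUFLEN`, and `MAX_SANE_ALLOC` are hypothetical, and the cap is an illustrative limit for this sketch rather than a kernel constant. It is not the kernel's code.

```python
import struct

INITIAL_BUFLEN = 128               # assumed starting buffer size for this sketch
MAX_SANE_ALLOC = 4 * 1024 * 1024   # illustrative cap, not a kernel constant


def rtpg_required_len(response: bytes) -> int:
    """Bytes the initiator believes it needs: 4-byte header plus RETURN DATA LENGTH."""
    (return_data_length,) = struct.unpack_from(">I", response, 0)
    return return_data_length + 4


def next_buffer_size(response: bytes, buflen: int) -> int:
    """Decide the buffer size for a retry, based only on the reported length."""
    needed = rtpg_required_len(response)
    if needed <= buflen:
        return buflen  # current buffer is already large enough
    if needed > MAX_SANE_ALLOC:
        # Stand-in for the allocation failure seen in the log above.
        raise MemoryError("reported length %d is too large to allocate" % needed)
    return needed  # grow the buffer and reissue the command


# A well-formed response describing 8 bytes of data fits the initial buffer.
good = struct.pack(">I", 8) + bytes(8)
print(next_buffer_size(good, INITIAL_BUFLEN))

# A response whose length field is wrong asks for an unreasonably large buffer.
bad = struct.pack(">I", 0x00FFFFFF)
try:
    next_buffer_size(bad, INITIAL_BUFLEN)
except MemoryError as err:
    print("allocation refused:", err)
```

Because the retry size comes entirely from the length field, a target that reports an incorrect response size can push the initiator into an allocation it cannot satisfy.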

Comment 18 Brett Niver 2017-10-30 13:49:14 UTC
Moving to 3.1

Comment 24 Mike Christie 2019-02-15 22:34:20 UTC
Closing for now. We have not been able to replicate it and it's been over a year.