Bug 2229738

Summary: Cannot upload files bigger than 64K to "SSH-2.0-9.99 sshlib" server, transfer hangs
Product: Red Hat Enterprise Linux 8 Reporter: Renaud Métrich <rmetrich>
Component: curlAssignee: Jacek Migacz <jmigacz>
Status: CLOSED MIGRATED QA Contact: Daniel Rusek <drusek>
Severity: high Docs Contact:
Priority: high    
Version: 8.8CC: ansasaki, jamacku, jfindysz, jjelen, jmigacz, jsantos, omoris, qguo, shebburn, tscherf
Target Milestone: rcKeywords: MigratedToJIRA, Triaged
Target Release: ---Flags: pm-rhel: mirror+
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-09-20 00:43:57 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Renaud Métrich 2023-08-07 13:40:38 UTC
Description of problem:

We have a customer uploading files to a SFTP server advertising as "SSH-2.0-9.99 sshlib" (no idea for now which product it is). The customer uses a curl command for that, which makes internally use of libssh implementation.

When uploading 64K files or bigger (it works up to 63K included), the transfer hangs forever.
curl backtrace shows libssh is waiting indefinitely for 4 bytes from the server, which are never sent by the server (frame 8):
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
[2023/08/07 15:00:09.206865, 3] sftp_write:  Could not write as much data as expected
[2023/08/07 15:00:09.206878, 3] ssh_channel_read_timeout:  Read (4) buffered : 0 bytes. Window: 1279932
[2023/08/07 15:00:09.206894, 4] ssh_socket_pollcallback:  Poll callback on socket 4 (POLLOUT ), out buffer 0
[2023/08/07 15:00:09.206904, 4] ssh_socket_pollcallback:  sending control flow event
[2023/08/07 15:00:09.206914, 4] ssh_packet_socket_controlflow_callback:  sending channel_write_wontblock callback
^C
Program received signal SIGINT, Interrupt.
0x00007ffff6ce7f08 in __GI___poll (fds=0x5555557d5190, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
29	  return SYSCALL_CANCEL (poll, fds, nfds, timeout);
(gdb) bt
#0  0x00007ffff6ce7f08 in __GI___poll (fds=0x5555557d5190, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007ffff6543f69 in poll (__timeout=<optimized out>, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:38
#2  ssh_poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at /usr/src/debug/libssh-0.9.6-10.el8_8.x86_64/src/poll.c:96
#3  0x00007ffff6544358 in ssh_poll_ctx_dopoll (ctx=ctx@entry=0x5555557d3fa0, timeout=timeout@entry=-1) at /usr/src/debug/libssh-0.9.6-10.el8_8.x86_64/src/poll.c:679
#4  0x00007ffff6545430 in ssh_handle_packets (session=session@entry=0x5555557b0a10, timeout=timeout@entry=-1) at /usr/src/debug/libssh-0.9.6-10.el8_8.x86_64/src/session.c:664
#5  0x00007ffff65454fd in ssh_handle_packets_termination (session=session@entry=0x5555557b0a10, timeout=timeout@entry=-3,  fct=fct@entry=0x7ffff6525650 <ssh_channel_read_termination>, user=user@entry=0x7fffffff9860) at /usr/src/debug/libssh-0.9.6-10.el8_8.x86_64/src/session.c:733
#6  0x00007ffff6528d04 in ssh_channel_read_timeout (channel=0x5555557dcdf0, dest=dest@entry=0x7fffffff98c0,  count=count@entry=4, is_stderr=is_stderr@entry=0, timeout_ms=<optimized out>, timeout_ms@entry=-3) at /usr/src/debug/libssh-0.9.6-10.el8_8.x86_64/src/channels.c:2943
#7  0x00007ffff6528e4f in ssh_channel_read (channel=<optimized out>, dest=dest@entry=0x7fffffff98c0, count=count@entry=4,  is_stderr=is_stderr@entry=0) at /usr/src/debug/libssh-0.9.6-10.el8_8.x86_64/src/channels.c:2855
#8  0x00007ffff6552f44 in sftp_packet_read (sftp=sftp@entry=0x5555557dc600) at /usr/src/debug/libssh-0.9.6-10.el8_8.x86_64/src/sftp.c:462
#9  0x00007ffff6553370 in sftp_read_and_dispatch (sftp=0x5555557dc600) at /usr/src/debug/libssh-0.9.6-10.el8_8.x86_64/src/sftp.c:635
#10 0x00007ffff6555750 in sftp_write (file=0x5555557dcfb0, buf=<optimized out>, count=65536) at /usr/src/debug/libssh-0.9.6-10.el8_8.x86_64/src/sftp.c:2198
#11 0x00007ffff7b83982 in sftp_send () from /lib64/libcurl.so.4
#12 0x00007ffff7b5aca8 in Curl_write () from /lib64/libcurl.so.4
#13 0x00007ffff7b6e977 in Curl_readwrite () from /lib64/libcurl.so.4
#14 0x00007ffff7b78804 in multi_runsingle () from /lib64/libcurl.so.4
#15 0x00007ffff7b79a51 in curl_multi_perform () from /lib64/libcurl.so.4
#16 0x00007ffff7b7033b in curl_easy_perform () from /lib64/libcurl.so.4
#17 0x0000555555569da0 in operate_do (global=global@entry=0x7fffffffe100, config=0x55555578db40) at ../../src/tool_operate.c:1574
#18 0x000055555556b2ea in operate (config=0x7fffffffe100, argc=<optimized out>, argv=<optimized out>) at ../../src/tool_operate.c:2077
#19 0x000055555555d9dc in main (argc=8, argv=0x7fffffffe278) at ../../src/tool_main.c:262
(gdb)
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

It seems like there is an issue with flow control.
Diffing a 63K upload and 64K one, I see not much diff except for 63K upload, the server sends back some data which causes normal termination to occur.
On the 64K upload, nothing happens after last "ssh_packet_socket_controlflow_callback:  sending channel_write_wontblock callback" is seen.
See curl.63K and curl.64K debug outputs.

It's unclear to me who is culprit here, it may be the backend, but just FYI a transfer using openssh's sftp command just works fine.

The issue happens with latest Fedora's libssh as well (libssh-0.10.5-1.fc38.x86_64).
The issue doesn't happen when building curl with libssh2 instead of libssh.

Version-Release number of selected component (if applicable):

libssh-0.9.6-10.el8_8.x86_64 (RHEL 8.8)
libssh-0.10.5-1.fc38.x86_64 (Fedora 38)

How reproducible:

Always when uploading to end customer's system

Comment 4 Renaud Métrich 2023-08-07 14:03:07 UTC
Compiling curl with libssh (as RH does):

./configure --disable-shared --with-openssl --with-libssh

Compiling with libssh2:

./configure --disable-shared --with-openssl --with-libssh2

Comment 12 Jakub Jelen 2023-09-18 08:15:14 UTC
Moving back to libssh as we have a fix for libssh as a short-term solution.

Comment 13 RHEL Program Management 2023-09-20 00:42:42 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 14 RHEL Program Management 2023-09-20 00:43:57 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated.  Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.