Subject: Busy loop in libssh2_sftp_write

From: Radu Rendec <radu.rendec_at_gmail.com>
Date: Fri, 6 Dec 2019 14:13:37 -0500

Hi Everyone,

I'm observing some strange behavior in libssh2_sftp_write and I think it
may be going into a busy loop. I'm using an older library version (on an
embedded system), but that part of the code looks identical to the
latest master branch on GitHub.

The busy loop seems to be the loop that the BLOCK_ADJUST macro inserts.
What I see is that sftp_write returns immediately with an error (the
underlying send() call returns with an error as well). Then the call to
_libssh2_wait_socket returns 0 immediately. On closer inspection, the
poll() inside _libssh2_wait_socket has both POLLIN and POLLOUT set in
the requested events bitmask (events), but only POLLIN set in the
returned events bitmask (revents). That explains why poll() returns 1
immediately and yet the subsequent call to send() fails.
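
The socket state described above can be reproduced outside libssh2 with a
minimal sketch like the one below. It assumes a POSIX system and uses an
AF_UNIX socketpair as a stand-in for the TCP connection. The names
make_stuck_pair and poll_stuck_socket are made up for this illustration
and are not libssh2 functions. The EAGAIN/EWOULDBLOCK error is what a
non-blocking socket with a full TX queue is expected to report; it is an
assumption of the sketch, not a value taken from the observed failure.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: leave sv[0] with one byte pending in its RX queue
 * and a TX queue so full that send() can no longer make progress. */
static void make_stuck_pair(int sv[2])
{
    char chunk[4096] = {0};
    int rc, flags;

    rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    assert(rc == 0);
    flags = fcntl(sv[0], F_GETFL, 0);
    rc = fcntl(sv[0], F_SETFL, flags | O_NONBLOCK);
    assert(flags >= 0 && rc == 0);
    (void)rc;

    /* The peer writes one byte, so sv[0] becomes readable. */
    (void)send(sv[1], "x", 1, 0);

    /* Nobody reads from sv[1], so writes on sv[0] eventually stall. */
    while (send(sv[0], chunk, sizeof chunk, 0) > 0) {
        /* keep filling in large chunks */
    }
    while (send(sv[0], chunk, 1, 0) > 0) {
        /* top up byte by byte until nothing more fits */
    }
}

/* Hypothetical demo: poll sv[0] of a stuck pair for `events`, then try a
 * one-byte send(). Stores revents and the errno of that send (0 if it
 * succeeded). Returns whatever poll() returned. */
int poll_stuck_socket(short events, int timeout_ms,
                      short *revents, int *send_errno)
{
    int sv[2];
    struct pollfd pfd;
    int rc;

    make_stuck_pair(sv);

    pfd.fd = sv[0];
    pfd.events = events;
    pfd.revents = 0;
    rc = poll(&pfd, 1, timeout_ms);
    *revents = pfd.revents;

    errno = 0;
    *send_errno = (send(sv[0], "y", 1, 0) < 0) ? errno : 0;

    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

Calling poll_stuck_socket(POLLIN | POLLOUT, 1000, &revents, &send_errno)
should return 1 right away instead of waiting out the one-second timeout,
with POLLIN set and POLLOUT clear in revents, while the follow-up send()
still fails. A caller that retries the write after every such wakeup
would spin without blocking.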

I get that _libssh2_wait_socket is a generic function, but I believe in
this particular case it should only poll for writing (i.e. POLLOUT),
since the other function call in the outer loop only tries to write.
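
As a rough sketch of the write-only wait suggested here (not the actual
libssh2 code, and not a proposed patch), the helper below polls for
POLLOUT alone. wait_writable and demo_write_only_wait are hypothetical
names, and the demo again assumes a POSIX system with an AF_UNIX
socketpair standing in for the TCP connection.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: wait until fd is writable, ignoring readability.
 * Returns 1 if writable, 0 on timeout, -1 on error. */
int wait_writable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = fd;
    pfd.events = POLLOUT;   /* deliberately no POLLIN */
    pfd.revents = 0;
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);
    return rc;
}

static void set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    int rc = fcntl(fd, F_SETFL, flags | O_NONBLOCK);
    assert(flags >= 0 && rc == 0);
    (void)rc;
}

/* Hypothetical demo: returns the result of wait_writable() while sv[0]
 * has RX data pending and a full TX queue, and stores the result of a
 * second wait, made after the peer has drained its side, in
 * *rc_after_drain. */
int demo_write_only_wait(int timeout_ms, int *rc_after_drain)
{
    char chunk[4096] = {0};
    int sv[2];
    int rc, rc_stuck;

    rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    assert(rc == 0);
    (void)rc;
    set_nonblocking(sv[0]);
    set_nonblocking(sv[1]);

    /* RX data pending on sv[0], then fill the TX queue of sv[0]. */
    (void)send(sv[1], "x", 1, 0);
    while (send(sv[0], chunk, sizeof chunk, 0) > 0) {
        /* keep filling until send() would block */
    }

    rc_stuck = wait_writable(sv[0], timeout_ms);

    /* The peer reads everything, which frees TX space on sv[0]. */
    while (recv(sv[1], chunk, sizeof chunk, 0) > 0) {
        /* keep draining */
    }
    *rc_after_drain = wait_writable(sv[0], timeout_ms);

    close(sv[0]);
    close(sv[1]);
    return rc_stuck;
}
```

With a short timeout such as 100 ms, the first wait is expected to time
out (return 0) even though the socket is readable, so the caller sleeps
in poll() instead of spinning. Once the peer drains its queue, the second
wait should report the socket as writable.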

I assume the socket has some data in the RX queue and the TX queue is
full. How the socket ends up in that state is a different story (and I
have a long and clear explanation of how it happens in my case). But I
believe this is likely to happen easily when the underlying transport
channel is flaky. For instance, the local TCP end receives some data,
but then the TX window doesn't advance and data piles up in the TX
queue until it fills up the queue.

I am unfamiliar with the libssh2 code base, so I might be missing
something. Please let me know if this makes any sense. Thanks!

Best regards,
Radu Rendec
_______________________________________________
libssh2-devel https://cool.haxx.se/cgi-bin/mailman/listinfo/libssh2-devel
Received on 2019-12-06