Subject: Re: Packet drop on full socket problem

From: Daniel Stenberg <daniel_at_haxx.se>
Date: Wed, 6 Oct 2010 23:13:38 +0200 (CEST)

On Fri, 17 Sep 2010, Thomas Rauscher wrote:

(First, I'm sorry it's taken me this long to respond...)

> I think I've found a problem that occurs when writing to the send socket
> returns -1 (EAGAIN). Additional preconditions to trigger the problem are
> writing in larger chunks than the advertized window size, e.g. 128k writes
> vs. 12k window size.

I don't quite understand your problem. I'll add my questions and thoughts
inline below.

> The remote side is a dropbear SSH server which seems to use 12k window size
> increments. This means that packets need to be split very often. If
> additionally the socket buffer gets full, the saved packet is never sent.
>
> A workaround is to use smaller writes (1k), but this only hides the problem.
>
> Details:
>
> 1) The application calls _libssh2_channel_write(..., 128*1024);

First, _libssh2_channel_write() internally ignores everything beyond the
first 32768 bytes of the buffer, so each invocation tries to send at most
32768 bytes.

The function will/should then make sure that it doesn't try to send any more
data than the remote has a window for. In this case, it should further
decrease the amount of data this function will attempt to send.
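
As a rough illustration of the clamping described above (a sketch only: the
names MAX_WRITE_CHUNK and clamp_write_len are made up for this mail and are
not libssh2's actual code), the amount attempted per call is the smallest of
the requested length, the 32768-byte cap and the remote window:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constant mirroring the 32768-byte per-call cap. */
#define MAX_WRITE_CHUNK 32768

/* Returns how many bytes a single write call would attempt: the smallest
 * of the requested length, the per-call cap and the remote window. */
static size_t clamp_write_len(size_t requested, size_t remote_window)
{
    size_t n = requested;
    if (n > MAX_WRITE_CHUNK)
        n = MAX_WRITE_CHUNK;   /* ignore everything beyond the cap */
    if (n > remote_window)
        n = remote_window;     /* never exceed what the peer allows */
    return n;
}

/* Usage example with the numbers from this thread. */
static void clamp_examples(void)
{
    /* 128K write against a 12K window: only 12K is attempted. */
    assert(clamp_write_len(128 * 1024, 12 * 1024) == 12 * 1024);
    /* 128K write against a large window: the 32768 cap applies. */
    assert(clamp_write_len(128 * 1024, 1024 * 1024) == 32768);
}
```

With a 12K window, a 128K request therefore shrinks to 12K in a single call.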

> * In _libssh2_transport_write()
>
> _libssh2_send returns -1 (EAGAIN) and the current packet is saved to
> p->odata, p->olen ...

You mean that it returns EAGAIN immediately or after having sent the first 12K
of data? I assume you mean that it first sends some data and then when it
loops it gets EAGAIN back.

> * _libssh2_transport_write() returns LIBSSH2_ERROR_EAGAIN to
> _libssh2_channel_write() which executes
>
> if(wrote) {
> _libssh2_transport_drain(session);
> goto _channel_write_done;
> }

... as it would only execute that if 'wrote' actually wasn't zero.

> _libssh2_transport_drain() frees p->outbuf and sets it to NULL.
>
> * _libssh2_transport_write then returns "wrote" (12k) to the application.

Right, as it did in fact successfully send away 12K.

> 2) Application calls _libssh2_channel_write(..., 128*1024) again.

Right, but that buffer should now be pointing 12K further into the data as 12K
was in fact sent in the previous invocation.
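
To make that expectation concrete, here is a minimal sketch of an
application-side loop that advances the buffer by whatever each call
accepted. Everything in it is an assumption for illustration: write_fn,
WRITE_EAGAIN, fake_write and struct fake_peer are hypothetical stand-ins,
not libssh2 API, and the fake writer just accepts up to "window" bytes per
call before reporting that it would block.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder "would block" code; the value is arbitrary for this sketch. */
#define WRITE_EAGAIN (-1000)

/* Hypothetical non-blocking writer: returns bytes accepted (> 0),
 * WRITE_EAGAIN when it would block, or another negative error. */
typedef long (*write_fn)(void *ctx, const char *buf, size_t len);

/* Sends as much of buf as the writer accepts, always continuing from the
 * first unsent byte. Returns the number of bytes accepted, or a negative
 * error. *would_block is set to 1 if the writer asked us to retry later. */
static long write_until_blocked(write_fn w, void *ctx, const char *buf,
                                size_t len, int *would_block)
{
    size_t sent = 0;

    assert(buf != NULL && would_block != NULL);
    *would_block = 0;
    while (sent < len) {
        long rc = w(ctx, buf + sent, len - sent);  /* skip sent bytes */
        if (rc == WRITE_EAGAIN) {
            *would_block = 1;
            break;
        }
        if (rc < 0)
            return rc;          /* real error: report it as-is */
        if (rc == 0)
            break;              /* nothing accepted: avoid spinning */
        sent += (size_t)rc;
    }
    return (long)sent;
}

/* Fake peer: accepts at most 'window' bytes per call, for 'calls_left'
 * calls, then reports WRITE_EAGAIN. */
struct fake_peer {
    size_t window;
    size_t calls_left;
};

static long fake_write(void *ctx, const char *buf, size_t len)
{
    struct fake_peer *p = (struct fake_peer *)ctx;
    (void)buf;
    if (p->calls_left == 0)
        return WRITE_EAGAIN;
    p->calls_left--;
    return (long)(len < p->window ? len : p->window);
}
```

The caller keeps its own offset: after a call that accepted 12K, the next
call is made with the buffer pointer moved 12K forward and the length
reduced by 12K.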

> _libssh2_transport_write() now calls send_existing() first which
> immediately returns because p->outbuf is NULL.
>
> if (!p->outbuf) {
> *ret = 0;
> return LIBSSH2_ERROR_NONE;
> }

Right, there's nothing saved there. What do you think it should have saved
there?

> * This results in not sending the saved packet, but sending the next packet.
> The SSH server then bails out and terminates the connection (saying "bad
> packet size").

... as you can see I didn't follow how it ended up like this! I'll get myself
a dropbear install and see if I can repeat this. Is uploading data with a 128K
buffer enough to trigger it? Like with the sftp_write_nonblock.c example?

-- 
  / daniel.haxx.se
Received on 2010-10-06