Subject: Re: libssh2_sftp_write doesn't hang anymore

Re: libssh2_sftp_write doesn't hang anymore

From: Daniel Stenberg <daniel_at_haxx.se>
Date: Thu, 11 Nov 2010 12:46:51 +0100 (CET)

On Wed, 10 Nov 2010, Mark Riordan wrote:

> My upload test from Windows to Linux now works!

Oh. It proves we are at least still making progress...

> The 19 MB debug log has 149009 "Would block" messages, which seems
> excessive - one for every 4 bytes.

(Once for every 4 byte of _your_ payload data, yes, but SFTP, SSH channels and
SSH transport will add a notiacble amount of data to send.)

It certainly seems a bit excessive yes. I know that we do a few too many "try
this just in case" calls so that there can be a few too many per invoke of a
the sftp_write function, but it does give me the feeling that we're not doing
the waiting for the socket activity correctly.

A detail in the log:

We get excessive -37 logged, but they are all "lumped up" and get done on *the
same microsecond*. See for example at the time 0.414451 where 966 lines are
reported on the same timestamp. Then there's a 16 millisecond pause, where
presumably the code waits for the socket to turn writable again. Then at
0.430076 there's an additional 1320 lines logged at *the same microsecond*.

I figure the GetSystemTimeAsFileTime() function we use for this simply has
this low resolution so we get this inexactness in the timings.

> My application recorded that calls to libssh2_sftp_write generally resulted
> in 4000 or 8000 bytes written. Never less than 500.

Right, as you use a 32500 bytes buffer and libssh2 will bite off 4000 bytes at
a time from that so the SFTP packets will become 4000*8 + 500*1 (of payload).
How many bytes of those that are actually sent off depends on the window size
in the SSH channel and how much the TCP layer accepts. You will probably gain
an unmeasureably small amount of speed by using even 4000 byte blocks.

I've just now pushed yet another change to the sftp_write code that concerns
situations where more data is ACKed already than the buffer size passed in to
the function as that would cause major weirdness. I don't think you've seen
effects of this, as I believe things would go down the drain hard if so.

Some ideas on how to move further from here...

- Start using a smaller test file. Your problem is probably just as visible
with a 100K file. Or perhaps you should even try a much smaller, like a 10K
one to see how that behaves.

- Since we're now down to "just" being slow, you can try to add more output
logs again to see if they reveal anything.

- We need to figure out where the EAGAINs come from and make sure libssh2 acts
correctly on them, but it a bit hard when the logging slows things down this
much. Possibly you should make your example use the libssh2_trace_sethandler()
function to get the tracing done by your own function, and then you just keep
all the logging in memory until the test run is over and then save it to disk
as that is probably gonna be A LOT faster although of course quite memory
consuming.

-- 
  / daniel.haxx.se
_______________________________________________
libssh2-devel http://cool.haxx.se/cgi-bin/mailman/listinfo/libssh2-devel
Received on 2010-11-11