
Hi all,

I want to share the performance analysis I recently did for SFTPGo, the fully featured and highly configurable SFTP server written in Go (https://github.com/drakkan/sftpgo).

When I decided to write an SFTP server I evaluated the available libraries and I did some quick tests too.

Golang’s SSH stack and pkg/sftp were able to easily saturate a Gigabit connection and this seemed enough. I noticed that OpenSSH was faster, but I didn’t investigate further.

So I chose Golang for SFTPGo.

The project is growing fast and one of the users also noticed that SFTPGo has lower performance than OpenSSH. He opened an issue providing some stats when using a 40Gb Ethernet card.

I did some more profiling and discovered that the main bottlenecks are, unsurprisingly, the cipher used and the message authentication. So we can get a huge performance boost by using a fast cipher with implicit message authentication, for example aes12…@openssh.com. However, this cipher is not widely supported.

The most used cipher, and the one used in the user’s tests, is AES-CTR, and the Golang implementation seems quite slow.

He noticed that an unmerged patch is available for Golang, greatly improving AES-CTR performance:

I applied this patch and, while performance improved, the AES-CTR SFTP transfers were still slower than the AES-GCM ones. The bottleneck is now the MAC computation.

The tested hardware supports Intel SHA extensions but Golang’s SHA256 implementation only uses the AVX2 extension.

Again, I was lucky: I can simply use minio/sha256-simd as a drop-in replacement for Golang’s SHA256 implementation:

The performance improved again, but OpenSSH was still faster.

So this time I had to look at pkg/sftp: I found some extraneous copies/allocations in critical paths and I sent some pull requests that are now merged in the master branch. Still, SFTP transfers were slower than OpenSSH ones.

Compared to my SCP implementation the main difference is that pkg/sftp allocates a new slice for each SFTP packet, while my SCP implementation allocates a slice once and then reuses this slice. 

Basically for each SFTP packet pkg/sftp does something like this:

data := make([]byte, size)

So I wrote a proof of concept allocator that tries to avoid all these extra allocations by reusing the previously allocated slices:
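The general idea can be sketched like this. It is a simplified illustration, not the proof of concept itself; the type pageAllocator and its methods are hypothetical names and not part of pkg/sftp. Buffers of a fixed size are handed out on request and kept for reuse once released, so make is only called when no free buffer is available.

```go
package main

import (
	"fmt"
	"sync"
)

// pageAllocator (hypothetical) hands out fixed-size buffers and reuses the
// ones that were released, instead of calling make for every packet.
type pageAllocator struct {
	mu   sync.Mutex
	size int
	free [][]byte
}

func newPageAllocator(size int) *pageAllocator {
	return &pageAllocator{size: size}
}

// Get returns a previously released buffer if one is available,
// otherwise it allocates a new one.
func (a *pageAllocator) Get() []byte {
	a.mu.Lock()
	defer a.mu.Unlock()
	if n := len(a.free); n > 0 {
		buf := a.free[n-1]
		a.free = a.free[:n-1]
		return buf
	}
	return make([]byte, a.size)
}

// Put makes a buffer available for reuse. Buffers that are too small
// are dropped and left to the garbage collector.
func (a *pageAllocator) Put(buf []byte) {
	if cap(buf) < a.size {
		return
	}
	a.mu.Lock()
	defer a.mu.Unlock()
	a.free = append(a.free, buf[:a.size])
}

func main() {
	alloc := newPageAllocator(32 * 1024)

	first := alloc.Get() // nothing to reuse yet, so this allocates
	alloc.Put(first)     // packet handled, release the buffer

	second := alloc.Get() // served from the free list
	fmt.Println("reused:", &first[0] == &second[0])
}
```

The caller must not touch a buffer after releasing it with Put, which is the tricky part when packets are handled by several goroutines. The standard library's sync.Pool is an alternative for this kind of reuse.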

And bingo! Now SFTPGo performance is very close to OpenSSH! You can find the full benchmark results here:

Conclusion: I see several complaints about Go performance, especially compared to Rust, but, at least in my use case, Go can be as fast as a C project (such as OpenSSH). However, this requires some special attention, so the improved performance is not available by default to all users.

Now some questions:

1) For the pkg/sftp allocator, I’m discussing with the pkg/sftp maintainers to find a way to get it included. Do you have smarter ideas to avoid these allocations?

2) A patch for AES-CTR has been available for Golang since 2017. Why has it not been merged yet?

3) To improve SHA computation performance, I cannot find anything usable in Golang itself. Is there any plan to have support for Intel SHA Extensions and AVX512 directly in Golang anytime soon?

Thank you for this great programming language. It makes it really simple to add new features to SFTPGo!

regards,

Nicola