Original post

I have started learning goroutines and I am a little puzzled by concurrency and parallelism, so I thought the best way to grasp them is to start writing small examples that demonstrate their usage. That being said, I am thinking of writing a program that just creates an empty file (e.g. touch file) on my hard disk.

  • What would be the fastest way of doing that in Go, running on multi-threaded CPU hardware?
  • Do I really need to use channels? I think not; a WaitGroup should suffice, as all I want is to make sure that main() does not return until all goroutines have finished their job.
  • How many goroutines should run in parallel? Since I have only 8 logical CPU cores available, I guess I can run only 8 goroutines in parallel at a time. Does that mean I have to split the work into chunks of 8 files per goroutine?
  • How can I benchmark this so I can draw my conclusions?

Hi, @drpaneas, if you’re creating these empty files in a single directory on a single filesystem, then I doubt having multiple cores will result in any improvement over a single core. I suspect you’ll get the best performance with a single goroutine. Creating an empty file doesn’t write any file data to the disk; it only updates the file table/inode tree (depending on the filesystem). If you’re putting these files into different directories, some filesystem drivers may be able to handle concurrent accesses to those directories on the disk and you may benefit from having one goroutine per directory. Even in that case, sequential disk access is usually faster than random, so it still might be better to use a single goroutine.
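To make the two approaches concrete, here is a minimal sketch that creates empty files first from a single goroutine and then with one goroutine per file, using only a sync.WaitGroup to wait for completion. The helper names (touch, createSequential, createConcurrent, demo) and the file name patterns are made up for illustration, and the files go into a temporary directory.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// touch creates an empty file at path (truncating it if it already exists).
func touch(path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	return f.Close()
}

// createSequential creates n empty files in dir from a single goroutine.
func createSequential(dir string, n int) error {
	for i := 0; i < n; i++ {
		if err := touch(filepath.Join(dir, fmt.Sprintf("seq-%d", i))); err != nil {
			return err
		}
	}
	return nil
}

// createConcurrent creates n empty files in dir, one goroutine per file.
func createConcurrent(dir string, n int) error {
	var wg sync.WaitGroup
	errs := make([]error, n) // each goroutine writes only to its own slot
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			errs[i] = touch(filepath.Join(dir, fmt.Sprintf("con-%d", i)))
		}(i)
	}
	wg.Wait() // block until every goroutine has called Done
	for _, err := range errs {
		if err != nil {
			return err
		}
	}
	return nil
}

// demo runs both variants in a temporary directory and returns the number
// of files found there afterwards.
func demo(n int) (int, error) {
	dir, err := os.MkdirTemp("", "touch-demo")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	if err := createSequential(dir, n); err != nil {
		return 0, err
	}
	if err := createConcurrent(dir, n); err != nil {
		return 0, err
	}
	entries, err := os.ReadDir(dir)
	return len(entries), err
}

func main() {
	count, err := demo(100)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("files created:", count)
}
```

No channel is involved: the WaitGroup alone keeps main() from returning early. Which variant is faster is exactly the thing to measure rather than assume.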

I recommend implementing some benchmarks with the testing package’s B type to compare performance as you adjust different variables.
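As a rough sketch of what such a benchmark could look like: normally you would put BenchmarkXxx(b *testing.B) functions in a _test.go file and run go test -bench=. but to keep this runnable as a single file, it calls testing.Benchmark from main instead. The helper names (createFiles, createAndCount, benchCreate), the file count and the worker counts are arbitrary choices for illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"sync"
	"testing"
)

// createFiles creates n empty files in dir, split across the given number
// of worker goroutines (worker w handles files w, w+workers, ...).
func createFiles(dir string, n, workers int) error {
	var wg sync.WaitGroup
	errs := make([]error, workers) // one slot per worker, so no locking is needed
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := w; i < n; i += workers {
				f, err := os.Create(filepath.Join(dir, strconv.Itoa(i)))
				if err == nil {
					err = f.Close()
				}
				if err != nil {
					errs[w] = err
					return
				}
			}
		}(w)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return err
		}
	}
	return nil
}

// createAndCount runs createFiles in a fresh temporary directory, counts the
// resulting files, and removes the directory again.
func createAndCount(n, workers int) (int, error) {
	dir, err := os.MkdirTemp("", "bench-files")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	if err := createFiles(dir, n, workers); err != nil {
		return 0, err
	}
	entries, err := os.ReadDir(dir)
	return len(entries), err
}

// benchCreate returns a benchmark body for the given worker count.
// Directory setup and cleanup are included in the timing; they are the
// same for every variant, so the numbers are still comparable.
func benchCreate(n, workers int) func(b *testing.B) {
	return func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			if _, err := createAndCount(n, workers); err != nil {
				b.Fatal(err)
			}
		}
	}
}

func main() {
	for _, workers := range []int{1, 4, 16} {
		res := testing.Benchmark(benchCreate(200, workers))
		fmt.Printf("workers=%-2d %s\n", workers, res)
	}
}
```

The numbers will depend heavily on the filesystem and the storage behind the temporary directory, so treat them as a comparison on your machine only.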

You’ll probably also get very different performance characteristics:

  • If the files are non-empty
  • If you change the filesystem you’re writing to, even if it’s just from ext2 to ext3, for example.
  • Between solid state and mechanical drives: A mechanical drive might be slower, but might benefit from concurrency. That’s probably more true if you’re dealing with non-empty files because of how reads from/writes to mechanical drives can be reorganized by the operating system.
  • Based on the hardware and/or software interface between the computer and the storage (e.g. SATA vs. SAS vs. NAS vs. cloud).

Hi @skillian, thank you for your reply :smiley: Indeed, I see different numbers running the code on ext3 and BTRFS, which are different filesystems. Do you have any tips/experience for file constructors? I am also thinking of creating a port scanner against all the TCP ports. To do that, the system creates a socket, which is a file on my Linux disk. Do you think a single goroutine would perform better than multiple ones?

And another question: since I have 6 cores (so 12 with hyperthreading), would it ever make sense for me to run a function in parallel with more than 12 goroutines?

Hi, @drpaneas, I didn’t ignore your reply, just been a bit hectic lately!

No experience :smile:!

I would be surprised if the creation of a socket creates a real file in the filesystem. Most (if not all) files in the /sys and /proc directories are “virtual” files. Just like how Python takes object-oriented programming to the max and says “everything is an object”, the *nix analogy is that “everything is a file.”

Reading from or writing to these virtual files doesn’t actually touch the disk; the operating system interprets IO to/from these files specially. For example, the command cat /dev/zero > /dev/null will run forever. It reads data from the /dev/zero pseudo-device (which always “reads” arrays of bytes whose values are all zero) and dumps it into the /dev/null pseudo-device (which accepts any writes and just discards the data).
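The same idea can be tried from Go. This is a small sketch, assuming a Unix-like system that provides /dev/zero and /dev/null; the function name copyZeros is made up. It copies a fixed number of bytes, because an unbounded io.Copy would run forever just like the cat command.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// copyZeros copies n bytes from /dev/zero to /dev/null and returns how many
// bytes were copied. Assumes both pseudo-devices exist on this system.
func copyZeros(n int64) (int64, error) {
	src, err := os.Open("/dev/zero")
	if err != nil {
		return 0, err
	}
	defer src.Close()

	dst, err := os.OpenFile("/dev/null", os.O_WRONLY, 0)
	if err != nil {
		return 0, err
	}
	defer dst.Close()

	// io.CopyN stops after n bytes; a plain io.Copy would never return
	// because /dev/zero never reports end of file.
	return io.CopyN(dst, src, n)
}

func main() {
	copied, err := copyZeros(1 << 20)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("bytes copied:", copied)
}
```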

When you create a socket, it’s probably exposed through the VFS as a file, but it’s not an actual file written to the disk, so the bottleneck of writing to the disk doesn’t apply.

Regarding your question about using more than 12 goroutines on a 12-core server, the TL;DR answer is: The number of goroutines you should use is (usually*) not dependent on the number of physical or virtual cores on your server.
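A quick way to see this for yourself is a sketch like the one below, where time.Sleep stands in for a blocking I/O wait (an assumption made purely for illustration; waitAllMillis is a made-up name). It starts far more goroutines than there are cores, and they all wait at the same time instead of queueing up behind the core count.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// waitAllMillis starts n goroutines that each block for ms milliseconds
// (standing in for an I/O wait) and returns the total elapsed milliseconds.
func waitAllMillis(n, ms int) int64 {
	start := time.Now()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			time.Sleep(time.Duration(ms) * time.Millisecond)
		}()
	}
	wg.Wait()
	return time.Since(start).Milliseconds()
}

func main() {
	n := 100 * runtime.NumCPU()
	elapsed := waitAllMillis(n, 50)
	fmt.Printf("%d goroutines on %d logical CPUs finished in about %d ms\n",
		n, runtime.NumCPU(), elapsed)
}
```

If the goroutines could only make progress one per core, the total would be many times the 50 ms wait; in practice it should be close to a single wait, because a blocked goroutine does not occupy a core.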

I am not intimately familiar with Linux’s (or any OS’s) network drivers/stack, so I don’t know if sockets can be opened in parallel or not, but I want to say that the idea of goroutines is not really about parallelism, it’s about concurrency:

Just like how in C#, Tasks are not necessarily about parallelism. They’re a way to make efficient use of OS resources (e.g. OS-level threads) by multiplexing multiple tasks on the same OS thread: whenever an operation needs to block, you await it, which passes control back to the CLR so it can schedule another task on the thread. Similarly, in Python’s asyncio module and in Node, there’s an event loop that runs non-blocking code and delegates blocking operations to some other resource (sometimes, I believe, a thread).

For better or worse, in Go, I use goroutines the same way I use tasks in C# except that you don’t have to worry about CPU- vs. I/O- intensive operations and whether it should run in Task.Run or not, or if a function is async vs. synchronous. Those issues go away in Go and the question instead becomes about concurrency: “Does C require B and does B require A or can I do A and B at the same time and I only need the result of both for C?”
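Here is a minimal sketch of that last question, where A and B are independent and C needs both results. The functions stepA, stepB and stepC are hypothetical placeholders that just return small numbers.

```go
package main

import (
	"fmt"
	"sync"
)

// stepA, stepB and stepC are hypothetical stand-ins for real work.
func stepA() int          { return 2 }
func stepB() int          { return 3 }
func stepC(a, b int) int { return a * b }

// runABC runs A and B concurrently, waits for both, then runs C.
func runABC() int {
	var a, b int
	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		a = stepA() // only this goroutine writes a
	}()
	go func() {
		defer wg.Done()
		b = stepB() // only this goroutine writes b
	}()
	wg.Wait() // C needs both results, so wait for A and B here
	return stepC(a, b)
}

func main() {
	fmt.Println(runABC()) // prints 6
}
```

Nothing here depends on how many cores the machine has; the structure only expresses which steps depend on which.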

* There are times that taking the physical hardware of the computer into account is beneficial: If you’re executing a mathematical operation over a large data set, it might make sense to subdivide that operation into multiple chunks to better utilize your hardware and it might make sense to limit the number of chunks based on the number of (probably physical) cores. In your use case, though, I do not believe you will get significant differences in performance characteristics by increasing or limiting your number of goroutines to your physical or virtual core count.
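For that CPU-bound case, a sketch could look like this: a large slice is summed in chunks, with the chunk count taken from runtime.NumCPU(). Note that NumCPU reports logical CPUs, not physical cores, and that sumChunked is a made-up name for illustration.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// sumChunked splits data into at most `chunks` contiguous pieces, sums each
// piece in its own goroutine, and adds up the partial sums.
func sumChunked(data []int, chunks int) int {
	if len(data) == 0 {
		return 0
	}
	if chunks < 1 {
		chunks = 1
	}
	if chunks > len(data) {
		chunks = len(data)
	}
	size := (len(data) + chunks - 1) / chunks // ceiling division
	partial := make([]int, chunks)            // one slot per goroutine

	var wg sync.WaitGroup
	for c := 0; c < chunks; c++ {
		lo, hi := c*size, (c+1)*size
		if lo > len(data) {
			lo = len(data)
		}
		if hi > len(data) {
			hi = len(data)
		}
		wg.Add(1)
		go func(c int, part []int) {
			defer wg.Done()
			for _, v := range part {
				partial[c] += v
			}
		}(c, data[lo:hi])
	}
	wg.Wait()

	total := 0
	for _, p := range partial {
		total += p
	}
	return total
}

func main() {
	data := make([]int, 1000000)
	for i := range data {
		data[i] = i + 1
	}
	// One chunk per logical CPU reported by the runtime.
	fmt.Println(sumChunked(data, runtime.NumCPU()))
}
```

Whether this beats a plain loop depends on the size of the data and the cost of each operation, so it is worth benchmarking before committing to it.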
