Original post


I’m looking for some general advice on optimizing for performance.

I have a rather complex bit of code, but the code lives in a function (no globals), so I can either a) call this function in multiple goroutines:

output := make(chan int, 4)

for w := 0; w < 4; w++ {

for a := 0; a < 4; a++ {

Or b) use the same function, compile my Go program, and run the binary 4 times in parallel.
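For reference, option a) might look roughly like the complete program below. This is only a minimal sketch: `work` and `runWorkers` are hypothetical names standing in for the real function and its launcher, and the loop body inside `work` is made-up CPU-bound filler.

```go
package main

import "fmt"

// work is a hypothetical stand-in for the real, self-contained function.
// It only touches its argument and local variables (no globals).
func work(id int) int {
	sum := 0
	for i := 0; i < 1000000; i++ {
		sum += (i ^ id) & 0xff
	}
	return sum
}

// runWorkers starts n goroutines and collects one result from each,
// in whatever order the goroutines finish.
func runWorkers(n int) []int {
	output := make(chan int, n) // buffered so no worker blocks on send
	for w := 0; w < n; w++ {
		go func(id int) {
			output <- work(id)
		}(w)
	}
	results := make([]int, 0, n)
	for a := 0; a < n; a++ {
		results = append(results, <-output)
	}
	return results
}

func main() {
	fmt.Println(len(runWorkers(4)), "results received")
}
```

Option b) would be the same `work` called once from `main`, with the binary started 4 times from the shell.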

When I use 4 goroutines, it takes about 2.5x as long as running the compiled program 4 times in parallel! And 4 copies of the program running in parallel seem to run at the same speed as a single copy, i.e. near-perfect scaling.

As far as I can tell (still a Go-newb though), none of the goroutines accesses the same data, and I made deep copies for all the structs they process.

I would love any tips on how to proceed. I want to scale to 32 cores, but after testing on up to 12 cores so far, I get rapidly diminishing returns.
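One way to put numbers on the diminishing returns is to time the same workload with 1, 2, 4, ... goroutines inside a single process. The sketch below assumes a purely CPU-bound function; `spin` and `timeRun` are hypothetical names, not the real code. With perfect scaling, each line should report roughly the same wall-clock time as the single-goroutine run, as long as the goroutine count stays at or below the number of cores. It also prints `runtime.GOMAXPROCS(0)`, since that value limits how many goroutines can execute Go code simultaneously.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// spin is a hypothetical CPU-bound stand-in for the real function.
// It uses only local state and allocates nothing.
func spin(seed int) int {
	x := seed
	for i := 0; i < 50000000; i++ {
		x = x*1664525 + 1013904223
	}
	return x
}

// timeRun runs n copies of spin concurrently and returns the elapsed
// wall-clock time until all of them have finished.
func timeRun(n int) time.Duration {
	var wg sync.WaitGroup
	results := make([]int, n) // each goroutine writes only its own slot
	start := time.Now()
	for w := 0; w < n; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			results[id] = spin(id)
		}(w)
	}
	wg.Wait()
	return time.Since(start)
}

func main() {
	// GOMAXPROCS(0) reports the current setting without changing it.
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0), "NumCPU:", runtime.NumCPU())
	for n := 1; n <= runtime.NumCPU(); n *= 2 {
		fmt.Printf("%2d goroutines: %v\n", n, timeRun(n))
	}
}
```

If this toy workload scales well but the real function does not, that would suggest the slowdown comes from something the goroutines share inside one process (for example the allocator and garbage collector) rather than from the goroutines themselves.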