Go GCC vs GC
For a while now I’ve been playing around with a stack-based language written in Go. The specifics of that project aren’t too important here, but I noticed something interesting today when I tried GCC (gccgo) with the project for the first time. First I wanted to make sure it worked, so I ran go-5 test, and happily all tests passed. Next, I wanted to see how the benchmarks looked, and this is where I was most surprised: every single benchmark I had written for the project was between 1.5x and 2x slower with GCC than with GC (the standard Go toolchain). The versions I used were:
gcc version 5.3.1 20160101 (Debian 5.3.1-5)
go version go1.4.2 gccgo (Debian 5.3.1-5) 5.3.1 20160101 linux/amd64
go version go1.5.2 linux/amd64
go version go1.4.3 linux/amd64
Go GC 1.4 and 1.5 ended up with very similar performance in my tests. The most interesting tests were two that iterate over a linked list. Below is a recreation of those benchmarks. Two things are really being tested here: interface method invocation, and direct field access through a struct pointer.
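```go
package main

// These benchmarks live in a _test.go file and run with
// "go test -bench ." (or "go-5 test -bench ." for gccgo).

import (
	"fmt"
	"testing"
)

// Lister is the interface traversed in BenchmarkLinkedList.
type Lister interface {
	Next() Lister
	Name() string
}

// List implements Lister with value receivers; link holds the next node.
type List struct {
	link Lister
	name string
}

func (l List) Name() string {
	return l.name
}

func (l List) Next() Lister {
	return l.link
}

// BenchmarkLinkedList walks a 999-node list through interface method calls.
func BenchmarkLinkedList(b *testing.B) {
	var l Lister
	for i := 1; i < 1000; i++ {
		l = List{link: l, name: fmt.Sprintf("test%d", i)}
	}
	// Nodes are named test1..test999, so "test0" never matches and
	// every iteration walks the whole list.
	first := "test0"
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		for ll := l; ll != nil; ll = ll.Next() {
			if ll.Name() == first {
				break
			}
		}
	}
}

// List2 is the same list without an interface: link is a concrete pointer.
type List2 struct {
	link *List2
	name string
}

// BenchmarkPureLinkedList walks the same list without interface method calls.
func BenchmarkPureLinkedList(b *testing.B) {
	var l *List2
	for i := 1; i < 1000; i++ {
		l = &List2{link: l, name: fmt.Sprintf("test%d", i)}
	}
	first := "test0"
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		for ll := l; ll != nil; ll = ll.link {
			if ll.name == first {
				break
			}
		}
	}
}
```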
As you can see, Go GC 1.4 and 1.5 aren’t too far off one another (with 1.5 a bit better in the BenchmarkLinkedList test). GCC, however, is roughly twice as slow as Go GC 1.5 in the first test and nearly 5x slower in the second.
Here’s the output of pprof for GC 1.5.2
And the output of pprof for GCC 5.3.1
To me, the interesting thing is that in both implementations, List.Name and List.Next each take almost as much time (over the 30-second profile) as PureLinkedList takes in total, even though PureLinkedList performs the same traversal using direct struct and pointer access.
The profiled program, shown below, was started with go run main.go. While it was running, I collected a profile with go tool pprof http://localhost:6060/debug/pprof/profile (the endpoint samples for 30 seconds by default) and then inspected it with top10.
This code is very similar to the benchmark code above, but exposes profiling data through net/http/pprof.
```go
package main

import (
	"flag"
	"fmt"
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/ handlers on the default mux
	"time"
)

// Lister, List and List2 are the same types as in the benchmarks.
type Lister interface {
	Next() Lister
	Name() string
}

type List struct {
	link Lister
	name string
}

func (l List) Name() string {
	return l.name
}

func (l List) Next() Lister {
	return l.link
}

// LinkedList walks the list 10,000 times through interface method calls.
func LinkedList(l Lister) {
	first := "test0"
	for i := 0; i < 10000; i++ {
		for ll := l; ll != nil; ll = ll.Next() {
			if ll.Name() == first {
				break
			}
		}
	}
}

type List2 struct {
	link *List2
	name string
}

// PureLinkedList walks the list 10,000 times without interface method calls.
func PureLinkedList(l *List2) {
	first := "test0"
	for i := 0; i < 10000; i++ {
		for ll := l; ll != nil; ll = ll.link {
			if ll.name == first {
				break
			}
		}
	}
}

// cpuprofile is declared but never read; the profile is collected over HTTP.
var cpuprofile = flag.String("cpuprofile", "", "write cpu profile to file")

func main() {
	// Build two 9,999-node lists named test1..test9999.
	var l Lister
	for i := 1; i < 10000; i++ {
		l = List{link: l, name: fmt.Sprintf("test%d", i)}
	}
	var ll *List2
	for i := 1; i < 10000; i++ {
		ll = &List2{link: ll, name: fmt.Sprintf("test%d", i)}
	}

	// Serve the pprof endpoints on localhost:6060 in the background.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// Alternate between the two traversals forever so a profile taken
	// at any point samples both.
	for {
		LinkedList(l)
		time.Sleep(time.Millisecond * 300)
		PureLinkedList(ll)
		time.Sleep(time.Millisecond * 300)
	}
}
```
I suppose the takeaway here is that direct struct access is going to be faster than interface method calls. That isn’t news, or even hard to believe, but it is a reminder that tight loops that need speed will do better with direct struct access than with interface method calls. Though, and hopefully this is obvious, you should profile your application before optimizing code.