For awhile now I’ve been playing around with a stack based language written in Go. The specifics of that project aren’t too important here, but I did notice something interesting today as I tried GCC for the first time with the project. First I wanted to make sure it worked so I ran go-5 test and happily all tests passed. Next, I thought I’d see how the benchmark looked. Here I think I was most surprised. Every single benchmark I had written was between 1.5x & 2x slower with GCC compared to GC. The versions I used were

gcc version 5.3.1 20160101 (Debian 5.3.1-5) go version go1.4.2 gccgo (Debian 5.3.1-5) 5.3.1 20160101 linux/amd64

go version go1.5.2 linux/amd64

go version go1.4.3 linux/amd64

Go GC 1.4 & 1.5 ended up having very similar performance to one another in my tests. The most interesting tests were two that iterated over a linked list. Below is a recreation of the benchmark that was run. There are really two things that can be considered tested here. Interface method invocation & direct pointer method invocation.

package main

import (
	"fmt"
	"testing"
)

type Lister interface {
	Next() Lister
	Name() string
}

type List struct {
	link Lister
	name string
}

func (l List) Name() string {
	return l.name
}

func (l List) Next() Lister {
	return l.link
}

func BenchmarkLinkedList(b *testing.B) {
	var l Lister
	for i := 1; i < 1000; i++ {
		l = List{link: l, name: fmt.Sprintf("test%d", i)}
	}

	first := "test0"
	b.ResetTimer()

	for i := 0; i < b.N; i++ {
		for ll := l; ll != nil; ll = ll.Next() {
			if ll.Name() == first {
				break
			}
		}
	}
}

type List2 struct {
	link *List2
	name string
}

// Test without the interface function calls
func BenchmarkPureLinkedList(b *testing.B) {
	var l *List2
	for i := 1; i < 1000; i++ {
		l = &List2{link: l, name: fmt.Sprintf("test%d", i)}
	}

	first := "test0"
	b.ResetTimer()

	for i := 0; i < b.N; i++ {
		for ll := l; ll != nil; ll = ll.link {
			if ll.name == first {
				break
			}
		}
	}
}

As you can see, Go GC 1.4 & 1.5 aren’t too far off one another (with 1.5 being a bit better in the BenchmarkLinkedList test.) However, compare that to GCC. It’s roughly twice as slow in the first test and nearly 5x slower in the second, when compared to Go GC 1.5.

Here’s the output of pprof for GC 1.5.2

And the output of pprof for GCC 5.3.1

To me, the interesting thing is that, in both implementations, each of List.Name & List.Next take almost as much time (during 30 seconds) as the PureLinkedList function takes (which is the benchmark function using direct struct & pointer usage.)

That code was run with go run main.go and then go tool pprof http://localhost:6060/debug/pprof/profile while main.go was running. Then I inspected the collected profile with top10.

This code is very similar to the benchmark code above, but uses net/http/pprof

package main

import (
	"flag"
	"fmt"
	"log"
	"net/http"
	"os"
	"time"
)

import _ "net/http/pprof"

type Lister interface {
	Next() Lister
	Name() string
}

type List struct {
	link Lister
	name string
}

func (l List) Name() string {
	return l.name
}

func (l List) Next() Lister {
	return l.link
}

func LinkedList(l Lister) {

	first := "test0"

	for i := 0; i < 10000; i++ {
		for ll := l; ll != nil; ll = ll.Next() {
			if ll.Name() == first {
				break
			}
		}
	}
}

type List2 struct {
	link *List2
	name string
}

// Test without the interface function calls
func PureLinkedList(l *List2) {

	first := "test0"

	for i := 0; i < 10000; i++ {
		for ll := l; ll != nil; ll = ll.link {
			if ll.name == first {
				break
			}
		}
	}
}

var cpuprofile = flag.String("cpuprofile", "", "write cpu profile to file")

func main() {
	var l Lister
	for i := 1; i < 10000; i++ {
		l = List{link: l, name: fmt.Sprintf("test%d", i)}
	}

	var ll *List2
	for i := 1; i < 10000; i++ {
		ll = &List2{link: ll, name: fmt.Sprintf("test%d", i)}
	}

	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	for {
		LinkedList(l)
		time.Sleep(time.Millisecond * 300)
		PureLinkedList(ll)
		time.Sleep(time.Millisecond * 300)
	}

	os.Exit(1)
}

I suppose the take away here is direct struct access is going to be faster than interface method calls. Not that this is news or even hard to believe, but is a reminder that tight loops that need speed will do better with direct struct access over interface method calls. Though, and hopefully this is obvious, profiling your application should happen before optimizing code.