Go GCC vs GC
For awhile now I’ve been playing around with a stack based language written in Go. The specifics of that project aren’t too important here, but I did notice something interesting today as I tried GCC for the first time with the project. First I wanted to make sure it worked so I ran go-5 test
and happily all tests passed. Next, I thought I’d see how the benchmark looked. Here I think I was most surprised. Every single benchmark I had written was between 1.5x & 2x slower with GCC compared to GC. The versions I used were
gcc version 5.3.1 20160101 (Debian 5.3.1-5) go version go1.4.2 gccgo (Debian 5.3.1-5) 5.3.1 20160101 linux/amd64
go version go1.5.2 linux/amd64
go version go1.4.3 linux/amd64
Go GC 1.4 & 1.5 ended up having very similar performance to one another in my tests. The most interesting tests were two that iterated over a linked list. Below is a recreation of the benchmark that was run. There are really two things that can be considered tested here. Interface method invocation & direct pointer method invocation.
package main
import (
"fmt"
"testing"
)
type Lister interface {
Next() Lister
Name() string
}
type List struct {
link Lister
name string
}
func (l List) Name() string {
return l.name
}
func (l List) Next() Lister {
return l.link
}
func BenchmarkLinkedList(b *testing.B) {
var l Lister
for i := 1; i < 1000; i++ {
l = List{link: l, name: fmt.Sprintf("test%d", i)}
}
first := "test0"
b.ResetTimer()
for i := 0; i < b.N; i++ {
for ll := l; ll != nil; ll = ll.Next() {
if ll.Name() == first {
break
}
}
}
}
type List2 struct {
link *List2
name string
}
// Test without the interface function calls
func BenchmarkPureLinkedList(b *testing.B) {
var l *List2
for i := 1; i < 1000; i++ {
l = &List2{link: l, name: fmt.Sprintf("test%d", i)}
}
first := "test0"
b.ResetTimer()
for i := 0; i < b.N; i++ {
for ll := l; ll != nil; ll = ll.link {
if ll.name == first {
break
}
}
}
}
$ go version go version go1.4.3 linux/amd64 $ go test -bench . testing: warning: no tests to run PASS BenchmarkLinkedList 100000 14888 ns/op BenchmarkPureLinkedList 1000000 2166 ns/op
$ go version go version go1.5.2 linux/amd64 $ go test -bench . testing: warning: no tests to run PASS BenchmarkLinkedList-2 100000 12744 ns/op BenchmarkPureLinkedList-2 1000000 2179 ns/op
$ go version go version go1.4.2 gccgo (Debian 5.3.1-5) 5.3.1 20160101 linux/amd64 $ go test -bench . # _/home/harley/tmp/tests_go/test_linked_list ar: `u' modifier ignored since `D' is the default (see `U') # testmain ar: `u' modifier ignored since `D' is the default (see `U') testing: warning: no tests to run PASS BenchmarkLinkedList 50000 25005 ns/op BenchmarkPureLinkedList 200000 10096 ns/op
As you can see, Go GC 1.4 & 1.5 aren’t too far off one another (with 1.5 being a bit better in the BenchmarkLinkedList test.) However, compare that to GCC. It’s roughly twice as slow in the first test and nearly 5x slower in the second, when compared to Go GC 1.5.
Here’s the output of pprof for GC 1.5.2
(pprof) top10 21.39s of 21.46s total (99.67%) Dropped 3 nodes (cum < = 0.11s) flat flat% sum% cum cum% 6.52s 30.38% 30.38% 16.40s 76.42% main.LinkedList 5.01s 23.35% 53.73% 5.01s 23.35% main.(*List).Name 5.01s 23.35% 77.07% 5.05s 23.53% main.PureLinkedList 4.85s 22.60% 99.67% 4.85s 22.60% main.(*List).Next 0 0% 99.67% 21.45s 100% main.main 0 0% 99.67% 21.45s 100% runtime.goexit 0 0% 99.67% 21.45s 100% runtime.main
And the output of pprof for GCC 5.3.1
(pprof) top10 25.83s of 25.83s total ( 100%) flat flat% sum% cum cum% 25.83s 100% 100% 25.83s 100% runtime_sigprof 0 0% 100% 25.83s 100% 0 0% 100% 8s 30.97% __go_strcmp 0 0% 100% 18.16s 70.31% main.LinkedList 0 0% 100% 4.97s 19.24% main.Name.N9_main.List 0 0% 100% 4.63s 17.92% main.Next.N9_main.List 0 0% 100% 7.67s 29.69% main.PureLinkedList 0 0% 100% 25.83s 100% main.main 0 0% 100% 25.83s 100% runtime_main 0 0% 100% 25.83s 100% sig_tramp_info
To me, the interesting thing is that, in both implementations, each of List.Name & List.Next take almost as much time (during 30 seconds) as the PureLinkedList function takes (which is the benchmark function using direct struct & pointer usage.)
That code was run with go run main.go
and then go tool pprof http://localhost:6060/debug/pprof/profile
while main.go was running. Then I inspected the collected profile with top10.
This code is very similar to the benchmark code above, but uses net/http/pprof
package main
import (
"flag"
"fmt"
"log"
"net/http"
"os"
"time"
)
import _ "net/http/pprof"
type Lister interface {
Next() Lister
Name() string
}
type List struct {
link Lister
name string
}
func (l List) Name() string {
return l.name
}
func (l List) Next() Lister {
return l.link
}
func LinkedList(l Lister) {
first := "test0"
for i := 0; i < 10000; i++ {
for ll := l; ll != nil; ll = ll.Next() {
if ll.Name() == first {
break
}
}
}
}
type List2 struct {
link *List2
name string
}
// Test without the interface function calls
func PureLinkedList(l *List2) {
first := "test0"
for i := 0; i < 10000; i++ {
for ll := l; ll != nil; ll = ll.link {
if ll.name == first {
break
}
}
}
}
var cpuprofile = flag.String("cpuprofile", "", "write cpu profile to file")
func main() {
var l Lister
for i := 1; i < 10000; i++ {
l = List{link: l, name: fmt.Sprintf("test%d", i)}
}
var ll *List2
for i := 1; i < 10000; i++ {
ll = &List2{link: ll, name: fmt.Sprintf("test%d", i)}
}
go func() {
log.Println(http.ListenAndServe("localhost:6060", nil))
}()
for {
LinkedList(l)
time.Sleep(time.Millisecond * 300)
PureLinkedList(ll)
time.Sleep(time.Millisecond * 300)
}
os.Exit(1)
}
I suppose the take away here is direct struct access is going to be faster than interface method calls. Not that this is news or even hard to believe, but is a reminder that tight loops that need speed will do better with direct struct access over interface method calls. Though, and hopefully this is obvious, profiling your application should happen before optimizing code.