C Bindings for C++ :: Losinggeneration's Coding and Projects

Lately I’ve been playing around with writing C bindings for a C++ library. Your first thought might be, “Why would you bind a C++ library to C?” Those of you who have had to write bindings for other languages may already know the answer: in general, it’s much easier to bind C to another language than it is to bind C++ directly. Another, albeit less likely, reason is that a client wants a C interface instead.

You may be asking yourself at this point, “Why not use SWIG or (insert specific language binding tools/libraries here)?” and that’s an extremely valid question. In general, I’d say use SWIG unless you’re unhappy with the bindings it generates. As a specific example for C++ to Lua bindings, OOLua did a benchmark of OOLua, SWIG, and Luabind in which OOLua typically did better than the other two as far as execution speed. That’s not to say that in real-world cases you’re going to see considerable improvements over SWIG by writing your own bindings (or using something else), but it could be a reason to consider it.

The reason I’ve personally chosen to write my own is to have complete control over the resulting bindings. With Go in particular, I don’t care for the API that SWIG generates.

There are some things you’ll want to consider when taking this route. First and foremost, you’ll have to explicitly maintain each binding you write. A change to the original API may result in breakage and the need to refactor several bindings (depending on how many you want to support). SWIG, on the other hand, lets you change things in one location and potentially get support for as many languages as SWIG supports (though you’re likely to only support a few, or perhaps even just one).

Next, it’s possible your bindings may actually be slower than ones generated by SWIG. SWIG is a mature project, and a lot of work has been put into making it efficient and portable. While this isn’t exactly a killer, some care must be taken to make sure your bindings don’t become a jumbled mess of non-portable or slow code.

Finally, it’s going to take considerably more time to write and maintain your own bindings in comparison to SWIG. The main reason is that SWIG does roughly 95% of the work for you, with the last 5% being tweaks you may want or need. With your own bindings, you not only have to carefully plan out your own APIs (for C as well as any additional bindings), but any refactoring has to be done in more places. SWIG is a clear winner in this case (as mentioned before).

Now, that’s not to say SWIG is without faults. For Go in particular, the SWIG documentation has this to say:

Often the APIs generated by swig are not very natural in Go, especially if there are output arguments.

As well as:

For classes, since swig generates an interface, you can add additional methods by defining another interface that includes the swig-generated interface. Of course, if you have to rewrite most of the methods, instead of just a few, then you might as well define your own struct that includes the swig-wrapped object, instead of adding methods to the swig-generated object. This only works if your wrappers do not need to import other go modules. There is at present no way to insert import statements in the correct place in swig-generated Go. If you need to do that, you must put your Go code in a separate file.

SWIG Documentation: Go adding additional code

Basically this means that if the resulting binding is very unnatural, you extend or wrap the wrapper. In that case, you’re also starting to need special logic on a per-language basis. For many, this is still the cheaper option for large code bases. For smaller code bases that need a lot of this added code in the SWIG interface file, one could argue there’s less reason to use SWIG.

All of that aside, you may still decide to write your own C bindings. The process is actually quite straightforward in many cases, though there are some things to keep in mind. For instance, templates themselves cannot be exposed through C bindings. You can, however, bind specific instantiations of a template. The reason should be clear to anyone who understands the stages of C++ compilation: templates are only evaluated at compile time, and code is only generated for the instantiations that are actually used. Bindings in general require that type information to be compiled into the binary so it can be referenced later. This isn’t unique to writing your own C bindings; SWIG, OOLua, and others all require these types to be declared ahead of time. As an example, you can’t bind std::vector without first knowing the element type. You’re free to bind concrete instantiations such as std::vector<char *>. Each of these resolves to a specific type, and thus can be bound. You can’t, however, expect an instantiation of std::vector to work if the template was never compiled for that type.
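To make the point about template instantiations concrete, here is a minimal sketch of binding exactly one instantiation, std::vector<int>, behind an opaque C handle. The names IntVector_t and IntVector_* are hypothetical and chosen only for illustration; they are not taken from the source archive for this post.

```cpp
#include <stddef.h>
#include <vector>

// Hypothetical C API for ONE instantiation of the template: std::vector<int>.
// A different element type would need its own set of functions.
extern "C" {
typedef struct IntVector_s IntVector_t;

IntVector_t *IntVector_New(void);
void IntVector_Free(IntVector_t *v);
void IntVector_Push(IntVector_t *v, int value);
size_t IntVector_Size(const IntVector_t *v);
int IntVector_At(const IntVector_t *v, size_t index);
}

// The struct is only defined on the C++ side; C callers see an opaque type.
struct IntVector_s {
	std::vector<int> impl;
};

extern "C" IntVector_t *IntVector_New(void) {
	return new IntVector_s;
}

extern "C" void IntVector_Free(IntVector_t *v) {
	delete v;
}

extern "C" void IntVector_Push(IntVector_t *v, int value) {
	v->impl.push_back(value);
}

extern "C" size_t IntVector_Size(const IntVector_t *v) {
	return v->impl.size();
}

extern "C" int IntVector_At(const IntVector_t *v, size_t index) {
	// Bounds-check by hand instead of calling at(), so that no C++
	// exception can escape into C code. Out-of-range reads return 0.
	if (index >= v->impl.size()) {
		return 0;
	}
	return v->impl[index];
}
```

Because the wrapper functions mention std::vector<int> explicitly, the compiler instantiates the template for int and the generated code ends up in the binary. Nothing in this sketch would make std::vector<double> available to a C caller.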

What I’ll cover in the rest of the post are some of the basics. A full example with source is included at the end of the post that you can download and examine. It’s thoroughly commented and quite straightforward. The very first thing you should know is that the C bindings are written in C++ but provide C linkage. If you don’t already know, C++ has built-in support for generating C linkage with:

extern "C"

You can either put this in front of a function or type, or enclose multiple declarations in curly braces. Because you’re still in C++, you can leverage a lot of the standard library without having to drop to C-isms unless needed. For instance, when creating new C structs, you may use new/delete instead of malloc/free. Inside the C binding functions you may also use std::string, std::vector, and other C++ containers if convenient. Keep in mind that you’ll want to do as little as possible in the C binding layer to keep overhead to a minimum. Another thing you’ll need to do is keep a C compiler from seeing the ‘extern “C” {’ lines in the shared header, since they cause parse errors in C. That’s why you should protect them with:

#ifdef __cplusplus
extern "C" {
#endif
void someCLinkageFunction();
#ifdef __cplusplus
}
#endif
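As a small sketch of using C++ facilities behind a C linkage function, the example below declares a function through a guarded block and implements it with std::string and std::istringstream. The function name count_words is hypothetical and only serves as an illustration.

```cpp
#include <stddef.h>
#include <sstream>
#include <string>

// Header part: readable by both C and C++ compilers thanks to the guard.
#ifdef __cplusplus
extern "C" {
#endif
size_t count_words(const char *text);
#ifdef __cplusplus
}
#endif

// Implementation part: compiled as C++, so the standard library is available.
extern "C" size_t count_words(const char *text) {
	if (text == NULL) {
		return 0;
	}
	// Convert the C string once, then let the stream split on whitespace.
	std::istringstream in{std::string(text)};
	std::string word;
	size_t count = 0;
	while (in >> word) {
		++count;
	}
	return count;
}
```

The signature only uses types that C understands (const char * and size_t), so the C++ types never leak into the interface.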

Next, if you have complete control over both the library you’re binding and the C binding, you may want to take a bit of time to pull all enums into one file. This allows your C++ and C code to share the enums much more easily without having to redeclare them. If you instead just copy the enum over to the C header, you must protect against it being defined twice. Otherwise, while compiling your C interface (which includes both the C++ header and the C header), you’ll see an error like “duplicate definition of FU”. One way around this is to put an #ifndef around the copy in the C header. Then, while compiling the C interface, only the C++ definition is seen. Note that this works because an enum definition only exists at compile time; it doesn’t create a symbol that is put into the library.
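The #ifndef approach could look roughly like the sketch below, shown as a single file with comments marking where each hypothetical file would begin. The file names, the ENGINE_ENUMS_DEFINED macro, the BlendMode enum, and both functions are made up for illustration.

```cpp
// --- engine.hpp (hypothetical C++ header) ---
// The C++ header marks the enum as defined.
#define ENGINE_ENUMS_DEFINED
enum BlendMode { BLEND_NONE, BLEND_ALPHA, BLEND_ADD };
bool IsTranslucent(BlendMode mode);

// --- engine_c.h (hypothetical C header) ---
// The copied enum is skipped when the C++ header was already included,
// which is the case while compiling the wrapper itself.
#ifndef ENGINE_ENUMS_DEFINED
#define ENGINE_ENUMS_DEFINED
enum BlendMode { BLEND_NONE, BLEND_ALPHA, BLEND_ADD };
#endif

#ifdef __cplusplus
extern "C" {
#endif
int Engine_IsTranslucent(enum BlendMode mode);
#ifdef __cplusplus
}
#endif

// --- engine_c.cpp (hypothetical wrapper source) ---
bool IsTranslucent(BlendMode mode) {
	return mode != BLEND_NONE;
}

extern "C" int Engine_IsTranslucent(enum BlendMode mode) {
	// C has no bool in older standards, so return an int.
	return IsTranslucent(mode) ? 1 : 0;
}
```

A pure C client would include only engine_c.h and get the copied definition. The drawback of copying is that the two definitions have to be kept in sync by hand, which is why a single shared enum file is preferable when you control both sides.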

Free functions tend to be fairly straight forward. The only really tricky part might be if you pass data in that must be converted. For example:

int sum(const std::vector<int> &x) {
	int total = 0;
	// size_t matches the type of x.size() and avoids a signed/unsigned comparison
	for(size_t i = 0; i < x.size(); i++) {
		total += x.at(i);
	}
	return total;
}

Might be wrapped in C like this:

int c_sum(const int *x, size_t n) {
	// Copy the n ints from the C array into a vector the C++ function accepts
	std::vector<int> v(x, x + n);
	return sum(v);
}

Because you’re writing your own C bindings, you have a bit of flexibility with how you implement your C interface. This would also work:

typedef struct {
	int x[10];
} Sum_t;
int c_struct_sum(const Sum_t *x) {
	std::vector<int> v(x->x, x->x + sizeof(x->x)/sizeof(int));
	return sum(v);
}
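The examples above convert data going into C++. Going the other way, when a C++ function returns a container, one common option is to have the caller supply a buffer. The sketch below assumes a hypothetical C++ function squares() and a hypothetical wrapper c_squares(); neither is part of the example source for this post.

```cpp
#include <stddef.h>
#include <vector>

// Hypothetical C++ function returning a container: 1*1, 2*2, ..., n*n.
std::vector<int> squares(int n) {
	std::vector<int> result;
	for (int i = 1; i <= n; i++) {
		result.push_back(i * i);
	}
	return result;
}

// Copies at most out_len values into out and returns how many values the
// C++ function produced, so the caller can detect a buffer that was too small.
extern "C" size_t c_squares(int n, int *out, size_t out_len) {
	const std::vector<int> v = squares(n);
	size_t count = v.size() < out_len ? v.size() : out_len;
	if (out == NULL) {
		count = 0;
	}
	for (size_t i = 0; i < count; i++) {
		out[i] = v[i];
	}
	return v.size();
}
```

Returning the required size rather than the copied size is a design choice: a caller can pass a NULL buffer first to learn how much space to allocate, then call again.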

Here’s an example of how a C++ class might be wrapped:

class Test {
public:
	Test();
	virtual ~Test();
	void AFunction();
protected:
	int x;
};

Might have this for the C API:

typedef struct Test_s Test_t;
Test_t *Test_New(void);
void Test_Free(Test_t *t);
void Test_AFunction(Test_t *t);

The method I’ve chosen here is to hide the layout of the struct from the user and take care of its internal details within the wrapper. While this method still allows inherited types to be passed as base types, this style of C wrapper effectively hides the object hierarchy. That might be a bit confusing if someone opts to use the C wrapper for something other than bindings, but the share of such users will be very small to non-existent. Since you have the C++ (or at least its headers) in front of you, you’re free to hide the details in the C wrapper.

For the implementation of the above C interface, I’ve implemented it like this:

typedef struct Test_s {
	Test *t;
} Test_t;
Test_t *Test_New() {
	Test_t *t = new Test_t;
	t->t = new Test();
	return t;
}
void Test_Free(Test_t *t) {
	delete t->t;
	delete t;
}
void Test_AFunction(Test_t *t) {
	t->t->AFunction();
}

As you can see, the C struct has a pointer to the wrapped class. In the _New() function we create a new struct and a new instance of the class, and pass back the struct. In _Free() we clean up the wrapped class instance and the wrapper struct. Finally, in _AFunction() we just call the method as we normally would in C++; the only difference is that we pass the class, wrapped in a struct, to a function that makes the call for us instead of calling it directly.

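As a side note, inheritance works much the same way. You can even pass the child struct to Test_AFunction with a type cast and it will work as you’d expect. Keep in mind that this relies on the child wrapper struct having the same layout as Test_t and on the child and base pointers sharing an address, which typically holds for simple single inheritance but not necessarily for multiple or virtual inheritance. In this case, you may want to provide a function or define that does the typecast automatically, so you can call a function prefixed with the child class name (so if you have “class Inher : public Test;” you might “#define Inher_AFunction(i) Test_AFunction((Test_t *)i)”).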
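Here is one possible sketch of the cast-based approach. It is a variation rather than the exact layout used in the post: the child wrapper embeds the base wrapper as its first member, which is an assumption made here so that the pointer cast stays well-defined. The Test class is a simplified stand-in with a public counter, and Inher, Inher_Extra, and inheritance_demo are hypothetical names.

```cpp
// Simplified stand-ins for the wrapped C++ classes.
class Test {
public:
	Test() : calls(0) {}
	virtual ~Test() {}
	void AFunction() { ++calls; }
	int calls;
};

class Inher : public Test {
public:
	void Extra() { calls += 10; }
};

extern "C" {
typedef struct Test_s Test_t;
typedef struct Inher_s Inher_t;
void Test_AFunction(Test_t *t);
Inher_t *Inher_New(void);
void Inher_Free(Inher_t *i);
void Inher_Extra(Inher_t *i);
}

// Lets callers use the child prefix for an inherited method.
#define Inher_AFunction(i) Test_AFunction((Test_t *)(i))

struct Test_s {
	Test *t;
};

// The base wrapper comes first, so a pointer to Inher_s can be
// converted to a pointer to its initial Test_s member.
struct Inher_s {
	Test_s base;
	Inher *i;
};

extern "C" void Test_AFunction(Test_t *t) {
	t->t->AFunction();
}

extern "C" Inher_t *Inher_New(void) {
	Inher_t *w = new Inher_t;
	w->i = new Inher();
	w->base.t = w->i; // implicit upcast from Inher* to Test*
	return w;
}

extern "C" void Inher_Free(Inher_t *i) {
	delete i->i;
	delete i;
}

extern "C" void Inher_Extra(Inher_t *i) {
	i->i->Extra();
}

// Calls one inherited and one child-only function and reports the counter.
int inheritance_demo() {
	Inher_t *i = Inher_New();
	Inher_AFunction(i); // counter becomes 1
	Inher_Extra(i);     // counter becomes 11
	int calls = i->i->calls;
	Inher_Free(i);
	return calls;
}
```

Storing an already upcast Test pointer in the embedded base wrapper means the compiler performs the pointer adjustment, so the macro does not depend on how the C++ objects themselves are laid out.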

When you get to polymorphic abstract classes, things get a bit more tricky. The method I’ve opted to use is to have one struct which, like before, hides the underlying implementation, and a second which provides function pointers to the C functions that are to be used by the object. This second struct is passed to the _New function, which creates a concrete instance of the abstract class that uses that struct; a pointer to the instance is put in the first struct, which is passed back to be used as before. Perhaps an example will be clearer. Given the following C++ abstract class:

class TestAbst {
public:
	virtual ~TestAbst() {}
	virtual void Print(const char *str) const = 0;
};

I’d define this in the C header:

typedef struct TestAbst_s TestAbst_t;
typedef struct TestAbst_Impl_s {
	void (*Print)(const char *str);
} TestAbst_Impl_t;

You can see that the implementing structure provides a function pointer that must be set up by the implementing code. The implementation wrapper will be defined like this:

class _implTestAbst : public TestAbst {
public:
	_implTestAbst(TestAbst_Impl_t *tai) : tai(tai) {}
	void Print(const char *str) const {
		tai->Print(str);
	}
private:
	TestAbst_Impl_t *tai;
};
typedef struct TestAbst_s {
	TestAbst *ta;
} TestAbst_t;
TestAbst_t *TestAbst_New(TestAbst_Impl_t *tai) {
	TestAbst_t *ta = new TestAbst_t;
	ta->ta = new _implTestAbst(tai);
	return ta;
}
void TestAbst_Free(TestAbst_t *ta) {
	delete ta->ta;
	delete ta;
}
void TestAbst_Print(TestAbst_t *ta, const char *str) {
	ta->ta->Print(str);
}

So, as I said before, we create a class derived from the abstract class which takes the implementation struct in its constructor and stores the pointer to that struct internally. (Note this is a trivial example. You may actually want to either copy the struct or take ownership of it so it can be properly freed. This currently assumes that either the caller is going to explicitly delete the struct, or that it was a non-pointer, such as a stack or static struct, passed by address.)
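A sketch of the "copy the struct" option mentioned in the note could look like the following. It differs from the code above in a few labeled ways: the adapter class is named ImplTestAbst and stores the struct by value, TestAbst_New takes a const pointer, and record_length and abstract_demo are hypothetical names for a sample callback and caller.

```cpp
#include <stddef.h>
#include <string.h>

class TestAbst {
public:
	virtual ~TestAbst() {}
	virtual void Print(const char *str) const = 0;
};

extern "C" {
typedef struct TestAbst_s TestAbst_t;
typedef struct TestAbst_Impl_s {
	void (*Print)(const char *str);
} TestAbst_Impl_t;
TestAbst_t *TestAbst_New(const TestAbst_Impl_t *tai);
void TestAbst_Free(TestAbst_t *ta);
void TestAbst_Print(TestAbst_t *ta, const char *str);
}

namespace {
// Holds a COPY of the function table, so the caller's struct may go away.
class ImplTestAbst : public TestAbst {
public:
	explicit ImplTestAbst(const TestAbst_Impl_t &impl) : impl(impl) {}
	void Print(const char *str) const {
		if (impl.Print != NULL) {
			impl.Print(str);
		}
	}
private:
	TestAbst_Impl_t impl;
};
}

struct TestAbst_s {
	TestAbst *ta;
};

extern "C" TestAbst_t *TestAbst_New(const TestAbst_Impl_t *tai) {
	if (tai == NULL) {
		return NULL;
	}
	TestAbst_t *ta = new TestAbst_t;
	ta->ta = new ImplTestAbst(*tai);
	return ta;
}

extern "C" void TestAbst_Free(TestAbst_t *ta) {
	delete ta->ta;
	delete ta;
}

extern "C" void TestAbst_Print(TestAbst_t *ta, const char *str) {
	ta->ta->Print(str);
}

// Caller side: a C-style callback that records what it was given.
static size_t g_last_length = 0;

extern "C" void record_length(const char *str) {
	g_last_length = strlen(str);
}

size_t abstract_demo() {
	TestAbst_t *ta;
	{
		TestAbst_Impl_t impl = { record_length };
		ta = TestAbst_New(&impl);
	} // impl is out of scope here; the wrapper kept its own copy
	TestAbst_Print(ta, "hello");
	TestAbst_Free(ta);
	return g_last_length;
}
```

Copying is cheap for a struct that only holds function pointers, and it removes any lifetime requirement on the caller. A real binding would probably also want a user-data pointer passed to each callback so it can avoid globals like g_last_length.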

If you want to bind C++ to C for other language bindings, this should give you a good idea of how to start. That said, after having done this for a current project, I’d recommend against doing it manually.

As an anecdote to finish this off, I’ll provide a description of the project for which I used this technique. Basically, the game engine has two parts: the core, which deals with all the low-level stuff (such as setting up the rendering context, input handling, resource loading, etc.), and the helper classes, which make dealing with fonts, particle effects, etc. easier. The core is basically a singleton class which was trivial to wrap to C. The helpers were kind of here-and-there. Most are single classes, but there’s some obvious interaction between them, and a few that implement GUI controls make use of inheritance. As you’ve seen, inheritance isn’t trivial, and handling it correctly for various cases can be error-prone. The route I ended up going was wrapping the core class to C, binding the C to Go, and then porting the helpers to Go.

This has an immediate, obvious drawback: we now have two implementations. A bug in one may or may not show up in the other, so it’s twice as much work to make sure everything is working as it should. Another issue is that any other binding will now need to reimplement all these helpers as well. Because of this latter point, I ended up wrapping all the C++ classes to C (well, I didn’t actually finish, but I was close enough for now). This way, any other binding can use the wrapped C if desired. There was one reason for me to port the C++ over to Go instead of using the wrapped C: I wanted to improve my understanding of Go. This also had the added bonus of verifying that the Go binding of the wrapped core class works correctly in Go. It also allowed me to tweak the API a bit because I was already using the main API in the target language! Obviously, if I had interest in binding to other languages, such as Lua, I’d probably want all of them using the same C wrapper (or, as I said before, SWIG), but since that’s not really my plan at the moment, my current approach seems to be working well enough.

I’ll leave it at that for now. This wasn’t a full tutorial on binding C++ to C, but it should give you a pretty good start if you’re really interested in taking this route but aren’t exactly sure where to begin. (Note that this was written over several weeks when I had a few minutes here and there, so please let me know if you spot any glaring issues with this post.)

Source code for this post: c_binding-0.1.tar.gz