Implementing classes in C

Sometimes you want to implement classes, but you're using a language such as C rather than C++. While a language like C++ was designed specifically to make it easy to use classes, you can use classes in any language. Here is one way to go about it in C. The general technique works in any language - even assembler. Note that this is NOT a way to get all of the features of C++ in C - if you want that, then just use C++.

Obviously, a class in C must involve a struct, especially as in C++ a class and a struct are the same thing apart from their default access control. This takes us to the first relevant difference between C and C++ - access control.

Access control

C++ provides public, protected, and private fields, whereas C only provides public fields. You need to decide whether this matters at all - note that the meaning of a correct C++ program is not changed by changing the access control to 'public' on all protected and private fields. If you feel that you need a reminder, you can modify the field name to indicate the access control:

	class C {
		int i;
	protected:
		int j;
	public:
		int k;
	};

could be implemented as:

	typedef struct C {
		int i_v;
		int j_t;
		int k;
	} C;

using the convention that _v means priVate and _t means proTected. Note that 'public' does not need to be marked as you don't need to be reminded when using a public field to check that you are using it correctly.

Generally, it is much easier to ignore access control - the vast majority of fields are normally private, so you can simply ensure that you are only accessing fields in the member functions of that class. Hopefully your code is neat and compact enough that this is sufficiently obvious.

Member functions and this

You will need to pass the this value explicitly into member functions. In C++ the compiler does this for you behind the scenes.

Non-virtual member functions

In C++ the only assistance that the language provides over C for non-virtual member functions is naming. You can have two member functions, A::foo() and B::foo() visible at the same time whereas in C all functions must have unique names. Of course A::foo() is unique when fully qualified, so the obvious way to handle foo() in C is to have functions A_foo() and B_foo(). This does not guarantee uniqueness, but any conflict will show up as a multiply-defined function, so it is easily fixed. The only two functions which cannot use this scheme are the constructor and destructor. In C, the function A() is illegal if you have a typedef for A, and ~A() is always illegal. I suggest using the names A_ctor() and A_dtor(). Again, this does not absolutely guarantee uniqueness, but any conflicts are easily sorted out.

	class C {
		int a, b;
	public:
		int foo(int c) {
			return a + c;
		}
	};

would be:

	typedef struct C {
		int a, b;
	} C;

	int C_foo(C *this, int c)
	{
	 return this->a + c;
	}
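
To show how the pieces fit together, here is a minimal sketch of the same class with a constructor, a destructor and two callers. The constructor arguments and the helper names C_demo_stack() and C_demo_heap() are assumptions made for illustration; they are not part of the class above.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct C {
	int a, b;
} C;

/* Constructor: initialises storage that the caller has already allocated.
   (The two arguments are an assumption for this sketch.) */
void C_ctor(C *this, int a, int b)
{
	this->a = a;
	this->b = b;
}

/* Destructor: nothing to release here, but kept so callers stay symmetric. */
void C_dtor(C *this)
{
	(void) this;
}

/* Member function: 'this' is passed explicitly as the first argument. */
int C_foo(C *this, int c)
{
	return this->a + c;
}

/* Roughly the C++ "C obj(2, 7); return obj.foo(40);" */
int C_demo_stack(void)
{
	C obj;
	int r;

	C_ctor(&obj, 2, 7);
	r = C_foo(&obj, 40);
	C_dtor(&obj);
	return r;
}

/* Roughly the C++ "C *p = new C(1, 2); ... delete p;" */
int C_demo_heap(void)
{
	C *p = malloc(sizeof *p);
	int r;

	assert(p != NULL);
	C_ctor(p, 1, 2);
	r = C_foo(p, 10);
	C_dtor(p);
	free(p);
	return r;
}
```

Note that allocation and construction are separate steps here: the caller decides where the object lives, and the constructor only fills it in.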

If you don't like function names with capital letters (especially since that conflicts with standard C practice) then you can use the lowercase equivalent, which then means the constructor can simply name the class:

	C++		C
	Foo()		foo()
	Foo::bar()	foo_bar()
	~Foo()		foo_dtor()

Derived classes

It is quite simple to implement derived classes. A derived class starts with its base class, so

	class D : B {
		int x;
	};

becomes:

	typedef struct D {
		B b;
		int x;
	} D;

Of course, this also means that you must explicitly convert an object of the derived class to one of the base class when necessary. The C++ code:

	int foo(B *b, int c)
	{
	 return b->x + c;
	}

	int bar(D *d)
	{
	 return foo(d, 5);
	}

must translate into C with bar() as:

	int bar(D *d)
	{
	 return foo(&d->b, 5);
	}

The compiler should provide adequate warnings for any cases you overlook.
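
As a self-contained illustration of the conversion, the sketch below assumes that B has a single field x and gives D an extra field y; both layouts, and the names bar_cast() and conversion_demo(), are hypothetical. It also shows that, because the base is the first member, casting the D pointer yields the same address as taking &d->b.

```c
#include <assert.h>

typedef struct B {
	int x;
} B;

typedef struct D {
	B b;	/* base class part; must be the first member */
	int y;
} D;

int foo(B *b, int c)
{
	return b->x + c;
}

/* Explicit conversion by naming the base member. */
int bar(D *d)
{
	return foo(&d->b, 5);
}

/* Same conversion by cast: valid because a pointer to a struct,
   suitably converted, points to its first member. */
int bar_cast(D *d)
{
	return foo((B *) d, 5);
}

int conversion_demo(void)
{
	D d;

	d.b.x = 10;
	d.y = 99;
	assert(bar(&d) == bar_cast(&d));
	return bar(&d);
}
```

Naming the member (&d->b) is usually preferable to the cast, since the compiler can still check the types.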

Since the name of the base class field does not appear at all in C++, it is fairly obvious that its name is not very important. We could simply call it b (for 'base') always. For example, if we have FilledCircle derived from Circle, derived from Ellipse, derived from Shape, we could use:

	typedef struct Shape {
		...
	} Shape;

	typedef struct Ellipse {
		Shape b;
		...
	} Ellipse;

	typedef struct Circle {
		Ellipse b;
		...
	} Circle;

	typedef struct FilledCircle {
		Circle b;
		...
	} FilledCircle;

However, that means we sometimes need to refer to x.b.b.b which is a bit confusing. We could use:

	typedef struct Ellipse {
		Shape shape;
		...
	} Ellipse;

	typedef struct Circle {
		Ellipse ellipse;
		...
	} Circle;

	typedef struct FilledCircle {
		Circle circle;
		...
	} FilledCircle;

but that requires us to say x.circle.ellipse.shape which is excessive - especially remembering that C++ does not need to name these fields at all. A good compromise is to use a very short but mnemonic abbreviation:

	typedef struct Ellipse {
		Shape sh;
		...
	} Ellipse;

	typedef struct Circle {
		Ellipse el;
		...
	} Circle;

	typedef struct FilledCircle {
		Circle ci;
		...
	} FilledCircle;

which gives x.ci.el.sh - easy enough to follow without big words which contribute little to understanding.
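
A small sketch of the abbreviated names in use follows. The data fields (x, y, rx, ry, colour) and the functions Shape_move() and naming_demo() are invented for this example; the structs above leave their contents unspecified.

```c
#include <assert.h>

/* All data fields below are hypothetical placeholders. */
typedef struct Shape {
	int x, y;
} Shape;

typedef struct Ellipse {
	Shape sh;
	int rx, ry;
} Ellipse;

typedef struct Circle {
	Ellipse el;
} Circle;

typedef struct FilledCircle {
	Circle ci;
	int colour;
} FilledCircle;

/* A member function of the most basic class. */
void Shape_move(Shape *this, int dx, int dy)
{
	this->x += dx;
	this->y += dy;
}

int naming_demo(void)
{
	FilledCircle fc;

	fc.ci.el.sh.x = 1;
	fc.ci.el.sh.y = 2;
	fc.ci.el.rx = fc.ci.el.ry = 3;
	fc.colour = 0xff0000;

	/* Calling a Shape member function on a FilledCircle means
	   walking down to the Shape part explicitly. */
	Shape_move(&fc.ci.el.sh, 10, 20);
	return fc.ci.el.sh.x + fc.ci.el.sh.y;
}
```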

Virtual member functions

Virtual functions are fairly easy to implement using the same technique that C++ compilers use, but they require a little bookkeeping and one simplifying assumption. In C++, a derived class can introduce a new virtual member function that was not in the base class. It is much simpler to require the base class to declare every virtual function, so that one table layout serves the whole hierarchy. We then implement this:

	struct B {
		int x;
		virtual int foo(int z);
		virtual int bar();
	};

	int B::foo(int z)
	{
	 return x + z;
	}

	int B::bar()
	{
	 return x - 1;
	}

	struct D : B {
		virtual int foo(int z);
		virtual int bar();
	};

	int D::foo(int z)
	{
	 return x + z + 1;
	}

	int D::bar()
	{
	 return x - 2;
	}

	struct DD : D {
		virtual int foo(int z);
		virtual int bar();
	};

	int DD::foo(int z)
	{
	 return x + z + 3;
	}

	int DD::bar()
	{
	 return x - 3;
	}

as:

	typedef struct B B;

	struct B_vtbl {
		int (*foo)(B *this, int z);
		int (*bar)(B *this);
	};

	struct B {
		const struct B_vtbl *vtbl;	/* pointer to the class's table */
		int x;
	};

	static int B_foo(B *this, int z)
	{
	 return this->x + z;
	}

	static int B_bar(B *this)
	{
	 return this->x - 1;
	}

	const struct B_vtbl B_vtbl = {
		B_foo,
		B_bar,
	};

	void B_ctor(B *this)
	{
	 this->vtbl = &B_vtbl;
	}

	typedef struct D {
		B b;
	} D;

	static int D_foo(B *this_base, int z)
	{
	 D *this = (D *) this_base;
	 return this->b.x + z + 1;
	}

	static int D_bar(B *this_base)
	{
	 D *this = (D *) this_base;
	 return this->b.x - 2;
	}

	/* D supplies its own functions but uses the base class's table type */
	const struct B_vtbl D_vtbl = {
		D_foo,
		D_bar,
	};

	void D_ctor(D *this)
	{
	 this->b.vtbl = &D_vtbl;
	}

	typedef struct DD {
		D d;
	} DD;

	static int DD_foo(B *this_base, int z)
	{
	 DD *this = (DD *) this_base;
	 return this->d.b.x + z + 3;
	}

	static int DD_bar(B *this_base)
	{
	 DD *this = (DD *) this_base;
	 return this->d.b.x - 3;
	}

	/* DD also uses the base class's table type */
	const struct B_vtbl DD_vtbl = {
		DD_foo,
		DD_bar,
	};

	void DD_ctor(DD *this)
	{
	 this->d.b.vtbl = &DD_vtbl;
	}

This scheme requires you to write a call to a virtual function as a->vtbl->foo(a, 3) instead of the C++ syntax a->foo(3), but this is only a minor inconvenience. The general principle is that every object contains a pointer to its class's table of virtual functions. The constructor sets the pointer, and calls to virtual functions go through the table, ensuring that the dynamic type of the object determines the function to be called. In the above example of classes derived from Shape, if there is a virtual function draw(int x, int y), and fs is a pointer to an instance of FilledCircle, then the C++:

	fs->draw(a, b);

would be written as:

	fs->ci.el.sh.vtbl->draw(&fs->ci.el.sh, a, b);

which is a bit tedious, but workable.
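
To see the dispatch working end to end, here is a compact sketch using B and D from the example above. It assumes constructors that take the initial value of x, and the functions call_foo() and vtbl_demo() are hypothetical helpers added only to exercise the tables.

```c
#include <assert.h>

typedef struct B B;

/* One table layout shared by every class in the hierarchy. */
struct B_vtbl {
	int (*foo)(B *this, int z);
	int (*bar)(B *this);
};

struct B {
	const struct B_vtbl *vtbl;
	int x;
};

typedef struct D {
	B b;
} D;

static int B_foo(B *this, int z) { return this->x + z; }
static int B_bar(B *this) { return this->x - 1; }

static const struct B_vtbl B_vtbl = { B_foo, B_bar };

/* Constructor arguments are an assumption for this sketch. */
void B_ctor(B *this, int x)
{
	this->vtbl = &B_vtbl;
	this->x = x;
}

static int D_foo(B *this_base, int z)
{
	D *this = (D *) this_base;
	return this->b.x + z + 1;
}

static int D_bar(B *this_base)
{
	D *this = (D *) this_base;
	return this->b.x - 2;
}

static const struct B_vtbl D_vtbl = { D_foo, D_bar };

void D_ctor(D *this, int x)
{
	B_ctor(&this->b, x);		/* construct the base part first */
	this->b.vtbl = &D_vtbl;		/* then install D's table */
}

/* Knows only about B, yet calls the right function for the dynamic type. */
int call_foo(B *a, int z)
{
	return a->vtbl->foo(a, z);
}

int vtbl_demo(void)
{
	B b;
	D d;

	B_ctor(&b, 10);
	D_ctor(&d, 10);
	assert(call_foo(&b, 3) == 13);		/* B_foo: 10 + 3 */
	assert(call_foo(&d.b, 3) == 14);	/* D_foo: 10 + 3 + 1 */
	return b.vtbl->bar(&b) + d.b.vtbl->bar(&d.b);	/* 9 + 8 */
}
```

If the longer call syntax becomes tiresome, a small wrapper function like call_foo() for each virtual function hides the table lookup from callers.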