14. Week 7 Thursday: Constructors, Initialisation Lists, and RAII
Over the past several lectures, you’ve (hopefully) been introduced to the `struct`. Out of the box, C++ and its standard library provide a fairly restricted set of data types, so the “nouns” we can use when we write pseudocode are very limited. Every object or piece of data we interact with is an integer, decimal number, character, string, or vector, yet we may want to analyse entirely different kinds of abstract objects. Hence the `struct` (or `class`) was born: it’s a way to teach C++ new kinds of nouns.
I’d like to zoom in on one particular aspect of structs: the constructor. More broadly, I’d like to expose exactly what happens when we first create an instance of a struct of any kind. This will naturally lead us to the C++ dogma known as RAII: “Resource Acquisition is Initialisation”.
Constructors
Let’s begin by building a simple struct that represents a point in the 2-dimensional Cartesian plane. For today, everything will be public, though this is bad practice and we’ll soon discuss our alternatives. For now, we’ll give it two member variables, `x` and `y`, and nothing else.
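For reference, a minimal sketch of that struct might look like the following. I’m assuming both coordinates are `double`s named `x` and `y`, matching the code later in these notes:

```cpp
// A point in the 2-dimensional Cartesian plane.
// No constructors yet: a local `Point p;` leaves both members uninitialised.
struct Point {
    double x;  // x-coordinate
    double y;  // y-coordinate
};
```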
I’ve placed this in a header file called `Point.hpp`, together with an empty (for now) implementation file `Point.cpp`. Consider the following very simple program:
```cpp
#include <iostream>

#include "Point.hpp"

using namespace std;

int main() {
    Point p;
    cout << p.x << ' ' << p.y << endl;

    return 0;
}
```
What gets printed out? In fact, the doubles that get printed out for me change slightly on every run of the program. When `p` is created, C++ sets aside enough memory for two doubles and considers its job done. Like any other uninitialised variable, these two doubles hold whatever garbage happened to be in that memory, and reading them is undefined behaviour.
The role of the constructor is therefore to “set up” the object before it gets used, populating the member variables with meaningful data. Let’s write two constructors for the `Point` struct: one with no arguments that sets the coordinates to \((0, 0)\), and one that takes in an \(x\) and \(y\) coordinate.
Point.hpp
```cpp
#ifndef POINT_HPP
#define POINT_HPP

struct Point {
    double x;
    double y;

    /**
     * Default constructor. Sets up this point to (0, 0).
     */
    Point();

    /**
     * Given two arguments new_x and new_y, sets up this point to (new_x, new_y).
     * @param new_x x-coordinate of the new point.
     * @param new_y y-coordinate of the new point.
     */
    Point(double new_x, double new_y);
};

#endif
```
Point.cpp
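The contents of `Point.cpp` aren’t reproduced here. The following is a sketch consistent with the header above; the printed messages are my own placeholders, and the struct definition is repeated so the snippet compiles on its own (the real file would `#include "Point.hpp"` instead):

```cpp
#include <iostream>

// Repeated from Point.hpp so this sketch stands alone.
struct Point {
    double x;
    double y;
    Point();
    Point(double new_x, double new_y);
};

// Default constructor: sets this point to (0, 0).
Point::Point() {
    std::cout << "Point default constructor" << std::endl;  // placeholder message
    x = 0;
    y = 0;
}

// Two-argument constructor: sets this point to (new_x, new_y).
Point::Point(double new_x, double new_y) {
    std::cout << "Point custom constructor" << std::endl;  // placeholder message
    x = new_x;
    y = new_y;
}
```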
I’ve added print statements to the constructors so that there’s a visible cue whenever one of them runs. Let’s now consider the following program:
```cpp
#include <iostream>

#include "Point.hpp"

using namespace std;

int main() {
    cout << "Before 2 points" << endl;
    Point p1;

    cout << "Between 2 points" << endl;
    Point p2(1, 5);

    cout << "After 2 points" << endl;
    return 0;
}
```
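The constructors are run at the exact moment a variable is created for the first time in a program, so each constructor’s message appears right between the lines printed before and after that variable’s declaration.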
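To check the timing programmatically rather than by eye, here is a sketch that mirrors the program above but appends each event to a global log instead of printing it. `event_log` and `construction_order_demo` are hypothetical names of my own:

```cpp
#include <string>
#include <vector>

// Hypothetical global log: each entry records one event, in order.
std::vector<std::string> event_log;

struct Point {
    double x;
    double y;

    Point() {
        event_log.push_back("Point()");
        x = 0;
        y = 0;
    }

    Point(double new_x, double new_y) {
        event_log.push_back("Point(x, y)");
        x = new_x;
        y = new_y;
    }
};

// Mirrors the program above, logging wherever that program prints.
void construction_order_demo() {
    event_log.push_back("Before 2 points");
    Point p1;
    event_log.push_back("Between 2 points");
    Point p2(1, 5);
    event_log.push_back("After 2 points");
}
```

After `construction_order_demo()` runs, `event_log` holds `"Before 2 points"`, `"Point()"`, `"Between 2 points"`, `"Point(x, y)"`, `"After 2 points"`, in that order.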
There are exceptions, however. What’s the output of the following code, for instance?
I got only one line of output. Neither of our `Point` constructors runs when `p2` is created: because `p2` is initialised directly from `p1`, C++ builds it by copying `p1`, using a different, compiler-generated constructor (see Remark 1 below).
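The code for this example isn’t shown, so as an assumption, take it to be copy-initialisation along the lines of `Point p1; Point p2 = p1;`. This sketch counts calls to our own constructors with a hypothetical global counter, `user_ctor_calls`:

```cpp
// Hypothetical counter: how many times one of *our* constructors has run.
int user_ctor_calls = 0;

struct Point {
    double x;
    double y;

    Point() { ++user_ctor_calls; x = 0; y = 0; }
    Point(double new_x, double new_y) { ++user_ctor_calls; x = new_x; y = new_y; }
};

// Returns how many of our constructors run while creating p1 and copying it into p2.
int copy_demo() {
    user_ctor_calls = 0;
    Point p1;       // runs our default constructor
    Point p2 = p1;  // copy-initialisation: runs the compiler-generated copy constructor
    (void)p2;       // p2 is otherwise unused
    return user_ctor_calls;
}
```

Here `copy_demo()` returns 1: only `p1` goes through one of the constructors we wrote.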
Likewise, consider the following code:
```cpp
#include "Point.hpp"

// Creates a point and returns it.
Point make_point() {
    Point p;
    return p;
}

int main() {
    Point p1 = make_point();
    return 0;
}
```
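Again, only one of our constructors runs here, even though two different `Point` variables appear in the code! The returned value is either moved into `p1` by a compiler-generated constructor, or the compiler skips that step entirely and constructs `p` directly in `p1`’s memory (an optimisation called copy elision).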
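The same kind of counter (again a hypothetical name, `user_ctor_calls`) confirms that `make_point` runs our default constructor exactly once, whichever of the two paths the compiler takes:

```cpp
// Hypothetical counter: how many times one of *our* constructors has run.
int user_ctor_calls = 0;

struct Point {
    double x;
    double y;

    Point() { ++user_ctor_calls; x = 0; y = 0; }
};

// Creates a point and returns it.
Point make_point() {
    Point p;   // the only call to one of our constructors
    return p;  // moved into the caller's variable by a compiler-generated
               // constructor, or elided so p is built there directly
}
```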
Remark 1. Move and Copy Constructors
These are examples of copy constructors and move constructors being used instead of our user-defined constructors. These special constructors run in exactly the scenarios above: a copy constructor when a struct is copied from one variable into another, and a move constructor when data is moved out of a temporary (such as a function’s return value) into a more permanent variable.
By default, C++ generates both for you, copying or moving each member variable in turn, but you can define your own versions if you need to!
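As a sketch of what defining your own might look like, this version of `Point` declares a copy constructor and a move constructor, each bumping a hypothetical counter (`copy_calls`, `move_calls`). For two `double`s, moving is no cheaper than copying; the move constructor is here purely to show the syntax:

```cpp
// Hypothetical counters for the two special constructors.
int copy_calls = 0;
int move_calls = 0;

struct Point {
    double x;
    double y;

    Point() { x = 0; y = 0; }

    // Copy constructor: builds a new point from an existing one,
    // e.g. Point p2 = p1;
    Point(const Point& other) {
        ++copy_calls;
        x = other.x;
        y = other.y;
    }

    // Move constructor: builds a new point from a temporary (or from a
    // variable passed through std::move).
    Point(Point&& other) noexcept {
        ++move_calls;
        x = other.x;
        y = other.y;
    }
};
```

Declaring these yourself changes what the compiler generates: for example, once a move constructor is declared, the implicit copy assignment operator is deleted. In practice you usually write either none of these special members or all of the related ones.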
Constructors upon Constructors and RAII
Now let’s create a struct called `Segment`, which just consists of two `Point` objects for its endpoints. Let’s give it a constructor that accepts two `Point` objects as arguments, one for each endpoint, and sets its two endpoints equal to them. We’ll forgo having a default constructor.
```cpp
// in Segment.hpp
#ifndef SEGMENT_HPP
#define SEGMENT_HPP

#include "Point.hpp"

struct Segment {
    Point p1;
    Point p2;

    Segment(Point new_p1, Point new_p2);
};

#endif

// in Segment.cpp
#include <iostream>
#include "Segment.hpp"

using namespace std;

Segment::Segment(Point new_p1, Point new_p2) {
    cout << "Segment constructor" << endl;
    p1 = new_p1;
    p2 = new_p2;
}
```
Now think carefully: what is the output of the following code?
One might expect the default `Point` constructor to run first, followed by the custom `Point` constructor, and finally the `Segment` constructor. But upon running this code, the default `Point` constructor runs an extra two times before the `Segment` constructor does. What gives?
Casting a long shadow over our heads is the concept of RAII: “Resource Acquisition is Initialisation”. RAII relies on the guarantee that a constructor is called at the precise moment C++ creates space in memory for a struct. We see this when we run the command `Point p;`, but it also happens in the background when we run the command `Segment s(p1, p2);`.
Before the body of `Segment`’s constructor even starts, C++ needs to create space for a `Segment` variable, and that means creating space for two `Point` variables. It’s at this very moment that the default constructors for `Point` are run, hence the extra output. To reiterate: the moment memory for a `Point` variable is created, that variable is immediately and automatically initialised using its constructor. The assignments `p1 = new_p1;` and `p2 = new_p2;` in the body then merely overwrite those freshly initialised values.
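The program itself isn’t reproduced above. Assuming it creates `p1` with the default constructor and `p2` with the two-argument one before `Segment s(p1, p2);`, this sketch replaces the print statements with hypothetical counters (`default_calls`, `custom_calls`) so the extra calls can be checked:

```cpp
// Hypothetical counters for Point's two constructors.
int default_calls = 0;
int custom_calls = 0;

struct Point {
    double x;
    double y;

    Point() { ++default_calls; x = 0; y = 0; }
    Point(double new_x, double new_y) { ++custom_calls; x = new_x; y = new_y; }
};

struct Segment {
    Point p1;
    Point p2;

    Segment(Point new_p1, Point new_p2) {
        // By the time this body runs, p1 and p2 have ALREADY been
        // default-constructed; these lines only overwrite them.
        p1 = new_p1;
        p2 = new_p2;
    }
};
```

Creating one default `Point`, one two-argument `Point`, and then a `Segment` from them leaves `default_calls` at 3 and `custom_calls` at 1.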
Initialisation Lists
But this is so obviously wasteful in this example! To make C++ skip this default initialisation step, we can instead use the initialisation list:
```cpp
Segment::Segment(Point new_p1, Point new_p2) :
    p1(new_p1), p2(new_p2) {

    cout << "Segment constructor" << endl;
}
```
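Note the `:` between the parentheses and the curly braces, and the commas separating the items in the initialisation list. Running our preceding program now shows only two `Point` constructor calls, the ones in `main`, which is exactly what we want. The members `p1` and `p2` are now built directly as copies of the arguments, so `Point`’s default constructor is never involved.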
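Reusing the same hypothetical counters (`default_calls`, `custom_calls`), here is a sketch of the initialisation-list version; the print statement is dropped so the focus stays on the members:

```cpp
// Hypothetical counters for Point's two constructors.
int default_calls = 0;
int custom_calls = 0;

struct Point {
    double x;
    double y;

    Point() { ++default_calls; x = 0; y = 0; }
    Point(double new_x, double new_y) { ++custom_calls; x = new_x; y = new_y; }
};

struct Segment {
    Point p1;
    Point p2;

    // p1 and p2 are copy-constructed straight from the arguments, so
    // Point's default constructor never runs for them.
    Segment(Point new_p1, Point new_p2) : p1(new_p1), p2(new_p2) {}
};
```

With the same three declarations as before, `default_calls` now ends at 1 rather than 3.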
Beyond this, you can also call any of `Point`’s constructors directly from the initialisation list. For instance, we may write the following constructor for the `Segment` struct, which takes four `double`s, two for each point (its declaration would also need to be added to `Segment.hpp`):
```cpp
Segment::Segment(double x1, double y1, double x2, double y2) :
    p1(x1, y1), p2(x2, y2) {
    cout << "Segment 4 double constructor" << endl;
}
```
Now, the code `Segment s(0, 0, 0, 1);` calls the two-argument `Point` constructor twice before running the body of its own constructor.
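With the same hypothetical counters, a sketch of the four-`double` constructor shows that only the two-argument `Point` constructor runs:

```cpp
// Hypothetical counters for Point's two constructors.
int default_calls = 0;
int custom_calls = 0;

struct Point {
    double x;
    double y;

    Point() { ++default_calls; x = 0; y = 0; }
    Point(double new_x, double new_y) { ++custom_calls; x = new_x; y = new_y; }
};

struct Segment {
    Point p1;
    Point p2;

    // Each endpoint is built in place by Point(double, double).
    Segment(double x1, double y1, double x2, double y2) : p1(x1, y1), p2(x2, y2) {}
};
```

After `Segment s(0, 0, 0, 1);`, `custom_calls` is 2 and `default_calls` is still 0.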
Observe now, however, that while the command `Point p;` works just fine with no hitches, the command `Segment s;` causes a compilation error. Again, RAII: in order to create a `Segment` variable, C++ needs to know how to initialise it. Because we declared a constructor of our own, C++ no longer generates a default constructor for `Segment` automatically, and since we never wrote one ourselves, C++ freaks out and has a breakdown.
This is not necessarily a bad thing, and not every struct or class needs a default constructor. The whole point of constructors is to make sure that objects get set up the right way. If it doesn’t make sense to create a struct without some initial information, then we shouldn’t give it a default constructor anyway!
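One way to confirm this without triggering the error is the standard type trait `std::is_default_constructible` from `<type_traits>`. In this sketch (print-free versions of both structs), both `static_assert`s pass at compile time:

```cpp
#include <type_traits>

struct Point {
    double x;
    double y;

    Point() { x = 0; y = 0; }
    Point(double new_x, double new_y) { x = new_x; y = new_y; }
};

struct Segment {
    Point p1;
    Point p2;

    // Declaring any constructor stops the compiler from generating a default one.
    Segment(Point new_p1, Point new_p2) : p1(new_p1), p2(new_p2) {}
};

static_assert(std::is_default_constructible<Point>::value,
              "Point p; compiles");
static_assert(!std::is_default_constructible<Segment>::value,
              "Segment s; does not compile");
```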
This does have ramifications elsewhere, however. Consider the following (horridly designed) struct and accompanying code:
```cpp
struct Triangle {
    Segment s;
    Point p;

    Triangle(Segment new_s, Point new_p) {
        p = new_p;
        s = new_s;
    }
};

int main() {
    Segment s(0, 0, 0, 1);
    Point p(1, 1);
    Triangle t(s, p);

    return 0;
}
```
This code fails to compile, and the culprit is `Triangle`’s constructor itself: the `Segment s;` member inside the `Triangle` struct must be initialised before the body of the constructor is run. However, `Segment` doesn’t have a default constructor! Thus C++ complains and throws a tantrum. Because the constructor is defined inside the struct, the compiler checks it even if no `Triangle` is ever created, so deleting the line `Triangle t(s, p);` won’t make the error go away.
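The fix is the same initialisation-list trick as before: construct `s` and `p` directly from the arguments. A sketch, with print-free versions of the earlier structs repeated so it compiles on its own:

```cpp
struct Point {
    double x;
    double y;

    Point() { x = 0; y = 0; }
    Point(double new_x, double new_y) { x = new_x; y = new_y; }
};

struct Segment {
    Point p1;
    Point p2;

    Segment(Point new_p1, Point new_p2) : p1(new_p1), p2(new_p2) {}
    Segment(double x1, double y1, double x2, double y2) : p1(x1, y1), p2(x2, y2) {}
};

struct Triangle {
    Segment s;
    Point p;

    // s and p are built from the arguments before the (empty) body runs,
    // so Segment never needs a default constructor.
    Triangle(Segment new_s, Point new_p) : s(new_s), p(new_p) {}
};
```

The initialisation list names the members in the order they are declared (`s`, then `p`), which is the order C++ initialises them in regardless.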
This tangled behaviour of C++ constructors is a source of some particularly painful bugs to have to squash…
Remark 2.
The C language also has structs, but they are not allowed to have either member functions or constructors in them! You have to manually initialise every member variable in a C struct or wrap this functionality in a function.
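For this reason, the `struct` keyword is typically used in C++ for data types that are nothing more than a group of simple variables, while `class` is used for data types with member functions. This is largely a cosmetic convention: in C++, the only difference between the two keywords is the default access level, which is public for a `struct` and private for a `class`.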
If we have time to discuss it, here’s a challenge problem to chew on. You can solve this without any structs at all, but I think writing out the algorithm step-by-step in words and abstracting all the nouns to appropriate data types helps immensely.
Challenge 3. Continued Fractions
The irrational number \(\pi = 3.1415926535\ldots\) can be approximated using continued fractions: one has \[\pi = 3 + 0.14159\ldots = 3 + \frac{1}{7 + 0.06251\ldots} = 3 + \frac{1}{7+\frac{1}{15+0.99659\ldots}}\] This process can be terminated at any step by dropping the decimal. For instance, stopping after the first fraction gives the well-known approximation \[\pi \approx 3 + \frac{1}{7} = \frac{22}{7}.\] Use this process to determine a rational approximation of \(\pi \) accurate to \(7\) decimal places.
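The expansion step itself is easy to mechanise. As a hedged sketch (the function name `continued_fraction_terms` is my own), here is a helper that reproduces the whole-number parts 3, 7, 15, … from the text; turning those parts back into a fraction and checking its accuracy is left as the challenge:

```cpp
#include <cmath>
#include <vector>

// Returns up to n whole-number parts a0, a1, a2, ... produced by the process
// above: split off the integer part, then continue with 1 / (the decimal part).
// Uses double precision, so only the first several terms are trustworthy.
std::vector<long long> continued_fraction_terms(double value, int n) {
    std::vector<long long> terms;
    for (int i = 0; i < n; ++i) {
        double whole = std::floor(value);
        terms.push_back(static_cast<long long>(whole));
        double frac = value - whole;
        if (frac == 0.0) break;  // nothing left to expand
        value = 1.0 / frac;
    }
    return terms;
}
```

Called with \(\pi\) (for example `std::acos(-1.0)`) and `n = 5`, it returns the terms 3, 7, 15, 1, 292.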