Reflections on maths, learning and maths learning support, by David K Butler

Tag: explanations

  • The reorder of operations

    The community of maths users the world over agrees that when evaluating an expression or calculation, some operations should be done before others. Mostly it’s to prevent us having to be needlessly specific about what order to do calculations in, mathematicians being very concerned with efficient communication.

    My problem with the order of operations as usually stated is this: it’s wrong! It’s wrong because you almost always don’t do the operations in the order described. You don’t do all the multiplications before all the additions, and instead will often do quite a few of the additions first because they are easier. And you don’t do the multiplications in the order they come, but rearrange them into some other easier order. And you often don’t do the brackets first, but instead choose to expand them out in order to make the calculation easier.

    You can read the rest of this blog post, and the other posts in the series across the years, in PDF form here. 

    The titles of the five blog posts are:

    • The reorder of operations
    • (Holding it together)
    • The Operation Tower
    • Replacing
    • Sticky operations
  • Obscuring the GST by making it simple

    I was helping out at Roseworthy Campus yesterday as the Vet Medicine students were learning about budgeting for a Vet Clinic as a business. One aspect of this was calculating the amount of the cost of goods and services that was GST (stands for “Goods and Services Tax” – in other countries it’s known as VAT or Sales Tax). The Excel sheet they were working in already had the formula worked in and it was this: GST = (Total Price)/11.
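
    The divide-by-11 formula can be checked with a short sketch. It assumes the Australian GST rate of 10% added on top of the pre-tax price; the function name and the sample price are made up for illustration and do not come from the spreadsheet.

```python
from fractions import Fraction

GST_RATE = Fraction(1, 10)  # assumption: GST is 10% of the pre-tax price


def gst_component(total_price):
    """Return the part of a GST-inclusive price that is GST."""
    # total = base + rate * base = (1 + rate) * base, so base = total / (1 + rate)
    base = Fraction(total_price) / (1 + GST_RATE)
    return GST_RATE * base


# A hypothetical $110 item: $100 pre-tax plus $10 GST.
print(gst_component(110))                       # 10
print(gst_component(110) == Fraction(110, 11))  # True
```

    Under that assumption the total is 11 tenths of the pre-tax price and the GST is 1 tenth of it, which is why the GST is one eleventh of the total.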

    You can read the rest of this blog post in PDF form here. 

  • The advent calendar function

    In a previous post I discussed how we need ways to think about functions that are not curves on an x-y-plane. Well I have a seasonally-appropriate one for you: the Advent Calendar.

    The advent calendar I have in mind is the kind where there is a little cardboard door for each day in December up to Christmas, and behind each door is a little chocolate. (Yes I know it might just have a picture, but seriously, the chocolate ones are better, right?)

    An advent calendar. Each window has a picture of a cat on it. A hand is holding one of the windows open, showing an empty space where a chocolate used to be.

    Most of the time when we’re not drawing graphs, we talk about a function as a sort of machine, which takes some sort of input and produces some sort of output. This is a very dynamic view of function which I like very much. I imagine putting an object (usually a number) in a funnel at the top and the machine churns and whirs and gurgles until a new object (usually another number) shoots out of a chute at the bottom.

    But a function doesn’t necessarily have this time element. The set-theory definition of function is simply a correspondence between one set and another, so that every object in a domain has associated with it one object in a codomain. (The domain is the set of things we usually call “inputs” and the codomain is the set of things from which we choose the “outputs”.) In this sense the “output” is there all the time whether we calculate it or not.

    This is where the advent calendar comes in. The chocolate is there whether you open the door or not. Opening the door to see for yourself what shape the chocolate is corresponds to what we do when we calculate the value of the function. But fundamentally all the numbers in the domain have a value for the function before you calculate it, just like all the unopened doors have a chocolate.
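
    One way to see the difference between the two views is a small sketch, where the machine is an ordinary Python function and the advent calendar is a lookup table. The names, the 24 doors and the chocolate shapes are all made up for illustration.

```python
# Machine view: the output is produced only when we run the machine.
def square_machine(x):
    return x * x


# Advent-calendar view: every "door" in the domain already has its value behind it.
# The domain (days 1 to 24) and the chocolate shapes are made up for illustration.
shapes = ["star", "bell", "tree"]
calendar = {day: shapes[day % 3] for day in range(1, 25)}

# "Opening the door" is just looking up a value that was there all along.
print(calendar[4])    # bell
print(len(calendar))  # 24
```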

    I find this particular picture works well for vector calculus, where every point in a plane or in space has a function value, which may be a number (in the case of a “scalar field”) or a vector (in the case of a “vector field”). In the vector field case, the vector you find doesn’t really interact with the ordinary points, and indeed the vector “output” at one point doesn’t really interact with the ones at the other points. It’s almost as if at every point there is a little room where the vector lives all by itself. All we need is a little door to let us into this little room…

  • Rotation confusion

    I had a long chat with one of the students the other day about rotation matrices. They had come up in the Engineering Physics course called Dynamics as a way of finding the components of vectors relative to rotated axes. He had some notes scrawled on a piece of paper from one of my MLC tutors, which regrettably were not actually correct for his situation. I know precisely why this happened: rotation matrices are used in both Dynamics and Maths 1B, but they are used in different ways (in fact, there are two different uses just within Maths 1B!). It’s high time I made an attempt to clear up this confusion, especially since three more students have asked me about this very issue in the last week!

    In Maths 1B, you learn about Linear Transformations, which are a special kind of function that you enact upon vectors in some dimension to produce vectors in some dimension. It turns out that all linear transformations can be described by representing your vector as a column of coordinates and multiplying it by a matrix. Each linear transformation has its own matrix that works for all the vectors it acts upon. Rotations happen to be a type of linear transformation and in two dimensions there is a formula based on the angle you rotate that tells you what the matrix is. I’ve included just such a matrix in the picture here.

    A diagram with two sets of coordinate axes. The first is labelled "BEFORE" and shows three points with the vectors from the origin to there. A green arrow starting from each point shows the direction they will rotate. A note below says "original coords" and has a column vector x, y. The second is labelled "AFTER" and shows the points rotated, with a green arrow ending at each point showing where it rotated from. A note below says "new coords" and has a matrix multiplied by a column vector x, y. The matrix f

    One reason this works is that multiplying a matrix by your standard basis vectors (1,0)ᵀ and (0,1)ᵀ gives you the first and second columns of your matrix respectively. But multiplying by the matrix has the same effect as the rotation transformation, so to figure out what these columns actually are, all we have to do is rotate the points (1,0) and (0,1). If you do this, then because of trigonometry, you get the two points (cos θ, sin θ) and (-sin θ, cos θ), which are indeed the columns of the matrix.

    A set of coordinate axes with the points (1,0) and (0,1) marked. Each has a green arrow curving anticlockwise starting at it labelled theta. A triangle sits in the first quadrant with one edge on the x-axis, and with its hypotenuse ending where the green arrow from the (1,0) ends. Its hypotenuse is labelled 1, its horizontal edge labelled cos theta and its vertical edge labelled sin theta. Another triangle sits in the second quadrant with one edge on the y-axis, and with its hypotenuse ending where the gre

    Let’s just make sure we know what’s going on here before we move on: You have a point in the 2D plane, you take its coordinates as a column, you multiply this column by the matrix, and you produce a new set of coordinates, which is a new point. So your matrix in effect moves your point from one place to another. The point with coordinates (1,0) moves to the point with coordinates (cos θ, sin θ); the point with coordinates (0,1) moves to the point with coordinates (-sin θ, cos θ).

    So now we have that a rotation matrix has cos θ on the main diagonal, sin θ in the bottom left corner and -sin θ in the top right corner. And it tells you where a point moves to under a rotation of θ anticlockwise. (It’s worth noting that it also works perfectly well on the components of vectors imagined as arrows.)
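
    As a quick check of this description, here is a minimal sketch that builds the matrix and applies it to the two standard basis points. The function names and the 30 degree angle are just illustrative choices.

```python
import math


def rotation_matrix(theta):
    """Matrix that moves points anticlockwise by theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s],
            [s, c]]


def apply(matrix, point):
    """Multiply a 2x2 matrix by a point treated as a column of coordinates."""
    x, y = point
    return (matrix[0][0] * x + matrix[0][1] * y,
            matrix[1][0] * x + matrix[1][1] * y)


theta = math.radians(30)
R = rotation_matrix(theta)
print(apply(R, (1, 0)))  # first column: (cos θ, sin θ)
print(apply(R, (0, 1)))  # second column: (-sin θ, cos θ)
```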

    The problem is that over in Dynamics, a rotation matrix does not look quite like this! In particular, the minus sign is in the opposite corner. Why?

    The answer is that in Dynamics the rotation matrix is not a description of a transformation of the points or arrows themselves, but a description of how their coordinates change when you transform the coordinate axes. The points themselves don’t move at all, it’s the coordinate axes that move and we just relabel the points with new coordinates.

    A diagram with two sets of coordinate axes. The first is labelled "BEFORE" and shows three points with the vectors from the origin to there. A green arrow starting from each coordinate axis shows the direction they will rotate. A note below says "original coords" and has a column vector x, y. The second is labelled "AFTER" and shows the axes rotated, with a green arrow ending at the end of each coordinate axis showing where it rotated from. A note below says "new coords" and has a matrix multiplied by a col

    The reason this works is again because of the standard basis vectors. The point (1,0) has its coordinates recalculated according to the new axes, and its coordinates turn out to be (cos θ, -sin θ); while the point (0,1) also has its coordinates recalculated and its coordinates turn out to be (sin θ, cos θ).

    Two overlaid sets of coordinate axes. The first set is dark blue and is arranged east-west and south-north with the points (1,0) and (0,1) marked. The second set is in a lighter blue and shares an origin with the first, but is rotated anticlockwise. There are two green arrows curving from the first set of axes to the next, labelled theta. A triangle sits with hypotenuse along the dark blue x-axis from the origin to the point (1,0) and right angle on the light blue x-axis. The side on the light blue x-axis

    You may notice that this is precisely what the coordinates would have been if you did rotate the points themselves, but in the opposite direction to the original rotation matrix. This makes sense. If you turn your head to match the new coordinate axes, then this is precisely what has happened. Basically, if you rotate the coordinate axes one way, the points “move” the other way relative to the axes.
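
    This claim can be tested numerically. The sketch below relabels a fixed point by projecting it onto the rotated axis directions, then separately moves the point itself through the opposite angle, and the two answers should agree. The function names, the sample point and the angle are illustrative assumptions.

```python
import math


def coords_in_rotated_axes(point, theta):
    """Coordinates of a fixed point relative to axes rotated anticlockwise by theta."""
    x, y = point
    c, s = math.cos(theta), math.sin(theta)
    # Project the point onto the new x-axis direction (c, s)
    # and the new y-axis direction (-s, c).
    return (c * x + s * y, -s * x + c * y)


def rotate_point(point, theta):
    """Move the point itself anticlockwise by theta."""
    x, y = point
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)


theta = math.radians(30)
p = (2.0, 1.0)
relabelled = coords_in_rotated_axes(p, theta)
moved_backwards = rotate_point(p, -theta)
print(relabelled)
print(moved_backwards)
```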

    And this would be the end of the story, except that in Maths 1B you also rotate coordinate axes, and yet the rotation matrix is somehow still not the same as the one in Dynamics! Why?

    The reason is that in Maths 1B we rotate axes in the context of equations of curves, and this is quite a different situation from when you rotate axes in the context of the points themselves.

    Imagine I have an equation which describes a curve. A point is part of the curve if its coordinates satisfy the equation, and it’s not part of the curve if its coordinates don’t satisfy the equation. But what if I relabel all the points with new coordinates according to a new set of axes? I want an equation for my curve so that a point is on the curve if its new coordinates satisfy the new equation. How do I achieve that? Well I do already have an equation, it’s just in terms of the old coordinates. So if I have a point in the new coordinates, to tell if it’s in the curve, I just need to figure out what the old coordinates are and sub them into the old equation. It ought to be possible to make one equation that encompasses both of these actions – the transferring to the old coordinate system and the subbing into the old equation.

    Did you notice what happened there? In order to create an equation that described the same curve relative to the new axes, I had to begin with the new coordinates and transform them into the old coordinates. Let me repeat: I had to go from new to old. The coordinate transformation matrix in Dynamics goes from old to new. To go in the opposite direction I have to have the minus in the opposite corner.
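
    Here is a minimal sketch of that "new to old, then substitute" recipe. The tilted ellipse 5x² – 6xy + 5y² = 8 and the 45 degree angle are a made-up example, not one from the course, and the function names are hypothetical.

```python
import math


def old_equation(x, y):
    """A hypothetical tilted ellipse: the curve is where this returns 0."""
    return 5 * x * x - 6 * x * y + 5 * y * y - 8


def new_to_old(new_point, theta):
    """Old coordinates of the point whose coordinates in the rotated axes are new_point."""
    X, Y = new_point
    c, s = math.cos(theta), math.sin(theta)
    return (c * X - s * Y, s * X + c * Y)


def new_equation(X, Y, theta):
    """Equation of the same curve in the new coordinates: go new -> old, then substitute."""
    x, y = new_to_old((X, Y), theta)
    return old_equation(x, y)


theta = math.radians(45)
# Relative to axes rotated by 45 degrees this ellipse works out to X^2/4 + Y^2 = 1,
# so the new-coordinate points (2, 0) and (0, 1) should both satisfy the new equation.
print(abs(new_equation(2, 0, theta)) < 1e-9)  # True
print(abs(new_equation(0, 1, theta)) < 1e-9)  # True
```

    Notice that the new-to-old matrix inside new_to_old has its minus sign in the top right, the opposite corner from the old-to-new relabelling used in Dynamics.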

    A diagram with two sets of coordinate axes. The first is labelled "BEFORE" and shows an ellipse oriented at an angle to the axes.  A green arrow starting from each coordinate axis shows the direction they will rotate. A note below says "original coords" and has a column vector x, y. The second is labelled "AFTER" and shows the axes rotated, with a green arrow ending at the end of each coordinate axis showing where it rotated from. The curve is still there and the new axes align with the longest and shortest

    So that’s why the matrices are different. In Dynamics you are moving the axes but not the points, and finding new coordinates for the points. In Maths 1B you are moving the points, not the axes, so the rotation appears to be in the other direction. Or alternatively in Maths 1B you are moving the axes, but you already know the new coordinates and you want the old ones, so you actually are doing the calculation in the opposite direction.

    I’m glad we cleared that up!

  • Archimedes’s Integrals

    One of my staff (thanks Fergus) told me ages ago about Archimedes’ proof that the volume of a sphere is 4/3 π R³ (where R is the radius of the sphere). It is a very very cool proof and it’s high time I shared it! One of the reasons it is so cool is that it uses the concept that a volume can be produced by stacking up a whole lot of thin slices. This is the idea behind integration, and Archimedes used this idea thousands of years before Newton or Leibniz.

    So here is a modernised version of the proof:

    We’re going to assume we already know a few things:

    1. How to calculate the area of a circle.
    2. Pythagoras’ theorem.
    3. The fact that all the points on a sphere are the same distance from the centre (the radius).
    4. The fact that a cone has a third the volume of a cylinder with the same base and height.
    5. How to calculate the volume of a cylinder.

    Ok, now let’s get a sphere of radius R and slice it in half to make a hemisphere. Next we’ll get a cylinder of radius R and height R, and we’ll scoop a cone out of it to make a sort of bowl (see the pictures).

    A diagram of a hemisphere and a cylinder the same height as the hemisphere with a cone scooped out of it.

    The volume left for our cylinder-minus-cone bowl is two thirds of the volume of the cylinder (since the cone is a third the volume of the cylinder). We’re going to show that the hemisphere has the same volume.

    We are going to do this by working our way up from the bottom and slicing both shapes into very thin slices. If we can show that at every possible height the slice for the bowl has the same area as the slice for the hemisphere, then it must be true that the two shapes have the same volume.

    A hemisphere and a cylinder with a cone scooped out. Each has a bright green shape shown where both have been sliced at

    So suppose we are at height y above the base:

    Consider the bowl: when we slice the bowl, we will get a circle of radius R with a hole in the middle. The edge of this hole is on the outside surface of the cone. Now a line in the curved surface of the cone has slope 1 (since it goes across R and up R), so since we’ve gone up y, we must also have gone across y to get to the edge of the cone. That is, the radius of the circular hole is y. Hence the area of the slice at height y is π R² – π y².

    Consider the hemisphere: when we slice the hemisphere, we will get a circle – let’s say the radius of this circle is x. Now a point on the edge of this circle is a distance x from the vertical centre line of the hemisphere, and it’s a distance y above the base, and it’s a distance R in a straight line from the centre of the sphere, which is in the middle of the base. Using Pythagoras’ theorem, we get that x² = R² – y². But the area of our circular slice is π x², which is equal to π (R² – y²) = π R² – π y².

    A hemisphere and a cylinder with a cone scooped out. Each has a bright green shape shown where both have been sliced at. The height of the slice is labelled y, the radius of the sphere/cylinder is labelled R, and the radius of the slice of the hemisphere is labelled x.

    So the areas of the slices at every height are the same! So when we stack up those areas to produce a volume, the two volumes must be the same!

    Since the bowl has two thirds the volume of the cylinder, this means our hemisphere’s volume is also two thirds the volume of the cylinder! Therefore a whole sphere must be four thirds the volume of the cylinder.

    Now the volume of the cylinder is the base area times the height, which is π R² × R = π R³. Therefore the volume of the sphere must be 4/3 π R³.
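
    The slicing argument can also be checked numerically. The sketch below computes both slice areas at a given height and then stacks up a large number of thin hemisphere slices to approximate the volume. The function names, the radius 3 and the number of slices are arbitrary illustrative choices, and the stacking uses a simple midpoint rule rather than anything Archimedes did.

```python
import math


def bowl_slice_area(R, y):
    """Area of the cylinder-minus-cone slice at height y: a disc of radius R with a hole of radius y."""
    return math.pi * R**2 - math.pi * y**2


def hemisphere_slice_area(R, y):
    """Area of the hemisphere slice at height y, using Pythagoras for its radius."""
    x_squared = R**2 - y**2
    return math.pi * x_squared


def stacked_volume(slice_area, R, n=100_000):
    """Approximate a volume by stacking n thin slices (midpoint rule)."""
    dy = R / n
    return sum(slice_area(R, (i + 0.5) * dy) * dy for i in range(n))


R = 3.0
hemisphere = stacked_volume(hemisphere_slice_area, R)
print(hemisphere)              # stacked slices
print(2 / 3 * math.pi * R**3)  # two thirds of the cylinder
```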

    How cool is that?!


    These comments were left on the original blog post:

    David Robers 24 July 2014:
    Fact 4 is not _hugely_ obvious… One could look up a proof online, but how did the Greeks do it?

    David Butler 25 July 2014:
    Can’t find information about how they did the volume of a cone. However my instinct is that it is by analogy to the volume of a pyramid. If you take a cube and construct inside it a pyramid with base the same base as the cube and vertex at one of the top vertices of the cube, then you can observe that three such pyramids can fit neatly into the box, so a square pyramid has a third the volume of a cube. By stretching you can see that a rectangular-based pyramid has a third the volume of a prism with the same base and height. By analogy the volume of the cone ought to work

  • Where’s the t?

    Once upon a time, I lectured Maths 1A calculus, and when I got to teaching hyperbolic trig functions I put a great deal of effort into making sure they were well-connected to other ideas the students knew. So I listed the properties of ordinary trig functions and alongside I listed the matching properties of hyperbolic trig functions.

    One of the major differences/connections is that as the value of t changes, the point (cos t, sin t) traces out a circle with equation x² + y² = 1, whereas the point (cosh t, sinh t) traces out a rectangular hyperbola with equation x² – y² = 1. I showed them how this happens using GeoGebra.

    This was very well received by the students. But after the lecture, one of the students came up and asked me what the t represented for the hyperbolic trig functions. With the trig functions, t is very well defined as the distance measured anticlockwise around the circle starting at the point (1,0): If you go a distance t along the edge of the circle, then the point you come to is (cos t, sin t). That’s very pleasant.

    But what is t for the hyperbolic trig functions? It’s certainly not the length along the curve. In fact, there is (and can be) no formula for the arc length of a hyperbola using your standard suite of functions. I had to admit to the student that I didn’t know and that I wasn’t sure it actually had a geometric interpretation at all.

    Well, it turns out I was wrong. There is a very good geometric representation: the value of t is not a length but an area!

    What you do is draw a line from the origin to the hyperbola, and allow that line to sweep out an area as your point moves along the hyperbola. When this area is t/2, then the point you come to is (cosh t, sinh t). Cool huh? (Finding that area using integrals that don’t involve using cosh and sinh is a tricky exercise though!)

    A graph showing the right-hand branch of a hyperbola, with a point on it marked (cosh t, sinh t), and a line segment from the origin to that point. The area between this line, the curve and the x-axis is coloured blue and marked with a label "half of t"

    And the best part about it is that the area interpretation is true for the trig functions too! When the length around the unit circle is t, then the area of the sector created is t/2 (by the formula for the area of a sector). So we have the same interpretation for both!
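
    Both area claims can be checked with a rough numerical sketch. It slices the swept region into thin horizontal strips, each running from the line through the origin across to the curve, and adds them up. The function name, the value t = 1.2 and the number of strips are illustrative assumptions, and the circle check as written needs t between 0 and π/2.

```python
import math


def swept_area(point, curve_x, n=100_000):
    """Area between the x-axis, the curve x = curve_x(y), and the line from the origin to point.

    Approximated by stacking n thin horizontal strips (midpoint rule).
    """
    px, py = point
    dy = py / n
    total = 0.0
    for i in range(n):
        y = (i + 0.5) * dy
        line_x = y * px / py  # x-coordinate on the line from the origin
        total += (curve_x(y) - line_x) * dy
    return total


t = 1.2
hyperbola_area = swept_area((math.cosh(t), math.sinh(t)), lambda y: math.sqrt(1 + y * y))
circle_area = swept_area((math.cos(t), math.sin(t)), lambda y: math.sqrt(1 - y * y))
print(hyperbola_area, circle_area, t / 2)
```

    The only place cosh and sinh appear is in locating the end point; the strips themselves use just the equations x² – y² = 1 and x² + y² = 1.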

    A graph showing a circle, with a point on it marked (cos t, sin t), and a line segment from the origin to that point. The area between this line, the curve and the x-axis is coloured blue and marked with a label "half of t"

    I love it when things work out so neatly.

    PS: A handout comparing the properties of trig and hyperbolic trig functions can be found here. I didn’t mention it earlier because it gives away the punchline in the first row of the table!

  • Two kinds of division

    If you had to explain what the expression “10 ÷ 5” (that is “10 divided by 5”) meant, what would you say? To be clear, I’m not asking for the answer, I’m asking for a story that will give it meaning.

    I’ve been asking people this for the last few days and there are two main stories:

    1. I have 10 things to split into 5 groups; 10 ÷ 5 is how much is in each group.
    2. I have 10 things to split into groups with 5 in each group; 10 ÷ 5 is how many groups there are.

    Most people only say one of these two, which is interesting because only knowing one of them can get you into all sorts of trouble when it comes to solving actual problems.

    If you only know it as “how many groups of 5 fit into 10” then you’re going to have to think quite hard to figure out how many each person gets when you share 10 among 5. And it would be even worse if it wasn’t a whole number of objects shared among a whole number of people but, say, a number of moles of chemical shared across a number of litres of water to make a concentration. Indeed, both perspectives on division are often needed in the same drug calculation problem in nursing and medicine!

    As a teacher you can get into trouble too: consider the meaning of “10 ÷ 1/2”. The first interpretation would give you “I have 10 things and I split them into half a group; 10 ÷ 1/2 is how much in each group.” While this is correct (and quite interesting actually), it makes much less sense than “I have 10 things and I split them into groups with half a thing in each group; 10 ÷ 1/2 is how many groups there are”.
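
    The two stories can be sketched as two different procedures that happen to give the same number. This is only an illustration with made-up function names, and the dealing-out procedure as written assumes whole numbers that divide evenly.

```python
def partitive(total, groups):
    """Story 1: deal `total` things out into `groups` groups; return the size of each group."""
    piles = [0] * groups  # only makes sense for a whole number of groups
    for i in range(total):
        piles[i % groups] += 1  # deal one thing to each group in turn
    return piles[0]


def quotative(total, group_size):
    """Story 2: keep making groups of `group_size`; return how many groups there are."""
    count = 0
    while total >= group_size:
        total -= group_size
        count += 1
    return count


print(partitive(10, 5))    # 2 in each group
print(quotative(10, 5))    # 2 groups
print(quotative(10, 0.5))  # 20 groups with half a thing in each
```

    The second procedure copes with a group size of a half without any changes, whereas the first has no obvious way to deal things out to half a group.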

    Mathematicians have the tendency to say that division is simply the inverse of multiplication (so that “10 ÷ 5” means “the solution to 10 = 5x”). But this denies that the understanding and use of maths is deeply connected to how we picture it. When two pictures explain the same maths, we’ve got to be both aware and careful!

    (PS: For those interested in a bit of Maths Education terminology, Meaning 1 listed above is called “Partitive Division” and Meaning 2 is called “Quotative Division”. It took me ages to figure out what they were going on about at my first Maths Education conference! Oh, and there are in fact more ways than these to think about division, corresponding to the many ways there are to think about multiplication!)

  • The Right Hand Rules

    Students in Maths 1M are learning the cross product at the moment. This is a way to multiply two vectors in 3D space – let’s call them v1 and v2 – to produce a new vector, which is called v1 × v2. The length of this new vector is related to the lengths of the two original vectors and the angle between them, and the direction is perpendicular to both of the original vectors. However there are two possible directions it could point and still be perpendicular to both. We need a consistent way to choose which of the two options to use, and this is provided by the so-called “right-hand rule”.

    A hand with thumb, forefinger and middle finger stretched in three directions. An arrow on the index finger is labelled v1, an arrow on the middle finger is labelled v2, and an arrow on the thumb is labelled v1 cross v2.

    I was taught the right-hand rule as per the picture here: you extend your thumb and forefinger as far as they go, and then stick out your middle finger. Then arrange it so your index finger points in the direction of v1 and your middle finger points in the direction of v2. Then your thumb must point in the direction of v1 × v2. I have very strong memories of looking around in my first-year Physics exam to see people in various contortions as they used the rule.

    It’s a cute little rule and does the job well, but in fact it is not the only correct version of the right-hand rule. Often it is taught as if it’s the only possible way to do it, but to be honest it’s just a mnemonic so some other version is actually ok! Especially if you happen to find the other version easier to use and remember. I thought I’d put the various versions I know here to compare the alternatives.

    A hand with thumb, forefinger and middle finger stretched in three directions. An arrow on the index finger is labelled v1, an arrow on the middle finger is labelled v2, and an arrow on the thumb is labelled v1 cross v2.

    First, let’s look at a very slightly modified version of the original three-finger version. I don’t know about you, but it tends to hurt my hand to have the middle finger out but the other two curled under (it reminds me of the pain of trying to do the Vulcan salute too often). So I like to do the version shown on the right, where I point all three of the other fingers outwards. It’s not really different from the first version, but it sure is less painful!

    A hand with thumb and middle finger stretched in three directions, and the remaining fingers stretched in another. An arrow on the thumb is labelled v1, an arrow on the index finger is labelled v2, and an arrow on the remaining fingers is labelled v1 cross v2.

    The next version is pictured here on the left. The fingers are in the same arrangement, but different fingers relate to different vectors.  In this alternative version, the thumb is v1, the index finger is v2, and the other fingers are v1 × v2. Personally I find this one easier to use and more realistic because you can move your thumb to indicate smaller and bigger angles between v1 and v2. Other people like the first version because they think of their index finger as their first finger and so it makes sense to them for it to be the first vector.

    A hand with palm outstretched and thumb pointing sideways. An arrow on the thumb is labelled v1, an arrow on the index finger is labelled v2, and an arrow pointing perpendicularly outwards from the palm is labelled v1 cross v2.

    From this alternate version, we can ease the strain on our hand just a little more by realising that we don’t technically need the other three fingers to point outwards because our palm always faces that way anyway. So here is the “palm” version of the right-hand rule on the right: your thumb is v1, your fingers are v2, and v1 × v2 points out of your palm. This is my favourite, and not just because it’s the easiest to actually manipulate your fingers into shape! It gives me a real sense that v1 and v2 are creating a plane and the cross product is pointing out of it.

    A hand in motion. The thumb is pointing up and the fingers are partway through sweeping inwards. An arrow where the fingers started is labelled v1, an arrow where the fingers are going is labelled v2, and an arrow on the thumb is labelled v1 cross v2.

    Even though the palm version is my favourite, there are two more versions I know about, so I’ll mention them too. The first I call the “sweep” version and it’s pictured on the left: you orient your four fingers towards v1, and then you sweep your fingers towards v2. If you do this, then your thumb will point in the direction of v1 × v2. There is something about the dynamic nature of this version that I do like — you really get the feel that the cross product is actually doing something to the two vectors because you’re moving your fingers. It does however take a greater leap of imagination than the others.

    A hand with the thumb pointing up and the fingers curling inwards. An arrow where the fingers start curling is labelled v1, an arrow where the fingers end curling is labelled v2, and an arrow on the thumb is labelled v1 cross v2.

    The final version I call the “curl” version and it’s shown on the right. It has the same idea as the sweep version, but you basically represent the sweep statically. You imagine v1 and v2 in a plane and you curl your fingers to represent the direction you need to rotate to go from v1 to v2. Then your thumb will point in the direction of v1 × v2. This one requires a lot of imagination again, but it does have an advantage that it is the same arrangement of your fingers that you use in Physics to find the direction of a torque or the direction of a magnetic field based on a flowing current. Interestingly, this is one of the most common to find by searching on the internet, and is the one described on Wolfram MathWorld.

    So there you go: that’s six versions of “the” right hand rule. Whichever of them makes most sense to you and causes least pain is fine for you to use – you can even make up your own if you like! But do remember other people may use different ones so take care when communicating with others to tell them how your version works!
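
    Whichever version you pick, you can test it against the component formula for the cross product. The sketch below assumes the usual right-handed x, y, z coordinate axes and uses made-up function names: point v1 along the x-axis and v2 along the y-axis, and your hand should agree that v1 × v2 points along the positive z-axis.

```python
def cross(v1, v2):
    """Cross product of two 3D vectors given as (x, y, z) tuples."""
    a1, a2, a3 = v1
    b1, b2, b3 = v2
    return (a2 * b3 - a3 * b2,
            a3 * b1 - a1 * b3,
            a1 * b2 - a2 * b1)


def dot(v1, v2):
    """Dot product, used here to check perpendicularity."""
    return sum(a * b for a, b in zip(v1, v2))


x_axis, y_axis = (1, 0, 0), (0, 1, 0)
print(cross(x_axis, y_axis))  # (0, 0, 1)
print(cross(y_axis, x_axis))  # (0, 0, -1)

# The result is perpendicular to both original vectors.
v1, v2 = (1, 2, 3), (4, 5, 6)
print(dot(cross(v1, v2), v1), dot(cross(v1, v2), v2))  # 0 0
```

    Swapping the two vectors flips the answer, which is exactly the choice between the two perpendicular directions that the right-hand rule settles.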

  • My conic likes to hide in boxes

    Conics (or conic sections if you like) are very close to my heart. My PhD thesis was about conics and their higher-dimensional relatives, and way back in high school they were one of the bits I particularly loved. So it’s no surprise that I get excited each semester when the Maths 1B students study them.

    The students, on the other hand, get all worried about it. They wonder how they’ll remember the names of all the conics and quadrics, they get all confused about the procedure of figuring out what type of conic the equation represents, and they stress about the fact that drawing them seems so hard. This makes me sad, because they really are very very cool.

    One of the things I do to alleviate their pain, is to make the drawing part a little bit easier for them. I created a method of drawing conics in standard form that I like to call “The Method of Boxes”. You use the coefficients of the conic to draw a box, and the three different conics live in the box in different ways: the ellipse lives inside the box, the hyperbola lives outside the box, and the parabola lives through the box.

    A diagram showing how the three types of conics can be drawn using a box. Four versions of a parabola through the box, two versions of a hyperbola outside the box, and one version of an ellipse inside the box.

    (I’ve put an explanation of it on YouTube here http://youtu.be/PqBYj1UxJyM .)

    It’s a beautifully simple method, if I do say so myself, and it has the neat effect of stopping the students worrying about at least one aspect of learning about conics, thus leaving more room for actual learning. But the main thing it does is draw some important connections between the three conics.

    The students see the three conics as fundamentally different and so they keep them in their heads separately. The box method literally draws a connection between them – you can draw them all by starting with a box. This connects them all together so they can keep them in their heads in the same place. And the more connections there are, the more you feel you understand.

  • The sizes of infinity

    Last week a student visited the Drop-In Centre to talk about the different sizes of infinity. His lecturer had been talking about the sizes of sets and had made an off-hand comment that there were different sizes of infinite sets, and he wanted to know what the hell that meant.

    So I explained it. It’s not the simplest explanation, because you have to define some new ideas in order for it to make sense. But it is one of my favourite things about the families of number that some are the same size and some are different sizes, even though they are all infinite.

    So I want to explain it again here…

    The key to the whole thing is to realise that, when talking about infinite sets, counting is not really a useful way to find the sizes of sets. When you have a lovely finite set, such as the letters in the word “sluiced”, you can simply count the members of the set and you know how many there are (7 in this case). But the process of counting works because at some point you stop, and the number you’re up to is the number of objects there are.

    But what if you can’t stop? If your set has infinitely many objects in it, you’ll never get to a final number where you can stop and say “this is how many”. We need a new way to talk about the sizes of sets, and the way that we use is in some sense even more fundamental than counting.

    If you had two piles of things and were asked which pile has more, you’d probably count both and whichever had the bigger number would be bigger. But there’s another way: you could pair them off, one object in each pile, and if one pile runs out before the other one does, then you know that the other pile must be bigger. And if you are able to pair off everything in both piles, you know they must be the SAME size. THAT is the solution to our problem.

    We DEFINE that two sets are the same size when you can pair them off exactly. That is, there is some way to attach to each object in Set 1 a different object in Set 2, and in this way to cover all the objects in Set 2. If this is impossible, then the two sets are different sizes.

    So let’s look at the five families of number: the natural numbers, the integers, the rational numbers, the real numbers and the complex numbers – N, Z, Q, R and C. (Actually I’m going to ignore C, sorry.)

    The set of natural numbers, {1, 2, 3, 4, …}, is what most of us think of when we think of an infinite set. We start counting how many objects it has, but we can never stop because the numbers don’t stop. So we have infinitely many. But it’s quite close to the finite sets, because we can at least try to count them. They call the size of N the countable infinity.

    Now let’s compare the sizes of N and Z. If they were the same size, it would be possible to find a way to line them up next to each other. That is, you could find one integer to call 1, one to call 2, one to call 3, and so on in such a way that you cover all the integers.

    Well, look at this:
    1 ↔ 0
    2 ↔ 1
    3 ↔ -1
    4 ↔ 2
    5 ↔ -2
    6 ↔ 3
    7 ↔ -3
    etc

    We have given every integer a different natural number and covered all the integers and all the natural numbers, so by our definition, this means that N and Z are the same size.
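
    If you’d like to check this pairing yourself, here is a small Python sketch of it. The function names are my own invention for illustration; the rule is just the pattern in the list above (even natural numbers go to the positive integers, odd ones go to zero and the negatives).

```python
def natural_to_integer(n):
    """Integer paired with the natural number n (n = 1, 2, 3, ...)."""
    if n < 1:
        raise ValueError("natural numbers start at 1 here")
    # Even n go to the positive integers, odd n go to zero and the negatives.
    return n // 2 if n % 2 == 0 else -(n // 2)


def integer_to_natural(z):
    """The reverse direction: the natural number paired with the integer z."""
    return 2 * z if z > 0 else 1 - 2 * z


pairs = [(n, natural_to_integer(n)) for n in range(1, 8)]
print(pairs)
```

    Having a function in each direction is what “pairing off exactly” amounts to: every natural number gets one integer, and every integer gets sent back to the one natural number it came from.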

    Cool huh?

    What about N and Q? Well, if they were the same size, we’d be able to line them up next to each other like with N and Z. This time the trick takes two steps.

    First we’ll consider just the positive rational numbers. I need a way of organising them into a list so that I can line them up next to the natural numbers and cover all the positive rational numbers. Well, the denominator and the numerator of a fraction must add to something, so why don’t I organise them into the ones that add up to 2, the ones that add up to 3, the ones that add up to 4 etc. (The smallest possible sum is 2, since the numerator and the denominator are both at least 1.) Within each of them, I can organise them in order of the numerator. So if I do this I’ll get a list like this:

    1/1, 1/2, 2/1, 1/3, 2/2, 3/1, 1/4, 2/3, 3/2, 4/1, 1/5, 2/4, 3/3, 4/2, 5/1, etc

    Of course, I’ve included some of them twice (since for example 2/2 = 1/1 and 2/4 = 1/2), so I’ll just remove those ones:

    1/1, 1/2, 2/1, 1/3, 3/1, 1/4, 2/3, 3/2, 4/1, 1/5, 5/1, etc

    And now I have a neat list of all the positive rational numbers with a clearly-defined way of getting to each of them. Now I’ll line them up with the natural numbers:

    1 ↔ 1/1
    2 ↔ 1/2
    3 ↔ 2/1
    4 ↔ 1/3
    5 ↔ 3/1
    etc

    In this way I can cover all of the positive rational numbers exactly once, so by our definition, this means that N and the positive numbers in Q are the same size.
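
    Here is a rough Python sketch of that listing process, in case you want to see it run. The generator name is made up for this example, and instead of writing out the repeats and crossing them off, it skips any fraction that is not in lowest terms (which catches exactly the repeats).

```python
from fractions import Fraction
from itertools import islice
from math import gcd


def positive_rationals():
    """Yield every positive rational once, grouped by numerator + denominator."""
    total = 2  # the smallest possible sum is 1 + 1
    while True:
        for numerator in range(1, total):
            denominator = total - numerator
            # A fraction not in lowest terms repeats an earlier entry, so skip it.
            if gcd(numerator, denominator) == 1:
                yield Fraction(numerator, denominator)
        total += 1


first_ten = list(islice(positive_rationals(), 10))
print([str(q) for q in first_ten])
```

    Any particular fraction a/b in lowest terms turns up once the sum reaches a + b, so every positive rational number has a definite place in the list.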

    But now we can do the same trick we did with Z. We’ve got a list of positive rational numbers, and a list of negative ones (and 0 as well) and we’ll just start with zero and flip-flop between the positive list and the negative list and we’ll be good:

    1 ↔ 0
    2 ↔ 1/1
    3 ↔ -1/1
    4 ↔ 1/2
    5 ↔ -1/2
    etc

    So N and Q are the same size.
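
    The flip-flop step can be sketched in Python too. Again the generator name is only an illustration: it starts with zero, then gives each positive rational number followed straight away by its negative.

```python
from fractions import Fraction
from itertools import islice
from math import gcd


def all_rationals():
    """Yield 0, then each positive rational followed by its negative."""
    yield Fraction(0)
    total = 2  # numerator + denominator of the positive fraction
    while True:
        for numerator in range(1, total):
            if gcd(numerator, total - numerator) == 1:
                q = Fraction(numerator, total - numerator)
                yield q
                yield -q
        total += 1


# Line the first few up against the natural numbers 1, 2, 3, ...
for n, q in enumerate(islice(all_rationals(), 7), start=1):
    print(n, "<->", q)
```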

    But what about N and R? I’d need to think of a way of putting R into a decent order so I can line it up next to N. I can’t think of a good way, but just because I can’t, it doesn’t mean it’s impossible – I’ll only know for sure it’s impossible if I prove it. And the best way to prove that something is impossible is to pretend you’ve done it and try to show that this concept is stupid – that is, a proof by contradiction.

    It’s way too hard to do this with all the real numbers, so I’ll just look at the ones between 0 and 1. All these numbers (except 1) will be “0.something”, and the “something” is a decimal expansion which may stop or may go on forever. For those that don’t go on forever, we’ll just say that it is infinite, but all the digits at the end are zeros. So we can represent each of the numbers from 0 to 1 as “0.something”, where the something is an infinite string of digits. (For the pedants, I’m ignoring here the possibility of having a number ending in an infinite string of 9’s.)

    Suppose we managed to line them up next to the Natural Numbers like this:

    1 ↔ real number
    2 ↔ real number
    3 ↔ real number
    etc

    All these real numbers have a decimal expansion, so let’s write down what that might look like:

    1 ↔ 0 . d11 d12 d13 d14 …
    2 ↔ 0 . d21 d22 d23 d24 …
    3 ↔ 0 . d31 d32 d33 d34 …
    etc

    Supposedly, we have covered all the numbers from 0 to 1 in this way. I’m going to show that in fact we haven’t covered all of them, by making a new number that’s not in the list we have.

    Look at the first digit in the first number: d11. Let’s pick some digit different to d11 and call it a11. (We should probably have a rule that says how to pick this different digit – let’s say that if d11 isn’t zero then we’ll pick a11 = 0, but if d11 is zero we’ll pick a11 = 1.)

    Now look at the second digit in the second number: d22. Again we’ll pick a22 so that it’s different to d22. And we’ll continue this process for d33, d44, and so on.

    Consider the number 0.a11 a22 a33 …

    This number is not equal to the first number in our list, because its first digit is different. It’s not equal to the second number in our list, because its second digit is different. It’s not equal to the third number in our list, because the third digit is different. And so on. So our new number can’t be equal to any of the numbers in the list! So we didn’t cover all of the real numbers from 0 to 1 after all. But even if we include this number in the list, the same argument would show that there would always be at least one more, so the set of real numbers from 0 to 1 is not a countable infinity. Since the Real Numbers are bigger than just this little section, they can’t be a countable infinity of numbers either.
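
    A computer can’t hold an infinite list of infinite decimals, but you can watch the diagonal trick work on a small finite piece of one. This Python sketch is only an illustration: the function name and the five rows of digits are invented for the example, and each row stands for the first few digits after the “0.” of one number in the supposed list.

```python
def diagonal_number(rows):
    """Digits of a number that differs from the i-th row at its i-th digit."""
    new_digits = []
    for i, digits in enumerate(rows):
        # Rule from the text: pick 0, unless the diagonal digit is already 0, then pick 1.
        new_digits.append("1" if digits[i] == "0" else "0")
    return "".join(new_digits)


# Hypothetical start of a list: the first 5 digits of 5 numbers between 0 and 1.
rows = ["14159", "50000", "33333", "71828", "00000"]
new = diagonal_number(rows)
print("0." + new)

# The new number disagrees with row i at digit i, so it matches none of the rows.
print(all(new[i] != rows[i][i] for i in range(len(rows))))
```

    The real argument does the same thing to every row of an infinite list at once, which is why no list can ever be complete.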

    So in a very real sense, the Real Numbers are bigger than the Rational Numbers, which means we have at least two different sizes of infinity!