PROLOGUE

skip this if you want

One of the things I managed to teach myself after dropping out of school is calculus. Before I knew what calculus was, merely hearing its name gave me the impression that it was one of the hardest topics in maths, and that I could not possibly learn it from the freely available resources online.

I was wrong.

It turned out that back then I had a rather naive view of what mathematics is about. Understanding calculus is not hard at all. It is just a matter of whether you have stumbled upon the right resources to learn it from. If you have difficulty understanding it, you are probably learning it the wrong way.

I am nowhere near being an "expert" in calculus (or anything like that). I have yet to spend a vast amount of time working on sophisticated calculus puzzles, and I don't really apply calculus to solve problems on a daily basis. I am nothing more than a kid writing about the knowledge they have acquired, in the hope that it may serve as guidance for those who want a deep understanding of what they are learning or trying to learn.

Archy,
Oct 2014

Calculus is just a fanciful name for the study of change in maths. "Calculus" in general refers to the branch of maths made famous by Newton (considered one of the founders of calculus) in the 17th century. Don't confuse it with lambda calculus, propositional calculus, situation calculus, or unicorns, which are completely different things.

To understand calculus, one needs to be able to visualize the concepts of function, limit, differentiation, and integration.

● ● ●

What is a function?

A function can be seen as a machine that takes in a value and gives you back another value. It is what we use in maths to map numbers (inputs) to other numbers (outputs).

A function is normally defined by an equation like this:

Now if you put 2 into this function you will get 12 in return.

The set of numbers that you can put into a function is known as the domain of the function. If you’d like to have a more in-depth understanding of function (e.g. its formal definition), check out my article on set theory. (You will probably love it.)
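The function-as-machine idea is easy to sketch in code. The function below is purely illustrative (it is not the one defined in the text):

```python
# A function is a machine: a value goes in, another value comes out.
# This f is an illustrative example: it maps x to x squared plus 1.
def f(x):
    return x ** 2 + 1

print(f(2))  # 5
print(f(3))  # 10
```

Here the domain is every number the machine accepts; for this `f`, any real number works.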

What is a limit?

A limit is the number you are "expected" to get from a function (or algebraic expression) when it takes in a certain input. By "expected" we mean the output we anticipate as $x$ "approaches" a certain value.

Here is an example where the limit (the expected output) is the same as the actual output.

The limit above is often read as “the limit of $\frac{5^x}{25}$ as $x$ approaches 2 is 1”. You can visualize “$x$ approaches 2” as a dot moving along the graph towards where $x = 2$ (and, as you can see, $y = 1$).

Now take this function for example:

Division by 0 is undefined, so the function is undefined when we put 3 into it.

This is also why there is a hole in the graph of $f(x)$.

But looking at the graph, we can see that 0 is the expected value when $x$ approaches 3.

To compute a limit when the actual output is undefined, we typically simplify the algebraic expression to avoid the division by 0, and then substitute the value $x$ is approaching into the simplified expression.
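A numerical sketch of this, assuming the function in question is $f(x)=\frac{(x^2-3x)^2}{x-3}$ (the one written out a few paragraphs below), which simplifies to $x^2(x-3)$ for $x \neq 3$:

```python
# f is undefined at x = 3 (division by zero), but the expression
# simplifies: (x**2 - 3x)**2 / (x - 3) = x**2 * (x - 3) for x != 3.
def f(x):
    return (x**2 - 3*x)**2 / (x - 3)

def simplified(x):
    return x**2 * (x - 3)

# Substituting the approached value into the simplified form gives the limit.
print(simplified(3))   # 0
# Inputs near 3 confirm the output approaches 0.
print(f(2.99999))      # approximately -0.00009
```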

A limit can be viewed as the biggest/smallest impossible number for a function to output when you put in numbers that are slightly smaller or bigger than what $x$ is approaching.

Take the limit above for example:

$$\begin{align}f(x) &= \frac{(x^2-3x)^2}{x-3} \\ \lim_{x\rightarrow 3}f(x) &= 0 \end{align}$$
0 is the smallest impossible number for $f(x)$ to output if $f(x)$ takes in values slightly smaller than 3.

$$\begin{align}f(2.99999) &= -0.0000899994 \\f(2.9999999) &= -0.000008999994000001 \end{align}$$
No matter how close the input is to 3, as long as it is smaller than 3, it is impossible for the function to output 0 or any number bigger than 0. So 0 is the limit - the smallest impossible value when the input is slightly smaller.

0 is also the biggest impossible number for $f(x)$ to output if $f(x)$ takes in values slightly bigger than 3.

$$\begin{align}f(3.00001) &= 0.00009000059 \\f(3.0000001) &= 0.000000900000054 \end{align}$$
It is impossible for the function to output 0 or any number smaller than 0 if it takes in values slightly bigger than 3. So 0 is the limit - the biggest impossible value when the input is slightly bigger.

Here is a case where the situation is reversed.

$$\begin{align}f(x) &= -\frac{(x^2-3x)^2}{x-3} \\ \lim_{x\rightarrow 3}f(x) &= 0 \end{align}$$
0 is the biggest impossible number for $f(x)$ to output if $f(x)$ takes in values slightly smaller than 3.

$$\begin{align}f(2.99999) &= 0.0000899994 \\f(2.9999999) &= 0.000008999994000001 \end{align}$$
And it is the smallest impossible number if $f(x)$ takes in values slightly bigger than 3.

$$\begin{align}f(3.00001) &= -0.00009000059 \\f(3.0000001) &= -0.000000900000054 \end{align}$$
This part before the ● ● ● was added on 30th Oct at 1 AM SG time, after reading a useful comment on HN by amitkgupta84:
However, there are also situations where the limit is the value we actually get when we put in values slightly smaller or bigger than what we are approaching.

$$f(x)=\left\{\begin{array}{ll}10 & \mbox{if } x > 1.51 \\20 & \mbox{if } x < 1.5 \\99 & \mbox{otherwise} \end{array}\right.$$
$$\lim_{x\rightarrow 1.503}f(x) = 99 \\ f(1.502999) = 99$$
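The piecewise function above is easy to evaluate in code, confirming that the actual output near 1.503 coincides with the limit:

```python
# The piecewise function from the text: constant at 99 on 1.5 <= x <= 1.51,
# so near x = 1.503 the limit and the actual output coincide.
def f(x):
    if x > 1.51:
        return 10
    elif x < 1.5:
        return 20
    else:
        return 99

print(f(1.502999))  # 99 -- same as the limit as x approaches 1.503
```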

● ● ●

Let's denote what x is approaching as $z$ and the limit as $L$.

$$\lim_{x\rightarrow z}f(x) = L$$
For every positive number $ε$, we can make $|f(z-a) - L|$ smaller than $ε$ by choosing a small enough number $a > 0$. The same is true when it is $f(z+a)$ instead of $f(z-a)$.

In short we state that

$$\text{For all }ε> 0\text{, there exists some }δ > 0 \\ \text{ such that whenever } 0 < |x − z| < δ\text{, we have }|f(x) − L| < ε$$
This is one way of formally defining what a limit is. It is known as the (ε, δ)-definition of limit.
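The definition can be sketched numerically. For an illustrative function $f(x) = 2x$ with $z = 3$ and $L = 6$, choosing $δ = ε/2$ works, since $|f(x) - 6| = 2|x - 3|$:

```python
# (epsilon, delta)-definition sketch for f(x) = 2*x, z = 3, L = 6.
# For this f, delta = epsilon / 2 suffices: if 0 < |x - 3| < delta,
# then |f(x) - 6| = 2*|x - 3| < 2*delta = epsilon.
def f(x):
    return 2 * x

def within_epsilon(epsilon, samples=1000):
    z, L = 3, 6
    delta = epsilon / 2
    # Sample points with 0 < |x - z| < delta on both sides of z.
    xs = [z + delta * (i / (samples + 1)) for i in range(1, samples + 1)]
    xs += [z - delta * (i / (samples + 1)) for i in range(1, samples + 1)]
    return all(abs(f(x) - L) < epsilon for x in xs)

print(all(within_epsilon(eps) for eps in [1.0, 0.1, 0.001]))  # True
```

This is only a spot-check over sampled points, of course; the definition itself quantifies over every $x$ in the interval.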

What is differentiation?

Differentiation is a fanciful name for the process of obtaining a derivative. And a derivative is a function that gives you the “slope” (or rate of change) of another function at a certain point.

Basically, differentiation can be seen as a machine that takes in a function and gives you back another function.

We all know that the slope of a linear equation/function has a constant value. It has the same rate of change at every point on the line. So the graph of a linear equation/function is indeed a straight line.

For every 1 added to the input, the value of the slope (in this case, 4) is added to the output.

The change between input and output is always in the ratio of 1 to $m$, where $m$ is the value for the slope. Adding 0.1 to the input will trigger a change of $+0.1m$ in the output, etc.
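This 1-to-$m$ ratio is easy to verify. Below is an illustrative linear function with slope 4 (the intercept is arbitrary, since it does not affect the slope):

```python
# A linear function with slope m = 4; the intercept 1 is arbitrary.
def f(x):
    return 4 * x + 1

# Adding 0.1 to the input triggers a change of 0.1 * 4 = 0.4 in the output.
print(f(2.1) - f(2.0))  # close to 0.4 (up to floating-point rounding)
```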

But this is only true for linear functions. For functions that are not linear, the rate of change is not the same at every point. Take a look at the function $f(x)=2^x$.

This is when we cannot use a constant to represent the "slope": we need a function that gives us a different "slope" (rate of change) at different points. Or rather, we shouldn't even call it a "slope" anymore, since the rate of change is not constant. We call it the derivative.

If you are still trying to get your head around what "the rate of change is not constant" means, imagine a cat at rest that starts to accelerate constantly at $1\,m/s^2$. Three seconds later it would be moving at a speed of $3\,m/s$. So far the rate of change of its speed is constant. But when the cat begins to slow down, the rate of change is no longer constant.

How to find the derivative of a function?

The rate of change at a certain point in the function can be visualized as a straight line across the point.

This is normally called the tangent line at a certain point.

Since a non-linear function doesn’t have a constant rate of change, it means that every point of the function needs a different straight line to represent the rate of change.

To find the derivative of a function is to find another function that outputs, at every point, the value represented by the corresponding straight line.

The rate of change at point $x$ is therefore sometimes also known as the derivative at point $x$.

To get the straight line that represents the derivative at a certain point, one way to do it is to first locate 2 different points on the graph and draw a line across them.

Some people refer to this line as a secant line.

This line represents a value: it would represent the rate of change if the 2 points belonged to a linear function.

But this function is not linear. So this line is not accurate at all if we try to use it to represent the rate of change at either of the 2 points.

Intuitively, the closer the 2 points are, the more accurately the line represents the rate of change at either point.

With that in mind, let's start off with a line that cuts across 2 points (A and B), and shift the line accordingly as we shift B towards A along the graph. By the time B is at the same point as A, we would have a line that can accurately represent the rate of change at point A - we would have a line that represents the derivative of the function at point A.

By turning the process above into a function (denoted by $f’(x)$), this function would be the derivative of the function of the graph (denoted by $f(x)$).

Here we are denoting the coordinate of point B as $(x+h, f(x+h))$. And what we are doing is obtaining a value for the straight line (the “slope”/derivative) at point A as we shift point B to A by making $h$ approach 0. So the coordinate becomes $(x+0, f(x+0))$, which is the coordinate for point A.

This is the modern definition of the derivative in terms of a limit:
$$f'(x) = \lim_{h\rightarrow 0}\frac{f(x+h)-f(x)}{(x+h)-x} = \lim_{h\rightarrow 0}\frac{f(x+h)-f(x)}{h}$$
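The limit definition can be approximated numerically by using a small but nonzero $h$. Here is a sketch, checked against $f(x) = x^3$ (whose derivative at $x = 2$ should be $3 \cdot 2^2 = 12$):

```python
# Approximate f'(x) = lim_{h -> 0} (f(x+h) - f(x)) / h with a small fixed h.
def derivative(f, x, h=1e-6):
    return (f(x + h) - f(x)) / h

def f(x):
    return x ** 3

print(derivative(f, 2))  # approximately 12
```

Shrinking `h` tightens the approximation, up to the point where floating-point rounding takes over.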
Besides $f'(x)$ [aka Lagrange's notation], here is another common notation for differentiation: $$\frac{d}{dx}f(x)$$ Gottfried Wilhelm Leibniz, a great German mathematician, came up with this notation in the 17th century. He saw $dx$ as an infinitesimal (infinitely small) change in $x$ [the input of $f(x)$], and the single $d$ in the numerator is meant to be merged with $f(x)$ to indicate an infinitesimal change in $f(x)$ [the output of $f(x)$]. $\frac{d}{dx} f(x)$ is sometimes written as $\frac{dy}{dx}$, where $y$ stands for the output of $f(x)$. This is actually a more ancient way of defining a derivative: a derivative tells you the rate of change of the output at an infinitesimal scale as the input changes. Therefore it is $dy$ [or $d\,f(x)$] divided by $dx$ - an infinitesimal change in $f(x)$ divided by an infinitesimal change in $x$. So here we are defining the derivative in terms of a fraction ($\frac{dy}{dx}$) instead of a limit. Treating infinitesimals rigorously in this way is the subject of non-standard analysis.

Let’s say we want to get the derivative of $x^3$. The straightforward way to do it is to compute the limit.

But we can speed up the process by using shortcuts like this.

Here is a list of famous shortcuts for differentiation.

$\frac{d}{dx} c = 0$

$\frac{d}{dx} c\,f(x) = c \frac{d}{dx} f(x)$

$\frac{d}{dx} x = 1$

$\frac{d}{dx} cx = c$

"Power rule"

:
$\frac{d}{dx} x^n=nx^{(n-1)}$

"Sum rule"

:
$\frac{d}{dx} [f(x) + g(x)] = f'(x) + g'(x)$

"Product rule"

:
$\frac{d}{dx} [f(x) \cdot g(x)] = f'(x)g(x) + g'(x)f(x)$

"Chain rule"

:
$\frac{d}{dx} f(g(x)) = f'(g(x))g'(x)$

"Quotient rule"

:
$\frac{d}{dx} \frac{f(x)}{g(x)} = \frac{f'(x)g(x) - g'(x)f(x)}{g(x)^2}$
$\frac{d}{dx} \frac{1}{f(x)} = \frac{-f'(x)}{f(x)^2}$
$\frac{d}{dx} c_1{^{ax}} = c_1{^{ax}} \ln(c_1)\, a$

$\frac{d}{dx} e^x = e^x \ln(e) = e^x$

$\frac{d}{dx} x^x = x^x (1+\ln(x))$

$\frac{d}{dx} \log_{c_2}(x_1) = \frac{1}{x_1\ln{c_2}}$

$\frac{d}{dx} \ln(x_1) = \frac{1}{x_1}$

$\frac{d}{dx} \ln(|x|) = \frac{1}{x}$

$\frac{d}{dx} \sin(x) = \cos(x)$

$\frac{d}{dx} \cos(x) = -\sin(x)$

$\frac{d}{dx} \tan(x) = \sec^2(x)$

$\frac{d}{dx} \sec(x) = \sec(x)\tan(x)$

$\frac{d}{dx} \csc(x) = -\csc(x)\cot(x)$

$\frac{d}{dx} \cot(x) = -\csc^2(x)$

$c$ is a constant. $c_1$ is a constant $>0$.
$c_2$ is a constant $>0$ but $\neq 1$ (a valid logarithm base).
$n$ is an integer.
$x_1$ is a variable $>0$.

These shortcuts can all be derived from the limit of $\frac{f(x+h)-f(x)}{(x+h)-x}$ above. They are sometimes referred to as “differentiation rules”. Calling them “rules” certainly makes them sound like some fundamental principles in calculus, but the truth is they are merely shortcuts to speed things up.
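Because the shortcuts all come from the same limit, we can spot-check them against a numerical difference quotient. A sketch using the product rule and the chain rule (the functions chosen are illustrative):

```python
import math

# Spot-check two shortcuts against the limit-of-difference-quotient
# definition, using a small h.
def numeric_diff(f, x, h=1e-7):
    return (f(x + h) - f(x)) / h

x = 1.3

# Product rule: d/dx [sin(x) * x**2] = cos(x)*x**2 + 2*x*sin(x)
product_ok = abs(
    numeric_diff(lambda t: math.sin(t) * t**2, x)
    - (math.cos(x) * x**2 + 2 * x * math.sin(x))
) < 1e-5

# Chain rule: d/dx sin(x**2) = cos(x**2) * 2*x
chain_ok = abs(
    numeric_diff(lambda t: math.sin(t**2), x)
    - math.cos(x**2) * 2 * x
) < 1e-5

print(product_ok, chain_ok)  # True True
```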

● ● ●

I shall now introduce a new concept called anti-differentiation. It is just the inverse of differentiation. It can be seen as a machine that takes in a function, $g(x)$, and gives you back another function, $f(x)$, whose derivative is $g(x)$. Here, $f(x)$ is known as an anti-derivative of $g(x)$.

Notice that we get the same derivative when differentiating functions that differ only by an added (or subtracted) constant.

As shown above, no matter what the constant is, differentiating always gives us back the same function. So, to be technically correct, we need to put a placeholder, $+ C$, at the end of the result to indicate that the anti-derivative can be any function of this form plus some constant (or minus some constant, for cases when $C$ is negative).
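This is easy to confirm numerically. Below, two illustrative anti-derivatives of $2x$ that differ only by their constants produce the same derivative:

```python
# Functions differing only by an added constant share a derivative,
# which is why the anti-derivative carries the "+ C" placeholder.
def numeric_diff(f, x, h=1e-7):
    return (f(x + h) - f(x)) / h

f1 = lambda x: x**2 + 5    # one anti-derivative of 2x
f2 = lambda x: x**2 - 41   # another: only the constant differs
# Both differentiate to approximately 2*x = 6 at x = 3.
print(numeric_diff(f1, 3), numeric_diff(f2, 3))
```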

What is integration?

Integration is a fanciful name for the process of finding an integral. "Integral" here refers to either an indefinite integral or a definite integral. (Therefore we sometimes say indefinite integration or definite integration just to be specific.)

An indefinite integral is fundamentally equivalent to an anti-derivative: they are basically the same thing.

This is shown in the first fundamental theorem of calculus.
The $dx$ in the equation indicates that $x$ is the input we are integrating $g(x)$ with respect to. A function can take in more than one input [e.g. $f(x,q) = x^2+q^3$] so there are times when it is crucial to specify which input we are integrating the function with respect to [e.g. Is it $x$ or $q$? Integrating the function with respect to a different input would give us a different result].

A definite integral can be defined as the difference between the values of an anti-derivative evaluated at 2 different inputs.

This is often read as “the definite integral of g(x) from a to b”.

This definition of definite integral is derived from the second fundamental theorem of calculus.
$a$ here is sometimes referred to as the lower limit, while $b$ is referred to as the upper limit.

Here is another way to define what a definite integral is. Imagine you want to find out the area of the region under a curve in a graph.

You can approximate the area of the region by drawing rectangles under the curve and adding their areas together. The smaller the width of the rectangles, the more rectangles there are, and the more accurate the approximation.

This can actually be written as a summation. Here $n$ is the number of rectangles.

This is known as the Riemann sum. $\frac{b-a}{n}$ is the width of each rectangle, and $g(a+ \frac{b-a}{n}i)$ is the height.

We can find the actual value of the area by taking the limit as $n$, the number of rectangles, goes to infinity. Infinity here can be imagined as a theoretically enormous number, bigger than every real number on the number line. This limit is another way of defining the definite integral.
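A Riemann sum is straightforward to compute in code. Here is a sketch for an illustrative example, $g(x) = x^2$ from $a = 0$ to $b = 1$, whose exact area (from the anti-derivative $\frac{x^3}{3}$) is $\frac{1}{3}$:

```python
# Riemann-sum approximation of the area under g from a to b, using n
# rectangles of width (b - a)/n and heights g(a + width * i).
def riemann_sum(g, a, b, n):
    width = (b - a) / n
    return sum(g(a + width * i) * width for i in range(1, n + 1))

g = lambda x: x**2
for n in (10, 100, 10000):
    print(n, riemann_sum(g, 0, 1, n))  # approaches 1/3 as n grows
```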

An integral defined in this way is known as a Riemann integral. When the limit exists, we say the function is Riemann-integrable. When a function is not Riemann-integrable, it doesn't mean it is completely non-integrable. It really just depends on how we define "integral". In a branch of maths known as real analysis, integration is normally defined in Lebesgue's way, different from Riemann's way above. Functions that can be integrated in Lebesgue's way are Lebesgue-integrable. One typical example of a function that is not Riemann-integrable but is Lebesgue-integrable is a nowhere-continuous function such as the Dirichlet function.

In the limit above, we are getting the area of the region underneath the graph of $g(x)$ from $a$ to $b$. And this is actually equivalent to "stacking up" the rates of change of its anti-derivative, $f(x)$, at every point from $a$ to $b$. Therefore we are actually getting the difference between $f(a)$ and $f(b)$. This is why the definite integral of $g(x)$ is able to give us the area from $a$ to $b$ under the graph of $g(x)$: the definite integral gives us the overall change in value from $f(a)$ to $f(b)$.

This is precisely why $dx$ is there as part of the integral notation $\int f(x)\ dx$. Following the logic that differentiation is defined as $\frac{d f(x)}{dx} = g(x)$, we can see that $g(x)\ dx = d f(x)$, where $d$ is an infinitesimal change interpreted as $\frac{1}{\infty}$. The relation between every $x$ and $y$ in the function $f(x)$ [how a change in $x$ affects $y$, and vice versa] can therefore be constructed by indeterminately [1] summing up [2] the infinite number of infinitesimal "slices" of its derivative $g(x)$, with each infinitesimal "slice" being a nonspecific [3] $y$ [4] multiplied by an infinitesimal change in $x$, $dx$, that governs the value of the corresponding $y$.

[1] indeterminately: this is why it is called indefinite integration. We are getting a result in terms of a variable, rather than an actual value.
[2] summing up: this is what $\int$ indicates in the notation.
[3] nonspecific: thus it is indefinite integration. What matters is the relation in terms of $x$, not the actual value.
[4] $y$: this $y$ is the $y$ from $g(x)$.

How to find the definite integral of a function from $a$ to $b$?

Computing the limit of a Riemann sum can really be a tedious thing to do. So what we normally do is find the indefinite integral first, then put $a$ and $b$ into it and take the difference (using the definition of the definite integral from the 2nd fundamental theorem of calculus mentioned above).

Of course, it is not always necessary to get the indefinite integral first (it is possible to arrive at an answer after some transformations of the definite integral).

How to find the indefinite integral (anti-derivative) of a function?

Naturally, we can reverse the differentiation shortcuts/rules above and turn them into integration shortcuts/rules. But sometimes a little creativity is required for integration. Let's take a look at the reverse chain rule.

Things don't always come in a perfect bundle. To use the reverse chain rule, most of the time we need to see the algebraic structure as $f'(g(x))$ and compute $g'(x)$ on our own.

Here is an example. Let’s say we need to solve this indefinite integral.

We can see $f’(x)$ as $\frac{3}{x^2}$ and $g(x)$ as $2x+1$.

What we need to do now is to obtain $f(x)$ by integrating $f’(x)$, and obtain $g’(x)$ by differentiating $g(x)$.

So if we need to solve the definite integral $\int_0^1 \frac{3}{(2x+1)^2} dx$, we just put the two values into the indefinite integral and get the difference like this:

This technique (reversing chain rule) is known as integration by substitution, or, $u$-substitution, where $g(x)$ is often written as $u$.
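The result of the worked example can be checked numerically. Applying the substitution to $g(x) = \frac{3}{(2x+1)^2}$ gives the anti-derivative $F(x) = -\frac{3}{2(2x+1)}$ (plus $C$); a sketch verifying that $F$ differentiates back to $g$ and that the definite integral from 0 to 1 comes out as $F(1) - F(0)$:

```python
# The integrand from the worked example...
def g(x):
    return 3 / (2 * x + 1) ** 2

# ...and the anti-derivative obtained via u-substitution (C omitted,
# since it cancels in a definite integral).
def F(x):
    return -3 / (2 * (2 * x + 1))

# F should differentiate back to g (checked with a difference quotient)...
h = 1e-7
check = abs((F(2 + h) - F(2)) / h - g(2)) < 1e-5

# ...and the definite integral from 0 to 1 is F(1) - F(0).
print(check, F(1) - F(0))  # True 1.0
```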
The truth is: finding an indefinite integral can sometimes be really hard. Aside from the fact that there isn't an "absolute" way of doing it by hand, there is a rigorous mathematical reason behind this. The set of functions we humans love to differentiate is normally closed under differentiation. This means that if we differentiate a function, we can expect to get back a similar kind of function (e.g. differentiating an elementary function gives us back an elementary function). But that is not the case for integration: it is possible to integrate an elementary function and get a non-elementary function in return (e.g. $\int e^{x^2}\,dx$). And as we can see, definite integration is more of a local operation, while indefinite integration is a global operation. This is why it sometimes makes more sense to solve a definite integral using transformations and techniques, arriving at the answer without having to compute the indefinite integral. There are respected individuals (e.g. Vladimir Reshetnikov, Cleo on Maths StackExchange) who are amazingly good at this.

● ● ●

Bonus: Partial derivative

When a function has more than one input value, we call it a multivariable function.

The graphs of functions that take in 1 input are curves in a 2D plane, while the graphs of functions that take in 2 inputs are surfaces in 3D space, etc.

To differentiate functions like this, we find the derivative with respect to one of the variables (denoted by $x_i$ in this case). This type of derivative is known as a partial derivative.

Finding a partial derivative is very similar to finding a normal derivative: we take the limit as $a$ approaches 0. But instead of just $f(x+a)-f(x)$, we have $f(x_1,x_2,\dots,x_i+a,\dots,x_n)-f(x_1,x_2,\dots,x_i,\dots,x_n)$, since the function takes in more than one input.

To do a partial differentiation, we simply pretend that other variables in the function are constants as we differentiate the function with respect to the chosen input. Here is an example.

Upon differentiating the function, we need to replace the remaining $C$ back with the corresponding variables.

Basically, $\frac{\partial}{\partial x} f(x,a,b,c)$ tells you how an infinitesimal change in the input $x$ affects the output $f(x,a,b,c)$ while $a$, $b$ and $c$ remain unchanged. To get a numerical value for the derivative, it is sometimes necessary to put actual values into these variables.
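A numerical sketch, reusing the two-input example $f(x,q) = x^2 + q^3$ from earlier: nudge only the chosen input and hold the other fixed.

```python
# Partial differentiation sketch for f(x, q) = x**2 + q**3:
# differentiate with respect to x while holding q fixed.
# Analytically, df/dx = 2*x (the q**3 term acts as a constant and drops out).
def f(x, q):
    return x**2 + q**3

def partial_x(f, x, q, h=1e-7):
    # Only x is nudged; q is treated as a constant.
    return (f(x + h, q) - f(x, q)) / h

print(partial_x(f, 3, 5))  # approximately 6, i.e. 2*x at x = 3
```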

The derivative of a function with 1 input at a point $(x,f(x))$ can be represented by a tangent line once $x$ is specified. Meanwhile, the derivative of a function with 2 inputs, at the points $(x_1,x_2,f(x_1,x_2))$ where $x_1$ is specified, is represented by a plane (a tangent plane); for functions with 3 inputs it is a 3D tangent space, etc.

In normal differentiation (single-input functions), obtaining a numerical value for the rate of change at a certain point means obtaining the slope of the tangent line at that point: there is only 1 slope because there is only one line going across the single point $(x,f(x))$. To do this we just need to specify $x$, since $f(x)$ depends only on $x$. The graph of a two-variable function is a 2-dimensional surface in 3-dimensional space, so each point is represented by $(x_1,x_2,f(x_1,x_2))$, where the value of $f(x_1,x_2)$ depends on both $x_1$ and $x_2$. The derivative with respect to one of the variables (e.g. $x_1$), when that variable is specified, is a value that still depends on $x_2$. So it is no longer a tangent line across one point, but a tangent plane across the infinitely many points whose $x_1$ equals the specified value. This is why, if we want an actual numerical value for the slope, we need to specify $x_2$ as well. The same applies to tangent spaces of higher dimensions.
If you are interested in learning more about calculus, check out Calculus One by Jim Fowler on Coursera. It is truly amazing for beginners. For courses that cover more advanced topics in calculus and other areas of maths that use calculus, check out the ones by MIT OpenCourseWare. You'd probably find Pauls Online Notes useful too.