The Lebesgue Integral: A Newer and More Flexible Alternative to the Riemann Integral

Welcome back. This week, I am excited to delve into the Lebesgue integral, a more powerful alternative to the Riemann integral that we have worked with so far. This more modern piece of mathematics is due to Henri Lebesgue, a French mathematician who lived from 1875 to 1941. To define the Lebesgue integral, we will first develop some core ideas of measure theory. Its construction will show that the Lebesgue integral is more versatile than the Riemann integral: it can integrate functions defined on any set equipped with a measure, not just on Euclidean space. It is also important to note that every function that is Riemann-integrable on a closed, bounded interval is also Lebesgue-integrable there, and the two integrals have the same value. However, some functions are Lebesgue-integrable but not Riemann-integrable. One example is the function on [0, 1] given by f(x) = 1 when x is irrational and f(x) = 0 when x is rational.
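As a preview, here is a worked computation for that example on [0, 1]. It anticipates the definitions developed below and uses the standard fact that a countable set, such as the rationals in [0, 1], has Lebesgue measure zero. Since f takes only the values 1 and 0, its Lebesgue integral is each value times the measure of the set where f takes it:

```latex
\int_{[0,1]} f \, d\mu
  = 1 \cdot \mu\big([0,1] \setminus \mathbb{Q}\big) + 0 \cdot \mu\big([0,1] \cap \mathbb{Q}\big)
  = 1 \cdot 1 + 0 \cdot 0
  = 1
```

The Riemann integral fails here because every subinterval of any partition of [0, 1] contains both rationals and irrationals. Every upper Darboux sum is therefore 1 and every lower Darboux sum is 0, and the two never meet.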

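We will begin our discussion by talking about measures. Suppose X is a set. Essentially, a measure µ is a function that takes subsets of X as input and returns a non-negative number (possibly +∞) that represents a generalized “volume” of that subset. More precisely, µ is defined on a suitable collection of subsets of X (a σ-algebra, defined below) and satisfies µ(∅) = 0. It is also countably additive: if A₁, A₂, A₃, … are pairwise disjoint, then µ(A₁ ∪ A₂ ∪ A₃ ∪ …) = µ(A₁) + µ(A₂) + µ(A₃) + ….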

For example, if we take X = ℝⁿ, the n-dimensional Euclidean space, then we can define the Lebesgue outer measure that quantifies the “volume” of subsets of ℝⁿ. To do this, we will begin by considering an n-dimensional rectangular prism:

I = [a₁, b₁] × [a₂, b₂] × … × [a_n, b_n], where a_i ≤ b_i for each i.

In three dimensions, the volume of a rectangular prism is length times width times height. We will use this as inspiration for the following definition of the volume of an n-dimensional rectangular prism:

Vol(I) = (b₁ – a₁)(b₂ – a₂)…(b_n – a_n).
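To make the formula concrete, here is a minimal Python sketch. It assumes a box is stored as a list of (a_i, b_i) pairs, and `box_volume` is a hypothetical helper name used only for illustration.

```python
from math import prod

def box_volume(intervals):
    """Vol(I) for I = [a1, b1] x ... x [an, bn], given as a list of (a, b) pairs."""
    for a, b in intervals:
        if a > b:
            raise ValueError(f"invalid side: a={a} > b={b}")
    # Product of the side lengths, generalizing length x width x height.
    return prod(b - a for a, b in intervals)

# A 2 x 3 x 4 box in R^3.
print(box_volume([(0, 2), (1, 4), (-1, 3)]))  # 24
```

With a single pair, the same function returns the length of an interval in ℝ.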

Next, we say that a subset A of ℝⁿ is an elementary set if it is the union of finitely many n-dimensional rectangular prisms that do not overlap, meaning any two of them share at most boundary points. Mathematically speaking, for some positive integer k we have:

A = I₁∪ I₂ ∪…∪ I_k.

If we wish to find the volume of A, then we would find the volume of each of its constituent rectangular prisms and then add those volumes up to reach a total. Thus,

Vol(A) = Σ_j Vol(I_j) where 1 ≤ j ≤ k.
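The sketch below computes the volume of an elementary set. It adds a check that the boxes' interiors are disjoint, because summing overlapping boxes would double count. The helper names are hypothetical, and boxes are again lists of (a, b) pairs.

```python
from itertools import combinations
from math import prod

def box_volume(box):
    # box is a list of (a, b) side intervals
    return prod(b - a for a, b in box)

def interiors_overlap(box1, box2):
    # The open boxes intersect iff their open sides intersect in every coordinate.
    return all(max(a1, a2) < min(b1, b2)
               for (a1, b1), (a2, b2) in zip(box1, box2))

def elementary_volume(boxes):
    """Vol(A) for A = I1 ∪ ... ∪ Ik, assuming the boxes do not overlap."""
    for b1, b2 in combinations(boxes, 2):
        if interiors_overlap(b1, b2):
            raise ValueError("boxes overlap; a plain sum would double count")
    return sum(box_volume(b) for b in boxes)

# Two unit squares sharing an edge form an elementary set of area 2.
A = [[(0, 1), (0, 1)], [(1, 2), (0, 1)]]
print(elementary_volume(A))  # 2
```

Shared edges are allowed because they have zero volume and so do not affect the sum.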

Now let E be an arbitrary subset of ℝⁿ, and let g = {A_j : j = 1, 2, 3, …} be a collection of (possibly infinitely many) elementary subsets of ℝⁿ that covers E. By saying that g covers E, we mean that E is contained in the (possibly infinite) union of all of the A_j’s:

E ⊆ A₁ ∪ A₂ ∪….

We now define the Lebesgue outer measure of E ⊆ ℝⁿ as follows, where the infimum is taken over all possible collections g of elementary sets that cover E:

µ^*(E) = inf_g Σ_j Vol(A_j).
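Any single cover gives an upper bound for µ^*(E). A classic use in one dimension is to cover the j-th rational in [0, 1] by an interval of length ε/2^j. The total length is then at most ε/2 + ε/4 + … = ε, so µ^*(ℚ ∩ [0, 1]) ≤ ε for every ε > 0 and is therefore 0. A computer can only list finitely many rationals, so this Python sketch checks the bound on a finite piece of one enumeration. The function names and the choice of enumeration are illustrative assumptions.

```python
from fractions import Fraction

def rationals_in_unit_interval(max_den):
    """Distinct rationals p/q in [0, 1] with q <= max_den, listed by increasing q."""
    seen, out = set(), []
    for q in range(1, max_den + 1):
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                out.append(r)
    return out

def geometric_cover(points, eps):
    """Cover the j-th point (j = 1, 2, ...) by a closed interval of length eps / 2**j."""
    return [(x - eps / 2**(j + 1), x + eps / 2**(j + 1))
            for j, x in enumerate(points, start=1)]

eps = Fraction(1, 100)
points = rationals_in_unit_interval(20)
cover = geometric_cover(points, eps)
total = sum(b - a for a, b in cover)  # exact arithmetic with Fractions

assert all(a <= x <= b for x, (a, b) in zip(points, cover))  # every point is covered
print(len(points), float(total), total < eps)
```

Exact `Fraction` arithmetic keeps the total equal to ε(1 – 2^(–N)) with no rounding, where N is the number of points listed.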

We should also note that the subsets of X that we can input into our measure must form what we call a σ-algebra (pronounced “sigma-algebra”). If M is a collection of subsets of X, we say that M is a σ-algebra if three conditions hold. First, X itself is an element of M (indeed, X is a subset of itself). Second, if A is in M, then so is its complement A^C, the set of all elements of X that are not in A. Third, if A₁, A₂, A₃, … are in M, then A₁ ∪ A₂ ∪ A₃ ∪ … is also in M.
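On a finite set X, the three conditions can be checked mechanically. With only finitely many distinct sets available, closure under countable unions reduces to closure under pairwise unions. This Python sketch represents sets as `frozenset`s; `is_sigma_algebra` is a hypothetical helper.

```python
from itertools import combinations

def is_sigma_algebra(X, M):
    """Check the three σ-algebra conditions for a collection M of subsets of a finite set X."""
    X = frozenset(X)
    M = {frozenset(A) for A in M}
    if X not in M:                          # condition 1: X is in M
        return False
    if any(X - A not in M for A in M):      # condition 2: closed under complements
        return False
    # condition 3: on a finite set, pairwise unions suffice
    return all(A | B in M for A, B in combinations(M, 2))

X = {1, 2, 3, 4}
M1 = [set(), {1, 2}, {3, 4}, {1, 2, 3, 4}]
M2 = [set(), {1}, {1, 2, 3, 4}]   # missing the complement {2, 3, 4}
print(is_sigma_algebra(X, M1), is_sigma_algebra(X, M2))  # True False
```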

Next, we will talk a little bit about functions and provide three important definitions. For our remaining discussions, we will assume that the set X is equipped with a σ-algebra M and a measure µ.

If E is a subset of X, let K_E(x) = 1 if x is in E and K_E(x) = 0 if x is in X but not in E. This function K_E is called the characteristic function (or indicator function) of E. The name fits because K_E characterizes E exactly: it equals 1 precisely on E and 0 everywhere else.
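In Python, a characteristic function is a one-line closure over E. This is a minimal sketch, and `characteristic_function` is a hypothetical name.

```python
def characteristic_function(E):
    """Return K_E: K_E(x) = 1 if x is in E, else 0. E may be any container supporting `in`."""
    def K_E(x):
        return 1 if x in E else 0
    return K_E

X = range(10)
E = {2, 3, 5, 7}  # the primes below 10
K_E = characteristic_function(E)
print([K_E(x) for x in X])  # [0, 0, 1, 1, 0, 1, 0, 1, 0, 0]
```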

If s is a function from X to ℝ, we call s a simple function if its range, denoted s(X), is finite; that is, s takes only finitely many distinct values. For the integral below, we will also require s to be measurable, in the sense defined shortly. It can help to picture s as constant on each of finitely many pieces, but be careful: the pieces need not be intervals, and the values need not increase from one piece to the next. For example, the function equal to 1 on the irrationals and 0 on the rationals is simple.

Since s(x) only attains finitely many values in its output space, we can write the following enumeration of the range of s, where each c_j is a real number:

s(X) = {c₁, c₂, …, c_n}.

Now, for each j between 1 and n, let E_j consist of precisely all elements of X that are mapped to c_j:

E_j = {x ∈ X: s(x) = c_j}.

We can then use our c_j’s and E_j’s to decompose any simple function s(x) into a linear combination of characteristic functions. This decomposition will come in very handy when we define the Lebesgue integral.

s(x) = Σ_j c_j K_Ej(x) where 1 ≤ j ≤ n.
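On a finite set, the decomposition can be carried out and checked directly. This Python sketch uses hypothetical helper names. It groups points by their value to obtain the c_j and E_j, then rebuilds s from Σ_j c_j K_Ej. The example s(x) = x mod 3 also shows that a simple function need not be monotone.

```python
from collections import defaultdict

def decompose(s, X):
    """Return {c_j: E_j}, where E_j = {x in X : s(x) = c_j} ranges over the distinct values of s."""
    level_sets = defaultdict(set)
    for x in X:
        level_sets[s(x)].add(x)
    return dict(level_sets)

def recombine(pieces):
    """Rebuild s(x) = sum_j c_j * K_{E_j}(x) from the (c_j, E_j) pairs."""
    return lambda x: sum(c * (1 if x in E else 0) for c, E in pieces.items())

def s(x):
    return x % 3  # a simple function with range {0, 1, 2}

X = range(12)
pieces = decompose(s, X)
print({c: sorted(E) for c, E in pieces.items()})
# {0: [0, 3, 6, 9], 1: [1, 4, 7, 10], 2: [2, 5, 8, 11]}
rebuilt = recombine(pieces)
print(all(rebuilt(x) == s(x) for x in X))  # True
```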

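Our last preliminary definitions are those of a measurable set and a measurable function. In ℝⁿ, we say that a subset S is (Lebesgue) measurable if and only if µ^*(U) = µ^*(U ∩ S) + µ^*(U \ S) for every subset U of ℝⁿ, where µ^* is the outer measure from before. This condition is known as Carathéodory’s criterion. In other words, S must split every set U cleanly: the outer measure of U equals the outer measure of the part of U inside S plus the outer measure of the part of U outside S. The measurable sets form a σ-algebra, and restricting µ^* to them gives Lebesgue measure. In a general measure space, where X comes with a σ-algebra M and a measure µ, the measurable sets are simply the members of M.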
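Carathéodory’s criterion really can reject sets. The Python sketch below tests it on a small finite set using a toy outer measure chosen only for illustration: µ^*(A) = 0 if A is empty and 1 otherwise. This function satisfies the outer-measure axioms, yet only ∅ and X pass the criterion. The counting measure, by contrast, makes every subset measurable. All names here are hypothetical.

```python
from itertools import chain, combinations

def subsets(X):
    """All subsets of a finite set X, as frozensets."""
    X = list(X)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(X, r) for r in range(len(X) + 1))]

def is_caratheodory_measurable(S, X, outer):
    """Check outer(U) == outer(U ∩ S) + outer(U minus S) for every subset U of X."""
    S = frozenset(S)
    return all(outer(U) == outer(U & S) + outer(U - S) for U in subsets(X))

def measurable_sets(X, outer):
    return [S for S in subsets(X) if is_caratheodory_measurable(S, X, outer)]

X = ["a", "b", "c"]

def crude(A):
    return 0 if not A else 1   # toy outer measure: "is A nonempty?"

def counting(A):
    return len(A)              # counting measure

print(len(measurable_sets(X, crude)), len(measurable_sets(X, counting)))  # 2 8
```

For the toy outer measure, S = {a} fails with U = {a, b}: the left side is 1, but the right side is 1 + 1 = 2.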

With the definition of a measurable set in mind, we will say that a function f(x) from X to ℝ is measurable if the set {x : f(x) > a} is measurable for every real number a.
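On a finite set with a given σ-algebra M, this definition can be checked with finitely many thresholds. The set {x : f(x) > a} changes only when a crosses a value of f. It is therefore enough to test a at each value of f, plus one value below the minimum. This Python sketch relies on that observation; `is_measurable_function` is a hypothetical name.

```python
def is_measurable_function(f, X, M):
    """Check that {x : f(x) > a} is in M for every real a, on a finite set X."""
    M = {frozenset(A) for A in M}
    values = sorted({f(x) for x in X})
    thresholds = [values[0] - 1] + values  # covers every distinct superlevel set
    return all(frozenset(x for x in X if f(x) > a) in M for a in thresholds)

X = {1, 2, 3, 4}
M = [set(), {1, 2}, {3, 4}, {1, 2, 3, 4}]
f = {1: 5, 2: 5, 3: 7, 4: 7}.get   # constant on {1, 2} and on {3, 4}

def g(x):
    return x                       # separates 1 from 2, which M cannot do

print(is_measurable_function(f, X, M), is_measurable_function(g, X, M))  # True False
```

The function g fails at a = 1, because {x : g(x) > 1} = {2, 3, 4} is not in M.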

At last, we have developed enough background knowledge to construct the Lebesgue integral. We first define the Lebesgue integral of a non-negative measurable function and then use that definition to handle an arbitrary measurable function. So suppose that f is a measurable function from the measure space X to ℝ with f(x) ≥ 0 for all x in X. Now let s be a measurable simple function that sits between 0 and f; in other words, 0 ≤ s(x) ≤ f(x) for all x in X.

Then express s as a linear combination of characteristic functions, with E_j and c_j defined as before:

s(x) = Σ_j c_j K_Ej(x) where 1 ≤ j ≤ n.

Let E (with no subscript!) be a measurable subset of X, and define the following quantity, recalling that s takes exactly n distinct values. Since µ(E ∩ E_j) may be infinite, we use the convention 0 · ∞ = 0:

I_E(s) = Σ_j c_j µ(E ∩ E_j) where 1 ≤ j ≤ n.

Notice that I_E(s) is exactly what the integral of s over E ought to be. We take each value c_j of s, multiply it by the measure (the “volume”) of the set of points in E that s maps to that value, and add up the results.
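Here is a small Python sketch of I_E(s) on a finite measure space. It assumes the measure is given by point masses, so that µ(A) is the sum of the weights of the points in A and every subset is measurable. The weights and the helper names are made up for illustration.

```python
def measure(A, weights):
    """mu(A) as the sum of point masses (an assumed finite measure space)."""
    return sum(weights[x] for x in A)

def simple_integral(s, E, X, weights):
    """I_E(s) = sum over the distinct values c_j of s of c_j * mu(E ∩ E_j)."""
    total = 0
    for c in {s(x) for x in X}:                 # the finite range s(X)
        E_j = {x for x in X if s(x) == c}       # level set where s equals c
        total += c * measure(E & E_j, weights)  # value times measure of E ∩ E_j
    return total

X = {"a", "b", "c", "d"}
weights = {"a": 1, "b": 2, "c": 3, "d": 4}
s = {"a": 10, "b": 10, "c": 0, "d": 5}.get
E = {"a", "c", "d"}
print(simple_integral(s, E, X, weights))  # 30 = 10*1 + 0*3 + 5*4
```

The point b lies in the level set for the value 10, but it contributes nothing because it is not in E.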

The Lebesgue integral of the non-negative function f over E with respect to the measure µ is then the following, where the supremum is taken over all measurable simple functions s satisfying 0 ≤ s(x) ≤ f(x) for all x in X:

∫_E f dµ = sup_s I_E(s).
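To see the supremum at work, take f(x) = x² on [0, 1] with Lebesgue measure. A standard choice of simple functions below f is the dyadic rounding s_k(x) = floor(2^k f(x)) / 2^k. The set where s_k = i/2^k is the interval [√(i/2^k), √((i+1)/2^k)), so I(s_k) is a finite sum of values times interval lengths. Since s_k ≥ f – 2^(–k), these lower bounds climb to 1/3, which matches the Riemann integral. This Python sketch uses one particular sequence; the definition itself takes the supremum over all simple functions below f.

```python
from math import sqrt

def lower_simple_integral(k):
    """I(s_k) for f(x) = x**2 on [0, 1], where s_k(x) = floor(2**k * f(x)) / 2**k.

    s_k equals i / 2**k on the interval [sqrt(i / 2**k), sqrt((i + 1) / 2**k)),
    whose Lebesgue measure is its length.
    """
    n = 2**k
    total = 0.0
    for i in range(n):
        length = sqrt((i + 1) / n) - sqrt(i / n)
        total += (i / n) * length
    # The remaining level set {x : s_k(x) = 1} = {1} has measure 0.
    return total

for k in (1, 2, 4, 8, 12):
    print(k, lower_simple_integral(k))
```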

If f is a general measurable function that takes values both above and below zero, we split it into a positive part f⁺ and a negative part f^–, defined by

f⁺(x) = max(f(x), 0) and f^–(x) = max(–f(x), 0), so that f(x) = f⁺(x) – f^–(x).

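Both f⁺ and f^– are non-negative measurable functions, so their integrals are already defined. Suppose that at least one of ∫_E f⁺ dµ and ∫_E f^– dµ is finite, so that we never face ∞ – ∞. We then define the Lebesgue integral of f over E with respect to µ as shown below. We call f Lebesgue-integrable on E when both integrals are finite, which is equivalent to ∫_E |f| dµ < ∞.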

∫_E f dµ = ∫_E f⁺ dµ – ∫_E f^– dµ.
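The split into positive and negative parts can be checked on a finite point-mass measure space, which is an illustrative assumption. On such a space every function is simple, so the supremum defining the integral of a non-negative g is attained by g itself, giving Σ g(x)µ({x}). This Python sketch uses hypothetical helper names.

```python
def positive_part(f):
    return lambda x: max(f(x), 0)

def negative_part(f):
    return lambda x: max(-f(x), 0)

def integral_nonneg(g, E, weights):
    # On a finite point-mass space, g is itself simple, so the sup over s <= g is I_E(g).
    return sum(g(x) * weights[x] for x in E)

def integral(f, E, weights):
    """∫_E f dµ = ∫_E f⁺ dµ - ∫_E f⁻ dµ (both parts are finite here)."""
    return (integral_nonneg(positive_part(f), E, weights)
            - integral_nonneg(negative_part(f), E, weights))

weights = {"a": 1, "b": 2, "c": 3}
f = {"a": 4, "b": -1, "c": -2}.get
E = {"a", "b", "c"}
# f⁺ = (4, 0, 0) and f⁻ = (0, 1, 2), so the integral is 4*1 - (1*2 + 2*3) = -4.
print(integral(f, E, weights))  # -4
```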

Tada!

This was some pretty heavy mathematics that we just went through, so congrats to everyone who has made it this far. The Lebesgue integral is the current gold standard in many branches of mathematics research and it has some curious applications in probability that I hope to explore with you all one day. Next week, I hope we can resume our discussion of partial differential equations by learning separation of variables and the method of characteristics. Until then, please take care.

Oliver Khan
