Welcome back. In today’s blog post, I am excited to share a bit of control theory and explore how we can use the linear quadratic regulator to obtain optimal control laws. Some questions that might come up in this exploration are: What is control theory? And what is a control system? As I understand it, control theory is the mathematical science of how we can alter the behavior of a dynamical system so that it acts the way we want it to. A control system is any physical or digital system that we might want to control. One of the most common goals in control theory is to take an unstable system (one with solutions that blow up to infinity) and stabilize it. In our study today, our goal will be to take a linear time-invariant (LTI) dynamical system, stabilize it, and drive the state vector x(t) to a desired value, which we place (without loss of generality) at the origin.

When controlling a dynamical system, we need a control law, often denoted u(t). When designing a control law, there are lots of options, and u(t) can get extremely complicated if we let it. However, it often suffices to keep u(t) simple and define a linear feedback control law,
u = -Kx,
that depends on the state x and a gain matrix K. Although this equation might look simple, such control laws can be very powerful, and extensive theory has been developed on how to design them. The next question is how to find K. One approach is to pick the value of K that minimizes a cost, which can incorporate both the deviation of the state vector from zero and some measure of the control effort. If the cost function depends quadratically on x(t) and u(t), this leads to a linear quadratic optimal control problem, so called because the dynamics are linear and the cost function is quadratic. In our discussion today, we will consider a linear quadratic optimal control problem and derive the optimal control law, which is given by the linear quadratic regulator.
Problem Setup
Consider a linear time-invariant dynamical system of the form:
dx/dt = A x(t) + B u(t),
where x(t) is the state vector, u(t) is the control input, A is the state matrix, and B is the input matrix. Our goal will be to find the gain matrix K for the linear state-feedback control law,
u(t) = -K x(t),
that drives x(t) to zero. We will seek the unique matrix K that minimizes the quadratic cost function:
J = ∫[0, ∞) x(t)ᵀ Q x(t) + u(t)ᵀ R u(t) dt,
where Q = Qᵀ ≥ 0 is a symmetric positive semi-definite matrix and R = Rᵀ > 0 is a symmetric positive definite matrix.
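To make this concrete, here is a small, entirely hypothetical example in Python with NumPy: a two-state system whose A matrix has an unstable eigenvalue, together with a simple choice of cost weights. The particular values of A, B, Q, and R below are my own illustration, not part of the derivation.

```python
import numpy as np

# Hypothetical example system (made-up values, purely for illustration).
# A is lower triangular, so its eigenvalues sit on the diagonal: +1 and -2.
# The eigenvalue at +1 makes the uncontrolled system unstable.
A = np.array([[1.0, 0.0],
              [1.0, -2.0]])
B = np.array([[1.0],
              [0.0]])

# Cost weights: Q penalizes deviation of the state from the origin,
# R penalizes control effort.
Q = np.eye(2)          # Q = Q^T >= 0 (positive semi-definite)
R = np.array([[1.0]])  # R = R^T > 0 (positive definite)

# Eigenvalues of the open-loop system; the positive real part confirms
# that solutions of dx/dt = A x blow up without control.
eigs = np.linalg.eigvals(A)
```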
Derivation of Optimal Gain Matrix
To determine the optimal matrix K, we follow this very handy lecture video from MATLAB. Begin by introducing a symmetric matrix P = Pᵀ. If we add and then subtract x(0)ᵀ P x(0) from the cost function, then nothing changes:
J = x(0)ᵀ P x(0) − x(0)ᵀ P x(0) + ∫[0, ∞) x(t)ᵀ Q x(t) + u(t)ᵀ R u(t) dt.
If we assume that x(t) asymptotically approaches zero, then x(∞)ᵀ P x(∞) = 0, and the fundamental theorem of calculus gives ∫[0, ∞) d/dt [x(t)ᵀ P x(t)] dt = −x(0)ᵀ P x(0). This lets us bring the second x(0)ᵀ P x(0) inside the integral:
J = x(0)ᵀ P x(0) + ∫[0, ∞) d/dt [x(t)ᵀ P x(t)] + x(t)ᵀ Q x(t) + u(t)ᵀ R u(t) dt.
Now recognize that:
d/dt [x(t)ᵀ P x(t)] = (dx/dt)ᵀ P x(t) + x(t)ᵀ P (dx/dt)
= x(t)ᵀ Aᵀ P x(t) + u(t)ᵀ Bᵀ P x(t) + x(t)ᵀ P A x(t) + x(t)ᵀ P B u(t).
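This expansion just substitutes dx/dt = A x + B u into the product rule. As a quick sanity check, we can verify the identity numerically with random matrices and vectors (the dimensions and random values below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2  # arbitrary state and input dimensions

A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
P = rng.standard_normal((n, n))
P = P + P.T  # make P symmetric, as in the derivation
x = rng.standard_normal(n)
u = rng.standard_normal(m)

# dx/dt from the system dynamics
xdot = A @ x + B @ u

# Left-hand side: product rule applied to x^T P x
lhs = xdot @ P @ x + x @ P @ xdot

# Right-hand side: the expanded four-term expression
rhs = x @ A.T @ P @ x + u @ B.T @ P @ x + x @ P @ A @ x + x @ P @ B @ u
```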
If we substitute this expression back into the integral and group like terms, then the cost function can be written as:
J = x(0)ᵀ P x(0) + ∫[0, ∞) x(t)ᵀ [Aᵀ P + P A + Q] x(t)
+ u(t)ᵀ R u(t) + x(t)ᵀ P B u(t) + u(t)ᵀ Bᵀ P x(t) dt.
Although the last three terms are unwieldy, we can “complete the square” to derive the alternative representation:
u(t)ᵀ R u(t) + x(t)ᵀ P B u(t) + u(t)ᵀ Bᵀ P x(t)
= [u(t) + R⁻¹ Bᵀ P x(t)]ᵀ R [u(t) + R⁻¹ Bᵀ P x(t)] − x(t)ᵀ P B R⁻¹ Bᵀ P x(t).
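The completed square can also be checked numerically. The sketch below draws random B, P, x, and u, builds a random positive definite R, and confirms that both sides of the identity agree:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 2  # arbitrary dimensions

B = rng.standard_normal((n, m))
P = rng.standard_normal((n, n))
P = P + P.T  # symmetric P, as in the derivation
M = rng.standard_normal((m, m))
R = M @ M.T + np.eye(m)  # symmetric positive definite R
x = rng.standard_normal(n)
u = rng.standard_normal(m)

Rinv = np.linalg.inv(R)

# Left-hand side: the three unwieldy terms from the cost function
lhs = u @ R @ u + x @ P @ B @ u + u @ B.T @ P @ x

# Right-hand side: the completed square minus the correction term
v = u + Rinv @ B.T @ P @ x
rhs = v @ R @ v - x @ P @ B @ Rinv @ B.T @ P @ x
```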
If we substitute this back into the cost function, then:
J = x(0)ᵀ P x(0) + ∫[0, ∞) x(t)ᵀ [Aᵀ P + P A + Q − P B R⁻¹ Bᵀ P] x(t)
+ [u(t) + R⁻¹ Bᵀ P x(t)]ᵀ R [u(t) + R⁻¹ Bᵀ P x(t)] dt.
If we want to minimize the cost function, then we can’t do much about the term x(0)ᵀ P x(0) outside of the integral. However, we can make J smaller by choosing P so that the first collection of terms in the integral vanishes for every x(t). This yields an Algebraic Riccati Equation for P:
Aᵀ P + P A + Q − P B R⁻¹ Bᵀ P = 0.
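SciPy’s scipy.linalg.solve_continuous_are solves exactly this continuous-time algebraic Riccati equation. A minimal sketch, reusing the hypothetical example system from earlier:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical example system from before (assumed values).
A = np.array([[1.0, 0.0],
              [1.0, -2.0]])
B = np.array([[1.0],
              [0.0]])
Q = np.eye(2)
R = np.array([[1.0]])

# solve_continuous_are solves: A^T P + P A + Q - P B R^-1 B^T P = 0
P = solve_continuous_are(A, B, Q, R)

# Residual of the Riccati equation; should be numerically zero.
residual = A.T @ P + P @ A + Q - P @ B @ np.linalg.inv(R) @ B.T @ P
```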
Because R is positive definite, the remaining collection of terms inside the integral is always nonnegative, so the cost function J is minimized by making it vanish. This can be done by requiring that:
u(t) + R⁻¹ Bᵀ P x(t) = 0,
or expressed a little differently,
u(t) = −R⁻¹ Bᵀ P x(t).
At this point, we have effectively solved our optimal control problem. The gain matrix is given by:
K = R⁻¹ Bᵀ P,
where P must satisfy the Algebraic Riccati Equation:
Aᵀ P + P A + Q − P B R⁻¹ Bᵀ P = 0.
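Putting the pieces together, a short sketch (again with the hypothetical example system from earlier) computes the gain K = R⁻¹ Bᵀ P and confirms that the closed-loop matrix A − BK is stable:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical example system (assumed values from earlier).
A = np.array([[1.0, 0.0],
              [1.0, -2.0]])
B = np.array([[1.0],
              [0.0]])
Q = np.eye(2)
R = np.array([[1.0]])

# Solve the algebraic Riccati equation, then form the optimal gain.
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.inv(R) @ B.T @ P  # K = R^-1 B^T P

# Under u = -K x, the closed-loop dynamics are dx/dt = (A - B K) x.
# All closed-loop eigenvalues have negative real parts, even though
# the open-loop matrix A is unstable.
closed_loop_eigs = np.linalg.eigvals(A - B @ K)
```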
By construction, the state vector of the controlled system approaches the origin: we derived the algebraic Riccati equation under the assumption that x(t) → 0, and this assumption is self-consistent because, under the standard stabilizability and detectability conditions, the closed-loop matrix A − BK is guaranteed to be stable. In future blog posts, I look forward to exploring control theory in more detail. Until then, please take care.