
NUMERICAL APPROXIMATION OF DRAZIN INVERSE IN MARKOV CHAIN THEORY

A Thesis by DEREK FORT

Submitted to the Office of Graduate Studies, Texas A&M University-Commerce, in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE, December 2014

Approved by:
Advisor: Thomas Boucher
Committee: Charles Dorsett, Pamela Webster
Head of Department: Tingxiu Wang
Dean of the College: Brent Donham
Dean of Graduate Studies: Arlene Horne

Copyright © 2014 Derek Fort

ABSTRACT

NUMERICAL APPROXIMATION OF DRAZIN INVERSE IN MARKOV CHAIN THEORY

Derek Fort, MS
Texas A&M University-Commerce, 2014
Advisor: Thomas Boucher, PhD

One generalized inverse of the matrix Q is the Drazin inverse, Q#. Many important quantities in Markov chain theory can be expressed in terms of Q#, where Q = I − P, the matrix P being the transition probability matrix of the chain. Previous work has established conditions and algorithms for estimating Q# in a general setting, but specific results are lacking. In this study, estimation of Q# is considered for finite state Markov chains. Optimal parameter settings and convergence rates of the algorithms are examined.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

CHAPTER
1. INTRODUCTION
   Markov Chain
   Transition Matrix
   Markov Chain Properties
   Rates of Convergence
2. MARKOV CHAIN APPLICATIONS
   Bayesian Inference
   Ergodic Averages
   Estimation for Central Limit Theorem
   Batch Means
   Window Estimators
   Metropolis-Hastings Algorithm
   The Metropolis Algorithm
   The Independence Sampler
   Single-Component Metropolis-Hastings
   Gibbs Sampling
3. VECTOR AND MATRIX NORMS
   Vector Norms
   Matrix Norms
4. COMPUTATIONAL TECHNIQUES FOR DRAZIN INVERSE
   Recursive Definition
   Convergence Rate
   R Code
   Comparing Drazins
   Matrix Size
5. SUMMARY
REFERENCES
APPENDICES
A. R CODE FOR DRAZIN 1
B. R CODE FOR DRAZIN 2
C. R CODE FOR DRAZIN 3
D. R CODE FOR DRAZIN 4
E. R CODE FOR DRAZIN 5
VITA

LIST OF TABLES
1. How α affects the convergence of Q#
2. Randomly generated α

LIST OF FIGURES
1. Plot of Table 1
2. Plot of Table 2
3. Drazin 1
4. The closed interval [0.4, 0.8] of Figure 3
5. Drazin 2
6. The closed interval [0.2, 0.9] of Figure 5
7. Drazin 3; on the x axis, 1 is the 1-norm, 2 the 2-norm, 3 the sup-norm, 4 the F-norm, and 5 the M-norm
8. Drazin 4; on the x axis, 1 is the 1-norm, 2 the 2-norm, 3 the sup-norm, 4 the F-norm, and 5 the M-norm
9. Drazin 5
10. Varying matrix size for α = 0.01
11. Varying matrix size for α = 0.05
12. Varying matrix size for α = 0.5
13. Increasing matrix size
14.
The top, middle, and bottom graphs have intervals [0.01, 1.1], [0.3, 0.9], and [0.33, 0.84], respectively

Chapter 1
INTRODUCTION

Markov chains are widely used in areas such as physics, speech recognition, finance, statistics, and many others. We discuss finite state space Markov chains, their applications, and the role in Markov chain theory of the Drazin inverse Q#, a generalized inverse of Q = I − P. We also review algorithms for calculating Q# and discuss their properties.

1.1 Markov Chain

A time homogeneous Markov chain is a sequence of random variables {X1, X2, X3, ...} that are not independent and identically distributed (IID) but satisfy the Markov property: given the present state xn and the past states {x1, ..., xn−1}, the probability of arriving in a future state xn+1 is given by

P(Xn+1 = xn+1 | Xn = xn, Xn−1 = xn−1, ..., X0 = x0) = P(Xn+1 = xn+1 | Xn = xn),

which depends on neither n nor any of the past states {x1, x2, ..., xn−1}. A finite state Markov chain transitions among a finite set of states S. We denote S as the set of integers {0, 1, 2, ..., n} for a finite state Markov chain. The probability of going from state i ∈ S to state j ∈ S in a single-step transition is denoted

P_ij = P(X1 = j | X0 = i) = P(Xn+1 = j | Xn = i),   (1)

by time homogeneity. The probability of going from state i ∈ S to state j ∈ S in n time steps is denoted

P(n)_ij = P(Xn = j | X0 = i).   (2)

Equation (2) is calculated iteratively from equation (1) using the following Chapman-Kolmogorov equation.

Theorem 1.1. Let m ≥ 1, n ≥ 1, i ∈ S, and j ∈ S. Then the Chapman-Kolmogorov equation is

P(m+n)_ij = Σ_{k∈S} P(m)_ik P(n)_kj.

Proof:
P(m+n)_ij = P(Xm+n = j | X0 = i)
= Σ_{k∈S} P(Xm+n = j, Xm = k | X0 = i)
= Σ_{k∈S} P(Xm+n = j, Xm = k, X0 = i) / P(X0 = i)
= Σ_{k∈S} P(Xm+n = j | Xm = k, X0 = i) P(Xm = k, X0 = i) / P(X0 = i)
= Σ_{k∈S} P(Xm+n = j | Xm = k) P(Xm = k, X0 = i) / P(X0 = i)
= Σ_{k∈S} P(n)_kj P(Xm = k, X0 = i) / P(X0 = i)
= Σ_{k∈S} P(n)_kj P(Xm = k | X0 = i)
= Σ_{k∈S} P(m)_ik P(n)_kj. •

The Chapman-Kolmogorov equation gives the multi-step transitions in terms of single-step transitions in a form which looks like a matrix product. For example, the matrix product AB has (i, j)th element

(AB)_ij = Σ_k a_ik b_kj,

where A = (a_ij) and B = (b_ij). This suggests linear algebra will be useful for convenient representations of transition probabilities and other Markov chain quantities.

1.2 Transition Matrix

Finite state Markov chain transition probabilities can be recast to take advantage of matrix algebra. Let P be the probability transition matrix defined by P_ij = P(Xn+1 = j | Xn = i), where the (i, j)th entry is a nonnegative real number in the closed interval [0, 1]. Then P is defined as

P = [ P_{1,1} P_{1,2} ... P_{1,n}
      P_{2,1} P_{2,2} ... P_{2,n}
      ...
      P_{n,1} P_{n,2} ... P_{n,n} ],

with the properties

P_{i,j} ≥ 0 for all i, j, and Σ_{j=1}^{n} P_{i,j} = 1 for all i,

so each row is a probability distribution. Recall we define P(m)_ij as the probability of going from a state i to a state j in m steps,

P(m)_ij = P(Xn+m = j | Xn = i).

Then we can use Chapman-Kolmogorov and linear algebra to calculate multi-step transition probabilities from a matrix product.

Lemma 1.1. Let m ≥ 1 and n ≥ 1. Then P(m+n) = P(m) P(n).

Proof: The proof follows directly from Theorem 1.1. •

Example 1.1. Let Xi = 1 if you are sick on day i; otherwise, Xi = 2. Suppose P_{1,1} = 0.08 and P_{2,1} = 0.015. Then the probability transition matrix is given by

P = [ 0.08  0.92
      0.015 0.985 ].

What is the probability of being sick today and being sick again 4 days later?

Solution: Using Lemma 1.1 we see that P(4) = PPPP = P^4.
Thus

P^4 = [ 0.08  0.92  ]^4 ≈ [ 0.0161 0.9839
        0.015 0.985 ]       0.0161 0.9839 ].

Therefore, the probability of being sick today and sick again 4 days later is P(4)_{1,1} ≈ 0.0161. Note that the rows of P^4 are (approximately) the same, indicating that in the long run you are sick about 1.61% of the time regardless of the starting state.

1.3 Markov Chain Properties

If a chain is irreducible, aperiodic, and positive recurrent, then the chain is ergodic and the distribution of {Xn} converges to a stationary distribution π(·), which allows us to approximate the long term behavior of the chain. If the initial value X0 is sampled from π(·), then all subsequent iterates will be distributed according to π(·).

A Markov chain is irreducible if the chain can reach any nonempty set from any set in a finite number of iterations with positive probability. The next definitions are inspired by Meyn and Tweedie (1993) and Gilks, Richardson, and Spiegelhalter (1996).

Definition 1.1. {Xn} is called irreducible if for all i, j there exists n > 0 such that P(n)_ij > 0.

A Markov chain that is aperiodic transitions irregularly between different sets of states; that is, there are no predictable cycles.

Definition 1.2. An irreducible chain {Xn} is called aperiodic if for all i, gcd{n > 0 : P(n)_ii > 0} = 1.

Recurrent chains return in finite time with positive probability.

Definition 1.3. An irreducible chain {Xn} is recurrent if P[τ_ii < ∞] > 0, where τ_ii is the first return time to state i, defined by τ_ii = min{t > 0 : Xt = i | X0 = i}. Otherwise, {Xn} is called transient.

Finally, a Markov chain is positive recurrent if the expected amount of time between repeated visits to the same state is finite; it follows that each state is visited an infinite number of times with probability 1.

Definition 1.4. An irreducible chain {Xn} is called positive recurrent if E[τ_ii] < ∞ for all i. Otherwise, it is called null recurrent. Equivalently, {Xn} is positive recurrent if there exists π(·) such that

Σ_i π(i) P(n)_ij = π(j)   (3)

for all j and n ≥ 0. In matrix form we can write equation (3) as πP = π.
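The arithmetic of Example 1.1 and the matrix form πP = π can be checked numerically. The thesis's own computations are in R (see the appendices); the following Python sketch is only an illustrative translation, computing P^4 by matrix power and the stationary distribution by solving the linear system πP = π with the normalization Σ_i π(i) = 1.

```python
import numpy as np

# Transition matrix of the sick/healthy chain from Example 1.1
P = np.array([[0.080, 0.920],
              [0.015, 0.985]])

# Multi-step transitions via Lemma 1.1: P(4) = P^4
P4 = np.linalg.matrix_power(P, 4)
print(P4)  # both rows are already close to the stationary distribution

# Stationary distribution: solve pi P = pi together with sum(pi) = 1.
# Rewrite as (P - I)^T pi^T = 0 and append the normalization as an extra row.
A = np.vstack([(P - np.eye(2)).T, np.ones(2)])
b = np.array([0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi)  # long-run proportion of sick days is pi[0], about 0.016
```

Using least squares on the stacked system is a convenient way to impose the normalization; for larger chains the same code works with `np.eye(n)` and `np.ones(n)`.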
The stationary distribution is π, and solving the system πP = π is one way to calculate π. The next theorem is from Gilks, Richardson, and Spiegelhalter (1996, p. 47).

Theorem 1.2. If {Xn} is positive recurrent and aperiodic then its stationary distribution π(·) is the unique probability distribution satisfying (3). We then say that {Xn} is ergodic and the following consequences hold:
(i) P(n)_ij → π(j) as n → ∞ for all i, j.
(ii) (Ergodic Theorem) If Eπ[f(X)] < ∞, then P[f̄_N → Eπ[f(X)]] = 1, where f̄_N = (1/N) Σ_{t=1}^{N} f(Xt), Eπ[f(X)] = Σ_i f(i)π(i) is the expectation of f(X) with respect to π(·), and f(·) is a real-valued function.

In fact, Theorem 1.2(ii) is the strong law of large numbers for Markov chains; that is,

X̄n → μ almost surely as n → ∞ ⟺ P( lim_{n→∞} X̄n = μ ) = 1,

with the limiting distribution being the stationary one. An irreducible, aperiodic finite state chain is always positive recurrent and thus ergodic, so laws of large numbers and central limit theorems will always hold. Thus, the stationary distribution is important in studying long-term behavior of the chain.

1.4 Rates of Convergence

It would help to know how quickly P(n) → π. Obviously, chains with faster convergence yield better estimates. The next definition is inspired by Gilks, Richardson, and Spiegelhalter (1996, p. 48).

Definition 1.5. If {Xn} is ergodic (aperiodic and positive recurrent) and there exist λ with 0 ≤ λ < 1 and a function V(·) > 1 such that

Σ_j |P_ij(t) − π(j)| ≤ V(i) λ^t   (4)

for all i, then {Xn} is said to be geometrically ergodic. Finite state Markov chains are uniformly geometrically ergodic with V = c, where c is some constant. The smallest λ for which there exists a function V satisfying (4) is called the rate of convergence and is denoted λ*, where

λ* = inf{λ : there exists V such that (4) holds}.

In many problems, transition probabilities are described by a sequence of eigenvalues {λ0, λ1, ...}, where λ0 = 1, and the corresponding left eigenvectors {e0, e1, ...}; that is,

Σ_i e_k(i) P_ij = λ_k e_k(j)

for all j and for each k, so that

P_ij(t) = Σ_k e_k(i) e_k(j) λ_k^t.   (5)

Here e0(·) = π(·). Geometrically ergodic chains have λ0 = 1 with all other eigenvalues uniformly bounded within (−1, 1). If t is large then the dominant term in (5) is π(j) = e0(j). The second largest eigenvalue in absolute value determines the rate of convergence of the Markov chain. An alternative definition of λ* is

λ* = sup_{k>0} |λ_k|,

which is related to α in Chapter 4.

Chapter 2
MARKOV CHAIN APPLICATIONS

This chapter looks at Markov chain applications: in particular, Bayesian inference and the Metropolis-Hastings algorithm, along with the Metropolis algorithm and Gibbs sampling.

2.1 Bayesian Inference

Let D represent all observed data, and θ denote parameters and missing data, as described in Gilks, Richardson, and Spiegelhalter (1996). A joint probability distribution over all random quantities is denoted by P(D, θ). Conditioning on the joint distribution gives

P(D, θ) = P(D | θ) P(θ),

where P(D | θ) is the likelihood and P(θ) is the prior distribution. Having observed D, Bayes' theorem is used to determine the posterior distribution of θ:

P(θ | D) = P(D | θ) P(θ) / ∫ P(D | θ) P(θ) dθ.

The posterior expectation of a function f(θ) is

E[f(θ) | D] = ∫ f(θ) P(D | θ) P(θ) dθ / ∫ P(D | θ) P(θ) dθ.

More generally, let X be a vector of k random variables with distribution π(·), where θ is the vector of model parameters and π(·) is the posterior distribution. The difficult task now is to evaluate the following expectation for some function f(·):

Eπ[f(X)] = ∫ f(x) π(x) dx / ∫ π(x) dx.   (6)

Here it is possible that π is known only up to a normalization constant; that is, ∫ π(x) dx is unknown. In this situation π ∝ P(θ) P(D | θ), and X could consist of discrete or continuous random variables.
For example, if X consisted of discrete random variables, then the integrals in (6) would be replaced by summations:

Eπ[f(X)] = Σ_x f(x) π(x) / Σ_x π(x).

In this situation, direct calculation of Eπ[f(X)] would not be possible since π is unknown. Instead, we look at methods of approximating Eπ[f(X)].

2.2 Ergodic Averages

The following information is given by Gilks, Richardson, and Spiegelhalter (1996, p. 45). Monte Carlo integration evaluates Eπ[f(X)] by drawing samples {Xt : t = 1, ..., n} from π(·) and then approximating

Eπ[f(X)] ≈ (1/n) Σ_{t=1}^{n} f(Xt);

i.e., the population mean of f(X) is estimated by a sample mean. When the {Xt} are independent, the laws of large numbers ensure that the approximation can be made as accurate as desired by increasing the sample size n. In general, drawing samples {Xt} independently from π(·) is not feasible, since π(·) can be nonstandard. However, the {Xt} need not be independent: they can be generated by any process which draws samples throughout the support of π(·). One way of doing this is through an ergodic Markov chain having π(·) as its stationary distribution. This is Markov chain Monte Carlo (MCMC). As t increases, the sampled points {Xt} look increasingly like dependent samples from π(·). After a sufficiently long burn-in of, say, m iterations, the points {Xt : t = m + 1, ..., n} will be dependent samples approximately from π(·). We can use the output from the Markov chain to estimate the expectation Eπ[f(X)], where X has distribution π(·). Burn-in samples are usually discarded for this calculation, giving the estimator

f̄ = (1/(n − m)) Σ_{t=m+1}^{n} f(Xt).

This is called an ergodic average. Convergence to the required expectation is ensured by the ergodic theorem, Theorem 1.2.
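The burn-in-and-average recipe above can be sketched directly. This Python fragment is illustrative only (the chain, sample sizes, and burn-in length are my own choices, not from the thesis): it simulates the two-state chain of Example 1.1 and forms the ergodic average of f(X) = 1{X = sick}, which should approach the stationary probability of being sick.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-state chain from Example 1.1: state 0 = sick, state 1 = healthy
P = np.array([[0.080, 0.920],
              [0.015, 0.985]])

n, m = 100_000, 1_000          # chain length and burn-in (arbitrary choices)
x = 0                          # arbitrary starting state
visits = []
for t in range(n):
    x = rng.choice(2, p=P[x])  # one Markov transition from the current state
    visits.append(x)

# Ergodic average of f(X) = 1{X = sick}, with the burn-in discarded
f_bar = np.mean([1.0 if s == 0 else 0.0 for s in visits[m:]])
print(f_bar)                   # close to the stationary probability, ~0.016
```

Because the samples are dependent, the accuracy of f̄ for a fixed n is governed by the asymptotic variance discussed in the next section, not by the IID variance.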
2.3 Estimation for Central Limit Theorem

Geometric convergence, as in Definition 1.5, allows the existence of central limit theorems for ergodic averages, results of the form

N^{1/2} ( f̄_N − Eπ[f(X)] ) → N(0, σ²)   (7)

for some positive constant σ, as N → ∞, where the convergence is in distribution. To apply the central limit theorem we need estimates of σ². The following theorem gives the asymptotic variance referred to later in Chapter 4.

Theorem 2.1.

σ² = varπ(f(X0)) + 2 Σ_{i=1}^{∞} covπ(f(X0), f(Xi))
   = Σ_j a_j (1 + λ_j)/(1 − λ_j) varπ(f(X0))
   ≤ (1 + λ*)/(1 − λ*) varπ(f(X0)),

for some nonnegative constants a_j with Σ_j a_j = 1, where X0 is distributed according to π(·).

The efficiency of the Markov chain for estimating Eπ[f(X)] is measured by the ratio

eff(f̄) = varπ(f(X0)) / σ².

To determine the accuracy of the estimate of Eπ[f(X)], two methods allow us to approximate σ².

2.4 Batch Means

To assess the accuracy of f̄_N we must estimate σ². Suppose the Markov chain is run for N = mn iterations, where n is sufficiently large that the batch means

Y_k = (1/n) Σ_{i=(k−1)n+1}^{kn} f(Xi),  k = 1, ..., m,

are approximately independent N(Eπ[f(X)], σ²/n). Then σ² can be approximated by

σ̂² = (n/(m − 1)) Σ_{k=1}^{m} (Y_k − f̄_N)².

We need to choose n carefully so that it is large enough to give a valid approximation.

2.5 Window Estimators

Another way to approximate σ² is to estimate γ_i ≡ covπ(f(X0), f(Xi)) by

γ̂_i = (1/N) Σ_{j=1}^{N−i} (f(Xj) − f̄_N)(f(Xj+i) − f̄_N).

As i increases there are fewer terms in the average, so the estimate becomes worse as i increases. It is therefore necessary to use a truncated window estimator of σ², with weights 0 ≤ ω_N(i) ≤ 1:

σ̂²_N = γ̂_0 + 2 Σ_{i=1}^{∞} ω_N(i) γ̂_i.

2.6 Metropolis-Hastings Algorithm

For the Metropolis-Hastings algorithm, at each time n, Xn+1 is chosen by first sampling a candidate point Y from a proposal distribution q(· | Xn). Note that the proposal distribution may depend on the current point Xn. The candidate point Y is then accepted with probability α(Xn, Y), where

α(X, Y) = min( 1, π(Y) q(X | Y) / (π(X) q(Y | X)) ).   (8)

If the candidate point is accepted, the next state becomes Xn+1 = Y. If the candidate is rejected, the chain does not move; that is, Xn+1 = Xn. The proposal distribution q(· | ·) can have any form, and the stationary distribution of the chain will be π(·). This can be seen from the following argument. The transition kernel for the Metropolis-Hastings algorithm is

P(Xn+1 | Xn) = q(Xn+1 | Xn) α(Xn, Xn+1) + I(Xn+1 = Xn) [1 − ∫ q(Y | Xn) α(Xn, Y) dY],   (9)

where I(·) denotes the indicator function (taking the value 1 when its argument is true, and 0 otherwise). The first term in (9) arises from acceptance of a candidate Y = Xn+1, and the second term arises from rejection, integrated over all possible candidates Y. Using the fact that

π(Xn) q(Xn+1 | Xn) α(Xn, Xn+1) = π(Xn+1) q(Xn | Xn+1) α(Xn+1, Xn),   (10)

which follows from (8), we obtain the detailed balance equation

π(Xn) P(Xn+1 | Xn) = π(Xn+1) P(Xn | Xn+1).   (11)

Integrating both sides of (11) with respect to Xn gives

∫ π(Xn) P(Xn+1 | Xn) dXn = π(Xn+1).   (12)

The left-hand side of equation (12) gives the marginal distribution of Xn+1 under the assumption that Xn is from π(·).

2.7 The Metropolis Algorithm

The Metropolis algorithm considers only symmetric proposals, that is, q(X | Y) = q(Y | X), which reduces the acceptance probability in (8) to

α(X, Y) = min( 1, π(Y)/π(X) ).   (13)

2.8 The Independence Sampler

The independence sampler is a Metropolis-Hastings algorithm whose proposal does not depend on X: q(Y | X) = q(Y). The acceptance probability (8) can then be written as

α(X, Y) = min( 1, ω(Y)/ω(X) ),

where ω(X) = π(X)/q(X). For the independence sampler to work well, q(·) needs to be a good approximation of π(·). Heavier-tailed independence proposals are safer than lighter-tailed proposals, because lighter-tailed proposals may leave the chain stuck in the tails for long periods.
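The accept/reject mechanics of equations (8)-(13) fit in a few lines. The sketch below is a random-walk Metropolis sampler in Python; the standard normal target, step size, and function names are assumptions made for illustration, not examples from the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_pi(x):
    # Log of the (unnormalized) target density: standard normal here,
    # chosen purely for illustration; any density known up to a
    # normalizing constant would do.
    return -0.5 * x * x

def metropolis(n_iter, step=1.0, x0=0.0):
    """Random-walk Metropolis: symmetric proposal, acceptance min(1, pi(Y)/pi(X))."""
    x = x0
    out = np.empty(n_iter)
    for t in range(n_iter):
        y = x + step * rng.standard_normal()        # candidate Y
        if np.log(rng.random()) < log_pi(y) - log_pi(x):
            x = y                                   # accept: X_{n+1} = Y
        # else reject: the chain does not move, X_{n+1} = X_n
        out[t] = x
    return out

chain = metropolis(50_000)
samples = chain[5_000:]                             # discard burn-in
print(samples.mean(), samples.std())                # roughly 0 and 1
```

Comparing log densities rather than ratios avoids numerical underflow in the tails, which matters for the heavy-tail considerations mentioned above.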
2.9 Single-Component Metropolis-Hastings

Single-component Metropolis-Hastings, as defined by Strand (2009), works by dividing X into components {X.1, X.2, ..., X.h} of possibly differing dimensions. Let

X.−i = {X.1, X.2, ..., X.i−1, X.i+1, ..., X.h}

comprise all components of X except X.i. One iteration of single-component Metropolis-Hastings is made up of h updating steps. Let X^t.i denote the state of X.i at the end of iteration t. For step i of iteration t + 1, X.i is updated using Metropolis-Hastings: a candidate Y.i is generated from a proposal distribution

q_i(Y.i | X^t.i, X^t.−i),

where X^t.−i denotes the value of X.−i after completing step i − 1 of iteration t + 1:

X^t.−i = {X^{t+1}.1, ..., X^{t+1}.{i−1}, X^t.{i+1}, ..., X^t.h},

in which components 1, 2, ..., i − 1 have already been updated. Thus the ith proposal distribution q_i(· | ·, ·) generates a candidate only for the ith component of X, and it may depend on the current values of any of the components of X. This candidate is then accepted with probability α(X^t.−i, X^t.i, Y.i), where

α(X.−i, X.i, Y.i) = min( 1, π(Y.i | X.−i) q_i(X.i | Y.i, X.−i) / (π(X.i | X.−i) q_i(Y.i | X.i, X.−i)) )   (14)

and π(X.i | X.−i) is the full conditional distribution of the ith component of X, conditioning on the remaining components, where X has distribution π(·):

π(X.i | X.−i) = π(X) / ∫ π(X) dX.i.   (15)

2.10 Gibbs Sampling

The Gibbs sampler is a special case of single-component Metropolis-Hastings. The proposal distribution of the Gibbs sampler for updating the ith component of X is

q_i(Y.i | X.i, X.−i) = π(Y.i | X.−i),   (16)

where π(Y.i | X.−i) is given by (15). Substituting (16) into (14) gives an acceptance probability of 1; Gibbs sampler candidates are always accepted.

Chapter 3
VECTOR AND MATRIX NORMS

This chapter introduces norms of vectors and matrices. A norm is used to measure distance and size, and to define the convergence of sequences of vectors or matrices. Norms will play a role in later calculations.
3.1 Vector Norms

First, we briefly discuss vector norms and then matrix norms. Note that all vectors and matrices in this context are treated as real.

Definition 3.1. Let V be a vector space over R. A norm is a function ‖·‖ : V → R satisfying the following for all x, y ∈ V and λ ∈ R:
1. ‖x‖ ≥ 0, with equality iff x = 0
2. ‖λx‖ = |λ| ‖x‖
3. ‖x + y‖ ≤ ‖x‖ + ‖y‖

Note that if V = R then ‖x‖ = |x|. Some commonly used norms are provided in the next theorem.

Theorem 3.1. Let V = R^n. Then for every x = (x1, x2, ..., xn) ∈ V we have
(a) the 1-norm, ‖x‖1, defined such that ‖x‖1 = |x1| + |x2| + ··· + |xn|   (17)
(b) the Euclidean norm (2-norm), defined such that ‖x‖2 = (x1² + x2² + ··· + xn²)^{1/2}   (18)
(c) the sup-norm ‖x‖∞, defined such that ‖x‖∞ = max{|xi| : 1 ≤ i ≤ n}   (19)
(d) the more general lp-norm, p ≥ 1, defined such that ‖x‖p = (|x1|^p + |x2|^p + ··· + |xn|^p)^{1/p}.

3.2 Matrix Norms

Let Mn(R) be the space of square n×n matrices over R.

Definition 3.2. A matrix norm ‖·‖ on the space Mn(R) is a norm on the space with the additional property ‖AB‖ ≤ ‖A‖‖B‖ for all A, B ∈ Mn(R). Given a vector norm, define the induced matrix norm by ‖A‖ = sup_{‖x‖=1} ‖Ax‖.

The next proposition is by Gallier (2012, p. 231).

Proposition 3.1. Let A = (a_ij) be any square matrix. Then

‖A‖1 = sup_{‖x‖1=1} ‖Ax‖1 = max_j Σ_{i=1}^{n} |a_ij|,
‖A‖∞ = sup_{‖x‖∞=1} ‖Ax‖∞ = max_i Σ_{j=1}^{n} |a_ij|,
‖A‖2 = sup_{‖x‖2=1} ‖Ax‖2 = √ρ(AᵀA) = √ρ(AAᵀ).

Now we define the Frobenius norm as stated by Gallier (2012, p. 225).

Definition 3.3. The Frobenius norm (F-norm) ‖·‖F is defined so that for every real n×n matrix A,

‖A‖F = ( Σ_{i,j=1}^{n} a_ij² )^{1/2} = √tr(AAᵀ) = √tr(AᵀA),   (20)

satisfying the following property:
(a) √ρ(AᵀA) ≤ ‖A‖F ≤ √n √ρ(AᵀA) for all A.

Proposition 3.2. The following inequalities hold for all x ∈ R^n:
(a) ‖x‖∞ ≤ ‖x‖1 ≤ n‖x‖∞
(b) ‖x‖∞ ≤ ‖x‖2 ≤ √n‖x‖∞
(c) ‖x‖2 ≤ ‖x‖1 ≤ √n‖x‖2
(d) ‖A‖2 ≤ ‖A‖F ≤ √n‖A‖2 for any n×n matrix A

Definition 3.4. The maximum modulus norm (M-norm) is the largest component of the matrix in absolute value:

‖A‖M := max_{i,j} |a_ij|.   (21)

Definition 3.5. Given any real vector space V, two norms ‖·‖a and ‖·‖b are equivalent if and only if there exist positive real numbers C1, C2 > 0 such that ‖u‖a ≤ C1‖u‖b and ‖u‖b ≤ C2‖u‖a for all u ∈ V.

Proposition 3.2 and Definition 3.5, from Gallier (2012, p. 78), indicate equivalence of the norms ‖·‖1, ‖·‖2, ‖·‖∞, and ‖·‖F, so that convergence in one is convergence in the others.

3.2.1 Eigenvectors and Eigenvalues

Eigenvalues and eigenvectors will play a critical role in future calculations, so a refresher of this material is important. The following definition is given by Lay (2006, p. 303).

Definition 3.6. An eigenvector of an n×n matrix A is a nonzero vector x such that Ax = λx for some scalar λ. A scalar λ is called an eigenvalue of A if there exists a nontrivial solution x of Ax = λx.

Note that λ is an eigenvalue of A if and only if Ax = λx for some nonzero vector x, if and only if λI − A is not invertible, if and only if det(λI − A) = 0.

Definition 3.7. Let A be any n×n matrix and let λ1, λ2, ..., λn denote its n, not necessarily distinct, eigenvalues. Then define the spectral radius ρ(A) as the largest modulus of the eigenvalues of A:

ρ(A) = max_{1≤i≤n} |λi|.

Proposition 3.3. Let A be any n×n matrix. Then for any matrix norm ‖·‖ we have ρ(A) ≤ ‖A‖, and ρ(A) = lim_{k→∞} ‖A^k‖^{1/k}.

We have mentioned the connection between convergence and eigenvalues; in the next chapter we will further our discussion.

Chapter 4
COMPUTATIONAL TECHNIQUES FOR DRAZIN INVERSE

We show in this section that many important quantities in Markov chain theory can be expressed in terms of a generalized inverse Q# of Q = I − P. Note that Q = I − P is not invertible because

π = πP ⟹ πQ = π(I − P) = πI − πP = 0.

Therefore 0 is an eigenvalue of Q = I − P, with left eigenvector π ≠ 0, so Q is not invertible. Meyer (1975) gave algebraic expressions and computational results for the Drazin inverse Q#, a generalized inverse of Q = I − P.
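The singularity of Q = I − P is easy to confirm numerically. Below is a minimal Python check using the two-state matrix of Example 1.1; the choice of P is illustrative, and the stationary vector is written down in closed form for this particular chain.

```python
import numpy as np

# Two-state transition matrix (Example 1.1); Q = I - P
P = np.array([[0.080, 0.920],
              [0.015, 0.985]])
Q = np.eye(2) - P

# Q is singular: its determinant is 0 and 0 is one of its eigenvalues
print(np.linalg.det(Q))              # 0 (up to rounding)
print(np.linalg.eigvals(Q))          # one eigenvalue is 0

# The stationary vector pi is a left eigenvector of Q for eigenvalue 0:
# for this chain, pi solves pi P = pi in closed form.
pi = np.array([0.015, 0.920]) / 0.935
print(pi @ Q)                        # numerically the zero vector
```

This is exactly why a generalized inverse such as Q# is needed: `np.linalg.inv(Q)` would fail (or be numerically meaningless) for any transition matrix P.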
Boucher (2014) extended computational results in terms of Q# to general state Markov chains through the use of bounded linear operators between Banach spaces of functionals. The current interest is to determine optimal parameter values and convergence rates in new algorithms for approximating Q#.

4.1 Recursive Definition

The motivation for Proposition 4.2 is to show that irreducible, finite state chains are a special case of Proposition 4.1, which is from Boucher (2014), and therefore satisfy the assumptions behind the more general Proposition 4.3. Recall that irreducible, aperiodic finite state Markov chains are V-uniformly ergodic with V = c, where c is some constant. Proposition 4.3 holds generally for ψ-irreducible, aperiodic, V-uniformly ergodic Markov chains. First, we state Proposition 4.1, which establishes the importance of Q#.

Proposition 4.1. Suppose {Xt} is a ψ-irreducible, aperiodic, V-uniformly ergodic Markov chain with transition kernel P and ‖P‖V < ∞. Then
1. Π = I − QQ#.
2. The fundamental kernel Z = (I − P + Π)^{−1} exists, is a bounded linear operator from L^∞_V to L^∞_V, and can be written as Z = I + PQ#.
3. For any g with |g| ≤ V the CLT holds for g. Moreover, if g² ≤ V then the asymptotic variance σ²_g > 0 and

σ²_g = π[g(I + P)Q# g],   (22)

where Q# is the generalized Drazin inverse of Q = I − P.
4. Suppose {Xt} satisfies P_xV ≤ λV(x) + L with V ≡ 1, so that {Xt} is uniformly ergodic, and g is a bounded function, so that ‖g‖ := sup_x |g(x)| < ∞. Then for ε > 0 and n > 2‖g‖‖Q#‖/ε,

sup_x P_x( Sn(g) − Eπ[Sn(g)] ≥ nε ) ≤ exp( −(nε − 2‖g‖‖Q#‖)² / (2n‖g‖²‖Q#‖²) ).   (23)

Proof: The proof is omitted; see Boucher (2014). •

Now we want to show finite state chains satisfy the assumptions of Proposition 4.1.

Proposition 4.2. Suppose {Xt} is an irreducible, aperiodic finite state Markov chain with transition kernel P. Then
1. If {Xt} is irreducible it is ψ-irreducible.
2. Finite state Markov chains are V-uniformly ergodic.
3. ‖P‖V < ∞.

Proof:
1. A finite state Markov chain is irreducible if the chain can reach any nonempty set from any set in a finite number of iterations with positive probability; that is, for all i, j there exists n > 0 such that P(n)_ij > 0. ψ-irreducibility means there exists a probability measure ψ so that P(n)(x, A) > 0 for all points x and all sets A in the state space with ψ(A) > 0. Thus an irreducible finite state Markov chain is ψ-irreducible with

ψ(j) = Σ_i P(n)_ij / Σ_j Σ_i P(n)_ij,  where ψ(A) = Σ_{j∈A} ψ(j).

2. Reference Definition 1.5: an irreducible, aperiodic finite state Markov chain is uniformly ergodic with V = c, where c is some constant.
3. ‖P‖V is defined as

‖P‖V := sup_x sup_{|g|≤V} |P_x g| / V(x),  where P_x g := ∫ g(y) p(x, dy) = Σ_j g(j) P_xj.

Since we are dealing with finite state Markov chains and g is bounded, ‖g‖ := sup_x |g(x)| < ∞ and thus |P_x g| < ∞. Therefore, since ∫ p(x, dy) = 1, we have ‖P‖V < ∞. •

The next proposition contains five algorithms to compute the Drazin inverse that follow from Boucher (2014). Throughout the rest of this chapter, a study of convergence will be conducted and conclusions will be made using these algorithms.

Proposition 4.3. Suppose {Xt} is an aperiodic, irreducible finite state Markov chain with transition matrix P. Let Q = I − P, W = Q^l with l ≥ 1, ρ = ρ(Q), and 0 < α < 2/ρ^{l+1}.

1. (Drazin 1) The sequence {Q#_n}_{n≥0} defined by

Q#_0 = αW,  Q#_{n+1} = (I − αWQ)Q#_n + αW,   (24)

converges to the Drazin inverse Q# of Q = I − P.

2. (Drazin 2) The sequence {Q#_n}_{n≥0} defined by

Q#_0 = αW,  Q#_{n+1} = Q#_n(2I − QQ#_n),   (25)

converges to the Drazin inverse Q# of Q = I − P.

3. (Drazin 3) The sequence {Q#_n}_{n≥0} defined by

Q#_0 = W,  Q#_{n+1} = Q#_n + (W/(n+2))(I − QQ#_n),   (26)

converges to the Drazin inverse Q# of Q = I − P.

4. (Drazin 4) The sequence {Q#_n}_{n≥0} defined by

Q#_0 = (2I − WQ)W,  Q#_{n+1} = Q#_n + (1/(n+2))(2I − (1/(n+2))WQ)W(I − QQ#_n),   (27)

converges to the Drazin inverse Q# of Q = I − P.

5. (Drazin 5) Let p > 1 be an arbitrary integer.
Then the sequence {Q#_n}, n ≥ 0, defined by

Q#_0 = αW,   Q#_{n+1} = Q#_n Σ_{k=0}^{p−1} (I − QQ#_n)^k,   (28)

converges to the Drazin inverse Q# of Q = I − P.

Proof: The proof follows from Proposition 4.2 and Boucher (2014).

Several questions arise from this proposition: do the sequences {Q#_n} converge faster for different values of α? Is there an α that converges the fastest (i.e., an optimal α)? Do norms affect convergence? Does the size of the n×n probability transition matrix affect the rate of convergence? By hand this becomes a tedious task to accomplish, but thanks to the computational capacity of R this task is greatly simplified.

4.2 Convergence Rate

In this section, the rate of convergence will be discussed. The objective is to investigate the convergence of the sequences {Q#_n} defined by the equations in Proposition 4.3. Several factors affect the convergence speed, such as α and the size of the transition probability matrix. Throughout this section we will be using equation (24); similar results follow for equations (25), (26), (27), and (28). Unless otherwise noted, all examples will have l = 1, that is, Q = I − P and W = Q. Therefore, in this case, the upper bound of α is 2/ρ².

4.2.1 Different Values of Alpha

The data in Table 1 are from R, collected by running a program that randomly generated any number of 10×10 probability transition matrices; in this case, 40 matrices were generated. Notice that as α increases, the average number of iterations and the standard deviation decrease. Thus as α increases, equation (24) converges faster. This can be seen pictorially in Figure 1.

Table 1: How α affects the convergence of Q#_n.

Figure 1: Plot of Table 1.

Clearly, as α increases, the number of iterations decreases, accumulating around a certain number of iterations. Thus equation (24) is converging. An obvious question now is, what happens on the other side of α = 1?
In fact, as α continues to increase past a certain value, the number of iterations begins to increase (not shown in the previous graph). This leads to our next questions: is there an optimal α, and is there a range for which α is valid? In fact, the number of iterations is minimized as α → (optimal α), and α is valid in an interval that depends upon the largest eigenvalue in modulus.

4.2.2 Optimal Alpha

In this section, we discuss whether there exists an optimal α and examine the interval for which α is valid, through a specific example. Meyer (1975, p. 461) provided an example of calculating Q# where the probability transition matrix is given by

P =
[ 0     0.5   0.5   0
  0.5   0     0.5   0
  0.5   0.25  0     0.25
  0.25  0.25  0.25  0.25 ]

This P will be used to compare results. Table 2 shows the number of iterations for 88 randomly generated alpha in the closed interval [0.005, 0.89] using P. This interval was chosen because prior testing indicated that α was invalid for α < 0 and α > 0.9. Note, this interval is for a specific example; the upper bound changes for different probability transition matrices. The table is a side-by-side version and α is in numerical order. For what value(s) of α is the number of iterations minimized? What happens to the number of iterations over the entire interval?

Table 2: Randomly generated α.

Notice as α increases, the number of iterations decreases up to a certain α, then begins to increase as α continues to increase. Thus we have a minimum number of iterations and there exists an optimal α in the open interval (0.647, 0.667). Note, this optimal α is for a specific example; the optimal α varies across probability transition matrices. The same conclusion can be made from Figure 2, which plots the alpha of Table 2 against the number of iterations: the number of iterations decreases to a certain α and then begins to increase.
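The upper end of the valid range is tied to the largest eigenvalue in modulus of Q: with l = 1, Proposition 4.3 gives the bound 2/ρ(Q)². For the 4×4 matrix P above, this bound can be computed directly. The following sketch uses illustrative variable names (`rho`, `upper`) that are ours, not the thesis code:

```r
# Compute rho(Q) and the alpha upper bound 2 / rho^2 (l = 1) for Meyer's
# example matrix. Sketch; variable names are illustrative.
P <- matrix(c(0,    0.5,  0.5,  0,
              0.5,  0,    0.5,  0,
              0.5,  0.25, 0,    0.25,
              0.25, 0.25, 0.25, 0.25), nrow = 4, byrow = TRUE)
Q <- diag(4) - P
rho <- max(Mod(eigen(Q)$values))  # largest eigenvalue of Q in modulus
upper <- 2 / rho^2                # valid alpha lie in (0, 2 / rho^2)
```

Since every eigenvalue λ of a stochastic matrix satisfies |λ| ≤ 1, the eigenvalues of Q = I − P satisfy |1 − λ| ≤ 2, so this bound is always at least 0.5; the value computed here can be compared with the empirically observed cutoff near 0.9.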
Thus we can say, as α → (optimal α), the number of iterations decreases. It is clear from the graph that the further α is from the optimal α, the higher the number of iterations.

Figure 2: Plot of Table 2.

4.3 R Code

In this brief section, the code written in R will be discussed. This will give the reader a little insight as to how fields are populated and where our data is coming from. These functions calculate the Drazin inverse of Q = I − P, where P is the transition matrix of an ergodic Markov chain. To randomly generate a probability transition matrix, say a 10×10, the following code is used:

P <- matrix(runif(100, 0, 1), 10, 10)
P <- diag(1/apply(P, 1, sum)) %*% P

When an α is randomly generated between two values, the following code is used:

alpha <- runif(1, 0.01, 1)

See the appendix for a more detailed and in-depth explanation of each algorithm being used.

4.4 Comparing Drazins

This section is titled "Comparing Drazins" because we will be comparing the convergence of Q#_n in equations (24), (25), (26), (27), and (28), where equation (24) is referred to as Drazin 1 and so on. Simulations in this section include the matrix norms discussed in Chapter 3, which do not affect the end result. That is, convergence rates and intervals for valid α do not change with the choice of norm. Here we are concerned with the following matrix norms: the 1-norm, 2-norm, sup-norm, F-norm, and M-norm, which are equations (17), (18), (19), (20), and (21) respectively. For the first example, all intervals of α are a sequence of 1,000 α and we run all Drazins with the probability transition matrix

P =
[ 0.19058159  0.28406685  0.11446613  0.41088543
  0.15021060  0.33215298  0.08412015  0.43351627
  0.63272794  0.26599196  0.04323895  0.05804115
  0.06898887  0.01965881  0.56564054  0.34571178 ]   (29)

Figure 3 depicts Drazin 1 with the five different norms. A sequence of 1,000 α were generated in the closed interval [0.001, 1.1].
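A condensed sketch of such a sweep — generate a random stochastic matrix, iterate equation (24) for each α, and record the iteration count — is given below. The function and variable names are illustrative; the full driver code appears in Appendix A.

```r
# Illustrative alpha sweep for Drazin 1 (equation (24)) on a random 10x10
# stochastic matrix; counts iterations until successive iterates are close.
set.seed(1)
P <- matrix(runif(100), 10, 10)
P <- diag(1 / rowSums(P)) %*% P          # normalize rows so P is stochastic
I <- diag(nrow(P)); Q <- I - P; W <- Q   # l = 1, so W = Q

count.iterations <- function(alpha, tol = 0.01, max.it = 1000) {
  D <- alpha * W
  for (i in 1:max.it) {
    D.new <- (I - alpha * W %*% Q) %*% D + alpha * W  # equation (24)
    if (norm(D.new - D, type = "F") < tol) return(i)
    D <- D.new
  }
  NA  # no convergence within max.it iterations
}

alphas <- seq(0.05, 0.8, by = 0.05)
its <- sapply(alphas, count.iterations)  # one count per alpha, as in Figure 3
```

Plotting `its` against `alphas` produces one curve of the kind shown in Figure 3; repeating the sweep with the other norms gives the remaining curves.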
Notice all the graphs are parabola shaped, indicating that there exists an α such that the number of iterations is minimized; thus there exists an optimal α. Also notice that there exists an interval over which the optimal α occurs; call this the optimal interval. Figure 4 shows the optimal intervals of Figure 3. In the simulations the M-norm converges to a slightly smaller number of iterations and has a larger optimal interval, but this is not proof that the M-norm converges fastest or has a larger optimal interval in general.

Figure 3: Drazin 1.

As α → 0, the number of iterations increases like a parabola, but then begins to decrease very rapidly. This is a false convergence. Regardless of norm, the nth term is shrinking at the geometric rate η^n, where η = ‖I − αWQ‖ < 1. Thus η → 1 as α → 0, but αW → 0 and I − αWQ → I, so 0 is a fixed point when α = 0. This is one reason to check that the solution returned by the algorithm is in fact the Drazin inverse. This false convergence starts at a certain alpha value, say α′. The interplay of the tolerance and the value of α chosen to run the algorithm determines α′; the false convergence is due to the sequence converging to 0. For example, decreasing the tolerance decreases α′.

Figure 4: This is the closed interval, [0.4, 0.8], of Figure 3.

To show α = 0 is a fixed point, we will show the first few terms of the algorithm, using P in equation (29) and Drazin 1. The convergence to 0 occurs for small α, say α = 0.001. Notice Q#_0 is very small, indicating that Q#_1 will be small also. Intuitively, smaller α implies smaller steps in the algorithm, getting closer to 0 as α → 0:

Q#_0 =
[  8.094184e−04  −2.840668e−04  −1.144661e−04  −4.108854e−04
  −1.502106e−04   6.678470e−04  −8.412015e−05  −4.335163e−04
  −6.327279e−04  −2.659920e−04   9.567611e−04  −5.804115e−05
  −6.898887e−05  −1.965881e−05  −5.656405e−04   6.542882e−04 ]
Q#_1 =
[  0.0016181349  −5.676472e−04  −0.0002291915  −0.0008212962
  −0.0003001862   1.335331e−03  −0.0001686193  −0.0008665256
  −0.0012639453  −5.318381e−04   0.0019125395  −0.0001167562
  −0.0001387026  −3.957008e−05  −0.0011300949   0.0013083676 ]

Let us look at a very small α, say α = 0.00000000001. Then the first terms of the algorithm are as follows. Notice, Q#_0 is smaller for the smaller α, implying convergence to 0 for α = 0:

Q#_0 =
[  8.094184e−12  −2.840668e−12  −1.144661e−12  −4.108854e−12
  −1.502106e−12   6.678470e−12  −8.412015e−13  −4.335163e−12
  −6.327279e−12  −2.659920e−12   9.567610e−12  −5.804115e−13
  −6.898887e−12  −1.965881e−13  −5.656405e−12   6.542882e−12 ]

Q#_1 =
[  1.618837e−11  −5.681337e−12  −2.289323e−12  −8.217709e−12
  −3.004212e−12   1.335694e−11  −1.682403e−12  −8.670325e−12
  −1.265456e−11  −5.319839e−12   1.913522e−11  −1.160823e−12
  −1.379777e−12  −3.931762e−13  −1.131281e−11   1.308576e−11 ]

Similarly, Q#_1 is smaller for the smaller α, implying convergence to 0 for α = 0. Thus, there exists a neighborhood around 0 such that as α decreases, Drazin 1 geometrically converges to zero. Similar results hold for Drazin 2 and Drazin 5.

In Figure 5, Drazin 2 is run. Keep in mind, we are using the same probability transition matrix to compare norms, convergence speeds, optimal α, and optimal intervals from Drazin algorithm to Drazin algorithm.

Figure 5: Drazin 2.

You can see that Drazin 2 follows Drazin 1; that is, it is parabola shaped with a tail of false convergence. Figure 6 shows the closed interval [0.2, 0.9], a close-up of the optimal intervals of Figure 5. Here, all the norms converge to the same number of iterations, but the M-norm has an optimal interval slightly larger than the rest. This still does not prove norms impact our results. The number of iterations for Drazin 2 has drastically reduced from Drazin 1: the highest number of iterations for Drazin 1 was 89, converging to 10 iterations, compared to 12 and 5 respectively for Drazin 2. Notice the graph for Drazin 2 is not as smooth nor as well defined a parabola as that of Drazin 1, indicating the variance of the number of iterations from alpha to alpha has decreased. Therefore, Drazin 2 converges faster, to a smaller number of iterations, with a larger optimal interval. Thus Drazin 2 is preferred over Drazin 1.

Figure 6: This is the closed interval, [0.2, 0.9], of Figure 5.

Figure 7 depicts Drazin 3; notice this graph is much different from those for Drazin 1 and Drazin 2. There is no α involved in Drazin 3, hence the much different graph. Here we are comparing the number of iterations across the different norms used. The caption of Figure 7 helps describe the graph. There is little discrepancy between the norms, each of them converging at relatively the same rate. Note, this is for a specific example; the number of iterations can change from transition matrix to transition matrix.

Figure 7: Drazin 3; on the x axis, 1 is for 1-norm, 2 is for 2-norm, 3 is for sup-norm, 4 is for F-norm, and 5 is for M-norm.

Drazin 4, in Figure 8, is similar to Drazin 3 except the number of iterations has decreased with each norm. Thus Drazin 4 is converging faster for our given probability transition matrix.

Figure 8: Drazin 4; on the x axis, 1 is for 1-norm, 2 is for 2-norm, 3 is for sup-norm, 4 is for F-norm, and 5 is for M-norm.

One of the nice things about Drazin 3 and Drazin 4 is that they require fewer parameters than the other Drazins, i.e., no α. While the other Drazins may (or may not) converge with a smaller number of iterations, Drazin 3 and Drazin 4 only require a probability transition matrix and a tolerance level (since norms do not drastically affect the number of iterations). Up to this point, Drazin 2 converges with the smallest number of iterations. Next we run Drazin 5, shown in Figure 9, with the added parameter p. Here the legends are left out; the same color configuration is used as in Drazin 1 and Drazin 2.
Figure 9: Drazin 5.

Note, in Figure 9 some color-coded lines are not visible; this is because they are hiding behind the purple line. Since the purple line was the last line colored in the R code, it appears on top of the others. For p = 2, we have our parabola shape with a relatively large optimal interval. Notice as p increases, the optimal interval increases and the algorithm converges in a smaller number of iterations.

4.5 Matrix Size

What happens to the rate of convergence as the matrix size n → ∞? Thanks to R once again, we can run simulations that test this notion. Here the highest n we used was n = 40. This time only, figures are supplied for different α: Figures 10, 11, and 12 are for α = 0.01, 0.05, and 0.5 respectively. Notice as the size of the probability transition matrices increases, the number of iterations starts to converge to a certain value. This leaves the question of whether, as n → ∞, the variance between trials goes to 0, that is, whether the number of iterations converges to a single point. You can see that as the matrix size increases, the number of iterations converges. With α → (optimal α), the number of iterations and standard deviation decrease.

Figure 10: Varying matrix size for α = 0.01.

Figure 11: Varying matrix size for α = 0.05.

Figure 12: Varying matrix size for α = 0.5.

In Figure 13, we show Drazin 1 run for four randomly generated matrices of sizes 10×10, 100×100, 250×250, and 500×500. As the matrix size increased, so did the time elapsed for R to calculate Drazin 1: 0.27, 17.44, 205.09, and 1509.81 seconds respectively. Notice as n increases, the optimal interval begins to decrease, converging toward a fixed point. In the 500×500 graph, the optimal interval is the smallest, appearing to be a fixed point. As n → ∞, the probabilities Pij of being in each state become more dispersed over the collection of states, averaging 1/n; that is, the probability of being in any given state decreases as n increases.
Thus, as n → ∞, Pij → 0 on average. This leads to our next proposition.

Figure 13: Increasing matrix size.

Proposition 4.4. Suppose P is an n×n probability transition matrix. As n → ∞ and for a constant tolerance, Drazin 1 converges to 0.

Proof: By Proposition 4.3 and equation (24) we have W = Q, Q = I − P, and

I − αWQ = I − αQ².

Then

I − αQ² = I − α(I − P)²,

where

I − P =
[ (1 − p_{1,1})  −p_{1,2}   ...  −p_{1,n}
  −p_{2,1}   (1 − p_{2,2})  ...  −p_{2,n}
   ...
  −p_{n,1}   −p_{n,2}   ...  (1 − p_{n,n}) ].

Therefore, if i = j, the corresponding entry of I − αQ² is 1 − α(1 − p_{i,i})², and if i ≠ j the entry is −α(−p_{i,j})². Thus I − αQ² tends to I as n → ∞. Combined with a constant tolerance, this gives that as n → ∞, Drazin 1 converges to 0. Drazin 1 is essentially a geometric series; its rate of convergence is governed by I − αWQ, which tends to I as n → ∞. Note, Proposition 4.4 is only for Drazin 1.

4.5.1 Optimal Interval

Thus far we have only looked at generic intervals for which alpha is valid. In this section we take a closer look and hone in on the optimal interval. In Figure 14, sequences of 10,000 alpha were generated for the probability transition matrix P of equation (29). The top graph of Figure 14 shows a sequence of 10,000 alpha in the closed interval [0.01, 1.1]. The optimal interval is approximately [0.3, 0.9], depicted in the middle graph, indicating the minimum number of iterations is 5. The bottom graph shows the interval [0.33, 0.84]. The purpose of Figure 14 is to hone in on the optimal interval; each graph (interval) has a sequence of 10,000 alpha. The optimal interval does not change from graph to graph even though more alpha are generated within smaller and smaller intervals. Thus, this is the optimal interval.
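Returning to Proposition 4.4, its premise — that the entries of a row-normalized random transition matrix average 1/n — is easy to confirm numerically. A sketch, with illustrative names:

```r
# Entries of a randomly generated n x n stochastic matrix average exactly 1/n,
# since each of the n rows sums to 1; individual entries also shrink as n grows.
set.seed(2)
n <- 40
P <- matrix(runif(n * n), n, n)
P <- diag(1 / rowSums(P)) %*% P  # row-normalize
mean(P)                          # equals 1/n, up to rounding
max(P)                           # largest single transition probability
```

Repeating this for increasing n shows `max(P)` shrinking, which is the entrywise dispersal the proposition relies on.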
Figure 14: The top, middle, and bottom graphs have intervals [0.01, 1.1], [0.3, 0.9], and [0.33, 0.84] respectively.

Chapter 5

SUMMARY

Many important quantities in Markov chain theory can be expressed in terms of the Drazin inverse, Q#, of Q = I − P. Meyer (1975) gave algebraic expressions and computational results, and Boucher (2014) extended these results in terms of Q# to general state Markov chains. Previous work has established conditions and algorithms for estimating Q# in a general setting, but specific results are lacking. Finite state Markov chains fall along the same lines as described by Boucher; that is, irreducible, aperiodic finite Markov chains are ψ-irreducible and V-uniformly ergodic, with ‖PV‖ < ∞.

Optimal parameters of the five algorithms for computing the Drazin inverse were examined. The Drazin 1, Drazin 2, and Drazin 5 approximations give parabola-shaped graphs, indicating there exists a minimum number of iterations. Thus an optimal α exists, that is, there exists an α such that the number of iterations is minimized. This optimal α may exist in interval(s) called optimal interval(s), that is, the interval(s) containing the optimal α. Valid α lie in the interval 0 < α < 2/ρ^(l+1), where ρ is the largest eigenvalue in modulus of Q. In this bounded interval for α, there exists a tail of false convergence as α → 0. The nth term shrinks at the geometric rate η^n, where η = ‖I − αWQ‖ < 1, and η → 1 as α → 0. It is clear that as α → 0, the initial step of the recursive algorithms goes to 0; thus the sequence converges to 0. The starting point of this false convergence is an interplay between α and the tolerance.

After testing, Drazin 2 converged in a smaller number of iterations than Drazin 1. But Drazin 5 converged in a smaller number of iterations than Drazin 2, where Drazin 5 has an added parameter p; as p increases, the number of iterations decreases.
Drazin 3 and Drazin 4 are nice because they have fewer parameters; that is, α is not involved. Drazin 3 and Drazin 4 converge in a smaller number of iterations than Drazin 1, but not Drazin 5. For this testing of the Drazin algorithms, five norms were tested: the 1-norm, 2-norm, sup-norm, F-norm, and M-norm. In a few circumstances, the M-norm converged in a smaller number of iterations or had a larger optimal interval, though not significantly. This does not prove or disprove that any norm is better than another. The size of the n×n probability transition matrix P affects the number of iterations. That is, as n increases, every entry in P decreases on average; thus as n → ∞, Drazin 1 converges to 0 for a constant tolerance.

REFERENCES

Boucher, T. R. (2014). Computational Techniques for Markov Chains using Generalized Inverse. Texas A&M University-Commerce.

Gallier, J. (2012). Fundamentals of Linear Algebra and Optimization. Retrieved from http://www.cis.upenn.edu/~cis515/cis51511sl4.pdf

Gilks, W. R., Richardson, S., & Spiegelhalter, D. J. (Eds.). (1996). Markov Chain Monte Carlo in Practice. London, UK: Chapman & Hall.

Lay, D. C. (2006). Linear Algebra and its Applications, 3rd edition. Boston, MA: Pearson Addison-Wesley.

Meyer, C. D. (1975). The Role of the Group Generalized Inverse in the Theory of Finite Markov Chains. SIAM Review, 17(3), 433-464.

Meyn, S. P., & Tweedie, R. L. (1993). Markov Chains and Stochastic Stability. London, UK: Springer-Verlag.

Strand, M. (2009). Metropolis-Hastings Markov Chain Monte Carlo. Retrieved from http://stat.haifa.ac.il/~stat_courses_ma/2011/2/4113/reading/mh%20Mathew%20Strand.PDF

APPENDICES

APPENDIX A

R CODE FOR DRAZIN 1

This is the code to run Drazin 1. A probability transition matrix, P, is required to run this program. A sequence of alpha is generated by manually inputting an upper bound (upbd), lower bound (lowbd), and number of alpha (numofalpha).
Other parameters such as tolerance, norms, plots, and legends can be adjusted to satisfy needs.

find.drazin1 <- function(P, tol, alpha, norm.type) {
  stopifnot(norm.type %in% c("o", "I", "F", "2", "M"))
  if (nrow(P) != ncol(P)) stop("Error: transition matrix is not square.")
  I <- diag(nrow(P))
  Q <- I - P
  W <- Q
  Q.star <- alpha * W
  i <- 0
  while (TRUE) {
    Q.temp <- Q.star
    Q.star <- (I - alpha * W %*% Q) %*% Q.star + alpha * W  # equation (24)
    i <- i + 1
    if (is.na(max(abs(Q.star - Q.temp)) < tol)) stop("No convergence.")
    # the five per-norm branches of the original collapse to this single call
    if (norm(Q.star - Q.temp, type = norm.type) < tol) break
  }
  print("Drazin inverse: ", quote = FALSE)
  print(Q.star)
  return(list(Q.star = Q.star, num.iterations = i))
}

it.veco <- numeric(0)
it.vecI <- numeric(0)
it.vecF <- numeric(0)
it.vec2 <- numeric(0)
it.vecM <- numeric(0)
lowbd <- 0.001
upbd <- 1.1
numofalpha <- 1000
seqstep <- numofalpha - 1
A <- seq(lowbd, upbd, by = (upbd - lowbd) / seqstep)
Am <- matrix(A, 1, numofalpha, byrow = TRUE)
for (i in 1:numofalpha) {
  it.veco[i] <- find.drazin1(P, 0.01, Am[1, i], "o")[[2]]
  it.vecI[i] <- find.drazin1(P, 0.01, Am[1, i], "I")[[2]]
  it.vecF[i] <- find.drazin1(P, 0.01, Am[1, i], "F")[[2]]
  it.vec2[i] <- find.drazin1(P, 0.01, Am[1, i], "2")[[2]]
  it.vecM[i] <- find.drazin1(P, 0.01, Am[1, i], "M")[[2]]
}
plot(A, it.veco, main = "Drazin 1 With Norm Comparisons", type = "l",
     col = "black", xlab = "Alpha", ylab = "Number of Iterations",
     xlim = c(lowbd, upbd), ylim = c(0, max(it.veco)))
lines(A, it.vecI, col = "red")
lines(A, it.vecF, col = "blue")
lines(A, it.vec2, col = "green")
lines(A, it.vecM, col = "purple")
par(xpd = FALSE)
legend(0.53, 60, c("1-norm", "sup-norm", "F-norm", "2-norm", "M-norm"),
       fill = c("black", "red", "blue", "green", "purple"), lty = 1)

APPENDIX B

R CODE FOR DRAZIN 2

This is the code to run Drazin 2. A probability transition matrix, P, is required to run this program. A sequence of alpha is generated by manually inputting an upper bound (upbd), lower bound (lowbd), and number of alpha (numofalpha). Other parameters such as tolerance, norms, plots, and legends can be adjusted to satisfy needs.

find.drazin2 <- function(P, tol, alpha, norm.type) {
  stopifnot(norm.type %in% c("o", "I", "F", "2", "M"))
  if (nrow(P) != ncol(P)) stop("Error: transition matrix is not square.")
  I <- diag(nrow(P))
  Q <- I - P
  W <- Q
  Q.star <- alpha * W
  i <- 0
  while (TRUE) {
    Q.temp <- Q.star
    Q.star <- Q.star %*% (2 * I - Q %*% Q.star)  # equation (25)
    i <- i + 1
    if (is.na(max(abs(Q.star - Q.temp)) < tol)) stop("No convergence.")
    if (norm(Q.star - Q.temp, type = norm.type) < tol) break
  }
  print("Drazin inverse: ", quote = FALSE)
  print(Q.star)
  return(list(Q.star = Q.star, num.iterations = i))
}

it.veco <- numeric(0)
it.vecI <- numeric(0)
it.vecF <- numeric(0)
it.vec2 <- numeric(0)
it.vecM <- numeric(0)
lowbd <- 0.001
upbd <- 1.1
numofalpha <- 1000
seqstep <- numofalpha - 1
A <- seq(lowbd, upbd, by = (upbd - lowbd) / seqstep)
Am <- matrix(A, 1, numofalpha, byrow = TRUE)
for (i in 1:numofalpha) {
  it.veco[i] <- find.drazin2(P, 0.01, Am[1, i], "o")[[2]]
  it.vecI[i] <- find.drazin2(P, 0.01, Am[1, i], "I")[[2]]
  it.vecF[i] <- find.drazin2(P, 0.01, Am[1, i], "F")[[2]]
  it.vec2[i] <- find.drazin2(P, 0.01, Am[1, i], "2")[[2]]
  it.vecM[i] <- find.drazin2(P, 0.01, Am[1, i], "M")[[2]]
}
plot(A, it.veco, main = "Drazin 2 With Norm Comparisons", type = "l",
     col = "black", xlab = "Alpha", ylab = "Number of Iterations",
     xlim = c(lowbd, upbd), ylim = c(min(it.veco) - 1, max(it.veco) + 1))
lines(A, it.vecI, col = "red")
lines(A, it.vecF, col = "blue")
lines(A, it.vec2, col = "green")
lines(A, it.vecM, col = "purple")
par(xpd = FALSE)
legend(0.45, 7, c("1-norm", "sup-norm", "F-norm", "2-norm", "M-norm"),
       fill = c("black", "red", "blue", "green", "purple"), lty = 1)

APPENDIX C

R CODE FOR DRAZIN 3

This is the code to run Drazin 3. A probability transition matrix, P, is required to run this program; alpha is not required. Other parameters such as tolerance, norms, plots, and legends can be adjusted to satisfy needs.

find.drazin3 <- function(P, tol, norm.type) {
  stopifnot(norm.type %in% c("o", "I", "F", "2", "M"))
  if (nrow(P) != ncol(P)) stop("Error: transition matrix is not square.")
  I <- diag(nrow(P))
  Q <- I - P
  W <- Q
  Q.star <- W
  i <- 0
  while (TRUE) {
    Q.temp <- Q.star
    Q.star <- Q.star + (W / (i + 2)) %*% (I - Q %*% Q.star)  # equation (26)
    i <- i + 1
    if (is.na(max(abs(Q.star - Q.temp)) < tol)) stop("No convergence.")
    if (norm(Q.star - Q.temp, type = norm.type) < tol) break
  }
  return(list(Q.star = Q.star, num.iterations = i))
}

it.veco <- find.drazin3(P, 0.01, "o")[[2]]
it.vecI <- find.drazin3(P, 0.01, "I")[[2]]
it.vecF <- find.drazin3(P, 0.01, "F")[[2]]
it.vec2 <- find.drazin3(P, 0.01, "2")[[2]]
it.vecM <- find.drazin3(P, 0.01, "M")[[2]]
plot(c(it.veco, it.vec2, it.vecI, it.vecF, it.vecM),
     main = "Drazin 3 With Norm Comparisons",
     xlab = "Different Norms", ylab = "Number of Iterations",
     xlim = c(1, 5), ylim = c(0, 30))
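As noted in Chapter 4, a returned solution should be checked against the defining identities of the group (Drazin) inverse. The sketch below is illustrative and independent of the appendix functions: it iterates equation (26) on a small two-state chain and measures how well the result satisfies QQ#Q = Q, Q#QQ# = Q#, and QQ# = Q#Q.

```r
# Sanity check: run the Drazin 3 iteration (equation (26)) on a two-state
# chain, then measure the residuals of the group-inverse identities.
P <- matrix(c(0.7, 0.3,
              0.5, 0.5), 2, 2, byrow = TRUE)
I <- diag(2); Q <- I - P; W <- Q
D <- W
for (i in 0:5000) {
  D <- D + (W / (i + 2)) %*% (I - Q %*% D)
}
norm(Q %*% D %*% Q - Q, type = "F")  # small residual, not exactly zero
norm(D %*% Q %*% D - D, type = "F")
norm(Q %*% D - D %*% Q, type = "F")
```

The residuals shrink slowly with the iteration count, reflecting the 1/(i + 2) step sizes of Drazin 3; a tighter tolerance simply requires more iterations.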
APPENDIX D

R CODE FOR DRAZIN 4

This is the code to run Drazin 4. A probability transition matrix, P, is required to run this program; alpha is not required. Other parameters such as tolerance, norms, plots, and legends can be adjusted to satisfy needs.

find.drazin4 <- function(P, tol, norm.type) {
  stopifnot(norm.type %in% c("o", "I", "F", "2", "M"))
  if (nrow(P) != ncol(P)) stop("Error: transition matrix is not square.")
  I <- diag(nrow(P))
  Q <- I - P
  W <- Q
  Q.star <- (2 * I - W %*% Q) %*% W
  i <- 0
  while (TRUE) {
    Q.temp <- Q.star
    # equation (27)
    Q.star <- Q.star +
      (1 / (i + 2)) * (2 * I - (1 / (i + 2)) * W %*% Q) %*% W %*% (I - Q %*% Q.star)
    i <- i + 1
    if (is.na(max(abs(Q.star - Q.temp)) < tol)) stop("No convergence.")
    if (norm(Q.star - Q.temp, type = norm.type) < tol) break
  }
  return(list(Q.star = Q.star, num.iterations = i))
}

it.veco <- find.drazin4(P, 0.01, "o")[[2]]
it.vecI <- find.drazin4(P, 0.01, "I")[[2]]
it.vecF <- find.drazin4(P, 0.01, "F")[[2]]
it.vec2 <- find.drazin4(P, 0.01, "2")[[2]]
it.vecM <- find.drazin4(P, 0.01, "M")[[2]]
plot(c(it.veco, it.vec2, it.vecI, it.vecF, it.vecM),
     main = "Drazin 4 Comparing All Norms",
     xlab = "Different Norms", ylab = "Number of Iterations",
     xlim = c(1, 5), ylim = c(1, 30))

APPENDIX E

R CODE FOR DRAZIN 5

This is the code to run Drazin 5. A probability transition matrix, P, is required to run this program.
A sequence of alpha is generated by manually inputting an upper bound (upbd), lower bound (lowbd), and number of alpha (numofalpha). Other parameters such as tolerance, norms, plots, and legends can be adjusted to satisfy needs.

library(expm)  # provides the matrix power operator %^%

find.drazin5 <- function(P, tol, alpha, norm.type, p) {
  stopifnot(norm.type %in% c("o", "I", "F", "2", "M"))
  if (nrow(P) != ncol(P)) stop("Error: transition matrix is not square.")
  I <- diag(nrow(P))
  Q <- I - P
  W <- Q
  Q.star <- alpha * W
  i <- 0
  while (TRUE) {
    Q.temp <- Q.star
    a <- 0
    for (k in 0:(p - 1)) {
      a <- a + (I - Q %*% Q.temp) %^% k
    }
    Q.star <- Q.temp %*% a  # equation (28)
    i <- i + 1
    if (is.na(max(abs(Q.star - Q.temp)) < tol)) stop("No convergence.")
    if (norm(Q.star - Q.temp, type = norm.type) < tol) break
  }
  print("Drazin inverse: ", quote = FALSE)
  print(Q.star)
  return(list(Q.star = Q.star, num.iterations = i))
}

it.veco <- numeric(0)
it.vecI <- numeric(0)
it.vecF <- numeric(0)
it.vec2 <- numeric(0)
it.vecM <- numeric(0)
lowbd <- 0.0001
upbd <- 1.1
numofalpha <- 1000
seqstep <- numofalpha - 1
p <- 1000  # number of terms in the truncated sum of equation (28)
A <- seq(lowbd, upbd, by = (upbd - lowbd) / seqstep)
Am <- matrix(A, 1, numofalpha, byrow = TRUE)
for (i in 1:numofalpha) {
  it.veco[i] <- find.drazin5(P, 0.01, Am[1, i], "o", p)[[2]]
  it.vecI[i] <- find.drazin5(P, 0.01, Am[1, i], "I", p)[[2]]
  it.vecF[i] <- find.drazin5(P, 0.01, Am[1, i], "F", p)[[2]]
  it.vec2[i] <- find.drazin5(P, 0.01, Am[1, i], "2", p)[[2]]
  it.vecM[i] <- find.drazin5(P, 0.01, Am[1, i], "M", p)[[2]]
}
plot(A, it.veco, main = "Drazin 5 With Norm Comparisons", type = "l",
     xlab = "Alpha", ylab = "Number of Iterations",
     xlim = c(lowbd, upbd), ylim = c(0, max(it.veco)))
lines(A, it.vecI, col = "red")
lines(A, it.vecF, col = "blue")
lines(A, it.vec2, col = "green")
lines(A, it.vecM, col = "purple")

VITA

Derek Fort received his Bachelor of Science degree in Mathematics with a minor in Interdisciplinary Studies from Texas A&M University-Commerce in 2013. He immediately pursued his Master of Science in Mathematics from Texas A&M University-Commerce and will graduate in December 2014. His research interests include, but are not limited to, statistics and number theory. Currently he is a mechanical engineer and intends to further his education.

Texas A&M University-Commerce, Binnion Hall, P.O. Box 3011, Commerce, TX 75429-3011
derek.fort@leomail.tamuc.edu
Click tabs to swap between content that is broken into logical sections.
Rating  
Title  Numerical Approximation of Drazin Inverse in Markov Chain Theory 
Author  Fort, Derek Wayne 
Subject  Statistics; Mathematics 
Abstract  NUMERICAL APPROXIMATION OF DRAZIN INVERSE IN MARKOV CHAIN THEORY A Thesis by DEREK FORT Submitted to the Office of Graduate Studies Texas A&M UniversityCommerce In partial fulfillment of the requirements for the degree of MASTER OF SCIENCE December 2014NUMERICAL APPROXIMATION OF DRAZIN INVERSE IN MARKOV CHAIN THEORY A Thesis by DEREK FORT Approved by: Advisor: Thomas Boucher Committee: Charles Dorsett Pamela Webster Head of Department: Tingxiu Wang Dean of the College: Brent Donham Dean of Graduate Studies: Arlene Horne3 Copyright © 2014 Derek Fort4 ABSTRACT NUMERICAL APPROXIMATION OF DRAZIN INVERSE IN MARKOV CHAIN THEORT Derek Fort, MS Texas A&M UniversityCommerce, 2014 Advisor: Thomas Boucher, PhD One generalized inverse of the matrix Q is the Drazin inverse, Q^#. Many important quantities in Markov chain theory can be expressed in terms of Q^#, where Q = I – P, the matrix P being the transition probability matrix of the chain. Previous work has established conditions and algorithms for estimating Q^# in a general setting but specific results are lacking. In this study, estimation of Q^# is considered for finite state Markov chains. Optimal parameter settings and convergence rates of the algorithms are examined.5 TABLE OF CONTENTS LIST OF TABLES .................................................................................................................... vii LIST OF FIGURES ................................................................................................................. viii CHAPTER 1. INTRODUCTION ...................................................................................................... 1 Markov Chain ....................................................................................................... 1 Transition Matrix ................................................................................................. 
3 Markov Chain Properties ....................................................................................... 4 Rates of Convergence ............................................................................................ 6 2. MARKOV CHAIN APPLICATIONS ........................................................................ 8 Bayesian Inference ............................................................................................... 8 Ergodic Averages ................................................................................................. 9 Estimation for Central Limit Theorem ................................................................ 10 Batch Means ....................................................................................................... 11 Window Estimators.............................................................................................. 12 MetropolisHastings Algorithm ......................................................................... 12 The Metropolis Algorithm .................................................................................. 13 The Independence Sampler ................................................................................ 14 SingleComponent MetropolisHastings ........................................................... 14 Gibbs Sampling .................................................................................................. 156 3. VECTOR AND MATRIX NORMS ......................................................................... 16 Vector Norms....................................................................................................... 16 Matrix Norms ..................................................................................................... 17 4. COMPUTATIONAL TECHNIQUES OF DRAZIN INVERSE................................ 21 Recursive Definition ............................................................................................ 
   Convergence Rate
   R Code
   Comparing Drazins
   Matrix Size

5. SUMMARY

REFERENCES

APPENDICES
A. R CODE FOR DRAZIN 1
B. R CODE FOR DRAZIN 2
C. R CODE FOR DRAZIN 3
D. R CODE FOR DRAZIN 4
E. R CODE FOR DRAZIN 5

VITA

LIST OF TABLES

1. How α affects the convergence of Q#
2. Randomly generated α

LIST OF FIGURES

1. Plot of Table 1
2. Plot of Table 2
3. Drazin 1
4. The closed interval [0.4, 0.8] of Figure 3
5. Drazin 2
6. The closed interval [0.2, 0.9] of Figure 5
7. Drazin 3; on the x axis, 1 is the 1-norm, 2 the 2-norm, 3 the sup-norm, 4 the F-norm, and 5 the M-norm
8. Drazin 4; on the x axis, 1 is the 1-norm, 2 the 2-norm, 3 the sup-norm, 4 the F-norm, and 5 the M-norm
9. Drazin 5
10. Varying Matrix Size for α = 0.01
11. Varying Matrix Size for α = 0.05
12. Varying Matrix Size for α = 0.5
13. Increasing Matrix Size
14.
The top, middle, and bottom graphs have intervals [0.01, 1.1], [0.3, 0.9], and [0.33, 0.84], respectively.

Chapter 1
INTRODUCTION

Markov chains are widely used in areas such as physics, speech recognition, finance, statistics, and many others. We discuss finite state space Markov chains, their applications, and the role in Markov chain theory of the Drazin inverse Q#, a generalized inverse of Q = I − P. We also review algorithms for calculating Q# and discuss their properties.

1.1 Markov Chain

A time homogeneous Markov chain is a sequence of random variables {X_1, X_2, X_3, . . .} that are not independent and identically distributed (IID) but satisfy the Markov property: given the present state x_n and the past states {x_1, . . . , x_{n−1}}, the probability of arriving in a future state x_{n+1} is given by

P(X_{n+1} = x_{n+1} | X_n = x_n, X_{n−1} = x_{n−1}, . . . , X_0 = x_0) = P(X_{n+1} = x_{n+1} | X_n = x_n),

which depends on neither n nor any of the past states {x_1, x_2, . . . , x_{n−1}}.

A finite state Markov chain transitions among a finite set of states S. We denote S as the set of integers {0, 1, 2, . . . , n} for a finite state Markov chain. The probability of going from state i ∈ S to state j ∈ S in a single-step transition is denoted

P_{ij} = P(X_1 = j | X_0 = i) = P(X_{n+1} = j | X_n = i),     (1)

by time homogeneity. The probability of going from state i ∈ S to state j ∈ S in n time steps is denoted

P^{(n)}_{ij} = P(X_n = j | X_0 = i).     (2)

Equation (2) is calculated iteratively from equation (1) using the following Chapman-Kolmogorov equation.

Theorem 1.1. Let m ≥ 1, n ≥ 1, i ∈ S, and j ∈ S.
Then the Chapman-Kolmogorov equation is

P^{(m+n)}_{ij} = Σ_{k∈S} P^{(m)}_{ik} P^{(n)}_{kj}.

Proof:

P^{(m+n)}_{ij} = P(X_{m+n} = j | X_0 = i)
= Σ_{k∈S} P(X_{m+n} = j, X_m = k | X_0 = i)
= Σ_{k∈S} P(X_{m+n} = j, X_m = k, X_0 = i) / P(X_0 = i)
= Σ_{k∈S} P(X_{m+n} = j | X_m = k, X_0 = i) P(X_m = k, X_0 = i) / P(X_0 = i)
= Σ_{k∈S} P(X_{m+n} = j | X_m = k) P(X_m = k, X_0 = i) / P(X_0 = i)
= Σ_{k∈S} P(X_m = k, X_0 = i) P^{(n)}_{kj} / P(X_0 = i)
= Σ_{k∈S} P(X_m = k | X_0 = i) P^{(n)}_{kj}
= Σ_{k∈S} P^{(m)}_{ik} P^{(n)}_{kj}. ∎

The Chapman-Kolmogorov equation gives the multistep transitions in terms of single-step transitions in a form which looks like a matrix product. For example, the matrix product AB has (i, j)th element

(AB)_{ij} = Σ_k a_{ik} b_{kj},

where A = (a_{ij}) and B = (b_{ij}). This suggests linear algebra will be useful for convenient representations of transition probabilities and other Markov chain quantities.

1.2 Transition Matrix

Finite state Markov chain transition probabilities can be recast to take advantage of matrix algebra. Let P be the probability transition matrix defined by P_{ij} = P(X_{n+1} = j | X_n = i), where the (i, j)th entry is a nonnegative real number in the closed interval [0, 1]. Then P is defined as

P = [ P_{1,1} P_{1,2} ... P_{1,n}
      P_{2,1} P_{2,2} ... P_{2,n}
      ...
      P_{n,1} P_{n,2} ... P_{n,n} ]

with the properties

P_{i,j} ≥ 0 for all i, j, and Σ_{j=1}^n P_{i,j} = 1 for all i,

so that each row is a probability distribution. Recall that P^{(m)}_{ij} is the probability of going from a state i to a state j in m steps,

P^{(m)}_{ij} = P(X_{n+m} = j | X_n = i).

Then we can use Chapman-Kolmogorov and linear algebra to calculate multistep transition probabilities from a matrix product.

Lemma 1.1. Let m ≥ 1 and n ≥ 1; then P^{(m+n)} = P^{(m)} P^{(n)}.

Proof: The proof follows directly from Theorem 1.1. ∎

Example 1.1. Let X_i = 1 if you are sick on day i; otherwise, X_i = 2. Suppose P_{1,1} = 0.08 and P_{2,1} = 0.015. Then the probability transition matrix is given by

P = [ 0.08  0.92
      0.015 0.985 ].

What is the probability that you are sick today and 4 days later you are sick again? Solution: Using Lemma 1.1 we see that P^{(4)} = PPPP = P^4.
Thus

P^4 = [ 0.0161 0.9839
        0.0160 0.9840 ].

Therefore, the probability of being sick 4 days after being sick is P^{(4)}_{1,1} = 0.0161. Note that the rows of P^4 are nearly the same, indicating that in the long run you are sick about 1.61% of the time, regardless of the starting state.

1.3 Markov Chain Properties

If a chain is irreducible, aperiodic, and positive recurrent, then the chain is ergodic and the distribution of {X_n} converges to a stationary distribution π(·), which allows us to approximate the long term behavior of the chain. If the initial value X_0 is sampled from π(·), then all subsequent iterates will be distributed according to π(·).

A Markov chain is irreducible if the chain can reach any state from any other state in a finite number of iterations with positive probability. The next definitions are inspired by Meyn and Tweedie (1993) and Gilks, Richardson, and Spiegelhalter (1996).

Definition 1.1. {X_n} is called irreducible if for all i, j there exists n > 0 such that P^{(n)}_{ij} > 0.

An aperiodic Markov chain transitions irregularly between different sets of states; that is, there are no predictable cycles.

Definition 1.2. An irreducible chain {X_n} is called aperiodic if for all i, gcd{n > 0 : P^{(n)}_{ii} > 0} = 1.

Recurrent chains return to each state in finite time with probability 1.

Definition 1.3. An irreducible chain {X_n} is recurrent if P[τ_{ii} < ∞] = 1, where τ_{ii} is the first return time to state i, defined by τ_{ii} = min{t > 0 : X_t = i | X_0 = i}. Otherwise, {X_n} is called transient.

Finally, a Markov chain is positive recurrent if the expected amount of time between repeated visits to the same state is finite; it follows that each state is visited an infinite number of times with probability equal to 1.

Definition 1.4. An irreducible chain {X_n} is called positive recurrent if E[τ_{ii}] < ∞ for all i. Otherwise, it is called null-recurrent. Equivalently, {X_n} is positive recurrent if there exists π(·) such that

Σ_i π(i) P^{(n)}_{ij} = π(j)     (3)

for all j and n ≥ 0. In matrix form we can write equation (3) as πP = π.
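Both the four-step computation of Example 1.1 and the stationary equation πP = π can be checked numerically. The thesis's own computations are done in R (see the appendices); the following pure-Python sketch of the same calculations is illustrative only:

```python
# Two-state "sick/healthy" chain from Example 1.1.
P = [[0.08, 0.92],
     [0.015, 0.985]]

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Four-step transition probabilities: P^(4) = P * P * P * P.
P4 = P
for _ in range(3):
    P4 = matmul(P4, P)
print(P4[0][0])   # P(sick on day 4 | sick on day 0), approximately 0.0161

# Power iteration: pi_{n+1} = pi_n P converges to the stationary pi,
# which both rows of P^n approach.
pi = [1.0, 0.0]
for _ in range(200):
    pi = [sum(pi[i] * P[i][j] for i in range(2)) for j in range(2)]
print(pi)
```

For this chain π can also be found in closed form from πP = π with π(1) + π(2) = 1, giving π(1) = 0.015/0.935 ≈ 0.0160, which matches the first column of P^4.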
The stationary distribution is π, and solving the system πP = π is one way to calculate π. The next theorem is from Gilks, Richardson, and Spiegelhalter (1996, p. 47).

Theorem 1.2. If {X_n} is positive recurrent and aperiodic then its stationary distribution π(·) is the unique probability distribution satisfying (3). We then say that {X_n} is ergodic and the following consequences hold:

(i) P^{(n)}_{ij} → π(j) as n → ∞ for all i, j.

(ii) (Ergodic Theorem) If E_π[f(X)] < ∞, then P[ f̄_N → E_π[f(X)] ] = 1, where E_π[f(X)] = Σ_i f(i)π(i) is the expectation of f(X) with respect to π(·), f̄_N = (1/N) Σ_{t=1}^N f(X_t), and f(·) is a real valued function.

In fact, Theorem 1.2(ii) is the strong law of large numbers for Markov chains; that is,

X̄_n → μ a.s. as n → ∞ ⟺ P( lim_{n→∞} X̄_n = μ ) = 1,

with the limiting distribution being the stationary one. An irreducible, aperiodic finite state chain is always positive recurrent and thus ergodic, so laws of large numbers and central limit theorems will always hold. Thus, the stationary distribution is important in studying the long-term behavior of the chain.

1.4 Rates of Convergence

It would help to know how quickly P^{(n)} → π. Obviously, chains with faster convergence yield better estimates. The next definition is inspired by Gilks, Richardson, and Spiegelhalter (1996, p. 48).

Definition 1.5. If {X_n} is ergodic (aperiodic and positive recurrent) and there exist λ with 0 ≤ λ < 1 and a function V(·) > 1 such that

Σ_j |P_{ij}(t) − π(j)| ≤ V(i) λ^t     (4)

for all i, then {X_n} is said to be geometrically ergodic.

Finite state Markov chains are uniformly geometrically ergodic with V ≡ c, where c is some constant. The smallest λ for which there exists a function V satisfying (4) is called the rate of convergence and is denoted λ*, where λ* = inf{λ : there exists V such that (4) holds}.

In many problems, transition probabilities are described by a sequence of eigenvalues {λ_0, λ_1,
. . .}, where λ_0 = 1, and the corresponding left eigenvectors {e_0, e_1, . . .}; that is,

Σ_i e_k(i) P_{ij}(t) = λ_k^t e_k(j)

for all j and for each k, so that

P_{ij}(t) = Σ_k e_k(i) e_k(j) λ_k^t.     (5)

Here e_0(·) = π(·). Geometrically ergodic chains have λ_0 = 1, and all other eigenvalues are uniformly bounded within (−1, 1). If t is large then the dominant term in (5) is π(j) = e_0(j). The second largest eigenvalue in absolute value determines the rate of convergence of the Markov chain. An alternative definition of λ* is

λ* = sup_{k>0} |λ_k|,

which is related to α in Chapter 4.

Chapter 2
MARKOV CHAIN APPLICATIONS

This chapter looks at Markov chain applications: in particular, Bayesian inference, the Metropolis-Hastings algorithm, the Metropolis algorithm, and Gibbs sampling.

2.1 Bayesian Inference

Let D represent all observed data, and let θ denote parameters and missing data, as described in Gilks, Richardson, and Spiegelhalter (1996). A joint probability distribution over all random quantities is denoted by P(D, θ). Conditioning on the joint distribution gives

P(D, θ) = P(D | θ) P(θ),

where P(D | θ) is the likelihood and P(θ) is the prior distribution. Having observed D, Bayes' theorem is used to determine the posterior distribution of θ:

P(θ | D) = P(D | θ) P(θ) / ∫ P(D | θ) P(θ) dθ.

The posterior expectation of a function f(θ) is

E[f(θ) | D] = ∫ f(θ) P(D | θ) P(θ) dθ / ∫ P(D | θ) P(θ) dθ.

More generally, let X be a vector of k random variables with distribution π(·), where θ is the vector of model parameters and π(·) is the posterior distribution. The difficult task now is to evaluate the following expectation for some function f(·):

E_π[f(X)] = ∫ f(x) π(x) dx / ∫ π(x) dx.     (6)

Here it is possible that π is known only up to a normalization constant; that is, ∫ π(x) dx is unknown. In this situation π ∝ P(θ) P(D | θ), and X could consist of discrete or continuous random variables.
For example, if X consisted of discrete random variables, then the integrals in (6) would be replaced by summations:

E_π[f(X)] = Σ f(x) π(x) / Σ π(x).

In this situation, calculating E_π[f(X)] directly would not be possible since π is unknown. Instead, we look at methods of approximating E_π[f(X)].

2.2 Ergodic Averages

The following information is given by Gilks, Richardson, and Spiegelhalter (1996, p. 45). Monte Carlo integration evaluates E_π[f(X)] by drawing samples {X_t : t = 1, . . . , n} from π(·) and then approximating

E_π[f(X)] ≈ (1/n) Σ_{t=1}^n f(X_t),

i.e., the population mean of f(X) is estimated by a sample mean. When the {X_t} are independent, the laws of large numbers ensure that the approximation can be made as accurate as desired by increasing the sample size n. In general, drawing samples {X_t} independently from π(·) is not feasible, since π(·) can be nonstandard. However, the {X_t} need not be independent. The {X_t} can be generated by any process which draws samples throughout the support of π(·). One way of doing this is through an ergodic Markov chain having π(·) as its stationary distribution; this is Markov chain Monte Carlo (MCMC). As t increases, the sampled points {X_t} will look increasingly like dependent samples from π(·). After a sufficiently long burn-in of, say, m iterations, the points {X_t : t = m + 1, . . . , n} will be dependent samples approximately from π(·). We can use the output from the Markov chain to estimate the expectation E_π[f(X)], where X has distribution π(·). Burn-in samples are usually discarded for this calculation, giving the estimator

f̄ = (1/(n − m)) Σ_{t=m+1}^n f(X_t).

This is called an ergodic average. Convergence to the required expectation is ensured by the ergodic theorem, Theorem 1.2.
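The ergodic average can be illustrated with the two-state chain of Example 1.1. The following pure-Python sketch (the burn-in length m and run length n are chosen arbitrarily for illustration) estimates π(sick) by averaging the indicator of the sick state along one simulated path:

```python
import random

random.seed(1)

# Two-state chain of Example 1.1: state 0 = sick, state 1 = healthy.
P = [[0.08, 0.92],
     [0.015, 0.985]]

def step(state):
    """Draw the next state from row `state` of P."""
    return 0 if random.random() < P[state][0] else 1

# Run the chain, discard a burn-in of m iterations, and average
# f(X_t) = 1{X_t = 0} over the remaining draws: an ergodic average.
n, m = 200_000, 1_000
x, total = 0, 0
for t in range(n):
    x = step(x)
    if t >= m:
        total += (x == 0)
f_bar = total / (n - m)

# By the ergodic theorem, f_bar approaches pi(0) = 0.015/0.935,
# about 0.016, as the run length grows.
print(f_bar)
```

The dependence between successive draws slows convergence relative to IID sampling, which is exactly why the asymptotic variance of the next section matters.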
2.3 Estimation for Central Limit Theorem

Geometric convergence, as in Definition 1.5, allows the existence of central limit theorems for ergodic averages: results of the form

N^{1/2} ( f̄_N − E_π[f(X)] ) → N(0, σ²)     (7)

for some positive constant σ, as N → ∞, where the convergence is in distribution. To apply the central limit theorem we need estimates of σ². The following theorem gives the asymptotic variance referred to later in Proposition 4.2.

Theorem 2.1.

σ² = var_π(f(X_0)) + 2 Σ_{i=1}^∞ cov_π(f(X_0), f(X_i))
   = Σ_{j=1}^∞ a_j (1 + λ_j)/(1 − λ_j) var_π(f(X_0))
   ≤ (1 + λ*)/(1 − λ*) var_π(f(X_0))

for some nonnegative constants a_i with Σ_i a_i = 1, where X_0 is distributed according to π(·).

The measure of efficiency of the Markov chain for estimating E_π[f(X)] is given by the ratio

eff( f̄ ) = var_π(f(X_0)) / σ².

To determine the accuracy of the estimate of E_π[f(X)], two methods allow us to approximate σ².

2.4 Batch Means

To determine the accuracy of f̄_N we must estimate σ². Suppose a Markov chain is run for N = mn iterations, where n is sufficiently large that the batch means

Y_k = (1/n) Σ_{i=(k−1)n+1}^{kn} f(X_i), k = 1, . . . , m,

are approximately independent N(E_π[f(X)], σ²/n). Then σ² can be approximated by

σ̂² = (n/(m − 1)) Σ_{k=1}^m (Y_k − f̄_N)².

We need to choose n carefully so that it is large enough to give a valid approximation.

2.5 Window Estimators

Another way to approximate σ² is to estimate γ_i ≡ cov_π(f(X_0), f(X_i)) by

γ̂_i = (1/N) Σ_{j=1}^{N−i} (f(X_j) − f̄_N)(f(X_{j+i}) − f̄_N).

As i increases there are fewer terms in the average, so the estimate becomes worse as i increases. It is therefore necessary to use a truncated window estimator of σ², with weights 0 ≤ ω_N(i) ≤ 1:

σ̂²_N = γ̂_0 + 2 Σ_{i=1}^∞ ω_N(i) γ̂_i.

2.6 Metropolis-Hastings Algorithm

For the Metropolis-Hastings algorithm, at each time n, X_{n+1} is chosen by first sampling a candidate point Y from a proposal distribution q(· | X_n). Note that the proposal distribution may depend on the current point X_n. The candidate point Y is then accepted with probability α(X_n, Y), where

α(X, Y) = min( 1, π(Y) q(X | Y) / (π(X) q(Y | X)) ).     (8)
If the candidate point is accepted, the next state becomes X_{n+1} = Y. If the candidate is rejected, the chain does not move; that is, X_{n+1} = X_n. The proposal distribution q(· | ·) can have any form, and the stationary distribution of the chain will be π(·). This can be seen from the following argument. The transition kernel for the Metropolis-Hastings algorithm is

P(X_{n+1} | X_n) = q(X_{n+1} | X_n) α(X_n, X_{n+1}) + I(X_{n+1} = X_n) [ 1 − ∫ q(Y | X_n) α(X_n, Y) dY ],     (9)

where I(·) denotes the indicator function (taking the value 1 when its argument is true, and 0 otherwise). The first term in (9) arises from acceptance of a candidate Y = X_{n+1}, and the second term arises from rejection of all possible candidates Y. Using the fact that

π(X_n) q(X_{n+1} | X_n) α(X_n, X_{n+1}) = π(X_{n+1}) q(X_n | X_{n+1}) α(X_{n+1}, X_n),     (10)

which follows from (8), we obtain the detailed balance equation

π(X_n) P(X_{n+1} | X_n) = π(X_{n+1}) P(X_n | X_{n+1}).     (11)

Integrating both sides of (11) with respect to X_n gives

∫ π(X_n) P(X_{n+1} | X_n) dX_n = π(X_{n+1}).     (12)

The left-hand side of equation (12) gives the marginal distribution of X_{n+1} under the assumption that X_n is from π(·).

2.7 The Metropolis Algorithm

The Metropolis algorithm considers only symmetric proposals, that is, q(X | Y) = q(Y | X); one example is a proposal that generates each component of Y conditionally independently given X_n. Symmetry reduces the acceptance probability in (8) to

α(X, Y) = min( 1, π(Y)/π(X) ).     (13)

2.8 The Independence Sampler

The independence sampler is a Metropolis-Hastings algorithm whose proposal does not depend on X: q(Y | X) = q(Y). The acceptance probability (8) can then be written as

α(X, Y) = min( 1, ω(Y)/ω(X) ),

where ω(X) = π(X)/q(X). Here q(·) needs to be a good approximation of π(·) for the independence sampler to work well. Heavier-tailed independence proposals are safer than lighter-tailed proposals, because lighter-tailed proposals may leave the chain stuck in the tails for long periods.
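The mechanics of Sections 2.6 and 2.7 fit in a few lines of code. The following pure-Python sketch runs a random-walk Metropolis sampler on a small discrete target; the target weights are invented for illustration, the proposal Y = X ± 1 is symmetric so (8) reduces to (13), and proposals outside the support are rejected automatically since π is zero there:

```python
import random

random.seed(7)

# Unnormalized target on {0,...,4}: pi(x) proportional to w[x].
# Only ratios of pi appear, so the normalizing constant never enters.
w = [1.0, 2.0, 3.0, 2.0, 1.0]

def pi(x):
    return w[x] if 0 <= x < len(w) else 0.0

# Random-walk Metropolis: propose Y = X +/- 1 (symmetric proposal),
# accept with probability min(1, pi(Y)/pi(X)) as in (13).
n = 200_000
x, counts = 2, [0] * len(w)
for _ in range(n):
    y = x + random.choice((-1, 1))
    if random.random() < pi(y) / pi(x):   # accept/reject step
        x = y
    counts[x] += 1                        # rejected moves repeat X

est = [c / n for c in counts]
print(est)   # approaches w normalized: [1/9, 2/9, 3/9, 2/9, 1/9]
```

Counting the current state after every step, including rejections, is essential: the "chain does not move" case is part of the transition kernel (9), and dropping it would bias the estimate.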
2.9 Single-Component Metropolis-Hastings

Single-component Metropolis-Hastings, as defined by Strand (2009), works by dividing X into components {X_{.1}, X_{.2}, . . . , X_{.h}} of possibly differing dimensions. Let

X_{.−i} = {X_{.1}, . . . , X_{.i−1}, X_{.i+1}, . . . , X_{.h}},

which comprises all components of X except X_{.i}. One iteration of single-component Metropolis-Hastings is made up of h updating steps. Let X^t_{.i} denote the state of X_{.i} at the end of iteration t. For step i of iteration t + 1, X_{.i} is updated using Metropolis-Hastings. A candidate Y_{.i} is generated from a proposal distribution q_i(Y_{.i} | X^t_{.i}, X^t_{.−i}), where X^t_{.−i} denotes the value of X_{.−i} after completing step i − 1 of iteration t + 1:

X^t_{.−i} = {X^{t+1}_{.1}, . . . , X^{t+1}_{.i−1}, X^t_{.i+1}, . . . , X^t_{.h}},

where components 1, 2, . . . , i − 1 have already been updated. Thus the ith proposal distribution q_i(· | ·, ·) generates a candidate only for the ith component of X, and may depend on the current values of any of the components of X. This candidate is accepted with probability α(X^t_{.−i}, X^t_{.i}, Y_{.i}), where

α(X_{.−i}, X_{.i}, Y_{.i}) = min( 1, π(Y_{.i} | X_{.−i}) q_i(X_{.i} | Y_{.i}, X_{.−i}) / (π(X_{.i} | X_{.−i}) q_i(Y_{.i} | X_{.i}, X_{.−i})) ),     (14)

where π(X_{.i} | X_{.−i}) is the full conditional distribution of the ith component of X conditioned on the remaining components, with X distributed according to π(·):

π(X_{.i} | X_{.−i}) = π(X) / ∫ π(X) dX_{.i}.     (15)

2.10 Gibbs Sampling

The Gibbs sampler is a special case of single-component Metropolis-Hastings. The proposal distribution of the Gibbs sampler for updating the ith component of X is

q_i(Y_{.i} | X_{.i}, X_{.−i}) = π(Y_{.i} | X_{.−i}),     (16)

where π(Y_{.i} | X_{.−i}) is given by (15). Substituting (16) into (14) gives an acceptance probability of 1; that is, Gibbs sampler candidates are always accepted.

Chapter 3
VECTOR AND MATRIX NORMS

This chapter introduces norms of vectors and matrices. A norm is used to measure distance and size and to define the convergence of sequences of vectors or matrices. Norms will play a role in later calculations.
3.1 Vector Norms

First, we briefly discuss vector norms and then matrix norms. Note that all vectors and matrices in this context are treated as real.

Definition 3.1. Let V be a vector space over R. A norm is a function ‖·‖ : V → R satisfying the following for all x, y ∈ V and λ ∈ R:

1. ‖x‖ ≥ 0, with equality iff x = 0.
2. ‖λx‖ = |λ| ‖x‖.
3. ‖x + y‖ ≤ ‖x‖ + ‖y‖.

Note that if V = R then ‖x‖ = |x|. Some commonly used norms are provided in the next theorem.

Theorem 3.1. Let V = R^n. Then for every x = (x_1, x_2, . . . , x_n) ∈ V we have

(a) the 1-norm, ‖x‖_1, defined such that ‖x‖_1 = |x_1| + |x_2| + · · · + |x_n|;     (17)

(b) the Euclidean norm (2-norm), defined such that ‖x‖_2 = (x_1² + x_2² + · · · + x_n²)^{1/2};     (18)

(c) the sup-norm, ‖x‖_∞, defined such that ‖x‖_∞ = max{|x_i| : 1 ≤ i ≤ n};     (19)

(d) the more general l_p-norm, p ≥ 1, defined such that ‖x‖_p = (|x_1|^p + |x_2|^p + · · · + |x_n|^p)^{1/p}.

3.2 Matrix Norms

Let M_n(R) be the space of square n×n matrices over R.

Definition 3.2. A matrix norm ‖·‖ on the space M_n(R) is a norm on the space with the additional property ‖AB‖ ≤ ‖A‖ ‖B‖ for all A, B ∈ M_n(R). The induced matrix norm is defined by ‖A‖ = sup_{‖x‖=1} ‖Ax‖.

The next proposition is by Gallier (2012, p. 231).

Proposition 3.1. Let A = (a_{ij}) be any square matrix. Then

‖A‖_1 = sup_{‖x‖_1=1} ‖Ax‖_1 = max_j Σ_{i=1}^n |a_{ij}|,
‖A‖_∞ = sup_{‖x‖_∞=1} ‖Ax‖_∞ = max_i Σ_{j=1}^n |a_{ij}|,
‖A‖_2 = sup_{‖x‖_2=1} ‖Ax‖_2 = √ρ(AᵀA) = √ρ(AAᵀ).

Now we define the Frobenius norm as stated by Gallier (2012, p. 225).

Definition 3.3. The Frobenius norm (F-norm) ‖·‖_F is defined so that for every real n×n matrix A,

‖A‖_F = ( Σ_{i,j=1}^n a_{ij}² )^{1/2} = √tr(AAᵀ) = √tr(AᵀA),     (20)

satisfying the property

(a) √ρ(AᵀA) ≤ ‖A‖_F ≤ √n √ρ(AᵀA) for all A.

Proposition 3.2. The following inequalities hold for all x ∈ R^n:

(a) ‖x‖_∞ ≤ ‖x‖_1 ≤ n ‖x‖_∞;
(b) ‖x‖_∞ ≤ ‖x‖_2 ≤ √n ‖x‖_∞;
(c) ‖x‖_2 ≤ ‖x‖_1 ≤ √n ‖x‖_2;
(d) ‖A‖_2 ≤ ‖A‖_F ≤ √n ‖A‖_2 for any n×n matrix A.

Definition 3.4. The maximum modulus norm (M-norm) is the largest entry of the matrix in absolute value:

‖A‖_M := max_{i,j} |a_{ij}|.     (21)

Definition 3.5.
Given any real vector space V, two norms ‖·‖_a and ‖·‖_b are equivalent if and only if there exist positive real numbers C_1, C_2 > 0 such that ‖u‖_a ≤ C_1 ‖u‖_b and ‖u‖_b ≤ C_2 ‖u‖_a for all u ∈ V.

Proposition 3.2 and Definition 3.5, from Gallier (2012, p. 78), indicate the equivalence of the norms ‖·‖_1, ‖·‖_2, ‖·‖_∞, and ‖·‖_F, so that convergence in one is convergence in the others.

3.2.1 Eigenvectors and Eigenvalues

Eigenvalues and eigenvectors will play a critical role in future calculations, so a refresher of this material is important. The following definition is given by Lay (2006, p. 303).

Definition 3.6. An eigenvector of an n×n matrix A is a nonzero vector x such that Ax = λx for some scalar λ. A scalar λ is called an eigenvalue of A if there exists a nontrivial solution x of Ax = λx.

Note that λ is an eigenvalue of A if and only if Ax = λx for some nonzero vector x, if and only if λI − A is not invertible, if and only if det(λI − A) = 0.

Definition 3.7. Let A be any n×n matrix and let λ_1, λ_2, . . . , λ_n denote its n, not necessarily distinct, eigenvalues. Then define ρ(A) as the largest modulus of the eigenvalues of A:

ρ(A) = max_{1≤i≤n} |λ_i|.

Proposition 3.3. Let A be any n×n matrix. Then for any matrix norm ‖·‖ we have ρ(A) ≤ ‖A‖, where ρ(A) = lim_{k→∞} ‖A^k‖^{1/k}.

We have mentioned the connection between convergence and eigenvalues; in the next chapter we will further our discussion.

Chapter 4
COMPUTATIONAL TECHNIQUES FOR DRAZIN INVERSE

We show in this section that many important quantities in Markov chain theory can be expressed in terms of a generalized inverse Q# of Q = I − P. Note that Q = I − P is not invertible, because

π = πP ⟹ πQ = π(I − P) = πI − πP = 0.

Therefore 0 is an eigenvalue of Q = I − P, with left eigenvector π ≠ 0, so Q is not invertible. Meyer (1975) gave algebraic expressions and computational results for the Drazin inverse Q#, a generalized inverse of Q = I − P.
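The singularity of Q = I − P is easy to check numerically. A pure-Python sketch using the two-state chain of Example 1.1, whose stationary distribution is known in closed form:

```python
# Q = I - P is singular: pi Q = 0, so 0 is an eigenvalue of Q
# with left eigenvector pi.  Two-state chain from Example 1.1.
P = [[0.08, 0.92],
     [0.015, 0.985]]
Q = [[(1.0 if i == j else 0.0) - P[i][j] for j in range(2)]
     for i in range(2)]

pi = [0.015 / 0.935, 0.92 / 0.935]   # stationary distribution, pi P = pi

# Left multiplication: (pi Q)_j = sum_i pi_i Q_ij, which should vanish.
piQ = [sum(pi[i] * Q[i][j] for i in range(2)) for j in range(2)]
print(piQ)          # numerically zero in both components

# For a 2x2 matrix the determinant confirms non-invertibility.
det = Q[0][0] * Q[1][1] - Q[0][1] * Q[1][0]
print(det)          # numerically zero: Q is not invertible
```

Since the ordinary inverse does not exist, a generalized inverse such as Q# is the natural object to compute.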
Boucher (2014) extended computational results in terms of Q# to general state Markov chains through the use of bounded linear operators between Banach spaces of functionals. The current interest is to determine optimal parameter values and convergence rates in new algorithms for approximating Q#.

4.1 Recursive Definition

The motivation for Proposition 4.2 is to show that irreducible, finite state chains are a special case of Proposition 4.1, which is from Boucher (2014), and therefore satisfy the assumptions behind the more general Proposition 4.3. Recall that irreducible, aperiodic finite state Markov chains are V-uniformly ergodic with V ≡ c, where c is some constant. Proposition 4.3 holds generally for ψ-irreducible, aperiodic, V-uniformly ergodic Markov chains. First, we state Proposition 4.1, which establishes the importance of Q#.

Proposition 4.1. Suppose {X_t} is a ψ-irreducible, aperiodic, V-uniformly ergodic Markov chain with transition kernel P and ‖P‖_V < ∞. Then

1. Π = I − QQ#.

2. The fundamental kernel Z = (I − P + Π)^{−1} exists, is a bounded linear operator from L^∞_V to L^∞_V, and can be written as Z = I + PQ#.

3. For any g with |g| ≤ V the CLT holds for g. Moreover, if g² ≤ V then the asymptotic variance σ²_g > 0 and

σ²_g = π[g(I + P)Q# g],     (22)

where Q# is the generalized Drazin inverse of Q = I − P.

4. Suppose {X_t} satisfies P_x V ≤ λV(x) + L with V ≡ 1, so that {X_t} is uniformly ergodic, and g is a bounded function, so that ‖g‖ := sup_x |g(x)| < ∞. Then for ε > 0 and n > 2‖g‖‖Q#‖/ε,

sup_x P_x( |S_n(g) − E_π[S_n(g)]| ≥ nε ) ≤ exp( −(nε − 2‖g‖‖Q#‖)² / (2n‖g‖²‖Q#‖²) ).     (23)

Proof: The proof is omitted; see Boucher (2014). ∎

Now we want to show finite state chains satisfy the assumptions of Proposition 4.1.

Proposition 4.2. Suppose {X_t} is an irreducible, aperiodic finite Markov chain with transition kernel P. Then

1. If {X_t} is irreducible, it is ψ-irreducible.

2. Finite state Markov chains are V-uniformly ergodic.

3. ‖P‖_V < ∞.

Proof:

1.
A finite state Markov chain is irreducible if the chain can reach any state from any other state in a finite number of iterations with positive probability; that is, for all i, j there exists n > 0 such that P^{(n)}_{ij} > 0. ψ-irreducibility is defined as follows: there exists a probability measure ψ so that P^{(n)}(x, A) > 0 for all points x and all sets A in the state space with ψ(A) > 0. Thus an irreducible finite state Markov chain is ψ-irreducible with

ψ(j) = Σ_i P^{(n)}_{ij} / Σ_i Σ_j P^{(n)}_{ij}, where ψ(A) = Σ_{j∈A} ψ(j).

2. Reference Definition 1.5: an irreducible, aperiodic finite state Markov chain is uniformly ergodic with V ≡ c, where c is some constant.

3. ‖P‖_V is defined as

‖P‖_V := sup_{x∈X} sup_{|g|≤V} |P_x g| / V(x), where P_x g := ∫ g(y) p(x, dy) = Σ_j g(j) P_{xj}.

Since we are dealing with finite state Markov chains and g is bounded, ‖g‖ := sup_x |g(x)| < ∞ and thus |P_x g| < ∞. Therefore, since ∫ p(x, dy) = 1, we have ‖P‖_V < ∞. ∎

The next proposition contains five algorithms to compute the Drazin inverse; they follow from Boucher (2014). Throughout the rest of this chapter, a study of convergence will be conducted and conclusions will be drawn using these algorithms.

Proposition 4.3. Suppose {X_t} is an aperiodic, irreducible finite state Markov chain with transition matrix P. Let Q = I − P, W = Q^l with l ≥ 1, ρ = ρ(Q), and 0 < α < 2/ρ^{l+1}.

1. (Drazin 1) The sequence {Q#_n}_{n≥0} defined by

Q#_0 = αW, Q#_{n+1} = (I − αWQ)Q#_n + αW,     (24)

converges to the Drazin inverse Q# of Q = I − P.

2. (Drazin 2) The sequence {Q#_n}_{n≥0} defined by

Q#_0 = αW, Q#_{n+1} = Q#_n(2I − QQ#_n),     (25)

converges to the Drazin inverse Q# of Q = I − P.

3. (Drazin 3) The sequence {Q#_n}_{n≥0} defined by

Q#_0 = W, Q#_{n+1} = Q#_n + (W/(n+2))(I − QQ#_n),     (26)

converges to the Drazin inverse Q# of Q = I − P.

4. (Drazin 4) The sequence {Q#_n}_{n≥0} defined by

Q#_0 = (2I − WQ)W, Q#_{n+1} = Q#_n + (1/(n+2))(2I − (1/(n+2))WQ)W(I − QQ#_n),     (27)

converges to the Drazin inverse Q# of Q = I − P.

5. (Drazin 5) Let p > 1 be an arbitrary integer.
Then the sequence {Q#_n}_{n≥0} defined by

Q#_0 = αW, Q#_{n+1} = Q#_n Σ_{k=0}^{p−1} (I − QQ#_n)^k,     (28)

converges to the Drazin inverse Q# of Q = I − P.

Proof: The proof follows from Proposition 4.2 and Boucher (2014). ∎

Several questions arise from this proposition: do the sequences {Q#_n} converge faster for different values of α; is there an α that converges fastest (i.e., an optimal α); do norms affect convergence; and does the size of the n×n probability matrix affect the rate of convergence? By hand this becomes a tedious task, but thanks to the computational capacity of R the task is greatly simplified.

4.2 Convergence Rate

In this section, the rate of convergence will be discussed. The objective is to investigate the convergence of the sequences {Q#_n} defined by the equations in Proposition 4.3. Several factors affect the convergence speed, such as α and the size of the transition probability matrix. Throughout this section we will be using equation (24); similar results follow for equations (25), (26), (27), and (28). Unless otherwise noted, all examples will have l = 1, that is, Q = I − P and W = Q. Therefore, in this case, the upper bound on α is 2/ρ².

4.2.1 Different Values of Alpha

The data in Table 1 is from R. It was collected by running a program that randomly generated any number of 10×10 probability transition matrices; in this case, 40 matrices were generated. Notice that as α increases, the average number of iterations and the standard deviation decrease. Thus as α increases, equation (24) converges faster. This can be seen pictorially in Figure 1.

Table 1: How α affects the convergence of Q#.

Figure 1: Plot of Table 1.

Clearly, as α increases, the number of iterations decreases, accumulating around a certain number of iterations. Thus equation (24) is converging. An obvious question now is: what happens on the other side of α = 1?
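The iteration (24) is easy to experiment with directly for any α. The thesis's implementations are in R (Appendix A); the following pure-Python sketch, using the two-state chain of Example 1.1 with l = 1 (so W = Q) and an arbitrarily chosen valid α, counts the iterations to convergence:

```python
# Drazin 1 iteration (24) for the two-state chain of Example 1.1.
# With l = 1, W = Q, and alpha must lie in (0, 2/rho(Q)^2);
# alpha = 0.5 is one valid choice here (rho(Q) = 0.935).
P = [[0.08, 0.92],
     [0.015, 0.985]]
n_states = 2
I = [[float(i == j) for j in range(n_states)] for i in range(n_states)]
Q = [[I[i][j] - P[i][j] for j in range(n_states)] for i in range(n_states)]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(n_states))
             for j in range(n_states)] for i in range(n_states)]

def mat_add(A, B, s=1.0):
    """Entrywise A + s*B."""
    return [[A[i][j] + s * B[i][j] for j in range(n_states)]
            for i in range(n_states)]

alpha, tol = 0.5, 1e-12
W = Q
M = mat_add(I, mat_mul(W, Q), s=-alpha)   # M = I - alpha*W*Q, fixed
Qd = [[alpha * W[i][j] for j in range(n_states)] for i in range(n_states)]
iters = 0
while True:
    Qd_next = mat_add(mat_mul(M, Qd), W, s=alpha)   # (I - aWQ)Qd + aW
    diff = max(abs(Qd_next[i][j] - Qd[i][j])
               for i in range(n_states) for j in range(n_states))
    Qd, iters = Qd_next, iters + 1
    if diff < tol:
        break

print(iters)   # iterations to converge for this alpha
print(Qd)      # approximates the Drazin inverse Q#
```

Rerunning with different α values reproduces the qualitative behavior studied below, and the limit can be verified against the defining identity QQ#Q = Q.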
In fact, as α continues to increase past a certain value, the number of iterations begins to increase (not shown in the previous graph). This leads to our next questions: is there an optimal α, and is there a range for which α is valid? In fact, the number of iterations is minimized as α approaches an optimal value, and α is valid in an interval that depends upon the largest eigenvalue in modulus.

4.2.2 Optimal Alpha

In this section, we discuss whether there exists an optimal α and the interval for which α is valid. Here we look at a specific example. Meyer (1975, p. 461) provided an example of calculating Q# where the probability transition matrix is given by

P = [ 0    0.5  0.5  0
      0.5  0    0.5  0
      0.5  0.25 0    0.25
      0.25 0.25 0.25 0.25 ].

This P will be used to compare results. Table 2 shows the number of iterations for 88 randomly generated α in the closed interval [0.005, 0.89] using P. This interval was chosen because prior testing indicated that α was invalid for α ≤ 0 and α > 0.9. Note that this interval is for this specific example; the upper bound changes for different probability transition matrices. The table is arranged side by side with α in numerical order. For what value(s) of α is the number of iterations minimized? What happens to the number of iterations over the entire interval?

Table 2: Randomly generated α.

Notice that as α increases, the number of iterations decreases up to a certain α, then begins to increase as α continues to increase. Thus we have a minimum number of iterations, and there exists an optimal α in the open interval (0.647, 0.667). Note that this optimal α is for this specific example; the optimal α varies between probability transition matrices. The same conclusion can be drawn from Figure 2, which plots the α of Table 2 against the number of iterations: the iteration count decreases up to a certain α and then begins to increase.
Thus we can say that as α → (optimal α), the number of iterations decreases. It is clear from the graph that the further α is from the optimal α, the higher the number of iterations.

Figure 2: Plot of Table 2.

4.3 R Code

In this brief section, the code written in R is discussed. This gives the reader some insight into how the fields are populated and where our data come from. These functions calculate the Drazin inverse of Q = I − P, where P is the transition matrix of an ergodic Markov chain. To randomly generate a probability transition matrix, say a 10×10, the following code is used:

    P <- matrix(runif(100, 0, 1), 10, 10)
    P <- diag(1/apply(P, 1, sum)) %*% P

When an α is randomly generated between two values, the following code is used:

    alpha <- runif(1, 0.01, 1)

See the appendix for a more detailed and in-depth explanation of each algorithm being used.

4.4 Comparing Drazins

This section is titled "Comparing Drazins" because we compare the convergence of the sequences in equations (24), (25), (26), (27), and (28), where equation (24) is referred to as Drazin 1, and so on. Simulations in this section include the matrix norms discussed in Chapter 3, which do not affect the end result; that is, convergence rates and intervals of valid α do not change with the choice of norm. We are concerned with the following matrix norms: the 1-norm, 2-norm, sup-norm, F-norm, and M-norm, given by equations (17), (18), (19), (20), and (21) respectively. For the first example, all intervals of α are a sequence of 1,000 α, and we run all Drazins with the probability transition matrix

        [ 0.19058159  0.28406685  0.11446613  0.41088543 ]
    P = [ 0.15021060  0.33215298  0.08412015  0.43351627 ]    (29)
        [ 0.63272794  0.26599196  0.04323895  0.05804115 ]
        [ 0.06898887  0.01965881  0.56564054  0.34571178 ]

Figure 3 depicts Drazin 1 with the five different norms. A sequence of 1,000 α was generated in the closed interval [0.001, 1.1].
Notice all the graphs are parabola-shaped, indicating that there exists an α such that the number of iterations is minimized; thus there exists an optimal α. Notice also that the optimal α occurs over an interval of values; call this the optimal interval in the simulations. Figure 4 shows the optimal intervals of Figure 3. The M-norm converges to a slightly smaller number of iterations and has a larger optimal interval, but this is not proof that the M-norm converges fastest or has a larger optimal interval in general.

Figure 3: Drazin 1.

As α → 0, the number of iterations increases like a parabola but then begins to decrease very rapidly. This is a false convergence. Regardless of norm, the nth term shrinks at the geometric rate η^n, where η = ||I − αWQ|| < 1, and η → 1 as α → 0. But αW → 0 and I − αWQ → I, so 0 is a fixed point when α = 0. This is one reason to check that the solution returned by the algorithm is in fact the Drazin inverse. The false convergence starts at a certain alpha value, say α′, determined by the interplay of the tolerance and the value of α chosen to run the algorithm. It is due to the sequence converging to 0, so, for example, decreasing the tolerance decreases α′.

Figure 4: The closed interval [0.4, 0.8] of Figure 3.

To show α = 0 is a fixed point, we show the first few terms of the algorithm, using P in equation (29) and Drazin 1. The convergence to 0 occurs for small α, say α = 0.001. Notice Q#_0 is very small, indicating that Q#_1 will be small also. Intuitively, smaller α implies smaller steps in the algorithm, getting closer to 0 as α → 0.

           [  8.094184e-04  -2.840668e-04  -1.144661e-04  -4.108854e-04 ]
    Q#_0 = [ -1.502106e-04   6.678470e-04  -8.412015e-05  -4.335163e-04 ]
           [ -6.327279e-04  -2.659920e-04   9.567611e-04  -5.804115e-05 ]
           [ -6.898887e-05  -1.965881e-05  -5.656405e-04   6.542882e-04 ]
           [  0.0016181349  -5.676472e-04  -0.0002291915  -0.0008212962 ]
    Q#_1 = [ -0.0003001862   1.335331e-03  -0.0001686193  -0.0008665256 ]
           [ -0.0012639453  -5.318381e-04   0.0019125395  -0.0001167562 ]
           [ -0.0001387026  -3.957008e-05  -0.0011300949   0.0013083676 ]

Let's look at a very small α, say α = 0.00000000001. Then the first terms of the algorithm are as follows. Notice Q#_0 is smaller for the smaller α, implying convergence to 0 for α = 0.

           [  8.094184e-12  -2.840668e-12  -1.144661e-12  -4.108854e-12 ]
    Q#_0 = [ -1.502106e-12   6.678470e-12  -8.412015e-13  -4.335163e-12 ]
           [ -6.327279e-12  -2.659920e-12   9.567610e-12  -5.804115e-13 ]
           [ -6.898887e-13  -1.965881e-13  -5.656405e-12   6.542882e-12 ]

           [  1.618837e-11  -5.681337e-12  -2.289323e-12  -8.217709e-12 ]
    Q#_1 = [ -3.004212e-12   1.335694e-11  -1.682403e-12  -8.670325e-12 ]
           [ -1.265456e-11  -5.319839e-12   1.913522e-11  -1.160823e-12 ]
           [ -1.379777e-12  -3.931762e-13  -1.131281e-11   1.308576e-11 ]

Similarly, Q#_1 is smaller for the smaller α, implying convergence to 0 for α = 0. Thus there exists a neighborhood around 0 such that as α decreases, Drazin 1 geometrically converges to zero. Similar results hold for Drazin 2 and Drazin 5.

In Figure 5, Drazin 2 is run. Keep in mind, we use the same probability transition matrix to compare norms, convergence speeds, optimal α, and optimal intervals from Drazin algorithm to Drazin algorithm.

Figure 5: Drazin 2.

Drazin 2 follows Drazin 1; that is, it is parabola-shaped with a tail of false convergence. Figure 6 shows the closed interval [0.2, 0.9], a close-up of the optimal intervals of Figure 5. Here all the norms converge to the same number of iterations, but the M-norm has an optimal interval slightly larger than the rest. This still does not prove that norms impact our results. The number of iterations for Drazin 2 has drastically decreased from Drazin 1: the highest number of iterations for Drazin 1 was 89 and it converged to 10 iterations, compared to 12 and 5 respectively for Drazin 2.
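The gap between Drazin 1 and Drazin 2 can be checked directly. The sketch below (Python/NumPy rather than the thesis's R; the tolerance 0.01 and max-entry stopping rule follow the appendix code, and α = 0.5 is an arbitrary value inside the valid interval) runs both recursions, (24) and (25), on the matrix P of equation (29). The Drazin 2 step Q#_{n+1} = Q#_n(2I − QQ#_n) is a Newton–Schulz-type update, which sharpens the iterate quadratically rather than linearly.

```python
import numpy as np

def run(P, alpha, step, tol=0.01, max_iter=100_000):
    """Iterate a Drazin recursion from Q#_0 = alpha*W with W = Q = I - P."""
    I = np.eye(P.shape[0])
    Q = I - P
    X = alpha * Q
    for i in range(1, max_iter + 1):
        X_next = step(I, Q, alpha, X)
        if np.abs(X_next - X).max() < tol:
            return X_next, i
        X = X_next
    raise RuntimeError("no convergence")

def drazin1(I, Q, a, X):
    return (I - a * Q @ Q) @ X + a * Q      # equation (24) with W = Q

def drazin2(I, Q, a, X):
    return X @ (2 * I - Q @ X)              # equation (25), Newton-Schulz-type step

# The transition matrix of equation (29)
P = np.array([[0.19058159, 0.28406685, 0.11446613, 0.41088543],
              [0.15021060, 0.33215298, 0.08412015, 0.43351627],
              [0.63272794, 0.26599196, 0.04323895, 0.05804115],
              [0.06898887, 0.01965881, 0.56564054, 0.34571178]])

X1, n1 = run(P, 0.5, drazin1)
X2, n2 = run(P, 0.5, drazin2)
print("Drazin 1:", n1, "iterations;  Drazin 2:", n2, "iterations")
```

Because each iterate is a polynomial in Q, the returned matrix commutes with Q, and the Drazin-inverse identity QQ#Q = Q can be used to confirm the limit is not a false convergence to 0.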
Notice the graph for Drazin 2 is not as smooth nor as well defined a parabola compared to Drazin 1, indicating that the variance in the number of iterations from alpha to alpha has decreased. Therefore, Drazin 2 converges faster, to a smaller number of iterations, with a larger optimal interval. Thus Drazin 2 is preferred over Drazin 1.

Figure 6: The closed interval [0.2, 0.9] of Figure 5.

Figure 7 depicts Drazin 3; notice this graph is much different from those for Drazin 1 and Drazin 2. There is no α involved in Drazin 3, hence the much different graph. Here we compare the number of iterations against the different norms used; the caption of Figure 7 helps describe the graph. There is little discrepancy between the norms, each of them converging at relatively the same rate. Note, this is for a specific example; the number of iterations can change from transition matrix to transition matrix.

Figure 7: Drazin 3. On the x-axis, 1 is the 1-norm, 2 is the 2-norm, 3 is the sup-norm, 4 is the F-norm, and 5 is the M-norm.

Drazin 4, in Figure 8, is similar to Drazin 3 except the number of iterations has decreased for each norm. Thus Drazin 4 is converging faster for our given probability transition matrix.

Figure 8: Drazin 4. On the x-axis, 1 is the 1-norm, 2 is the 2-norm, 3 is the sup-norm, 4 is the F-norm, and 5 is the M-norm.

One of the nice things about Drazin 3 and Drazin 4 is that they require fewer parameters than the other Drazins, i.e., no α. While the other Drazins may (or may not) converge in a smaller number of iterations, Drazin 3 and Drazin 4 only require a probability transition matrix and a tolerance level (since norms do not drastically affect the number of iterations). Up to this point, Drazin 2 converges with the smallest number of iterations. Next we run Drazin 5, shown in Figure 9, with the added parameter p. The legends are left out; the same color configuration is used as in Drazin 1 and Drazin 2.
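For reference, the p-term recursion (28) behind Drazin 5 can be sketched as follows (Python/NumPy rather than the thesis's R; α = 0.5 and the p values shown are arbitrary choices inside the valid range, and the tolerance and stopping rule follow the appendix code):

```python
import numpy as np

def drazin5_iterations(P, alpha, p, tol=0.01, max_iter=10_000):
    """Equation (28): Q#_{n+1} = Q#_n * sum_{k=0}^{p-1} (I - Q Q#_n)^k."""
    I = np.eye(P.shape[0])
    Q = I - P
    X = alpha * Q                      # Q#_0 = alpha*W with W = Q
    for i in range(1, max_iter + 1):
        R = I - Q @ X
        S = np.eye(P.shape[0])         # k = 0 term of the sum
        Rk = I
        for _ in range(p - 1):         # accumulate (I - QX)^k for k = 1..p-1
            Rk = Rk @ R
            S = S + Rk
        X_next = X @ S
        if np.abs(X_next - X).max() < tol:
            return i
        X = X_next
    raise RuntimeError("no convergence")

# The transition matrix of equation (29)
P = np.array([[0.19058159, 0.28406685, 0.11446613, 0.41088543],
              [0.15021060, 0.33215298, 0.08412015, 0.43351627],
              [0.63272794, 0.26599196, 0.04323895, 0.05804115],
              [0.06898887, 0.01965881, 0.56564054, 0.34571178]])

for p in (2, 4, 8):
    print("p =", p, "->", drazin5_iterations(P, 0.5, p), "iterations")
```

With p = 2 the update reduces to the Drazin 2 step Q#_n(2I − QQ#_n); larger p packs more correction into each iteration, which is consistent with the smaller iteration counts observed in Figure 9.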
Figure 9: Drazin 5.

Note, in Figure 9 some color-coded lines are not visible; they are hidden behind the purple line. Since the purple line was the last line drawn in the R code, it appears on top of the others. For p = 2 we have our parabola shape with a relatively large optimal interval. Notice that as p increases, the optimal interval increases and the algorithm converges in a smaller number of iterations.

4.5 Matrix Size

What happens to the rate of convergence as the matrix size n → ∞? Thanks to R once again, we can run simulations that test this notion. Here the highest n we used was n = 40. This time, figures are supplied for different α: Figures 10, 11, and 12 are for α = 0.01, 0.05, and 0.5 respectively. Notice that as the size of the probability transition matrices increases, the number of iterations starts to converge to a certain value. This leaves the question: as n → ∞, does the variance between trials go to 0, that is, does the number of iterations converge to a single point? You can see that as the matrix size increases, the number of iterations converges, and with α → (optimal α), the number of iterations and standard deviation decrease.

Figure 10: Varying matrix size for α = 0.01.

Figure 11: Varying matrix size for α = 0.05.

Figure 12: Varying matrix size for α = 0.5.

In Figure 13, we show Drazin 1 run for four randomly generated matrices of sizes 10×10, 100×100, 250×250, and 500×500. As the matrix size increased, so did the time elapsed for R to calculate Drazin 1: 0.27, 17.44, 205.09, and 1509.81 seconds respectively. Notice that as n increases, the optimal interval decreases, converging toward a fixed point; in the 500×500 graph the optimal interval is smallest, appearing to be a fixed point. As n → ∞ the probabilities P_ij of being in each state become more dispersed over the collection of states, averaging 1/n; that is, the probability of being in any given state decreases as n increases.
Thus, as n → ∞, P_ij → 0 on average. This leads to our next proposition.

Figure 13: Increasing matrix size.

Proposition 4.4. Suppose P is an n×n probability transition matrix. As n → ∞ and for a constant tolerance, Drazin 1 converges to 0.

Proof: By Proposition 4.3 and equation (24) we have W = Q and Q = I − P, so

    I − αWQ = I − αQ² = I − α(I − P)²,

where

            [ 1 − p_{1,1}   −p_{1,2}      ...   −p_{1,n}    ]
    I − P = [ −p_{2,1}      1 − p_{2,2}   ...   −p_{2,n}    ]
            [    ...           ...        ...      ...      ]
            [ −p_{n,1}      −p_{n,2}      ...   1 − p_{n,n} ]

Therefore, treating the entries termwise, the diagonal entries of I − αQ² behave like 1 − α(1 − p_{i,i})², and for i ≠ j the off-diagonal entries behave like −α(−p_{i,j})². Thus I − αQ² tends to be closer to I as n → ∞. Combined with a constant tolerance, this gives that as n → ∞, Drazin 1 converges to 0. Drazin 1 is essentially a geometric series; as n → ∞ the matrix I − αWQ → I, so the rate of convergence η = ||I − αWQ|| → 1. Note, Proposition 4.4 applies only to Drazin 1.

4.5.1 Optimal Interval

Thus far we have only looked at generic intervals over which alpha is valid. In this section we take a closer look and hone in on the optimal interval. In Figure 14, sequences of 10,000 alpha were generated for the probability transition matrix P of equation (29). The top graph of Figure 14 shows a sequence of 10,000 alpha in the closed interval [0.01, 1.1]. The optimal interval is approximately [0.3, 0.9], depicted in the middle graph, indicating that the minimum number of iterations is 5. The bottom graph shows the interval [0.33, 0.84]. The purpose of Figure 14 is to hone in on the optimal interval; each graph (interval) uses a sequence of 10,000 alpha. The optimal interval does not change from graph to graph even though more alpha are generated within smaller and smaller intervals. Thus, this is the optimal interval.
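The grid search behind Figure 14 can be sketched as follows (Python/NumPy rather than the thesis's R; a coarse grid of 29 alpha over [0.2, 0.9] keeps the run short, and the grid bounds are arbitrary choices that avoid the false-convergence tail near 0):

```python
import numpy as np

def drazin1_iterations(P, alpha, tol=0.01, max_iter=100_000):
    """Iterations of equation (24): Q_{n+1} = (I - alpha*W*Q) Q_n + alpha*W, W = Q."""
    I = np.eye(P.shape[0])
    Q = I - P
    X = alpha * Q
    for i in range(1, max_iter + 1):
        X_next = (I - alpha * Q @ Q) @ X + alpha * Q
        if np.abs(X_next - X).max() < tol:
            return i
        X = X_next
    raise RuntimeError("no convergence")

# The transition matrix of equation (29)
P = np.array([[0.19058159, 0.28406685, 0.11446613, 0.41088543],
              [0.15021060, 0.33215298, 0.08412015, 0.43351627],
              [0.63272794, 0.26599196, 0.04323895, 0.05804115],
              [0.06898887, 0.01965881, 0.56564054, 0.34571178]])

alphas = np.linspace(0.2, 0.9, 29)
iters = np.array([drazin1_iterations(P, a) for a in alphas])
best = iters.min()
optimal = alphas[iters == best]        # every alpha attaining the minimum count
print("minimum iterations:", best)
print("optimal interval approx. [%.3f, %.3f]" % (optimal.min(), optimal.max()))
```

Refining the grid inside the reported interval, as Figure 14 does, checks that the interval is stable rather than an artifact of the grid spacing.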
Figure 14: The top, middle, and bottom graphs have intervals [0.01, 1.1], [0.3, 0.9], and [0.33, 0.84] respectively.

Chapter 5

SUMMARY

Many important quantities in Markov chain theory can be expressed in terms of the Drazin inverse, Q#, of Q = I − P. Meyer (1975) gave algebraic expressions and computational results, and Boucher (2014) extended these results in terms of Q# to general state Markov chains. Previous work has established conditions and algorithms for estimating Q# in a general setting, but specific results are lacking. Finite state Markov chains fall along the lines described by Boucher; that is, irreducible, aperiodic finite Markov chains are ψ-irreducible and V-uniformly ergodic, with ||P||_V < ∞.

Optimal parameters of the five algorithms for computing the Drazin inverse were examined. The Drazin 1, Drazin 2, and Drazin 5 approximations give parabola-shaped graphs, indicating that a minimum number of iterations exists. Thus an optimal α exists, that is, an α for which the number of iterations is minimized. This optimal α may exist over interval(s), called optimal interval(s). Valid α lie in the interval 0 < α < 2/ρ^{l+1}, where ρ is the largest eigenvalue in modulus of the probability transition matrix. Within this bounded interval for α there is a tail of false convergence as α → 0: the nth term shrinks at the geometric rate η^n, where η = ||I − αWQ|| < 1, and η → 1 as α → 0. It is clear that as α → 0 the initial step of the recursive algorithms goes to 0, and thus the sequence converges to 0. The starting point of this false convergence is an interplay between α and the tolerance. After testing, Drazin 2 converged in a smaller number of iterations than Drazin 1, and Drazin 5, which has an added parameter p, converged in a smaller number of iterations than Drazin 2. As p increases, the number of iterations decreases.
Drazin 3 and Drazin 4 are nice because they have fewer parameters; α is not involved. Drazin 3 and Drazin 4 converge in a smaller number of iterations than Drazin 1 but not Drazin 5. For this testing of the Drazin algorithms, five norms were tested: the 1-norm, 2-norm, sup-norm, F-norm, and M-norm. In a few circumstances the M-norm converged in a smaller number of iterations or had a larger optimal interval, though not significantly. This neither proves nor disproves that any norm is better than another. The size of the n×n probability transition matrix P affects the number of iterations: as n increases, every entry of P decreases on average, and thus as n → ∞, Drazin 1 converges to 0 for a constant tolerance.

REFERENCES

Boucher, T. R. (2014). Computational Techniques for Markov Chains Using Generalized Inverses. Texas A&M University-Commerce.

Gallier, J. (2012). Fundamentals of Linear Algebra and Optimization. Retrieved from http://www.cis.upenn.edu/~cis515/cis51511sl4.pdf

Gilks, W. R., Richardson, S., & Spiegelhalter, D. J. (Eds.). (1996). Markov Chain Monte Carlo in Practice. London, UK: Chapman & Hall.

Lay, D. C. (2006). Linear Algebra and Its Applications (3rd ed.). Boston, MA: Pearson Addison-Wesley.

Meyer, C. D. (1975). The Role of the Group Generalized Inverse in the Theory of Finite Markov Chains. SIAM Review, 17(3), 443-464.

Meyn, S. P., & Tweedie, R. L. (1993). Markov Chains and Stochastic Stability. London, UK: Springer-Verlag.

Strand, M. (2009). Metropolis-Hastings Markov Chain Monte Carlo. Retrieved from http://stat.haifa.ac.il/~stat_courses_ma/2011/2/4113/reading/m h%20Mathew%20Strand.PDF

APPENDICES

APPENDIX A

R CODE FOR DRAZIN 1

This is the code to run Drazin 1; a probability transition matrix P is required to run this program. A sequence of alpha is generated by manually inputting an upper bound (upbd), lower bound (lowbd), and number of alpha (numofalpha).
Other parameters such as tolerance, norms, plots, and legends can be adjusted to satisfy needs.

find.drazin1 <- function(P, tol, alpha, norm.type){
  stopifnot(norm.type %in% c("o", "I", "F", "2", "M"))
  if(nrow(P) != ncol(P)) stop("Error: transition matrix is not square.")
  I <- diag(nrow(P))
  Q <- I - P
  W <- Q
  Q.star <- alpha*W
  i <- 0
  while(TRUE){
    Q.temp <- Q.star
    # Equation (24): Q_{n+1} = (I - alpha*W*Q) Q_n + alpha*W
    Q.star <- (I - alpha*W%*%Q)%*%Q.star + alpha*W
    i <- i + 1
    if(is.na(max(abs(Q.star - Q.temp)) < tol)) stop("No convergence.")
    if(norm(Q.star - Q.temp, type = norm.type) < tol) break
  }
  print("Drazin inverse: ", quote = FALSE)
  print(Q.star)
  return(list(Q.star = Q.star, num.iterations = i))
}

it.veco <- numeric(0)
it.vecI <- numeric(0)
it.vecF <- numeric(0)
it.vec2 <- numeric(0)
it.vecM <- numeric(0)
lowbd <- 0.001
upbd <- 1.1
numofalpha <- 1000
seqstep <- numofalpha - 1
A <- seq(lowbd, upbd, by = (upbd - lowbd)/seqstep)
Am <- matrix(A, 1, numofalpha, byrow = TRUE)
for(i in 1:numofalpha){
  it.veco[i] <- find.drazin1(P, 0.01, Am[1, i], "o")[[2]]
  it.vecI[i] <- find.drazin1(P, 0.01, Am[1, i], "I")[[2]]
  it.vecF[i] <- find.drazin1(P, 0.01, Am[1, i], "F")[[2]]
  it.vec2[i] <- find.drazin1(P, 0.01, Am[1, i], "2")[[2]]
  it.vecM[i] <- find.drazin1(P, 0.01, Am[1, i], "M")[[2]]
}
plot(A, it.veco, main = "Drazin 1 With Norm Comparisons", type = "l", col = "black",
     xlab = "Alpha", ylab = "Number of Iterations",
     xlim = c(lowbd, upbd), ylim = c(0, max(it.veco)))
lines(A, it.vecI, col = "red")
lines(A, it.vecF, col = "blue")
lines(A, it.vec2, col = "green")
lines(A, it.vecM, col = "purple")
par(xpd = FALSE)
legend(0.53, 60, c("1-norm", "sup-norm", "F-norm", "2-norm", "M-norm"),
       fill = c("black", "red", "blue", "green", "purple"), lty = 1)

APPENDIX B

R CODE FOR DRAZIN 2

This is the code to run Drazin 2; a probability transition matrix P is required to run this program. A sequence of alpha is generated by manually inputting an upper bound (upbd), lower bound (lowbd), and number of alpha (numofalpha). Other parameters such as tolerance, norms, plots, and legends can be adjusted to satisfy needs.

find.drazin2 <- function(P, tol, alpha, norm.type){
  stopifnot(norm.type %in% c("o", "I", "F", "2", "M"))
  if(nrow(P) != ncol(P)) stop("Error: transition matrix is not square.")
  I <- diag(nrow(P))
  Q <- I - P
  W <- Q
  Q.star <- alpha*W
  i <- 0
  while(TRUE){
    Q.temp <- Q.star
    # Equation (25): Q_{n+1} = Q_n (2I - Q Q_n)
    Q.star <- Q.star%*%(2*I - Q%*%Q.star)
    i <- i + 1
    if(is.na(max(abs(Q.star - Q.temp)) < tol)) stop("No convergence.")
    if(norm(Q.star - Q.temp, type = norm.type) < tol) break
  }
  print("Drazin inverse: ", quote = FALSE)
  print(Q.star)
  return(list(Q.star = Q.star, num.iterations = i))
}

it.veco <- numeric(0)
it.vecI <- numeric(0)
it.vecF <- numeric(0)
it.vec2 <- numeric(0)
it.vecM <- numeric(0)
lowbd <- 0.001
upbd <- 1.1
numofalpha <- 1000
seqstep <- numofalpha - 1
A <- seq(lowbd, upbd, by = (upbd - lowbd)/seqstep)
Am <- matrix(A, 1, numofalpha, byrow = TRUE)
for(i in 1:numofalpha){
  it.veco[i] <- find.drazin2(P, 0.01, Am[1, i], "o")[[2]]
  it.vecI[i] <- find.drazin2(P, 0.01, Am[1, i], "I")[[2]]
  it.vecF[i] <- find.drazin2(P, 0.01, Am[1, i], "F")[[2]]
  it.vec2[i] <- find.drazin2(P, 0.01, Am[1, i], "2")[[2]]
  it.vecM[i] <- find.drazin2(P, 0.01, Am[1, i], "M")[[2]]
}
plot(A, it.veco, main = "Drazin 2 With Norm Comparisons", type = "l", col = "black",
     xlab = "Alpha", ylab = "Number of Iterations",
     xlim = c(lowbd, upbd), ylim = c(min(it.veco) - 1, max(it.veco) + 1))
lines(A, it.vecI, col = "red")
lines(A, it.vecF, col = "blue")
lines(A, it.vec2, col = "green")
lines(A, it.vecM, col = "purple")
par(xpd = FALSE)
legend(0.45, 7, c("1-norm", "sup-norm", "F-norm", "2-norm", "M-norm"),
       fill = c("black", "red", "blue", "green", "purple"), lty = 1)

APPENDIX C

R CODE FOR DRAZIN 3

This is the code to run Drazin 3; a probability transition matrix P is required to run this program. Alpha is not required. Other parameters such as tolerance, norms, plots, and legends can be adjusted to satisfy needs.

find.drazin3 <- function(P, tol, norm.type){
  stopifnot(norm.type %in% c("o", "I", "F", "2", "M"))
  if(nrow(P) != ncol(P)) stop("Error: transition matrix is not square.")
  I <- diag(nrow(P))
  Q <- I - P
  W <- Q
  Q.star <- W
  i <- 0
  while(TRUE){
    Q.temp <- Q.star
    # Equation (26): Q_{n+1} = Q_n + (W/(n+2)) (I - Q Q_n)
    Q.star <- Q.star + (W/(i+2))%*%(I - Q%*%Q.star)
    i <- i + 1
    if(is.na(max(abs(Q.star - Q.temp)) < tol)) stop("No convergence.")
    if(norm(Q.star - Q.temp, type = norm.type) < tol) break
  }
  return(list(Q.star = Q.star, num.iterations = i))
}

it.veco <- find.drazin3(P, 0.01, "o")[[2]]
it.vecI <- find.drazin3(P, 0.01, "I")[[2]]
it.vecF <- find.drazin3(P, 0.01, "F")[[2]]
it.vec2 <- find.drazin3(P, 0.01, "2")[[2]]
it.vecM <- find.drazin3(P, 0.01, "M")[[2]]
plot(c(it.veco, it.vec2, it.vecI, it.vecF, it.vecM),
     main = "Drazin 3 With Norm Comparisons",
     xlab = "Different Norms", ylab = "Number of Iterations", xlim = c(1, 5))
APPENDIX D

R CODE FOR DRAZIN 4

This is the code to run Drazin 4; a probability transition matrix P is required to run this program. Alpha is not required. Other parameters such as tolerance, norms, plots, and legends can be adjusted to satisfy needs.

find.drazin4 <- function(P, tol, norm.type){
  stopifnot(norm.type %in% c("o", "I", "F", "2", "M"))
  if(nrow(P) != ncol(P)) stop("Error: transition matrix is not square.")
  I <- diag(nrow(P))
  Q <- I - P
  W <- Q
  Q.star <- (2*I - W%*%Q)%*%W
  i <- 0
  while(TRUE){
    Q.temp <- Q.star
    # Equation (27)
    Q.star <- Q.star + (1/(i+2))*(2*I - (1/(i+2))*W%*%Q)%*%W%*%(I - Q%*%Q.star)
    i <- i + 1
    if(is.na(max(abs(Q.star - Q.temp)) < tol)) stop("No convergence.")
    if(norm(Q.star - Q.temp, type = norm.type) < tol) break
  }
  return(list(Q.star = Q.star, num.iterations = i))
}

it.veco <- find.drazin4(P, 0.01, "o")[[2]]
it.vecI <- find.drazin4(P, 0.01, "I")[[2]]
it.vecF <- find.drazin4(P, 0.01, "F")[[2]]
it.vec2 <- find.drazin4(P, 0.01, "2")[[2]]
it.vecM <- find.drazin4(P, 0.01, "M")[[2]]
plot(c(it.veco, it.vec2, it.vecI, it.vecF, it.vecM),
     main = "Drazin 4 Comparing All Norms",
     xlab = "Different Norms", ylab = "Number of Iterations",
     xlim = c(1, 5), ylim = c(1, 30))

APPENDIX E

R CODE FOR DRAZIN 5

This is the code to run Drazin 5; a probability transition matrix P is required to run this program.
A sequence of alpha is generated by manually inputting an upper bound (upbd), lower bound (lowbd), and number of alpha (numofalpha). Other parameters such as tolerance, norms, plots, and legends can be adjusted to satisfy needs.

library(expm)   # for the matrix power operator %^%

find.drazin5 <- function(P, tol, alpha, norm.type, p){
  stopifnot(norm.type %in% c("o", "I", "F", "2", "M"))
  if(nrow(P) != ncol(P)) stop("Error: transition matrix is not square.")
  I <- diag(nrow(P))
  Q <- I - P
  W <- Q
  Q.star <- alpha*W
  i <- 0
  while(TRUE){
    Q.temp <- Q.star
    # Equation (28): Q_{n+1} = Q_n * sum_{k=0}^{p-1} (I - Q Q_n)^k
    a <- 0
    for(k in 0:(p-1)){
      a <- a + (I - Q%*%Q.temp)%^%k
    }
    Q.star <- Q.temp%*%a
    i <- i + 1
    if(is.na(max(abs(Q.star - Q.temp)) < tol)) stop("No convergence.")
    if(norm(Q.star - Q.temp, type = norm.type) < tol) break
  }
  print("Drazin inverse: ", quote = FALSE)
  print(Q.star)
  return(list(Q.star = Q.star, num.iterations = i))
}

it.veco <- numeric(0)
it.vecI <- numeric(0)
it.vecF <- numeric(0)
it.vecE <- numeric(0)
it.vecM <- numeric(0)
lowbd <- 0.0001
upbd <- 1.1
numofalpha <- 1000
seqstep <- numofalpha - 1
p <- 1000
A <- seq(lowbd, upbd, by = (upbd - lowbd)/seqstep)
Am <- matrix(A, 1, numofalpha, byrow = TRUE)
for(i in 1:numofalpha){
  it.veco[i] <- find.drazin5(P, 0.01, Am[1, i], "o", p)[[2]]
  it.vecI[i] <- find.drazin5(P, 0.01, Am[1, i], "I", p)[[2]]
  it.vecF[i] <- find.drazin5(P, 0.01, Am[1, i], "F", p)[[2]]
  it.vecE[i] <- find.drazin5(P, 0.01, Am[1, i], "2", p)[[2]]
  it.vecM[i] <- find.drazin5(P, 0.01, Am[1, i], "M", p)[[2]]
}
#par(mfrow = c(3, 2))
plot(A, it.veco, main = paste0("Drazin 5, p = ", p, ", Norm Comparisons"), type = "l",
     xlab = "Alpha", ylab = "Number of Iterations",
     xlim = c(lowbd, upbd), ylim = c(0, max(it.veco)))
lines(A, it.vecI, col = "red")
lines(A, it.vecF, col = "blue")
lines(A, it.vecE, col = "green")
lines(A, it.vecM, col = "purple")

VITA

Derek Fort received his Bachelor of Science degree in Mathematics with a minor in Interdisciplinary Studies from Texas A&M University-Commerce in 2013. He immediately pursued his Master of Science in Mathematics at Texas A&M University-Commerce and will graduate in December 2014. His research interests include, but are not limited to, statistics and number theory. He is currently a mechanical engineer and intends to further his education.

Texas A&M University-Commerce, Binnion Hall, P.O. Box 3011, Commerce, TX 75429-3011
derek.fort@leomail.tamuc.edu