The following lecture notes were written for 6.441 by Professors Yury Polyanskiy of MIT and Yihong Wu of the University of Illinois Urbana-Champaign. A complete copy of the notes is available for download (PDF - 7.6MB).
CHAPTERS | SECTIONS |
---|---|
Part I: Information Measures | |
Chapter 1: Information measures: Entropy and divergence (PDF) | 1.1 Entropy 1.2 Divergence 1.3 Differential entropy |
Chapter 2: Information measures: Mutual information (PDF) | 2.1 Divergence: Main inequality 2.2 Conditional divergence 2.3 Mutual information 2.4 Conditional mutual information and conditional independence 2.5 Strong data-processing inequalities 2.6 How to avoid measurability problems? |
Chapter 3: Sufficient statistic. Continuity of divergence and mutual information. (PDF) | 3.1 Sufficient statistics and data-processing 3.2 Geometric interpretation of mutual information 3.3 Variational characterizations of divergence: Donsker-Varadhan 3.4 Variational characterizations of divergence: Gelfand-Yaglom-Perez 3.5 Continuity of divergence. Dependence on sigma-algebra 3.6 Variational characterizations and continuity of mutual information |
Chapter 4: Extremization of mutual information: Capacity saddle point (PDF) | 4.1 Convexity of information measures 4.2 Local behavior of divergence 4.3 Local behavior of divergence and Fisher information 4.4 Extremization of mutual information 4.5 Capacity = information radius 4.6 Existence of caod (general case) 4.7 Gaussian saddle point |
Chapter 5: Single-letterization. Probability of error. Entropy rate. (PDF) | 5.1 Extremization of mutual information for memoryless sources and channels 5.2 Gaussian capacity via orthogonal symmetry 5.3 Information measures and probability of error 5.4 Fano, LeCam and minimax risks 5.5 Entropy rate 5.6 Entropy and symbol (bit) error rate 5.7 Mutual information rate 5.8 Toeplitz matrices and Szegő's theorem |
Part II: Lossless Data Compression | |
Chapter 6: Variable-length Lossless Compression (PDF - 1.1MB) | 6.1 Variable-length, lossless, optimal compressor 6.2 Uniquely decodable codes, prefix codes and Huffman codes |
Chapter 7: Fixed-length (almost lossless) compression. Slepian-Wolf problem. (PDF) | 7.1 Fixed-length code, almost lossless 7.2 Linear Compression 7.3 Compression with Side Information at both compressor and decompressor 7.4 Slepian-Wolf (Compression with Side Information at Decompressor only) 7.5 Multi-terminal Slepian-Wolf 7.6 Source-coding with a helper (Ahlswede-Körner-Wyner) |
Chapter 8: Compressing stationary ergodic sources (PDF) | 8.1 Bits of ergodic theory 8.2 Proof of Shannon-McMillan 8.3 Proof of Birkhoff-Khintchine 8.4 Sinai's generator theorem |
Chapter 9: Universal compression (PDF) | 9.1 Arithmetic coding 9.2 Combinatorial construction of Fitingof 9.3 Optimal compressors for a class of sources. Redundancy 9.4 Approximate minimax solution: Jeffreys prior 9.5 Sequential probability assignment: Krichevsky-Trofimov 9.6 Lempel-Ziv compressor |
Part III: Binary Hypothesis Testing | |
Chapter 10: Binary hypothesis testing (PDF) | 10.1 Binary Hypothesis Testing 10.2 Neyman-Pearson formulation 10.3 Likelihood ratio tests 10.4 Converse bounds on R(P, Q) 10.5 Achievability bounds on R(P, Q) 10.6 Asymptotics |
Chapter 11: Hypothesis testing asymptotics I (PDF) | 11.1 Stein's regime 11.2 Chernoff regime 11.3 Basics of Large deviation theory |
Chapter 12: Information projection and Large deviation (PDF) | 12.1 Large-deviation exponents 12.2 Information Projection 12.3 Interpretation of Information Projection 12.4 Generalization: Sanov's theorem |
Chapter 13: Hypothesis testing asymptotics II (PDF - 2.0MB) | 13.1 (E0, E1)-Tradeoff 13.2 Equivalent forms of Theorem 13.1 13.3 Sequential Hypothesis Testing |
Part IV: Channel Coding | |
Chapter 14: Channel coding (PDF) | 14.1 Channel Coding 14.2 Basic Results 14.3 General (Weak) Converse Bounds 14.4 General achievability bounds: Preview |
Chapter 15: Channel coding: Achievability bounds (PDF) | 15.1 Information density 15.2 Shannon's achievability bound 15.3 Dependence-testing bound 15.4 Feinstein's Lemma |
Chapter 16: Linear codes. Channel capacity. (PDF) | 16.1 Linear coding 16.2 Channels and channel capacity 16.3 Bounds on C_ε; Capacity of Stationary Memoryless Channels 16.4 Examples of DMC 16.5 Information Stability |
Chapter 17: Channels with input constraints. Gaussian channels. (PDF) | 17.1 Channel coding with input constraints 17.2 Capacity under input constraint C(P) ≟ C_i(P) 17.3 Applications 17.4 Non-stationary AWGN 17.5 Stationary Additive Colored Gaussian noise channel 17.6 Additive White Gaussian Noise channel with Intersymbol Interference 17.7 Gaussian channels with amplitude constraints 17.8 Gaussian channels with fading |
Chapter 18: Lattice codes (by O. Ordentlich) (PDF) | 18.1 Lattice Definitions 18.2 First Attempt at AWGN Capacity 18.3 Nested Lattice Codes/Voronoi Constellations 18.4 Dirty Paper Coding 18.5 Construction of Good Nested Lattice Pairs |
Chapter 19: Channel coding: Energy-per-bit, continuous-time channels (PDF - 1.1MB) | 19.1 Energy per bit 19.2 What is N_0? 19.3 Capacity of the continuous-time band-limited AWGN channel 19.4 Capacity of the continuous-time band-unlimited AWGN channel 19.5 Capacity per unit cost |
Chapter 20: Advanced channel coding. Source-Channel separation. (PDF) | 20.1 Strong Converse 20.2 Stationary memoryless channel without strong converse 20.3 Channel Dispersion 20.4 Normalized Rate 20.5 Joint Source-Channel Coding |
Chapter 21: Channel coding with feedback (PDF - 1.2MB) | 21.1 Feedback does not increase capacity for stationary memoryless channels 21.2 Alternative proof of Theorem 21.1 and Massey's directed information 21.3 When is feedback really useful? |
Chapter 22: Capacity-achieving codes via Forney concatenation (PDF) | 22.1 Error exponents 22.2 Achieving polynomially small error probability 22.3 Concatenated codes 22.4 Achieving exponentially small error probability |
Part V: Lossy Data Compression | |
Chapter 23: Rate-distortion theory (PDF) | 23.1 Scalar quantization 23.2 Information-theoretic vector quantization 23.3 Converting excess distortion to average |
Chapter 24: Rate distortion: Achievability bounds (PDF) | 24.1 Recap 24.2 Shannon's rate-distortion theorem 24.3 Covering lemma |
Chapter 25: Evaluating R(D). Lossy Source-Channel separation. (PDF) | 25.1 Evaluation of R(D) 25.2 Analog of saddle-point property in rate-distortion 25.3 Lossy joint source-channel coding 25.4 What is lacking in classical lossy compression? |
Part VI: Advanced Topics | |
Chapter 26: Multiple-access channel (PDF) | 26.1 Problem motivation and main results 26.2 MAC achievability bound 26.3 MAC capacity region proof |
Chapter 27: Examples of MACs. Maximal P_e and zero-error capacity. (PDF) | 27.1 Recap 27.2 Orthogonal MAC 27.3 BSC MAC 27.4 Adder MAC 27.5 Multiplier MAC 27.6 Contraction MAC 27.7 Gaussian MAC 27.8 MAC Peculiarities |
Chapter 28: Random number generators (PDF) | 28.1 Setup 28.2 Converse 28.3 Elias' construction of RNG from lossless compressors 28.4 Peres' iterated von Neumann's scheme 28.5 Bernoulli factory 28.6 Related problems |