MIT 6.172 Performance Engineering Fall 2018, coursework

Course link
Playlist with lectures on YouTube

Instructors: Prof. Charles Leiserson, Prof. Julian Shun

Status

In progress

Lectures (23/23)
Assignments (3/10)
Projects (0/4)

Lectures

1. Introduction and matrix multiplication

cache-efficiency
compiler flags
parallel algorithms for matrix multiplication
vectorization
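The sketch below illustrates the cache-efficiency point with loop ordering. It assumes square, row-major matrices of doubles. With the i-k-j order, the innermost loop walks both C and B with unit stride. The name matmul_ikj is hypothetical and is not taken from the course code.

```c
#include <assert.h>
#include <stddef.h>

/* C += A * B for n x n row-major matrices.
 * Loop order i-k-j: the inner j loop touches C[i][*] and B[k][*]
 * contiguously, which suits row-major storage and the hardware prefetcher. */
void matmul_ikj(size_t n, const double *A, const double *B, double *C) {
    assert(A != NULL && B != NULL && C != NULL);
    for (size_t i = 0; i < n; i++) {
        for (size_t k = 0; k < n; k++) {
            double a = A[i * n + k];            /* reused across the whole j loop */
            for (size_t j = 0; j < n; j++) {
                C[i * n + j] += a * B[k * n + j];
            }
        }
    }
}
```

Swapping the two inner loops (i-j-k) gives the same result, but the inner loop then strides through B by a full row per iteration.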

2. Bentley Rules for optimizing work

data structures
logic
loops
functions
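The loop rules include sentinels. A minimal sketch, assuming the caller's array has one spare slot at index n: store the key there so the search loop needs a single test per element. The function name and the spare-slot requirement are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Linear search with a sentinel. Requires room for a[n] (one extra slot).
 * Placing key at a[n] guarantees termination, so the loop drops the i < n test. */
size_t find_sentinel(int *a, size_t n, int key) {
    assert(a != NULL);
    int saved = a[n];
    a[n] = key;
    size_t i = 0;
    while (a[i] != key) i++;
    a[n] = saved;   /* restore the borrowed slot */
    return i;       /* i == n means "not found" */
}
```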

3. Bit Hacks

Bit operations and their applications
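Three classic bit tricks are sketched below: a branchless minimum, rounding up to a power of two, and population count by repeatedly clearing the lowest set bit. The function names are hypothetical. round_up_pow2 assumes 1 <= x <= 2^63, because larger inputs overflow to 0.

```c
#include <assert.h>
#include <stdint.h>

/* Branchless min: -(x < y) is all ones when x < y and zero otherwise,
 * so the mask selects either x ^ y (giving x) or 0 (giving y). */
static inline int64_t min_branchless(int64_t x, int64_t y) {
    return y ^ ((x ^ y) & -(int64_t)(x < y));
}

/* Round up to a power of two by smearing the top set bit of x - 1
 * into every lower position, then adding 1. */
static inline uint64_t round_up_pow2(uint64_t x) {
    assert(x >= 1);
    x--;
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    x |= x >> 32;
    return x + 1;
}

/* Popcount: x & (x - 1) clears the lowest set bit, one bit per iteration. */
static inline int popcount_kernighan(uint64_t x) {
    int count = 0;
    for (; x != 0; x &= x - 1) count++;
    return count;
}
```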

4. Assembly Language & Computer Architecture

Assembly language overview
Floating-point and vector hardware
Overview of computer architecture
Superscalar processing
Out-of-order execution
Branch prediction

5. C to Assembly

LLVM IR Primer
C to LLVM IR
LLVM IR to Assembly

6. Multicore Programming

Shared-memory hardware
Concurrency platforms

7. Races and Parallelism

Race conditions
Cilksan
Work and span analysis
Cilkscale
Scheduling
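Work and span analysis rests on a few bounds, summarized below. Here T_1 is the work (running time on one processor), T_∞ is the span (running time on an unbounded number of processors), and T_P is the running time on P processors. The last line is the bound achieved by a greedy scheduler.

```latex
\begin{aligned}
T_P &\ge T_1 / P                 && \text{(work law)} \\
T_P &\ge T_\infty                && \text{(span law)} \\
\text{parallelism} &= T_1 / T_\infty \\
T_P &\le T_1 / P + T_\infty      && \text{(greedy scheduling bound)}
\end{aligned}
```

Together these imply near-linear speedup whenever P is much smaller than the parallelism.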

8. Analysis of Multithreaded Algorithms

Parallelization analysis
Cilk loop parallelism

9. What Compilers Can and Cannot Do

Compiler reports
Optimizing a scalar
Optimizing a structure
Optimizing function calls
Optimizing loops

10. Measurement and Timing

Timing variability
Ways to measure time
Interpretation of measurements
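A minimal timing harness is sketched below. It assumes a POSIX system with clock_gettime and CLOCK_MONOTONIC. It reports the minimum over several trials, which is one common way to filter out interference that can only slow a run down. The workload and helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Nanoseconds from a monotonic clock (not affected by wall-clock changes). */
static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Run fn(arg) `trials` times and return the fastest elapsed time in ns. */
static uint64_t min_time_ns(void (*fn)(void *), void *arg, int trials) {
    assert(trials > 0);
    uint64_t best = UINT64_MAX;
    for (int t = 0; t < trials; t++) {
        uint64_t start = now_ns();
        fn(arg);
        uint64_t elapsed = now_ns() - start;
        if (elapsed < best) best = elapsed;
    }
    return best;
}

/* Hypothetical workload: sum 0..n-1; volatile keeps the loop from being removed. */
static void sum_workload(void *arg) {
    const uint64_t n = *(const uint64_t *)arg;
    volatile uint64_t sink = 0;
    for (uint64_t i = 0; i < n; i++) sink += i;
}
```

For a deterministic program, the minimum approximates the run with the least system noise. The mean or median instead reflects the typical experience under load.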

11. Storage Allocation

Stack and heap allocation
Garbage collection
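Fixed-size heap allocation is often implemented with a free list threaded through the unused blocks themselves. The sketch below assumes a single pool of 64-byte blocks allocated up front. The type and function names and BLOCK_SIZE are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BLOCK_SIZE 64

/* A free block stores the free-list link in its own payload bytes. */
typedef union block {
    union block *next;
    unsigned char payload[BLOCK_SIZE];
} block;

typedef struct {
    block *storage;     /* backing array from malloc */
    block *free_list;   /* singly linked list of free blocks */
} pool;

int pool_init(pool *p, size_t nblocks) {
    p->storage = malloc(nblocks * sizeof(block));
    if (p->storage == NULL && nblocks > 0) return -1;
    p->free_list = NULL;
    for (size_t i = nblocks; i > 0; i--) {   /* push in reverse: block 0 is handed out first */
        p->storage[i - 1].next = p->free_list;
        p->free_list = &p->storage[i - 1];
    }
    return 0;
}

void *pool_alloc(pool *p) {                  /* pop from the free list: O(1) */
    block *b = p->free_list;
    if (b != NULL) p->free_list = b->next;
    return b;
}

void pool_free(pool *p, void *ptr) {         /* push onto the free list: O(1) */
    assert(ptr != NULL);
    block *b = ptr;
    b->next = p->free_list;
    p->free_list = b;
}

void pool_destroy(pool *p) {
    free(p->storage);
    p->storage = p->free_list = NULL;
}
```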

12. Parallel Storage Allocation

Virtual memory allocation
Cactus stack
Parallel allocation strategies

13. The Cilk Runtime System

Internals of Cilk runtime system

14. Caching and Cache-Efficient Algorithms

Cache associativity
Cache-aware algorithms
Cache-oblivious algorithms
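As an example of a cache-oblivious algorithm, the sketch below performs an out-of-place matrix transpose. It recursively halves the longer dimension until the block is small, and no cache size or line size appears in the code. The BASE coarsening threshold and the function names are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define BASE 16   /* hypothetical cutoff to amortize recursion overhead */

/* Transpose rows [r0, r1) x cols [c0, c1) of A (leading dim lda) into B (leading dim ldb). */
static void transpose_rec(const double *A, size_t lda, double *B, size_t ldb,
                          size_t r0, size_t r1, size_t c0, size_t c1) {
    size_t dr = r1 - r0, dc = c1 - c0;
    if (dr <= BASE && dc <= BASE) {
        for (size_t i = r0; i < r1; i++)
            for (size_t j = c0; j < c1; j++)
                B[j * ldb + i] = A[i * lda + j];
    } else if (dr >= dc) {            /* split the longer side in half */
        size_t rm = r0 + dr / 2;
        transpose_rec(A, lda, B, ldb, r0, rm, c0, c1);
        transpose_rec(A, lda, B, ldb, rm, r1, c0, c1);
    } else {
        size_t cm = c0 + dc / 2;
        transpose_rec(A, lda, B, ldb, r0, r1, c0, cm);
        transpose_rec(A, lda, B, ldb, r0, r1, cm, c1);
    }
}

/* B (cols x rows) = transpose of A (rows x cols), both row-major. */
void transpose(const double *A, double *B, size_t rows, size_t cols) {
    assert(A != NULL && B != NULL);
    transpose_rec(A, cols, B, rows, 0, rows, 0, cols);
}
```

Once a sub-block fits in some level of the cache, every access inside it hits that level, whatever the cache's actual size.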

15. Cache-Oblivious Algorithms

Cache-oblivious stencil computations
Cache-oblivious sorting

16. Nondeterministic Parallel Programming

Nondeterministic parallel programming
Atomicity
Data races
Deadlock
Transactional memory

17. Synchronization Without Locks

Sequential consistency
Mutual exclusion without locks
Relaxed memory consistency
Instruction reordering
Compare-and-swap
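A compare-and-swap retry loop can build lock-free operations that the hardware does not provide directly. The sketch below uses C11 <stdatomic.h> to implement an atomic "fetch-and-max". The name cas_fetch_max is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Atomically set *target = max(*target, value); return the previous value. */
int64_t cas_fetch_max(_Atomic int64_t *target, int64_t value) {
    assert(target != NULL);
    int64_t old = atomic_load(target);
    while (old < value && !atomic_compare_exchange_weak(target, &old, value)) {
        /* CAS failed: `old` now holds the current value, so re-check and retry. */
    }
    return old;
}
```

The weak form may fail spuriously, which is harmless inside a retry loop and can be cheaper on some architectures.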

18. Domain Specific Languages and Autotuning

GraphIt
Halide
OpenTuner

19. Leiserchess Codewalk

Project 4 overview

20. Speculative Parallelism & Leiserchess

Speculative parallelism
More project 4 insights

21. Tuning a TSP Algorithm

Review of recursive generation
The traveling salesperson problem
A sequence of TSP algorithms
Principles of algorithm engineering
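The simplest exhaustive TSP solver follows directly from recursive generation of permutations. The sketch below is only a baseline of that kind, not necessarily the lecture's code. It assumes a dense n x n distance matrix in row-major order and fixes city 0 first, since rotations of a tour have equal length. The names and the MAX_CITIES cap are hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

#define MAX_CITIES 10   /* brute force is O(n!) so keep n small */

static double tour_length(size_t n, const double *d, const size_t *perm) {
    double len = 0.0;
    for (size_t i = 0; i < n; i++)
        len += d[perm[i] * n + perm[(i + 1) % n]];   /* wraps back to the start */
    return len;
}

static void swap_sz(size_t *x, size_t *y) { size_t t = *x; *x = *y; *y = t; }

/* Recursively generate every ordering of perm[k..n-1]; perm[0..k-1] stays fixed. */
static void search(size_t n, const double *d, size_t *perm, size_t k, double *best) {
    if (k == n) {
        double len = tour_length(n, d, perm);
        if (len < *best) *best = len;
        return;
    }
    for (size_t i = k; i < n; i++) {
        swap_sz(&perm[k], &perm[i]);
        search(n, d, perm, k + 1, best);
        swap_sz(&perm[k], &perm[i]);   /* undo so the next i sees the original order */
    }
}

double tsp_brute_force(size_t n, const double *d) {
    assert(n >= 1 && n <= MAX_CITIES);
    size_t perm[MAX_CITIES];
    for (size_t i = 0; i < n; i++) perm[i] = i;
    double best = DBL_MAX;
    search(n, d, perm, 1, &best);   /* start at k = 1: city 0 is always first */
    return best;
}
```

Faster variants can start from this skeleton, for example by incrementally maintaining the partial tour length and pruning prefixes that already exceed the best tour found.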

22. Graph Optimization

Graph representations
Breadth-first search
Graph compression/reordering
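A common graph representation is compressed sparse row (CSR). In CSR, the neighbors of vertex v occupy edges[offsets[v]] through edges[offsets[v+1] - 1]. The sketch below runs a serial breadth-first search over that layout and records a BFS-tree parent for each vertex. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    size_t n;               /* number of vertices */
    const size_t *offsets;  /* length n + 1 */
    const size_t *edges;    /* length offsets[n] */
} csr_graph;

/* BFS from src; parent[v] = SIZE_MAX if v is unreachable.
 * Returns 0 on success, -1 if the queue cannot be allocated. */
int bfs(const csr_graph *g, size_t src, size_t *parent) {
    assert(src < g->n);
    size_t *queue = malloc(g->n * sizeof *queue);   /* each vertex is enqueued at most once */
    if (queue == NULL) return -1;
    for (size_t v = 0; v < g->n; v++) parent[v] = SIZE_MAX;
    size_t head = 0, tail = 0;
    parent[src] = src;
    queue[tail++] = src;
    while (head < tail) {
        size_t u = queue[head++];
        for (size_t e = g->offsets[u]; e < g->offsets[u + 1]; e++) {
            size_t v = g->edges[e];
            if (parent[v] == SIZE_MAX) {            /* first visit */
                parent[v] = u;
                queue[tail++] = v;
            }
        }
    }
    free(queue);
    return 0;
}
```

Reordering vertex IDs so that neighbors have nearby IDs improves the locality of the parent[v] accesses in the inner loop.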

23. High Performance in Dynamic Languages

Julia overview

Assignments

1. Basic Tools, C Primer

Valgrind
AddressSanitizer (ASan)
llvm-cov

2. Profiling

perf
cachegrind
inlining analysis
pointers vs arrays
efficient memory usage

3. Vectorization

vectorization in Clang
SSE and AVX2 comparison
performance measurement
vectorization cost
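The sketch below shows a loop shaped for Clang's auto-vectorizer. It has unit stride and no loop-carried dependence, and restrict promises that the arrays do not overlap. Compiling with flags such as -O3 -mavx2 -Rpass=loop-vectorize makes Clang report whether it vectorized the loop. The function and its uint8_t element type are illustrative assumptions, not the assignment's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Element-wise c = a + b on bytes (results wrap modulo 256). */
void add_arrays(uint8_t *restrict c, const uint8_t *restrict a,
                const uint8_t *restrict b, size_t n) {
    assert(n == 0 || (a != NULL && b != NULL && c != NULL));
    for (size_t i = 0; i < n; i++)
        c[i] = (uint8_t)(a[i] + b[i]);
}
```

Comparing the report and the running time with -msse4.2 versus -mavx2 is one way to see the effect of vector width.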

sovadim/MIT_6.172_Performance_engineering_coursework

Folders and files

Latest commit

History

Repository files navigation

MIT 6.172 Performance Engineering Fall 2018, coursework

Status

Lectures

1. Introduction and matrix multiplication

2. Bentley Rules for optimizing work

3. Bit Hacks

4. Assembly Language & Computer Architecture

5. C to Assembly

6. Multicore Programming

7. Races and Parallelism

8. Analysis of Multithreaded Algorithms

9. What Compilers Can and Cannot Do

10. Measurement and Timing

11. Storage Allocation

12. Parallel Storage Allocation

13. The Cilk Runtime System

14. Caching and Cache-Efficient Algorithms

15. Cache-Oblivious Algorithms

16. Nondeterministic Parallel Programming

17. Synchronization Without Locks

18. Domain Specific Languages and Autotuning

19. Leiserchess Codewalk

20. Speculative Parallelism & Leiserchess

21. Tuning a TSP Algorithm

22. Graph Optimization

23. High Performance in Dynamic Languages

Assignments

1. Basic Tools, C Primer

2. Profiling

3. Vectorization

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages