Skip to content

sovadim/MIT_6.172_Performance_engineering_coursework

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

48 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

MIT 6.172 Performance Engineering Fall 2018, coursework

course link
playlist with lectures on youtube

Instructors: Prof. Charles Leiserson, Prof. Julian Shun

Status

In progress

Lectures

1. Introduction and matrix multiplication

  • cache-efficiency
  • compiler flags
  • parallel algorithms of matrix multiplication
  • vectorization

2. Bentley Rules for optimizing work

  • data structures
  • logic
  • loops
  • functions

3. Bit Hacks

  • Bit operations and their applications

4. Assembly Language & Computer Architecture

  • Assembly language overview
  • Floating-point and vector hardware
  • Overview of computer architecture
  • Superscalar processing
  • Out-of-order execution
  • Branch prediction

5. C to Assembly

  • LLVM IR Primer
  • C to LLVM IR
  • LLVM IR to Assembly

6. Multicore Programming

  • Shared-memory hardware
  • Concurrency platforms

7. Races and Parallelism

  • Race conditions
  • Cilksan
  • Work and span analysis
  • Cilkscale
  • Scheduling

8. Analysis of Multithreaded Algorithms

  • Parallelization analysis
  • Cilk loop parallelism

9. What Compilers Can and Cannot Do

  • Compiler reports
  • Optimizing a scalar
  • Optimizing a structure
  • Optimizing function calls
  • Optimizing loops

10. Measurement and Timing

  • Timing variability
  • Ways to measure time
  • Interpretation of measurements

11. Storage Allocation

  • Stack and heap allocation
  • Garbage collection

12. Parallel Storage Allocation

  • Virtual memory allocation
  • Cactus stack
  • Parallel allocation strategies

13. The Cilk Runtime System

  • Internals of Cilk runtime system

14. Caching and Cache-Efficient Algorithms

  • Cache associativity
  • Cache-aware algorithms
  • Cache-oblivious algorithms

15. Cache-Oblivious Algorithms

  • Cache-oblivious stencil computations
  • Cache-oblivious sorting

16. Nondeterministic Parallel Programming

  • Nondeterministic parallel programming
  • Atomicity
  • Data races
  • Deadlock
  • Transactional memory

17. Synchronization Without Locks

  • Sequential consistency
  • Mutual exclusion without locks
  • Relaxed memory consistency
  • Instruction reordering
  • Compare-and-swap

18. Domain Specific Languages and Autotuning

  • GraphIt
  • Halide
  • OpenTuner

19. Leiserchess Codewalk

  • Project 4 overview

20. Speculative Parallelism & Leiserchess

  • Speculative parallelism
  • More project 4 insights

21. Tuning a TSP Algorithm

  • Review of recursive generation
  • The traveling salesperson problem
  • A sequence of TSP algorithms
  • Principles of algorithm engineering

22. Graph Optimization

  • Graph representations
  • Breadth-first search
  • Graph compression/reordering

23. High Performance in Dynamic Languages

  • Julia overview

Assignments

1. Basic Tools, C Primer

  • valgrind
  • asan
  • llvm-cov

2. Profiling

  • perf
  • cachegrind
  • inlining analysis
  • pointers vs arrays
  • efficient memory usage

3. Vectorization

  • vectorization in clang
  • sse and avx2 comparison
  • performance measure
  • vectorization cost

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published