Course link
Playlist with lectures on YouTube
Instructors: Prof. Charles Leiserson, Prof. Julian Shun
In progress
- Lectures (23/23)
- Assignments (3/10)
- Projects (0/4)
- Cache efficiency
- Compiler flags
- Parallel algorithms for matrix multiplication
- Vectorization
- Data structures
- Logic
- Loops
- Functions
- Bit operations and their applications
- Assembly language overview
- Floating-point and vector hardware
- Overview of computer architecture
- Superscalar processing
- Out-of-order execution
- Branch prediction
- LLVM IR Primer
- C to LLVM IR
- LLVM IR to Assembly
- Shared-memory hardware
- Concurrency platforms
- Race conditions
- Cilksan
- Work and span analysis
- Cilkscale
- Scheduling
- Parallelization analysis
- Cilk loop parallelism
- Compiler reports
- Optimizing a scalar
- Optimizing a structure
- Optimizing function calls
- Optimizing loops
- Timing variability
- Ways to measure time
- Interpretation of measurements
- Stack and heap allocation
- Garbage collection
- Virtual memory allocation
- Cactus stack
- Parallel allocation strategies
- Internals of Cilk runtime system
- Cache associativity
- Cache-aware algorithms
- Cache-oblivious algorithms
- Cache-oblivious stencil computations
- Cache-oblivious sorting
- Nondeterministic parallel programming
- Atomicity
- Data races
- Deadlock
- Transactional memory
- Sequential consistency
- Mutual exclusion without locks
- Relaxed memory consistency
- Instruction reordering
- Compare-and-swap
- GraphIt
- Halide
- OpenTuner
- Project 4 overview
- Speculative parallelism
- More Project 4 insights
- Review of recursive generation
- The traveling salesperson problem
- A sequence of TSP algorithms
- Principles of algorithm engineering
- Graph representations
- Breadth-first search
- Graph compression/reordering
- Julia overview
- Valgrind
- AddressSanitizer (ASan)
- llvm-cov
- perf
- Cachegrind
- Inlining analysis
- Pointers vs. arrays
- Efficient memory usage
- Vectorization in Clang
- Comparison of SSE and AVX2
- Performance measurement
- Vectorization cost