Papers-Stack

Keeping track of research papers I've read.
... Guilty of not pushing the most recently read papers to the top!

Articles-Collection
Sheet



Keys: ✅ Done reading | 📖 In progress | 🚫 Dropped
| # | Paper Name | Notes | Link | Year |
| --- | --- | --- | --- | --- |
| 1 | WaveNet: A Generative Model for Raw Audio | notes: Causal conv. layers with dilation. Autoregressive model. Sequential inference. (sketch below) | arxiv | 2016 |
| 2 | Fast Wavenet Generation Algorithm | notes: WaveNet improvement, O(2^L) -> O(L), but still sequential. Uses queues to push & pop already-computed states at each layer. (sketch below) | arxiv | 2016 |
| 3 | Parallel WaveNet: Fast High-Fidelity Speech Synthesis | Probability Density Distillation: teacher + student architecture. Marries the efficient training of WaveNet with the efficient sampling of an IAF; sampling is parallel, enabling real-time synthesis. (sketch below) ✔️ medium: An Explanation of Discretized Logistic Mixture Likelihood ✔️ vimeo: Parallel WaveNet | arxiv | 2017 |
| 4 | 📕 Improved Variational Inference with Inverse Autoregressive Flow | ⭐ ✔️ Introduction to Normalizing Flows (ECCV 2020 Tutorial) video (sketch below) | arxiv | 2016 |
| 5 | Deep Unsupervised Learning (UC Berkeley lectures) | ✔️ L1 Introduction: types: 1) generative models, 2) self-supervised models (01:10:00). ✔️ L2 Autoregressive Models: histogram; parameterized distribution; 1) RNN-based, 2) masking-based: 2.1) MADE, 2.2) masked ConvNets (02:27:23; MADE sketch below). ✔️ L3 Flow Models: the model outputs z = f_theta(x) rather than p_theta(x); z follows a simple prior, and sampling applies the inverse x = f_theta^{-1}(z). Autoregressive flows: fast training, slow sampling. Inverse autoregressive flows: slow training, fast sampling (01:56:53). | course | |
| 6 | ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech | | arxiv | 2018 |
| 7 | Deep Photo Enhancer: Unpaired Learning for Image Enhancement Using GANs (CVPR) | CycleGAN extension; individual BN for x->y' and x'->y''; adaptive weighting for the WGAN loss. (sketch below) | arxiv | 2018 |
| 8 | An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (Google Brain) | Vision Transformer (ViT): a sequence of image patches fed to a Transformer. Less computation than ResNets; with enough training data it makes up for the missing CNN inductive bias and outperforms. (sketch below) | arxiv | ICLR 2021 |
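
Code sketches

Minimal, hedged sketches of a few ideas from the table above. Names and sizes are illustrative toy choices, not the papers' implementations.

Row 1, WaveNet: a small PyTorch sketch of a stack of dilated causal convolutions, the building block behind WaveNet's exponentially growing receptive field.

```python
# Hedged sketch of WaveNet-style dilated causal convolutions (not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """1-D convolution that never looks at future timesteps."""
    def __init__(self, channels, kernel_size=2, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation            # pad on the left (past) only
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                                   # x: (batch, channels, time)
        x = F.pad(x, (self.pad, 0))                         # never pad the future side
        return self.conv(x)

# Dilations 1, 2, 4, ... double the receptive field with each extra layer.
stack = nn.Sequential(*[CausalConv1d(32, dilation=2 ** i) for i in range(6)])
y = stack(torch.randn(1, 32, 1000))
print(y.shape)                                              # torch.Size([1, 32, 1000]); receptive field = 64 samples
```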
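
Row 2, Fast WaveNet: a toy NumPy sketch of the queue idea, assuming linear layers with kernel size 2 and a single channel. Each layer keeps a FIFO of its past inputs, so one new sample costs O(L) work instead of re-running the whole O(2^L) receptive field.

```python
# Hedged toy sketch of the Fast WaveNet caching scheme; not the paper's implementation.
from collections import deque
import numpy as np

L = 4                                              # layers with dilations 1, 2, 4, 8
dilations = [2 ** i for i in range(L)]
weights = [np.random.randn(2) for _ in range(L)]   # (w_past, w_current) per layer

def full_pass(x):
    """Reference: run every layer over the whole (zero-padded) sequence."""
    h = x
    for d, (w0, w1) in zip(dilations, weights):
        padded = np.concatenate([np.zeros(d), h])
        h = w0 * padded[:-d] + w1 * h
    return h

def incremental(x):
    """Generate one step at a time, popping/pushing cached states per layer."""
    queues = [deque([0.0] * d, maxlen=d) for d in dilations]   # past inputs of each layer
    out = []
    for x_t in x:
        h = x_t
        for q, (w0, w1) in zip(queues, weights):
            past = q.popleft()                     # the layer input from `dilation` steps ago
            new_h = w0 * past + w1 * h
            q.append(h)                            # cache current input for a future step
            h = new_h
        out.append(h)
    return np.array(out)

x = np.random.randn(50)
assert np.allclose(full_pass(x), incremental(x))   # same output, O(L) work per sample
```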
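
Row 3, Parallel WaveNet: a toy sketch of Probability Density Distillation, with 1-D Gaussians standing in for the WaveNet teacher and the IAF student. The student samples via reparameterization, the frozen teacher only scores those samples, and we minimize a Monte-Carlo estimate of KL(student || teacher). The paper's auxiliary losses (e.g. the power loss) are omitted.

```python
# Hedged toy sketch of Probability Density Distillation; distributions and names are illustrative.
import torch

teacher = torch.distributions.Normal(loc=2.0, scale=0.5)        # frozen, pretrained "teacher"

mu = torch.zeros(1, requires_grad=True)                         # "student" parameters
log_sigma = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([mu, log_sigma], lr=1e-2)

for step in range(2000):
    # Reparameterized sampling (the IAF analogue: noise in, sample out, fully parallel).
    eps = torch.randn(256)
    x = mu + eps * log_sigma.exp()
    student_logp = torch.distributions.Normal(mu, log_sigma.exp()).log_prob(x)
    teacher_logp = teacher.log_prob(x)
    loss = (student_logp - teacher_logp).mean()                 # Monte-Carlo KL(student || teacher)
    opt.zero_grad()
    loss.backward()
    opt.step()

print(mu.item(), log_sigma.exp().item())   # roughly 2.0 and 0.5: the student has matched the teacher
```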
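
Row 4 (and lecture L3 in row 5): a sketch of why an inverse autoregressive flow samples in one parallel pass while a plain autoregressive flow samples one step at a time. The masked-linear conditioner below is a toy stand-in for a real autoregressive network.

```python
# Hedged sketch contrasting IAF and autoregressive-flow sampling; conditioner is illustrative.
import torch

T = 8                                               # sequence length / dimensionality
torch.manual_seed(0)
W = torch.tril(torch.randn(T, T), diagonal=-1)      # strictly lower-triangular: output t sees only <t
b_mu, b_s = torch.randn(T), torch.randn(T) * 0.1

def conditioner(h):
    """Autoregressive conditioner: (mu_t, sigma_t) depend only on h_1..h_{t-1}."""
    mu = h @ W.T + b_mu
    sigma = torch.exp(0.1 * (h @ W.T) + b_s)
    return mu, sigma

# IAF: x_t = z_t * sigma_t(z_<t) + mu_t(z_<t).  The conditioner reads the *noise* z, which is
# known up front, so sampling is one parallel pass (fast sampling, slow density evaluation).
z = torch.randn(T)
mu, sigma = conditioner(z)
x_iaf = z * sigma + mu

# Autoregressive flow: x_t = z_t * sigma_t(x_<t) + mu_t(x_<t).  The conditioner reads the
# *sample* x itself, so each step must wait for the previous one (slow sampling, fast training).
x_af = torch.zeros(T)
for t in range(T):
    mu, sigma = conditioner(x_af)                   # only the entries < t are actually used
    x_af[t] = z[t] * sigma[t] + mu[t]
```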
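
Row 5, lecture L2: a sketch of MADE-style masking, one of the masking-based autoregressive models from the lecture. Every unit gets a "degree", and any weight that would let output t see input >= t is zeroed out; degrees and layer sizes here are toy choices.

```python
# Hedged sketch of MADE-style masks on a tiny two-layer net; not the original MADE code.
import torch
import torch.nn as nn

D, H = 5, 16                                           # input dim, hidden width (toy sizes)
torch.manual_seed(0)
deg_in = torch.arange(1, D + 1)                        # input degrees 1..D
deg_h = torch.randint(1, D, (H,))                      # hidden degrees in 1..D-1
mask1 = (deg_h[:, None] >= deg_in[None, :]).float()    # hidden unit sees inputs with degree <= its own
mask2 = (deg_in[:, None] > deg_h[None, :]).float()     # output t sees hidden degrees strictly < t

fc1, fc2 = nn.Linear(D, H), nn.Linear(H, D)

def made(x):
    h = torch.relu((fc1.weight * mask1) @ x + fc1.bias)
    return (fc2.weight * mask2) @ h + fc2.bias         # parameters of p(x_t | x_<t)

# Autoregressive check: output t must not change when inputs at positions >= t are perturbed.
x = torch.randn(D)
for t in range(D):
    x2 = x.clone(); x2[t:] += 1.0
    assert torch.allclose(made(x)[:t], made(x2)[:t])
```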
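
Row 7, Deep Photo Enhancer: a sketch of the standard WGAN-GP gradient penalty that the paper's adaptive weighting builds on. The paper's rule for adapting the penalty weight is not reproduced here; `lam` is a fixed placeholder, and the critic below is a toy stand-in.

```python
# Hedged sketch of a WGAN-GP gradient penalty; the adaptive-lambda schedule is paper-specific
# and intentionally omitted.
import torch

def gradient_penalty(critic, real, fake, lam=10.0):
    """Penalize the critic's gradient norm on random interpolates of real and fake images."""
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    x_hat = (eps * real + (1 - eps) * fake.detach()).requires_grad_(True)
    scores = critic(x_hat)
    grads = torch.autograd.grad(scores.sum(), x_hat, create_graph=True)[0]
    grad_norm = grads.flatten(1).norm(dim=1)
    return lam * ((grad_norm - 1) ** 2).mean()

# Toy usage: critic_loss = fake_scores.mean() - real_scores.mean() + gradient_penalty(...)
critic = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 1))
real, fake = torch.randn(4, 3, 32, 32), torch.randn(4, 3, 32, 32)
print(gradient_penalty(critic, real, fake))
```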
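
Row 8, ViT: a sketch of the patch-embedding front end that turns an image into the token sequence fed to a standard Transformer encoder. Sizes follow the common ViT-Base configuration; module and variable names are illustrative.

```python
# Hedged sketch of the ViT patch embedding (class token + position embeddings); not the paper's code.
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    def __init__(self, img=224, patch=16, chans=3, dim=768):
        super().__init__()
        self.n = (img // patch) ** 2                                  # 14*14 = 196 patches
        # A strided conv is equivalent to "flatten each patch, then apply one linear layer".
        self.proj = nn.Conv2d(chans, dim, kernel_size=patch, stride=patch)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, self.n + 1, dim))

    def forward(self, x):                                             # x: (B, 3, 224, 224)
        x = self.proj(x).flatten(2).transpose(1, 2)                   # (B, 196, dim)
        cls = self.cls.expand(x.size(0), -1, -1)
        return torch.cat([cls, x], dim=1) + self.pos                  # (B, 197, dim) -> Transformer encoder

tokens = PatchEmbed()(torch.randn(2, 3, 224, 224))
print(tokens.shape)                                                   # torch.Size([2, 197, 768])
```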