Roadmap to Learn Transformers in less than 3 minutes
When I started my journey into machine learning, I remember watching an interview with Andrej Karpathy, former Sr. Director of AI at Tesla, on the Lex Fridman podcast. He talked about something called a “transformer” and described it, quite profoundly, as a new way of computing. Around the same time, I learned that ChatGPT and BERT are both powered by transformers, and that in order to become a machine learning engineer, you NEEDED to know transformers.
But when I looked online for a simple roadmap of the topics I needed to cover, I found that most blog posts focus on the individual components inside a transformer architecture rather than on how to get to the point where you can actually understand one.
Definition of Transformer: a machine learning architecture that lets each input ask multiple questions and receive answers from the surrounding inputs, which are then combined to compute an output!
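To make the "questions and answers" idea a bit more concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. It assumes a single attention head and, for simplicity, uses the raw token vectors as queries, keys, and values. A real transformer learns separate projections for each and runs several heads in parallel (the "multiple questions"). The toy numbers and the function names are made up for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def self_attention(queries, keys, values):
    """Single-head scaled dot-product attention over a short sequence.

    queries[i] is the "question" token i asks, keys[j] is what token j
    is matched against, and values[j] is the "answer" token j contributes.
    """
    d_k = len(keys[0])
    width = len(values[0])
    outputs = []
    for q in queries:
        # How relevant is every token to this question?
        scores = [dot(q, k) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Blend the answers according to their relevance.
        blended = [sum(w * v[i] for w, v in zip(weights, values))
                   for i in range(width)]
        outputs.append(blended)
    return outputs

# Toy 3-token sequence with 2-dimensional vectors (assumed values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(tokens, tokens, tokens)
```

Each row of `attended` is a weighted average of all the token vectors, so every token's output now carries information from its neighbours.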
So let's begin.
1. Some basic understanding of Machine Learning
Definition of Machine Learning: the process of exploring data using mathematics to create scalable and reusable algorithms in order to make decisions/improve society. Pretty much: give an algorithm some data, with or without outputs, and hope it can handle new data based on what it was given.
Machine Learning is broken down into two parts: Supervised and Unsupervised learning (kind of; there are a few other subsets of machine learning, but we only need to focus on these).
Supervised learning: we feed the algorithm pairs of inputs and corresponding outputs. The goal is to then feed it new inputs and have it predict their outputs.
Unsupervised learning: you feed the algorithm inputs only, with no corresponding outputs. The goal isn’t to predict new outputs but rather to identify patterns, build visualizations, etc.
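The difference between the two can be sketched with a tiny example in plain Python. The supervised half fits a straight line to input/output pairs and predicts the output for a new input. The unsupervised half gets inputs only and splits them into two groups with a simplified one-dimensional k-means. The data and the helper names `fit_line` and `two_means` are made up for illustration, not taken from any library.

```python
def fit_line(xs, ys):
    """Supervised: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

def two_means(points, steps=10):
    """Unsupervised: find two cluster centres in 1-D data (toy k-means)."""
    lo, hi = min(points), max(points)  # start from the extremes
    for _ in range(steps):
        near_lo = [p for p in points if abs(p - lo) <= abs(p - hi)]
        near_hi = [p for p in points if abs(p - lo) > abs(p - hi)]
        if not near_lo or not near_hi:
            break  # everything fell into one group; nothing to update
        lo = sum(near_lo) / len(near_lo)
        hi = sum(near_hi) / len(near_hi)
    return lo, hi

# Supervised: inputs come with known outputs (here y = 2x).
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
prediction = slope * 5.0 + intercept  # predict the output for a new input

# Unsupervised: inputs only; the algorithm looks for structure on its own.
centre_a, centre_b = two_means([1.0, 1.2, 0.8, 9.0, 9.5, 8.5])
```

In the first case the algorithm is told what the right answers look like; in the second it only discovers that the points fall into two clumps.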
I recommend just going through “The Hundred-Page Machine Learning Book” by Andriy Burkov to get a basic grip on the concepts.
Deep learning is a subset of machine learning techniques, and the transformer architecture is a part of deep learning.
2. Deep Learning
The main concepts you should learn are:
- Feedforward Neural Networks -> Focus!
- CNN - the basics
- RNN - learn GRUs and LSTMs
- Attention Architecture
- Transformers Architecture
You should work through these topics in order and make sure you have a strong understanding of feedforward neural networks. I recommend these two resources:
- [Neural Networks and Deep Learning](http://neuralnetworksanddeeplearning.com)
- [3Blue1Brown](https://www.youtube.com/watch?v=aircAruvnKk&list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi)
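Since feedforward networks are the topic to focus on, here is a minimal sketch of a forward pass through a two-layer network in plain Python. The weights are hand-picked for illustration rather than trained, and the names `dense`, `relu`, and `feedforward` are just labels chosen for this example. The resources above cover how such weights are actually learned.

```python
def dense(inputs, weights, biases):
    """Fully connected layer: each output is a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(vector):
    """Non-linearity: keep positive values, zero out negative ones."""
    return [max(0.0, x) for x in vector]

def feedforward(x, params):
    """Input -> hidden layer with ReLU -> linear output layer."""
    hidden = relu(dense(x, params["w1"], params["b1"]))
    return dense(hidden, params["w2"], params["b2"])

# Hand-picked parameters (assumed values): 2 inputs, 2 hidden units, 1 output.
params = {
    "w1": [[1.0, -1.0], [0.5, 0.5]],
    "b1": [0.0, -0.5],
    "w2": [[1.0, 2.0]],
    "b2": [0.1],
}

output = feedforward([2.0, 1.0], params)
```

Data only flows forward here, from inputs through the hidden layer to the output, which is where the name comes from. The same kind of layer appears inside every transformer block.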