What are some good resources on word2vec

Word embedding

When a child is born, it takes time to develop the ability to speak and understand. Children learn only the language that the people around them speak. Humans can pick up languages quickly on their own, but computers cannot. For example, you can easily understand the difference between a cat and a dog, or a man and a woman.
This is because our biological neural networks differ from the artificial neural networks that machines use. Computers learn language differently from humans: they use word embedding techniques to understand human language.

What is word embedding?

The simple definition of word embedding is converting text into numbers. For a computer to understand language, we convert text into vectors so that the computer can develop connections between vectors and words and grasp what we are saying. Word embeddings help solve many problems in natural language processing.
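To make "converting text into numbers" concrete, here is a minimal sketch (the sentence and encoding scheme are illustrative): each unique word gets an integer index, and from that index we can build a one-hot vector. Real embeddings are dense, learned vectors, but this kind of encoding is always the first step.

```python
sentence = "the cat sat on the mat".split()

# Build a vocabulary: one integer per unique word, in order of first appearance.
vocab = {}
for word in sentence:
    if word not in vocab:
        vocab[word] = len(vocab)

def one_hot(word):
    """Return a one-hot vector of length len(vocab) for the given word."""
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec

print(vocab)           # {'the': 0, 'cat': 1, 'sat': 2, 'on': 3, 'mat': 4}
print(one_hot("cat"))  # [0, 1, 0, 0, 0]
```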

Understanding NLP

Natural language processing helps machines understand what we are saying and develop the ability to write, read, and listen. Google, DuckDuckGo, and many other search engines use NLP to break down the language barrier between humans and machines. Microsoft Word's grammar checking and Google Translate are NLP applications as well.

Word embedding algorithms

Word embedding produces vector representations and relies on machine learning techniques and algorithms. These algorithms use artificial neural networks and data to create connections between different words. For example, when a model learns the words "king" and "queen", their vectors end up related to each other. This lets the machine distinguish the two words while still relating them to one another. Here are three common approaches you can use for word embedding in machine learning.
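The "related vectors" idea can be sketched with made-up numbers (the three vectors below are illustrative stand-ins, not learned embeddings): related words point in similar directions, which we measure with cosine similarity.

```python
import math

# Toy vectors: "king" and "queen" are deliberately similar, "apple" is not.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.8, 0.9, 0.1],
    "apple": [0.1, 0.0, 0.9],
}

def cosine(a, b):
    """Cosine similarity: close to 1.0 means very similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# "king" is far closer to "queen" than to "apple".
print(cosine(vectors["king"], vectors["queen"]))  # ~0.99
print(cosine(vectors["king"], vectors["apple"]))  # ~0.16
```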

1. Word2Vec

Word2Vec is the most popular word embedding algorithm. It uses a shallow neural network to learn embeddings efficiently, and it actually comprises two related algorithms that you can use for NLP tasks. The network has a single hidden layer whose neurons are linear (no activation function). The input layer contains one neuron per word in the vocabulary, and the output layer has the same size as the input layer. The size of the hidden layer, by contrast, is set to the desired dimensionality of the resulting word vectors. You can train Word2Vec embeddings using two methods, both based on this neural network:

- CBOW, or Continuous Bag of Words

In this method, the surrounding context words are the input and the neural network predicts the word in the middle. For example, in the sentence "I am going home by bus", the network receives the context words around "bus" and learns to predict "bus" from them. Over many such examples, the model builds vectors that connect "bus" to travel-related contexts.
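A minimal sketch of how CBOW training examples are built (the sentence and window size are illustrative): the words around each position form the input context, and the word at that position is the prediction target.

```python
def cbow_pairs(words, window=2):
    """Return (context, target) pairs for CBOW-style training."""
    pairs = []
    for i, target in enumerate(words):
        # Up to `window` words on each side of the target.
        context = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
        pairs.append((context, target))
    return pairs

sentence = "i am going home by bus".split()
for context, target in cbow_pairs(sentence):
    print(context, "->", target)
# e.g. ['going', 'home', 'bus'] -> by
```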

- Skip-gram

Skip-gram inverts the CBOW setup: the network takes a single word as input and predicts its neighboring context words. Since the raw text comes without labels, word embedding is essentially self-supervised learning. In this method, the algorithm takes the neighboring words of each input word and uses them as its training labels.

2. GloVe

The Global Vectors for Word Representation, or GloVe, algorithm is closely related to Word2Vec, but its method differs slightly. While Word2Vec learns from one local context window at a time, GloVe builds a global word–word co-occurrence matrix, whose entries capture the probability P(a | b) of seeing word a in the context window around word b.
The main purpose of this technique is to find two vectors per word such that their dot product approximates the logarithm of the words' co-occurrence count. GloVe gives excellent results for relating words to one another in context.

3. Embedding layer

This is the first hidden layer of the artificial neural network. This layer must specify three arguments:
- Input dim: the vocabulary size of the text data. For example, if you have integer-encoded data with values from 0 to 10, the vocabulary size is 11.
- Output dim: the size of the vector space in which the words are embedded. This can be 32, 100, or larger.
- Input length: the length of the input sequences. For example, if your input documents are up to 1,000 words long, this value is also 1,000.
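A hedged sketch of what an embedding layer actually does with these three arguments (the weight values below are random stand-ins for learned ones): it holds a matrix of shape (input dim, output dim), and turning an integer-encoded sequence into vectors is just a row lookup.

```python
import numpy as np

input_dim = 11      # vocabulary size: integer codes 0..10
output_dim = 4      # dimensionality of each word vector
input_length = 6    # words per input sequence

rng = np.random.default_rng(0)
weights = rng.normal(size=(input_dim, output_dim))  # one row per word

# A "sentence" of 6 integer-encoded words; embedding is a row lookup.
sequence = np.array([3, 1, 4, 1, 5, 9])
embedded = weights[sequence]

print(embedded.shape)  # (6, 4): input_length x output_dim
```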


Word embedding is essential to machine learning because it helps the computer understand our language. It comprises different algorithms that process words differently, but the main goal is the same: enabling machines to work with language. Computers cannot understand words directly; instead, each word is encoded as a vector representation that relates it to other words depending on the context.