You may have visited an old library where document retrieval was based on a card catalog, and watched the librarians search for a book on your behalf. The advent of the World Wide Web has changed all that: these days everyone is a kind of document retriever, and everyone knows about document search and its limitations.
Before going deeper into NLP, we need to know:

Predicting the future is difficult, but how about predicting the next few words, immediately after the current word? For example:
Your program doesn’t _______? (work or allow or run … )
I like to eat Chinese _________? (food or cuisine or dumplings …)
She is my girl _______? (friend)
Please turn off your cell ________? (phone or membrane)
Hopefully, you can tell which next word is most likely in each case. Language modelling is about formalizing this intuition in machines, and we do this by introducing models that assign a probability to each possible next word. These models are then used to…
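The intuition above can be sketched with a tiny bigram model: count how often each word follows each other word in a corpus, then turn those counts into probabilities for the next word. The toy corpus below is made up for illustration; a real language model is trained on vastly more text.

```python
from collections import Counter, defaultdict

# Toy corpus (hypothetical); real models train on far more text.
corpus = [
    "your program doesn't work",
    "your program doesn't run",
    "i like to eat chinese food",
    "i like to eat chinese dumplings",
    "please turn off your cell phone",
]

# Count how often each word follows each other word (bigram counts).
following = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1

def next_word_probs(word):
    """Estimate P(next word | current word) from bigram counts."""
    counts = following[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_probs("chinese"))   # food and dumplings are equally likely here
print(next_word_probs("doesn't"))   # work and run are equally likely here
```

Given this corpus, `next_word_probs("chinese")` assigns 0.5 each to "food" and "dumplings": exactly the fill-in-the-blank intuition, made numeric.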
When we estimate how relevant a document is to a given query by feeding the usual term and document frequencies as parameters to a Bayesian model, this is termed probabilistic retrieval. In other words, it is an attempt to formalize the idea behind ranked retrieval in terms of probability theory.
This model is actually based on some assumptions¹. We assume
Based on these two points this theory…
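As a minimal sketch of this idea, the snippet below ranks documents by a query-likelihood score: the log-probability of the query under a unigram model of each document, with add-one smoothing. It assumes query terms are conditionally independent given the document, the standard simplifying assumption in probabilistic retrieval; the documents and helper names are invented for illustration.

```python
import math
from collections import Counter

# Tiny hypothetical collection; a real system indexes many documents.
docs = {
    "d1": "the cat sat on the mat",
    "d2": "the dog chased the cat",
    "d3": "dogs and cats make good pets",
}

term_counts = {d: Counter(text.split()) for d, text in docs.items()}
vocab = {w for counts in term_counts.values() for w in counts}

def score(query, doc_id):
    """Log P(query | doc) under a unigram model with add-one smoothing.

    Assumes query terms are conditionally independent given the
    document (the usual term-independence assumption)."""
    counts = term_counts[doc_id]
    doc_len = sum(counts.values())
    return sum(
        math.log((counts[w] + 1) / (doc_len + len(vocab)))
        for w in query.split()
    )

ranked = sorted(docs, key=lambda d: score("cat", d), reverse=True)
print(ranked)  # documents containing "cat" rank above the one that doesn't
```

Note how the ranking falls out of probabilities alone: no hand-tuned relevance rules, just term counts plugged into a probabilistic model.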
The term natural language refers to the way we humans communicate with each other, and the field of study concerned with the interactions between computers and human (natural) languages is called natural language processing, or NLP for short. It is NLP that is concerned with programming computers to fruitfully process large natural language corpora.
Nowadays we are surrounded by natural language in the form of text, for example: social media posts, SMS messages, signs, emails, comments, tweets, and web pages. Similarly, we have natural language in the form of speech and…
word2vec treats each word in the corpus as an atomic entity and generates a vector for each word. In this sense word2vec is very much like GloVe: both treat words as the smallest unit to train on.
FastText (which is essentially an extension of the word2vec model) treats each word as composed of character n-grams, so the vector for a word is the sum of the vectors of these character n-grams.
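To make the contrast concrete, here is a small sketch of how FastText decomposes a word into character n-grams. FastText wraps the word in `<` and `>` boundary markers and, by default, extracts n-grams of length 3 to 6; the function name below is my own, and the real library additionally hashes the n-grams into a fixed-size table, which this sketch omits.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word, FastText-style: the word is wrapped
    in < > boundary markers, and the whole wrapped word is also kept
    as its own feature."""
    wrapped = f"<{word}>"
    grams = [
        wrapped[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(wrapped) - n + 1)
    ]
    return grams + [wrapped]

print(char_ngrams("eating", 3, 4))
```

Because "eating" shares n-grams like `eat` and `ing` with related words, FastText can build reasonable vectors even for words it never saw during training, which word2vec and GloVe cannot do.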
NLP Researcher | Ph.D. | Research focus: Social Networking and Human-Centered Computing.