We have a lot of text data that we can put to use through NLP technologies, and those technologies span many algorithms and representations: Good Old-Fashioned AI (GOFAI) built on hand-written rules and templates, and deep learning algorithms.
The objective of NLP is machine understanding of natural language, so that machines can do something useful for people. Both semantic interpretation and syntactic analysis are challenging to implement.
An NLP pipeline is a sequence of steps that converts unstructured text into a data structure an algorithm can work with.
First, convert text to numeric data (Bag of Words).
This representation has two problems: word order is lost, and every word counts equally regardless of importance.
The order problem can be mitigated with bigrams, and the importance problem is addressed by TF-IDF weighting.
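The bag-of-words, bigram, and TF-IDF ideas above can be sketched in a few lines of plain Python. The toy documents and the unsmoothed TF-IDF formula are my own choices for illustration; library implementations (e.g. scikit-learn's) add smoothing and normalization on top of this:

```python
import math
from collections import Counter

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]

# Bag of words: one count vector per document (word order is lost)
bow = [Counter(doc.split()) for doc in docs]

# Bigrams restore some local word-order information
def bigrams(tokens):
    return list(zip(tokens, tokens[1:]))

# TF-IDF down-weights words that appear in many documents
def tf_idf(term, doc_counts, all_counts):
    tf = doc_counts[term] / sum(doc_counts.values())
    df = sum(1 for counts in all_counts if term in counts)
    idf = math.log(len(all_counts) / df)
    return tf * idf

print(bow[0]["the"])                         # 2 — raw count, no weighting
print(bigrams(docs[0].split())[:2])          # [('the', 'cat'), ('cat', 'sat')]
print(round(tf_idf("the", bow[0], bow), 3))  # low: "the" appears in most docs
```

Note how a frequent word like "the" gets a small TF-IDF score even though its raw count is the highest in the document.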
Second, Linguistic Inquiry and Word Count (LIWC).
It is based on a classification of words into psychologically meaningful categories.
The feature space is then built from those classes rather than from the raw words.
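A LIWC-style feature space might look like the following sketch. The category word lists here are invented for illustration, since the real LIWC dictionary is a licensed resource:

```python
# Toy LIWC-style feature extractor. The categories and word lists below
# are made up for illustration; the real LIWC dictionary is proprietary.
CATEGORIES = {
    "positive_emotion": {"happy", "love", "great"},
    "negative_emotion": {"sad", "hate", "awful"},
    "social": {"friend", "family", "we"},
}

def liwc_features(text):
    tokens = text.lower().split()
    total = len(tokens)
    # One feature per category: fraction of tokens falling in that category
    return {
        cat: sum(t in words for t in tokens) / total
        for cat, words in CATEGORIES.items()
    }

print(liwc_features("We love our friend but hate Mondays"))
```

The resulting vector has one dimension per class, not per word, so it stays small and interpretable regardless of vocabulary size.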
NLP problems are commonly divided into classification and regression.
Regression problems are evaluated by root mean squared error (RMSE).
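RMSE itself is a one-liner; the true/predicted values below are made up for illustration:

```python
import math

def rmse(y_true, y_pred):
    # Root mean squared error: large errors are penalized quadratically
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )

print(rmse([3.0, 5.0, 2.0], [2.0, 5.0, 4.0]))  # sqrt((1 + 0 + 4) / 3)
```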
One common NLP classification algorithm is the SVM (support vector machine), which finds the best line dividing the classes: it maximizes the distance between the decision boundary and the nearest data points. SVMs can use a kernel function to classify data that is not linearly separable, and they are robust to noise.
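The max-margin idea can be illustrated with a deliberately simplified one-dimensional brute-force search. A real SVM solves a convex optimization over all hyperplanes; this toy grid search is only meant to show what "maximize the distance to the nearest point" means:

```python
# Brute-force sketch of the max-margin idea behind a linear SVM:
# among candidate boundaries x = b, pick the one whose nearest
# training point is farthest away. (Illustration only; real SVMs
# solve a convex optimization, not a grid search.)
pos = [2.0, 2.5, 3.0]    # class +1
neg = [-1.0, -0.5, 0.0]  # class -1

def margin(b):
    # distance from boundary x = b to the closest point;
    # negative if any point falls on the wrong side
    return min(min(x - b for x in pos), min(b - x for x in neg))

candidates = [i / 10 for i in range(-10, 21)]
best = max(candidates, key=margin)
print(best)  # 1.0 — midway between the closest points, 0.0 and 2.0
```

The best boundary sits halfway between the two nearest opposing points; those nearest points are exactly the "support vectors" that give the method its name.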
Evaluation has three factors: accuracy (how often the prediction is correct), recall (how many of the positive occurrences are predicted), and precision (how often a positive prediction is accurate).
Precision matters when it is essential to be right (expensive interventions, safe to miss out, e.g. placing a bet).
Recall matters when it is essential to be complete (inexpensive interventions, dangerous to miss out, e.g. cancer screening).
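All three metrics follow directly from the confusion-matrix counts; the counts below are made up for illustration:

```python
def scores(tp, fp, tn, fn):
    # tp/fp/tn/fn: true/false positives and negatives from a confusion matrix
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # fraction of correct predictions
    precision = tp / (tp + fp)                  # how often a positive call is right
    recall = tp / (tp + fn)                     # how many positives we caught
    return accuracy, precision, recall

# Imbalanced example: high accuracy can coexist with mediocre recall
print(scores(tp=8, fp=2, tn=85, fn=5))
```

Note the example: accuracy is high because negatives dominate, yet recall is much lower, which is exactly why accuracy alone can mislead on imbalanced data like cancer screening.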
Think carefully about your evaluation method: is the metric you are optimizing actually worth something to the customers?
Overfitting is when a model learns a dataset too well, memorizing noise instead of the underlying pattern.
To overcome this problem, remove unnecessary features or apply regularization.
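Regularization can be seen in miniature with ridge (L2) regression for a one-feature, zero-intercept linear model, where the closed-form solution makes the shrinkage explicit. The data values are invented for illustration:

```python
# Minimal sketch of L2 regularization (ridge regression) for a
# one-feature, zero-intercept model y = w * x. Minimizing
# sum((y - w*x)^2) + lam * w^2 gives the closed form below:
# larger lam shrinks w toward 0, trading training fit for robustness.
def ridge_weight(xs, ys, lam):
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs = [1.0, 2.0, 3.0]  # toy feature values
ys = [1.1, 1.9, 3.2]  # toy noisy targets
for lam in (0.0, 1.0, 10.0):
    print(lam, round(ridge_weight(xs, ys, lam), 3))
```

As the penalty grows, the fitted weight shrinks; the same effect in a text model damps weights on rare, noise-prone features.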
References:
TF-IDF: freeCodeCamp article on Medium
LIWC: Tausczik and Pennebaker (2010)