Sentiment Analysis of Indonesian Tweets Using Feature Extraction and the Cross Validation Technique on the Naïve Bayes Model

Authors

  • Ahmad Turmudi Zy Universitas Pelita Bangsa

Abstract

Sentiment analysis is a branch of natural language processing that classifies opinions as positive or negative in order to support decision making. Twitter is one of the common data sources for sentiment analysis research. The main problem in sentiment classification is choosing the right features and the right validation method for testing. The model used in this research is Naïve Bayes, which can be combined with feature extraction. In testing, the CountVectorizer and TF-IDF Vectorizer feature extraction methods are compared, and the Cross Validation technique is applied to improve Naïve Bayes classification. The evaluation compares testing without validation against testing with validation. Performance is measured using the confusion matrix, precision, and recall. The results show that TF-IDF Vectorizer feature extraction performs better than CountVectorizer, with a highest accuracy of 85.98%, and that in the final test feature extraction with Cross Validation performs better than feature extraction without Cross Validation, with a highest accuracy of 97.67%. Thus, the best feature extraction method in these tests is the TF-IDF Vectorizer, and the Cross Validation technique can improve the performance of the Naïve Bayes model in sentiment analysis of Indonesian-language Twitter data.

Keywords: Sentiment analysis, Twitter, Naïve Bayes, feature extraction, Count Vectorizer, TF-IDF Vectorizer, Cross Validation.
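
The comparison described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn, whose CountVectorizer and TfidfVectorizer classes match the names used above, and pairs each with a multinomial Naïve Bayes classifier under stratified k-fold cross validation. The example tweets, the labels, the function name compare_vectorizers, the fold count, and the choice of MultinomialNB are all assumptions made for illustration; they are not the study's dataset or its exact configuration, and the scores this sketch prints are unrelated to the accuracies reported in the abstract.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy tweets (made up for illustration, not the paper's data).
texts = [
    "pelayanan sangat bagus dan cepat",
    "saya suka produk ini mantap",
    "aplikasi ini sangat membantu",
    "kualitas bagus harga murah",
    "pengiriman cepat terima kasih",
    "senang sekali dengan layanan ini",
    "pelayanan buruk dan lambat",
    "saya kecewa dengan produk ini",
    "aplikasi sering error parah",
    "kualitas jelek harga mahal",
    "pengiriman lambat sangat mengecewakan",
    "benci sekali dengan layanan ini",
]
labels = ["pos"] * 6 + ["neg"] * 6


def compare_vectorizers(texts, labels, n_splits=3, seed=0):
    """Return mean cross-validated accuracy for each feature extractor."""
    # Stratified folds keep the positive/negative ratio equal in every fold.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    extractors = {
        "count": CountVectorizer(),  # raw term counts
        "tfidf": TfidfVectorizer(),  # counts reweighted by inverse document frequency
    }
    results = {}
    for name, vectorizer in extractors.items():
        # The pipeline refits the vectorizer inside each training fold,
        # so no vocabulary information leaks from the held-out fold.
        model = make_pipeline(vectorizer, MultinomialNB())
        scores = cross_val_score(model, texts, labels, cv=cv, scoring="accuracy")
        results[name] = float(scores.mean())
    return results


results = compare_vectorizers(texts, labels)
for name, accuracy in results.items():
    print(f"{name}: mean accuracy = {accuracy:.3f}")
```

Wrapping the vectorizer and classifier in one pipeline is a design choice of this sketch: it ensures that each fold's vocabulary and IDF weights come only from that fold's training portion. On a dataset this small the two extractors may score identically, so the output should not be read as evidence for either method.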

Published

2022-09-11