Sentiment Analysis

Supervised classification of textual reviews, based on their sentiment, into one of five polarities:

  1. Strong negative
  2. Weak negative
  3. Neutral
  4. Weak positive
  5. Strong positive

Methodology

  1. Text Pre-processing: The raw reviews were converted into a form suitable for feature extraction. The following steps were applied:
    • Case normalisation
    • Tokenisation
    • Lemmatisation
  2. Feature Generation: Once the data was cleaned, relevant features were extracted from it, such as:
    • Creation of N-grams
    • Term frequency and inverse document frequency (TF-IDF)
  3. Model: Logistic regression is the classifier used to determine the polarity of a review.
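
As a rough illustration of steps 1 and 2, here is a minimal standard-library sketch. It is not the project's actual code: the tokenising regex, the tiny `TOY_LEMMAS` lookup (a hypothetical stand-in for a real lemmatiser), the choice of unigrams plus bigrams, and the particular TF-IDF variant (relative term frequency times `log(N / df)`) are all assumptions made for the example.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for a real lemmatiser; the README does not say
# which tool was used, so this lookup table is for illustration only.
TOY_LEMMAS = {"was": "be", "is": "be", "batteries": "battery", "loved": "love"}


def preprocess(text):
    """Case-normalise, tokenise and lemmatise one review."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [TOY_LEMMAS.get(tok, tok) for tok in tokens]


def ngrams(tokens, n_max=2):
    """Return all n-grams of length 1..n_max as space-joined strings."""
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf(docs):
    """Map each document (a list of n-grams) to a {term: weight} dict."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


reviews = ["The battery was GREAT", "The batteries died quickly"]
features = tfidf([ngrams(preprocess(r)) for r in reviews])
```

Terms that occur in every review (here `the` and `battery`) receive a weight of zero, while terms unique to one review are weighted up. In a full pipeline these sparse weights would be turned into feature vectors and passed to the logistic regression classifier from step 3.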

Datasets:

  1. train_data.csv:

    The training set consists of 650,000 product reviews.

  2. train_label.csv:

    This dataset contains the sentiment labels of the training dataset. The labels (1, 2, 3, 4, 5) refer to the five polarity levels (strong negative, weak negative, neutral, weak positive, strong positive) respectively.

  3. test_data.csv:

    The test set consists of 50,000 product reviews.

  4. predicted_label.csv:

    This dataset contains the predicted sentiment labels of the test data.
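
The label convention described above can be sketched as follows. The file layout (one integer label per row, single column, no header) is an assumption, since the README does not specify it, and the helper names `read_labels` and `write_labels` are hypothetical. An in-memory buffer stands in for the real CSV files.

```python
import csv
import io

# Mapping taken from the label description: 1..5 -> polarity name.
POLARITY = {
    1: "strong negative",
    2: "weak negative",
    3: "neutral",
    4: "weak positive",
    5: "strong positive",
}


def read_labels(handle):
    """Read one integer label per row (assumed: single column, no header)."""
    return [int(row[0]) for row in csv.reader(handle) if row]


def write_labels(handle, labels):
    """Write labels in the same assumed one-per-row layout."""
    writer = csv.writer(handle, lineterminator="\n")
    for label in labels:
        if label not in POLARITY:
            raise ValueError(f"label out of range: {label}")
        writer.writerow([label])


# In-memory stand-in for train_label.csv.
sample = io.StringIO("5\n1\n3\n")
labels = read_labels(sample)
names = [POLARITY[label] for label in labels]

# In-memory stand-in for predicted_label.csv.
out = io.StringIO()
write_labels(out, labels)
```

With real files, the `io.StringIO` objects would be replaced by `open("train_label.csv", newline="")` and `open("predicted_label.csv", "w", newline="")`.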