Role: Research Assistant
Company: USC / Integrated Media Systems Center
Technologies: .NET, Python, NLP, PowerBI
Duration: 06/2017 - 08/2017
Overview
The book chapter “Sentiment Analysis of Korean Teenagers’ Language Based on Sentiment Dictionary Construction,” published in Frontier Computing (2019), develops a sentiment analysis framework tailored to the linguistic patterns of Korean teenagers. Because existing sentiment tools have limited coverage of youth-specific expressions, the study builds a custom sentiment lexicon and applies a rule-based algorithm over n-gram patterns.
Key Contributions
- First Author Leadership: Led the research and publication effort as the first author, contributing to the conceptual design, data collection, algorithm development, and writing.
- Teen‑focused sentiment lexicon: Developed a specialized sentiment dictionary by analyzing language patterns typical of Korean teens, addressing limitations of general-purpose Korean sentiment resources.
- N-gram sentiment scoring algorithm: Implemented an algorithm that uses n-grams to compute sentence-level sentiment scores, improving the capture of context-dependent teenage expressions.
- Web crawling methodology: Collected authentic teenage language samples via web crawling of several Korean sites, ensuring real-world relevance.
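The n-gram scoring idea above can be illustrated with a minimal sketch. It is not the published implementation: the lexicon entries, their weights, the longest-match rule, and the names `LEXICON` and `score_sentence` are all assumptions made for illustration, and tokens are produced by whitespace splitting rather than by a morphological analyzer such as KoNLPy.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon mapping token n-grams (as tuples) to polarity scores.
# Entries and weights are illustrative only, not taken from the published dictionary.
LEXICON: Dict[Tuple[str, ...], float] = {
    ("좋아",): 1.0,        # positive word
    ("싫어",): -1.0,       # negative word
    ("개", "좋아"): 2.0,   # slang intensifier + positive word
    ("안", "좋아"): -1.0,  # negation + positive word
}


def score_sentence(
    tokens: List[str],
    lexicon: Dict[Tuple[str, ...], float],
    max_n: int = 2,
) -> float:
    """Sum lexicon scores, preferring the longest matching n-gram at each position."""
    total = 0.0
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest n-gram first so a bigram entry overrides its unigram parts.
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            gram = tuple(tokens[i:i + n])
            if gram in lexicon:
                total += lexicon[gram]
                i += n  # skip the tokens consumed by this match
                matched = True
                break
        if not matched:
            i += 1  # token not covered by the lexicon; treat as neutral
    return total


if __name__ == "__main__":
    for sentence in ["오늘 개 좋아", "안 좋아", "학교 싫어"]:
        print(sentence, score_sentence(sentence.split(), LEXICON))
```

Matching the longest n-gram first is one simple way to let a multi-token entry (for example, a negation followed by a positive word) take precedence over the single-token entries it contains; other weighting or overlap schemes are equally plausible.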
Outcomes
- The study successfully demonstrated the feasibility of constructing a teen‑specific Korean sentiment lexicon.
- Experiments on public datasets (likely the crawled content) indicated that the specialized lexicon combined with the n-gram algorithm can analyze teenage sentiment effectively; precise performance metrics (e.g., accuracy, F1 score) are not reported in the abstract.
- Overall, the approach yielded promising results, supporting both the custom lexicon and the algorithmic approach for teen language.
Resources
- Sentiment lexicon tailored to Korean teenagers' expressions.
- N-gram-based scoring algorithm implemented on top of KoNLPy (a Korean NLP toolkit).
- Web‑crawled textual datasets representing authentic teenage speech via online platforms.