Abstract
Hate speech and offensive language on social media platforms has increased in volume and intensified in tone across Aotearoa. The current study aims to develop a method for monitoring hate speech and offensive language using transformer-based pretrained language models (e.g., XLM-RoBERTa). A text classification model for hate speech and offensive language was developed using open-source hate speech training data. We applied the classification system to a random monthly sample of tweets from 100 locations. The results show that the rate of hate speech, as identified by the system developed for this study, has steadily increased over time. There also appears to be an urban-rural split in the occurrence of hate speech and offensive language. However, a closer inspection of the tweets classified as hate speech found that the model was not sensitive to Aotearoa-specific linguistic features (e.g., ‘bugger’), and that words with structural similarities to slurs were misclassified as hate speech or offensive language. The findings suggest that language models are immensely valuable for monitoring hate speech; however, further work is needed to develop training data specific to the social, political, and linguistic context of Aotearoa.