

Tuesday, July 27, 2021 - 3:00pm to 5:00pm
Wednesday, July 28, 2021 - 3:00pm to 5:00pm
Thursday, July 29, 2021 - 3:00pm to 5:00pm
via Zoom (link will be sent to participants 24 hours before the workshop/event begins)

All times are in the Pacific Time Zone (Vancouver, BC, Canada).

Text mining techniques can be applied to a variety of data sources (e.g., newspaper articles, emails, and online discussion posts) to efficiently extract useful data for different research purposes. For example, health science researchers may be interested in how frequently a particular disease name is mentioned in a large set of newspaper articles. Educational researchers, on the other hand, may wish to extract and categorize students' opinions from a discussion forum in a high-enrollment course. R offers a comprehensive set of tools for text mining. In this workshop, you will learn how to implement basic methods for preprocessing textual data, managing metadata, creating term-document matrices over a collection of text documents, sentiment analysis, text tokenization, word relationship extraction, and text visualization.
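
To give a feel for one of the topics listed above, here is a minimal sketch of building a term-document matrix in base R. It is an illustration only, not workshop material: the three toy documents and the helper name `tokenize` are made up for this example, and the workshop itself may well rely on dedicated text mining packages instead.

```r
# Three toy "documents" (invented text, for illustration only)
docs <- c(
  doc1 = "Flu cases rise as flu season begins.",
  doc2 = "New clinic opens; flu shots are available.",
  doc3 = "Students discuss the course forum."
)

# Basic preprocessing: lower-case, replace punctuation with spaces,
# split on whitespace, and drop any empty tokens.
# `tokenize` is a hypothetical helper defined here, not a library function.
tokenize <- function(text) {
  text <- tolower(text)
  text <- gsub("[[:punct:]]", " ", text)
  tokens <- strsplit(text, "[[:space:]]+")[[1]]
  tokens[nzchar(tokens)]
}

tokens <- lapply(docs, tokenize)

# Term-document matrix: one row per term, one column per document,
# each cell holding how often the term occurs in that document.
terms <- sort(unique(unlist(tokens)))
tdm <- vapply(
  tokens,
  function(tok) tabulate(match(tok, terms), nbins = length(terms)),
  integer(length(terms))
)
rownames(tdm) <- terms

# How often is "flu" mentioned in each document?
tdm["flu", ]
```

With these toy documents, the last line reports 2 mentions in `doc1`, 1 in `doc2`, and 0 in `doc3`, which is the kind of frequency count described in the health science example.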


  • Participants will need to have R and RStudio installed on their device prior to attending the workshop
  • Participants should be familiar with R and the RStudio environment, including basic functionality such as object assignment, data structures, and running scripts


Attendance requirement:

You need to attend all 3 days. Each day covers different topics and builds on material from the previous day, so if you miss a day, we won't have the resources to help you catch up in this online environment. A rough schedule of what is covered each day follows; it will be refined each day as required.

Day 1 (July 27): Text Preprocessing & Metadata Management
Day 2 (July 28): Term-Document Matrix (TDM) & Sentiment Analysis
Day 3 (July 29): Results Visualization: Text Tokenization, Relationship Extraction, & Word Clouds


Matthew McKitrick
Sina Nazeri

