Determining what facilities and equipment will be needed for data storage is an important component of planning for data management. It is useful to think about file storage solutions across two distinct stages of your research project: first during the active stage, and then for the long-term storage of the final outputs.
Active data storage
During the active stage of your research project, the data collected will likely be regularly modified and possibly accessed by more than one research team member. At this stage of a project, data storage and sharing between team members should allow for several points of collection, and for work on cleaning, transformation, and analysis of files. In some cases, files may need to be transported to and from cloud-based research analysis software. This ‘active data storage’ is typically necessary as a short-term solution, only used during the collection and analysis phases of your project.
External commercial cloud storage and other storage options (such as Dropbox, Google Drive, AWS, and so on) also exist; however, care must be taken when considering their use, and there are guidelines for what types of data can be stored on such services. For more information on ethics considerations, contact Research Ethics; for legal considerations, contact Research Services.
Long-term data storage
Researchers invest a great deal of time, effort, and resources in collecting data to support their projects. This investment gives research data significant value, in addition to its intrinsic value as a record of human knowledge. You or one of your collaborators may want to use the data again in the future, and funding agencies may have requirements about whether data must be saved at the end of a project. Before you begin data collection, it is important to determine what files you will keep, where, and for how long. In some cases, the research ethics process stipulates that certain data must be destroyed within a given time frame. Here are some considerations on what to store for the long term:
- Data that directly supports published research
- Data that must be shared as required by funding agencies, research institutions, or publishers
- Data with historical significance and long-term value
- Accompanying documentation and metadata
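Once you have selected the materials to be saved, try to use stable and accessible file formats for long-term storage (for example, open and widely supported formats such as CSV or plain text, rather than formats tied to a single software product).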
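As a rough illustration of checking file formats before deposit, the sketch below lists files in a project folder whose extensions fall outside a preferred set. The function name `flag_non_preferred` and the extension list are assumptions made for this example, not an official SFU or repository list; substitute the formats recommended by the repository you plan to use.

```python
from pathlib import Path

# Assumed, illustrative set of open or widely supported formats.
# Replace it with the guidance from your chosen repository.
PREFERRED_EXTENSIONS = {".csv", ".txt", ".json", ".xml", ".pdf", ".tif", ".tiff", ".png"}


def flag_non_preferred(root, preferred=PREFERRED_EXTENSIONS):
    """Return sorted relative paths of files under root whose extension is not preferred."""
    root = Path(root)
    flagged = []
    for path in root.rglob("*"):
        # Compare extensions case-insensitively so "notes.TXT" counts as ".txt".
        if path.is_file() and path.suffix.lower() not in preferred:
            flagged.append(path.relative_to(root).as_posix())
    return sorted(flagged)
```

Calling `flag_non_preferred("my_project")` on a hypothetical project folder returns the files that may need converting, or documenting, before they go into long-term storage. A flagged file is not necessarily a problem; the list is only a prompt to review it.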
Research projects sometimes use active data storage infrastructure for long-term storage as well. Such storage, whether locally supported (e.g., SFU OneDrive) or a commercial option (e.g., Dropbox or Google Drive), might be adequate for personal use but is typically not a dependable solution for long-term data discovery or stable access. Instead, data can be published online either in institutional repositories or in trusted external data archives and subject-specific repositories. A stable and established data repository will manage all aspects of long-term data storage for you.
If you are a student at SFU and your research data files support your thesis, consider depositing your data in Summit, SFU's institutional research repository. In other cases, researchers can publish their data in the SFU Research Data Collection in the Canadian Federated Research Data Repository (FRDR).
If publishing your data in a repository is not an option, consider using the nearline storage for inactive data offered by the Digital Research Alliance of Canada.