Data management FAQs: Research data management services

What is research data?

Research data is "That which is collected, observed, or created in a digital form, for purposes of analysing to produce original research results" (University of Edinburgh).

Research data is anything collected over the course of academic work that underlies published scholarship such as journal articles, conference presentations, exhibitions or performances.

What research data management training, support, and guidance exists?

In addition to resources offered by the SFU Library, the following are excellent starting points:

  • The Digital Curation Centre: pamphlets and general-purpose documentation on the topic of digital curation (which often means research data curation)
  • MANTRA Research Data Management Training: a free online program
  • Inter-university Consortium for Political and Social Research (ICPSR): Guide to Social Science Data Preparation and Archiving

What are funders' requirements?

In June 2016 the Tri-Agencies released their "Tri-Agency Statement of Principles on Digital Data Management," which outlines the importance of good data management practices and signals an intention to develop new data management requirements. While the Tri-Agencies underscore that "research data collected with the use of public funds belong, to the fullest extent possible, in the public domain and available for reuse by others," they have not yet made these requirements mandatory.

Currently, the Social Sciences and Humanities Research Council (SSHRC) requires that "All research data collected with the use of SSHRC funds must be preserved and made available for use by others within a reasonable period of time." SSHRC considers "a reasonable period" to be within two years of the completion of the research project for which the data was collected.

The Canadian Institutes of Health Research similarly requires the deposit of non-sensitive data within 12 months of publication.

Other funders (or publications) may have their own requirements for data deposit.

Are all written records and computer files in a research lab considered research data?

No.

Only “recorded information necessary for the reconstruction and evaluation of reported results of research and the events and processes leading to those results, regardless of the form or the media on which it may be recorded” is considered research data. In other words, you probably don't need to worry about one-time procedural information, such as schedules and contact details, when you're planning the long-term preservation of your research.

How do I cite my (or someone else's) data?

DataCite recommends a fairly simple style for citing data, similar to APA format:

  • Creator (PublicationYear): Title. Version. Publisher. ResourceType [optional]. Identifier [e.g. DOI]

SFU's Research Data Repository, Radar, automatically generates a citation in this format from the metadata stored for a given dataset or research object, and displays it on the item's landing page.
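If you need to assemble a citation yourself, the pattern above can be sketched in a few lines of Python. This is only an illustration: the function name `format_datacite_citation`, its parameters, and the sample values (including the DOI) are hypothetical, and it is not how Radar itself builds citations.

```python
def format_datacite_citation(creator, year, title, version, publisher,
                             identifier, resource_type=None):
    """Build a string following the pattern
    Creator (PublicationYear): Title. Version. Publisher. ResourceType. Identifier
    where ResourceType is optional."""
    parts = [f"{creator} ({year}): {title}.", f"{version}.", f"{publisher}."]
    if resource_type:
        # ResourceType is the only element the pattern marks as optional.
        parts.append(f"{resource_type}.")
    parts.append(identifier)
    return " ".join(parts)


if __name__ == "__main__":
    # All values below are made up for illustration.
    print(format_datacite_citation(
        creator="Doe, Jane",
        year=2016,
        title="Example Survey Responses",
        version="1.0",
        publisher="Example Repository",
        identifier="https://doi.org/10.0000/example",
        resource_type="Dataset",
    ))
```

Keeping the identifier last and unpunctuated means a DOI can be copied from the citation without a trailing period.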

Am I required to deposit my data?

Currently, there is no requirement for researchers to deposit their data. However, for data integrity and long-term preservation, depositing is highly recommended.

Is there a rights transfer of any kind involved in depositing data with SFU Library?

Absolutely not. You retain full ownership of your data and can remove it from the Library's repository (or make it private) at any time.

Which file formats should I use when working with data?

The Library makes an effort to preserve files at the "bit-level" (e.g. backing up the file and ensuring it does not become corrupted) no matter what format they are submitted in.
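One common way to confirm that a file has not become corrupted is to record a checksum when the file is stored and compare it again later. The sketch below shows that general technique with Python's standard library; the function names are hypothetical, and it does not describe the Library's actual preservation process.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file as a hex string,
    reading in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, recorded_checksum):
    """Compare the file's current checksum with one recorded earlier."""
    return sha256_of_file(path) == recorded_checksum


if __name__ == "__main__":
    # Demonstration with a throwaway file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"sample research data")
    recorded = sha256_of_file(tmp.name)
    print(is_unchanged(tmp.name, recorded))
    os.remove(tmp.name)
```

You can use the same approach on your own copies: keep the recorded checksum alongside the file and re-check it after each transfer or backup.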

The chart below, however, indicates the Library's level of confidence in being able to preserve files so that they remain usable through software upgrades and changes in the computing milieu.

File formats with the following characteristics will more likely be able to retain their functionality over time:
  • Complete and open documentation
  • Platform independence, non-proprietary (e.g. not Windows-only or Mac-only)
  • Minimal embedded content: for example, we'd prefer to receive video and images as separate files rather than embedded in documents.
  • No password protection on the files themselves; SFU's Research Data Repository Radar takes care of private permissions
  • Wide adoption -- common programs and formats are better

 

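Format: Best, Good, Poor (within each category below, formats run from the Library's highest preservation confidence to its lowest)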
Text
  • Plain text (.txt) -- ASCII or Unicode (Notepad, TextEdit, gedit)
  • XML with included schema (.xml)
  • PDF/A-1 (.pdf) -- (See options in your PDF creator to ensure PDF/A-1 format)
  • Markdown (.md)
  • Plain text (.txt) -- non-ASCII or Unicode
  • Rich Text Format 1.x (.rtf)
  • Cascading Style Sheets (.css)
  • HTML (.html, .htm)
  • LaTeX with referenced files (.latex)
  • OpenOffice (.odt, .sxw)
  • PDF with fonts embedded (.pdf)
  • Microsoft Word (.doc, .docx)
  • all others

Non-vector Images

  • TIFF -- uncompressed (.tiff)
  • PNG (.png)
  • TIFF -- compressed (.tiff)
  • JPEG (.jpg)
  • GIF (.gif)
  • BMP (.bmp)
  • Photoshop (.psd)
  • RAW files
  • all others

Vector Graphics

  • SVG (.svg)
  • Computer Graphics Metafile (.cgm)
  • Encapsulated Postscript (.eps)
  • Macromedia Flash (.swf)
  • all others
Audio
  • AIFF -- PCM (.aif, .aiff)
  • WAV -- PCM (.wav)
  • Note -- neither of these is a compressed audio format. For very large audio files, or audio that is already compressed, we advise using one of the "Good" (medium confidence) formats.
  • Standard MIDI (.mid)
  • Ogg Vorbis (.ogg)
  • Free Lossless Audio Codec (.flac)
  • MP3 (.mp3)
  • MPEG-4 without DRM (.mp4, .aac, .m4a)
  • AIFC -- compressed AIFF (.aifc)
  • RealAudio (.rm, .ra)
  • Windows Media Audio (.wma)
  • all others
Video
  • MPEG-4 (.mp4)
  • MPEG-1, MPEG-2 (.mp1, .mp2)
  • QuickTime (.mov)
  • Matroska (.mkv)
  • Ogg Theora (.ogv, .ogg)
  • Windows Media Video (.wmv)
  • AVI (.avi)
  • RealVideo (.rm, .rv)
  • all others

Spreadsheet or Database

  • Comma-separated Values (.csv)
  • MySQL database backup
  • Other delimited text (please convert to CSV if possible)
  • dBASE (.dbf)
  • OpenOffice (.ods)
  • Excel (.xls, .xlsx)
  • all others
Computer Programs  
  • Computer program source code (unmanaged)
  • Managed source code (e.g. Visual Studio)
  • Compiled / Executable files
Presentation  
  • OpenOffice (.odp)
  • PDF (.pdf)
  • PowerPoint (.ppt, .pptx)
  • all others
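
The table above asks that other delimited text be converted to CSV where possible. As a rough illustration, the following Python sketch converts tab-delimited text to CSV using the standard `csv` module. The function name `delimited_to_csv` is hypothetical, and the tab delimiter is an assumption; adjust it to match your own files.

```python
import csv
import io


def delimited_to_csv(text, delimiter="\t"):
    """Convert delimited text (tab-separated by default) to CSV text.
    Fields that contain commas are quoted automatically by csv.writer."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    output = io.StringIO(newline="")
    writer = csv.writer(output, lineterminator="\n")
    writer.writerows(reader)
    return output.getvalue()


if __name__ == "__main__":
    sample = "site\tnotes\nA1\tcold, windy\n"
    print(delimited_to_csv(sample), end="")
```

Using the `csv` module rather than a plain find-and-replace matters when a field already contains a comma, since the writer adds the quoting that keeps the columns aligned.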

What is metadata and how do I use it?

Metadata comprises the descriptive tags you assign to your data to help make it accessible over time. Metadata can be descriptive, structural or administrative in nature (see Data Management Planning for further information). Common metadata elements, regardless of discipline, will include fields like:

  • Title of data set: References the 'what, where, when, who and scale' of your data
  • Creator: Usually the author (or PI) of the data set; give the complete name, e.g. "William Wadsworth Longfellow"
  • Subject: Include standardized disciplinary terminology
  • Description: This showcases the 'how and why' of your data and includes how the data was collected along with any other relevant information
  • Abstract: Can be taken from grant proposal or project reports. You can use the abstract to articulate how the dataset will meet a need
  • Contributor: Include here any other individuals contributing to the data
  • Date: Usually the time frame the data was collected and/or the date the data was deposited. Use YYYY-MM-DD format
  • Type: Refers to the nature of the data such as image, text, software, spreadsheet
  • License: Indicates who can access your data and under what circumstances

Adapted from Johnston, Lisa R., and Jeffryes, Jon. (2015). Data Management Workshop Series, Winter 2015. University of Minnesota Libraries. http://z.umn.edu/datamgmt15
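
To make the fields listed above concrete, here is a small Python sketch of a metadata record with two simple checks: which common fields are still empty, and whether the date uses YYYY-MM-DD format. The field keys, helper names, and sample values are all hypothetical and do not reflect any particular repository's schema.

```python
import json
import re
from datetime import date

# Keys mirror the common metadata elements listed above (names are illustrative).
COMMON_FIELDS = ("title", "creator", "subject", "description", "abstract",
                 "contributor", "date", "type", "license")


def missing_fields(record):
    """Return the common fields that are absent or empty in a record."""
    return [field for field in COMMON_FIELDS if not record.get(field)]


def is_iso_date(value):
    """True only for a real calendar date written as YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    # Sample values are made up for illustration.
    record = {
        "title": "Stream temperature readings, Example Creek, 2014-2015, hourly",
        "creator": "William Wadsworth Longfellow",
        "subject": "Hydrology",
        "description": "Hourly readings collected with in-stream loggers.",
        "date": "2015-06-30",
        "type": "spreadsheet",
    }
    print(json.dumps(record, indent=2))
    print("Still to fill in:", missing_fields(record))
    print("Date format OK:", is_iso_date(record["date"]))
```

Keeping a record like this in a plain-text file next to your data makes it easier to fill in a repository's deposit form later.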