Why create a Data Management Plan?
It is beneficial to all researchers to manage data so that it remains understandable and accessible over time. Fundamentally, data is anything that forms the foundation of scholarly publication. Data is not necessarily numeric and can take the form of images, oral interviews, tabular data, music—anything you produce which your published scholarship draws on.
Good data management benefits researchers in several ways:
- contextualizes your research and assists in recognizing any potential gaps/roadblocks before you start
- improves access to or re-use of data
- increases visibility of faculty research
- ensures data integrity.
While Canada does not currently require grant recipients to curate or actively manage their research data, this is beginning to change. Agencies such as the Canadian Institutes of Health Research (CIHR) and the Social Sciences and Humanities Research Council (SSHRC) now require grant recipients to meet a number of requirements supporting open access to published materials, including data sets, and other funders such as the Natural Sciences and Engineering Research Council of Canada (NSERC) are looking to incorporate data management plans into their funding requirements.
Data Management Plan elements
The following provides broadly applicable elements to consider and include when drafting your data management plan. The SFU Library's Data Management Plan template is also available for you to download and fill in.
1. Data products
When drafting your data management plan, start by describing the type of data you expect to produce/collect/use in either digital or physical formats.
Consider the scope and nature of your data and how you intend to collect it. Is it observational data, survey data, or administrative data? Audiovisual recordings? Laboratory notebooks? Imaging output?
2. Data sharing
Describe whether you intend to share your data or indicate if there are restrictions on use or re-use.
Indicate where you will deposit your data. Will it be deposited in Radar, the SFU Library's Research Data Repository, or in a discipline-specific repository?
3. Data description
How well you describe your data will largely determine how usable it remains over time. Metadata helps ensure long-term preservation and discoverability.
Metadata elements tend to fall into one of three categories: descriptive, administrative and structural.
Descriptive metadata: describes the object or data and gives the basic facts: who created it (i.e. authorship), title, keywords, and abstract.
This is usually the easiest and most immediately logical metadata to maintain; if your research software were iTunes (which it hopefully isn't), you'd likely already have some idea as to how to maintain authorship details. If your software wants to help you do this, and its export functionality makes it possible to extract this metadata at some later date, let it help -- standards are much more effective when they're reinforced by the tools you're working with.
Administrative metadata: includes information about the management of the object and may include information about: preservation and rights management, creation date, copyright permissions, required software, provenance (history), and file integrity checks.
Structural metadata: describes the structure of an object including its components and how they are related. It also describes the format, process, and inter-relatedness of objects. It can be used to facilitate navigation, or define the format or sequence of complex objects. This will often but not always be handled and documented automatically by the software you're using to create your data (be it tabular data or an audio recording), but it's still helpful to have some idea of how to access this information.
Metadata standards: vary from discipline to discipline to reflect the specific requirements needed to accurately describe the data being produced. Choose a metadata standard that is consistent with disciplinary requirements or with the type of data you will be generating/collecting.
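Before settling on a formal standard, it can help to see the three categories above side by side. The following is a minimal sketch in Python, not an implementation of DDI or any other standard: the function names (describe_file, write_sidecar), the field names, and the ".metadata.json" sidecar convention are all hypothetical choices made for illustration. It records a SHA-256 checksum as one possible file integrity check.

```python
import hashlib
import json
from datetime import date
from pathlib import Path


def describe_file(path, title, creator, keywords, licence):
    """Build a metadata record for one data file, grouped by category.

    Field names are illustrative only; they do not follow a formal standard.
    The whole file is read into memory, which is fine for small files.
    """
    data = Path(path).read_bytes()
    return {
        "descriptive": {
            "title": title,
            "creator": creator,
            "keywords": keywords,
        },
        "administrative": {
            "date_described": date.today().isoformat(),
            "licence": licence,
            # Checksum used later to confirm the file has not changed.
            "sha256": hashlib.sha256(data).hexdigest(),
        },
        "structural": {
            "file_name": Path(path).name,
            "format": Path(path).suffix.lstrip(".").lower(),
            "size_bytes": len(data),
        },
    }


def write_sidecar(path, record):
    """Save the record next to the data file as <name>.metadata.json."""
    sidecar = Path(str(path) + ".metadata.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

Calling describe_file("survey_responses.csv", "Survey responses", "A. Researcher", ["survey"], "CC BY 4.0") and passing the result to write_sidecar would leave a small JSON file beside the data. Recomputing the checksum at a later date and comparing it with the stored value is one way to confirm the file is unchanged.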
Radar, SFU's Research Data Repository, uses the Data Documentation Initiative (DDI) metadata standard. DDI is an effort to create an international standard for describing data from the social, behavioral, and economic sciences. Expressed in XML, the DDI metadata specification now supports the entire research data lifecycle. DDI metadata accompanies and enables data conceptualization, collection, processing, distribution, discovery, analysis, repurposing and archiving.
Cornell University has a series of guides to assist in writing metadata and associated 'Read Me' files.
4. Data organization
File Formats: See Which file formats should I use when working with data?
Storage: File and folder naming conventions are not the be-all and end-all of good organizational practice (no matter how many hyphens you put in there, they're still functionally only one metadata field), but they are important to pay attention to. Be consistent, avoid spaces, and don't try to duplicate any functionality that's already performed by the software you're working with (e.g. versioning).
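As one way to apply the "be consistent, avoid spaces" advice, here is a minimal Python sketch of a file name cleaner. The function name safe_name and the specific rules (underscores for spaces, only letters, digits, underscores and hyphens kept, lowercase extension) are assumptions chosen for illustration; your project or discipline may call for a different convention.

```python
import re


def safe_name(name):
    """Return a file name with no spaces and a lowercase extension.

    Whitespace runs become a single underscore; characters other than
    letters, digits, underscores and hyphens are dropped from the stem.
    """
    stem, dot, ext = name.rpartition(".")
    if not dot:  # no extension present
        stem, ext = name, ""
    stem = re.sub(r"\s+", "_", stem.strip())
    stem = re.sub(r"[^A-Za-z0-9_-]", "", stem)
    return f"{stem}.{ext.lower()}" if ext else stem


print(safe_name("Site 4 water samples.CSV"))  # Site_4_water_samples.csv
```

Applying the same function to every file when it is first saved keeps names predictable without having to remember the rules by hand.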
5. Intellectual property
Who will be responsible for determining ownership and licensing of your data? This is important for inter-university and other collaborations, when default assumptions about intellectual property may be difficult to reconcile with one another. Even if you plan to freely release and share your data, these decisions must be made by the original rights holders. This is also a good time to think about any overarching legal requirements which may preclude the reuse of your data.
6. Ethics and privacy
Are there legal constraints preventing you from sharing your data? How will you ensure confidentiality of subjects? Redistributing human subjects data can be very difficult. Anonymization is a given, but simply removing individuals' names from a dataset may not be sufficient, depending on the presence of other demographic information. We encourage you to discuss any ambiguities with SFU's Office of Research Ethics.
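To illustrate why removing names may not be enough, the following Python sketch counts how many records share each combination of demographic values. It is a simple check in the spirit of k-anonymity, not a complete anonymization method and not a substitute for ethics review. The function name small_groups, the field names, and the sample records are all hypothetical.

```python
from collections import Counter


def small_groups(records, quasi_identifiers, k=5):
    """Return value combinations shared by fewer than k records.

    A combination held by very few people can single them out even
    when names have already been removed.
    """
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Made-up records with names already removed.
records = [
    {"age_band": "30-39", "postal_prefix": "V5A", "occupation": "nurse"},
    {"age_band": "30-39", "postal_prefix": "V5A", "occupation": "nurse"},
    {"age_band": "60-69", "postal_prefix": "V5A", "occupation": "judge"},
]

risky = small_groups(records, ["age_band", "postal_prefix", "occupation"], k=2)
print(risky)  # {('60-69', 'V5A', 'judge'): 1}
```

Here the third record is unique on its combination of age band, postal prefix, and occupation, so it could potentially be re-identified. Coarsening or dropping one of those fields before sharing is a common response.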
It likely comes as no surprise that, in the absence of funder mandates, most data goes unshared largely because data publication is not a terribly exciting budget item, and because at the end of a research project there is little will left to make data available in a form that's perceived as "not embarrassingly messy" or to jump through the deposit hoops of a given repository. We're doing our best to lessen the pain of jumping through those hoops, but it's also important that you reserve some funding dollars for data publication, the same as you might reserve open access publication fees.
The Portage Network, a Canada-wide association providing research data management services, provides an online tool called the DMP (Data Management Plan) Assistant for preparing Data Management Plans, available from https://portagenetwork.ca/.
We've also prepared a sample Data Management Plan template which may be downloaded and adapted.