Versioning your research data

Research projects commonly accumulate many copies and versions of data files. A consistent method of naming file versions is a common way to tell them apart, and dedicated file versioning software can manage this automatically. The primary goal of versioning is to keep raw data organized during the collection phase and distinct from cleaned and transformed data.

In addition to keeping track of file versions, you should also structure your research data files by organizing and naming them in a consistent way.

Best practices for file version management

The practices below are general recommendations; you and your research team should decide which methods work in your research context. Whichever method you choose, apply it consistently, even for less significant changes.

Avoid descriptive version labels

Avoid using descriptive labels for versions (final, draft, revision, etc.), which can make it difficult to interpret file version chronology.

Use ordinal numbers

Consider using zero-filled ordinal numbers (e.g., 01, 02, 03) to identify significant version changes. In practice, this might look like dataFile_v01, dataFile_v02.
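The short Python sketch below illustrates why zero-filling helps: file listings usually sort names as text, so unpadded numbers fall out of order once a project reaches ten versions. The function name versioned_name and the base name dataFile are illustrative assumptions, not part of any particular tool.

```python
def versioned_name(base: str, version: int, width: int = 2) -> str:
    """Return a name such as dataFile_v01 using a zero-filled version number.

    Hypothetical helper for illustration only.
    """
    if version < 1:
        raise ValueError("version numbers start at 1")
    return f"{base}_v{version:0{width}d}"


# Text sorting puts v10 before v2 when the numbers are not zero-filled.
unpadded = sorted(f"dataFile_v{n}" for n in (1, 2, 10))
padded = sorted(versioned_name("dataFile", n) for n in (1, 2, 10))

print(unpadded)  # ['dataFile_v1', 'dataFile_v10', 'dataFile_v2']
print(padded)    # ['dataFile_v01', 'dataFile_v02', 'dataFile_v10']
```

A width of two digits covers up to 99 significant versions; a project that expects more could pass a larger width from the start.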

Use underscores

Use an underscore followed by a second zero-filled number to denote less significant changes (e.g., naming successive versions dataFile_v01_01, dataFile_v01_02).

Do not use decimal points to denote smaller changes, as this can cause errors with some software. See the Structure your research data page for more information about file naming.
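As a rough sketch, the naming scheme described in this section can be applied mechanically. The Python example below assumes names without a file extension, two-digit numbers, and the dataFile_v01 / dataFile_v01_01 pattern shown above; the names VERSION_PATTERN and next_version are hypothetical and chosen only for this illustration.

```python
import re

# Assumed pattern: <base>_vNN for significant versions, <base>_vNN_MM for
# smaller changes. Underscores are used instead of decimal points.
VERSION_PATTERN = re.compile(
    r"^(?P<base>.+)_v(?P<major>\d{2})(?:_(?P<minor>\d{2}))?$"
)


def next_version(name: str, significant: bool) -> str:
    """Return the next version name for a significant or a smaller change."""
    match = VERSION_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a versioned name: {name!r}")
    base = match["base"]
    major = int(match["major"])
    minor = int(match["minor"] or 0)  # no minor part means no small changes yet
    if significant:
        # A significant change starts a new major version with no minor part.
        return f"{base}_v{major + 1:02d}"
    return f"{base}_v{major:02d}_{minor + 1:02d}"


print(next_version("dataFile_v01", significant=False))     # dataFile_v01_01
print(next_version("dataFile_v01_01", significant=False))  # dataFile_v01_02
print(next_version("dataFile_v01_02", significant=True))   # dataFile_v02
```

Rejecting names that do not match the pattern, rather than guessing, keeps descriptive labels such as "final" from slipping into the sequence.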

Use software that automates versioning

Where available, use software or services that automate versioning so that you don't have to manage the file naming conventions described above by hand:

Wikis have a page history feature and Google Docs has a version history feature, allowing users to easily restore previous versions of their documents.

The Open Science Framework (OSF) online research project management system has file versioning functionality tailored for research data use.

The free and open-source Git version control system can be installed on your local computer, works without network access, and provides complete version tracking for files.

Additional resources

Version control and authenticity - UK Data Service
Information about version control and data file authenticity best practices.