Implementing strategies to organize and name your data files coherently will help make research data more accessible and meaningful, both now and into the future. These strategies include using logical and consistent folder structures, predictable systems for file naming, and distinguishing different versions of data files from one another. Identifying how your data files will be organized is an important component of planning for data management.
File and folder structure and naming
There are three broad criteria to consider when structuring folders and naming research data files:
- Organisation of data file folders is important for future access and retrieval, and needs to take into account the naming constraints of the system where the file is located
- Context could include content-specific or descriptive information, independent of where the data are stored
- Consistency includes selecting a naming convention and making sure that the rules are followed systematically by always including the same information (such as date and time) in the same order (e.g. YYYYMMDD)
When deciding what information to record in the file name, there are several common elements to consider, including:
- Creation date (the date and/or time when the file was created; if the file name begins with the YYYYMMDD format, most file systems that automatically sort alphanumerically will order your files chronologically)
- Creator (the author of the file)
- Description (a brief, expository word or phrase about the file and/or its context)
- Research team (details of the research team involved in the file's creation)
- Project name or number (project-specific details to give additional research context to the file)
- Version (often a number-based record of the file's revision chronology)
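The point about date prefixes can be checked with a short Python sketch. The file names below are hypothetical and exist only for illustration, and the plain `sorted()` call stands in for the alphanumeric ordering that many file browsers apply; actual sorting behaviour depends on the system.

```python
# Hypothetical file names, invented for this illustration only.
iso_names = ["20230518_RawData.csv", "20221103_RawData.csv", "20230102_RawData.csv"]
dmy_names = ["18052023_RawData.csv", "03112022_RawData.csv", "02012023_RawData.csv"]

# A plain alphanumeric sort, used here as a stand-in for a file browser's ordering.
sorted_iso = sorted(iso_names)
sorted_dmy = sorted(dmy_names)

print(sorted_iso)  # YYYYMMDD prefixes come out in chronological order
print(sorted_dmy)  # DDMMYYYY prefixes do not: 2 Jan 2023 sorts before 3 Nov 2022
```

Because the year comes first, then the month, then the day, comparing the names character by character gives the same result as comparing the dates themselves.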
When creating file names, do not use spaces, as they can cause errors in some software. Instead, use camel case (e.g. RawData.txt) or underscores (e.g. raw_data.txt). Combining some of the elements from the list above, a file name could look like 20230518_LastName_RawData_CBRF_v02.csv.
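If file names are generated by a script, the convention can be applied the same way every time. The following is a minimal sketch, not a prescribed tool: the function `build_file_name` and its parameters are hypothetical, and treating CBRF as a project code is an assumption made only to reproduce the example name above.

```python
import re
from datetime import date


def build_file_name(created, creator, description, project, version, extension):
    """Join the elements with underscores: date, creator, description, project, version."""
    parts = [
        created.strftime("%Y%m%d"),  # YYYYMMDD, so names sort chronologically
        creator,
        description,
        project,
        f"v{version:02d}",  # zero-filled version number, e.g. v02
    ]
    for part in parts:
        # Allow only letters and digits inside each element, which rules out spaces.
        if not re.fullmatch(r"[A-Za-z0-9]+", part):
            raise ValueError(f"Element {part!r} contains a space or special character")
    return "_".join(parts) + "." + extension


name = build_file_name(date(2023, 5, 18), "LastName", "RawData", "CBRF", 2, "csv")
print(name)  # 20230518_LastName_RawData_CBRF_v02.csv
```

Rejecting unexpected characters up front, rather than silently replacing them, is one possible design choice; it keeps the person naming the file aware of the convention.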
Managing versions of your files
Research projects commonly need to store a sequence of versions of data files. The primary goal of versioning is to keep raw data organized during the collection phase and distinct from the sequence of cleaned and transformed data files.
There are two primary ways to consistently track file versions: using file names or using software to automate versioning.
Here are some recommendations when using file names to track versions:
- Avoid descriptive labels for versions ('final', 'draft', 'revision', etc.), which can make it difficult to interpret file version chronology.
- Consider using zero-filled ordinal numbers (e.g., 01, 02, 03) to identify significant version changes (in practice, this might look like dataFile_v01, dataFile_v02).
- Use underscores to identify less significant changes (e.g., mark smaller changes by naming successive versions as dataFile_v01_01, dataFile_v01_02).
- Avoid using decimal points to denote smaller changes, as this can cause errors with some software.
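The numbering scheme in these recommendations can be sketched in Python. This is an illustration under assumptions rather than an established tool: the helper `next_version` and the pattern it uses are hypothetical, names are handled without their file extension, and both counters are assumed to stay at two digits (up to 99).

```python
import re

# Matches names such as dataFile_v01 or dataFile_v01_02 (no file extension).
VERSION_PATTERN = re.compile(r"^(?P<stem>.+)_v(?P<major>\d{2})(?:_(?P<minor>\d{2}))?$")


def next_version(name, significant):
    """Return the next version name for a significant or a smaller change."""
    match = VERSION_PATTERN.match(name)
    if match is None:
        raise ValueError(f"{name!r} does not follow the _vNN or _vNN_NN pattern")
    major = int(match["major"])
    minor = int(match["minor"] or 0)  # no minor part means no smaller changes yet
    if significant:
        # A significant change starts a new zero-filled major version.
        return f"{match['stem']}_v{major + 1:02d}"
    # A smaller change increments the part after the second underscore.
    return f"{match['stem']}_v{major:02d}_{minor + 1:02d}"


print(next_version("dataFile_v01", significant=True))      # dataFile_v02
print(next_version("dataFile_v01", significant=False))     # dataFile_v01_01
print(next_version("dataFile_v01_01", significant=False))  # dataFile_v01_02
```

Zero-filling keeps the names in order when sorted alphanumerically: v02 sorts before v10, whereas v2 would sort after it.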
If you would rather not track versions manually with file naming conventions, you can use software or services that automate versioning. Here are some solutions that might work for you:
- Both OneDrive (including SharePoint) and Google Docs have a version history feature, allowing you to easily view or restore previous versions of files.
- The Open Science Framework (OSF) research project management system has file versioning functionality for the associated online storage.
- The free and open-source Git software system can be installed on your local computer independently of network access, and provides complete version-tracking abilities for files.
Organising data - UK Data Service
File name suggestions and well-organised folder structures for finding and keeping track of data
File naming and folder hierarchy - MIT Libraries
Guidelines for naming and organizing research data
Version control and authenticity - UK Data Service
Information about version control and data file authenticity best practices