ShoalBase Methods & Data Standards
Last updated: November 2025
ShoalBase is a global, community-driven database documenting social behaviour in fishes across scientific, recreational, and naturalist observations. This Methods page outlines how data are collected, cleaned, standardised, and prepared for exploration and research use.
1. Overview of the Submission Framework
ShoalBase collects structured behavioural observations submitted via the ShoalBase Data Submission Form. To ensure consistency across contributors from diverse backgrounds, the form includes questions covering the following broad categories:
Species information
Context of observation
Social behaviour categories
Evidence and attribution
Contributor details (optional)
Environmental and contextual variables
These categories align with established behavioural ecology terminology and allow flexible but standardised reporting across species and settings.
A detailed Contributor Guide is available on the Contribute page to assist observers in classifying each field.
2. Data Cleaning
All submissions undergo cleaning using a combination of manual screening and an R-based pipeline. The following steps facilitate uniformity across entries.
2.1 Species Names
Scientific names are trimmed of whitespace and standardised.
Obvious typos or formatting errors are corrected where possible.
Synonyms and outdated names may be harmonised in future updates using FishBase/GBIF.
Common names are not standardised because of regional variation; they are preserved as submitted, apart from the correction of obvious typos.
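The trimming and standardisation step can be sketched as follows. The ShoalBase pipeline itself is R-based; this Python version is only illustrative. The function name is hypothetical, and the capitalisation rule (genus capitalised, remaining epithets lower-case) is an assumption about what "standardised" means here.

```python
import re

def clean_scientific_name(raw):
    """Trim and normalise a scientific name; return None for blank input."""
    if raw is None:
        return None
    # Collapse internal runs of whitespace and strip both ends.
    name = re.sub(r"\s+", " ", raw).strip()
    if not name:
        return None
    # Assumed convention: capitalised genus, lower-case remaining epithets.
    genus, *rest = name.split(" ")
    return " ".join([genus.capitalize()] + [part.lower() for part in rest])
```

For example, clean_scientific_name("  gasterosteus   ACULEATUS ") gives "Gasterosteus aculeatus". Typo correction and synonym handling are not covered by this sketch.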
2.2 Country and Geographic Information
The field Country of observation undergoes multi-stage cleaning:
Trim whitespace and convert blank strings to NA.
Aggregate common variants:
“UK”, “United Kingdom”, “Scotland”, “Wales”, “England”, “Northern Ireland” = UK
“USA”, “United States”, “U.S.A.” = USA
Aggregate frequent spelling variants (e.g. “Brasil” = Brazil).
Use the countrycode package to convert remaining names to standard ISO country formats.
Remove unrecognisable or ambiguous entries.
This allows consistent mapping and geographic summaries.
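The stages above can be sketched as a single function. This is an illustrative Python version, not the R code used by ShoalBase. The alias table holds only the variants named in the text, and the small ISO3 dictionary is a hypothetical stand-in for the countrycode lookup; the choice of ISO 3166-1 alpha-3 codes as the output format is also an assumption.

```python
# Alias table limited to the variants named in the text; the full list is not given.
ALIASES = {
    "uk": "UK", "united kingdom": "UK", "scotland": "UK", "wales": "UK",
    "england": "UK", "northern ireland": "UK",
    "usa": "USA", "united states": "USA", "u.s.a.": "USA",
    "brasil": "Brazil",
}

# Hypothetical stand-in for the countrycode lookup (name -> ISO 3166-1 alpha-3).
ISO3 = {"uk": "GBR", "usa": "USA", "brazil": "BRA", "australia": "AUS"}

def clean_country(raw):
    """Return an ISO code, or None for blank or unrecognised input."""
    if raw is None:
        return None
    value = raw.strip()
    if not value:
        return None  # blank strings become missing values
    # Aggregate known variants before the ISO lookup.
    value = ALIASES.get(value.lower(), value)
    # Unrecognised names fall through to None, i.e. they are removed.
    return ISO3.get(value.lower())
```

Here clean_country(" Scotland ") returns "GBR", while an unrecognised entry such as "Atlantis" returns None.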
2.3 Handling Multi-Choice Categories
Some fields allow multiple selections (e.g. “shoaling, predation reduction”). These are stored as comma-separated strings and later expanded into long format during analysis on the Explore page.
Fields treated as multi-choice include:
Group size
Social system
Purpose of social behaviour
Social modifiers
Observation type
Evidence type
Habitat type
Each expanded term is filtered against a valid option list (see below) to prevent misclassification due to free-text “other” entries.
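A minimal sketch of the expansion and filtering step is shown below, in Python rather than the R used by the actual pipeline. The record layout, the field key "purpose", and the option list are all hypothetical; lower-casing terms before matching is also an assumption.

```python
# Hypothetical option list; the real vocabulary is defined in the Contributor Guide.
VALID_PURPOSES = {"shoaling", "predation reduction", "foraging"}

def expand_multichoice(records, field, valid_options):
    """Return one (record_id, term) row per valid selection (long format)."""
    rows = []
    for record in records:
        raw = record.get(field) or ""
        for term in raw.split(","):
            term = term.strip().lower()
            # Free-text "other" entries are dropped because they are not in the list.
            if term in valid_options:
                rows.append((record["id"], term))
    return rows
```

A record with "shoaling, predation reduction" yields two rows, one per term, while an unlisted free-text value yields none.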
3. Controlled Vocabularies
To maximise consistency across entries, ShoalBase uses biologically meaningful categories and widely used terms.
Categories with controlled vocabularies include:
Life stage
Observation setting
Fish origin
Group size
Social system
Purpose of social behaviour
Social modifiers
Evidence type
Observation type
Habitat type
A full list and definitions of these terms are provided in the Contributor Guide.
4. Environmental Variables
Where provided:
Temperature (°C) is converted to numeric and screened for unrealistic values and for invalid symbols or formats.
Oxygen (% air saturation) is converted to numeric and cleaned in the same way.
Additional context (e.g. salinity, depth, time of day) is preserved verbatim.
These data support ecological and physiological analyses of social behaviour along environmental gradients.
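The numeric conversion and screening of temperature could look roughly like this. It is an illustrative Python sketch, not the R code used by ShoalBase. The plausibility bounds are hypothetical, since the actual thresholds are not stated, and the accepted input formats are an assumption.

```python
import re

# Hypothetical plausibility bounds in degrees Celsius.
TEMP_MIN, TEMP_MAX = -2.0, 45.0

# A plain number (dot or comma decimal) with an optional degree sign and C suffix.
TEMP_PATTERN = re.compile(r"\s*(-?\d+(?:[.,]\d+)?)\s*(?:\u00b0\s*)?[Cc]?\s*")

def parse_temperature(raw):
    """Return the temperature as a float, or None if unparseable or implausible."""
    if raw is None:
        return None
    match = TEMP_PATTERN.fullmatch(str(raw))
    if match is None:
        return None  # invalid symbols or formats
    value = float(match.group(1).replace(",", "."))
    # Values outside the assumed bounds are treated as missing.
    return value if TEMP_MIN <= value <= TEMP_MAX else None
```

Returning None rather than raising an error keeps the rest of the record usable when only the temperature is unusable. The same pattern, with bounds suited to % air saturation, would apply to oxygen.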
5. Data Validation and Quality Assurance
Additional human/curator checks (periodic) include:
Review of ambiguous or unusual submissions.
Inspection of photo/video attachments where available.
Follow-up requests for clarification (for contributors who opted in).
Submissions are also screened for:
Impossible combinations (e.g. spawning behaviour in larvae).
Missing essential fields (species scientific name, social system).
Incorrectly formatted numeric fields.
Non-permitted values in categorical fields.
ShoalBase intentionally accepts some observational uncertainty to capture and document valuable behavioural observations.
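Two of the checks listed above, missing essential fields and impossible combinations, can be sketched as follows. This is an illustrative Python version under stated assumptions: the field keys and the rule table are hypothetical, and only the larvae/spawning example comes from the text.

```python
# Hypothetical field keys for the two essential fields named in the text.
ESSENTIAL_FIELDS = ("scientific_name", "social_system")

# Hypothetical rule table of (life stage, behaviour) pairs treated as impossible.
IMPOSSIBLE = {("larva", "spawning")}

def validate(record):
    """Return a list of issues found in a record; an empty list means none."""
    issues = []
    for field in ESSENTIAL_FIELDS:
        if not (record.get(field) or "").strip():
            issues.append(f"missing essential field: {field}")
    pair = (record.get("life_stage"), record.get("behaviour"))
    if pair in IMPOSSIBLE:
        issues.append(f"impossible combination: {pair[1]} in {pair[0]}")
    return issues
```

The function reports issues instead of rejecting the record, which leaves the final decision to a curator and fits the stated tolerance for observational uncertainty.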
6. Data Storage & Accessibility
All submissions are stored in a secure, versioned Google Sheets dataset before being processed through the Shiny dashboard.
The live app on the Explore page reads directly from the cleaned version of the dataset on each load, ensuring that updates appear immediately.
Data can be copied directly from the table on the Explore page, and a CSV file of the complete submitted dataset is available on request.
7. Data Use, Citation, and Licensing
ShoalBase aims to maintain open and responsible data sharing.
Default licence: CC-BY (users must cite ShoalBase and contributors where attributed).
Attribution: Only contributors who select “Yes, with attribution” have their names displayed publicly.
Anonymous option: Observations can be submitted without attribution or identifying information.
Each submission is assigned a unique ShoalBase Record ID in the format SB-00001, SB-00002, etc. This ID can be used to reference or retrieve specific observations in publications and correspondence.
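For users who handle Record IDs programmatically, the format can be generated and checked as sketched below. The helper names are hypothetical, and the sketch assumes exactly five zero-padded digits as in the examples above; how IDs beyond SB-99999 would be written is not stated.

```python
import re

# Assumes exactly five digits after the "SB-" prefix, as in SB-00001.
RECORD_ID = re.compile(r"SB-(\d{5})")

def format_record_id(number):
    """Build a Record ID string from a sequence number."""
    return f"SB-{number:05d}"

def parse_record_id(text):
    """Return the sequence number, or None if the text is not a valid ID."""
    match = RECORD_ID.fullmatch(text.strip())
    return int(match.group(1)) if match else None
```

For example, format_record_id(1) gives "SB-00001", and parse_record_id("SB-00042") gives 42.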
How to cite ShoalBase data:
If you are using ShoalBase data in research or publications, please cite the project as:
Killen, SS., Cortese, D., Munson, A., Kochhann, D., Pineda, M., Tiddy, IC. (2025). ShoalBase: Global community database of fish social behaviour. World Wide Web electronic publication. www.shoalbase.org.
To cite a specific contributor’s observation, in addition to citing ShoalBase as described above, please reference it as:
Contributor Name (Year). Species. ShoalBase Record ID: SB-XXXXX.
This ensures proper credit for community contributors while maintaining consistent dataset citation practices.
Importantly, for any observation derived from a published source, please also cite the original source.
8. Planned Updates
Future development will include:
Additional data visualisation on the Explore page, as the dataset grows
Taxonomic harmonisation using FishBase/GBIF
Automated anomaly detection
9. Contact
For methodological questions, suggestions, potential collaborations, or clarification regarding specific entries, please contact the ShoalBase coordination team via the Contact page.