# 5 Reproducible science

## How do we approach open science now?

Overall impressions

After reading Lowndes et al. 2017, “Our path to better science in less time using open data science tools” (https://www.nature.com/articles/s41559-017-0160), the main impression we all had was that the paper is extremely helpful, but that the needs of true collaborations and individual projects differ. Individually we are already doing some of the approaches the authors outline from their experience with the Ocean Health Index, but certainly not all. Nice points from the paper:

• Documenting communication and decisions through Rmd files and through Issues.

• Conventions matter not just for code but also for filenames, repo structures, meetings and communication.

• Even when not working in large collaborative groups, being kind to your future self is a key point.

• Highlights the importance of incremental changes to your workflow - emphasis on evolution not revolution.

• Make changes achievable and self-paced and know that each repo is a bit better than the last.

• The efficiency of not reproducing each other's work all the time if code is clear and usable.

• We work on a lot of the same things - could shared conventions help us? What do we feel is the need for evolving our approach?

• Where do we get the latest updates on new packages and functions? Largely Twitter and coding clubs.

• Can we create a project that we can all contribute to, helping us act more like a collaborative unit (e.g. OHI)? What are the ideas for what such a project could be?

• Creating a shiny app, e.g. for mapping, that we can modify to demonstrate different visualisations (see the sketch after this list)?

• It needs to be practical, with tasks that a good proportion of people take on and that then contribute to a lab project.

• Lab workbook
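Relating to the shiny idea above, a minimal sketch of what such an app could look like is below; the dataset (R's built-in `faithful`), the input names and the plot choices are placeholders rather than anything the lab has settled on.

```r
# Minimal shiny sketch: toggle between two visualisations of a placeholder dataset.
library(shiny)
library(ggplot2)

ui <- fluidPage(
  titlePanel("Visualisation demo"),
  selectInput("plot_type", "Plot type", choices = c("Scatter", "Histogram")),
  plotOutput("main_plot")
)

server <- function(input, output, session) {
  output$main_plot <- renderPlot({
    if (input$plot_type == "Scatter") {
      ggplot(faithful, aes(waiting, eruptions)) + geom_point()
    } else {
      ggplot(faithful, aes(waiting)) + geom_histogram(bins = 20)
    }
  })
}

shinyApp(ui, server)
```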

### What could a lab book be for size ecology?

• Create an ongoing workable document with lab ethos/values

• Adding common tasks that might be useful for numerous members as we go.

• Link to website

• Bring in Mizer packages

• Visualisation where all of our work overlaps

• Mapping tasks

Some resources for approaching tidy repositories, based on our conversations (people can add to these as we go):

• Alexa Fredston has some fantastic and super accessible documentation on setting up repos and communicating with Git - really worth a read (and not too long!): https://afredston.github.io/learngit/learn-git.html#2_Setting_up_version_control

## Next session on tidy data (8/6/22)

We need to prep for the next session by:

• Reading Wilson et al. 2017, “Good enough practices in scientific computing” https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005510

• Come ready to discuss our current approach to data management

• Long-term goals: “Have clear systems for data management, storage and backup, as well as for documentation of methods and code.”

## How do we store our data and code at present?

Overall impressions

After reading Wilson et al. 2017, ‘Good enough practices in scientific computing’ (https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005510), it is clear we are doing some of these things already (as in meeting one), which is positive.

Ways we store our data:

We are backing up raw data and storing it in an unadulterated form. Where we store the raw data depends on our needs. For example, Freddie will store all raw data within the R project directory, whereas Rich will store it externally so there isn't duplication of the datasets across projects (a space constraint specific to the data being used).
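A minimal sketch of what these two conventions can look like in a script; the file names and the external directory are hypothetical, and it assumes the here package is used to build project-relative paths.

```r
# Two ways of pointing a script at raw data (hypothetical paths throughout).
library(here)  # builds paths relative to the project root

# Option 1: raw data live inside the project (Freddie's approach)
raw_in_project <- read.csv(here("data", "raw", "survey_catch.csv"))

# Option 2: raw data live in a single external location shared across projects
# (Rich's approach, avoiding duplication of large datasets)
external_data_dir <- "D:/shared_raw_data"  # machine-specific, not under version control
raw_external <- read.csv(file.path(external_data_dir, "survey_catch.csv"))
```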

Ways we share our methods:

• Through our published papers

• Through labelling in R Markdown files or commenting in R scripts

Ways we share code now:

• Zip the whole project and send to whoever wants to run it

• Sharing through GitHub, to be cloned or forked, with large raw_data files sent separately

Things we liked from the paper:

• Highlighting the importance of the ReadMe file for documenting what people need to do on accessing your code and storage

• The importance of shifting the naming convention for figures away from numbering (i.e. Fig_1, Fig_2) to a descriptive label that doesn't need to be changed when the figure order changes

• Providing a file structure that could be applied by someone who wanted to try shifting the organisational structure of their code and data (see the sketch below)
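A rough sketch of setting up such a structure from R, loosely based on the layout suggested in Wilson et al. 2017; the project name, folder names and README text are placeholders, not a lab standard.

```r
# Rough sketch of a project layout loosely following Wilson et al. 2017
# (folder and file names here are placeholders).
project <- "my_analysis"  # hypothetical project name

dirs <- file.path(project, c("data/raw", "data/tidy", "doc", "results", "R"))
lapply(dirs, dir.create, recursive = TRUE, showWarnings = FALSE)

# A README documenting what someone needs to know when they first open the repo
writeLines(
  c("# my_analysis",
    "",
    "Raw data live in data/raw and are never edited by hand.",
    "Scripts in R/ tidy the raw data into data/tidy and write outputs to results/."),
  file.path(project, "README.md")
)
```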

What is needed within the size ecology lab?

• We thought that having a repo structuring example that we can all play with (pushing, pulling, creating new branches from GitHub) could be a really good way for people to gain confidence in GitHub (a sketch of the basic workflow is below).
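As one way of practising that workflow from within R rather than the shell, here is a minimal sketch using the gert package; the repository URL, branch name and file are hypothetical.

```r
# Minimal sketch of the clone / branch / commit / push cycle using the gert package
# (the repository URL, branch name and file below are hypothetical).
library(gert)

git_clone("https://github.com/size-ecology-lab/practice-repo.git", path = "practice-repo")
setwd("practice-repo")

git_branch_create("my-test-branch", checkout = TRUE)  # work on your own branch

writeLines("A small change to practise with", "notes.txt")
git_add("notes.txt")
git_commit("Add practice notes file")

git_push()  # then open a pull request on GitHub to merge the branch
```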

Other useful thoughts that came out of the discussion

• raw_data stored outside of the project could also be stored openly somewhere like Zenodo (if it is possible to do so)

• file.exists() checks can prevent re-running data-tidying scripts when the tidied data already exist within the project (see the first sketch after this list).

• Streamlining storage for each project to one place so that data are not spread across multiple sources. Lab members need to stay abreast of the various options and the advantages of the different file storage solutions provided at UTAS (storage vs privacy vs disaster recovery).

• Can we find out how to link to our UTAS OneDrive through an API? (One possible route is sketched after this list.)

• We could also add accepted (non-formatted) versions of our manuscripts to the project/GitHub repo, which should increase open accessibility.
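A minimal sketch of the file.exists() guard mentioned above; the paths and script name are hypothetical.

```r
# Only re-run the tidying script if the tidied dataset is not already in the project
# (file paths and script name are hypothetical).
tidy_path <- "data/tidy/survey_catch_tidy.rds"

if (!file.exists(tidy_path)) {
  source("R/01_tidy_raw_data.R")  # script assumed to write tidy_path
}

catch <- readRDS(tidy_path)
```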
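On the OneDrive question, one possible route is the Microsoft365R package, which can authenticate against an institutional Microsoft 365 account; this is only a sketch with hypothetical folder and file paths, and whether it works with the UTAS setup would need checking.

```r
# Sketch of talking to OneDrive from R via the Microsoft365R package
# (untested against UTAS accounts; folder and file paths are hypothetical).
library(Microsoft365R)

od <- get_business_onedrive()  # opens a browser window to authenticate

od$list_items("size_ecology_lab")  # browse a folder
od$download_file("size_ecology_lab/raw_data.csv", dest = "data/raw/raw_data.csv")
od$upload_file("results/summary.csv", dest = "size_ecology_lab/summary.csv")
```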