Upcoming workshop series on using data science for ecological applications

Contact Drs. Sarah Goslee and Bogdan Caradima if you have questions or would like to sign up!

Drs. Sarah Goslee and Bogdan Caradima (PSU) will lead monthly virtual sessions in which guest speakers demonstrate common tools and methods in data science relevant to the ecological and biological sciences. Individuals at all levels of experience are welcome. The series will begin early this spring and continue indefinitely, as interest, time, and resources allow. For more information, contact Dr. Caradima (bbc5423@psu.edu).

Format
Every month, a speaker will host and deliver a short presentation or live demonstration (~15-20 min) on a topic of their choice, followed by a group discussion and Q&A period (~25 min). After each workshop, the speaker may opt to provide the audience with further learning materials, including reference sheets, links to online resources, and scripts with reproducible examples (reprexes). Drs. Goslee and Caradima will lead the first two sessions (topics and dates to be announced).

Potential Topics
Topics will generally be aimed at a beginner or intermediate level and will depend on audience interest. Speakers will focus on readily transferable tools, methods, and best practices for reproducible data analysis and modelling workflows, such as:
    • Setting up a project workspace to facilitate good data management
    • Selected topics on scripting with Python and R:
        ◦ writing clean, robust, and efficient code
        ◦ key packages for cleaning, manipulating, and visualizing data (e.g., tidyverse)
        ◦ how to use powerful features in IDEs such as RStudio
    • Using a terminal with common Bash commands
    • Regular expressions (regexes) for querying and processing text patterns
    • Troubleshooting software issues that can occur in data science workflows (e.g., package management, finding similar problems resolved online, preparing reprexes to post on GitHub or Stack Overflow)
    • Introduction to Linux, its major distributions, and using Linux for daily computing
    • Version control software (e.g., git) and online hosting platforms (e.g., GitHub)

More advanced topics may follow:
    • Instruction for Linux power users: customizing command-line programs with dotfiles, creating Bash aliases, and using common terminal tools (e.g., nano, vim, ssh, htop, screen, tmux)
    • Using containers (Docker, Singularity) for fully reproducible scientific computing environments
    • A working guide to high-performance computing (e.g., parallelization, remote server administration)
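
As a small taste of two of the beginner topics listed above (setting up a project workspace and using regular expressions), here is a minimal Python sketch. It is only an illustration of the kind of reprex a speaker might share, not material from an actual session. The folder layout, the file-naming scheme, and the function names are all hypothetical and chosen for this example.

```python
from __future__ import annotations

import re
import tempfile
from pathlib import Path

# Hypothetical folder layout; the names are illustrative, not a prescribed standard.
SUBFOLDERS = ["data/raw", "data/processed", "scripts", "outputs/figures", "docs"]


def create_workspace(root: Path) -> list[Path]:
    """Create the project subfolders under `root` and return their paths."""
    created = []
    for name in SUBFOLDERS:
        folder = root / name
        # parents=True builds intermediate folders; exist_ok=True makes reruns safe.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


# Hypothetical file-naming scheme: <site>_<YYYY-MM-DD>_plot<number>.csv
FILENAME_PATTERN = re.compile(
    r"^(?P<site>[A-Za-z]+)_(?P<date>\d{4}-\d{2}-\d{2})_plot(?P<plot>\d+)\.csv$"
)


def parse_filename(filename: str) -> dict[str, str] | None:
    """Return site, date, and plot number from a file name, or None if it does not match."""
    match = FILENAME_PATTERN.match(filename)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # Build the workspace in a temporary directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        for folder in create_workspace(Path(tmp) / "my_project"):
            print(folder.relative_to(tmp))

    print(parse_filename("meadow_2021-06-15_plot3.csv"))
    print(parse_filename("notes.txt"))
```

Keeping raw data in its own folder and encoding metadata in consistent file names are common conventions rather than fixed rules; the regular expression only works because the assumed naming scheme is applied consistently.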