2018: Biological Data Analysis: The Right Way

CBIOS Boot Camp

Course Directors: Drs. Istvan Albert, Naomi Altman, Anton Nekrutenko and Cooduvalli Shashikant

“Biological Data Analysis: The Right Way” was a five-day boot camp sponsored by the NIH-funded Computation, Bioinformatics and Statistics (CBIOS) Predoctoral Training Program. The boot camp was open to post-candidacy graduate students, postdoctoral researchers, staff and faculty members. A description of the workshop follows:

There are growing concerns about the ability of our scientific community to generate rigorous, transparently documented research that can be meaningfully reproduced and validated by different laboratories. The goal of the boot camp is to give researchers a fuller appreciation of the issues contributing to the ‘reproducibility crisis’, greater familiarity with computational and statistical techniques, and exposure to innovative tools specifically devised to improve transparency and reproducibility.

The boot camp will have three modules, each coordinated by a CBIOS training faculty member, as well as a series of talks and ‘hands-on’ exercises. The first module, Computational Statistics and Simulation, instructed by Dr. Altman, will teach simulation-based statistical methods critical for rigorous and reproducible research. The participants will learn key concepts for comparing statistical methods and will learn to design realistic simulation studies that mimic important features of the complex data sets they will encounter in their research, as an aid for discovering systematic signals and patterns.

The second module, Python/Software Carpentry, instructed by Dr. Albert, will focus on the computing and programming aspects of analyzing ‘omic’ data. Most scientists are never taught how to build, use, validate and share software well. As a result, many spend their effort doing things inefficiently. In this module, the participants will be introduced to the principles of efficient data analysis, computer programming and software engineering, demonstrated via the Python programming language. The goal of the module is to help participants spend less time wrestling with software and more time doing useful research.

The third module, A Computational Platform for Transparent and Reproducible Research, instructed by Dr. Nekrutenko, will train participants in the use of Galaxy, an innovative data integration and analysis platform for biomedical research. This module will provide participants with core bioinformatics skills, and experience with a widely deployed platform designed to maximize transparency and reproducibility of research.

Additionally, there will be talks by Drs. James Broach, Ross Hardison, Qunhua Li, George Perry and others on issues pertaining to reproducible research in diverse biological contexts.