
The Roar supercomputer is Penn State's high-performance research cloud, managed by the Institute for Computational and Data Sciences (ICDS). Free and paid allocations are available to Penn State researchers, including participants in the BG program. More information and instructions for access are available on the ICDS website.
A further list of bioinformatics tools, platforms, and software developed by Penn State researchers for biological data analysis is available below.
Galaxy, an open, web-based platform for accessible, reproducible and transparent computational biomedical research
Biostars-Bioinformatics Explained, a forum to explore bioinformatics, computational genomics and biological data analyses
Biostars-Galaxy Explained, a forum to explore Galaxy
Neurostars, a forum for engaging the neuroinformatics community
PyBlue, a simple static site generator
Genetrack, a bioinformatics software package for sorting, querying and visualizing interval-oriented data
BooleanNet, Boolean network simulation software for the life sciences
Genome browser with erythroid transcription factor occupancy and other features of gene regulation, genome-wide in mouse
KmerGenie, k-mer size selection for genome assembly
TwoPaCo, de Bruijn graph construction from complete genomes
bcalm, de Bruijn graph compaction in low memory
FlowgramFixer, base caller for Ion Torrent sequencing data
SPRITE, parallel SNP detection pipeline
FASCIA, parallel subgraph counting for determining approximate counts of tree-structured subgraphs in large networks
BEAM (Source code), BEAM2 (Source code), BEAM3 (Source code; compiling requires the GNU Scientific Library) and BEAMimpute for SNP-SNP interaction association mapping
PASS, PASS2 (Source code, here), peak calling in ChIP data based on Poisson de-clumping; controls FWER and FDR
GPASS for detecting SNP disease associations in case control studies
dCaP, joint peak caller and differential binding detector for ChIP-Seq data in multiple samples
DBM, Dynamic Bayesian Markov model for genotype calling, haplotype inference, de novo inference of population structure and local admixture for next-gen sequencing data
TIPS, tree-based Bayesian method for detecting subtle population structures
CHB, coalescence-guided Bayesian inference of haplotypes from genotype data
EulerAlign, alignment of DNA sequences using Eulerian graphs
MultiGPS, a framework for analyzing collections of multi-condition ChIP-seq datasets
STAMP, a webserver resource for aligning transcription factor DNA binding motifs
SOMBRERO, a motif-finder that is based on the Self-Organizing Map neural network algorithm
RescueNet, uses the Self-Organizing Map neural network algorithm for codon usage analysis and gene prediction
PipMaker and MultiPipMaker server software (bzipped tar file of source code; beta version; latest release: 2011-Aug-12)
LASTZ alignment program (latest release: 1.02.00, 2010-Jan-12)
BLASTZ alignment program (obsolete; replaced by LASTZ) (gzipped tar file of source code; latest release: 2004-Dec-22)
Multiz and TBA alignment programs (gzipped tar file of source code; latest release: 2009-Jan-21)
Sim4 alignment program (revised 2012-Oct-10)
VennGenerator (latest release: 2009-Jul-23)
DIAL (gzipped tar file of source code; latest release: 2011-Jun-06)
YASRA, Yet Another Short Read Assembler (gzipped tar file of source code; latest release: 2014-Mar-27)
CHAP (fast version; gzipped tar file; 71 Mb; 2011-Aug-02)
CHAP 2 (link to GitHub)
StructureFold, at Galaxy, for RNA secondary structure mapping and reconstruction
ShortStack, for comprehensive annotation and quantification of small RNA genes
PS-HomPPI: Partner-Specific Protein-Protein Interface Residue Predictor
NPS-HomPPI: Non-Partner-Specific Protein-Protein Interface Residue Predictor
PrISE: Prediction of protein-protein Interface residues using Structural Elements
DockRank: Rank Docked Models Using Predicted Partner-Specific Protein-Protein Binding Sites
RNABindRPlus: A server for predicting RNA-binding residues in proteins using a combination of sequence-homology and machine learning methods
RNABindR v2.0: A server for predicting RNA-binding residues in proteins
FastRNABindR: A server for large-scale prediction of protein-RNA interface residues
BCPREDS: B-cell epitope prediction server
MHCMIR: Predicting peptide-MHC-II binding affinity
BacGen: Predicting protective bacterial antigens
BiNA: Biomolecular Network Alignment
EnsembleGly: glycosylation site prediction
PRIDB: The Protein-RNA Interaction Database
ProtinDB: PROTein-protein INterface residues Data Base
INDUS: INtelligent Data Understanding System
EpiT: Epitopes Toolkit
Gennotate: Genome Annotation Toolkit
Pref-R: A Qualitative Preference Reasoner
AVT-DTL: software for learning decision tree classifiers from attribute value taxonomies and data; sample data sets and attribute value taxonomies are available for download
AVT-NBL: software for learning naive Bayes classifiers from attribute value taxonomies and data; sample data sets and attribute value taxonomies are available for download
BALLET (BALancing selection LikElihood Test), a program written in C that can perform genomic scans for balancing selection
SweepFinder2, a program written in C that can perform genomic scans for recent selective sweeps while controlling for background selection and mutation rate variation
CDROM, an R implementation of a method for classifying duplicated gene retention mechanisms
Flynotyper, a quantitative tool for functional genetic analysis in D. melanogaster
Athena, Analysis Tool for Heritable and Environmental Network Associations
Biobin, a standalone command-line application for investigating rare variant burden
Biofilter, an interface for accessing multiple, public human genetic data sources
genomeSIMLA, generates datasets using a forward-time population simulator which relies on random mating, genetic drift, recombination, and population growth to allow a population to naturally obtain LD features
Phenogram, for creating chromosomal ideograms
PheWAS-View, for visually integrating PheWAS results
PLATO, a Platform for the Analysis, Translation and Organization of large-scale data
Synthesis-View, for data visualization
PARIS, Pathway Analysis by Randomization Incorporating Structure
pMDR, Parallel Multifactor Dimensionality Reduction for gene-gene and gene-environment interactions
LD-Spline, a database routine that defines the genomic boundaries a particular SNP represents using linkage disequilibrium statistics from the International HapMap Project
LD-Plus, a data visualization script for the display of single SNP statistics in the context of linkage disequilibrium and haplotype structures
Imputation, a method for inferring missing genotypes in a dataset
Spectrum/EPP: Estimation and Projection Package is used to estimate and project adult HIV prevalence and incidence from surveillance data.
IMIS: R-package for Incremental Mixture Importance Sampling. Reference: Raftery and Bao (2010) Biometrics.
Codeml_FE: A modified version of Ziheng Yang's codeml that implements 11 new fixed-effects codon models. Reference: Bao, Gu, Dunn and Bielawski (2007) BMC Evolutionary Biology.
LiBaC: The primary use is to identify positively selected sites when the process of evolution is highly heterogeneous among sites. Reference: Bao, Gu, Dunn and Bielawski (2008) Molecular Biology and Evolution.
MetRxn, a comprehensive collection of consistent metabolite and reaction entities for use in metabolic analysis and model construction
Precursor Identifier, identifies biomass precursors that are not produced upon essential (synthetic lethal) gene deletion
OptCom, a comprehensive modeling framework for the flux balance analysis of microbial communities
OptForce, identifies the minimal set of genetic interventions that shape the metabolism of a microorganism
SL Finder, identifies synthetic lethal genes or reactions in genome-scale metabolic models
EMU generator, Elementary Metabolite Unit generation code for isotope mapping models
GrowMatch, reconciling in silico predictions with in vivo growth observations
GapFind/GapFill, identifying and filling network gaps for genome-scale metabolic models
FCF, Flux Coupling Analysis
OptKnock, strain redesign for overproduction using gene/reaction deletions
IPRO, integrated environment for various protein engineering tasks
MAPs, a database of Modular Antibody Parts for predicting and designing antibody variable domains
OptZyme, enzyme redesign through the use of transition state analogues
OptCDR, de novo design of antibody Complementarity Determining Regions for binding targeted epitopes in antigens
eShuffle, prediction of crossover distributions using DNA shuffling