The Huck Institutes of the Life Sciences


A list of bioinformatics tools, platforms and software developed by Penn State researchers for biological data analysis.


Galaxy, an open, web-based platform for accessible, reproducible and transparent computational biomedical research

Biostars-Bioinformatics Explained, a forum to explore bioinformatics, computational genomics and biological data analyses

Biostars-Galaxy Explained, a forum to explore Galaxy

Neurostars, a forum for engaging the neuroinformatics community

Bioconductor User Support

PyBlue, a simple static site generator

Genetrack, a bioinformatics software package for sorting, querying and visualizing interval-oriented data

BooleanNet, Boolean network simulation software for the life sciences


Genome browser with erythroid transcription factor occupancy and other features of gene regulation, genome-wide in mouse


KmerGenie, k-mer size selection for genome assembly

TwoPaCo, de Bruijn graph construction from complete genomes

bcalm, de Bruijn graph compaction in low memory

FlowgramFixer, base caller for IonTorrent sequencing data


SPRITE, a parallel SNP detection pipeline

FASCIA, parallel subgraph counting for determining approximate counts of tree-structured subgraphs in large networks


BEAM (Source code), BEAM2 (Source code), BEAM3 (Source code; compiling requires the GNU Scientific Library) and BEAMimpute, for SNP-SNP interaction association mapping

PASS, PASS2 (Source code, here), peak calling in ChIP data based on Poisson de-clumping; controls FWER and FDR

GPASS, for detecting SNP-disease associations in case-control studies

dCaP, joint peak caller and differential binding detector for ChIP-Seq data in multiple samples

DBM, Dynamic Bayesian Markov model for genotype calling, haplotype inference, de novo inference of population structure and local admixture for next-gen sequencing data

TIPS, tree-based Bayesian detection method for subtle population structures

CHB, coalescence-guided Bayesian inference of haplotypes from genotype data

EulerAlign, alignment of DNA sequences using Eulerian graphs


MultiGPS, a framework for analyzing collections of multi-condition ChIP-seq datasets

STAMP, a webserver resource for aligning transcription factor DNA binding motifs

SOMBRERO, a motif-finder that is based on the Self-Organizing Map neural network algorithm

RescueNet, uses the Self-Organizing Map neural network algorithm for codon usage analysis and gene prediction


PipMaker and MultiPipMaker server software (bzipped tar file of source code; beta version; latest release: 2011-Aug-12)

LASTZ alignment program (latest release: 1.02.00, 2010-Jan-12)

BLASTZ alignment program (obsolete; replaced by LASTZ) (gzipped tar file of source code; latest release: 2004-Dec-22) [publication]

Multiz and TBA alignment programs (gzipped tar file of source code; latest release: 2009-Jan-21) [publication] [documentation]

Sim4 alignment program (revised 2012-Oct-10) [instructions] [publication]


VennGenerator (latest release: 2009-Jul-23)

DIAL (gzipped tar file of source code; latest release: 2011-Jun-06)

YASRA, Yet Another Short Read Assembler (gzipped tar file of source code; latest release: 2014-Mar-27)

CHAP (fast version; gzipped tar file; 71 Mb; 2011-Aug-02)

CHAP 2 (link to GitHub)


StructureFold, at Galaxy, for RNA secondary structure mapping and reconstruction

ShortStack, for comprehensive annotation and quantification of small RNA genes


PS-HomPPI: Partner-Specific Protein-Protein Interface Residue Predictor

NPS-HomPPI: Non-Partner-Specific Protein-Protein Interface Residue Predictor

PrISE: Prediction of protein-protein Interface residues using Structural Elements

DockRank: Rank Docked Models Using Predicted Partner-Specific Protein-Protein Binding Sites

RNABindRPlus: A server for predicting RNA-binding residues in proteins using a combination of sequence-homology and machine learning methods

RNABindR v2.0: A server for predicting RNA-binding residues in proteins

FastRNABindR: A server for large-scale prediction of protein-RNA interface residues

BCPREDS: B-cell epitope prediction server

MHCMIR: Predicting peptide-MHC-II binding affinity

BacGen: Predicting protective bacterial antigens

BiNA: Biomolecular Network Alignment

EnsembleGly: glycosylation site prediction


PRIDB: The Protein-RNA Interaction Database

ProtinDB - PROTein-protein INterface residues Data Base


INDUS - INtelligent Data Understanding System

EpiT: Epitopes Toolkit

Gennotate: Genome Annotation Toolkit

Pref-R: A Qualitative Preference Reasoner

AVT-DTL -- software for learning decision tree classifiers from attribute value taxonomies and data; some sample data sets and attribute value taxonomies are available for download.

AVT-NBL -- software for learning naive Bayes classifiers from attribute value taxonomies and data; some sample data sets and attribute value taxonomies are available for download.


BALLET (BALancing selection LikElihood Test), a program written in C that can perform genomic scans for balancing selection

SweepFinder2, a program written in C that can perform genomic scans for recent selective sweeps while controlling for background selection and mutation rate variation

CDROM, an R implementation of a method for classifying duplicate gene retention mechanisms


Flynotyper, a quantitative tool for functional genetic analysis in D. melanogaster


Athena, Analysis Tool for Heritable and Environmental Network Associations

Biobin, a standalone command-line application for investigating rare variant burden

Biofilter, an interface for accessing multiple public human genetic data sources

genomeSIMLA, generates datasets using a forward-time population simulator that relies on random mating, genetic drift, recombination, and population growth to allow a population to naturally obtain LD features

Phenogram, for creating chromosomal ideograms

PheWAS-View, for visually integrating PheWAS results

PLATO, a Platform for the Analysis, Translation and Organization of large-scale data

Synthesis-View, for data visualization

PARIS, Pathway Analysis by Randomization Incorporating Structure

pMDR, Parallel Multifactor Dimensionality Reduction for gene-gene and gene-environment interactions

LD-Spline, a database routine that defines the genomic boundaries a particular SNP represents using linkage disequilibrium statistics from the International HapMap Project

LD-Plus, a data visualization script for the display of single SNP statistics in the context of linkage disequilibrium and haplotype structures

Imputation, a method for inferring missing genotypes in a dataset


Spectrum/EPP: Estimation and Projection Package is used to estimate and project adult HIV prevalence and incidence from surveillance data.

IMIS: R package for Incremental Mixture Importance Sampling. Reference: Raftery and Bao (2010) Biometrics.

Codeml_FE: A modified version of Ziheng Yang's codeml that implements 11 new fixed-effects codon models. Reference: Bao, Gu, Dunn and Bielawski (2007) BMC Evolutionary Biology.

LiBaC: The primary use is to identify positively selected sites when the process of evolution is highly heterogeneous among sites. Reference: Bao, Gu, Dunn and Bielawski (2008) Molecular Biology and Evolution.


MetRxn, a comprehensive collection of consistent metabolite and reaction entities for use in metabolic analysis and model construction

Precursor Identifier, identifies biomass precursors that are not produced upon essential (synthetic lethal) gene deletion

OptCom, a comprehensive modeling framework for the flux balance analysis of microbial communities

OptForce, identifies the minimal set of genetic interventions that shape the metabolism of a microorganism

SL Finder, identifies synthetic lethal genes or reactions in genome-scale metabolic models

EMU generator, Elementary Metabolite Unit generation code for isotope mapping models

GrowMatch, reconciling in silico predictions with in vivo growth observations

GapFind/GapFill, identifying and filling network gaps for genome-scale metabolic models

FCF, Flux Coupling Analysis

OptKnock, strain redesign for overproduction using gene/reaction deletions

IPRO, integrated environment for various protein engineering tasks

MAPs, a database of Modular Antibody Parts for predicting and designing antibody variable domains

OptZyme, enzyme redesign through the use of transition state analogues

OptCDR, de novo design of antibody Complementarity Determining Regions for binding targeted epitopes in antigens

eShuffle, prediction of crossover distributions using DNA shuffling