Last Updated: December 18, 2008
For statistical computing resources and other software for accurate computing, such as high-precision libraries, optimizers, and random number generators, see our statistical computing page. For software written by me for data distribution, accuracy, and replication, see my software page. For sources of research data, see my Data Resources page.
Where to Start |
||
The R Statistical Language | The open source statistical
language of choice for most tasks. Based on the 'S' language. Thousands
of contributed packages |
GPL |
Other General Statistics Packages |
||
ADE | A modular multi-variate analysis program which includes modules for spatial data analysis. Plays well with R. | GPL |
Adamsoft | A general-purpose package that specializes in client-server based data management and large-data/low-memory computations. Good for large datasets. | GPL
|
DataPlot | A powerful but somewhat byzantine package from the National Institute of Standards and Technology | OSS |
Gretl | An open source econometrics package that plays nicely with R | GPL |
ExaStat | Basic statistics and regression on large data, using Windows. |
OSS |
Macanova | Reasonably powerful & programmable, if not easy to use. |
GPL |
OpenStat | General package focusing on teaching, IRT. |
OSS |
PSPP | Aspires to replace SPSS. Reads SPSS files and provides the data manipulation functions, but is missing most of the analytical features. | GPL |
Simfit | Reasonably powerful, with emphasis on simulation; command-line. |
OSS |
WinIADAMS | A free Windows package for exploratory analysis, time series, and linear models. Nice interactive multi-dimensional table browser and interactive plots. | No
Source |
Accurate Statistics |
||
(The following modules for R are very useful for highly accurate statistical computing on hard problems. For more resources and computing libraries, see my Resources for Accurate Computing page.) | ||
accuracy | Sensitivity analysis and true random number generation | GPL |
gmp | Multiple precision arithmetic |
GPL |
OpenTURNS | Tools for modeling uncertainty and risks. |
GPL |
rgenoud | Optimizer using genetic algorithms and derivatives | GPL |
rstream | Parallelizable random number generators | GPL |
trust | Trust region based optimization | GPL |
UNF | Universal Numeric Fingerprints -- format independent data validation. | GPL |
Data-Interactive Graphics (Data Visualization) |
||
Also see the plotting category. |
||
Gauguin | Grouping, glyphs, tableplots, oh my.
|
GPL |
GGobi | Supports data-interactive
visualization, exploration, comp, and analysis. Includes automated
projection pursuit in high dimensions. |
GPL |
Improvise | A Java toolkit for linked visualizations. |
GPL |
KLIMT | Interactive analysis of classification and regression trees
|
GPL |
LabPlot |
Data analysis and visualization |
GPL |
Mondrian | Mondrian is especially useful
for interactive visualization of categorical data, and very large
datasets. |
No
Source |
OPEN DX | Generates visualizations and
animations for very large scale scientific data |
OSS |
ParaView |
Parallel visualization of large
datasets. |
GPL |
prefuse | Java visualization toolkit |
OSS |
Processing | A language for rapid development of interactive data visualizations. Well integrated with Java and can produce polished visualizations. |
OSS |
VISIT |
Parallel large data
visualization software |
|
VISTA |
Dynamic, interactive, multi-view
graphics. Plus a very interesting visual user-interface, akin to
data-desk, but more advanced statistically. |
GPL |
Data Plotting (and publication-ready graphics) |
||
Almost
all of the tools listed on this page have some sort of graphing
capabilities. These packages specialize in it. Also see the visualization category. |
||
Gnuplot | Command-line driven plots in 2D,
and 3D. |
GPL |
GUPPI | Extensible plotting tool for
Gnome. |
GPL |
Jas3 | A visualization and curve fitting package in java. | GPL |
SciGraphica | High performance plotting package similar to Microcal Origin. | GPL |
Image and Plot Analysis |
||
These
packages can be used to manipulate images and to extract quantitative
information from them, including recovering data from published plots
and graphs. |
||
DataScan | Extracts information from
topographic images, microscopic images, and others. |
OSS |
g3data |
Specifically for extracting data
from published graphs. |
GPL |
Image/J | Can extract data from scanned maps, charts, graphs and even photos. | OSS |
Scion
Image |
Programmable image program with
data capture capabilities. |
No
source |
Data Mining |
||
Also see the categories on text mining and machine learning | ||
Auton Labs Software | Dozens of independent packages for machine learning, including many classifiers. | Source Available (registration required) |
Databionic | Clustering, visualization, and classification using emergent self-organizing maps. | GPL |
Knime | Supports data pipelines for data processing, clustering, supervised learning, etc. GUI, CLI and API based. | OSS |
Orange | Predictive modeling, ensemble methods, clustering and validation, using C components, GUI widgets, and Python integration. | GPL |
Rattle | A Gnome-based interface that glues together a large number of R modules (clustering, association, machine learning, evaluation) for data mining |
GPL |
Shogun | Machine learning toolbox with multiple SVM, LDA, and LPM classifiers. C++ with interfaces for Octave, R, Matlab, and Python | GPL |
Tanagra | Supports data processing streams including clustering, supervised learning, meta-spv, and cross-validation. Provides a GUI. |
OSS |
Qualitative Data Manipulation, Management, Mining and Analysis |
||
A list
of commercial and non-commercial tools for qualitative analysis is part
of the Open
Directory Project, a well-subscribed discussion list about
software can be found as part of JISC,
and a comparison of QDAS packages is here.
The Natural Language Processing TaskView describes many R packages (interfaces to external toolkits) for text understanding.
The ML-Interfaces package on BioConductor provides a
uniform interface to a large set of machine learning packages in R. |
||
Advene | Video annotation |
OSS |
AnSWR | From the CDC, for mixed qualitative/quantitative analysis. |
No Source |
Automap/ORA | Text tagging (similar to Atlas-TI), with more linguistic coding options, plus visualization and analysis of the network of concepts identified. | No Source |
Elan | For complex annotation of audio and video. | GPL |
EZ-Text |
From the CDC, for textual data
analysis. |
No
Source |
Gate | A toolkit for information extraction from text. | GPL |
Judge |
Performs automatic classification and clustering of documents. | GPL |
Lingpipe | Java library for linguistic processing and analysis. |
No Commercial Use |
Kea | Performs automated key phrase extraction. | GPL |
Language Archiving Technology |
A hosted service for text
management and analysis. |
Hosted |
NLTK |
A Python toolkit for natural language processing. Includes tutorials on NLP. |
GPL |
Perl | The programming language for supreme text mangling. | OSS |
Pliny | For annotating documents, text and images, and generating maps and graphs of relationships. | OSS |
SIL tools |
If you have a lot of text on-line, the concordance, indexing, and database from the Summer Institute of Linguistics may be what you need. | No
Source |
Tabari | Uses special-purpose rules for categorizing news events from news text. | GPL |
Tams | Textual analysis and markup. Similar to Atlas-TI. |
GPL |
TextStat | Another indexing/concordance package. | GPL |
VUE |
Visual understanding environment. Allows you to create annotated networks of multimedia objects for presentation and commentary. A sort of non-linear, scholarly, PowerPoint. |
OSS |
Weft |
For qualitative data management and coding. |
No Source |
Weka | Weka is a collection of machine learning algorithms for data mining, including text mining. (R-Weka connects Weka and R, and is available on CRAN). | GPL |
Wordfish |
Scaling software for estimating political positions from texts. |
GPL |
YALE (now RapidMiner) | A flexible standalone package that contains many data mining algorithms. | GPL |
Spatial Statistics and GIS |
||
In
addition to the individual packages below, the Free GIS Site and OpenSourceGis sites maintain
lists of many open-source GIS packages. The CISSS Tools
Clearinghouse maintains links to many spatial analysis programs.
Kelley Pace gives a list of links to software
for advanced spatiotemporal econometrics. The AI-geostats software page has
links to geo-spatial statistics programs and code. And Rgeo lists lots of
contributed packages for doing geospatial statistics with R, including 'fields', 'geoR',
'graper', 'grass', and 'spatstat'. |
||
Choroware | Choropleth maps with genetic algorithm generated class intervals. | GPL |
CrimeStat | Network, spatial and statistical analysis for crime data. Created for the National Institute of Justice. | No Source |
Fragstats | Designed to compute a wide variety of landscape metrics for categorical map patterns | GPL |
Geoda | Unusual in its combination of GIS and spatial econometrics. | No
Source |
Geovista Studio | General GIS toolkit and exploratory data analysis system | GPL |
Grass | One of the most powerful free geographic information systems for the display of spatial data. | GPL |
LandSerf | Land surface visualization and analysis | No Source |
SatScan | Space-time scan statistics -- for analysis of disease and other clusters distributed in space and time | No
Source |
SAGA | Combines GIS with kriging and terrain analysis | GPL |
Spatial Econometrics Lib. |
A library of Matlab functions
for advanced spatial, and spatiotemporal econometric analysis |
OSS |
STARS |
Space time analysis of regional
systems. Designed for the dynamic exploratory analysis of data measured
for areal units at multiple points in time. If you have
spatial time-series data, check this. |
GPL |
Survey Data Collection and Analysis |
||
The
general software packages above have some facilities for survey
analysis. The programs below specialize in data collection and/or the
analysis of complex surveys. Also see the Epidemiology
section. |
||
AM | Handles analysis of complex survey samples, such as NAEP and TIMSS | No
Source |
dopoxtools | Free research web survey hosting | Hosted |
Mod_survey | A very mature open source survey
system. It is implemented as a drop-in Apache module. It supports
creation of survey templates using XML, and export of the resulting
data in a number of interchange formats. Mod_survey can be configured
in a decentralized way, so that all users on a particular web server
can administer their own surveys independently. (Also see YaaCs, below) |
GPL |
OpenSurveyPilot | Server based web survey system |
GPL |
PHPEsp | PHP based web survey system | GPL |
Lime Survey | PHP based web survey system |
GPL |
PEBL | A programming environment for building interactive psychology experiments | GPL |
protogenie | Free research web survey hosting | Hosted |
PsychExps | A repository of experimental design scripts to be run under the Macromedia Authorware environment. | Mixed |
Quex Suite | Web-based CATI system with integrated VoIP (Asterisk), XML form language, and paper form scanning capability. | GPL |
SurveyWiz | Simple JavaScript based web
survey system |
GPL |
TESS | Time-Sharing Experiments for the Social Sciences. An NSF-funded infrastructure to provide both web and phone surveys. | Hosted |
WebExp2 | A Java-based system for on-line psych experiments. | No
Source |
YaaCs | A CATI system that uses Mod_survey for the data collection, and offers additional management of other phases of the survey work flow -- questionnaire building, interviewer management, etc. | GPL |
Agent-Based Simulation |
||
The International Society for Artificial Life maintains a list of links to many agent-based simulation frameworks. | ||
Ascape | Agent based simulation package |
GPL |
breve | Simulation in a 3-D world, using Python or a simple scripting language. | GPL |
EVO |
A simulation environment for
co-evolution, based on SWARM |
OSS |
MASON |
A Java-based agent-based modeling system popular in political science | OSS |
NetLogo |
An updated dialect of the Logo
language for multi-agent simulation |
No Source |
REPAST | A multi agent simulation
toolkit, with multiple implementations and built in adaptive features |
OSS |
Sesam | Simulation system with cool visual model building interface. | OSS |
SOAR | Agent based modeling based on cognitive/AI constructs. | GPL |
Swarm | A mature, full-featured
framework for agent-based modeling, built in Objective C |
GPL |
Dynamic Event Simulation |
||
This overlaps with Agent-Based Simulation above. I have listed only packages below, but several programming libraries are also available, including DSOL (Java), SimPy (Python), Adevs (C++), and DeX (Python, C++, Scripting). | ||
Desmo-J | Discrete event simulation framework | GPL |
OMNet++ | OMNeT++ is a component-based, modular and open-architecture simulation environment with strong GUI support and an embeddable simulation kernel, focussing on communication networks, but general enough to be used for network, systems, and business process simulation. | Academic Source License (not open source) |
Monte Carlo and Markov-Chain Monte Carlo (MCMC) Simulation |
||
R and
many of the other general packages above can be used for MC simulation. R also has a number of modules to
perform Bayesian MCMC analysis, both directly and by communicating with
BUGS and JAGS. |
||
JAGS |
Just Another Gibbs Sampler. A
program for Bayesian hierarchical models. ("Not unlike BUGS") |
GPL |
MCMCpack |
An R module to perform MCMC based
analysis. Very easy to use, since it contains a large variety of
pre-configured models |
GPL |
McSim | A specially tailored Monte Carlo
simulation package. Goes well beyond general packages. |
GPL |
OpenBugs | Open source rewrite of BUGS for Bayesian simulation | GPL |
WinBUGS |
Still the best BUGS for Windows,
but not OSS. |
No
Source |
Specialized Statistical Packages | ||
Blossom | multi-response permutation tests | No Source |
Fityk |
Nonlinear peak fitting. |
GPL |
Gambit | game theory made simple(r) | OSS |
gSwing |
Election result tracking and
display |
GPL |
M.D. Anderson Cancer Center | Has useful biostat software from
the biostats department. |
Mixed. |
MDSX | Multidimensional Scaling Routines for Windows | No Source |
MPCA |
Discrete and independent
component analysis. |
GPL |
MX | Structural Equation Modeling
(like LISREL) |
No
Source |
PAST | PAlaeontological STatistics. Not strictly social science, of course, but the correspondence analysis, geometric analysis and cladistics could be applied fruitfully. | No Source |
Sitkis | Computes common bibliometric network statistics. | No Source |
Permap |
Perceptual maps created through interactive multidimensional scaling. |
No Source |
TETRAD | A LISREL like structural equation modeling program | GPL |
TDA | Transition Data Analysis. A system for analyzing event data; supports lots of options and models | GPL |
Voteview |
Voteview and nominate are for
viewing and analyzing roll-call voting. |
GPL |
Epidemiology | ||
The CDC Software Page also offers a set of special packages for sampling design factors, meta-analysis, and spatial analysis. Also see the WWW Virtual Epidemiology Library and the category on survey tools. | ||
MIX | Guided interactive meta-analysis. |
GPL |
Epidata | Provides for programmed data
entry and simple analysis. |
No source. |
Epigrass |
Epigrass is software for visualizing, analyzing, and simulating epidemic processes on geo-referenced networks. |
GPL
|
Epi-info | Epidemiological statistics,
maps, reports. |
No
Source |
Openepi |
Javascript-based (on or off-line) simple epidemiological statistics. |
OSS |
Netepi |
Web based secure data entry and
analysis for epidemiology. |
GPL |
WinPepi |
Over 75 modules for common epidemiological methods. |
No Source |
Data Cleaning, and Management |
||
For managing qualitative data, see the Text Tools section. For other database options see the Free SQL List and The ACM's Sigmod List | ||
Berkeley DB |
A fast key-value based DB. Very
lightweight (much more lightweight than SQL, and does not require
a separate running server). Very fast for key-based retrievals. Also see
the filehash and R.huge packages
for using key-value DBs in R. |
OSS |
CCOUNT | Does data cleaning, advanced cross-tabulation, and other market research functions. Also reads many mainframe-style data formats (e.g. EBCDIC, Column Binary). Modeled after SPSS Quantum. | GPL |
CSPRO | Does form-based data entry,
crosstabulation, and mapping. From the U.S. Census. |
GPL |
DataCleaner | Tools for data review and editing. |
OSS |
HDF |
Hierarchical Data Format -- a
portable format for representing and manipulating large scientific
datasets. The latest version is compatible with netcdf. Also see the
netcdf packages for R. |
|
IVEware | Multiple imputation for missing data | OSS |
MySql | One of the most mature and stable open source SQL databases. | GPL |
netCDF |
A portable format for
representing and manipulating large scientific datasets. Also see the
netcdf package in R, the NCO package for manipulating netcdf data on the command line, and the Parallel-NetCDF package for high-speed access to NetCDF data. |
GPL |
PostGRES | One of the most mature and
stable open source SQL databases. |
GPL |
R DBI |
Connects R and SQL databases. |
GPL |
Matrix Algebra, Symbolic Algebra, and
Computational Algebra Systems |
||
These are standalone systems. For related programmer's libraries see my Resources for Numerical Accuracy listing. The following feature comparison contrasts these and a dozen other more specialized packages. | ||
Axiom |
Computer algebra. Lots of functions. Good documentation | GPL |
Giac/Xcas | A computer algebra system. Includes limited compatibility with Maple, MuPad and TI89 syntax; arbitrary precision |
GPL |
Ginac | A computer algebra system. (C++ Library) |
GPL |
FreeMat | Matrix algebra system. Matlab
compatibility and built-in parallelization. |
GPL |
GAP |
Computer algebra system for
group theory. Computational discrete algebra. |
OSS |
JACAL. | A computer algebra system. | GPL |
Magnus |
Computer algebra system for group theory. | GPL |
matrex |
A 'spreadsheet' where each cell is a matrix. Provides graphing, presentations, multi-threaded function-based calculations | GPL |
Mathomatic |
Yet another computer algebra
system |
GPL |
Maxima | A computer algebra system. | GPL |
OCTAVE | A matrix manipulation/mathematics environment like Matlab. Mature. | GPL |
PARI/GP | A computer algebra system with arbitrary precision arithmetic, like Maple or Mathematica. | GPL |
RLAB | A matrix manipulation
environment. |
GPL |
SAGE | General purpose mathematical
computing environment |
GPL |
SciLab | A matrix manipulation/mathematics environment like Matlab. Mature. | GPL |
Tela | Tensor computing |
GPL |
YACAS | Yet another computer algebra
system. (Eponymous) Comes with Euler, for numerical programming. |
GPL |
Yorick |
An older matrix language. |
OSS |
Social Network Analysis |
||
Also see
the Spatial category above for software with
complementary and overlapping spatial network and display features. |
||
Bibexcel |
Bibliometric citation analysis. |
No Source |
CiteSpace |
Visualizes networks over time. |
No Source |
Cfinder |
Uses the clique percolation method to find overlapping dense groups of nodes. |
No Source |
Egonet |
Collection and analysis of egocentric network data. |
No Source |
GraphViz |
Mathematical graph visualization |
OSS |
Insoshi |
A social network platform -- useful for data collection. |
GPL |
Nettvis |
Analyze and visualize social networks. Includes an on-line service. |
GPL |
NetworkX |
Python toolkit for visualization and analysis |
OSS |
NWD |
Network workbench, visualization and descriptives. |
OSS |
Pajek | Graph clustering, partitioning, citation analysis, network comparison (differences, unions), metrics. | No Source |
Proximity |
Visualization and knowledge discovery from heterogenous relational networks. |
OSS |
R Modules for Network Analysis |
A number of R modules maintained by Carter Butts, including SNA, network, nettheory, and metamatrix. Also see Statnet for more R network packages. |
OSS |
Sitkis | Computes common bibliometric network statistics. | No Source |
SocNetV |
Provides core graph measures for
social network analysis |
GPL |
Sonia |
Animated visualizations of longitudinal social networks
|
GPL |
STOCNET | Analysis of some interesting
models, including evolution of social networks, blockmodeling, dyadic variable and actor analysis, maximum likelihood analysis of longitudinal (evolution of) networks (through SIENA), and core network analysis. |
GPL |
Tulip | Visualization for extremely large graphs. Plugins are available for clustering and core graph metrics. | GPL |
VISONE | Provides core graph measures for social network analysis | No
source |
WinMine | Bayesian and dependency (decision-tree) network builder | No
source |
Differential Equations and Dynamic Simulation |
||
A good list of dynamic simulation packages is maintained by the SIAM activity group on dynamic systems. | ||
PETC | scientific toolkit for differential equations | No Source |
scirun | A scientific environment for simulation and PDE's. | No
source |
SUNDIALS | Nonlinear and differential/algebraic equation Solver | OSS |
Machine Learning |
||
A good list of machine learning tools is at mloss.org. Also see the categories on text mining and data mining | ||
dysii | High-performance C++ library for probabilistic learning within dynamic systems. | GPL |
There are some web-based statistics tutorials out there, but none that I like. I recommend some readings:
"Entia non sunt multiplicanda sine necessitate" - William of Ockham's rule
"Ad indicia spectate." - Micah's corollary
"Doing econometrics is like trying to learn the laws of electricity by playing the radio." - Orcutt's observation
"One problem with political science is that its laboratories are unsecured, allowing real people to roam around inside them, spitting in test tubes and fiddling with computers" - Walter Kirn
"You can see a lot, just by looking." - Yogi Berra
Copyright © 1995-2010 | Micah Altman |