
 
 

The Lemur Project is sponsored by the Advanced Research and Development Activity in Information Technology (ARDA) under its Statistical Language Modeling for Information Retrieval Research Program and by the National Science Foundation.

The Lemur Toolkit

for Language Modeling and Information Retrieval

The Lemur Toolkit is an open-source toolkit designed to facilitate research in language modeling and information retrieval. Lemur supports a wide range of industrial and research language applications such as ad-hoc retrieval, site-search, and text mining.

The toolkit supports indexing of large-scale text databases, the construction of simple language models for documents, queries, or subcollections, and the implementation of retrieval systems based on language models as well as a variety of other retrieval models. The system is written in the C and C++ languages, and is designed as a research system to run under Unix operating systems, although it can also run under Windows.


As part of the migration of portions of the Lemur Toolkit to SourceForge, we have recently opened up bug tracking, feature requests, and support requests so that you can directly submit these items to us, the developers.

Browsing of bugs, feature requests, and support requests is open to anyone, but if you wish to add a bug, feature request, or support request, you need to have an account on SourceForge. If you need to create an account, you can create one here.


News: Announcements about the Lemur Toolkit, such as the latest release notes, upcoming releases, and known problems with current versions.

Features: An "at-a-glance" listing of features within the Lemur Toolkit.

The Lemur Toolkit: How to install and use the Lemur Toolkit, together with code-level documentation, application guides, a guide to working with offset annotations, and beginner's guides.

Indri Search Engine: More about Indri, Lemur's latest search engine, which is also available on its own when all you need is a search engine. Indri has an index capable of handling very large collections and a structured query language that supports fields and passages. Search the collected works of William Shakespeare with Indri.

Lemur Wiki: Wiki pages of documentation for the Lemur Toolkit and Indri Search Engine. Includes articles on using the toolkit, programming with the toolkit, and example code.

Download: Get all of the source, executables, and data files for the toolkit here (hosted by SourceForge). Looking for an older version? Try the download archives.

People: Key contributors to the Lemur Toolkit.

Discussion: Open discussion forum for users and developers of the toolkit.

Tutorials: A set of tutorials and trails to help you get started working with the Lemur Toolkit.

Sign Up: Sign up to be notified of new releases and updates to Lemur (hosted by SourceForge).

 

The toolkit is in constant development as part of the Lemur Project, a collaboration between the Computer Science Department at the University of Massachusetts and the School of Computer Science at Carnegie Mellon University.

The current system was primarily designed and written at Carnegie Mellon University and at the University of Massachusetts, Amherst. If you have any comments about this work, or are interested in using the toolkit for your own purposes, we would like to hear from you. Please send us an email.

 


The Lemur Project
Last modified: December 19, 2007, 2:27:09 pm