Information Retrieval Intelligence

Your Source for Information Retrieval and Intelligence
"Where Marketing Meets Science"

Mi Islita is a research site about information retrieval, data mining, and search engine technologies. Our content is frequently referenced or used by both IR scholars and search engine marketers.

Resources

IR Watch - The Newsletter - A newsletter on research that normally does not reach the mainstream.
IR Thoughts - A blog wherein we comment on IR and search engines and debunk search marketing myths.
IR Calls - A list of conferences and industry events we recommend attending.
IR Tutorials - Tutorials on Vector Space and LSI Models, Matrix Algebra, and more.
Educational Links - Graduate theses and research projects referencing Mi Islita.
Marketing Links - Search engine marketing articles referencing Mi Islita.

Tables of Correlation Features Useful for data mining and analysis

Tables of Correlation Features - These tables of correlation features are presented for the convenience of analysts and for use with statistical and practical significance tests. Now you can quickly discriminate between correlation coefficients and their statistical or practical significance.

Correlation Coefficients A tutorial to dispel SEOs' quack "science".

A tutorial on Correlation Coefficients - This is a three-section tutorial on correlation coefficients. Section one discusses the proper way of computing r values. Section two describes statistical applications. Section three discusses recent research on the problem of correlation coefficients as biased estimators.

Standard Errors! A tutorial that any Web miner should read.

A Tutorial on Standard Errors - Every statistic from a sample distribution has a standard error that is specific to that statistic. Using the incorrect definition for a standard error invalidates any research study.
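As a rough illustration of why each statistic needs its own standard error, the sketch below contrasts three textbook formulas. It is not taken from the tutorial: the function names are hypothetical, and the formula for r is the common approximation used when testing against zero correlation, which may differ from the definitions the tutorial adopts.

```python
import math
import statistics


def standard_error_of_mean(sample):
    """Standard error of the sample mean: s / sqrt(n)."""
    n = len(sample)
    return statistics.stdev(sample) / math.sqrt(n)


def standard_error_of_r(r, n):
    """Approximate standard error of Pearson's r: sqrt((1 - r^2) / (n - 2)).

    Assumption: this is the form used in the usual t-test of r against zero.
    """
    return math.sqrt((1.0 - r * r) / (n - 2))


def standard_error_of_fisher_z(n):
    """Standard error of the Fisher z-transformed correlation: 1 / sqrt(n - 3)."""
    return 1.0 / math.sqrt(n - 3)


if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]
    print(standard_error_of_mean(data))
    print(standard_error_of_r(0.6, 27))
    print(standard_error_of_fisher_z(28))
```

Applying the formula for the mean to a correlation coefficient, or the reverse, yields a number that looks plausible but measures nothing.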

Fractals with a browser! And you thought this was not possible!

A Web Browser Approach to the Design of Fractals and Multifractals - The traditional way of constructing fractals is with computer algorithms. Some of these use pixel-by-pixel drawing techniques wherein the output of a recursive function is evaluated against a predefined condition. Other strategies, or combinations of techniques, use HTML tables, image files, VML, SVG, or the canvas tag introduced in HTML5. In most cases, implementing these strategies and techniques involves a learning curve, requires a large number of iterations, and consumes server resources. The purpose of this article is to present an alternative approach to the design of fractals that requires no additional software or special resources, only a Web browser that supports CSS and XHTML. Classic fractals, as well as Sierpinski Carpets generated with the Mann Iteration Method and probabilistic methods, are reproduced. The impact of CSS sub-pixel rounding errors on the patterns, as rendered by five different browsers, is documented. End users can test the proposed approach by downloading the corresponding Fractal Source Codes to their local machines.

Other Additions

Fractal Resources Index - A sub-site within Mi Islita.com. This is a collection of resources about fractals and their application to language modeling, information retrieval, Web design, information security, and related areas.
IP Packet Fragmentation Tutorial - A Tutorial on IP Packet Fragmentation Analysis.
MTU and MSS Tutorial - A Tutorial on Maximum Transmission Unit and Maximum Segment Size Calculations.
RSJ-PM: Probabilistic Model Tutorial - A Tutorial on the Robertson-Sparck Jones Probabilistic Model for Information Retrieval.
PCA and SPCA Tutorial - A tutorial on Principal Component Analysis and Standardized PCA.
A Linear Algebra Approach to the Vector Space Model - An improved fast track tutorial.
Binary Similarity Calculator - Multiple similarity calculators accessible through a single interface.
Levenshtein Edit Distance Tool - A tool for making edits (insertions, substitutions, and deletions).
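
For readers unfamiliar with the edit distance named in the last item above, here is a minimal sketch of the standard dynamic-programming computation. It is an illustration only, not the code behind the tool, and the function name `levenshtein` is our own choice.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    # Keep the shorter string in the inner loop so the row buffer stays small.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,                      # deletion
                    current[j - 1] + 1,                   # insertion
                    previous[j - 1] + substitution_cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
```

Only two rows of the distance table are kept at a time, so memory grows with the shorter string rather than with the product of both lengths.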

Feel free to explore our site. Enjoy your stay.

Graduate Courses

AIRWeb: Web Spam and Internet Vulnerabilities: Covers Adversarial Information Retrieval, including Web Spam and Internet Vulnerabilities. Lectures are based on research papers presented at the AIRWeb Workshops. Blog Category
Search Engines Architecture: Covers the construction and implementation of the main components used by search engines (forward and inverted indexes, web crawlers, parsers, search interfaces, etc.). This is a hands-on course with lab sessions. Blog Category
Web Mining and Business Intelligence: Covers Web Mining, search engines, and business intelligence. Students will learn by doing: (a) how search engines index and rank web documents, (b) how to conduct business intelligence from online resources, and (c) how to apply Web Mining strategies and algorithms in their research or workplace. Blog Category

Information Retrieval Tutorials and Fast Tracks

Check these other IR Tutorials.

Recent Tutorials | LSI Category Posts | Hot Blog Post: A Call to SEOs Claiming to Sell LSI

Data Mining

Association and Scalar Clusters Tutorial - Part 1: Back Mapping Term Clusters to Documents
Row-Pruning Algorithm Tutorial

Matrix Algebra

Eigenvalues and Eigenvectors
Matrix Operations
Stochastic Matrices

Vector Space and Indexing

Document Indexing
Cosine Similarity
EF-Ratios

Probabilistic Models

RSJ-PM: Robertson-Sparck Jones Probabilistic Model

SVD and LSI

LSI Keyword Research and Co-Occurrence Theory
Latent Semantic Indexing (LSI) How-to Calculations
Computing the Full SVD of a Matrix
Computing Singular Values
Understanding SVD and LSI

Fast Tracks

LSI Keyword Research - A Fast Track Tutorial
Latent Semantic Indexing (LSI) Fast Track Tutorial
Singular Value Decomposition (SVD) Fast Track Tutorial

Conference Engagements

See us at any of these conferences or events.

Location

W3C AIRWeb'09, Madrid, Spain - Program Committee Member 2009/04/21

2009 SIDIM XXIV, University of Puerto Rico, Rio Piedras - Speaker: Scaled Inverse Document Frequency Abstract | Program and abstracts book 2009/03/06-07

University of Puerto Rico, Bayamon Lecture: Understanding Search Engines Abstract 2008/04/23

W3C AIRWeb'08, Beijing, China - Program Committee Member 2008/04/22

University of Turabo, Gurabo Lecture: Web Mining, Search Engines, and Information Security Abstract 2008/02/21

ICANN's 29th International Public Meeting - Attendee 2007/06/25

W3C AIRWeb'07, Banff, Canada - Program Committee Member 2007/05/08

Polytechnic University, Turing Lab Lecture: Search Engines, Vector Space Models, and LSI - Speaker 2007/04/26

IntekTel International Technology Conference and Expo, San Juan, Puerto Rico - Speaker 2007/04/12

Interamerican University, Bayamon Lecture: Introduction to Search Technologies, Dept of Electrical Engineering - Speaker 2007/03/16

OJOBuscador Congress 2, Madrid, Spain - Speaker 2007/03/08

About Mi Islita

Want to know more about this site?

Mi Islita was founded by Dr. E. Garcia. Its main audience consists of university researchers, computer science students, and search engine marketers. Dr. Garcia's research interests include chemometrics, statistical optimization methods, applied fractal geometry, chaos, and information retrieval. He is available for speaking engagements, peer review work, and consulting. For speaking engagements, organizers are responsible for any travel and accommodation arrangements. For additional information, contact admin@miislita.com.

Peer Review Work

W3C AIRWeb09; Adversarial Information Retrieval on the Web, PC Member.
W3C AIRWeb08; Adversarial Information Retrieval on the Web, PC Member.
Universidad del Turabo, School of Engineering Graduate Network Security Certificate Program, Puerto Rico Higher Education Council's evaluation committee president.
W3C AIRWeb07; Adversarial Information Retrieval on the Web, PC Member.
JASIST, Journal of the American Society for Information Science and Technology, edited by Dr. Donald Kraft, Louisiana State University.

Status of the Current Document

ACCESSIBILITY STATEMENT →