6.5.15

Open Source Search Engines


In this post I list the most popular open source search engines. You can use them to search information on your own computer or server, or study their source code to get an idea of how search engines are built. My aim is simply to survey the available resources and gather the relevant information in one place.
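To give a rough feel for how search engines are built before diving into real source code, here is a minimal sketch of an inverted index with TF-IDF-style ranking. It is a toy written for this post, not the code of any engine listed below; the names `TinyIndex`, `tokenize`, `add` and `search` are made up for illustration, and the scoring formula is a simplified assumption.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it on anything that is not a letter or digit."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


class TinyIndex:
    """Toy inverted index: maps each term to the documents that contain it."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.doc_count = 0

    def add(self, doc_id, text):
        """Index one document under the given id."""
        for term, tf in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = tf
        self.doc_count += 1

    def search(self, query):
        """Return (doc_id, score) pairs, best match first."""
        scores = defaultdict(float)
        for term in tokenize(query):
            docs = self.postings.get(term, {})
            if not docs:
                continue
            # Simplified IDF: terms found in fewer documents weigh more.
            idf = math.log(1 + self.doc_count / len(docs))
            for doc_id, tf in docs.items():
                scores[doc_id] += tf * idf
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    index = TinyIndex()
    index.add("a", "Open source search engine")
    index.add("b", "Search, search and more search: an engine written in Java")
    index.add("c", "Cooking recipes")
    for doc_id, score in index.search("search engine"):
        print(doc_id, round(score, 3))
```

Real engines add much more on top of this idea (index compression, stemming, better ranking models), which is exactly what makes their source code worth reading.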

INDRI
  • Parses PDF, HTML, XML, and TREC documents
  • API can be used from Java, PHP, or C++
  • Works on Windows, Linux, Solaris and Mac OS X
  • Can be used on a cluster of machines for faster indexing and retrieval
  • Last update 01/05/2015
Apache Lucene
  • 100%-pure Java
  • small RAM requirements: only 1 MB heap
  • index size roughly 20-30% the size of text indexed
  • fast, memory-efficient and typo-tolerant
  • ranked searching
  • many powerful query types
  • Cross-Platform Solution
  • pluggable ranking models

Lucene implementations in languages other than Java:

C++, .NET, Objective-C, C, Python, Perl, Ruby, Common Lisp, Zend Framework for PHP 5, etc.

Managing Gigabytes for Java
  • Java
  • efficient implementation of phrase queries, proximity restrictions, ordered conjunction, and combined multiple-index queries
  • Indices can be built for a collection split in several parts, and combined later
  • Indices can be clustered both lexically and documentally
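
The phrase queries mentioned in the list above are usually answered with a positional index, which stores where each term occurs in each document. The sketch below only illustrates that general idea and says nothing about how Managing Gigabytes for Java implements it; the function names `build_positional_index` and `phrase_search` are hypothetical.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map term -> doc_id -> list of word positions where the term occurs."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term][doc_id].append(pos)
    return index


def phrase_search(index, phrase):
    """Return sorted ids of documents containing the words of `phrase` in a row."""
    terms = phrase.lower().split()
    if not terms or any(t not in index for t in terms):
        return []
    # Only documents that contain every term can contain the phrase.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])
    hits = []
    for doc_id in sorted(candidates):
        # The phrase matches if term i sits exactly i positions after the first term.
        for start in index[terms[0]][doc_id]:
            if all(start + i in index[t][doc_id] for i, t in enumerate(terms)):
                hits.append(doc_id)
                break
    return hits


if __name__ == "__main__":
    docs = {
        1: "open source search engine",
        2: "search open engine source",
        3: "an open source project",
    }
    index = build_positional_index(docs)
    print(phrase_search(index, "open source"))
```

A proximity restriction could be sketched the same way by accepting positions that are within some distance of each other instead of exactly adjacent.
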
Swish-e
  • can index plain text, e-mail, PDF, HTML, XML, Microsoft® Word/PowerPoint/Excel
  • Includes a web spider for indexing remote documents over HTTP
  • Can report structural errors in your XML and HTML documents
Terrier IR Platform (written in Java); supports corpora written in languages other than English.

Xapian - written in C++, with bindings to allow use from Perl, Python, PHP, Java, Tcl, C#, Ruby, Lua, Erlang and Node.js. The latest stable version is 1.2.20, released on 2015-03-04.

ht://Dig Search Engine Software (download the most recent version)

Zettair (written in C).
