Spaql Book:
Specification for an OAI Static Repository and an OAI Static Repository Gateway
http://www.openarchives.org/OAI/2.0/guidelines-static-repository.htm
A Static Repository, introduced here, provides a simple approach for exposing relatively static and small collections of metadata records through the OAI-PMH. The Static Repository approach is targeted at organizations that:
- Have metadata collections ranging in size between 1 and 5000 records;
- Can make static content available through a network-accessible Web server;
- Need a technically simpler implementation strategy compared to acting as an OAI-PMH Repository, which requires processing OAI-PMH requests.
A Static Repository is an XML file that is made accessible at a persistent HTTP URL. The XML file contains metadata records and repository information.
A Static Repository becomes accessible via OAI-PMH through the intermediation of one Static Repository Gateway. The restriction that only one Static Repository Gateway acts as an intermediary for each Static Repository reduces potential problems with large-scale duplication of metadata records among OAI-PMH repositories. A Static Repository Gateway uses the metadata records and repository information, provided via XML in the Static Repository, to respond to the six OAI-PMH requests for access to that information. Because a Static Repository Gateway maps a unique Static Repository base URL to each such Static Repository, harvesters can access a Static Repository in exactly the same manner as they access any other OAI-PMH Repository.
The relationship between Static Repositories, a Static Repository Gateway, and an OAI-PMH harvester is illustrated in the figure below. The Static Repository and the Static Repository Gateway are described in the remainder of this document. Implementers whose sole interest is the creation of a Static Repository may skip Section 4, which describes the Static Repository Gateway.
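Because a harvester treats a gateway-provided base URL like any other OAI-PMH base URL, the six request types can be sketched as plain HTTP GET URLs. The following Python snippet is a minimal illustration only: the base URL is a hypothetical placeholder, and the helper name oai_request_url is invented for this example.

```python
from urllib.parse import urlencode

# The six request types (verbs) defined by OAI-PMH 2.0.
OAI_VERBS = (
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
)


def oai_request_url(base_url, verb, **params):
    """Build an OAI-PMH GET request URL for the given verb and arguments."""
    if verb not in OAI_VERBS:
        raise ValueError(f"not an OAI-PMH verb: {verb!r}")
    query = urlencode({"verb": verb, **params})
    return f"{base_url}?{query}"


# Hypothetical Static Repository base URL as exposed by a gateway.
BASE_URL = "http://gateway.example.org/oai/repo.example.org/records.xml"

print(oai_request_url(BASE_URL, "Identify"))
print(oai_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc"))
```

The harvester needs no knowledge that the records originate from a static XML file; only the base URL differs from one repository to the next.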
SphinxSE MySQL Full Text Search
SphinxSE Full Text Search engine:
http://sphinxsearch.com/docs/current.html#about
Sphinx is a full-text search engine, publicly distributed under GPL version 2. Commercial licensing (e.g. for embedded use) is available upon request.
Technically, Sphinx is a standalone software package that provides fast and relevant full-text search functionality to client applications. It was specially designed to integrate well with SQL databases storing the data, and to be easily accessed by scripting languages. However, Sphinx does not depend on or require any specific database to function.
Applications can access the Sphinx search daemon (searchd) using any of three different access methods: a) via Sphinx's own implementation of the MySQL network protocol (using a small SQL subset called SphinxQL; this is the recommended way), b) via the native search API (SphinxAPI), or c) via a MySQL server with a pluggable storage engine (SphinxSE).
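As a rough illustration of access method (a), the Python sketch below composes a basic SphinxQL full-text query. The helper name build_match_query is invented for this example. The index name test1 matches the sample configuration later on this page, and the %s placeholder assumes a MySQL client library that uses that parameter style.

```python
def build_match_query(index, text, limit=10):
    """Return (sql, params) for a basic SphinxQL full-text query."""
    # Index names cannot be bound as parameters, so validate them instead.
    if not index.isidentifier():
        raise ValueError(f"unexpected index name: {index!r}")
    # The search text is left as a placeholder for the client library to bind.
    sql = f"SELECT id FROM {index} WHERE MATCH(%s) LIMIT {int(limit)}"
    return sql, (text,)


sql, params = build_match_query("test1", "static repository")
print(sql)
print(params)
```

With a DB-API MySQL driver (PyMySQL would be one option, though that choice is an assumption), the pair could be passed to cursor.execute(sql, params) on a connection to the port that searchd listens on for the MySQL protocol, which is 9306 in the sample configuration.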
Official native SphinxAPI implementations for PHP, Perl, Python, Ruby and Java are included within the distribution package. The API is very lightweight, so porting it to a new language is known to take a few hours or days. Third-party API ports and plugins exist for Perl, C#, Haskell, Ruby-on-Rails, and possibly other languages and frameworks.
Starting from version 1.10-beta, Sphinx supports two different indexing backends: the "disk" index backend and the "realtime" (RT) index backend. Disk indexes support online full-text index rebuilds, but online updates can only be done on non-text (attribute) data. RT indexes additionally allow for online full-text index updates. Previous versions only supported disk indexes.
Data can be loaded into disk indexes using a so-called data source. Built-in sources can fetch data directly from MySQL, PostgreSQL, MSSQL, an ODBC-compliant database (Oracle, etc.), or a pipe in TSV or a custom XML format. Adding new data source drivers (e.g. to natively support other DBMSes) is designed to be as easy as possible. RT indexes, as of 1.10-beta, can only be populated using SphinxQL.
http://www.sphinxsearch.com/wiki/doku.php?id=tutorials
2.3. Installing Sphinx packages on Debian and Ubuntu
There are two ways of getting Sphinx for Ubuntu: regular deb packages and the Launchpad PPA repository.
Deb packages:
- Sphinx requires a few libraries to be installed on Debian/Ubuntu. Use apt-get to download and install these dependencies:
$ sudo apt-get install mysql-client unixodbc libpq5
- Now you can install Sphinx:
$ sudo dpkg -i sphinxsearch_2.2.11-dev-0ubuntu12~trusty_amd64.deb
PPA repository (Ubuntu only).
Installing Sphinx is much easier from the Sphinxsearch PPA repository, because you will get all dependencies and can also update Sphinx to the latest version with the same command.
- First, add the Sphinxsearch repository and update the list of packages:
$ sudo add-apt-repository ppa:builds/sphinxsearch-rel22
$ sudo apt-get update
- Install/update sphinxsearch package:
$ sudo apt-get install sphinxsearch
The Sphinx searchd daemon can be started/stopped using the service command:
$ sudo service sphinxsearch start
For a complete tutorial, see: https://www.digitalocean.com/community/tutorials/how-to-install-and-configure-sphinx-on-ubuntu-16-04
Prerequisites
Before you begin this guide, you will need:
- One Ubuntu 16.04 server.
- A sudo non-root user, which you can set up by following this tutorial.
- MySQL installed on your server, which you can set up by following step 2 of this tutorial.
Step 1 — Installing Sphinx
Installing Sphinx on Ubuntu is easy because it’s in the native package repository. Install it using apt-get.
Now you have successfully installed Sphinx on your server. Before starting the Sphinx daemon, let’s configure it.
Sphinx’s configuration should be in a file called sphinx.conf in /etc/sphinxsearch. The configuration consists of 3 main blocks that are essential to run: index, searchd, and source. We’ll provide an example configuration file for you to use, and explain each section so you can customize it later.
First, create the sphinx.conf file.
Each of the index, searchd, and source blocks is described below. Then, at the end of this step, the entirety of sphinx.conf is included for you to copy and paste into the file.
source src1
{
    type = mysql

    # MySQL connection settings
    sql_host = localhost
    sql_user = root
    sql_pass = your_root_mysql_password
    sql_db = test
    sql_port = 3306

    # Main document fetch query; the first column must be the unique document ID
    sql_query = \
        SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
        FROM documents

    # Columns stored as attributes rather than full-text fields
    sql_attr_uint = group_id
    sql_attr_timestamp = date_added
}
index test1
{
    # Data source to index and location of the index files
    source = src1
    path = /var/lib/sphinxsearch/data/test1
    docinfo = extern
}
searchd
{
    # Accept MySQL-protocol (SphinxQL) connections on port 9306
    listen = 9306:mysql41
    log = /var/log/sphinxsearch/searchd.log
    query_log = /var/log/sphinxsearch/query.log
    read_timeout = 5
    max_children = 30
    pid_file = /var/run/sphinxsearch/searchd.pid
    seamless_rotate = 1
    preopen_indexes = 1
    unlink_old = 1
    binlog_path = /var/lib/sphinxsearch/data
}
To explore more configurations, you can take a look at the /etc/sphinxsearch/sphinx.conf.sample file, which has all the variables explained in even more detail.
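Since the configuration needs the source, index, and searchd blocks to run, a quick sanity check before starting the daemon can catch a missing block. The Python sketch below is an illustration only, not a full sphinx.conf parser: it looks only at block header lines, and the function names find_blocks and missing_blocks are invented for this example.

```python
import re

# Block types the text above describes as essential.
REQUIRED_BLOCKS = ("source", "index", "searchd")

# A block header starts the line with its type, e.g. "source src1" or "searchd".
HEADER_RE = re.compile(r"^(source|index|searchd)\b")


def find_blocks(conf_text):
    """Return the set of required block types declared in the config text."""
    found = set()
    for raw_line in conf_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        match = HEADER_RE.match(line)
        # Settings such as "source = src1" contain "=", headers do not.
        if match and "=" not in line:
            found.add(match.group(1))
    return found


def missing_blocks(conf_text):
    """Return the required block types that the config text does not declare."""
    found = find_blocks(conf_text)
    return [name for name in REQUIRED_BLOCKS if name not in found]


sample = """
source src1
{
    type = mysql
}
index test1
{
    source = src1
}
"""
print(missing_blocks(sample))
```

In practice the text would be read from /etc/sphinxsearch/sphinx.conf; an empty result means all three blocks are declared, not that their settings are valid.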
2.5. Installing Sphinx on Windows
Installing Sphinx on a Windows server is often easier than installing on a Linux environment; unless you are preparing code patches, you can use the pre-compiled binary files from the Downloads area on the website.
- Extract everything from the .zip file you have downloaded: sphinx-2.2.11-dev-win32.zip, or sphinx-2.2.11-dev-win32-pgsql.zip if you need PostgreSQL support as well. (We are using version 2.2.11-dev here for the sake of example only; be sure to change this to the specific version you’re using.) You can use Windows Explorer in Windows XP and up to extract the files, or a freeware package like 7Zip to open the archive. For the remainder of this guide, we will assume that the folders are unzipped into C:\Sphinx, such that searchd.exe can be found in C:\Sphinx\bin\searchd.exe. If you decide to use a different location for the folders or configuration file, please change it accordingly.
- Edit the contents of sphinx.conf.in, specifically entries relating to @CONFDIR@, to paths suitable for your system.
- Install the searchd system as a Windows service:
C:\Sphinx\bin> C:\Sphinx\bin\searchd --install --config C:\Sphinx\sphinx.conf.in --servicename SphinxSearch
- The searchd service will now be listed in the Services panel within the Management Console, available from Administrative Tools. It will not have been started, as you will need to configure it and build your indexes with indexer before starting the service. A guide to do this can be found under Quick tour.
During the next steps of the install (which involve running indexer pretty much as you would on Linux) you may find that you get an error relating to libmysql.dll not being found. If you have MySQL installed, you should find a copy of this library in your Windows directory, or sometimes in Windows\System32, or failing that in the MySQL core directories. If you do receive an error, please copy libmysql.dll into the bin directory.
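The search for libmysql.dll described above can be sketched in Python as follows. This is an illustration under assumptions: the helper name copy_library_to_bin is invented, the Windows paths follow the locations named in the text, and the MySQL installation directory is a hypothetical placeholder to adjust for your system.

```python
import shutil
from pathlib import Path


def copy_library_to_bin(name, search_dirs, bin_dir):
    """Copy the first copy of `name` found in search_dirs into bin_dir.

    Returns the destination path, or None if the file was not found.
    """
    for directory in search_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            destination = Path(bin_dir) / name
            shutil.copy2(candidate, destination)
            return destination
    return None


if __name__ == "__main__":
    # Candidate locations, checked in the order the text suggests.
    candidates = [
        r"C:\Windows",
        r"C:\Windows\System32",
        r"C:\Program Files\MySQL\lib",  # hypothetical MySQL directory
    ]
    result = copy_library_to_bin("libmysql.dll", candidates, r"C:\Sphinx\bin")
    print(result if result else "libmysql.dll not found in the listed directories")
```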