Specification for an OAI Static Repository and an OAI Static Repository Gateway




A Static Repository, introduced here, provides a simple approach for exposing relatively static and small collections of metadata records through the OAI-PMH. The Static Repository approach is targeted at organizations that:

  • Have metadata collections ranging in size between 1 and 5000 records;
  • Can make static content available through a network-accessible Web server;
  • Need a technically simpler implementation strategy compared to acting as an OAI-PMH Repository, which requires processing OAI-PMH requests.

A Static Repository is an XML file that is made accessible at a persistent HTTP URL. The XML file contains metadata records and repository information.

A Static Repository becomes accessible via OAI-PMH through the intermediation of one Static Repository Gateway. The restriction that only one Static Repository Gateway acts as an intermediary for each Static Repository reduces potential problems with large-scale duplication of metadata records among OAI-PMH repositories. A Static Repository Gateway uses the metadata records and repository information, provided via XML in the Static Repository, to respond to the six OAI-PMH requests for access to that information. Because a Static Repository Gateway maps a unique Static Repository base URL to each such Static Repository, harvesters can access a Static Repository in exactly the same manner as they access any other OAI-PMH Repository.
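
As a rough illustration of what accessing a Static Repository "in exactly the same manner" as any other OAI-PMH Repository looks like from the harvester side, the sketch below builds request URLs for the six OAI-PMH request types. The base URL and the helper name build_request are hypothetical; a real gateway assigns its own Static Repository base URL.

```python
from urllib.parse import urlencode

# The six request types (verbs) defined by OAI-PMH.
OAI_PMH_VERBS = (
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
)


def build_request(base_url, verb, **arguments):
    """Return the GET URL for one OAI-PMH request against base_url."""
    if verb not in OAI_PMH_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Hypothetical base URL that a gateway might map to one Static Repository.
BASE_URL = "http://gateway.example.org/oai/library.example.org/records.xml"

identify_url = build_request(BASE_URL, "Identify")
records_url = build_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc")
print(identify_url)
print(records_url)
```

The harvester only ever sees the base URL, so nothing in this code depends on whether the records come from a static XML file or from a full OAI-PMH Repository.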

The relationship between Static Repositories, a Static Repository Gateway, and an OAI-PMH harvester is illustrated in the figure below. The Static Repository and the Static Repository Gateway are described in the remainder of this document. Implementers whose sole interest is the creation of a Static Repository may skip Section 4, which describes the Static Repository Gateway.

SphinxSE MySQL Full Text Search

SphinxSE Full Text Search engine:


Sphinx is a full-text search engine, publicly distributed under GPL version 2. Commercial licensing (e.g. for embedded use) is available upon request.

Technically, Sphinx is a standalone software package that provides fast and relevant full-text search functionality to client applications. It was specially designed to integrate well with SQL databases storing the data, and to be easily accessed by scripting languages. However, Sphinx does not depend on or require any specific database to function.

Applications can access the Sphinx search daemon (searchd) using any of three access methods: a) via Sphinx's own implementation of the MySQL network protocol (using a small SQL subset called SphinxQL; this is the recommended way), b) via the native search API (SphinxAPI), or c) via a MySQL server with a pluggable storage engine (SphinxSE).
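
To give a feel for the SphinxQL route, here is a minimal sketch that only builds a full-text SELECT statement as a string. The index name test1, the helper names, and the simple quoting rule are assumptions made for illustration. Sending the statement would require a MySQL client library connected to searchd, which is left out here, and full-text operator characters inside the search terms are not handled.

```python
def quote_sphinxql(text):
    """Quote text as a string literal, backslash-escaping \\ and '."""
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_match_query(index, terms, limit=10):
    """Build a SphinxQL SELECT that runs a full-text MATCH() against index."""
    # Index names cannot be sent as parameters, so accept only plain identifiers.
    if not index.isidentifier():
        raise ValueError(f"suspicious index name: {index!r}")
    return (
        f"SELECT id FROM {index} "
        f"WHERE MATCH({quote_sphinxql(terms)}) "
        f"LIMIT {int(limit)}"
    )


# "test1" is an assumed index name used only for this example.
query = build_match_query("test1", "full text search")
print(query)
```

Because SphinxQL is spoken over the MySQL network protocol, the resulting string can in principle be sent with any ordinary MySQL client once searchd is listening.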

Official native SphinxAPI implementations for PHP, Perl, Python, Ruby and Java are included in the distribution package. The API is very lightweight, so porting it to a new language is known to take a few hours or days. Third-party API ports and plugins exist for Perl, C#, Haskell, Ruby-on-Rails, and possibly other languages and frameworks.

Starting from version 1.10-beta, Sphinx supports two different indexing backends: the "disk" index backend and the "realtime" (RT) index backend. Disk indexes support online full-text index rebuilds, but online updates can only be done on non-text (attribute) data. RT indexes additionally allow for online full-text index updates. Previous versions only supported disk indexes.

Data can be loaded into disk indexes using a so-called data source. Built-in sources can fetch data directly from MySQL, PostgreSQL, MSSQL, any ODBC-compliant database (Oracle, etc.), or a pipe in TSV or a custom XML format. Adding new data source drivers (e.g. to natively support other DBMSes) is designed to be as easy as possible. RT indexes, as of 1.10-beta, can only be populated using SphinxQL.


2.3. Installing Sphinx packages on Debian and Ubuntu

There are two ways of getting Sphinx for Ubuntu: regular deb packages and the Launchpad PPA repository.

Deb packages:

  1. Sphinx requires a few libraries to be installed on Debian/Ubuntu. Use apt-get to download and install these dependencies:
     $ sudo apt-get install mysql-client unixodbc libpq5
  2. Now you can install Sphinx:
     $ sudo dpkg -i sphinxsearch_2.2.11-dev-0ubuntu12~trusty_amd64.deb

PPA repository (Ubuntu only):

Installing Sphinx is much easier from the Sphinxsearch PPA repository, because you get all dependencies automatically and can later update Sphinx to the latest version with the same command.

  1. First, add the Sphinxsearch repository and update the list of packages:
     $ sudo add-apt-repository ppa:builds/sphinxsearch-rel22
     $ sudo apt-get update
  2. Install or update the sphinxsearch package:
     $ sudo apt-get install sphinxsearch

The Sphinx searchd daemon can be started or stopped using the service command:

$ sudo service sphinxsearch start

For a complete tutorial, see: https://www.digitalocean.com/community/tutorials/how-to-install-and-configure-sphinx-on-ubuntu-16-04


Before you begin this guide, you will need:

  • One Ubuntu 16.04 server.
  • A sudo non-root user, which you can set up by following this tutorial.
  • MySQL installed on your server, which you can set up by following step 2 of this tutorial.

Step 1 — Installing Sphinx

Installing Sphinx on Ubuntu is easy because it’s in the native package repository. Install it using apt-get.

sudo apt-get install sphinxsearch

Now you have successfully installed Sphinx on your server. Before starting the Sphinx daemon, let’s configure it.

Sphinx’s configuration should be in a file called sphinx.conf in /etc/sphinxsearch. The configuration consists of 3 main blocks that are essential to run: index, searchd, and source. We’ll provide an example configuration file for you to use, and explain each section so you can customize it later.

First, create the sphinx.conf file.

sudo nano /etc/sphinxsearch/sphinx.conf

Each of the index, searchd, and source blocks is described below. Then, at the end of this step, the entirety of sphinx.conf is included for you to copy and paste into the file.

source src1
{
  type                = mysql

  # Connection details for the MySQL server that holds the data to index.
  sql_host            = localhost
  sql_user            = root
  sql_pass            = your_root_mysql_password
  sql_db              = test
  sql_port            = 3306

  # The first selected column must be the unique document ID.
  sql_query           = \
  SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
  FROM documents

  sql_attr_uint       = group_id
  sql_attr_timestamp  = date_added
}

index test1
{
  source              = src1
  path                = /var/lib/sphinxsearch/data/test1
  docinfo             = extern
}

searchd
{
  # Accept SphinxQL (MySQL protocol) connections on port 9306.
  listen              = 9306:mysql41
  log                 = /var/log/sphinxsearch/searchd.log
  query_log           = /var/log/sphinxsearch/query.log
  read_timeout        = 5
  max_children        = 30
  pid_file            = /var/run/sphinxsearch/searchd.pid
  seamless_rotate     = 1
  preopen_indexes     = 1
  unlink_old          = 1
  binlog_path         = /var/lib/sphinxsearch/data
}

To explore more configurations, you can take a look at the /etc/sphinxsearch/sphinx.conf.sample file, which has all the variables explained in even more detail.
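
Since the configuration needs source, index, and searchd blocks to run, a quick sanity check before starting the daemon can be handy. The following is only a sketch under simplifying assumptions: it looks for block header lines with a regular expression rather than parsing the file properly, and the function name missing_blocks is made up for this example.

```python
import re

REQUIRED_BLOCKS = ("source", "index", "searchd")

# A block header is a line such as "source src1" or "searchd", optionally
# followed by "{". Setting lines like "source = src1" contain "=" and so
# are deliberately not matched.
BLOCK_HEADER = re.compile(r"^\s*(source|index|searchd)\b[^={}]*\{?\s*$")


def missing_blocks(conf_text):
    """Return the required block types that have no header in conf_text."""
    found = set()
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0]  # ignore trailing comments
        match = BLOCK_HEADER.match(line)
        if match:
            found.add(match.group(1))
    return [name for name in REQUIRED_BLOCKS if name not in found]
```

Calling missing_blocks(open("/etc/sphinxsearch/sphinx.conf").read()) should return an empty list for a complete file, or the names of whichever block types are absent.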

2.5. Installing Sphinx on Windows

Installing Sphinx on a Windows server is often easier than installing on a Linux environment; unless you are preparing code patches, you can use the pre-compiled binary files from the Downloads area on the website.

  1. Extract everything from the .zip file you have downloaded: sphinx-2.2.11-dev-win32.zip, or sphinx-2.2.11-dev-win32-pgsql.zip if you need PostgreSQL support as well. (We are using version 2.2.11-dev here for the sake of example only; be sure to change this to the specific version you are using.) You can use Windows Explorer in Windows XP and up to extract the files, or a freeware package like 7Zip to open the archive.
     For the remainder of this guide, we will assume that the folders are unzipped into C:\Sphinx, such that searchd.exe can be found in C:\Sphinx\bin\searchd.exe. If you decide to use a different location for the folders or configuration file, please change the paths accordingly.
  2. Edit the contents of sphinx.conf.in – specifically entries relating to @CONFDIR@ – to paths suitable for your system.
  3. Install the searchd system as a Windows service:
     C:\Sphinx\bin> C:\Sphinx\bin\searchd --install --config C:\Sphinx\sphinx.conf.in --servicename SphinxSearch
  4. The searchd service will now be listed in the Services panel within the Management Console, available from Administrative Tools. It will not have been started, as you will need to configure it and build your indexes with indexer before starting the service. A guide to do this can be found under Quick tour.
     During the next steps of the install (which involve running indexer pretty much as you would on Linux) you may find that you get an error relating to libmysql.dll not being found. If you have MySQL installed, you should find a copy of this library in your Windows directory, or sometimes in Windows\System32, or failing that in the MySQL core directories. If you do receive an error, please copy libmysql.dll into the bin directory.
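
The libmysql.dll workaround in step 4 can also be scripted. The sketch below is one possible way to do it, not part of the Sphinx distribution: the function names are hypothetical, and the list of directories to search is something you would supply for your own system.

```python
from pathlib import Path
import shutil


def find_library(candidate_dirs, name="libmysql.dll"):
    """Return the first existing path to name in candidate_dirs, or None."""
    for directory in candidate_dirs:
        path = Path(directory) / name
        if path.is_file():
            return path
    return None


def copy_library_to_bin(candidate_dirs, bin_dir, name="libmysql.dll"):
    """Copy name into bin_dir; return the new path, or None if not found."""
    source = find_library(candidate_dirs, name)
    if source is None:
        return None
    # shutil.copy2 returns the destination path it wrote to.
    return Path(shutil.copy2(source, Path(bin_dir) / name))
```

For the layout assumed in this guide, a call might look like copy_library_to_bin([r"C:\Windows", r"C:\Windows\System32"], r"C:\Sphinx\bin"), with the MySQL installation directory added to the list if the library is not found in either location.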