
Relevanssi Search Plugin for WordPress

Fix your WordPress search!

WordPress search doesn’t search everything, and doesn’t give you enough control over what is searched and how. Relevanssi gives you full access and full control, with plenty of filters and ways to make Relevanssi work the way you want your search to work.
  • PDF contents: Relevanssi can read the text from your PDFs, index it and search it! Read more about indexing PDFs.
  • Multisite searches: Relevanssi can run searches across many subsites in the same multisite network.
  • Custom fields: Relevanssi will find the content in your custom fields, including things like WooCommerce SKUs, ACF field content or whatever it is you store in custom fields. Read more about custom field search.
  • User profiles: Yes, Relevanssi will find users by their names and profile descriptions.
  • Taxonomy terms: No matter if you prefer categories, tags or custom taxonomies, Relevanssi will return the term archive pages in searches!
  • Shortcode output: Relevanssi can expand shortcodes and find content generated by shortcodes.

SBOM (Software Bill Of Materials) for Docker

https://docs.docker.com/engine/sbom/

A Software Bill Of Materials (SBOM) is analogous to a packing list for a shipment. It lists all the components that make up the software, or were used to build it. For container images, this includes the operating system packages that are installed (for example, ca-certificates) along with language-specific packages that the software depends on (for example, Log4j). The SBOM could include a subset of this information or even more details, like the versions of components and their source.

SBOM generation is available via the docker sbom command, shipped as a CLI plugin with recent Docker Desktop releases.
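
For example, a minimal sketch, assuming Docker Desktop with the sbom plugin installed; the image name and output file are placeholders, and available flags may differ between plugin versions:

```
# Print the packages detected in an image (default table output)
docker sbom alpine:3.16

# Write the SBOM to a file in SPDX JSON format
docker sbom --format spdx-json --output sbom.json alpine:3.16
```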

Solr Cell & schemaless indexing with Apache Tika

Key Solr Cell Concepts

When using the Solr Cell framework, it is helpful to keep the following in mind:

  • Tika will automatically attempt to determine the input document type (e.g., Word, PDF, HTML) and extract the content appropriately. If you like, you can explicitly specify a MIME type for Tika with the stream.type parameter. See http://tika.apache.org/1.24.1/formats.html for the file types supported.
  • Briefly, Tika internally works by synthesizing an XHTML document from the core content of the parsed document which is passed to a configured SAX ContentHandler provided by Solr Cell. Solr responds to Tika’s SAX events to create one or more text fields from the content. Tika exposes document metadata as well (apart from the XHTML).
  • Tika produces metadata such as Title, Subject, and Author according to specifications such as the DublinCore. The metadata available is highly dependent on the file types and what they in turn contain. Some of the general metadata created is described in the section Metadata Created by Tika below. Solr Cell supplies some metadata of its own too.
  • Solr Cell concatenates text from the internal XHTML into a content field. You can configure which elements should be included/ignored, and which should map to another field.
  • Solr Cell maps each piece of metadata onto a field. By default it maps to the same name but several parameters control how this is done.
  • When Solr Cell finishes creating the internal SolrInputDocument, the rest of the Lucene/Solr indexing stack takes over. The next step after any update handler is the Update Request Processor chain.

Solr Cell is a contrib, which means it’s not automatically included with Solr but must be configured. The example configsets have Solr Cell configured, but if you are not using those, you will want to pay attention to the section Configuring the ExtractingRequestHandler in solrconfig.xml below.
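
As a quick sketch of how documents are typically posted to Solr Cell (assuming a core created from the techproducts example configset, which has the ExtractingRequestHandler enabled; the URL, core name and file path are only examples):

```
# Extract and index a PDF; literal.id supplies the unique key for the resulting document
curl 'http://localhost:8983/solr/techproducts/update/extract?literal.id=doc1&commit=true' \
  -F "myfile=@example/exampledocs/solr-word.pdf"

# Extracted metadata and body text can be remapped to other fields with fmap, e.g. fmap.content=text
```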

For more information see:

https://solr.apache.org/guide/8_7/uploading-data-with-solr-cell-using-apache-tika.html

SolrWayback

GitHub repo: https://github.com/netarchivesuite/solrwayback

About SolrWayback

SolrWayback is a web application for browsing historical harvested ARC/WARC files similar to the Internet Archive Wayback Machine. SolrWayback runs on a Solr server containing ARC/WARC files indexed using the warc-indexer.

SolrWayback bundle release 4.4.2 can be downloaded here: https://github.com/netarchivesuite/solrwayback/releases/tag/4.4.2

The bundle is the recommended way to get started with SolrWayback. Download the bundle, follow the installation guide and index your own WARC files, and you are up and running.

 

## Install instructions from the README file (after downloading the SolrWayback bundle)

### 1) INITIAL SETUP
Properties:
Copy the two files `properties/solrwayback.properties` and `properties/solrwaybackweb.properties` to your HOME folder (or the home folder of the Tomcat user).

Optional: For screenshot previews to work, you may have to edit `solrwayback.properties` and change the values of the last two properties: `chrome.command` and `screenshot.temp.imagedir`.
Chrome (or Chromium) must be installed for screenshot preview images.

If there are errors when running a script, try changing the permissions of the file (`startup.sh` etc.). Linux: `chmod +x filename.sh`

### 2) STARTING SOLRWAYBACK
SolrWayback requires both Solr and Tomcat to be running.

#### Tomcat:

* Start tomcat: `apache-tomcat-8.5.60/bin/startup.sh`
* Stop tomcat: `apache-tomcat-8.5.60/bin/shutdown.sh`
* (For windows navigate to `apache-tomcat-8.5.60/bin/` and type `startup.bat` or `shutdown.bat`)
* To check that Tomcat is running, open: http://localhost:8080/solrwayback/

#### Solr:
* Start solr: `solr-7.7.3/bin/solr start`
* Stop solr: `solr-7.7.3/bin/solr stop -all`
* (For windows navigate to `solr-7.7.3/bin/` and type `solr.cmd start` or `solr.cmd stop -all`)
* To check that Solr is running, open: http://localhost:8983/solr/#/netarchivebuilder

### 3) INDEXING
SolrWayback uses a Solr index of WARC files to support freetext search and more complex queries.
If you do not have existing WARC files, see steps below on harvesting with wget.

The script `warc-indexer.sh` in the `indexing`-folder allows for multi processing and keeps track of already
indexed files, so the collection can be extended by adding more WARCs and running the script again.

Call `indexing/warc-indexer.sh -h` for usage and how to adjust the number of processes to use for indexing.
Example usage that will index all WARC-files in the warcs1 folder:
```
THREADS=2 ./warc-indexer.sh warcs1/*
```

This will start indexing files from the warcs1 folder using 2 threads. Assigning more threads than there are CPU cores available will result in slower indexing. Each indexing job requires 1 GB of RAM, so this can also be a limiting factor.

You can also populate the collection and collection_id fields in Solr with custom values:
```
THREADS=4 INDEXER_CUSTOM="--collection_id collection1 --collection corona2021" ./warc-indexer.sh warcs1/*
```

You can then enable faceting on these fields in solrwaybackweb.properties.

 

The script keeps track of processed files by checking if a log from a previous analysis is available. The logs are stored
in the `status`-folder (this can be changed using the `STATUS_ROOT` variable). To re-index a WARC file, delete the
corresponding log file.

The script `warc-indexer.sh` is not available for Windows. For the Windows platform, a more primitive script is provided instead, which also works on Linux/macOS:
1. Copy ARC/WARC files into folder: `indexing/warcs1`
2. Start indexing: call `indexing/batch_warcs1_folder.sh` (or batch_warcs1_folder.bat for windows)

Indexing can take up to 20 minutes per 1 GB of WARC files. After indexing, the WARC files must stay in the same folder, since SolrWayback uses them during playback etc.

Whitespace characters in WARC file names can result in page previews and playback not working on some systems.
There can be a delay of up to 5 minutes before the indexed files are visible in search. Visit this URL after the index job has finished to commit them instantly: http://localhost:8983/solr/netarchivebuilder/update?commit=true
There is a similar script, batch_warcs2_folder.sh, that shows how to easily add new WARC files to the collection without indexing the old ones again.
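
If you prefer the command line, the same instant commit can be triggered with curl (using the URL above):
```
curl "http://localhost:8983/solr/netarchivebuilder/update?commit=true"
```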

For more information about the warc-indexer see: https://github.com/ukwa/webarchive-discovery/wiki/Quick-Start

Specification for an OAI Static Repository and an OAI Static Repository Gateway

 

http://www.openarchives.org/OAI/2.0/guidelines-static-repository.htm

 

A Static Repository, introduced here, provides a simple approach for exposing relatively static and small collections of metadata records through the OAI-PMH. The Static Repository approach is targeted at organizations that:

  • Have metadata collections ranging in size between 1 and 5000 records;
  • Can make static content available through a network-accessible Web server;
  • Need a technically simpler implementation strategy compared to acting as an OAI-PMH Repository, which requires processing OAI-PMH requests.

A Static Repository is an XML file that is made accessible at a persistent HTTP URL. The XML file contains metadata records and repository information.

A Static Repository becomes accessible via OAI-PMH through the intermediation of one Static Repository Gateway. The restriction that only one Static Repository Gateway acts as an intermediary for each Static Repository reduces potential problems with large-scale duplication of metadata records among OAI-PMH repositories. A Static Repository Gateway uses the metadata records and repository information, provided via XML in the Static Repository, to respond to the six OAI-PMH requests for access to that information. Because a Static Repository Gateway maps a unique Static Repository base URL to each such Static Repository, harvesters can access a Static Repository in exactly the same manner as they access any other OAI-PMH Repository.
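
For illustration only (host names and paths are placeholders following the base-URL construction pattern described in the guidelines): if a Static Repository published at http://an.oai.org/ma/mini.xml is registered with a Gateway whose base is http://gateway.institution.org/oai, a harvester can issue ordinary OAI-PMH requests against the resulting base URL:

```
# Hypothetical gateway base URL + Static Repository URL (without the http:// prefix)
curl "http://gateway.institution.org/oai/an.oai.org/ma/mini.xml?verb=Identify"
curl "http://gateway.institution.org/oai/an.oai.org/ma/mini.xml?verb=ListRecords&metadataPrefix=oai_dc"
```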

The relationship between Static Repositories, a Static Repository Gateway, and an OAI-PMH harvester is illustrated in the figure below. The Static Repository and the Static Repository Gateway are described in the remainder of this document. Implementers whose sole interest is the creation of a Static Repository may skip Section 4 that describes the Static Repository Gateway.

SphinxSE MySQL Full Text Search

SphinxSE Full Text Search engine:

http://sphinxsearch.com/docs/current.html#about

Sphinx is a full-text search engine, publicly distributed under GPL version 2. Commercial licensing (e.g. for embedded use) is available upon request.

Technically, Sphinx is a standalone software package that provides fast and relevant full-text search functionality to client applications. It was specially designed to integrate well with SQL databases storing the data, and to be easily accessed by scripting languages. However, Sphinx does not depend on nor require any specific database to function.

Applications can access the Sphinx search daemon (searchd) using any of three different access methods: a) via Sphinx's own implementation of the MySQL network protocol (using a small SQL subset called SphinxQL; this is the recommended way), b) via the native search API (SphinxAPI), or c) via MySQL server with a pluggable storage engine (SphinxSE).

Official native SphinxAPI implementations for PHP, Perl, Python, Ruby and Java are included within the distribution package. The API is very lightweight, so porting it to a new language is known to take a few hours or days. Third-party API ports and plugins exist for Perl, C#, Haskell, Ruby-on-Rails, and possibly other languages and frameworks.

Starting from version 1.10-beta, Sphinx supports two different indexing backends: the "disk" index backend and the "realtime" (RT) index backend. Disk indexes support online full-text index rebuilds, but online updates can only be done on non-text (attribute) data. RT indexes additionally allow for online full-text index updates. Previous versions only supported disk indexes.

Data can be loaded into disk indexes using a so-called data source. Built-in sources can fetch data directly from MySQL, PostgreSQL, MSSQL, an ODBC-compliant database (Oracle, etc.), or a pipe in TSV or a custom XML format. Adding new data source drivers (e.g. to natively support other DBMSes) is designed to be as easy as possible. RT indexes, as of 1.10-beta, can only be populated using SphinxQL.

http://www.sphinxsearch.com/wiki/doku.php?id=tutorials

2.3. Installing Sphinx packages on Debian and Ubuntu

There are two ways of getting Sphinx for Ubuntu: regular deb packages and the Launchpad PPA repository.

Deb packages:

  1. Sphinx requires a few libraries to be installed on Debian/Ubuntu. Use apt-get to download and install these dependencies:
     $ sudo apt-get install mysql-client unixodbc libpq5
  2. Now you can install Sphinx:
     $ sudo dpkg -i sphinxsearch_2.2.11-dev-0ubuntu12~trusty_amd64.deb

PPA repository (Ubuntu only).

Installing Sphinx is much easier from the Sphinxsearch PPA repository, because you will get all dependencies and can also update Sphinx to the latest version with the same command.

  1. First, add the Sphinxsearch repository and update the list of packages:
     $ sudo add-apt-repository ppa:builds/sphinxsearch-rel22
     $ sudo apt-get update
  2. Install/update the sphinxsearch package:
     $ sudo apt-get install sphinxsearch

The Sphinx searchd daemon can be started/stopped using the service command:

$ sudo service sphinxsearch start

For a complete Tutorial see: https://www.digitalocean.com/community/tutorials/how-to-install-and-configure-sphinx-on-ubuntu-16-04

Prerequisites

Before you begin this guide, you will need:

  • One Ubuntu 16.04 server.
  • A sudo non-root user, which you can set up by following this tutorial.
  • MySQL installed on your server, which you can set up by following the step 2 of this tutorial.

Step 1 — Installing Sphinx

Installing Sphinx on Ubuntu is easy because it’s in the native package repository. Install it using apt-get.

sudo apt-get install sphinxsearch

Now you have successfully installed Sphinx on your server. Before starting the Sphinx daemon, let’s configure it.

Sphinx’s configuration should be in a file called sphinx.conf in /etc/sphinxsearch. The configuration consists of 3 main blocks that are essential to run: index, searchd, and source. We’ll provide an example configuration file for you to use, and explain each section so you can customize it later.

First, create the sphinx.conf file.

  1. sudo nano /etc/sphinxsearch/sphinx.conf

Each of these index, searchd, and source blocks is described below. Then, at the end of this step, the entirety of sphinx.conf is included for you to copy and paste into the file.

source src1
{
  type			= mysql

  sql_host		= localhost
  sql_user		= root
  sql_pass		= your_root_mysql_password
  sql_db		= test
  sql_port		= 3306

  sql_query		= \
  SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
  FROM documents

  sql_attr_uint			= group_id
  sql_attr_timestamp	= date_added
}
index test1
{
  source			= src1
  path				= /var/lib/sphinxsearch/data/test1
  docinfo			= extern
}
searchd
{
  listen			= 9306:mysql41
  log				= /var/log/sphinxsearch/searchd.log
  query_log			= /var/log/sphinxsearch/query.log
  read_timeout		= 5
  max_children		= 30
  pid_file			= /var/run/sphinxsearch/searchd.pid
  seamless_rotate	= 1
  preopen_indexes	= 1
  unlink_old		= 1
  binlog_path		= /var/lib/sphinxsearch/data
}

To explore more configurations, you can take a look at the /etc/sphinxsearch/sphinx.conf.sample file, which has all the variables explained in even more detail.
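
A minimal sketch of the typical next steps, assuming the configuration above, a populated test.documents table, and the Ubuntu package's default configuration path /etc/sphinxsearch/sphinx.conf:

```
# Build all indexes defined in the configuration
sudo indexer --all

# Start the search daemon
# (on some Ubuntu versions you may first need to set START=yes in /etc/default/sphinxsearch)
sudo service sphinxsearch start

# Query the index over SphinxQL, i.e. the MySQL-protocol listener on port 9306 configured above
mysql -h 127.0.0.1 -P 9306 -e "SELECT * FROM test1 WHERE MATCH('my search keywords');"
```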

2.5. Installing Sphinx on Windows

Installing Sphinx on a Windows server is often easier than installing on a Linux environment; unless you are preparing code patches, you can use the pre-compiled binary files from the Downloads area on the website.

  1. Extract everything from the .zip file you have downloaded – sphinx-2.2.11-dev-win32.zip, or sphinx-2.2.11-dev-win32-pgsql.zip if you need PostgreSQL support as well. (We are using version 2.2.11-dev here for the sake of example only; be sure to change this to the specific version you're using.) You can use Windows Explorer in Windows XP and up to extract the files, or a freeware package like 7Zip to open the archive. For the remainder of this guide, we will assume that the folders are unzipped into C:\Sphinx, such that searchd.exe can be found in C:\Sphinx\bin\searchd.exe. If you decide to use any different location for the folders or configuration file, please change it accordingly.
  2. Edit the contents of sphinx.conf.in – specifically entries relating to @CONFDIR@ – to paths suitable for your system.
  3. Install the searchd system as a Windows service:
     C:\Sphinx\bin> C:\Sphinx\bin\searchd --install --config C:\Sphinx\sphinx.conf.in --servicename SphinxSearch
  4. The searchd service will now be listed in the Services panel within the Management Console, available from Administrative Tools. It will not have been started, as you will need to configure it and build your indexes with indexer before starting the service. A guide to doing this can be found under Quick tour. During the next steps of the install (which involve running indexer pretty much as you would on Linux) you may find that you get an error relating to libmysql.dll not being found. If you have MySQL installed, you should find a copy of this library in your Windows directory, or sometimes in Windows\System32, or failing that in the MySQL core directories. If you do receive an error, please copy libmysql.dll into the bin directory.
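
A brief sketch of those next steps on Windows (assuming the paths and service name used above; run from a Command Prompt):

```
REM Build all indexes defined in the configuration file
C:\Sphinx\bin\indexer --all --config C:\Sphinx\sphinx.conf.in

REM Start the previously installed Windows service
net start SphinxSearch
```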

TEI Boilerplate (displaying TEI with CSS and JavaScript)

https://github.com/TEI-Boilerplate/TEI-Boilerplate

 

# About TEI Boilerplate

TEI Boilerplate (http://teiboilerplate.org/) is a lightweight solution for publishing styled TEI (Text Encoding Initiative) P5 content directly in modern browsers. With TEI Boilerplate, TEI XML files can be served directly to the web without server-side processing or translation to HTML. Our TEI Boilerplate Demo illustrates many TEI features rendered by TEI Boilerplate.

# Browser Compatibility

TEI Boilerplate requires a robust, modern browser to do its work. It is compatible with current versions of Firefox, Chrome, Safari, and Internet Explorer.

Note: For security reasons, some browsers (e.g., Chrome) will not process the XSLT transformation when the TEI document is opened from the local file system. Chrome does work fine when the TEI files are delivered through a Web server, including localhost.

If you have problems with TEI Boilerplate with a modern browser, please let us know by filing a bug report or feature request at http://github.com/GrantLS/TEI-Boilerplate/issues.

# Introduction

TEI is an XML-based language for describing and analyzing literary texts and other documents of interest to humanities scholars. Although TEI provides mechanisms for describing the design, presentational, and material features of the source document, projects and individual scholars that use TEI are responsible for developing their own methods, or implementing existing solutions, for converting the TEI to a presentation-ready state for the web or print (Rahtz, 2012). Two potential paths to reach this goal are:

  1. Transforming TEI to HTML using XSLT and styling the HTML output with CSS.
  2. Styling the TEI directly with CSS by referencing a CSS stylesheet from within the TEI document.

Both of these approaches have advantages and disadvantages. Although HTML is the language of the web and, as such, is well supported by browsers, HTML’s descriptive capabilities are much less expressive than TEI’s. When TEI is transformed to HTML, much of the richness of the TEI is lost or obscured in the resulting HTML. However, the browser understands HTML very well and knows, for example, when to initiate retrieval of a document based on certain user events, such as clicking a link. The second option, CSS-styled TEI, delivers the TEI document directly to the browser. However, while the browser may apply CSS to format and style a TEI document, the browser does not understand the semantics of TEI. For instance, the browser does not understand that TEI’s <ptr> and <ref> elements are linking elements.

TEI Boilerplate bridges the gap between these two approaches by making use of the built-in XSLT (1.0) capabilities of browsers to embed the TEI XML, with minimal modifications, within an HTML5 shell document. Features expected of web documents, such as clickable links and display of linked images, are enabled through selective transformation of a very small number of TEI elements and attributes. Both the HTML5 shell and the embedded TEI are styled using CSS.

TEI Boilerplate is not intended to be a replacement for the many excellent XSLT solutions for publishing and displaying TEI/XML on the web. It is intended to be a simple and lightweight alternative to more complex XSLT solutions. There are both practical and theoretical advantages to this lightweight approach.

# Using it in Your Project

Download the TEI Boilerplate files, and host the dist directory on a web server.

The simplest way to use TEI Boilerplate (TEIBP) is to add your TEI files to the dist/content directory of TEI Boilerplate and include the following xml-stylesheet processing instruction at the top of your TEI documents, after the XML declaration and before the root <TEI> element:

<?xml-stylesheet type="text/xsl" href="teibp.xsl"?>

You may then access your TEI files from a modern browser and see the resulting styled document.

TEI Critical Apparatus Toolbox

This page offers a graphical user interface for the customisation of an XSLT transformation from TEI XML to LaTeX (reledmac) and PDF. Although we have tried to offer as generic a transformation as possible, further customisation might be necessary for your own flavour of encoding. You can perform such advanced customisation at the bottom of this form, provided you know XSLT and LaTeX.

http://teicat.huma-num.fr/print.php

The TEI Critical Apparatus Toolbox is a tool for people preparing a natively digital TEI critical edition.

The Toolbox lets you

  • Check your encoding: offers facilities to display your edition while it is still in the making and to check the consistency of your encoding.
  • Display parallel versions: choose the sigla of the witnesses, and the different versions of the text following each chosen witness will be displayed in parallel columns.
  • Print an edition of a TEI XML edition, with a TEI-to-LaTeX and PDF transformation.
  • Annotate an image: lets you easily trace zones on an image to prepare a documentary edition.
  • Get statistics on the XML tags actually used in different parts of your edition, as well as word counts.