
Göttingen Research Online (GRO)

GRO.data is a universal research data repository for the Göttingen Campus. Researchers affiliated with an institution of the Göttingen Campus can use it free of charge. GRO.data serves several purposes, such as:

  • preserving datasets permanently in a simple way
  • tracking changes to datasets across multiple versions
  • sharing data with colleagues
  • making data publicly available (publishing)
  • obtaining persistent identifiers for published data

Who can use the service?

Any researcher at the Göttingen Campus can use their institutional account to log in, create a personal area ("Dataverse"), and upload files into it as "Datasets". They can then grant other researchers or groups access to these files, or publish the data and receive a DOI, which can be used for citation, for example. If you have an ORCID iD, your data can also be added to your ORCID record automatically.

Technical details

GRO.data is based on the open-source software Dataverse. It has been, and continues to be, adapted by the Göttingen eResearch Alliance to integrate well with the local infrastructure and to enable access via single sign-on, i.e. with your GWDG account. The GWDG hosts the service and takes care of data security and privacy, including backups and access control, as well as availability and reliability.

grab-site (web crawler)


grab-site is the archivist’s web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

GitHub repo: https://github.com/ArchiveTeam/grab-site

Installation steps:

Install on Ubuntu 18.04, 20.04, 22.04, Debian 10 (buster), Debian 11 (bullseye)

  1. On Debian, use su to become root if sudo is not configured to give you access.

sudo apt-get update
sudo apt-get install --no-install-recommends \
    wget ca-certificates git build-essential libssl-dev zlib1g-dev \
    libbz2-dev libreadline-dev libsqlite3-dev libffi-dev libxml2-dev \
    libxslt1-dev libre2-dev pkg-config

If you see Unable to locate package, run the two commands again.

  2. As a non-root user:

wget https://raw.githubusercontent.com/pyenv/pyenv-installer/master/bin/pyenv-installer
chmod +x pyenv-installer
./pyenv-installer
~/.pyenv/bin/pyenv install 3.8.15
~/.pyenv/versions/3.8.15/bin/python -m venv ~/gs-venv
~/gs-venv/bin/pip install --no-binary lxml --upgrade git+https://github.com/ArchiveTeam/grab-site

--no-binary lxml is necessary for the html5-parser build.

  3. Add this to your ~/.bashrc or ~/.zshrc:

PATH="$PATH:$HOME/gs-venv/bin"

and then restart your shell (e.g. by opening a new terminal tab/window).

Upgrade an existing install

To update grab-site, simply run the ~/gs-venv/bin/pip install … or nix-env … command used to install it originally (see above).

After upgrading, stop gs-server with kill or ctrl-c, then start it again. Existing grab-site crawls will automatically reconnect to the new server.

Using grab-site

First, start the dashboard with:

gs-server

and point your browser to http://127.0.0.1:29000/

Note: gs-server listens on all interfaces by default, so you can reach the dashboard by a non-localhost IP as well, e.g. a LAN or WAN IP. (Sub-note: no code execution capabilities are exposed on any interface.)

Then, start as many crawls as you want with:

grab-site 'URL'

Important: do this inside tmux unless they're very short crawls.

grab-site outputs WARCs, logs, and control files to a new subdirectory in the directory from which you launched grab-site, referred to here as "DIR". (Use ls -lrt to find it.)

You can pass multiple URL arguments to include them in the same crawl, whether they are on the same domain or different domains entirely.
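For instance, a long multi-site crawl (the URLs below are placeholders) could be started like this:

```shell
tmux new -s crawl        # keep the crawl alive if your terminal disconnects
grab-site 'https://example.com/' 'https://example.org/blog/'
```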

See „SolrWayback“ entry in the Knowledge Base for help with accessing WARC files.

HAProxy – TCP/HTTP Load Balancer

HAProxy is a free reverse-proxy offering high availability, load balancing, and proxying for TCP and HTTP-based applications. It is particularly suited for very high traffic web sites and powers a significant portion of the world’s most visited ones. Over the years it has become the de-facto standard opensource load balancer, is now shipped with most mainstream Linux distributions, and is often deployed by default in cloud platforms.

http://www.haproxy.org/
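As a minimal illustration (addresses and ports invented for this sketch), an HTTP load-balancing configuration in haproxy.cfg that spreads requests over two backend servers looks roughly like this:

```
frontend http_in
    bind *:80                          # accept HTTP traffic on port 80
    default_backend web_servers

backend web_servers
    balance roundrobin                 # rotate requests across the servers
    server web1 192.168.0.11:8080 check
    server web2 192.168.0.12:8080 check
```

A real configuration additionally needs global and defaults sections (e.g. mode http and timeout settings).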

Historical Context Ontology (HiCO)

The Historical Context Ontology (HiCO) is an OWL 2 DL ontology developed for representing the context of a claim. In particular, it addresses features characterising hermeneutical activities performed by scholars while generating new information (i.e. an interpretation act). It makes it possible to represent and reason on the reliability of argumentations around attributions by evaluating features such as motivations, types of cited sources or criteria, dates, and relations with other claims (e.g. agreement/disagreement). Specifically, historical context concerns events and situations that are part of the life-cycle of cultural heritage objects. For instance, being created by somebody, or being created at a certain time, are events related to an artefact that are claimed by an agent at a certain time, motivated with the usage of primary sources, and recorded in a secondary source (e.g. a cataloguing record). HiCO extends the PROV Ontology with terms for describing aspects of the hermeneutical activity, and reuses existing ontologies, such as the CiTO Ontology, for linking attributions to related sources.

IRI: http://purl.org/emmedi/hico
Version IRI: http://purl.org/emmedi/hico/2020-03-31
Date: 31/03/2020
Current version: 2.0
Authors: Marilena Daquino
Contributors: Francesca Tomasi, Silvio Peroni
Other visualisation: Ontology source

https://marilenadaquino.github.io/hico/
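To give a feel for the pattern, here is a hypothetical Turtle sketch of an authorship attribution and the interpretation act behind it; the instance IRIs are invented, and the hico:/cito:/prov: terms used should be checked against the ontology documentation before use:

```
@prefix :     <http://example.org/> .
@prefix hico: <http://purl.org/emmedi/hico/> .
@prefix cito: <http://purl.org/spar/cito/> .
@prefix prov: <http://www.w3.org/ns/prov#> .

# The claim: an attribution of an artefact to a creator,
# produced by a scholar's interpretation act
:attribution-1 prov:wasGeneratedBy :interpretation-1 ;
    cito:citesAsEvidence :primary-source-1 .

:interpretation-1 a hico:InterpretationAct ;
    hico:hasInterpretationType :authorship-attribution ;
    hico:hasInterpretationCriterion :analysis-of-primary-sources ;
    prov:wasAssociatedWith :scholar-1 .
```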

JSON and MySQL

Here is a tutorial on how to work with JSON in MySQL:

https://www.digitalocean.com/community/tutorials/working-with-json-in-mysql

Introduction

MySQL version 5.7.8 introduces a JSON data type that allows you to access data in JSON documents.

SQL databases tend to be rigid in design. By its nature, the structured query language enforces data type and size constraints.

In comparison, NoSQL databases encourage flexibility in design. In these schema-less databases, there is no imposed structural restriction, only data to be saved.

The JSON data type in MySQL grants you the strengths of both of these systems. It allows you to structure some parts of your database and leave others to be flexible.

The first half of this article will design a database with JSON fields. It will step through using the built-in functions available to MySQL to create, read, update, and delete rows.
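As a rough sketch of what those built-in functions look like (the table and data here are invented, not taken from the tutorial), MySQL 5.7.8+ lets you mix fixed columns with a JSON column and query into the documents:

```sql
-- A table with both rigid columns and a flexible JSON column
CREATE TABLE products (
    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    attributes JSON NOT NULL
);

INSERT INTO products (name, attributes)
VALUES ('shoe', '{"size": 42, "color": "black"}');

-- Read one field out of the JSON document
SELECT name, JSON_EXTRACT(attributes, '$.color') AS color
FROM products;

-- Update a single field inside the document
UPDATE products
SET attributes = JSON_SET(attributes, '$.color', 'red')
WHERE name = 'shoe';
```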

The second half of this article will utilize the Eloquent ORM with Laravel to communicate with the database. You will build an admin panel that supports displaying products, adding new products, modifying existing products, and deleting products.

 

Kubernetes and Rancher and Swarm

Introduction

docker-compose is used to deploy multiple containers on the same host. Once you want to run and connect containers across multiple hosts, you will need a container orchestration tool such as Docker Swarm or Kubernetes.

Kubernetes

Kubernetes is the de facto standard orchestration tool and makes it easy to deploy, maintain and scale Docker containers.

Basic concepts and objects in Kubernetes include:

  • Cluster: consists of worker machines/nodes that run containers (host Pods).
  • Pod: the smallest unit that can be deployed in Kubernetes. It consists of a single container or multiple containers that are tightly grouped and share the same resources. Each Pod has a unique IP address.
  • ReplicaSets: manage Pods. Replicas are copies of Pods and ensure high availability. ReplicaSets define the desired scale and state of a group of Pods.
  • Deployment: manage ReplicaSets. Deployments create and update ReplicaSets which, in turn, create and update Pods.
  • ConfigMap: overrides container-specific data like config files, environment variables, entire directories of data – enables customizing without modifying the original base image.
  • Services: provide a stable DNS name and IP address for a group of Pods.
  • Ingresses: define how traffic outside the cluster is routed to inside the cluster; are used to expose Kubernetes Services to the world.
  • Manifests: specifications of Kubernetes objects written in YAML or JSON. To create an object such as a Pod or a Deployment, use kubectl apply -f path/to/manifest.yaml to apply its manifest to the cluster.
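A sketch of such a manifest (names and image chosen purely for illustration): a Deployment that keeps three replicas of a single-container Pod running.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                # desired number of Pods (managed via a ReplicaSet)
  selector:
    matchLabels:
      app: web
  template:                  # the Pod template
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.25
          ports:
            - containerPort: 80
```

Saved as deployment.yaml, it would be applied with kubectl apply -f deployment.yaml.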

Rancher

Rancher facilitates the creation, upgrade, monitoring and management of Kubernetes clusters and simplifies automating Kubernetes-related tasks. Rancher is designed to be an authentication proxy, i.e. whoever wants to access the clusters does so through authenticating in Rancher. Rancher can be accessed through a feature-rich and easy-to-use web interface. To add Kubernetes clusters to Rancher you can either import an existing cluster or create a new cluster using RKE (Rancher Kubernetes Engine).

Swarm

https://docs.docker.com/engine/swarm/

Current versions of Docker include swarm mode for natively managing a cluster of Docker Engines called a swarm. Use the Docker CLI to create a swarm, deploy application services to a swarm, and manage swarm behavior.

Docker Swarm mode is built into the Docker Engine. Do not confuse Docker Swarm mode with Docker Classic Swarm which is no longer actively developed.

  • Cluster management integrated with Docker Engine: Use the Docker Engine CLI to create a swarm of Docker Engines where you can deploy application services. You don’t need additional orchestration software to create or manage a swarm.
  • Decentralized design: Instead of handling differentiation between node roles at deployment time, the Docker Engine handles any specialization at runtime. You can deploy both kinds of nodes, managers and workers, using the Docker Engine. This means you can build an entire swarm from a single disk image.
  • Declarative service model: Docker Engine uses a declarative approach to let you define the desired state of the various services in your application stack. For example, you might describe an application comprised of a web front end service with message queueing services and a database backend.
  • Scaling: For each service, you can declare the number of tasks you want to run. When you scale up or down, the swarm manager automatically adapts by adding or removing tasks to maintain the desired state.
  • Desired state reconciliation: The swarm manager node constantly monitors the cluster state and reconciles any differences between the actual state and your expressed desired state. For example, if you set up a service to run 10 replicas of a container, and a worker machine hosting two of those replicas crashes, the manager creates two new replicas to replace the replicas that crashed. The swarm manager assigns the new replicas to workers that are running and available.
  • Multi-host networking: You can specify an overlay network for your services. The swarm manager automatically assigns addresses to the containers on the overlay network when it initializes or updates the application.
  • Service discovery: Swarm manager nodes assign each service in the swarm a unique DNS name and load balances running containers. You can query every container running in the swarm through a DNS server embedded in the swarm.
  • Load balancing: You can expose the ports for services to an external load balancer. Internally, the swarm lets you specify how to distribute service containers between nodes.
  • Secure by default: Each node in the swarm enforces TLS mutual authentication and encryption to secure communications between itself and all other nodes. You have the option to use self-signed root certificates or certificates from a custom root CA.
  • Rolling updates: At rollout time you can apply service updates to nodes incrementally. The swarm manager lets you control the delay between service deployment to different sets of nodes. If anything goes wrong, you can roll back to a previous version of the service.
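A minimal command sequence illustrating several of these features (service name and image chosen for illustration):

```shell
docker swarm init                              # make this Engine a swarm manager
docker service create --name web --replicas 3 --publish 8080:80 nginx
docker service scale web=5                     # scaling up
docker service update --image nginx:1.25 web   # rolling update
docker service ls                              # list services and their replica counts
```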

Sources

https://www.youtube.com/watch?v=5h1TCrh_hZ0  Introduction to Kubernetes and Rancher (1.5h)

Sadegh, S. (2021). Kubernetes with Rancher at the GWDG. Part 1: Basic concepts. GWDG Nachrichten, 44(3), 16-18.

Sadegh, S. (2021). Kubernetes with Rancher at the GWDG. Part 2: Setup and Deployment. GWDG Nachrichten, 44(4), 15-19.

https://kubernetes.io/docs/home/

https://rancher.com/docs/

Further reading

Sadegh, S. (2021). Kubernetes with Rancher at the GWDG. Part 3: Deploy a Workload. GWDG Nachrichten, 44(8), 6-8.

Sadegh, S. (2021). Kubernetes with Rancher at the GWDG. Part 4: Setup a PostgreSQL Cluster. GWDG Nachrichten, 44(10), 5-9.