
tei2html: stylesheets from Project Gutenberg

https://github.com/jhellingman/tei2html

tei2html

A collection of XSLT 3.0 style sheets to transform a document encoded according to the TEI Lite DTD to HTML. They have been specifically developed to create a monolithic (single) HTML document for posting on Project Gutenberg, but are also capable of generating ePub files from the same source with a similar look and feel. In line with the type of material Project Gutenberg deals with, these style sheets are designed to deal with encoded pre-existing works, rather than works created digitally from scratch.

The transformation supports the following elements as present in TEI:

  • Plain Text and text styles
  • Title Pages
  • Tables
  • Lists
  • Tables of contents
  • Poetry and Plays
  • Footnotes
  • Illustrations
  • Cover images

Furthermore, the script can generate:

  • Tables of contents
  • Lists of corrections
  • Colophons
  • Metadata as used in ePub

tei2html includes localisation support for English, Dutch, and to a lesser extent German, French, Spanish, Tagalog and Cebuano.

Note that these scripts partly depend on the use of rend attributes and other TEI conventions. An arbitrary TEI file should not be expected to render well with these scripts (although it should render reasonably in any case).

If you need modifications or extensions of these scripts, or need other TEI/XML/XSLT-related work done, please get in touch with me about how I can help.

The Socio-Technical Sustainability Roadmap

The Visual Media Workshop, University of Pittsburgh:

https://sites.haa.pitt.edu/sustainabilityroadmap/

These pages are designed to introduce you to the Socio-Technical Sustainability Roadmap (STSR), including who it is designed to help, how it functions as a workshop, and what you can expect to leave knowing:

Welcome and Getting Started

Overview of the STSR Modules

Possible Workshop Schedules

What will I take away from the STSR?

In 2018-2019, we will be running a series of institutes based on this STSR at five locations across the United States. To learn more, please visit the website:

Sustaining DH: An NEH Institute for Advanced Topics in the Digital Humanities

Ubuntu Server: add a new user

Add a new user to an existing Ubuntu Server

Log in as a user with sudo rights, then:

      1. sudo adduser <UserName>
      2. Enter a password when prompted on the command line

Give the user sudo rights

      1. sudo adduser <UserName> sudo
      2. To verify, type id <UserName>. The groups= part of the output should include sudo.

Source: https://www.cyberciti.biz/faq/how-to-create-a-sudo-user-on-ubuntu-linux-server/
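
The verification step can also be scripted. The following is a minimal sketch, assuming a Bash shell; in_group is a hypothetical helper name, not a standard command, and it only wraps the id call from the step above.

```shell
#!/usr/bin/env bash
# in_group is a hypothetical helper name, not a standard command.
# Succeeds (exit status 0) if the user belongs to the named group.
in_group() {
    local user="$1"
    local group="${2:-sudo}"   # group to look for (defaults to sudo)
    local g
    # id -nG prints the user's group names separated by spaces
    for g in $(id -nG "$user" 2>/dev/null); do
        [ "$g" = "$group" ] && return 0
    done
    return 1
}
```

Usage, with alice as a placeholder user name: in_group alice && echo "alice is in the sudo group"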

Allow a user to log in via ssh

      1. Open the ssh config file using vim:
        sudo vim /etc/ssh/sshd_config
      2. Scroll to the bottom until you find AllowUsers (add the line if it is missing)
      3. Add the username; separate multiple usernames with spaces:
        AllowUsers user1 user2 user3
      4. Restart ssh:
        sudo service sshd restart

Source: https://kifarunix.com/allow-deny-specific-users-to-login-via-ssh-on-ubuntu-18-04/
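
Before restarting ssh it can help to check that the edit did what was intended. The following is a minimal sketch, assuming a Bash shell with awk available; user_allowed is a hypothetical helper name. It only looks for an exact user name on uncommented AllowUsers lines and ignores the patterns (wildcards, user@host) that sshd also accepts.

```shell
#!/usr/bin/env bash
# user_allowed is a hypothetical helper name, not part of OpenSSH.
# Succeeds if the user is named on an uncommented AllowUsers line.
user_allowed() {
    local user="$1"
    local config="${2:-/etc/ssh/sshd_config}"   # pass a copy to try it safely
    awk -v u="$user" '
        # sshd keywords are case-insensitive; commented lines start with "#"
        tolower($1) == "allowusers" {
            for (i = 2; i <= NF; i++) if ($i == u) found = 1
        }
        END { exit found ? 0 : 1 }
    ' "$config"
}
```

Usage: user_allowed user1 && echo "user1 is listed". In addition, sudo sshd -t checks the whole configuration file for syntax errors without restarting the service.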

Writing UTF-8 data (Arabic, Hebrew, Greek, etc.) to a MySQL database and reading it back

You must set the character set right after connecting to MySQL, using this query:

SET CHARACTER SET utf8

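For example, with the mysqli functions:

// Connect; on failure report the connection error (the handle is not usable then)
$MySQL_Handle = mysqli_connect(HOSTNAME,DATABASE_USERNAME,DATABASE_PASSWORD,DATABASE_NAME) 
or die ( mysqli_connect_error() ); 

$sSQL= 'SET CHARACTER SET utf8'; 

// Switch the connection to UTF-8 before any other query
mysqli_query($MySQL_Handle,$sSQL) 
or die ('Can\'t set charset in database'); 

And a PDO sample: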

$dbh = new PDO('mysql:host=localhost;dbname=' . $DB_NAME, $DB_USER,
$DB_PASS, array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES 'UTF8'"));
$dbh->exec("SET CHARACTER SET UTF8");

This must be done before any INSERT and before any SELECT.

Web Archiving

Below are the various web crawlers tested in an attempt to adequately crawl NADW sites for web archiving purposes.

Heritrix – https://heritrix.readthedocs.io/en/latest/getting-started.html#installation

Heritrix Documentation – Heritrix is the Internet Archive’s open-source, extensible, web-scale, archival-quality web crawler project. Heritrix seeks to collect and preserve the digital artifacts of our culture for the benefit of future researchers and generations. https://heritrix.readthedocs.io/en/latest/

 

Archiveweb.page – https://archiveweb.page/

Archiveweb.page Documentation – ArchiveWeb.page is the latest tool from Webrecorder to turn your browser into a full-featured interactive web archiving system! ArchiveWeb.page is available as an extension for any Chrome or Chromium-based browser. (A standalone app version is also in development.) NB: Functions as a Chrome extension. It seems that once you launch the app, you have to manually click through each link, and it records your progress as a WARC file.

 

WARCreate – https://warcreate.com/

WARCreate Documentation – WARCreate is a Google Chrome extension that allows a user to create a Web ARChive (WARC) file from any browseable webpage. The resulting files can then be used with other tools like the Internet Archive’s open source Wayback Machine. The tool is an evolving product with the end result pushing toward being a personal web archiving solution for those that wish to securely archive their metadata in a standardized way. NB: Functions as a Chrome extension. Once launched, it captures whatever webpage you are currently on as a WARC file.

 

grab-site – https://github.com/ArchiveTeam/grab-site

grab-site Documentation – The archivist’s web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns. grab-site is an easy preconfigured web crawler designed for backing up websites. Give grab-site a URL and it will recursively crawl the site and write WARC files. NB: only works with specific versions of Python: 3.7 or 3.8.

 

ArchiveBox – https://archivebox.io/

ArchiveBox Documentation – ArchiveBox is a powerful, self-hosted internet archiving solution to collect, save, and view sites you want to preserve offline. You can set it up as a command-line tool, web app, and desktop app (alpha), on Linux, macOS, and Windows (WSL/Docker). You can feed it URLs one at a time, or schedule regular imports from browser bookmarks or history, feeds like RSS, bookmark services like Pocket/Pinboard, and more. See input formats for a full list. It saves snapshots of the URLs you feed it in several formats: HTML, PDF, PNG screenshots, WARC, and more out-of-the-box, with a wide variety of content extracted and preserved automatically (article text, audio/video, git repos, etc.). See output formats for a full list. The goal is to sleep soundly knowing the part of the internet you care about will be automatically preserved in durable, easily accessible formats for decades after it goes down. NB: At first glance, this seemed like the best option as it’s super easy to use. However, it only crawls one layer deep. Not enough for our purposes.

 

Browsertrix Crawler – https://github.com/webrecorder/browsertrix-crawler

Browsertrix Documentation – Browsertrix Crawler is a simplified (Chrome) browser-based high-fidelity crawling system, designed to run a complex, customizable browser-based crawl in a single Docker container. NB: As of 23 Oct, I have not yet successfully set this up and run it. I keep running into issues with Docker.

WebLicht

WebLicht is an online application for the automatic annotation of text corpora.

Description from the WebLicht website:

WebLicht is an execution environment for automatic annotation of text corpora. Linguistic tools such as tokenizers, part of speech taggers, and parsers are encapsulated as web services, which can be combined by the user into custom processing chains. The resulting annotations can then be visualized in an appropriate way, such as in a table or tree format.

Link to the tool: https://weblicht.sfs.uni-tuebingen.de/weblichtwiki/index.php/Main_Page

How much space is left on a Linux volume? | No more disk space: How can I find what is taking up the space?

As always in Linux, there’s more than one way to get the job done. However, if you need to do it from CLI, this is my preferred method:

I start by running this as root or with sudo:

du -cha --max-depth=1 / | grep -E "M|G"

The grep limits the output to lines that contain M or G, that is, to sizes in the megabyte or gigabyte range (it also lets through any path that happens to contain one of those letters). If your disks are big enough, you could add |T as well to include terabyte amounts. You may get some errors on /proc, /sys, and/or /dev since they are not real files on disk. However, it should still provide valid output for the rest of the directories in root. After you find the biggest ones, you can run the command on one of those directories to narrow down the culprit. For example, if /var was the biggest, you could run this next:

du -cha --max-depth=1 /var | grep -E "M|G"

Full thread at: https://askubuntu.com/questions/911865/no-more-disk-space-how-can-i-find-what-is-taking-up-the-space
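
As a variation on the du commands above, the entries can be sorted by size instead of filtered with grep. The following is a minimal sketch, assuming a Bash shell and GNU du (as shipped with Ubuntu); biggest_entries is a hypothetical helper name. This is a swapped-in approach, not the one from the thread.

```shell
#!/usr/bin/env bash
# biggest_entries is a hypothetical helper name, not part of any tool.
# Lists the largest entries directly below a directory, biggest first.
biggest_entries() {
    local dir="${1:-/}"      # directory to inspect (defaults to /)
    local count="${2:-10}"   # number of lines to print
    # -a: include files, -k: sizes in KiB so sort -n can order them,
    # -x: stay on one file system, which skips /proc, /sys and /dev
    du -akx --max-depth=1 "$dir" 2>/dev/null | sort -rn | head -n "$count"
}
```

Usage: sudo is still needed for directories only root can read, for example by running the du pipeline itself with sudo. The first line of the output is the total for the directory itself; the following lines are its largest children, so biggest_entries /var 5 shows /var and its four largest entries.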

Wikibase

Wikibase is an open-source graph database that was developed for the knowledge base Wikidata. The database management system consists of a collection of extensions for the MediaWiki software. Wikibase is freely available at no cost under the GPL licence. Its distinguishing features include its own data model, versioning, and multilingual support. Several programming interfaces and client programs are available for accessing Wikibase. The data model of a Wikibase instance is mapped to the Resource Description Framework, so the data can also be queried via SPARQL. Besides Wikidata, Wikibase is used mainly in the research and cultural sectors.

The DNB is currently testing Wikibase for the GND, and there is an interesting talk about it:

Splitting Word documents into several parts and saving them

This page offers VBA code for cutting Word documents into subchapters etc. and saving them. Works with Word 2016:

https://de.extendoffice.com/documents/word/966-word-split-documents-into-multiple-documents.html

This code works, for example, with the delimiter "///":

Sub SplitNotes(delim As String, strFilename As String)
    Dim doc As Document
    Dim arrNotes
    Dim I As Long
    Dim X As Long
    Dim Response As Integer
    ' Split the whole document text at every occurrence of the delimiter
    arrNotes = Split(ActiveDocument.Range, delim)
    ' 4 = vbYesNo: ask before creating the files
    Response = MsgBox("This will split the document into " & UBound(arrNotes) + 1 & " sections. Do you wish to proceed?", 4)
    ' 7 = vbNo
    If Response = 7 Then Exit Sub
    For I = LBound(arrNotes) To UBound(arrNotes)
        ' Skip empty sections
        If Trim(arrNotes(I)) <> "" Then
            X = X + 1
            Set doc = Documents.Add
            doc.Range = arrNotes(I)
            ' Save next to the document that holds this macro, numbered 001, 002, ...
            doc.SaveAs ThisDocument.Path & "\" & strFilename & Format(X, "000")
            doc.Close True
        End If
    Next I
End Sub

Sub test()
    ' delimiter & filename
    SplitNotes "///", "Notes "
End Sub