Guide: Installing a validator for production use
This guide walks you through the process of installing an XML, RDF, JSON or CSV validator, on your own infrastructure for production use.
What you will achieve
The purpose of this guide is to walk you through the setup of a validator on your own infrastructure for use in production. With the term validator we explicitly refer to a validator instance that has been defined through configuration by reusing one of the test bed’s validator foundations for XML, RDF, JSON or CSV content. Such validators share common characteristics that allow for a detailed set of guidelines when installing on your own infrastructure. Other custom validators can vary significantly in their design and needs, and are not addressed by this guide.
If you choose to not carry out an actual installation you will at least be aware of the steps and settings to configure for such a setup.
Note
Using the DIGIT test bed: The European Commission provides hosting and automated updates for public validators as part of the test bed’s cloud infrastructure. Determine whether such a managed instance or a separate setup on your own infrastructure is more appropriate for you in Step 1: Determine the need for an on-premise setup.
What you will need
About 45 minutes.
A text editor.
A web browser.
A machine with at least a dual core CPU, 2 GB of RAM and 10 GB of free storage. This machine should also be able to access the internet (for software installations). The machine’s operating system should ideally be Linux-based, as this keeps all installation options available.
Administrator privileges on the target machine.
A basic understanding of Docker, Kubernetes and ICT concepts such as ports and reverse proxies.
A validator to use for the installation.
Note
Machine specifications and instances: The machine resource specifications listed above provide a reasonable minimum for the setup of a single validator instance on a single machine. Depending on your needs you may need to scale your validator to handle increased workloads and/or spread its installation across multiple machines to ensure high availability. Such considerations are addressed in the steps that follow.
How to complete this guide
Production installations vary depending on the needs they aim to achieve. The steps that follow are designed to be considered sequentially, to both provide the practical steps to follow, and also highlight the questions and design decisions you need to consider. Where different alternatives exist these are highlighted at the beginning of each step to help you choose the most suitable approach. The guide can be followed as-is to make an actual installation or be consulted as a set of guidelines.
To provide a concrete example we will be considering an installation for a SHACL-based validator for RDF, and specifically the fictional Purchase Order validator defined as part of the RDF validation guide. The approach to follow for other validators (for a different RDF or a non-RDF specification) is nonetheless identical.
Note
Configuration examples: The included examples consider the default settings for RDF validators when considering context paths and endpoints. The defaults for other types of validators can be found in their respective detailed guides (XML, RDF, JSON, CSV), but are also summarised in this guide for ease of reference.
Steps
Carry out the following steps to complete this guide.
Step 1: Determine the need for an on-premise setup
As part of its service offering, the test bed provides free hosting of validators in its public cloud infrastructure. Using this approach
you define your validator’s configuration in a Git repository (e.g. on GitHub), which you then share with the test bed team to create
your validator. The validator is exposed under https://www.itb.ec.europa.eu
and is automatically updated whenever you make changes to your
configuration. Using this approach saves you the cost and effort involved in hosting, operating and monitoring your validator yourself.
It can be ideal if your primary goals are to use the validator as a community tool in support of your underlying specification(s), or as a building
block for conformance testing.
Opting for an installation of a validator on your own infrastructure (on-premise) does nonetheless extend its potential and opens up additional use cases. An on-premise installation is advised in the following scenarios:
Private validator: Validators hosted on the test bed are publicly accessible, and even if not configured as such, are still deployed on the public cloud. If you want to fully restrict access to the validator an on-premise setup is the way to go.
Full operational control: You may want to have complete operational control over the validator, managing yourself any patches and updates. In addition, you may not want to be bound to the test bed’s operational window.
Integration with other resources: Your validator may need to access non-public resources and integrate with internal systems (e.g. an internal triple store).
Branding: You may want to expose the validator through your own portal, matching your existing branding and theming. In this case you would define your own user interface and use the validator internally via its machine-to-machine API.
Validation in production operations: Nothing prevents validators from being used as quality control services in production operations. In doing so, however, you would likely need to adapt the validator’s setup to ensure high availability and scale it to match your production load.
It is interesting to point out that an on-premise setup does not exclude a public instance managed by the test bed. You could for example offer the validator through the test bed as a public tool for users, but also operate one or more instances internally as part of your own quality assurance processes.
Step 2: Install the validator
The installation approach followed for the validator will likely depend on your organisation’s preferences and guidelines. The current guide provides three main alternative approaches that should cover most needs, each with its own degree of dependencies, isolation, simplicity and flexibility. Specifically:
Use the validator as an all-in-one JAR file with externally provided configuration.
Containerise the validator using Docker and run it as a Docker container.
Containerise the validator using Docker and run it using Kubernetes.
Of these three approaches the one proposed by the test bed is to use Kubernetes as it brings the isolation and flexibility of containers but with significant simplifications for advanced container orchestration.
Approach 1: Using JAR file
Note
When to use this approach? Opt for using the validator’s JAR file directly if your organisation prevents the use of Docker or Kubernetes.
Validators use the Java platform and are packaged as executable JAR files. These JAR files are termed all-in-one JAR files given that they include both the validator’s (web) application as well as an embedded server for its deployment. The only prerequisite software needed in this case is a Java Runtime Environment of at least version 17.
The installation of Java 17 can be done via a JDK installation package, although different operating system vendors may offer
alternatives through their preferred package managers (e.g. apt or yum). Once you have completed the installation you can validate
it by issuing java --version
. Doing so you should see an output such as the following:
> java --version
java version "17.0.6" 2023-01-17 LTS
Java(TM) SE Runtime Environment (build 17.0.6+9-LTS-190)
Java HotSpot(TM) 64-Bit Server VM (build 17.0.6+9-LTS-190, mixed mode, sharing)
The next step is to download the validator JAR. The application itself is generic, representing a shell to be combined with your specific configuration, and is included in a package maintained and published by the test bed. Download the package that corresponds to your case:
XML validator: https://www.itb.ec.europa.eu/xml-jar/xml/validator.zip
RDF validator: https://www.itb.ec.europa.eu/shacl-jar/any/validator.zip
JSON validator: https://www.itb.ec.europa.eu/json-jar/any/validator.zip
CSV validator: https://www.itb.ec.europa.eu/csv-jar/any/validator.zip
Once downloaded extract the included JAR file in a target folder (e.g. /opt/validator
):
/opt/validator
└── validator.jar
With the validator application in place what is needed is to provide it its configuration. This includes your domain configuration property file and validation artefacts as described in the validation guide. Considering our Purchase Order validator example, after having downloaded the RDF validator JAR, we will put in place our configuration files as follows:
/opt/validator
├── /resources
│ └── /order
│ ├── config.properties
│ └── /shapes
│ ├── PurchaseOrder-common-shapes.ttl
│ └── PurchaseOrder-large-shapes.ttl
└── validator.jar
Recall that a validator’s configuration is placed under a resource root (in our case folder /opt/validator/resources
), within which each
subfolder is considered a domain. These domains represent the logically distinct validator instances that will be exposed (in our case
/opt/validator/resources/order
corresponding to an order domain). Each domain folder contains a
configuration property file (config.properties
) and any set of files and folders that define
validation artefacts (unless these are loaded from remote resources).
Note
Multiple domains: Nothing prevents you from defining additional domains if needed. This is a possibility not available for validators managed by the test bed, where each user is restricted to a single domain.
With the validator’s artefacts in place, the next step is to define or override its configuration properties either as environment variables or as system properties. The minimum properties you need to define are:
validator.resourceRoot
: The root folder from which all domain configurations will be loaded.
logging.file.path
: The validator’s log output folder.
validator.tmpFolder
: The validator’s temporary work folder.
An example complete command line for your validator that defines these properties would be as follows:
java -Dvalidator.resourceRoot=/opt/validator/resources \
-Dlogging.file.path=/opt/validator/logs \
-Dvalidator.tmpFolder=/opt/validator/tmp \
-jar /opt/validator/validator.jar
Launching our validator as such, and assuming it has its UI enabled, would make it available at http://HOST:8080/shacl/order/upload.
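As an optional check, you can also call the validator’s health check API (the same path used later in this guide as a Kubernetes readiness probe) to confirm that the application is up:
curl http://localhost:8080/shacl/api/healthcheck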
The final step is to ensure that your validator can be managed as a service, allowing it to recover from failures and at least start when the host server boots. How this is done depends on your preferences and your operating system’s support for services, with a common choice being systemd. Before defining the service we will create a group and user to run the validator with limited privileges:
sudo groupadd -r validatorgroup
sudo useradd -r -s /bin/false -g validatorgroup validatoruser
To define the service, create (as sudo) its definition in file /etc/systemd/system/myvalidator.service
with contents such as the following:
[Unit]
Description=My validator
[Service]
WorkingDirectory=/opt/validator
ExecStart=/bin/java -Dvalidator.resourceRoot=/opt/validator/resources -Dlogging.file.path=/opt/validator/logs -Dvalidator.tmpFolder=/opt/validator/tmp -jar validator.jar
User=validatoruser
Type=simple
Restart=on-failure
RestartSec=10
[Install]
WantedBy=multi-user.target
With the service defined we can now use systemctl to control it. First refresh the daemon to detect the new service:
sudo systemctl daemon-reload
Once refreshed, set the service to start on server boot:
sudo systemctl enable myvalidator
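You can then start the service immediately and verify that it is running using the standard systemctl commands:
sudo systemctl start myvalidator
sudo systemctl status myvalidator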
Note
Operating your validator: For more information on how your validator can be operated check Operations when running via JAR file.
Approach 2: Using Docker
Note
When to use this approach? Opt for Docker if you cannot use Kubernetes or if you don’t need the availability and scaling it offers.
Using Docker allows you to define each validator instance in isolation and ensure it runs without any side-effects to other processes and data. The first step is to ensure that Docker is installed on the machine that will host the validator. If you already have Docker ensure it is at least at version 17.06.0, otherwise you can follow Docker’s online instructions to install or upgrade it.
Once your installation is complete, you will be able to test it using the docker --version
command. This should provide output as follows:
> docker --version
Docker version 18.03.0-ce, build 0520e24
Defining your Docker image is discussed as part of the detailed validation guide. In short, you define a Dockerfile in which:
You start from the appropriate base image (e.g. isaitb/shacl-validator:latest for RDF validators).
You copy your configuration resource root folder into the validator.
You set property validator.resourceRoot as an environment variable pointing to your resource root.
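As an illustration, a minimal Dockerfile along these lines could look as follows. This is only a sketch: it assumes your resources folder sits next to the Dockerfile, and the target path inside the image is indicative, so refer to the detailed validation guide for your validator type for the exact instructions.
# Start from the test bed RDF validator base image.
FROM isaitb/shacl-validator:latest
# Copy the configuration resource root (containing the order domain) into the image (target path assumed for illustration).
COPY resources /validator/resources/
# Point the validator to the copied resource root.
ENV validator.resourceRoot /validator/resources/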
Once defined you can then build your image (e.g. myorg/myvalidator:latest
) and publish it to the Docker Hub or your private registry. With
your image published, you can then launch a validator instance as follows:
docker run --name myvalidator -d --restart=unless-stopped -p 8080:8080 myorg/myvalidator:latest
Launching our validator as such, and assuming it has its UI enabled, would make it available at http://HOST:8080/shacl/order/upload
. Note
that specifying the --restart=unless-stopped
flag ensures that the validator will be started at server boot and that it will be restarted
if there is an unexpected failure.
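To confirm that the container started correctly you can list it and follow its logs with the standard Docker commands:
docker ps --filter name=myvalidator
docker logs -f myvalidator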
Note
Operating your validator: For more information on how your validator can be operated check Operations when running via Docker.
Approach 3: Using Kubernetes
Note
When to use this approach? Always if possible. Use of Kubernetes can be adapted for both simple setups and advanced configurations to accommodate production operations.
Kubernetes is one of the most popular container orchestration solutions, providing powerful capabilities that allow you to define how your containers should be managed. Kubernetes is typically associated with large scale microservice architectures to automatically manage hundreds of containers over numerous nodes. If this is your case then you can leverage such a setup to easily bring online a set of validator instances depending on your needs. However, even if you do not already use Kubernetes, there is nothing preventing you from making an initial minimal setup to facilitate the operation of your validators. In order to be more complete, this guide assumes that you have no existing Kubernetes setup.
The first step is to install Kubernetes on your target machine, the minimum required version being 1.19. A good choice for a minimal, lightweight, but still production-grade installation is k3s, which comes pre-configured for immediate use. The k3s documentation includes an installation guide with different options, however for a typical installation all you need is to issue the following command:
curl -sfL https://get.k3s.io | sh -s - --write-kubeconfig-mode 644
Once the installation completes you can verify your installation using kubectl, the Kubernetes administration tool, by checking your installation’s version.
> kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.4+k3s1", GitCommit:"838a906ab5eba62ff529d6a3a746384eba810758", GitTreeState:"clean", BuildDate:"2021-02-22T19:49:27Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.4+k3s1", GitCommit:"838a906ab5eba62ff529d6a3a746384eba810758", GitTreeState:"clean", BuildDate:"2021-02-22T19:49:27Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
With Kubernetes installed the next step is to define a manifest for our validator (you can download a copy of this here
).
This manifest is used to instruct Kubernetes on how to run our validator.
apiVersion: v1
kind: Service
metadata:
name: myvalidator-service
spec:
type: NodePort
selector:
app: myvalidator
ports:
- protocol: TCP
port: 8080
targetPort: 8080
nodePort: 30001
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: myvalidator-deployment
labels:
app: myvalidator
spec:
replicas: 1
selector:
matchLabels:
app: myvalidator
template:
metadata:
labels:
app: myvalidator
spec:
containers:
- name: myvalidator
image: myorg/myvalidator:latest
imagePullPolicy: Always
ports:
- containerPort: 8080
readinessProbe:
httpGet:
path: /shacl/api/healthcheck
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
As part of this manifest we are describing two resources:
A Deployment named myvalidator-deployment. A deployment defines how individual containers will be created, validated and replicated (as pods in Kubernetes terms).
A Service named myvalidator-service. A service in Kubernetes is linked to a deployment to make the related application usable within and outside the Kubernetes cluster.
From this manifest a few points merit highlighting:
The deployment specifies that its containers are built from Docker image myorg/myvalidator:latest. This is the image we need to build for our validator.
For the container we specify a readinessProbe of type httpGet. This will make HTTP pings to the configured path (the validator’s health check API) to detect when the container is ready to receive requests.
The deployment is set with a replicas value of 1. This tells Kubernetes to always ensure there is one validator instance available.
The service is of type NodePort and defines a specific nodePort. This port (30001 in our case) is the host port on which the service will be listening.
To deploy the validator we finally need to apply its manifest. To do this we use the kubectl apply
command:
> kubectl apply -f myvalidator.yml
service/myvalidator-service created
deployment.apps/myvalidator-deployment created
Launching our validator as such, and assuming it has its UI enabled, makes it available at http://HOST:30001/shacl/order/upload.
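You can verify the deployment with kubectl, for example by listing the created resources and checking that the validator’s pod reaches the Ready state:
kubectl get deployments,services
kubectl get pods -l app=myvalidator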
It is interesting to note that setting replicas
to 1 defines a single validator instance but also enables rolling updates. Whenever
there is a new deployment to make, Kubernetes creates a new container and only uses it to replace the existing one when it is ready (as defined
by the configured readinessProbe
). In addition, if the number of replicas
is increased, Kubernetes creates the additional containers
and automatically load-balances them in a round-robin manner. From an external perspective you continue to connect to the configured port (30001)
as this is the service’s port, not the port of individual containers.
Note
Both rolling updates and transparent scaling through new instances need to be addressed in a custom manner when not using Kubernetes.
Step 3: Scale your validator
Scaling a validator involves deploying additional instances to gracefully handle increased load. Such new validator instances can be deployed either on the same host or across separate hosts. In the latter case, this also increases the validator’s availability in case of machine failure.
All validators are stateless and self-sufficient. Each validator instance at startup loads its own configuration and does not record data for incoming validation requests and produced reports. There is one caveat however, linked to the validator’s web user interface. Once a user performs a validation, the resulting report is created and asynchronously downloaded to the user’s browser via the validation result page. Even though reports are deleted upon download this means that if a user validates against a specific validator instance, the report download requests need to be sent to the same instance. Such a constraint does not apply when validators are used through their machine-to-machine API (SOAP and REST). In terms of scaling the validator with multiple instances this means that:
If the web user interface is enabled, a mechanism needs to be in place to ensure session affinity, i.e. that user sessions are sticky.
If only machine-to-machine APIs are used there is no need for session affinity.
Note
Disabling the web user interface: If the validator’s web user interface is not needed, it can be disabled by omitting the value form from the validator.channels configuration property (see the validator’s domain configuration properties).
Given that each validator is realised by multiple instances, we need a means of distributing load across them and ensuring, if needed, session affinity. There are different ways of achieving this but as a general case we will assume that you have a web server in place that will act as a load balancer and reverse proxy. For details on how to configure your reverse proxy check Step 4: Configure your reverse proxy.
Scaling when running via JAR file
When operating as a JAR file, deploying additional instances involves the following steps:
On the target machine copy the validator folder for each new instance.
Define a service per validator instance for its startup and management.
Adapt the runtime configuration for each service to match its folder and to assign a different port number (if on the same host).
Adapt your reverse proxy to load balance requests across your instances.
To define three instances of your validator on the same host you need to have three copies of the /opt/validator
folder discussed in the
installation:
/opt/validator1
├── /resources
│ └── ...
└── validator.jar
/opt/validator2
/opt/validator3
For each validator instance define a systemd service:
Service /etc/systemd/system/myvalidator1.service for instance 1.
Service /etc/systemd/system/myvalidator2.service for instance 2.
Service /etc/systemd/system/myvalidator3.service for instance 3.
When defining each instance’s service adapt the runtime configuration properties to match its folder. When doing this ensure that you also
set the server.port
property to different values per service to avoid port conflicts. For example, the service of instance 1 would define
the following:
WorkingDirectory=/opt/validator1
ExecStart=/bin/java -Dserver.port=10001 -Dvalidator.resourceRoot=/opt/validator1/resources -Dlogging.file.path=/opt/validator1/logs -Dvalidator.tmpFolder=/opt/validator1/tmp -jar validator.jar
To complete the services’ setup ensure you reload the systemctl daemon and enable them for automatic startup:
sudo systemctl daemon-reload
sudo systemctl enable myvalidator1
sudo systemctl enable myvalidator2
sudo systemctl enable myvalidator3
The final step is to load balance your instances through your reverse proxy. Check Proxy setup when running via JAR file or Docker to do this and complete your setup.
Scaling when running via Docker
When operating via Docker, launching additional validator instances is simplified through containerisation. The steps to follow in this case are:
Launch a container per instance mapping the containers’ ports to different host ports.
Adapt your reverse proxy to load balance requests across your containers.
To define three instances of your validator issue three docker run
commands adapting each container’s port mapping and name (for
easier referencing):
docker run --name myvalidator1 -d --restart=unless-stopped -p 10001:8080 myorg/myvalidator:latest
docker run --name myvalidator2 -d --restart=unless-stopped -p 10002:8080 myorg/myvalidator:latest
docker run --name myvalidator3 -d --restart=unless-stopped -p 10003:8080 myorg/myvalidator:latest
Executing these three commands will launch three containers, listening on ports 10001, 10002 and 10003. Check Proxy setup when running via JAR file or Docker to see how your reverse proxy should be configured to load balance their traffic.
Scaling when running via Kubernetes
One of the key use cases of Kubernetes is the simplification of operating containerised applications and in particular their scaling depending on your needs. Scaling your validator with additional instances could be done exactly as in the case of JAR file or Docker based deployments by creating distinct application instances, each mapped to different host ports, and load balancing them through your reverse proxy. This could be achieved by defining per instance a new Service mapped to a different nodePort. Doing so however would prevent Kubernetes from managing all instances as a whole and bringing benefits such as resource-based load distribution and rolling updates.
Considering this, we will scale our validator as expected by Kubernetes using replicas. To do this we will carry out the following steps:
Adapt the validator’s manifest, setting the number of replicas to the desired number of instances.
If you use the validator’s web UI, define an ingress to ensure session affinity.
To set up three instances, we would need to edit the validator’s manifest (download an updated copy here
),
and specify the number of replicas:
apiVersion: v1
kind: Service
...
---
apiVersion: apps/v1
kind: Deployment
...
spec:
replicas: 3
...
This tells Kubernetes that for our service we want to always have three instances running. Requests sent to the service’s listen port
(the nodePort
set to 30001) will be load balanced in a round-robin manner across instances. Moreover, when updates are deployed
Kubernetes will do this in a rolling manner, incrementally updating instances while ensuring that the prescribed number of replicas is
always available. Given that Kubernetes handles load balancing itself there is no need for any such configuration on your reverse proxy.
Completing the change is done by applying the manifest change:
kubectl apply -f myvalidator.yml
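Once the change is applied you can confirm that the expected number of pods is running:
kubectl get pods -l app=myvalidator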
In case we plan to use the validator’s web user interface we need an extra level of configuration to ensure session affinity when selecting validator instance pods. Given that pods are internally managed by Kubernetes, this is not handled through a separate reverse proxy. Instead, it is achieved by means of an Ingress, the mechanism foreseen by Kubernetes to manage inbound traffic external to the cluster.
The k3s distribution we installed comes by default with the Traefik Ingress Controller enabled as the implementation used to realise ingress rules, listening on the node’s port 80. An alternate configuration, or an altogether different ingress controller, could be used, however this default setup allows you to proceed without additional steps.
To ensure load balancing and session affinity we need to define the ingress rule for our validator and adapt our service definition
(download the updated manifest file here
)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: myvalidator-ingress
spec:
rules:
- http:
paths:
- path: /shacl
pathType: Prefix
backend:
service:
name: myvalidator-service
port:
number: 8080
---
apiVersion: v1
kind: Service
metadata:
name: myvalidator-service
annotations:
traefik.ingress.kubernetes.io/affinity: "true"
spec:
type: ClusterIP
selector:
app: myvalidator
ports:
- protocol: TCP
port: 8080
targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: myvalidator-deployment
labels:
app: myvalidator
spec:
replicas: 3
selector:
matchLabels:
app: myvalidator
template:
metadata:
labels:
app: myvalidator
spec:
containers:
- name: myvalidator
image: myorg/myvalidator:latest
imagePullPolicy: Always
ports:
- containerPort: 8080
readinessProbe:
httpGet:
path: /shacl/api/healthcheck
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
The manifest includes a new definition for myvalidator-ingress
specifying that any requests received by the ingress controller
whose path starts with /shacl
should be directed to service myvalidator-service
listening internally on port 8080. Apart from
this, the myvalidator-service
service definition is also adapted with two changes:
The service’s type is set to ClusterIP. This means that the service is no longer exposed on a host port, given that requests are now received by the ingress controller.
The service is annotated with traefik.ingress.kubernetes.io/affinity set to true to ensure session affinity is maintained.
To complete the updates apply the manifest file changes:
kubectl apply -f myvalidator.yml
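You can check that the ingress rule and the updated service were created as expected:
kubectl get ingress myvalidator-ingress
kubectl get service myvalidator-service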
With our updated configuration, the service is now available at http://HOST/shacl/order/upload and, thanks to our ingress configuration, is load balanced across its pod replicas.
Step 4: Configure your reverse proxy
A reverse proxy acts as the entry point to your validator and is needed in all but the simplest setups. Use of a reverse proxy addresses the following points:
It maps publicly exposed paths to your internal configuration.
It rewrites internal hosts, ports and path mappings from responses to what should be externally exposed.
It enables load balancing across multiple validator instances.
It ensures session affinity needed by the validator’s web user interface.
A reverse proxy is not always required however. If you only use your validator internally the services that consume it will likely be able to access it directly (i.e. there is no need for mapping or rewriting). In addition, if you only run a single validator instance there is no need for load balancing and session affinity. Moreover, in case you have opted for a Kubernetes setup such aspects are already handled by your ingress controller.
In short, you can skip defining a reverse proxy for your validator if both these points apply:
You only access the validator internally.
You either run a single validator instance, or use multiple instances via Kubernetes.
In order to provide concrete steps this guide assumes the popular nginx server as the reverse proxy in use. The discussed principles and configurations are nonetheless simple and can be applied to any server implementation.
Proxy setup when running via JAR file or Docker
Note
Why use a reverse proxy in this case? Your reverse proxy is used to publicly expose your validator and load balance its instances.
When exposing your validator externally you need to map its location to a public path. In doing so you also need to ensure that any full internal addresses returned by the validator are adapted to their public counterparts. To do this include the following location block in your configuration:
location /shacl/ {
proxy_pass http://HOST:PORT/shacl/;
sub_filter "http://HOST:PORT/shacl/soap/order/validation" "https://www.myorg.org/shacl/soap/order/validation";
sub_filter_once off;
sub_filter_types *;
}
This configuration matches requests for paths starting with /shacl/
that are passed onto the service listening on the internal HOST
and PORT
. The sub_filter
definition is defined to cover the service’s SOAP API to rewrite the endpoint URL reported in its WSDL to its
public counterpart.
An alternative approach to using sub_filter
directives to correctly publish your validator’s SOAP endpoint(s) is to configure
in the validator itself the base SOAP endpoint publishing URL. This is the full URL up to, but without including, the domain part
of the URL (“order” in our example). To do this you will need to set a system property or environment variable named validator.baseSoapEndpointUrl
with the URL as its value. For example, if we are running via JAR file we
would do this as follows:
java -Dvalidator.resourceRoot=/opt/validator/resources \
-Dlogging.file.path=/opt/validator/logs \
-Dvalidator.tmpFolder=/opt/validator/tmp \
-Dvalidator.baseSoapEndpointUrl=https://www.myorg.org/shacl/soap/ \
-jar /opt/validator/validator.jar
Having used this system property (or environment variable) the proxy setup is largely simplified by avoiding rewrites:
location /shacl/ {
proxy_pass http://HOST:PORT/shacl/;
}
In case we are running multiple validator instances we additionally need to enable load balancing and, if the validator’s user interface is enabled, session affinity. This is achieved with the following configuration:
http {
...
upstream backend {
ip_hash; # Only needed if session affinity is necessary
server HOST1:PORT1;
server HOST2:PORT2;
server HOST3:PORT3;
}
server {
...
location /shacl/ {
proxy_pass http://backend/shacl/;
sub_filter "http://backend/shacl/soap/order/validation" "https://www.myorg.org/shacl/soap/order/validation";
sub_filter_once off;
sub_filter_types *;
}
}
}
This configuration uses the backend
placeholder that is defined as a set of upstream servers and is replaced accordingly when requests
are treated. Note that adding ip_hash
ensures session affinity by selecting the upstream server based on the client’s IP address. If
access to the web interface is not needed this can be omitted resulting in a default round-robin selection.
Proxy setup when running via Kubernetes
Note
Why use a reverse proxy in this case? Your reverse proxy is used to publicly expose your validator.
As discussed when scaling your validator via Kubernetes, load balancing and session affinity (if needed) are handled at the level of Kubernetes via an ingress definition. This means that the only reason to define an external reverse proxy on top of this would be to manage the validator’s public exposure.
When exposing your validator externally you need to map its location to a public path. In doing so you also need to ensure that any full internal addresses returned by the validator are adapted to their public counterparts. To do this include the following location block in your configuration:
location /shacl/ {
proxy_pass http://HOST:PORT/shacl/;
sub_filter "http://HOST:PORT/shacl/soap/order/validation" "https://www.myorg.org/shacl/soap/order/validation";
sub_filter_once off;
sub_filter_types *;
}
This configuration matches requests for paths starting with /shacl/
that are passed onto the service listening on the internal HOST
and PORT
. The PORT
in particular, is either the service’s defined nodePort
if running without an ingress, or, if an ingress is
defined, the port on which the ingress controller is listening (port 80 by default).
The proxy’s sub_filter
definition is defined to cover the service’s SOAP API to rewrite the endpoint URL reported in
its WSDL to its public counterpart. An alternative approach to using sub_filter
directives to correctly publish your
validator’s SOAP endpoint(s) is to configure in the validator itself the base SOAP endpoint publishing URL. This is
the full URL up to, but without including, the domain part of the URL (“order” in our example). To do this you will need
to set an environment variable named validator.baseSoapEndpointUrl
with the URL as its value:
...
spec:
containers:
- name: myvalidator
...
env:
- name: "validator.baseSoapEndpointUrl"
value: "https://www.myorg.org/shacl/soap/"
...
Having set this environment variable the proxy setup is largely simplified by avoiding rewrites:
location /shacl/ {
proxy_pass http://HOST:PORT/shacl/;
}
Step 5: Operate your validator
The current section lists commands you may find useful when managing your validator instance. These depend on the installation approach you followed.
Operations when running via JAR file
The following table summarises common actions for a validator running as a JAR file, via a systemd service:
Action | Command
---|---
View validator logs | The log file is available under the validator’s configured logging.file.path folder.
Stop validator | For all service instances issue sudo systemctl stop myvalidator.
Start validator | For all service instances issue sudo systemctl start myvalidator.
To update a validator follow these steps:
Stop each of its service instances.
Download and extract the latest JAR file (if you want to update the validator’s software). Place the JAR file in each validator instance’s folder.
Replace the resources within each validator instance’s folder with their latest version (if you want to update the validator’s artefacts).
Start the validator’s services.
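As an example, for the single-instance setup defined earlier (service myvalidator with folder /opt/validator), the update sequence could look as follows:
sudo systemctl stop myvalidator
# Replace /opt/validator/validator.jar and/or the contents of /opt/validator/resources with their new versions.
sudo systemctl start myvalidator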
Operations when running via Docker
The following table summarises common actions for a validator running as a Docker container:
Action | Command
---|---
View validator logs | Issue docker logs CONTAINER_NAME (e.g. docker logs myvalidator).
Stop validator | For each container issue docker stop CONTAINER_NAME.
Start validator | For each container issue docker start CONTAINER_NAME.
To update a validator follow these steps:
Pull the latest image via docker pull IMAGE.
Stop and remove each container via docker stop CONTAINER_NAME && docker rm CONTAINER_NAME.
Rerun each container by issuing its docker run command.
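For the example container used in this guide the update sequence would be:
docker pull myorg/myvalidator:latest
docker stop myvalidator && docker rm myvalidator
docker run --name myvalidator -d --restart=unless-stopped -p 8080:8080 myorg/myvalidator:latest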
Operations when running via Kubernetes
The following table summarises common actions for a validator running via Kubernetes:
Action | Command
---|---
View validator logs | Issue kubectl logs POD_NAME (list the validator’s pods with kubectl get pods).
Stop validator | Edit the validator’s manifest setting the replicas value to 0 and apply it with kubectl apply -f myvalidator.yml.
Start validator | Edit the validator’s manifest setting the replicas value back to the desired number of instances and apply it with kubectl apply -f myvalidator.yml.
To update a validator issue kubectl rollout restart deployments/myvalidator-deployment
. Given that the deployment’s manifest defines
that each container’s image is pulled before it is started, this results in a rolling update of all pods based on the latest image (if one is available).
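You can follow the progress of the rolling update and confirm its completion with:
kubectl rollout status deployments/myvalidator-deployment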
Summary
Congratulations! You have now set up a validator ready for production use. In doing so you selected the installation approach that suits your needs (JAR, Docker or Kubernetes) and installed its supporting tooling. You also considered what is needed to scale your validator for increased throughput and availability and manage its access via your reverse proxy.
See also
This guide focused purely on the installation of a validator for use in production. For details on how such a validator can be created check the respective validator guide:
The XML validation guide, for XML validators using XML Schema and Schematron.
The RDF validation guide, for RDF validators using SHACL shapes.
The JSON validation guide, for JSON validators using JSON Schema.
The CSV validation guide, for CSV validators using Table Schema.
In case your validator is a fully custom implementation, check out the GITB validation services documentation for further details.
If you plan to make a machine-to-machine integration with your validator you can consider using the GITB validation service API, a SOAP-based API shared by all validators. Towards this you can consider:
The specific inputs expected per validator type (see for XML, RDF, JSON and CSV).
The validator SOAP integration tutorial with step-by-step instructions on how to build a Java client.
References
This section contains additional references linked to this guide.
Default paths per validator type
In the course of this guide several references are made in configurations and examples to paths that are exposed by the validator.
The example considered by this guide is an RDF validator with a default application context of /shacl
.
The following table lists the default contexts and paths per validator type, where DOMAIN stands for the configured domain name (order in this guide’s example).
Validator type | Default context path | SOAP API endpoint | SOAP WSDL path | Web UI | REST API
---|---|---|---|---|---
RDF | /shacl | /shacl/soap/DOMAIN/validation | /shacl/soap/DOMAIN/validation?wsdl | /shacl/DOMAIN/upload | /shacl/DOMAIN/api/validate
JSON | /json | /json/soap/DOMAIN/validation | /json/soap/DOMAIN/validation?wsdl | /json/DOMAIN/upload | /json/DOMAIN/api/validate
CSV | /csv | /csv/soap/DOMAIN/validation | /csv/soap/DOMAIN/validation?wsdl | /csv/DOMAIN/upload | /csv/DOMAIN/api/validate
XML | /xml | /xml/soap/DOMAIN/validation | /xml/soap/DOMAIN/validation?wsdl | /xml/DOMAIN/upload | /xml/DOMAIN/api/validate
Note
Adapting the context path: The context path used by a validator can be set using property server.servlet.context-path
set as
a system property or environment variable.
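For example, when running via JAR file you could switch the RDF validator’s default /shacl context to a hypothetical /validation context by adding the property to the launch command:
java -Dserver.servlet.context-path=/validation \
     -Dvalidator.resourceRoot=/opt/validator/resources \
     -Dlogging.file.path=/opt/validator/logs \
     -Dvalidator.tmpFolder=/opt/validator/tmp \
     -jar /opt/validator/validator.jar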