Guide: Installing a validator for production use
This guide walks you through the process of installing an XML, RDF, JSON or CSV validator, on your own infrastructure for production use.
What you will achieve
The purpose of this guide is to walk you through the setup of a validator on your own infrastructure for use in production. With the term validator we explicitly refer to a validator instance that has been defined through configuration by reusing one of the test bed’s validator foundations for XML, RDF, JSON or CSV content. Such validators share common characteristics that allow for a detailed set of guidelines when installing on your own infrastructure. Other custom validators can vary significantly in their design and needs, and are not addressed by this guide.
If you choose to not carry out an actual installation you will at least be aware of the steps and settings to configure for such a setup.
Note
Using the DIGIT test bed: The European Commission provides hosting and automated updates for public validators as part of the test bed’s cloud infrastructure. Determine whether such a managed instance or a separate setup on your own infrastructure is more appropriate for you in Step 1: Determine the need for an on-premise setup.
What you will need
About 45 minutes.
A text editor.
A web browser.
A machine with at least a dual core CPU, 2 GB of RAM and 10 GB of free storage. This machine should also be able to access the internet (for software installations). The machine’s operating system should ideally be Linux-based, as this keeps all installation options available.
Administrator privileges on the target machine.
A basic understanding of Docker, Kubernetes and ICT concepts such as ports and reverse proxies.
A validator to use for the installation.
Note
Machine specifications and instances: The machine resource specifications listed above provide a reasonable minimum for the setup of a single validator instance on a single machine. Depending on your needs you may need to scale your validator to handle increased workloads and/or spread its installation across multiple machines to ensure high availability. Such considerations are addressed in the steps that follow.
How to complete this guide
Production installations vary depending on the needs they aim to achieve. The steps that follow are designed to be considered sequentially, to both provide the practical steps to follow, and also highlight the questions and design decisions you need to consider. Where different alternatives exist these are highlighted at the beginning of each step to help you choose the most suitable approach. The guide can be followed as-is to make an actual installation or be consulted as a set of guidelines.
To provide a concrete example we will be considering an installation for a SHACL-based validator for RDF, and specifically the fictional Purchase Order validator defined as part of the RDF validation guide. The approach to follow for other validators (for a different RDF or a non-RDF specification) is nonetheless identical.
Note
Configuration examples: The included examples consider the default settings for RDF validators when considering context paths and endpoints. The defaults for other types of validators can be found in their respective detailed guides (XML, RDF, JSON, CSV), but are also summarised in this guide for ease of reference.
Steps
Carry out the following steps to complete this guide.
Step 1: Determine the need for an on-premise setup
As part of its service offering, the test bed provides free hosting of validators in its public cloud infrastructure. Using this approach
you define your validator’s configuration in a Git repository (e.g. on GitHub), which you then share with the test bed team to create
your validator. The validator is exposed under https://www.itb.ec.europa.eu
and is automatically updated whenever you make changes to your
configuration. Using this approach saves you the cost and effort involved in hosting, operating and monitoring your validator yourself.
It can be ideal if your primary goals are to use the validator as a community tool in support of your underlying specification(s), or as a building
block for conformance testing.
Opting for an installation of a validator on your own infrastructure (on-premise) does nonetheless extend its potential and opens up additional use cases. An on-premise installation is advised in the following scenarios:
Private validator: Validators hosted on the test bed are publicly accessible, and even if not configured as such, are still deployed on the public cloud. If you want to fully restrict access to the validator an on-premise setup is the way to go.
Full operational control: You may want to have complete operational control over the validator, managing yourself any patches and updates. In addition, you may not want to be bound to the test bed’s operational window.
Integration with other resources: Your validator may need to access non-public resources and integrate with internal systems (e.g. an internal triple store).
Branding: You may want to expose the validator through your own portal, matching your existing branding and theming. In this case you would define your own user interface and use the validator internally via its machine-to-machine API.
Validation in production operations: Nothing prevents validators from being used as quality control services in production operations. In doing so, however, you would likely need to adapt the validator’s setup to ensure high availability and scale it to match your production load.
It is interesting to point out that an on-premise setup does not exclude a public instance managed by the test bed. You could for example offer the validator through the test bed as a public tool for users, but also operate one or more instances internally as part of your own quality assurance processes.
Step 2: Install the validator
The installation approach followed for the validator will likely depend on your organisation’s preferences and guidelines. The current guide provides three main alternative approaches that should cover most needs, each with its own degree of dependencies, isolation, simplicity and flexibility. Specifically:
Use the validator as an all-in-one JAR file with externally provided configuration.
Containerise the validator using Docker and run it as a Docker container.
Containerise the validator using Docker and run it using Kubernetes.
Of these three approaches the one proposed by the test bed is to use Kubernetes as it brings the isolation and flexibility of containers but with significant simplifications for advanced container orchestration.
Approach 1: Using JAR file
Note
When to use this approach? Opt for using the validator’s JAR file directly if your organisation prevents the use of Docker or Kubernetes.
Validators use the Java platform and are packaged as executable JAR files. These JAR files are termed all-in-one JAR files given that they include both the validator’s (web) application as well as an embedded server for its deployment. The only prerequisite software needed in this case is a Java Runtime Environment of at least version 17.
The installation of Java 17 can be done via a JDK installation package, although different operating system vendors may offer
alternatives through their preferred package managers (e.g. apt or yum). Once you have completed the installation you can validate
it by issuing java --version
. Doing so you should see an output such as the following:
> java --version
java version "17.0.6" 2023-01-17 LTS
Java(TM) SE Runtime Environment (build 17.0.6+9-LTS-190)
Java HotSpot(TM) 64-Bit Server VM (build 17.0.6+9-LTS-190, mixed mode, sharing)
The next step is to download the validator JAR. The application itself is generic, representing a shell to be combined with your specific configuration, and is included in a package maintained and published by the test bed. Download the package that corresponds to your case:
XML validator: https://www.itb.ec.europa.eu/xml-jar/xml/validator.zip
RDF validator: https://www.itb.ec.europa.eu/shacl-jar/any/validator.zip
JSON validator: https://www.itb.ec.europa.eu/json-jar/any/validator.zip
CSV validator: https://www.itb.ec.europa.eu/csv-jar/any/validator.zip
Once downloaded extract the included JAR file in a target folder (e.g. /opt/validator
):
/opt/validator
└── validator.jar
With the validator application in place what is needed is to provide it its configuration. This includes your domain configuration property file and validation artefacts as described in the validation guide. Considering our Purchase Order validator example, after having downloaded the RDF validator JAR, we will put in place our configuration files as follows:
/opt/validator
├── /resources
│ └── /order
│ ├── config.properties
│ └── /shapes
│ ├── PurchaseOrder-common-shapes.ttl
│ └── PurchaseOrder-large-shapes.ttl
└── validator.jar
Recall that a validator’s configuration is placed under a resource root (in our case folder /opt/validator/resources
), within which each
subfolder is considered a domain. These domains represent the logically distinct validator instances that will be exposed (in our case
/opt/validator/resources/order
corresponding to an order domain). Each domain folder contains a
configuration property file (config.properties
) and any set of files and folders that define
validation artefacts (unless these are loaded from remote resources).
Note
Multiple domains: Nothing prevents you from defining additional domains if needed. This is a possibility not available for validators managed by the test bed, where each user is restricted to a single domain.
With the validator’s artefacts in place, the next step is to define or override its configuration properties either as environment variables or as system properties. The minimum properties you need to define are:
validator.resourceRoot
: The root folder from which all domain configurations will be loaded.
logging.file.path
: The validator’s log output folder.
validator.tmpFolder
: The validator’s temporary work folder.
An example complete command line for your validator that defines these properties would be as follows:
java -Dvalidator.resourceRoot=/opt/validator/resources \
-Dlogging.file.path=/opt/validator/logs \
-Dvalidator.tmpFolder=/opt/validator/tmp \
-jar /opt/validator/validator.jar
Launching our validator as such, and assuming it has its UI enabled, would make it available at http://HOST:8080/shacl/order/upload.
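As an optional check, you can also call the validator’s health check API (the same path used later in this guide as a Kubernetes readiness probe) to confirm that the application is up:
curl http://localhost:8080/shacl/api/healthcheck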
The final step is to ensure that your validator can be managed as a service, allowing it to recover from failures and at least start when the host server boots. How this is done depends on your preferences and your operating system’s support for services, with a common choice being systemd. Before defining the service we will create a group and user to run the validator with limited privileges:
sudo groupadd -r validatorgroup
sudo useradd -r -s /bin/false -g validatorgroup validatoruser
To define the service, create (as sudo) its definition in file /etc/systemd/system/myvalidator.service
with contents such as the following:
[Unit]
Description=My validator
[Service]
WorkingDirectory=/opt/validator
ExecStart=/bin/java -Dvalidator.resourceRoot=/opt/validator/resources -Dlogging.file.path=/opt/validator/logs -Dvalidator.tmpFolder=/opt/validator/tmp -jar validator.jar
User=validatoruser
Type=simple
Restart=on-failure
RestartSec=10
[Install]
WantedBy=multi-user.target
With the service defined we can now use systemctl to control it. First refresh the daemon to detect the new service:
sudo systemctl daemon-reload
Once refreshed, set the service to start on server boot:
sudo systemctl enable myvalidator
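You can then start the service immediately and verify that it is running using the standard systemctl commands:
sudo systemctl start myvalidator
sudo systemctl status myvalidator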
Note
Operating your validator: For more information on how your validator can be operated check Operations when running via JAR file.
Approach 2: Using Docker
Note
When to use this approach? Opt for Docker if you cannot use Kubernetes or if you don’t need the availability and scaling it offers.
Using Docker allows you to define each validator instance in isolation and ensure it runs without any side-effects to other processes and data. The first step is to ensure that Docker is installed on the machine that will host the validator. If you already have Docker ensure it is at least at version 17.06.0, otherwise you can follow Docker’s online instructions to install or upgrade it.
Once your installation is complete, you will be able to test it using the docker --version
command. This should provide output as follows:
> docker --version
Docker version 18.03.0-ce, build 0520e24
Defining your Docker image is discussed as part of the detailed validation guide. In short, you define a Dockerfile in which:
You start from the appropriate base image (e.g. isaitb/shacl-validator:latest for RDF validators).
You copy your configuration resource root folder into the validator.
You set property validator.resourceRoot as an environment variable pointing to your resource root.
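As an illustration, a minimal Dockerfile along these lines could look as follows. This is only a sketch: it assumes your resources folder sits next to the Dockerfile, and the target path inside the image is indicative, so refer to the detailed validation guide for your validator type for the exact instructions.
# Start from the test bed RDF validator base image.
FROM isaitb/shacl-validator:latest
# Copy the configuration resource root (containing the order domain) into the image (target path assumed for illustration).
COPY resources /validator/resources/
# Point the validator to the copied resource root.
ENV validator.resourceRoot /validator/resources/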
Once defined you can then build your image (e.g. myorg/myvalidator:latest
) and publish it to the Docker Hub or your private registry. With
your image published, you can then launch a validator instance as follows:
docker run --name myvalidator -d --restart=unless-stopped -p 8080:8080 myorg/myvalidator:latest
Launching our validator as such, and assuming it has its UI enabled, would make it available at http://HOST:8080/shacl/order/upload
. Note
that specifying the --restart=unless-stopped
flag ensures that the validator will be started at server boot and that it will be restarted
if there is an unexpected failure.
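To confirm that the container started correctly you can list it and follow its logs with the standard Docker commands:
docker ps --filter name=myvalidator
docker logs -f myvalidator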
Note
Operating your validator: For more information on how your validator can be operated check Operations when running via Docker.
Approach 3: Using Kubernetes
Note
When to use this approach? Always if possible. Use of Kubernetes can be adapted for both simple setups and advanced configurations to accommodate production operations.
Kubernetes is one of the most popular container orchestration solutions, providing powerful capabilities that allow you to define how your containers should be managed. Kubernetes is typically associated with large scale microservice architectures to automatically manage hundreds of containers over numerous nodes. If this is your case then you can leverage such a setup to easily bring online a set of validator instances depending on your needs. However, even if you do not already use Kubernetes, there is nothing preventing you from making an initial minimal setup to facilitate the operation of your validators. In order to be more complete, this guide assumes that you have no existing Kubernetes setup.
The first step is to install Kubernetes on your target machine, the minimum required version being 1.19. A good choice for a minimal, lightweight, but still production-grade installation is k3s, which comes pre-configured for immediate use. The k3s documentation includes an installation guide with different options, however for a typical installation all you need is to issue the following command:
curl -sfL https://get.k3s.io | sh -s - --write-kubeconfig-mode 644
Once the installation completes you can verify your installation using kubectl, the Kubernetes administration tool, by checking your installation’s version.
> kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.4+k3s1", GitCommit:"838a906ab5eba62ff529d6a3a746384eba810758", GitTreeState:"clean", BuildDate:"2021-02-22T19:49:27Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.4+k3s1", GitCommit:"838a906ab5eba62ff529d6a3a746384eba810758", GitTreeState:"clean", BuildDate:"2021-02-22T19:49:27Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
With Kubernetes installed the next step is to define a manifest for our validator (you can download a copy of this here
).
This manifest is used to instruct Kubernetes on how to run our validator.
apiVersion: v1
kind: Service
metadata:
name: myvalidator-service
spec:
type: NodePort
selector:
app: myvalidator
ports:
- protocol: TCP
port: 8080
targetPort: 8080
nodePort: 30001
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: myvalidator-deployment
labels:
app: myvalidator
spec:
replicas: 1
selector:
matchLabels:
app: myvalidator
template:
metadata:
labels:
app: myvalidator
spec:
containers:
- name: myvalidator
image: myorg/myvalidator:latest
imagePullPolicy: Always
ports:
- containerPort: 8080
readinessProbe:
httpGet:
path: /shacl/api/healthcheck
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
As part of this manifest we are describing two resources:
A Deployment named myvalidator-deployment. A deployment defines how individual containers will be created, validated and replicated (as pods in Kubernetes terms).
A Service named myvalidator-service. A service in Kubernetes is linked to a deployment to make the related application usable within and outside the Kubernetes cluster.
From this manifest a few points merit highlighting:
The deployment specifies that its containers are built from Docker image myorg/myvalidator:latest. This is the image we need to build for our validator.
For the container we specify a readinessProbe of type httpGet. This will make HTTP pings to the configured path (the validator’s health check API) to detect when the container is ready to receive requests.
The deployment is set with a replicas value of 1. This tells Kubernetes to always ensure there is one validator instance available.
The service is of type NodePort and defines a specific nodePort. This port (30001 in our case) is the host port on which the service will be listening.
To deploy the validator we finally need to apply its manifest. To do this we use the kubectl apply
command:
> kubectl apply -f myvalidator.yml
service/myvalidator-service created
deployment.apps/myvalidator-deployment created
Launching our validator as such, and assuming it has its UI enabled, makes it available at http://HOST:30001/shacl/order/upload.
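You can verify the deployment with kubectl, for example by listing the created resources and checking that the validator’s pod reaches the Ready state:
kubectl get deployments,services
kubectl get pods -l app=myvalidator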
It is interesting to note that setting replicas
to 1 defines a single validator instance but also enables rolling updates. Whenever
there is a new deployment to make, Kubernetes creates a new container and only uses it to replace the existing one when it is ready (as defined
by the configured readinessProbe
). In addition, if the number of replicas
is increased, Kubernetes creates the additional containers
and automatically load-balances them in a round-robin manner. From an external perspective you continue to connect to the configured port (30001)
as this is the service’s port, not the port of individual containers.
Note
Both rolling updates and transparent scaling through new instances need to be addressed in a custom manner when not using Kubernetes.
Step 3: Scale your validator
Scaling a validator involves deploying additional instances to gracefully handle increased load. Such new validator instances can be deployed either on the same host or across separate hosts. In the latter case, this also increases the validator’s availability in case of machine failure.
All validators are stateless and self-sufficient. Each validator instance at startup loads its own configuration and does not record data for incoming validation requests and produced reports. There is one caveat however, linked to the validator’s web user interface. Once a user performs a validation, the resulting report is created and asynchronously downloaded to the user’s browser via the validation result page. Even though reports are deleted upon download this means that if a user validates against a specific validator instance, the report download requests need to be sent to the same instance. Such a constraint does not apply when validators are used through their machine-to-machine API (SOAP and REST). In terms of scaling the validator with multiple instances this means that:
If the web user interface is enabled, a mechanism needs to be in place to ensure session affinity, i.e. that user sessions are sticky.
If only machine-to-machine APIs are used there is no need for session affinity.
Note
Disabling the web user interface: If the validator’s web user interface is not needed, it can be disabled by omitting the value form from the validator.channels configuration property (see the validator’s domain configuration properties).
Given that each validator is realised by multiple instances, we need a means of distributing load across them and ensuring, if needed, session affinity. There are different ways of achieving this but as a general case we will assume that you have a web server in place that will act as a load balancer and reverse proxy. For details on how to configure your reverse proxy check Step 4: Configure your reverse proxy.
Scaling when running via JAR file
When operating as a JAR file, deploying additional instances involves the following steps:
On the target machine copy the validator folder for each new instance.
Define a service per validator instance for its startup and management.
Adapt the runtime configuration for each service to match its folder and to assign a different port number (if on the same host).
Adapt your reverse proxy to load balance requests across your instances.
To define three instances of your validator on the same host you need to have three copies of the /opt/validator
folder discussed in the
installation:
/opt/validator1
├── /resources
│ └── ...
└── validator.jar
/opt/validator2
/opt/validator3
For each validator instance define a systemd service:
Service /etc/systemd/system/myvalidator1.service for instance 1.
Service /etc/systemd/system/myvalidator2.service for instance 2.
Service /etc/systemd/system/myvalidator3.service for instance 3.
When defining each instance’s service adapt the runtime configuration properties to match its folder. When doing this ensure that you also
set the server.port
property to different values per service to avoid port conflicts. For example, the service of instance 1 would define
the following:
WorkingDirectory=/opt/validator1
ExecStart=/bin/java -Dserver.port=10001 -Dvalidator.resourceRoot=/opt/validator1/resources -Dlogging.file.path=/opt/validator1/logs -Dvalidator.tmpFolder=/opt/validator1/tmp -jar validator.jar
To complete the services’ setup ensure you reload the systemctl daemon and enable them for automatic startup:
sudo systemctl daemon-reload
sudo systemctl enable myvalidator1
sudo systemctl enable myvalidator2
sudo systemctl enable myvalidator3
The final step is to load balance your instances through your reverse proxy. Check Proxy setup when running via JAR file or Docker to do this and complete your setup.
Scaling when running via Docker
When operating via Docker, launching additional validator instances is simplified through containerisation. The steps to follow in this case are:
Launch a container per instance mapping the containers’ ports to different host ports.
Adapt your reverse proxy to load balance requests across your containers.
To define three instances of your validator issue three docker run
commands adapting each container’s port mapping and name (for
easier referencing):
docker run --name myvalidator1 -d --restart=unless-stopped -p 10001:8080 myorg/myvalidator:latest
docker run --name myvalidator2 -d --restart=unless-stopped -p 10002:8080 myorg/myvalidator:latest
docker run --name myvalidator3 -d --restart=unless-stopped -p 10003:8080 myorg/myvalidator:latest
Executing these three commands will launch three containers, listening on ports 10001, 10002 and 10003. Check Proxy setup when running via JAR file or Docker to see how your reverse proxy should be configured to load balance their traffic.
Scaling when running via Kubernetes
One of the key use cases of Kubernetes is the simplification of operating containerised applications and in particular their scaling depending on your needs. Scaling your validator with additional instances could be done exactly as in the case of JAR file or Docker based deployments by creating distinct application instances, each mapped to different host ports, and load balancing them through your reverse proxy. This could be achieved by defining per instance a new Service mapped to a different nodePort. Doing so however would prevent Kubernetes from managing all instances as a whole and bringing benefits such as resource-based load distribution and rolling updates.
Considering this, we will scale our validator as expected by Kubernetes using replicas. To do this we will carry out the following steps:
Adapt the validator’s manifest, setting the number of replicas to the desired number of instances.
If you use the validator’s web UI, define an ingress to ensure session affinity.
To set up three instances, we would need to edit the validator’s manifest (download an updated copy here
),
and specify the number of replicas:
apiVersion: v1
kind: Service
...
---
apiVersion: apps/v1
kind: Deployment
...
spec:
replicas: 3
...
This tells Kubernetes that for our service we want to always have three instances running. Requests sent to the service’s listen port
(the nodePort
set to 30001) will be load balanced in a round-robin manner across instances. Moreover, when updates are deployed
Kubernetes will do this in a rolling manner, incrementally updating instances while ensuring that the prescribed number of replicas is
always available. Given that Kubernetes handles load balancing itself there is no need for any such configuration on your reverse proxy.
Completing the change is done by applying the manifest change:
kubectl apply -f myvalidator.yml
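Once the change is applied you can confirm that the expected number of pods is running:
kubectl get pods -l app=myvalidator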
In case we plan to use the validator’s web user interface we need an extra level of configuration to ensure session affinity when selecting validator instance pods. Given that pods are internally managed by Kubernetes, this is not handled through a separate reverse proxy. Instead, it is achieved by means of an Ingress, the mechanism foreseen by Kubernetes to manage inbound traffic external to the cluster.
The k3s distribution we installed comes by default with the Traefik Ingress Controller enabled as the implementation used to realise ingress rules, listening on the node’s port 80. An alternate configuration, or an altogether different ingress controller, could be used, however this default setup allows you to proceed without additional steps.
To ensure load balancing and session affinity we need to define the ingress rule for our validator and adapt our service definition
(download the updated manifest file here
)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: myvalidator-ingress
spec:
rules:
- http:
paths:
- path: /shacl
pathType: Prefix
backend:
service:
name: myvalidator-service
port:
number: 8080
---
apiVersion: v1
kind: Service
metadata:
name: myvalidator-service
annotations:
traefik.ingress.kubernetes.io/affinity: "true"
spec:
type: ClusterIP
selector:
app: myvalidator
ports:
- protocol: TCP
port: 8080
targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: myvalidator-deployment
labels:
app: myvalidator
spec:
replicas: 3
selector:
matchLabels:
app: myvalidator
template:
metadata:
labels:
app: myvalidator
spec:
containers:
- name: myvalidator
image: myorg/myvalidator:latest
imagePullPolicy: Always
ports:
- containerPort: 8080
readinessProbe:
httpGet:
path: /shacl/api/healthcheck
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
The manifest includes a new definition for myvalidator-ingress
specifying that any requests received by the ingress controller
whose path starts with /shacl
should be directed to service myvalidator-service
listening internally on port 8080. Apart from
this, the myvalidator-service
service definition is also adapted with two changes:
The service’s type is set to ClusterIP. This means that the service is no longer exposed on a host port, given that requests are now received by the ingress controller.
The service is annotated with traefik.ingress.kubernetes.io/affinity set to true to ensure session affinity is maintained.
To complete the updates apply the manifest file changes:
kubectl apply -f myvalidator.yml
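You can check that the ingress rule and the updated service were created as expected:
kubectl get ingress myvalidator-ingress
kubectl get service myvalidator-service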
With our updated configuration, the service is now available at http://HOST/shacl/order/upload and, thanks to our ingress configuration, is load balanced across its pod replicas.
Step 4: Configure your reverse proxy
A reverse proxy acts as the entry point to your validator and is needed in all but the simplest setups. Use of a reverse proxy addresses the following points:
It maps publicly exposed paths to your internal configuration.
It rewrites internal hosts, ports and path mappings from responses to what should be externally exposed.
It enables load balancing across multiple validator instances.
It ensures session affinity needed by the validator’s web user interface.
A reverse proxy is not always required however. If you only use your validator internally the services that consume it will likely be able to access it directly (i.e. there is no need for mapping or rewriting). In addition, if you only run a single validator instance there is no need for load balancing and session affinity. Moreover, in case you have opted for a Kubernetes setup such aspects are already handled by your ingress controller.
In short, you can skip defining a reverse proxy for your validator if both these points apply:
You only access the validator internally.
You either run a single validator instance, or use multiple instances via Kubernetes.
In order to provide concrete steps this guide assumes the popular nginx server as the reverse proxy in use. The discussed principles and configurations are nonetheless simple and can be applied to any server implementation.
Proxy setup when running via JAR file or Docker
Note
Why use a reverse proxy in this case? Your reverse proxy is used to publicly expose your validator and load balance its instances.
When exposing your validator externally you need to map its location to a public path. In doing so you also need to ensure that any full internal addresses returned by the validator are adapted to their public counterparts. To do this include the following location block in your configuration:
location /shacl/ {
proxy_pass http://HOST:PORT/shacl/;
sub_filter "http://HOST:PORT/shacl/soap/order/validation" "https://www.myorg.org/shacl/soap/order/validation";
sub_filter_once off;
sub_filter_types *;
}
This configuration matches requests for paths starting with /shacl/
that are passed onto the service listening on the internal HOST
and PORT
. The sub_filter
definition is defined to cover the service’s SOAP API to rewrite the endpoint URL reported in its WSDL to its
public counterpart.
An alternative approach to using sub_filter
directives to correctly publish your validator’s SOAP endpoint(s) is to configure
in the validator itself the base SOAP endpoint publishing URL. This is the full URL up to, but without including, the domain part
of the URL (“order” in our example). To do this you will need to set a system property or environment variable named validator.baseSoapEndpointUrl
with the URL as its value. For example, if we are running via JAR file we
would do this as follows:
java -Dvalidator.resourceRoot=/opt/validator/resources \
-Dlogging.file.path=/opt/validator/logs \
-Dvalidator.tmpFolder=/opt/validator/tmp \
-Dvalidator.baseSoapEndpointUrl=https://www.myorg.org/shacl/soap/ \
-jar /opt/validator/validator.jar
Having used this system property (or environment variable) the proxy setup is largely simplified by avoiding rewrites:
location /shacl/ {
proxy_pass http://HOST:PORT/shacl/;
}
In case we are running multiple validator instances we additionally need to enable load balancing and, if the validator’s user interface is enabled, session affinity. This is achieved with the following configuration:
http {
...
upstream backend {
ip_hash; # Only needed if session affinity is necessary
server HOST1:PORT1;
server HOST2:PORT2;
server HOST3:PORT3;
}
server {
...
location /shacl/ {
proxy_pass http://backend/shacl/;
sub_filter "http://backend/shacl/soap/order/validation" "https://www.myorg.org/shacl/soap/order/validation";
sub_filter_once off;
sub_filter_types *;
}
}
}
This configuration uses the backend
placeholder that is defined as a set of upstream servers and is replaced accordingly when requests
are treated. Note that adding ip_hash
ensures session affinity by selecting the upstream server based on the client’s IP address. If
access to the web interface is not needed this can be omitted resulting in a default round-robin selection.
Proxy setup when running via Kubernetes
Note
Why use a reverse proxy in this case? Your reverse proxy is used to publicly expose your validator.
As discussed when scaling your validator via Kubernetes, load balancing and session affinity (if needed) are handled at the level of Kubernetes via an ingress definition. This means that the only reason to define an external reverse proxy on top of this would be to manage the validator’s public exposure.
When exposing your validator externally you need to map its location to a public path. In doing so you also need to ensure that any full internal addresses returned by the validator are adapted to their public counterparts. To do this include the following location block in your configuration:
location /shacl/ {
proxy_pass http://HOST:PORT/shacl/;
sub_filter "http://HOST:PORT/shacl/soap/order/validation" "https://www.myorg.org/shacl/soap/order/validation";
sub_filter_once off;
sub_filter_types *;
}
This configuration matches requests for paths starting with /shacl/
that are passed onto the service listening on the internal HOST
and PORT
. The PORT
in particular, is either the service’s defined nodePort
if running without an ingress, or, if an ingress is
defined, the port on which the ingress controller is listening (port 80 by default).
The proxy’s sub_filter
definition is defined to cover the service’s SOAP API to rewrite the endpoint URL reported in
its WSDL to its public counterpart. An alternative approach to using sub_filter
directives to correctly publish your
validator’s SOAP endpoint(s) is to configure in the validator itself the base SOAP endpoint publishing URL. This is
the full URL up to, but without including, the domain part of the URL (“order” in our example). To do this you will need
to set an environment variable named validator.baseSoapEndpointUrl
with the URL as its value:
...
spec:
containers:
- name: myvalidator
...
env:
- name: "validator.baseSoapEndpointUrl"
value: "https://www.myorg.org/shacl/soap/"
...
Having set this environment variable the proxy setup is largely simplified by avoiding rewrites:
location /shacl/ {
proxy_pass http://HOST:PORT/shacl/;
}
Step 5: Operate your validator
The current section lists commands you may find useful when managing your validator instance. These depend on the installation approach you followed.
Operations when running via JAR file
The following table summarises common actions for a validator running as a JAR file, via a systemd service:
Action | Command
---|---
View validator logs | The log file is available under the validator’s configured logging.file.path folder.
Stop validator | For all service instances issue sudo systemctl stop myvalidator.
Start validator | For all service instances issue sudo systemctl start myvalidator.
To update a validator follow these steps:
Stop each of its service instances.
Download and extract the latest JAR file (if you want to update the validator’s software). Place the JAR file in each validator instance’s folder.
Replace the resources within each validator instance’s folder with their latest version (if you want to update the validator’s artefacts).
Start the validator’s services.
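As an example, for the single-instance setup defined earlier (service myvalidator with folder /opt/validator), the update sequence could look as follows:
sudo systemctl stop myvalidator
# Replace /opt/validator/validator.jar and/or the contents of /opt/validator/resources with their new versions.
sudo systemctl start myvalidator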
Operations when running via Docker
The following table summarises common actions for a validator running as a Docker container:
Action | Command
---|---
View validator logs | Issue docker logs CONTAINER_NAME (e.g. docker logs myvalidator).
Stop validator | For each container issue docker stop CONTAINER_NAME.
Start validator | For each container issue docker start CONTAINER_NAME.
To update a validator follow these steps:
Pull the latest image via docker pull IMAGE.
Stop and remove each container via docker stop CONTAINER_NAME && docker rm CONTAINER_NAME.
Rerun each container by issuing its docker run command.
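For the example container used in this guide the update sequence would be:
docker pull myorg/myvalidator:latest
docker stop myvalidator && docker rm myvalidator
docker run --name myvalidator -d --restart=unless-stopped -p 8080:8080 myorg/myvalidator:latest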
Operations when running via Kubernetes
The following table summarises common actions for a validator running via Kubernetes:
Action | Command
---|---
View validator logs | Issue kubectl logs POD_NAME (list the validator’s pods with kubectl get pods).
Stop validator | Edit the validator’s manifest setting the replicas value to 0 and apply it with kubectl apply -f myvalidator.yml.
Start validator | Edit the validator’s manifest setting the replicas value back to the desired number of instances and apply it with kubectl apply -f myvalidator.yml.
To update a validator issue kubectl rollout restart deployments/myvalidator-deployment
. Given that the deployment’s manifest defines
that each container’s image is pulled before it is started, this results in a rolling update of all pods based on the latest image (if one is available).
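You can follow the progress of the rolling update and confirm its completion with:
kubectl rollout status deployments/myvalidator-deployment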
Summary
Congratulations! You have now set up a validator ready for production use. In doing so you selected the installation approach that suits your needs (JAR, Docker or Kubernetes) and installed its supporting tooling. You also considered what is needed to scale your validator for increased throughput and availability and manage its access via your reverse proxy.
See also
This guide focused purely on the installation of a validator for use in production. For details on how such a validator can be created check the respective validator guide:
The XML validation guide, for XML validators using XML Schema and Schematron.
The RDF validation guide, for RDF validators using SHACL shapes.
The JSON validation guide, for JSON validators using JSON Schema.
The CSV validation guide, for CSV validators using Table Schema.
In case your validator is a fully custom implementation, check out the GITB validation services documentation for further details.
If you plan to make a machine-to-machine integration with your validator you can consider using the GITB validation service API, a SOAP-based API shared by all validators. Towards this you can consider:
The specific inputs expected per validator type (see for XML, RDF, JSON and CSV).
The validator SOAP integration tutorial with step-by-step instructions on how to build a Java client.
References
This section contains additional references linked to this guide.
Default paths per validator type
In the course of this guide several references are made in configurations and examples to paths that are exposed by the validator.
The example considered by this guide is an RDF validator with a default application context of /shacl
.
The following table lists the default contexts and paths per validator type, where DOMAIN stands for the configured domain name (order in this guide’s example).
Validator type | Default context path | SOAP API endpoint | SOAP WSDL path | Web UI | REST API
---|---|---|---|---|---
RDF | /shacl | /shacl/soap/DOMAIN/validation | /shacl/soap/DOMAIN/validation?wsdl | /shacl/DOMAIN/upload | /shacl/DOMAIN/api/validate
JSON | /json | /json/soap/DOMAIN/validation | /json/soap/DOMAIN/validation?wsdl | /json/DOMAIN/upload | /json/DOMAIN/api/validate
CSV | /csv | /csv/soap/DOMAIN/validation | /csv/soap/DOMAIN/validation?wsdl | /csv/DOMAIN/upload | /csv/DOMAIN/api/validate
XML | /xml | /xml/soap/DOMAIN/validation | /xml/soap/DOMAIN/validation?wsdl | /xml/DOMAIN/upload | /xml/DOMAIN/api/validate
Note
Adapting the context path: The context path used by a validator can be set using property server.servlet.context-path
set as
a system property or environment variable.
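For example, when running via JAR file you could switch the RDF validator’s default /shacl context to a hypothetical /validation context by adding the property to the launch command:
java -Dserver.servlet.context-path=/validation \
     -Dvalidator.resourceRoot=/opt/validator/resources \
     -Dlogging.file.path=/opt/validator/logs \
     -Dvalidator.tmpFolder=/opt/validator/tmp \
     -jar /opt/validator/validator.jar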