Data Gateway Integration

Introduction

The system offers two main APIs for integration into existing environments. The webVis-API lets application developers visualize resources identified by a simple URI (either a URN or a URL), and the Data Gateway API provides pull access to the data through a single URL per URI. The system automatically translates each URI to a URL based on local configuration settings.

Resource Specification

A URI can either be a URL or a URN. URNs are mapped to URLs by the system. For example:

The URL must be a valid location with a known scheme (e.g. HTTPS). The application must also provide authentication tokens if necessary. To support this, the system can transparently pass single-sign-on tokens (e.g. cookies) along.

The URN can be an installation-specific name that is automatically mapped to a configured location during processing. The exact mappings are configured within the system via a mapping table.

Configuring a URN mapping rule instead of using URLs directly decouples the client applications that interact with the system from the backend data gateway locations. Backends can then be relocated or replaced later without modifying client applications.
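The decoupling described above can be illustrated with a minimal sketch of a prefix-based mapping table. The table layout, the URN prefixes, the host names, and the function resolve_uri are all hypothetical and chosen for illustration only; the actual mapping format is defined by the system configuration.

```python
from typing import Optional

# Hypothetical mapping table: URN prefix -> URL template.
# Prefixes and hosts are invented for illustration.
URN_MAPPING = {
    "urn:example:cad:": "https://gateway-a.example.com/cad/{name}",
    "urn:example:scan:": "https://gateway-b.example.com/scans/{name}",
}

URL_SCHEMES = ("http://", "https://")


def resolve_uri(uri: str, mapping: dict = URN_MAPPING) -> Optional[str]:
    """Return the URL for a URI, or None if no mapping rule matches."""
    if uri.lower().startswith(URL_SCHEMES):
        return uri  # already a URL, used as-is
    # Try the longest prefix first so that more specific rules win.
    for prefix in sorted(mapping, key=len, reverse=True):
        if uri.startswith(prefix):
            return mapping[prefix].format(name=uri[len(prefix):])
    return None
```

With such a table, moving a backend only requires changing the URL template for its prefix; clients keep requesting the same URN.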

Data Gateway Interface

The Data Gateway Interface is a simple but powerful abstraction for accessing data and builds on established web server best practices. It maps the webVis URI to a single URL, which can be configured to specific needs. The resource URL can be any valid URL container. For example:

The interface tries to achieve the following goals:

  • Minimal interface to pull data from any data backend

  • Based on standard HTTP techniques and best practices

  • A standard Apache server should be able to provide sufficient functionality

  • Provide a single interface to all persistent and dynamic resources

  • New backends can be added without changes to instant3Dhub components

Resource Concepts

Several key concepts are supported by the infrastructure and define how cached resources are handled by the system.

A resource can be:

  • static, i.e. it never changes. Static resources are never checked for newer versions.

  • dynamic, i.e. it can change over time. How often changes are expected to occur can be specified to control update checks.

  • public, i.e. personalized authorization checks are skipped when providing clients with caches. This can only be set explicitly, as resources are protected by default.

  • protected, i.e. personalized authorization must be done against the source when providing clients with caches. This is the default behavior and must be explicitly overridden.

Each of these properties can either be defined per URN, or set in the HTTP response headers of the source, as described in the next few sections.
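As a rough sketch of how these four properties could drive the system's decisions, the following models them as a small policy object. The class ResourcePolicy, its field names, the 60-second interval, and the choice of "dynamic" as the default are assumptions made for illustration; only "protected by default" is taken from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourcePolicy:
    static: bool = False           # assumption: dynamic unless declared static
    public: bool = False           # protected by default, as described above
    max_age_seconds: float = 60.0  # hypothetical expected change interval


def needs_update_check(policy: ResourcePolicy, seconds_since_check: float) -> bool:
    """Static resources are never re-checked; dynamic ones after max_age."""
    if policy.static:
        return False
    return seconds_since_check >= policy.max_age_seconds


def needs_auth_check(policy: ResourcePolicy) -> bool:
    """Only explicitly public resources skip the per-user authorization."""
    return not policy.public
```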

The system accesses resources in two separate ways:

  • Retrieving the resource itself to generate a cached representation.

  • Retrieving properties about the resource, such as whether a certain user has the appropriate viewing rights.

It is important to make this distinction, as the resource will not need to be retrieved very often, while property requests can happen very frequently.

Accessing Resources & Authorization

Resources are accessed via HTTP GET requests and authorized via HTTP HEAD requests. HEAD requests are made more frequently than GET requests: they are used to determine whether a cached representation is outdated and whether a user is authorized to view a resource. The system therefore assumes that a HEAD request is faster than a full GET request.

  • GET: Requests a representation of the specified resource (header + body)

  • HEAD: Expected to return the same headers as GET, but without a body

For both HEAD and GET requests, Single-Sign-On tokens in the form of cookies or other HTTP headers are supported. Both request types use the token of the user initiating the request.
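The following sketch shows, from the requesting side, how a HEAD or GET request carrying the user's token might be constructed with the Python standard library. The function build_request is hypothetical, and passing the token as a Cookie header is only one of the options mentioned above; a real deployment may use a different header.

```python
import urllib.request


def build_request(url: str, method: str, sso_cookie: str) -> urllib.request.Request:
    """Build a HEAD or GET request that forwards the user's SSO cookie."""
    if method not in ("GET", "HEAD"):
        raise ValueError("only GET and HEAD are used against the gateway")
    # The same token is attached regardless of the request type.
    return urllib.request.Request(
        url, method=method, headers={"Cookie": sso_cookie}
    )
```

The returned object can be sent with urllib.request.urlopen; for a HEAD request the response carries headers only.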

Response Codes

Standard HTTP response codes are supported. The most essential are the following:

  • 200: OK. This results in a cache being generated, or a cached representation being delivered to the user.

  • 202: Data not yet available. No cache is generated for this response. Future requests will try again.

  • 30X: Redirect. Redirects are only followed for GET requests, never for HEAD requests. A redirect in response to a HEAD request is treated as the user being unauthorized.

  • 404: Not Found. No cache can be generated for this response, and it is communicated to the user. However, future requests will still try again.

  • 401: Unauthorized. No cache can be generated (GET request), or user does not have authorization (HEAD request). This is communicated to the user.

  • 403: Forbidden. Same behavior as 401.
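The handling of the response codes listed above can be summarized in a small decision function. This is a sketch, not the system's implementation: the Action names and the function classify are hypothetical, and the set of status codes covered by "30X" is an assumption.

```python
from enum import Enum


class Action(Enum):
    USE_CACHE = "generate or deliver cached representation"
    RETRY_LATER = "no cache, try again on a future request"
    FOLLOW_REDIRECT = "follow the redirect"
    DENY = "report missing authorization to the user"
    NOT_FOUND = "report not found, try again on a future request"
    OTHER = "not covered by this sketch"


# Assumption: "30X" refers to these common redirect codes.
REDIRECTS = {301, 302, 303, 307, 308}


def classify(method: str, status: int) -> Action:
    """Map a request method and HTTP status code to the described behavior."""
    if status == 200:
        return Action.USE_CACHE
    if status == 202:
        return Action.RETRY_LATER
    if status in REDIRECTS:
        # Redirects are followed for GET only; for HEAD they count as denied.
        return Action.FOLLOW_REDIRECT if method == "GET" else Action.DENY
    if status in (401, 403):
        return Action.DENY
    if status == 404:
        return Action.NOT_FOUND
    return Action.OTHER
```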

For more details see the technical API.

Response Headers

In addition to the response code, response headers can be used to give additional hints about the data itself. The following headers can be used:

  • ETag: Stored on initial GET requests and compared on every HEAD request. If the value does not change, the cached resource is considered up-to-date. The infrastructure will not try to update the cached resource.

  • Content-Type: Used on the initial GET request to signal the data format to the system. This can be used to override a format configured for a URN, or to prevent instant3Dhub from guessing a type based on a file extension. Known types can be found on https://www.threedy.io/scalability/anydata under Service Negotiation Key.

  • Content-Disposition: Used as a fallback to the Content-Type on GET requests. Defines a filename for the data, of which the file extension is used by the system as a hint for selecting an appropriate loader.
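The header handling above can be sketched with two helpers: one compares the stored ETag with the value from a HEAD response, the other derives a format hint from Content-Type and falls back to the file extension in Content-Disposition. Both function names are hypothetical, and treating the fallback as applying only when Content-Type is absent is an assumption; the system's actual rule may differ.

```python
import os
from email.message import Message
from typing import Optional


def cache_is_current(stored_etag: Optional[str], head_etag: Optional[str]) -> bool:
    """The cache counts as up to date when the ETag has not changed."""
    return stored_etag is not None and stored_etag == head_etag


def format_hint(content_type: Optional[str],
                content_disposition: Optional[str]) -> Optional[str]:
    """Return a media type, else a lower-cased file extension, else None."""
    if content_type:
        # Drop parameters such as "; charset=utf-8".
        return content_type.split(";", 1)[0].strip().lower()
    if content_disposition:
        # Reuse the standard library's header parameter parsing.
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        filename = msg.get_filename()
        if filename:
            return os.path.splitext(filename)[1].lower() or None
    return None
```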

For more details see the technical API.