Apache TinkerPop™

TinkerPop-Enabled Providers


Apache TinkerPop grows when 3rd party data systems and query languages utilize it. While Apache's distribution of TinkerPop does provide production ready implementations and tools, it is ultimately the larger ecosystem of providers that ensure TinkerPop's widespread adoption. There are two types of providers. The first are those that develop a (graph) database, (graph) processor, or (graph) analytics tool and want to offer their users TinkerPop-specific graph computing features. The other type of provider are language designers that have a (graph) query language they wish to have execute on the Gremlin traversal machine and thus, against any TinkerPop-enabled graph system.

Data System Providers

When a data system is TinkerPop-enabled, its users are able to model their domain as a graph and analyze that graph using the Gremlin graph traversal language. Furthermore, all TinkerPop-enabled graph systems integrate with one another allowing providers to easily expand their system's offerings as well as allowing users to choose appropriate graph technology for their application. Sometimes an application is best served by an in-memory, transactional graph database. Sometimes a multi-machine distributed graph database will do the job. Or perhaps the application requires both a distributed graph database for real-time queries and, in parallel, a Big(Graph)Data processor for batch analytics. Whatever the application's requirements, there exists a TinkerPop-enabled graph system out there to meet its needs.

At an abstract level, every data system is ultimately a "graph system" because every data system supports the representation of "things" (vertices) and their respective relationships to one another (edges). For instance, a relational database may have a people-table, where the rows denote person vertices and the columns denote properties of those individuals (e.g. their name and age). Moreover, that relational database may also have a knows-table, where two columns reference primary keys in the people-table and the remaining columns denote properties of those relationships (e.g. creation timestamp, strength of friendship, etc.). TinkerPop enables any data system to expose its implicit graph structure within the lexicon of vertices and edges. From there, what differentiates each TinkerPop-enabled system are the various time/space-tradeoffs that were made for representing their "graph" in-memory, on-disk, and across a multi-machine compute cluster. TinkerPop users have the luxury of choosing a graph system based on those design choices that are important for their application. Moreover, they only need to concern themselves with learning the Gremlin traversal language as every TinkerPop-enabled graph system supports Gremlin.

It is (relatively) easy for a data system to become TinkerPop-enabled. There are two interface packages that need to be implemented with one of them being optional. Once implemented the Gremlin traversal machine can talk to the provider's system and thus, so can any of their users' Gremlin traversals as well as any existing tools/technologies that have been designed to interact with a TinkerPop-enabled system.

  1. The Graph (required): These foundational interfaces define the semantics of the operations on a graph, vertex, edge, and property. Once implemented, the provider's data system can immediately be queried using Gremlin OLTP. However, providers may want to leverage a collection of provider-specific compiler optimizations called traversal strategies which can leverage their system's unique features (e.g. global indices, vertex-centric indices, sort orders, sequential scanners, push-down predicates, etc.).

  2. The GraphComputer (optional): All OLAP-based graph processors must implement the primary graph interfaces mentioned above as well as a set of parallel-processing message-passing interfaces. However, it is possible for a data system to only implement the primary graph interfaces and still provide OLAP support to their users by integrating their system with an existing GraphComputer implementation such as, for example, SparkGraphComputer (provided in Apache's distribution of TinkerPop).

Finally, there are other tools and technologies that the provider can leverage from TinkerPop such as Gremlin Server, Gremlin Console, and the like. The purpose of Apache TinkerPop is to make it easy for providers to add graph functionality to their system and/or to build a graph system from scratch and immediately have a query language, server infrastructure, metrics/reporting integration, cluster-based analytics and more.

Information on how to build implementations of the various interfaces that TinkerPop supports can be found in the Provider Documentation.

TinkerPop-Enabled Graph Systems


Apache TinkerPop is always looking to point users to graph systems that are TinkerPop-enabled. Please read Apache TinkerPop's provider listing policy for adding new projects to the listing below. The listing is intended to help users identify TinkerPop-enabled graph systems and does not constitute an endorsement by Apache TinkerPop nor the Apache Software Foundation.

Amazon Neptune is a fast, reliable, fully-managed graph database service that makes it easy to build and run applications that work with highly connected datasets.
Azure Cosmos DB™ is Microsoft's globally distributed, multi-model database service for mission-critical applications. Azure Cosmos DB provides turn-key global distribution, elastic scaling of throughput and storage worldwide, five well-defined consistency levels, and guaranteed high availability, all backed by industry-leading SLAs. It is multi-model with support for the Gremlin graph traversal language.

DataStax Enterprise Graph™, part of DataStax Enterprise's multi-model platform, is a real-time graph database built for cloud applications that need to manage complex and highly connected data. Built on the foundation of Apache Cassandra and Apache TinkerPop, DataStax Enterprise Graph delivers continuous uptime along with predictable performance and scale, while remaining operationally simple to manage.
HugeGraph is an Apache2 licensed high-speed, distributed and scalable OLTP and OLAP graph database, fully optimized to store hundreds of billions vertices/edges and analyze complex relationships between high-connected data. It is modeled as property graph and compatible with Apache TinkerPop and Gremlin. Due to high efficiency, availability and scalability, HugeGraph attracts a large amount of users and has been widely used in social network analysis, fraud detection and knowledge graph.

GRAKN.AI™ is a distributed knowledge graph that brings knowledge ontologies and transactional data together to enable intelligent querying of data. Querying is performed through the language: Graql, a declarative, knowledge-oriented graph query language for retrieving explicitly stored and implicitly derived information, as well as to perform graph analytics and automated reasoning.
IBM® Compose for JanusGraph provides a fully-managed, highly-available, and production-ready JanusGraph on AWS, GCP or IBM Cloud. Deployed in minutes, every JanusGraph deployment on Compose is built with highly available storage and graph engines. The JanusGraph Storage engine is a cluster of the Scylla database. As usage increases or application requirements change, users can vertically or horizontally scale the JanusGraph Engine and Storage to increase throughput or storage.

JanusGraph® is an Apache2 licensed scalable, distributed graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster. JanusGraph is a transactional database that can support thousands of concurrent users executing complex Gremlin traversals in real time. JanusGraph also provides an in-memory, compression-based OLAP processor as well as integrates with Apache TinkerPop's Spark OLAP processors.
KeyLines™ is an Apache TinkerPop and Gremlin compatible JavaScript SDK for quickly and easily building powerful, custom and scalable graph visualization applications. The KeyLines SDK offers a rich library of functionality to help you visualize and explore the data in your graph database, including graph layouts, social network analysis measures, filtering, temporal graph visualization and geospatial graph analysis. It allows the visualization of complex graph data at scale.

Linkurious™ is a browser-based graph visualization software to search, explore and visualize connected data. It is compatible with Apache TinkerPop and thus, any TinkerPop-enabled graph system. Linkurious provides enterprise-ready security (authentication, access rights, audit) and flexibility (API, linkurious.js JS graph visualization library) to help software architects successfully deploy graph capabilities within their organizations.
Neo4j™ is the most widely used open source, transactional graph database with a large active user and customer community. Because of its scalability and ease of use, Neo4j is applied in a wide variety of use cases from fraud detection, access control to recommendation and investigative journalism. Along with the openCypher graph query language, Neo4j also supports Apache TinkerPop and currently serves as its OLTP reference implementation.

OrientDB™ is an open source distributed graph database with native support for Apache TinkerPop and the Gremlin graph traversal language. OrientDB handles relationships by using persistent pointers, rather than expensive join runtime operations. This guarantees a fast, constant O(1) time for traversing, no matter the database size. Furthermore, OrientDB is not only a graph database, but a multi-model database able to manage documents, keys/values, objects, full-text and spatial data.
Stardog™ is a graph database optimized for enterprise data unification. It supports both semantic graphs, via RDF, SPARQL, and OWL, as well as property graphs via Apache TinkerPop and Gremlin--it's the only graph database that supports both models over the same database, simultaneously. Stardog also supports hybrid data unification architectures, seamlessly blending data warehouse, system of record, and virtual query strategies. Stardog is suited for enterprise data silo challenges.

Tom Sawyer Perspectives™ is advanced graphics-based software for building enterprise-class data relationship visualization and analysis applications. It is a complete Software Development Kit (SDK) with a graphics-based design and preview environment. Tom Sawyer Perspectives combines visualization, layout, and analysis technology with an elegant platform architecture. Tom Sawyer Perspectives enables interaction with graph database systems via Apache TinkerPop.
 

Query Language Providers


With the growth of NoSQL, for which graph databases are a subclass, many new database query languages have been developed. SQL has always been the industry standard, but now there exists others such as CQL, Datalog, and XQuery. Even in the graph database space there is SPARQL, Cypher, GraphQL, and of course, Gremlin. Much like the Java virtual machine is a host to multiple programming languages including Java, Scala, Groovy, Clojure, JavaScript, etc., the Gremlin traversal machine is a host to multiple query languages. The Gremlin traversal machine's instruction set ensures Turing Completeness and as such, any query language can compile to execute on the Gremlin traversal machine. There are three types of language designers. Below, each type will demonstrate the "same traversal" expressed in different languages. Ultimately, they all compile to the Gremlin traversal below which computes the average rating for the projects created by Gremlin's friends.

g.V().has("person","name","gremlin").
  out("knows").out("created").
  hasLabel("project").
  values("stars").mean()


Distinct query language: Query languages such as SQL and SPARQL are significantly different from Gremlin in that they require a special purpose compiler in order to generate a traversal. For this reason, SQL and SPARQL Gremlin compilers currently exist. Note that, within reason, Gremlin compilers do not need to concern themselves with an optimal compilation (only a semantically correct compilation) as the Gremlin traversal machine will leverage traversal strategies for both compile-time and runtime optimizations. Moreover, the language designer can rest assured that queries in their language will be able to evaluate as either an OLTP or OLAP query. The example on the right is a SPARQL query that determines the average rating for Gremlin's friends' projects. It is because of the Gremlin traversal machine, that that SPARQL query can immediately execute over Spark (for instance).
SELECT AVG(?stars) WHERE {
  ?a v:label person .
  ?a v:name "gremlin" .
  ?a e:knows ?b .
  ?b e:created ?c .
  ?c v:label "project" .
  ?c v:stars ?stars
}


Gremlin language variants: There are various Gremlin language variants such as Gremlin-Python. These languages generate Gremlin bytecode utilizing the same naming/style-conventions as Gremlin-Java. However, where appropriate, they can deviate from convention in order to take advantage of the unique expressive qualities of the host language. For instance, Gremlin-Python supports index slices, attribute access, etc.
g.V().has('person','name','gremlin').
  out('knows').out('created').
  hasLabel('project').stars.mean()


Domain specific languages: A users's application logic typically represents its domain in terms of real-world objects, not "vertices" and "edges." While most application developers will become comfortable writing their traversals in the vertex/edge-lexicon of Gremlin, it may be desirable to create a domain specific language for, perhaps, business users. To do so is simple. The language designer extends Traversal and interacts with an underlying GraphTraversal exposing "high-level" steps that may ultimately be composed of a complex sequence of "low-level" vertex/edge-steps. A major boon is that the DSL designer does not have to worry about compiler optimization as the "low-level" GraphTraversal representation will ultimately be subjected to traversal strategies prior to OLTP or OLAP evaluation. The example on the right is a hypothetical SocialDSL that allows users to query their graph from the perspective of people, projects, etc. and is thus bound to that graph's particular social data schema.


SELECT(Average.of(Stars)).
FROM(Projects)
WHERE(Created.by(Friends.of("gremlin")))

TinkerPop-Enabled Query Languages


Apache TinkerPop is always looking to point users to TinkerPop-enabled graph query languages. Please read Apache TinkerPop's provider listing policy for adding new projects to the listing below. The listing is intended to help users identify TinkerPop-enabled compilers and languages and does not constitute an endorsement by Apache TinkerPop nor the Apache Software Foundation.

Gremlin-Groovy represents Gremlin inside the Groovy language and can be leveraged by any JVM-based project either through gmaven or its JSR-223 ScriptEngine implementation. It also serves as the Gremlin Console language.
Gremlin-Java represents Gremlin inside the Java language. Gremlin-Java is considered the canonical, reference implementation of Gremlin and is the primary compiler for all lambda-free bytecode due to its speed relative to other script-based, JVM variants.

Gremlin-Python represents Gremlin inside the Python language and can be used by any Python virtual machine such as CPython. Gremlin-Python traversals translate to Gremlin bytecode for RemoteConnection execution (e.g. Gremlin Server).
Gremlin.Net represents Gremlin inside the C# language and can be used by any .NET-based project. Gremlin.Net traversals translate to Gremlin bytecode for RemoteConnection execution (e.g. Gremlin Server).

Gremlin-Scala is a Gremlin language variant that uses standard Scala functions, provides a convenient DSL to create vertices and edges, ensures type safe traversals, and incurrs minimal runtime overhead by only allocating instances if absolutely necessary.
Ogre is a Gremlin language variant for Clojure. It provides an API that enhances the expressivity of Gremlin within Clojure, it doesn't introduce any significant amount of performance overhead, and it can work with any TinkerPop-enabled graph database or analytic system.

SPARQL-Gremlin is a compiler used to transform SPARQL queries into Gremlin bytecode. It is based on the Apache Jena SPARQL processor ARQ, which provides access to a syntax tree of a SPARQL query.
SQL-Gremlin compiles ANSI SQL to Gremlin bytecode and is useful for connecting JDBC reporting/business intelligence tools to any TinkerPop-enabled graph system.


Related Resources