What's wrong with SPARQL?
Thank you for posting this and opening this discussion. I'm in favor of a unified standard that combines the most useful/desired features from each language.
Having stumbled onto this page (indeed, site) via a link from the openCypher Google group, I was immediately curious about when it was posted (helps put the doc into context), and had to infer the answer from the age of the comments here.
Could you please add a date (including year) to the "manifesto" and other documents you've posted here? Thanks!
How about creating a standard where binary graph operators are included, and roll up and drill down operations are supported?
I'm in favor of creating a standardized common language for querying graph databases, as SQL is for querying RDBMSs.
Being able to reuse the same logic, technology, and skills across various vendors' solutions is a huge advantage. What made SQL so popular and widespread was its ability to run on every RDBMS that is ANSI SQL compliant. Nowadays, graph databases, and more generally all new IT solutions related to Big Data technologies, suffer from a lack of standards and reusability, which makes it very difficult to:
- decide which technology is suitable for one's needs
- determine if it is sustainable or not, and if it will die in the next 2 years
- find skilled people on the chosen technology
- promote career growth and the ability to move between different IT platforms while reusing at least some of one's skills
On the other hand, we all know that what may lead one to choose one platform / vendor over others is the vendor-specific features that make one platform / technology more accomplished than its competitors. However, I don't believe that standardization is a brake on this. Again, the SQL world is full of successful examples. The most popular RDBMSs, like ORACLE, TERADATA, SQL Server, DB2 / UDB, etc., are all ANSI SQL compliant and also provide vendor-specific extensions and features that enable rapid adoption of the platform and let users benefit from vendor-specific advanced features (e.g. the power of MPP on TERADATA databases). By the way, it often happens that vendor-specific features inspire new extensions to the standards.
I am a heavy user of both Cypher and GraphQL. Every time I write Cypher, I wish I could write GraphQL syntax and unite those two worlds.
I like the philosophy of Cypher too. But I wonder: isn't it too early to create a GQL standard? There are similar cases known from IT history, though: e.g. HTML and CSS, SQL and the Hibernate and JPA query languages, modules in Java.
On the other hand: maybe different languages will go very different ways. Maybe it would then be too late to create a standard. Maybe different languages will show us some interesting concepts.
So these are questions for specialists who use graphs every day.
An interesting competitor in this space is Dgraph's GraphQL+-, which extends GraphQL to make it suitable for actually working with graphs. I am not part of their team, but I would love to see you coordinate with Dgraph on a shared language.
Thanks for calling out this issue to the public.
Having been in the graph space for many years, I have some first-hand experience that I want to share.
In my opinion, pattern matching is only a subset of the graph query workload. A standard query language should be more expressive and address more of that workload. Otherwise, it is merely complementary to the relational model, which severely limits the cases where it applies.
Thanks for the continuing comments. Here are a few points in response, starting with your latest one, Sam.
The “three languages” are all declarative query languages, with well-bounded “give me/do what I want” statements using MATCH clauses (or MERGE or DELETE in Cypher, which has update semantics). The sets of records that are created by MATCH drive subsequent clauses. This means that there is no explicit flow control, and there are query units of work with general characteristics that can be optimized by an implementation. Gremlin has a traversal API which proceeds by “steps” which are akin to functions with additional flow control features: it is a largely imperative language. The Cypher for Gremlin project shows how you can layer declarative Cypher over Gremlin. I view GQL and Gremlin as friendly alternatives, because they are very different styles of language. It’s worth noting that the number one user-requested feature for CosmosDB Graph API is Cypher support, if you check Azure feedback. It’s just plain easier to program many standard queries in a language that operates at the SQL level. This topic might make a good additional page on the site. The Neueda Labs team who develop Cypher for Gremlin might consider writing more extensively on this topic.
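To make the declarative/imperative contrast concrete, here is a rough Python sketch. It is not Cypher, Gremlin, or any real engine: the toy graph, the `match` function and the `out` step are all made up for illustration, and the only assumption about the real languages is the general shape described above (pattern in, records out, versus a chain of steps).

```python
# Toy graph as (source, relationship type, target) triples. All data is invented.
EDGES = [
    ("alice", "KNOWS", "bob"),
    ("bob", "KNOWS", "carol"),
    ("alice", "WORKS_AT", "acme"),
    ("carol", "WORKS_AT", "acme"),
]


def match(pattern, edges=EDGES):
    """Declarative style (hypothetical helper): the caller states a pattern of
    (variable, relationship type, variable) triples and gets back records of
    variable bindings. How the search runs is left to this function.
    Assumes the two variables in each triple are distinct."""
    bindings = [{}]
    for src_var, rel_type, dst_var in pattern:
        extended = []
        for row in bindings:
            for src, rel, dst in edges:
                if rel != rel_type:
                    continue
                # A variable that is already bound must agree with this edge.
                if row.get(src_var, src) != src or row.get(dst_var, dst) != dst:
                    continue
                extended.append({**row, src_var: src, dst_var: dst})
        bindings = extended
    return bindings


def out(nodes, rel_type, edges=EDGES):
    """Imperative style (hypothetical helper): one explicit traversal step that
    follows outgoing edges of the given type from each current node."""
    return [dst for node in nodes for src, rel, dst in edges
            if src == node and rel == rel_type]


# Declarative: "find a, b, c such that a KNOWS b and b KNOWS c".
declarative = match([("a", "KNOWS", "b"), ("b", "KNOWS", "c")])

# Imperative: start at alice, take a KNOWS step, then another KNOWS step.
friends_of_friends = out(out(["alice"], "KNOWS"), "KNOWS")
```

In the first call the caller only describes the shape of the result, so an implementation is free to reorder or optimize the search; in the second the caller fixes the order of the steps.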
The GQL idea is based on the remarkable similarity of the three languages, at the heart of which is ASCII Art path pattern matching and patterns for e.g. MERGE or CREATE. There is no way a fusion of the three languages would lose this powerful, popular characteristic, in my view. It was adopted by GraphFrames in Spark, and is the basis of the MATCH clause for proposed read-only extensions to SQL. But even smarter ASCII Art, like in PGQL — yes!
Why not just extend SQL? The first point is: graph databases like Amazon Neptune or CosmosDB or Neo4j don’t have and don’t need a relationship to SQL. Graph data processing can be a “thing in itself”.
Having been involved for the past year in SQL property graph extensions work in ISO or its U.S. national equivalent, it’s pretty clear that SQL will quite naturally incorporate pattern matching as a way of producing a table, and that with some effort you can map tables and views into a graph object in the catalog.
The jump to a composable query language modelled around graphs and their elements like paths, edges and nodes would require quite deep changes to SQL’s type system and would produce a complex dual data model and dual concepts of tabular and graph sub-query chains. That feels very forced, taking SQL from being a very complex language built around a core relational model to becoming a super-complex language built around two different core models. Which means that very few vendors would implement “all of SQL containing all of GQL”.
Complexity and backward compatibility with SQL would also play out in slowing down GQL for graph data users. Roughly speaking, I reckon we can get to full GQL as an official standard, with good will and collaboration, at the same time as, or very soon after, limited read-only property graph extensions in SQL:2020.
Which relates to the fact that SQL has so many vendor variants. Is it reasonable to call SQL a standard? If we create a specific, native graph query language then we increase the chances of a true standard, perhaps broken into profiles (like read-only, or read-create for immutable data, or full CRUD for OLTP, to avoid putting a vast implementation burden on projects and products that use GQL).
I wouldn’t go too far in criticising SQL, mind you. We in the property graph data world are not finding it easy to turn three dialects into one standard written language (to use an analogy). Hence the GQL Manifesto.
SQL has a core going back to SQL:99, perhaps, which is the basis for huge advantages in terms of common skills, and considerable portability (and I know, I’ve been there on how hard it is to switch databases!). What we don’t need is the equivalent of SQL vs QUEL,
Alastair, I read your Open letter to database industry a few days ago. Any reason Gremlin was not listed as one of the main graph database query languages in that open letter? It seems to be quite popular among major cloud vendors. I see that you already have a comparison on Cypher, PGQL and G-Core, it would be very helpful to see Gremlin added to that comparison. Thanks.
we are all one. I love graph, we should be united under the same language.
I was just thinking that the many variants of SQL might be a good example of why it's good to keep the standard to a minimum. I think it's more important to get the basic abstractions right, than the exact syntax.
Also, I think query expressions are here to stay (think LINQ, Spark SQL, Flink Table API), which also emphasizes the need for strong abstractions. And also, that the read part is more important to standardize than the write part.
Good effort! Hope it works out!
A standard GQL is good, but it should be made very easy, and no one should hijack the process. Using the Cypher style, where the syntax looks like you are actually drawing a directed graph, would be nice. Thanks
Coming from a SQL background, Cypher felt like a natural transition. I've heard SQL described as the narrow waist. NoSQL has always been Not Only Relational rather than Not Only SQL, but it's taken the community a long time to recognise the value of a well-thought-out abstraction language.
The question is whether a graph query language should be a thing in its own right or an extension to the SQL standard.
Thanks for all the comments (and all the votes).
I just wanted to point out that the Cypher community has been discussing and working on designs for composable graph queries since the first openCypher Implementers Meeting in February 2017 (see http://www.opencypher.org/event/2017/02/08/event-ocim1/), where there were discussions on improvements including "Named graphs, multiple graphs and views; Improved schema and constraints; Stronger typing; Conjunctive regular path queries (CRPQs)". All the slides of the talks are at that link.
Two Cypher language designers, Stefan Plantikow and Tobias Lindaaker, co-authored the G-CORE paper as part of the LDBC co-operative effort on graph query languages, which codifies this and other important advances.
https://github.com/opencypher/cypher-for-apache-spark is an OSS project that implements composable graph queries and multiple graphs, using provisional new Cypher syntax, and this feature is going to be discussed again at next week's Copenhagen face-to-face openCypher meeting (open to all): http://www.opencypher.org/event/2018/05/22/ocim4/.
It's this kind of convergence of proposed and evolving features across G-CORE, Cypher and PGQL that led to this proposal.
I like ASCII art pattern matching and many ideas of Cypher. But queries are not composable as in SQL. So Cypher is not the last word. I had never read about G-CORE before, but after reading this text, I found https://arxiv.org/pdf/1712.01550 and it seems quite interesting. I voted yes, because it's always good if vendors develop standards. On the other hand, this is not a vote for a graph language monoculture.
Cypher should be the standard, but having a common Graph Query Language would be very beneficial. I hope Oracle and other big SQL companies don't try to impose their position in an attempt to patch and maintain their products. They have had their time, and now it's time to explore other horizons.
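As a rough illustration of what "composable" could mean here, the Python sketch below treats a query as a function that takes a graph and returns a graph, so the output of one query can feed the next. This is only a sketch of the general idea, not G-CORE or Cypher syntax; the `Graph` class, both query functions and the sample data are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Graph:
    """Minimal immutable graph (hypothetical): node names plus
    (source, relationship type, target) edges."""
    nodes: frozenset
    edges: frozenset


def subgraph_by_rel(graph, rel_type):
    """Query 1: keep only edges of one relationship type and their endpoints."""
    edges = frozenset(e for e in graph.edges if e[1] == rel_type)
    nodes = frozenset(n for s, _, d in edges for n in (s, d))
    return Graph(nodes, edges)


def neighbourhood(graph, start):
    """Query 2: keep only edges that touch `start`, and their endpoints."""
    edges = frozenset(e for e in graph.edges if start in (e[0], e[2]))
    nodes = frozenset(n for s, _, d in edges for n in (s, d))
    return Graph(nodes, edges)


social = Graph(
    nodes=frozenset({"alice", "bob", "carol", "acme"}),
    edges=frozenset({
        ("alice", "KNOWS", "bob"),
        ("bob", "KNOWS", "carol"),
        ("alice", "WORKS_AT", "acme"),
    }),
)

# Because both queries are Graph -> Graph, they nest like SQL subqueries do.
result = neighbourhood(subgraph_by_rel(social, "KNOWS"), "bob")
```

A query that returned only a table of rows could not be passed to `neighbourhood` again, which is roughly the limitation being pointed at.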
As said by Christian DiMare, calling SQL a standard language is a good way to try to forget all SQL variants.
Actually, I like Cypher, a lot more than other graph db languages.
I hope that a GQL will not replicate the mistakes made with PL/SQL, and will instead give us a simple and practical language that lets us process billions of nodes very, very fast, providing all the elements needed to do it.
GQL may offer a standardized way to express widespread if not necessarily universal concepts, a single way to expressively and concisely perform well-defined operations, and common definitions for graph traversing operators. It is up to those working on that language to avoid the pitfalls that have plagued ISO SQL.
If the language designers produce a true advance in graph languages, the marketplace will support it. If not, it will likely fail. Stand or fall on its own, the effort to make this language work strikes me as well worth the cost.
In general, having a common Graph Query Language would be very beneficial.
However, I do see a challenge to include the more flexible modeling paradigms databases like ArangoDB or OrientDB provide, e.g. very deeply nested JSON Documents, direct 'Links' or ID references between Documents/Vertices without using an explicit Edge etc.
SQL itself is solid, though a lot of dialects and DB specific extensions are sprouting up, reacting to the newly added capabilities like JSON query & modification.
Hence having a common core query language with well defined extension points/modules to allow emerging capabilities to be added, e.g. timeseries/temporal data handling seems to me to be better than having one 'catch-all' standard.
I love Cypher. Cypher should be the standard, but a serious effort should be made to provide more than just a language for querying the graph. For me, a more powerful language is one that, besides querying graph information, gives me powerful elements for processing graphs with high performance. To clarify, I am not asking for a kind of PL/SQL, no. We should not make the same mistake again. I am asking for elements in the language that perform the most common and hardest data processing tasks with very good performance. For example, I would like to process millions of nodes very fast, without having to earn a Ph.D. in performance because the database does not provide the elements to do it.
As much as a standard query language sounds practical, it's severely limiting. Trying to call SQL a standard language is disingenuous: there are many variants of SQL with different nuances, and calling for a GQL standard is limiting to new technologies that MAY do things better because they do things differently. It's too early in graph-prioritized storage to say what GQL should be like; however, I like the paradigms in Cypher. Further, while this "standard" MAY be solid for current graph technologies, a rise in hyper-graph-prioritized storage solutions might invalidate any strides toward standardization.