Beyond SQL: 8 new languages for data querying

SQL has dominated data querying for decades. Newer query languages offer more elegance, simplicity, and flexibility for modern use cases.

mainframe servers in the cloud

For the last three decades, databases and Structured Query Language (SQL) were almost synonymous. Anyone who wanted to retrieve information from a database had to learn SQL. Anyone who wanted to care for a database or take a job as a database administrator needed to master its nuances.

The language itself is a throwback, an opportunity to think and code as the mainframe users did. While the rest of the world embraced lowercase letters, SQL users continued to type words like SELECT or WHERE. Even today, few seem to care that some Tik Tok users mock them or ask why they are shouting. If ALL CAPS was good enough for punchcard deck jockeys wearing ties and short sleeve shirts, well, it's still good enough for today’s remote worker wearing teddy bear pajamas.

But SQL's hold on data retrieval is slipping. New databases are emerging, and some speak entirely new languages. It’s not that SQL is becoming less popular. If anything, more SQL is being written than ever. It’s just that the world of data storage is exploding even faster, and some of that growth is spurring an urge to experiment and branch out.

This article introduces eight newer approaches to retrieving data. In some cases, the innovations are merely cosmetic. Developers have updated SQL's syntax to make it a bit neater and easier to read, so it's less jarring to shift gears between writing code for the browser and retrieving data. The creators of these tools emphasize that the underlying structure is essentially the same as SQL. It will still be easy to learn. Don’t worry.

Other tools invite us to think in a completely different way. Databases that store their bits in graphs or in a time series offer new paradigms for how programmers specify what they want to find.

Not all of these options will be better than SQL for what you need to do. Not all of them will capture the possibilities you are seeking. But all offer a chance to think differently about that sea of bytes on some server, just waiting for you to find a way to spell out what you need.

GraphQL

The name of GraphQL is a bit confusing because it’s not really a language designed to exploit all of the possibilities from graph databases. It’s more like an elegant shorthand for querying data that’s stored in nested format similar to JSON. The query is just a quick description of what the results should look like. The back end looks at this list of fields, which can come with restrictions on the values, and tries to find the results that match. Where SQL specifies how the database should complete a request, GraphQL users simply provide a list of fields. Some call it “query by example.”

The language is a natural match for some JSON databases, but GraphQL is also growing more popular for searching relational databases with a tabular schema. Smart back ends can translate the nested requests into a pattern of JOINs that fits the schema.

The original language began as an internal project at Facebook, but after it was released as an independent open source project, others started developing GraphQL back ends. There are now versions written in all of the major languages and many modern experimental languages, too.

PRQL

If you naturally think of software as a pipeline or an assembly language, then you might like PRQL, which stands for Pipelined Relational Query Language (pronounced "Prequel"). Queries in this language are structured as a chain of small commands. Taken together, the commands produce a result with just the data you want.

Similar to many modern programming languages, the mental model of a query takes a functional approach. Simple features like variables reduce repetition and simplify the flow. The results from one line are fed into the next line in a long chain. If you want to remove one step, you can often just comment out that line and the rest of the pipeline will still work.

PRQL's code is written in Rust as a transpiler for converting PRQL into SQL. The basic structure is meant to be extensible, so you can add more abstractions to suit your use case. This ease of experimentation ensures that the language will rapidly evolve.

WebAssembly

Many developers think of WebAssembly (abbreviated Wasm) as a tool for creating fast applications that run in web browsers. When Redpanda began building a data streaming tool to supersede Kafka, they wanted to add a mechanism for not only delivering the data but occasionally transforming it along the way. WebAssembly was their choice.

Redpanda acts as a ledger by creating a data stream that’s immutable and ordered. Events are appended and programmers can tap into the stream at any point in the past. Most will start with newly created events, but some may start in the past to create historical aggregations.

WebAssembly, of course, is much more capable and low level than even the stored procedures that are part of some databases. Not all developers want to write bit-banging, byte-level code. But the option opens up the data stream to elaborate transformations that go well beyond what is possible with SQL.

GQL

Graph Query Language, or GQL, is a proposed standard that merges similar declarative languages like Cypher, PGQL, and GSQL. Developers create queries by specifying a particular model for a set of nodes, and then the database is responsible for finding matches. GQL works with more complex property graphs that allow pairs of nodes to share several different connections.

The standard is under active development. Currently, the best implementations are research tools that aren’t intended for long-term deployment.

Gremlin

One of the original languages for searching a graph, Gremlin asks for a set of steps for searching through the connections between nodes. Some call it a “path-based” or “graph traversal” language. Each query is built on steps, and each step can involve either mapping the current node, filtering a list, or somehow tabulating the result.

The language is often just a starting point. Some, for instance, are expanding Gremlin by embedding a Python interpreter inside it so that queries can include Python code. Others are embedding Gremlin inside a standard programming language like Java so that programmers can tap the power of Gremlin from inside that language.

Gremlin was first built for Apache’s TinkerPop project and it’s been adopted by major transactional graph databases like Amazon's Neptune and graph processing frameworks using Apache Spark or Hadoop.

N1QL

Over the years, Couchbase has searched for the best way to query general documents. In the beginning, the query was written as a JavaScript function that was handed to the database for execution. It was a nice, general solution that sometimes took forever to generate a result, but it required programmers to think a bit differently.

N1QL (pronounced “nickel”) is designed to make it easier for SQL natives to work with the JSON objects that might be stored in Couchbase. A basic query has several sections specified by the keywords SELECT, FROM, and WHERE, just like SQL. The details of specifying the path into the data structure from which the data will come is adjusted and adapted to the nested world of JSON objects.

To encourage experimentation, N1QL offers a querying workbench with a visual interface for testing and refining queries. Couchbase also offers a generic full-text search option that runs independently for queries seeking text words instead of structured data.

Malloy

The problem with SQL, according to the creators of Malloy, lies in the syntactic details. Expressing even the simplest query takes time because the language is verbose and full of hidden performance traps. So, they created a modern programming language with natural defaults and simpler syntax that can be compiled to SQL, so no one needs to retrofit a stock database.

The result is a syntax that resembles a more powerful GraphQL. A query is more like a model or vision for the result, including any restrictions, matches, or defaults. Malloy handles some optimization in the background. Smarter JOINs, for instance, can be generated automatically to avoid chasm and fan performance traps. Subqueries can be aggregated to save time. Indices are also added as needed. As a result, writing queries feels more like writing modern code, with punctuation serving to keep the structure succinct.

Malloy's open source core is built out in TypeScript for including in Node.js code. A VS Code plugin simplifies development.

Basis

Most query languages are tied directly to a particular database. Basis is building more of a pipeline that can draw from a variety of sources before filtering them with a mixture of SQL and Python. At the end of the pipeline are exporters that deliver the data to a variety of standard options, which range from running code to AI algorithms to charts and dashboards.

Developers are already building pipelines like this in their own code, and many projects depend on similar structures. Basis offers a prebuilt option that can be customized in more elaborate ways. Inputs range from standard database queries to API taps to custom Python code. The transformers aren’t limited to SQL's basic WHERE clauses because you can write Python code that does more than filter as the data flows down the pipeline.

Basis is just one example of newer data pipeline tools that are opening up the querying process to draw from more than one source, filter with more than one language, and deliver the data in more than one form. It’s a vision of being able to take data from almost any source and deliver it to almost any consumer.

Copyright © 2022 IDG Communications, Inc.