Why do we need an API in the first place?
The most obvious reason is that BigQuery can't provide sub-second query response times, so an application that talks to it directly will deliver a suboptimal user experience. Also, BigQuery bills you for the amount of data your queries process, so if your app is popular, you might suddenly find that out from a billing alert.
Furthermore, direct interaction with BigQuery means that you'll need to write SQL queries. There's nothing wrong with SQL; it's a great domain-specific language. But having SQL queries all over your codebase smells like a leaky abstraction: your application layers will know about the column names and data types in your database.
So, what are we going to do? In this tutorial, we'll use Cube.js.
Cube.js is an open-source analytical API platform that allows you to create an API over any database, BigQuery included.
Cube.js provides an abstraction called a "semantic layer," or a "data schema," which encapsulates database-specific things, generates SQL queries for you, and lets you use high-level, domain-specific identifiers to work with data.
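To make the idea of a data schema concrete, here is a minimal sketch of what one might look like. The cube name `Orders`, the table, and the column names are hypothetical; the point is that application code refers to high-level identifiers like `Orders.count` rather than raw SQL:

```javascript
// A sketch of a Cube.js data schema (names are hypothetical).
// It maps database columns to high-level measures and dimensions,
// so application code never deals with raw SQL.
cube('Orders', {
  sql: `SELECT * FROM my_dataset.orders`,

  measures: {
    // Number of rows matching the query
    count: {
      type: 'count',
    },
  },

  dimensions: {
    status: {
      sql: `status`,
      type: 'string',
    },
    createdAt: {
      sql: `created_at`,
      type: 'time',
    },
  },
});
```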
Also, Cube.js has a built-in caching layer that provides predictable, low-latency query response times. This makes an API built with Cube.js a perfect middleware between your database and your analytical app.
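To illustrate what "an API between your database and your app" means in practice, here is a sketch of a query sent to the Cube.js REST API. It assumes a hypothetical cube named `Orders` in your data schema and the dev server running on its default port 4000; queries are JSON objects passed URL-encoded to the `/cubejs-api/v1/load` endpoint:

```javascript
// A hypothetical Cube.js query: monthly order counts.
// `Orders.count` and `Orders.createdAt` must be defined
// in your data schema for this to work.
const query = {
  measures: ['Orders.count'],
  timeDimensions: [{
    dimension: 'Orders.createdAt',
    granularity: 'month',
  }],
};

// The query travels as a URL-encoded JSON string to the
// /cubejs-api/v1/load REST endpoint.
const url =
  'http://localhost:4000/cubejs-api/v1/load?query=' +
  encodeURIComponent(JSON.stringify(query));

console.log(url);
```

Note that nothing here mentions BigQuery, SQL, or column names: the data schema encapsulates all of that.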
Let's try it in action. The first step is to create a new Cube.js project with the Cube.js CLI:

```shell
npx cubejs-cli create bigquery-public-datasets -d bigquery
```
Now you have your new Cube.js project in the `bigquery-public-datasets` folder containing a few files. Let's navigate to this folder.
The second step is to add BigQuery and Google Cloud credentials to the `.env` file. Cube.js will pick up its configuration options from this file. Make sure your `.env` file looks like this:
```
# Cube.js environment variables: https://cube.dev/docs/reference/environment-variables
CUBEJS_DB_TYPE=bigquery
CUBEJS_DB_BQ_PROJECT_ID=your-project-id
CUBEJS_DB_BQ_KEY_FILE=./your-key-file-name.json
CUBEJS_DEV_MODE=true
CUBEJS_API_SECRET=SECRET
```
Here's what these options mean and how to fill them in:
- `CUBEJS_DB_TYPE` says we'll be connecting to BigQuery.
- `CUBEJS_DB_BQ_PROJECT_ID` should be set to the identifier of your project in Google Cloud. First, go to the web console of Google Cloud. Create an account if you don't have one. Then go to the new project creation page and create one. Your project identifier is just below the name text field:
- `CUBEJS_DB_BQ_KEY_FILE` should be set to the name of the key file for the Google Cloud user that will connect to BigQuery. It's better to use a service account, a special kind of Google Cloud account with restricted access. Go to the service account creation page and create one. In the second step, you'll be asked to specify the roles for this service account. The only roles needed for read-only access to public datasets are `BigQuery Data Viewer` and `BigQuery Job User`. After the user is created, you need to add a new authentication key: use the `...` button on the right to manage the keys for this account and add a new one of JSON type. The key file will be automatically downloaded to your machine. Please put it in the `bigquery-public-datasets` folder and update your `.env` file with its name.
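Before starting the server, it can be handy to verify that none of the required variables are missing. Here's a minimal sketch of such a check (the list of variables matches the `.env` file above; the helper name `missingVars` is made up for illustration):

```javascript
// Required Cube.js environment variables for a BigQuery connection,
// matching the .env file shown above.
const required = [
  'CUBEJS_DB_TYPE',
  'CUBEJS_DB_BQ_PROJECT_ID',
  'CUBEJS_DB_BQ_KEY_FILE',
  'CUBEJS_API_SECRET',
];

// Returns the names of required variables absent from `env`.
function missingVars(env) {
  return required.filter((name) => !env[name]);
}

// With a real configuration you'd pass process.env;
// here we check a hypothetical, incomplete config:
console.log(missingVars({ CUBEJS_DB_TYPE: 'bigquery' }));
```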
The third step is to start Cube.js. Run in your console:
```shell
npm run dev
```
And that's it! Here's what you should see:
Great, the API is up and running. Let's describe our data! 🦠