Posts

Showing posts from August, 2021

Using NPM Library in Google BigQuery UDF

  JavaScript UDFs are cool, and using them with an NPM library opens up a whole new world to explore!

Background: One of the main reasons to build an ETL pipeline was to perform data transformations before loading data into the data warehouse. We did that because data warehouses were not capable of handling these transformations, for reasons such as performance and flexibility. In the era of modern data warehouses like Google BigQuery or Snowflake, things have changed. These data warehouses can process terabytes and petabytes of data within seconds or minutes. Given this much improvement, performing data transformation within the data warehouse now makes more sense, and common transformation logic can be packaged as UDFs (user-defined functions). In this blog, we will see how we can utilize the power of JavaScript UDFs and an NPM library to generate data in BigQuery.

What is a UDF? From the Google Cloud documentation: A user-defined function (UDF) lets you create a function by using…
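As a minimal sketch of the idea, here is the canonical JavaScript UDF shape in BigQuery; the function name and logic are illustrative, and the NPM-bundling step the full post covers is only hinted at in the comment:

```sql
-- A temporary JavaScript UDF: the body between the triple quotes is plain JS.
CREATE TEMP FUNCTION multiplyInputs(x FLOAT64, y FLOAT64)
RETURNS FLOAT64
LANGUAGE js AS r"""
  return x * y;
""";

-- A bundled NPM library could be attached to the function via
-- OPTIONS (library = ["gs://your-bucket/your-bundle.js"]) -- hypothetical path.
SELECT multiplyInputs(3, 4) AS product;  -- 12
```

The JS body runs inside BigQuery's JavaScript engine, which is why a browser-style bundle of an NPM package can be loaded alongside it.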

Serverless API Data Ingestion in Google BigQuery: Part 1 (Introduction)

  Ingesting API data in Google BigQuery the serverless way!

API to Google BigQuery: In the era of cloud computing, "serverless" has become a buzzword that we keep hearing about, and eventually we get convinced that serverless is the way to go for companies of all sizes because of its advantages. The basic advantages of the serverless approach are:

- No server management
- Scalability
- Pay-as-you-go pricing

In this article, we will explore how we can use the serverless approach to build our data ingestion pipeline in Google Cloud.

Serverless Offerings in GCP: GCP offers plenty of serverless services in various areas, such as:

- Computing: Cloud Run, Cloud Functions, App Engine
- Data warehouse: Google BigQuery
- Object storage: Google Cloud Storage
- Workflow management: Cloud Workflows
- Scheduler: Cloud Scheduler

Technically, a combination of the above tools is enough to build API data ingestion in GCP. We can build two patterns to ingest API data in Google BigQuery. Business Requir…
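A minimal sketch of the real-time ingestion idea in Python. The endpoint URL, record shape, and table names are all hypothetical, and the actual BigQuery call needs the `google-cloud-bigquery` package plus GCP credentials, so it is kept behind a lazily-imported helper:

```python
import json
import urllib.request


def fetch_records(url):
    """Fetch a JSON array of records from an API endpoint (hypothetical URL)."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read())


def to_bq_rows(records):
    """Shape raw API records into rows matching a (hypothetical) BigQuery schema."""
    return [{"id": r["id"], "name": r.get("name", "")} for r in records]


def stream_to_bigquery(project, dataset, table, rows):
    """Stream rows into BigQuery; requires google-cloud-bigquery and auth."""
    from google.cloud import bigquery  # imported lazily so helpers test locally

    client = bigquery.Client(project=project)
    errors = client.insert_rows_json(f"{project}.{dataset}.{table}", rows)
    if errors:
        raise RuntimeError(f"Streaming insert failed: {errors}")
```

The fetch/shape/insert split mirrors the pattern the post describes: the compute layer (Cloud Run or a Cloud Function) pulls from the API and BigQuery's streaming insert API lands the rows in real time.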

Serverless API Data Ingestion in Google BigQuery: Part 2 (Show Me The Code!)

Ingesting API data in Google BigQuery the serverless way!

Introduction: A few days ago I wrote the first part of this blog, where I talked about how we can design serverless ingestion of API data as either a streaming or a batch pipeline. In this blog, we will see the code and configuration for the entire pipeline. Let's get started.

Design: In the previous blog, we discussed pattern 1, where we ingest data from an API and insert it into BigQuery in real time. We will use this pattern to show the code and configuration.

Cloud Workflow YAML Configuration: Cloud Workflows is a serverless offering from GCP that lets us design a workflow to execute our multi-step pipeline. If a step fails, we can stop or take the necessary actions to alert system operators about the failure. In our case, we initialize some variables that we will use later in the configuration, such as project, datasetId, tableId, region, and the SQL query. As per our design,…
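A minimal sketch of what such a Cloud Workflows YAML definition could look like; every value and the function URL are hypothetical placeholders standing in for the post's real configuration:

```yaml
main:
  steps:
    - init:
        # Variables referenced by later steps; all values are placeholders.
        assign:
          - project: "my-project"
          - datasetId: "my_dataset"
          - tableId: "my_table"
          - region: "us-central1"
          - query: "SELECT 1"
    - ingest:
        # Call a (hypothetical) Cloud Function that pulls from the API
        # and streams rows into BigQuery.
        call: http.get
        args:
          url: https://example.cloudfunctions.net/ingest
        result: ingestResult
    - finish:
        return: ${ingestResult.body}
```

Each named step runs in order, and a failed HTTP call can be caught with a `try`/`except` block in the workflow to alert operators, which is the failure-handling behavior described above.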