Suraj Mishra

Posts

Serverless API Data Ingestion in Google BigQuery: Part 1 (Introduction)

Ingesting API data in Google BigQuery the serverless way!
API to Google BigQuery
In the era of cloud computing, serverless has become a buzzword we keep hearing, and eventually we get convinced that serverless is the way to go for companies of all sizes because of its advantages. The basic advantages of the serverless approach are: no server management, scalability, and pay-as-you-go pricing. In this article, we will explore how to use the serverless approach to build a data ingestion pipeline in Google Cloud.
Serverless Offerings In GCP
GCP offers plenty of serverless services across various areas, such as:
Computing: Cloud Run, Cloud Functions, App Engine
Data warehouse: Google BigQuery
Object storage: Google Cloud Storage
Workflow management: Cloud Workflows
Scheduler: Cloud Scheduler
Technically, the combination of the above tools is enough to build API data ingestion in GCP. We can build two patterns to ingest API data in Google BigQu...
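An API-to-BigQuery ingestion step of the kind this post builds can be sketched as a fetch-then-stream loop. This is a minimal sketch under our own assumptions, not the code from the series: `fetch` stands in for whatever HTTP call returns the API records, `insert` stands in for a streaming insert such as `insert_rows_json` on a `bigquery.Client` from the google-cloud-bigquery library (which returns a list of per-row errors, empty on success), and the batch size of 500 is an arbitrary default. Injecting both callables keeps the flow testable without GCP credentials.

```python
from typing import Callable, Iterator, List, Mapping, Sequence

Row = Mapping[str, object]


def chunked(rows: Sequence[Row], size: int) -> Iterator[Sequence[Row]]:
    """Yield consecutive slices of at most `size` rows."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def ingest(
    fetch: Callable[[], List[Row]],
    insert: Callable[[Sequence[Row]], list],
    batch_size: int = 500,  # assumption: arbitrary default, tune for your API
) -> list:
    """Fetch records from an API and stream them to a table in batches.

    Returns every per-row error reported by `insert` (empty list on success).
    """
    rows = fetch()
    errors: list = []
    for batch in chunked(rows, batch_size):
        errors.extend(insert(batch))
    return errors


# Hypothetical wiring against real GCP (not executed here):
# from google.cloud import bigquery
# client = bigquery.Client()
# errors = ingest(my_fetch,
#                 lambda b: client.insert_rows_json("my-project.my_dataset.my_table", list(b)))
```

Keeping `insert` as a plain callable means the same loop works for a fake in tests and for the real client in a Cloud Function or Cloud Run service.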

Serverless API Data Ingestion in Google BigQuery: Part 2 (Show Me The Code!)

Ingesting API data in Google BigQuery the serverless way!
Introduction
A few days ago I wrote the first part of this blog, where I talked about how we can design serverless ingestion of API data as either a streaming or a batch pipeline. In this blog, we will look at the code and configuration for the entire pipeline. Let's get started.
Design
In the previous blog, we discussed pattern 1, where we ingest data from an API and insert it into BigQuery in real time. We will use this pattern to show the code and configuration.
Cloud Workflows YAML Configuration
Cloud Workflows is a serverless offering from GCP that lets us define a workflow to execute our multistep pipeline as designed. If a step fails, we can stop or take the necessary steps to alert system operators about the failure. In our case, we initialize some variables that we will use later in the configuration, such as project, datasetId, tableId, region, and the SQL query. As per our design, ...
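Cloud Workflows itself is configured in YAML, but the control flow described here (initialize shared variables, run steps in order, stop and alert on the first failure) can be sketched in plain Python. This is an illustration of the idea only, not Workflows syntax: `PipelineConfig`, `run_steps`, the step names, and the alert callback are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class PipelineConfig:
    """Hypothetical stand-in for the variables the workflow initializes."""
    project: str
    dataset_id: str
    table_id: str
    region: str
    query: str

    @property
    def table_ref(self) -> str:
        """Fully qualified BigQuery table name: project.dataset.table."""
        return f"{self.project}.{self.dataset_id}.{self.table_id}"


Step = Tuple[str, Callable[[PipelineConfig], None]]


def run_steps(
    cfg: PipelineConfig,
    steps: Sequence[Step],
    alert: Callable[[str], None],
) -> List[str]:
    """Run steps in order; on the first failure, alert and stop.

    Returns the names of the steps that completed successfully.
    """
    completed: List[str] = []
    for name, action in steps:
        try:
            action(cfg)
        except Exception as exc:  # broad on purpose: any failure halts the sketch
            alert(f"step '{name}' failed: {exc}")
            break
        completed.append(name)
    return completed
```

Stopping at the first failure mirrors the "stop or alert operators" behavior the post describes; a real workflow could instead retry or branch.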

Beginners Guide to Machine Learning on GCP

This blog covers the basic knowledge needed to start your ML journey on GCP. It provides foundational knowledge that will help readers gain some confidence in understanding the ML ecosystem on GCP, from where they can go on to master each component.
Introduction to Machine Learning
Machine learning is a way to use a set of algorithms to derive predictive analytics from data. It differs from business intelligence (BI) and data analytics in that, with BI and data analytics, businesses make decisions based on historical data, whereas with machine learning, businesses predict the future based on historical data. In other words, it's the difference between what happened to the business and what will happen to the business. It's like making BI smarter and more scalable so that it can predict the future rather than just showing the state of the business. ML is based on standard algorithms which are used to create ...
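To make the "what happened vs. what will happen" distinction concrete, here is a minimal Python sketch with hypothetical monthly sales figures: summing the history is the BI view, while fitting a straight line by ordinary least squares and extrapolating one month ahead is a (very) simple predictive model. Real workloads would use richer models and tooling; this only illustrates the idea.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


months = [1, 2, 3, 4]
sales = [100.0, 110.0, 120.0, 130.0]  # hypothetical historical data

# BI view: describe what already happened.
print("What happened (total sales):", sum(sales))  # 460.0

# Predictive view: learn a trend from history and project it forward.
slope, intercept = fit_line(months, sales)
print("What will happen (month 5 forecast):", slope * 5 + intercept)  # 140.0
```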

Cloud Computing Weekly Digest #1

Cloud Computing Weekly Newsletter is a weekly digest of all the major developments and updates from major cloud providers like GCP, AWS, and Azure.
Photo by Alex Machado on Unsplash

What is advertised.listeners in Kafka?

Hi guys, today we're going to talk about Kafka broker properties, more specifically the advertised.listeners property. If you have looked at the server.properties file in Kafka, there are two properties with listener settings:
#listeners=PLAINTEXT://:9092
#advertised.listeners=PLAINTEXT://your.host.name:9092
Why the hell do we need two listener settings for our broker? The listeners property sets the addresses the broker binds to and accepts connections on. By default (when advertised.listeners is not set), brokers register the value of listeners in ZooKeeper, so all communication, including between brokers, happens over the addresses set in listeners. But if you have a more complex network, for example a cluster in the cloud that has an internal network plus an external IP through which the rest of the world connects to your cluster, you have to set the advertised.listeners property to PLAINTEXT://{EXTERNAL_IP}:{EXTERNAL_PORT}, because that is the address the broker hands out to clients. For example, if the internal IP is 10.168.4.9 and the port is 9092 and the external IP is...
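Both properties take comma-separated entries of the form LISTENER_NAME://host:port, and Kafka's documentation states that if advertised.listeners is not set, the value of listeners is used. The Python sketch below is our own illustration of that format and fallback, not part of Kafka; the `Listener` type, the helper names, and the 203.0.113.10 external address (a documentation-only IP) are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Listener(NamedTuple):
    name: str  # listener name, e.g. PLAINTEXT (by default also the security protocol)
    host: str  # empty string means the broker binds to the default interface
    port: int


def parse_listeners(value: str) -> List[Listener]:
    """Parse a comma-separated 'NAME://host:port' list."""
    listeners: List[Listener] = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        name, sep, address = entry.partition("://")
        if not sep:
            raise ValueError(f"missing '://' in {entry!r}")
        host, colon, port = address.rpartition(":")
        if not colon:
            raise ValueError(f"missing port in {entry!r}")
        listeners.append(Listener(name, host, int(port)))
    return listeners


def effective_advertised(props: Dict[str, str]) -> List[Listener]:
    """What the broker advertises: advertised.listeners, else listeners."""
    value = props.get("advertised.listeners") or props["listeners"]
    return parse_listeners(value)


props = {"listeners": "PLAINTEXT://10.168.4.9:9092"}
print(effective_advertised(props))  # internal address is advertised by default
props["advertised.listeners"] = "PLAINTEXT://203.0.113.10:9092"
print(effective_advertised(props))  # external clients now get the public address
```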