Kinesis: Before looking at Kinesis itself, we start with what streaming data is.
What is streaming data?
Streaming data is information generated continuously by many data sources, which typically send their data records simultaneously and in small sizes.
Examples of streaming data:
- Geospatial data
When you use Uber, your device is connected to the internet. The Uber app continuously reports where the driver is and where you are, and it queries the map to offer the best possible route to your destination. This is a typical case of streaming data.
- Game data
When a user plays Angry Birds, the app streams data back to a central server. That data can describe what the user is doing or what the current score is.
- Buying from online stores
When people shop at amazon.com, they create streaming data. That streaming data covers products, transactions, and so on.
- IoT Sensor-Data
IoT sensors around the world continuously report readings such as temperature.
- Stock prices
Continuously updated stock prices are also a case of streaming data.
- Social network data
This is a good example of streaming data. When you go on Facebook, update your status, or post on a friend's wall, all of that data is streamed.
Kinesis is the AWS platform to which you send your streaming data. It makes it simple to load and analyze streaming data, and it lets you build custom applications based on your business requirements.
Kinesis Core Services
- Kinesis Analytics
- Kinesis Firehose
- Kinesis Streams
A Kinesis stream is made up of shards.
Each shard supports up to five read transactions per second, up to a maximum total read rate of 2 MB per second, and up to 1,000 records per second for writes, up to a maximum total write rate of 1 MB per second.
The data capacity of a stream is a function of the number of shards you specify for it. The total capacity of a Kinesis stream is the sum of the capacities of its shards.
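The shard arithmetic above can be sketched in a few lines of Python. This is only an illustration built from the per-shard figures quoted in the text; the function names `stream_capacity` and `shards_needed` are hypothetical helpers, not part of any AWS library.

```python
import math

# Per-shard limits as quoted in the text above.
SHARD_WRITE_MB_PER_SEC = 1.0
SHARD_WRITE_RECORDS_PER_SEC = 1000
SHARD_READ_MB_PER_SEC = 2.0
SHARD_READ_TX_PER_SEC = 5


def stream_capacity(shard_count):
    """Total stream capacity: each per-shard limit multiplied by the shard count."""
    return {
        "write_mb_per_sec": shard_count * SHARD_WRITE_MB_PER_SEC,
        "write_records_per_sec": shard_count * SHARD_WRITE_RECORDS_PER_SEC,
        "read_mb_per_sec": shard_count * SHARD_READ_MB_PER_SEC,
        "read_tx_per_sec": shard_count * SHARD_READ_TX_PER_SEC,
    }


def shards_needed(write_mb_per_sec, write_records_per_sec, read_mb_per_sec):
    """Smallest shard count that satisfies all three throughput requirements."""
    return max(
        1,  # a stream always has at least one shard
        math.ceil(write_mb_per_sec / SHARD_WRITE_MB_PER_SEC),
        math.ceil(write_records_per_sec / SHARD_WRITE_RECORDS_PER_SEC),
        math.ceil(read_mb_per_sec / SHARD_READ_MB_PER_SEC),
    )
```

For example, a workload writing 3.5 MB and 2,500 records per second while reading 6 MB per second is limited by its write volume, so `shards_needed(3.5, 2500, 6)` comes out at four shards.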
Kinesis Stream Architecture
Suppose EC2 instances, laptops, mobile phones, and IoT devices are producing data. They are known as producers because they generate the data. The data is sent to a Kinesis stream and stored in shards. By default, data is retained in a shard for 24 hours; the retention period can be increased to seven days. Once the data is in the shards, EC2 instances known as consumers read it from the shards and turn it into useful information. When a consumer has finished its processing, the results are sent on to other AWS services such as Redshift, S3, DynamoDB, and EMR.
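The producer and consumer roles can be sketched with the boto3 Kinesis client. This is a minimal sketch, not a production consumer: it assumes `kinesis` is a client such as `boto3.client("kinesis")` passed in by the caller, that records are JSON, and that the function names, stream name, and shard ID are placeholders chosen for illustration.

```python
import json


def send_event(kinesis, stream_name, event, partition_key):
    """Producer side: write one JSON-encoded record to the stream."""
    return kinesis.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=partition_key,  # determines which shard stores the record
    )


def read_shard_once(kinesis, stream_name, shard_id, limit=100):
    """Consumer side: read one batch, starting at the oldest retained record."""
    iterator = kinesis.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]
    response = kinesis.get_records(ShardIterator=iterator, Limit=limit)
    return [json.loads(record["Data"]) for record in response["Records"]]


def extend_retention(kinesis, stream_name, hours=168):
    """Raise retention from the 24-hour default; 168 hours is seven days."""
    kinesis.increase_stream_retention_period(
        StreamName=stream_name,
        RetentionPeriodHours=hours,
    )
```

A real consumer would keep polling with the `NextShardIterator` value returned by each `get_records` call and would then write its results to a service such as S3 or DynamoDB; that loop is left out here to keep the sketch short.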
Kinesis Firehose is a service for delivering streaming data to destinations such as Amazon S3, Amazon Elasticsearch, and Amazon Redshift.
With Kinesis Firehose, you don't have to manage the underlying resources.
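Because the service manages its own resources, a producer only needs the name of a delivery stream. A minimal sketch, assuming `firehose` is a boto3 Firehose client such as `boto3.client("firehose")` passed in by the caller; the function name and the trailing-newline convention are illustrative choices, not taken from the original.

```python
import json


def send_to_firehose(firehose, delivery_stream_name, event):
    """Send one JSON record to a Firehose delivery stream."""
    # The trailing newline is an assumption: it keeps records on separate
    # lines when Firehose concatenates them into a single delivered object.
    data = (json.dumps(event) + "\n").encode("utf-8")
    return firehose.put_record(
        DeliveryStreamName=delivery_stream_name,
        Record={"Data": data},
    )
```

Note that, unlike a Kinesis stream write, no partition key is passed: there are no shards for the producer to address.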
Kinesis Firehose Architecture
Suppose EC2 instances, laptops, mobile phones, and IoT devices are producing data. As before, they are referred to as producers, and they send data to Kinesis Firehose. With Kinesis Firehose you don't manage resources such as shards: you don't have to worry about streams or about manually adjusting shards to keep up with the data. It is fully automated. You don't even have to worry about consumers. Data can be analyzed using a Lambda function, and once it has been analyzed it is sent straight to S3. This analysis step is optional. Another important point is that Kinesis Firehose has no retention window, whereas a Kinesis stream has a retention window with a default of 24 hours that can be extended to seven days. Kinesis Firehose works differently: it either delivers the data directly to S3 or analyzes it and sends it on to other destinations.
Another destination can be Redshift. In that case, the data is first written to S3 and then copied into Redshift.
When an Elasticsearch cluster is the destination, the data is sent directly to the Elasticsearch cluster.
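The optional Lambda step can be sketched as a transformation handler. Firehose passes the function a batch of base64-encoded records and expects each one back with the same `recordId`, a `result` status, and re-encoded `data`. The transformation itself (adding a `processed` flag to JSON records) is a hypothetical example, not something described in the original.

```python
import base64
import json


def lambda_handler(event, context):
    """Transform each incoming Firehose record and hand it back for delivery."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            # Hypothetical transformation: tag every record as processed.
            payload["processed"] = True
            encoded = base64.b64encode(
                (json.dumps(payload) + "\n").encode("utf-8")
            ).decode("utf-8")
            output.append(
                {"recordId": record["recordId"], "result": "Ok", "data": encoded}
            )
        except (ValueError, TypeError):
            # Not valid base64/JSON, or not a JSON object: return the record
            # unchanged and mark it as failed.
            output.append(
                {
                    "recordId": record["recordId"],
                    "result": "ProcessingFailed",
                    "data": record["data"],
                }
            )
    return {"records": output}
```

Returning a failed record unchanged, rather than raising, lets the rest of the batch go through even when a single record is malformed.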
Kinesis Analytics is the Kinesis service in which streaming data is processed and analyzed using standard SQL.
Kinesis Analytics Architecture
Suppose we have a Kinesis stream and a Kinesis Firehose. Kinesis Analytics lets you run SQL queries on the data that exists within the Kinesis stream or Kinesis Firehose. You can then use those SQL queries to store the results in S3, an Elasticsearch cluster, or Redshift. Essentially, the data is analyzed inside Kinesis using SQL.
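To give a feel for the kind of query involved, the sketch below runs a standard SQL aggregation over a small batch of records. It uses Python's built-in `sqlite3` purely as a stand-in: Kinesis Analytics evaluates SQL continuously over the stream rather than over a stored table, and the table, columns, and function name here are made up for illustration.

```python
import sqlite3


def average_temperature_by_sensor(readings):
    """Average temperature per sensor for a batch of (sensor_id, temperature) rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE readings (sensor_id TEXT, temperature REAL)")
        conn.executemany("INSERT INTO readings VALUES (?, ?)", readings)
        rows = conn.execute(
            "SELECT sensor_id, AVG(temperature) "
            "FROM readings GROUP BY sensor_id ORDER BY sensor_id"
        ).fetchall()
    finally:
        conn.close()
    return rows
```

In the service, the output of a query like this would be delivered to a destination such as S3, an Elasticsearch cluster, or Redshift instead of being returned to the caller.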
Differences between Kinesis Firehose and Kinesis Streams
- A Kinesis stream is managed manually, while Kinesis Firehose is managed fully automatically.
- A Kinesis stream can send data to many services, while Kinesis Firehose sends data only to a fixed set of destinations: S3, Redshift, or Elasticsearch.
- A Kinesis stream has a retention window with a default of 24 hours that can be extended to seven days, while Kinesis Firehose has no retention window.
- A Kinesis stream passes data to consumers for processing and analysis, while with Kinesis Firehose you don't need to worry about consumers, because Firehose can analyze data with the help of a Lambda function.