Introduction to Cassandra Training:
Apache Cassandra is a free and open-source database that belongs to the NoSQL category, alongside databases such as MongoDB and Amazon DynamoDB. The main difference between Apache Cassandra and MongoDB is that MongoDB is based on JSON documents, whereas Apache Cassandra is based on key-value pairs. Apache Cassandra also provides high availability with no single point of failure. It uses masterless replication, meaning there is no master-slave relationship between nodes, so any number of nodes can join together to form a masterless cluster. Cassandra descends from Google's Bigtable and Amazon's Dynamo, and it adopts the best features of both. It was developed at Facebook, released as open source in 2008, and became a top-level Apache project in 2010.
Prerequisites of Cassandra Training:
- Cassandra supports tables, columns, and simple SQL-like statements, so basic SQL knowledge is useful; familiarity with Apache Kafka also helps.
- You should have some knowledge of a database such as MongoDB or Oracle.
- You should know Java.
Cassandra Training course content:
Module 1-Advantages, Usage of Cassandra
- Brief Introduction to Cassandra
- Advantages & Usage of Cassandra
Module 2-CAP Theorem & NoSQL Database
- Why NoSQL Database
- Replication in the RDBMS
- Key Challenges with the RDBMS
- NoSQL (Not only SQL)
- NoSQL Categories
- Advantages, Limitations
- The Key Characteristics of NoSQL Databases
- CAP Theorem
Module 3-Cassandra fundamentals, Data model, Installation & setup
- What is Cassandra?
- Key deployment concepts
- What is a column-oriented database?
- Data Model column
- What is a column family?
- Installation part
Module 4-Steps in the Configuration
- Overview on Configuration
- Expiring column
Module 5-Summarization, nodetool commands, cluster, Indexes, Cassandra and MapReduce, Installing OpsCenter
- Differences between Relational modeling and Cassandra modeling
- Steps in Cassandra modeling
- Time series modeling in Cassandra
- Data modeling in Cassandra
- Column family versus Super column family
- Counter column family
- Concepts of Partitioners
- Partitioner strategies
- Gossip protocols
Module 6-The Multi-Cluster setup
- Node settings
- Setup of a multi-node cluster
- Row cache & Key cache
- System keyspace
- Overview on Commands
Module 7-Thrift/AVRO/JSON/Hector Client
- Hector client
- How to write Java code
- Hector tag
Module 8-DataStax installation part, the Secondary index
- nodetool commands overview
- Management of Cassandra
- Cassandra and MapReduce
- Installation of DataStax
Module 9-Cassandra API & Summarization and Thrift
- Basic concepts of API
- Internals of a connection pool
- Client connectivity to Cassandra
- Hector client key features
- Key concepts of Hector client
Difference between RDBMS and NoSQL:
- SQL databases are used in legacy applications built on relational table storage, in enterprise applications such as ERP systems, and in data marts for web and mobile.
- Although NoSQL overlaps with SQL in web, mobile, and enterprise applications, it is mainly used in applications such as gaming, social media, blogging, and IoT, in the form of key-value stores, document databases, or column families.
- The subsequent slides highlight the difference between column-family systems and relational table storage.
- The difference between an RDBMS and NoSQL can be laid out in tabular fashion: NoSQL is a non-relational database and SQL is a relational database. NoSQL does not support the ACID properties of a transaction, whereas a SQL database does.
- The schema in NoSQL is very flexible, and you can change it at any point. In SQL the schema is strict, not flexible, and kept consistent.
- Consistency in NoSQL databases varies widely by solution; Apache Cassandra lets you tune the consistency level, as the subsequent slides of this presentation show. In SQL, strong consistency is supported.
- According to the CAP theorem (consistency, availability, and partition tolerance), a database cannot provide all three properties at once.
- SQL databases support consistency and availability, whereas NoSQL databases support availability and partition tolerance. Note the emphasis: NoSQL does not support consistency very well, which is not the same as not supporting consistency at all.
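The idea of a tunable consistency level can be illustrated with a little arithmetic. The sketch below assumes the commonly cited rule that a read is guaranteed to see the latest write when the number of replicas that acknowledge the read plus the number that acknowledge the write exceeds the replication factor. The function names are hypothetical and are not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def read_sees_latest_write(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    """True when every set of read replicas must overlap every set of
    write replicas, i.e. read_acks + write_acks > replication_factor."""
    return read_acks + write_acks > replication_factor


rf = 3
# Quorum writes and quorum reads overlap on at least one replica.
print(quorum(rf))                                            # 2
print(read_sees_latest_write(rf, quorum(rf), quorum(rf)))    # True
# Writing to one replica and reading from one replica may miss the write.
print(read_sees_latest_write(rf, 1, 1))                      # False
```

Lower acknowledgement counts favor availability and latency; higher counts favor consistency, which is the trade-off the CAP discussion above describes.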
Overview of Cassandra Training:
Cassandra has multiple nodes connected together to form a cluster, and each node in the cluster has connectivity to the other nodes. Data is replicated across nodes, and you can configure how your data is replicated and on how many nodes. A related concept is data sharding, or scaling, where you define how your data is split and stored across the nodes in the cluster.
- The nodes are independent of one another: even if one node goes down, another node takes up the write or read request from the client.
- Any node can accept read and write requests, irrespective of master or slave roles, because Apache Cassandra is a masterless system.
- With data replication, Cassandra nodes act as replicas for a piece of data, so each piece of data is replicated across nodes.
- Each time data is read from a node, Apache Cassandra checks whether the data is up to date; if it is out of date, it performs a read repair.
- The nodes coordinate through a protocol called the gossip protocol, by which they exchange state information with one another; data found to be out of date is then fixed by read repair.
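The read repair step described above can be sketched in a few lines. This is a simplified illustration, not Cassandra's implementation: it assumes each stored value carries a write timestamp and that the newest timestamp wins. The `Cell` class, the `read_with_repair` function, and the sample keys are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    value: str
    timestamp: int  # write time; the newest timestamp wins


def read_with_repair(replicas, key):
    """Read `key` from every replica, return the newest value, and
    overwrite any replica that holds an older (or missing) copy."""
    cells = [replica[key] for replica in replicas if key in replica]
    if not cells:
        return None
    newest = max(cells, key=lambda cell: cell.timestamp)
    for replica in replicas:
        current = replica.get(key)
        if current is None or current.timestamp < newest.timestamp:
            replica[key] = newest  # the "read repair" step
    return newest.value


# Three replicas of the same row; replica_c missed the latest write.
replica_a = {"user:1": Cell("alice@new.example", 200)}
replica_b = {"user:1": Cell("alice@new.example", 200)}
replica_c = {"user:1": Cell("alice@old.example", 100)}

result = read_with_repair([replica_a, replica_b, replica_c], "user:1")
print(result)                       # alice@new.example
print(replica_c["user:1"].value)    # alice@new.example (repaired)
```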
Moving on, these are the components that together build the architecture of Apache Cassandra.
Node – A node is the place where data is stored. It can be a single physical server, or a single physical server can host multiple nodes configured on VMs.
Data Center – A data center is a collection of related nodes. Each data center can have multiple nodes, and you can have multiple data centers. All nodes across all the data centers together form a cluster, so the cluster is the component that contains one or more data centers, each with multiple nodes.
Commit Log – The commit log is the crash recovery mechanism in Cassandra. Each time data is written, that is, whenever a write operation (create, update, or delete) is targeted at an Apache Cassandra database, that information is written to the commit log.
Mem-Table – A mem-table is a memory-resident data structure that holds the data. After the commit log, the data is written to the mem-table, and subsequent read requests are served from the mem-table. Sometimes there are multiple mem-tables for a single column family. You define the keyspace and the column family, as a subsequent slide shows.
SSTable – SSTable stands for Sorted String Table. It is the disk file to which data is flushed from the mem-table. On a write operation, the data is first written to the commit log, then updated in the mem-table, and later flushed to an SSTable.
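The write path through these components can be sketched as a toy in-memory model. This is an illustration under simplifying assumptions, not how Cassandra is implemented: the `ToyStore` class and its `flush_threshold` parameter are hypothetical, the "disk" is a Python list, and the commit log is simply cleared after a flush.

```python
class ToyStore:
    """Toy write path: commit log -> mem-table -> SSTable."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only record used for crash recovery
        self.memtable = {}     # in-memory structure serving recent reads
        self.sstables = []     # immutable, sorted snapshots standing in for disk files
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))   # 1. record the write for durability
        self.memtable[key] = value             # 2. update the in-memory table
        if len(self.memtable) >= self.flush_threshold:
            self.flush()                       # 3. flush once the mem-table is full

    def flush(self):
        # An SSTable is sorted by key and never modified after it is written.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs to be replayed

    def read(self, key):
        if key in self.memtable:               # newest data first
            return self.memtable[key]
        for table in reversed(self.sstables):  # then newest SSTable first
            for k, v in table:
                if k == key:
                    return v
        return None


store = ToyStore(flush_threshold=3)
for k, v in [("c", 3), ("a", 1), ("b", 2), ("d", 4)]:
    store.write(k, v)

print(store.sstables)   # [(('a', 1), ('b', 2), ('c', 3))]
print(store.memtable)   # {'d': 4}
print(store.read("a"))  # 1
```

The first three writes fill the mem-table and are flushed as one sorted SSTable; the fourth write stays in the commit log and the mem-table until the next flush.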
Features of Cassandra Training:
- It is highly scalable and can scale horizontally to a very great extent, which means adding nodes whenever you want to increase the size of your cluster.
- It does not provide decisions directly; however, it can be used to make decisions.
- Big data does not include only unstructured data; it also includes structured data that extends and complements unstructured data.
- Big data is not a substitute for structured data, since most of the information on the internet is available to anyone.
- com has a Cassandra cluster of 75,000 nodes, storing over 100 petabytes of data. Cassandra is a key-value NoSQL data store.
- Fault tolerance: consider a four-node Cassandra cluster in which each node stores copies of the same data, based on the replication factor.
- If node 4 of the cluster has a problem, the three other nodes can still service requests from the application.
- In a traditional database, performing any maintenance activity means bringing the database down.
- In Cassandra, the nodes are set up in the form of a cluster, so you can perform maintenance operations at the node level without impacting the application.
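The four-node example above can be checked with a small simulation. It assumes a hypothetical placement rule (hash the key to a starting node, then take the next nodes in order) that only loosely resembles real replica placement; the node names and the `replicas_for` and `can_serve` functions are illustrative.

```python
import hashlib

NODES = ["node1", "node2", "node3", "node4"]


def replicas_for(key, replication_factor, nodes=NODES):
    """Pick `replication_factor` consecutive nodes, starting from the node
    the key hashes to (hypothetical placement rule for illustration)."""
    start = int(hashlib.sha256(key.encode()).hexdigest(), 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]


def can_serve(key, replication_factor, down):
    """A key is still readable if at least one of its replicas is up."""
    return any(node not in down for node in replicas_for(key, replication_factor))


keys = [f"row-{i}" for i in range(100)]
down = {"node4"}  # node 4 has a problem

# With three copies of every row, losing one node never loses a row.
print(all(can_serve(k, 3, down) for k in keys))  # True

# With a single copy, rows stored only on node4 become unreachable.
unreachable = [k for k in keys if not can_serve(k, 1, down)]
print(len(unreachable), "rows unreachable with replication factor 1")
```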
Big Data use Cases – Apache Cassandra Online Training:
Big Data use cases are as follows, starting with the retail sector:
- Big Data is used extensively for affinity detection and performing market analytics.
- Credit card companies can detect fraudulent purchases quickly, so they can alert customers.
- While giving loans, banks examine private and public data of a customer to minimize risk.
- In medical diagnostics, doctors can diagnose a patient's illness based on symptoms instead of intuition.
- Digital marketers need to process huge customer data to find effective marketing channels. Insurance companies use big data to minimize insurance risk.
- An individual's driving data can be captured automatically and sent to the insurance company to calculate premiums for risky drivers.
- Manufacturing units and Oil Rigs have sensors that generate gigabits of data every day.
- Advertisers use big data to target audiences, and terabytes and petabytes of data are analyzed in the field of genetics to design new models.
- Power grids analyze large chunks of historical and weather forecasting data to forecast power consumption.
Concept of Cassandra Training Cluster:
A cluster in Cassandra is basically a collection of nodes, where a node is an independent storage unit. In the cluster, every node is connected to every other node. A request can come from any client: a Java application, a Cassandra Thrift client, a Cassandra cqlsh client, a web service, or any other application. When a client sends a request to a Cassandra cluster, there is no master node concept, so any node that is free to service the request takes it up. The node that takes the request becomes the coordinator node, and the coordinator finds out which node can service the request.
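The coordinator's job of finding the node that can service a request can be sketched with a simple token ring. This is a minimal illustration under stated assumptions: every node owns a single token, each key has one owner (replication is ignored), and the `Cluster` class, the `token` function, and the node names are hypothetical.

```python
import bisect
import hashlib


def token(value):
    """Map a string to a position on the ring (0 .. 2**32 - 1)."""
    return int(hashlib.sha256(value.encode()).hexdigest(), 16) % 2**32


class Cluster:
    def __init__(self, node_names):
        # Each node owns one token; sorting the tokens forms the ring.
        self.ring = sorted((token(name), name) for name in node_names)
        self.tokens = [t for t, _ in self.ring]
        self.data = {name: {} for name in node_names}

    def owner(self, key):
        """First node clockwise from the key's token, wrapping around."""
        i = bisect.bisect_left(self.tokens, token(key)) % len(self.ring)
        return self.ring[i][1]

    def write(self, coordinator, key, value):
        """`coordinator` is whichever node the client happened to contact."""
        target = self.owner(key)
        self.data[target][key] = value
        return f"{coordinator} forwarded write to {target}"

    def read(self, coordinator, key):
        # Any coordinator resolves the same owner for the same key.
        return self.data[self.owner(key)].get(key)


cluster = Cluster(["node1", "node2", "node3", "node4"])
print(cluster.write("node1", "user:42", "Ada"))
print(cluster.read("node3", "user:42"))  # Ada
```

Because every node computes the same owner for a key, the client gets the same answer no matter which node it contacts, which is why no master node is needed.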
Limitations of Cassandra Training:
- Cassandra is not a general-purpose database, because it has some limitations.
- It does not provide aggregation of data with GROUP BY, SUM, MIN, or MAX as relational databases do; any aggregation has to be pre-computed and stored.
- There are no joins between tables, so data has to be denormalized before it is stored in Cassandra.
- It does not support additional search clauses or conditions; only keys or indexes can be used for a search.
- There is no sorting on non-key fields.
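The workaround for the first two limitations, pre-computing aggregates and denormalizing at write time, can be sketched as follows. Plain Python dictionaries stand in for tables here; the table names, the `record_order` function, and the sample orders are hypothetical, and the running maximum assumes amounts are never negative.

```python
from collections import defaultdict

# Two "tables" maintained side by side on every write.
orders_by_customer = defaultdict(list)
totals_by_customer = defaultdict(lambda: {"count": 0, "sum": 0.0, "max": 0.0})


def record_order(customer_id, order_id, amount, product_name):
    # Denormalize: copy the product name into the order row instead of
    # joining against a products table at read time.
    orders_by_customer[customer_id].append(
        {"order_id": order_id, "amount": amount, "product_name": product_name}
    )
    # Pre-compute: update the aggregates when the data is written,
    # so reads never need SUM, MAX, or GROUP BY.
    totals = totals_by_customer[customer_id]
    totals["count"] += 1
    totals["sum"] += amount
    totals["max"] = max(totals["max"], amount)


record_order("c1", "o1", 20.0, "keyboard")
record_order("c1", "o2", 35.5, "monitor")
record_order("c2", "o3", 5.0, "cable")

print(totals_by_customer["c1"])  # {'count': 2, 'sum': 55.5, 'max': 35.5}
```

Reads then become simple key lookups, at the cost of extra work and extra storage on every write.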