Cassandra Architecture:
Cassandra inherits from Google's Bigtable for its wide-row/memtable storage model and from Amazon's Dynamo for its distributed architecture.
Cassandra has a peer-to-peer architecture (i.e. no master-slave replication as in MongoDB), hence no single point of failure. Each running instance of Cassandra is called a node.
Data resides on a node (as partitions); nodes are contained in a rack (a logical set of nodes), which resides in a datacenter (a logical set of racks).
Default datacenter: DC1; default rack: RC1
A Cassandra cluster is defined as the full set of nodes that map to a single complete token ring, in which each node is primarily responsible for the replicas (the distribution of data across the cluster) that reside in a particular section/segment of the token ring.
Each node stores data in atomic units called partitions, which can be considered analogous to rows containing columns of data.
Request Coordination: Each read/write request to the cluster is handled by a particular node called the coordinator, which acts as a proxy between the client application and the nodes (replicas) that own the data being written. The coordinator is chosen by the client library and determines which locations in the cluster the data should be copied/replicated to, based on the RF (replication factor) and RS (replication strategy). The read/write operation is confirmed to the client through the coordinator as per the CL (consistency level).
Any node can act as the coordinator, and any node can talk to any other node.
Every write on a node gets timestamped.
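A minimal sketch of this from the client side, assuming the DataStax Python driver (cassandra-driver) and a reachable cluster; the contact points, keyspace and table names here are made up. Any listed node can receive the request, and whichever node receives it acts as the coordinator:

from cassandra.cluster import Cluster

# Any node can be a contact point; the node that receives a request
# acts as the coordinator for that request.
cluster = Cluster(['10.0.0.1', '10.0.0.2', '10.0.0.3'])
session = cluster.connect('my_keyspace')  # hypothetical keyspace

# The coordinator forwards this write to the replicas chosen by RF/RS;
# the write carries a timestamp.
session.execute(
    "INSERT INTO users (user_id, name) VALUES (%s, %s)",
    (42, 'alice')
)

cluster.shutdown()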
Consistency Level (CL) → defines how many replicas must acknowledge a read/write before it is signalled as successful, and is configurable with the following available levels:
ANY
ONE (default): checks the replica node closest to the coordinator
ALL
QUORUM: a majority of the replica nodes, i.e. (RF / 2) + 1 (integer division)
LOCAL_ONE
LOCAL_QUORUM
EACH_QUORUM
CL is set per request (see the sketch below).
CL can differ for writes and reads; in the case of a read, the coordinator merges the replica responses and returns the most recent/current data.
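For example, a sketch assuming the DataStax Python driver (the keyspace and table names are hypothetical) showing the consistency level being set per request:

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('my_keyspace')  # hypothetical keyspace

# Write with a stronger consistency level for this request only.
write = SimpleStatement(
    "INSERT INTO users (user_id, name) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.QUORUM,
)
session.execute(write, (42, 'alice'))

# Read at ONE: lowest latency, but possibly stale data.
read = SimpleStatement(
    "SELECT name FROM users WHERE user_id = %s",
    consistency_level=ConsistencyLevel.ONE,
)
row = session.execute(read, (42,)).one()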
Tunable Consistency:
Types of consistency:
a) Immediate consistency:
Returns the most recent data. ALL guarantees recent data, since all replica responses are merged and compared at the coordinator, but has high latency for the same reason. Immediate consistency holds if (nodes_written + nodes_read) > RF (replication factor); see the sketch below.
Consistency levels can be tuned as:
High RC (read consistency) → read at ALL
High WC (write consistency) → write at ALL
Balanced consistency → both WC and RC set to QUORUM
Clock synchronization across nodes is a must, as each column includes a timestamp and the most recent data is read back in the response to the client.
b) Eventual consistency: may return stale data. ONE has a high chance of stale data but low latency, as it gets data from the replica node nearest to the coordinator.
ANY → lowest consistency, while ALL → highest consistency.
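A small sketch of the rule above in plain Python (the names are illustrative): a read is immediately consistent when the write CL and read CL together touch more than RF replicas.

# Illustrative check of the (nodes_written + nodes_read) > RF rule.
def is_immediately_consistent(write_replicas: int, read_replicas: int, rf: int) -> bool:
    """True if every read is guaranteed to overlap at least one replica
    that acknowledged the most recent write."""
    return write_replicas + read_replicas > rf

rf = 3
quorum = rf // 2 + 1        # QUORUM = (RF / 2) + 1 = 2 for RF = 3

print(is_immediately_consistent(quorum, quorum, rf))  # True : QUORUM write + QUORUM read
print(is_immediately_consistent(1, 1, rf))            # False: ONE + ONE (eventual)
print(is_immediately_consistent(rf, 1, rf))           # True : write ALL, read ONE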
RF (Replication Factor) → the number of replicas of each piece of data across the cluster.
Replication: storing copies of data on multiple nodes.
It is defined at keyspace creation time using a replica placement strategy; the total number of replicas across the cluster = replication factor.
All replicas are equivalent, i.e. each holds a copy of each row.
The replication factor should not exceed the number of nodes, otherwise writes are not allowed.
RS (Replication Strategy)
SimpleStrategy (for a single-datacenter setup), typically with RF >= 3:
places the first replica on the node determined by the partitioner; additional replicas are placed clockwise on the next nodes in the ring.
NetworkTopologyStrategy (for a multi-datacenter setup): prefers replicas to be placed on nodes in different racks and different datacenters, so as to survive rack/network/power failures; the replication factor is specified separately for each DC (see the keyspace sketch below).
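As a sketch (assuming the DataStax Python driver; the keyspace and datacenter names are made up), the replication strategy and per-DC replication factors are declared when the keyspace is created:

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect()

# Single-datacenter setup: SimpleStrategy with RF = 3.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo_simple
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
""")

# Multi-datacenter setup: NetworkTopologyStrategy with a replication
# factor per datacenter (DC names must match the snitch configuration).
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo_multi_dc
    WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 2}
""")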
Partitioning:
It is the process whereby a hashing function is applied to the partition key of a record being inserted to generate a token; the token determines the partition's position in the token ring. In other words, the partitioning key is hashed by the partitioner (a function running on every Cassandra node) into a token (a 64-bit integer with the default Murmur3Partitioner; the older RandomPartitioner uses a 128-bit MD5 hash), which the coordinator uses to figure out the node whose primary range in the token ring contains that token, and the first replica is placed on that node.
In Cassandra, the total amount of data managed by the cluster is represented as a ring. The ring is divided into ranges equal to the number of nodes, with each node being responsible for one or more ranges of the data.
Partitioners are of 3 types:
Murmur3Partitioner (default)
RandomPartitioner:
tokens assign an equal portion of the data to each node
reads/writes are also evenly distributed, and load balancing is simple since each part of the hash range receives an equal number of rows on average
ByteOrderedPartitioner: keeps rows ordered by their raw key bytes (allows ordered scans by key, but risks hot spots)
Consistent Hashing:
Each node owns the highest token of a particular segment of the token ring and is responsible for partitions whose partition-key tokens fall in the preceding range, called its primary range; it holds the first replica, and the positioning of further replicas is determined by the RF and RS.
Before a node can join the ring, it must be assigned a token. The token value determines the node's position in the ring and its range of data. Column family data is partitioned across the nodes based on the row key.
To determine the node where the first replica of a row will live, the ring is walked clockwise until it locates the node with a token value greater than that of the row key. Each node is responsible for the region of the ring between itself (inclusive) and its predecessor (exclusive).
With the nodes sorted in token order, the last node is considered the predecessor of the first node; hence the ring representation.
The node with the lowest token also accepts row keys that are less than the lowest token and greater than the highest token (see the ring-walk sketch below).
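A conceptual sketch of the ring walk in plain Python; MD5 stands in for the real partitioner, and the tokens and node names are made up:

import bisect
import hashlib

# Hypothetical nodes and their assigned tokens (one token per node,
# as in a pre-vnode cluster). MD5 stands in for the real partitioner.
RING = sorted([
    (0x2000, 'node-A'),
    (0x6000, 'node-B'),
    (0xA000, 'node-C'),
    (0xE000, 'node-D'),
])
TOKENS = [t for t, _ in RING]

def token_for(partition_key: str) -> int:
    # Toy 16-bit token; real Cassandra uses Murmur3 (64-bit tokens).
    return int(hashlib.md5(partition_key.encode()).hexdigest(), 16) & 0xFFFF

def replica_nodes(partition_key: str, rf: int = 3) -> list[str]:
    """Walk the ring clockwise from the key's token: the first node whose
    token is >= the key's token owns the primary range, and the next
    rf - 1 nodes clockwise hold the additional replicas (SimpleStrategy)."""
    tok = token_for(partition_key)
    start = bisect.bisect_left(TOKENS, tok) % len(RING)  # wrap around to the lowest token
    return [RING[(start + i) % len(RING)][1] for i in range(rf)]

print(token_for('user:42'), replica_nodes('user:42'))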
Node communication: based on gossip (each node gossips its token, which was generated at the start of the cluster) and the snitch system, including the provision of a dynamic snitch for routing around poorly performing nodes. The snitch, defined in cassandra.yaml, defines how the nodes are grouped together within the overall network topology.
Maps IPs to racks and datacenters
The gossip protocol enables nodes to communicate with each other and thereby learn the latest state of every node.
As a practice, it is good to configure the same seed nodes on all nodes in a datacenter.
Types of snitch:
SimpleSnitch = for a single-datacenter cluster
RackInferringSnitch = rack and datacenter inferred from the octets of the node's IP address
PropertyFileSnitch = determines the location of nodes by rack and datacenter from a properties file
EC2Snitch = for deployments on Amazon EC2
DynamicSnitch = enabled by default; monitors read latency and routes requests away from poorly performing nodes
Each node joining the cluster learns the cluster topology as follows:
it communicates with the seed nodes using the gossip protocol (version-controlled, runs every second to exchange state information).
All of this is configured in cassandra.yaml:
Prerequisites:
Seed node: the IP address of the node that the new node will contact first when joining the cluster.
The schema container in Cassandra is the KEYSPACE.
Overhead for data persisted to disk:
1. Column overhead: 15 bytes per column
2. Row overhead: 23 bytes per row
3. Primary key index of row keys
4. Replication overhead (a back-of-the-envelope sketch follows this list)
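A back-of-the-envelope sketch of the per-row disk overhead from the figures above (plain Python; the column count is illustrative):

COLUMN_OVERHEAD = 15   # bytes per column (from the list above)
ROW_OVERHEAD = 23      # bytes per row

def row_disk_overhead(num_columns: int, rf: int = 3) -> int:
    """Approximate per-row overhead before the primary key index,
    multiplied by the replication factor since every replica stores a copy."""
    single_copy = ROW_OVERHEAD + num_columns * COLUMN_OVERHEAD
    return single_copy * rf

# A row with 10 columns in an RF=3 keyspace:
print(row_disk_overhead(10))  # (23 + 10*15) * 3 = 519 bytes of overhead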
Avoid using super columns, as they have performance issues associated with them; use composite columns instead.
Cassandra was designed to avoid the need for a load balancer, as high-level clients such as pycassa implement load balancing directly.
2 types of column family:
Static column family
Dynamic column family: for custom data types
Types of columns in a column family:
Standard: has one primary key
Composite: for managing wide rows
Expiring: has an optional expiration period called TTL (time to live); expired columns get deleted during compaction
Counter: e.g. to store the number of times a page is viewed (see the sketch after this list)
Super: not preferred compared to composite columns, since a read fetches the entire super column and its sub-columns, hence a performance issue
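A sketch of expiring and counter columns in CQL, issued through the Python driver; the keyspace and table names are made up:

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('my_keyspace')  # hypothetical keyspace

# Expiring column: the value disappears after the TTL, and the expired
# data is purged during compaction.
session.execute(
    "INSERT INTO sessions (session_id, user_id) VALUES (%s, %s) USING TTL 3600",
    ('abc123', 42)
)

# Counter column: e.g. counting page views.
session.execute("""
    CREATE TABLE IF NOT EXISTS page_views (
        page text PRIMARY KEY,
        views counter
    )
""")
session.execute(
    "UPDATE page_views SET views = views + 1 WHERE page = %s",
    ('/home',)
)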
Repair Options:
Read repair: happens at read time and is done automatically.
Nodetool repair: repairs full nodes; run during maintenance/downtime.
During read operations, the full data is requested from one replica node and a digest query is sent to the other replicas; the digest query returns a hash representing the current data state, and after the merge at the coordinator, out-of-date replicas are updated automatically (see the sketch below).
Nodetool repair is used to make all the data on a node consistent with the most current replicas in the cluster (and to avoid schema disagreement).
If a target node is down or fails to acknowledge a write, the write is stored by the coordinator in the system.hints table (hinted handoff), which is configurable in cassandra.yaml.
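A conceptual sketch of the digest comparison behind read repair, in plain Python (not driver code; the node names, values and timestamps are made up): the coordinator compares hashes of the replica responses and, on a mismatch, keeps the value with the newest timestamp and pushes it back to the stale replicas.

import hashlib

# Hypothetical replica responses: (value, write timestamp in microseconds).
replicas = {
    'node-A': ('alice', 1_700_000_000_000_000),
    'node-B': ('alice', 1_700_000_000_000_000),
    'node-C': ('alicia', 1_650_000_000_000_000),   # stale replica
}

def digest(value: str) -> str:
    return hashlib.md5(value.encode()).hexdigest()

digests = {node: digest(value) for node, (value, _) in replicas.items()}

if len(set(digests.values())) > 1:
    # Mismatch: pick the most recently written value ...
    newest_value, _ = max(replicas.values(), key=lambda vt: vt[1])
    # ... and (conceptually) write it back to the out-of-date replicas.
    stale = [n for n, (v, _) in replicas.items() if v != newest_value]
    print(f"read repair: push {newest_value!r} to {stale}")
else:
    print("all replicas agree")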
… READ and WRITE operations to come