Question

3 years ago

Research 3 distributions that utilize the big data file systems approaches, and summarize the characteristics and provided functionality. Research 3 distributions that utilize NewSQL or NoSQL approaches, and summarize the characteristics and provided functionality. Compare and contrast how these technologies differ and the perceived benefits of each. Provide examples as necessary.

Computers and Technology

1 answer:

OlgaM077 [116] · Answer 1 · 2022-06-19T01:48:47+03:00

Answer:

Explanation:

1: Three popular systems built around the big data file system approach are:

HDFS (Hadoop Distributed File System), Apache Spark, and Quantcast File System (QFS).

HDFS is the most widely used of the three. It is the storage layer of Apache Hadoop: files are split into blocks and replicated across nodes, and processing frameworks such as MapReduce run on top of it. It is highly fault-tolerant and can run on low-cost hardware. It is written in Java and is open-source software.
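To make the MapReduce idea concrete, here is a minimal single-machine sketch in plain Python. It is not Hadoop code: the function names and the sample `blocks` are assumptions made up for illustration, and a real cluster would run the map and reduce steps on many nodes in parallel.

```python
from collections import defaultdict

def map_phase(block):
    """Emit a (word, 1) pair for every word in one block of text."""
    return [(word.lower(), 1) for word in block.split()]

def shuffle(pairs):
    """Group the emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

# Hypothetical file contents, already split into blocks
# the way a distributed file system would store them.
blocks = ["big data file systems", "big data databases"]

pairs = [pair for block in blocks for pair in map_phase(block)]
word_counts = reduce_phase(shuffle(pairs))
print(word_counts)
```

Each block can be mapped independently, which is why the model pairs well with a file system that already stores a file as separate blocks on separate machines.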

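Apache Spark is built around the Resilient Distributed Dataset (RDD) abstraction. Strictly speaking it is a processing engine rather than a file system, and it commonly reads its data from HDFS. Because it can keep working data in memory, it is often much faster than Hadoop MapReduce, and it can be programmed in a variety of languages such as Java, Scala, and Python. Like HDFS, it scales out across many nodes.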
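One way to picture an RDD is as a dataset that remembers the chain of transformations used to build it and only runs them when a result is requested. The toy class below is a plain-Python sketch of that idea, not Spark's API: `ToyRDD` and its methods are hypothetical names, and nothing here is distributed.

```python
class ToyRDD:
    """A toy stand-in for an RDD: records transformations, runs them on demand."""

    def __init__(self, source, lineage=()):
        self._source = source      # the original data (must be re-iterable)
        self._lineage = lineage    # ordered tuple of ("map" | "filter", function)

    def map(self, fn):
        # Nothing is computed here; the step is only appended to the lineage.
        return ToyRDD(self._source, self._lineage + (("map", fn),))

    def filter(self, predicate):
        return ToyRDD(self._source, self._lineage + (("filter", predicate),))

    def collect(self):
        # Replay the recorded lineage over the source data.
        items = iter(self._source)
        for kind, fn in self._lineage:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)

numbers = ToyRDD(range(10))
even_squares = numbers.filter(lambda n: n % 2 == 0).map(lambda n: n * n)
result = even_squares.collect()
print(result)
```

Because the lineage is kept, a lost result can be rebuilt by replaying the same steps over the source data, which is the intuition behind the "resilient" part of the name.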

QFS was developed as an alternative to HDFS. It is also highly fault-tolerant, and it is more space-efficient because it uses Reed-Solomon error correction to protect data rather than relying only on full replication.
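The space saving can be shown with a little arithmetic. The sketch below compares keeping three full copies of the data with a Reed-Solomon layout of 6 data stripes plus 3 parity stripes. Both parameter choices are assumptions for illustration; actual deployments can be configured differently.

```python
def replication_overhead(copies):
    """Extra storage, as a fraction of the original data, when keeping full copies."""
    return copies - 1

def reed_solomon_overhead(data_stripes, parity_stripes):
    """Extra storage, as a fraction of the original data, for an erasure-coded layout."""
    return parity_stripes / data_stripes

# Assumed settings: three full copies versus 6 data + 3 parity stripes.
rep = replication_overhead(3)
rs = reed_solomon_overhead(6, 3)

print(f"3 copies:       {rep:.0%} extra storage, tolerates 2 lost copies")
print(f"RS(6 + 3):      {rs:.0%} extra storage, tolerates 3 lost stripes")
```

Under these assumptions the erasure-coded layout stores a quarter of the redundant data that replication does while tolerating one more loss, at the cost of extra computation when a missing stripe has to be rebuilt.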

2: NewSQL databases were developed to address the scalability challenges of monolithic SQL databases. They were designed to let a SQL database run across multiple nodes without changing the replication architecture. This approach worked well in the early years of cloud technology. Examples of NewSQL databases include Vitess and Citus.

Vitess was developed as an automatic sharding solution for MySQL. Each MySQL instance acts as a shard of the overall database, and each instance uses standard MySQL master-slave replication to provide high availability.
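The general idea of routing each row to one shard can be sketched as follows. This is a generic hash-based illustration, not how Vitess itself decides placement: the shard names and the `shard_for` helper are hypothetical.

```python
import hashlib

# Hypothetical shard names; each would be a separate MySQL instance.
SHARDS = ["mysql-shard-0", "mysql-shard-1", "mysql-shard-2", "mysql-shard-3"]

def shard_for(key: str) -> str:
    """Pick a shard by hashing the sharding key (assumed scheme, for illustration)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(SHARDS)
    return SHARDS[index]

for user_id in ["user:1", "user:2", "user:42"]:
    print(user_id, "->", shard_for(user_id))
```

The same key always maps to the same shard, so the application can keep issuing ordinary SQL while a routing layer decides which instance receives each query.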

Citus is the PostgreSQL counterpart to Vitess. It provides transparent sharding, which gives PostgreSQL deployments horizontal write scalability.

NoSQL database technology was developed to provide a mechanism for storing and retrieving data that is modeled in ways other than the tabular relations used in traditional relational databases (RDBMS). The most popular NoSQL database is MongoDB, a cross-platform, document-oriented database. It is known for providing high availability through replica sets. A replica set is a group of two or more copies of the data