Big Data: Big Opportunities and Big Challenges

Aman Pasi
3 min read · Sep 17, 2020


In this 21st century, the magnitude of data generated and shared by businesses, public administrations, numerous industrial and not-for-profit sectors, and scientific research has increased immeasurably (and by immeasurably I mean truly enormous amounts of data).
These data range from textual content to multimedia content (e.g. videos, images), spread across a multiplicity of platforms.

According to one report, the world produces around 2.5 quintillion bytes of data every day (1 quintillion bytes is 1 exabyte, and 1 exabyte is about 1 billion gigabytes), and roughly 90% of the data generated in the world is unstructured.
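As a quick sanity check on those units, the arithmetic can be written out in a few lines of Python. The daily figure is the report’s estimate, not a fresh measurement:

```python
# Back-of-the-envelope check of the numbers above.
QUINTILLION = 10**18                 # 1 quintillion = 10^18
bytes_per_day = 2.5 * QUINTILLION    # the report's daily estimate

exabytes_per_day = bytes_per_day / 10**18   # 1 EB = 10^18 bytes
gigabytes_per_day = bytes_per_day / 10**9   # 1 GB = 10^9 bytes

print(f"{exabytes_per_day:.1f} EB/day")     # 2.5 EB/day
print(f"{gigabytes_per_day:,.0f} GB/day")   # 2,500,000,000 GB/day
```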

“By 2020, over 40 zettabytes (or 40 trillion gigabytes) of data will have been created, copied, and consumed.”

If we are creating this amount of DATA, why not delete it?
Of course, we can’t, because somehow it is very important to us (to individuals as well as to companies).

If we can’t delete these data, they will keep accumulating, and at some point storage and searching become a real problem. This problem is what we call BIG DATA.

Big data can be described in terms of data management challenges that — due to increasing volume, velocity and variety of data — cannot be solved with traditional databases.
- AWS

There are different definitions of BIG DATA in the market, but the most commonly agreed upon are the 3Vs of Big Data (Volume, Velocity and Variety):
* Volume: Ranges from terabytes to petabytes of data.
* Variety: Includes data from a wide range of sources and formats (e.g. web logs, social media interactions, ecommerce and online transactions, financial transactions, etc.).
* Velocity: Increasingly, businesses have stringent requirements from the time data is generated to the time actionable insights are delivered to users. Therefore, data needs to be collected, stored, processed, and analyzed within relatively short windows, ranging from daily to real-time. A toy sketch of this windowed processing follows the list.
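To make the velocity point concrete, here is a minimal, hypothetical Python sketch that buckets an event stream into one-minute tumbling windows instead of waiting for a nightly batch job. The stream, window size, and event types are made up for illustration:

```python
# A minimal sketch of "velocity": counting events in short, tumbling
# time windows rather than one big end-of-day batch.
from collections import Counter

WINDOW_SECONDS = 60  # hypothetical near-real-time window

def count_by_window(events):
    """events: iterable of (timestamp_seconds, event_type) pairs."""
    counts = {}
    for ts, kind in events:
        window = int(ts // WINDOW_SECONDS)  # bucket events per minute
        counts.setdefault(window, Counter())[kind] += 1
    return counts

# Example: three events landing in two one-minute windows.
stream = [(3, "click"), (45, "purchase"), (70, "click")]
for window, kinds in sorted(count_by_window(stream).items()):
    print(f"minute {window}: {dict(kinds)}")
```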

Google

Google now processes over 40,000 search queries every second on average, which translates to over 3.5 billion searches per day and 1.2 trillion searches per year worldwide. If query processing slowed by even a millisecond, it would have a huge negative effect on overall search.
Google currently processes over 20 petabytes of data per day through an average of 100,000 MapReduce jobs spread across its massive computing clusters.
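The MapReduce programming model itself is simple to sketch. Below is a toy, single-process word count in Python that mimics the map, shuffle, and reduce phases; it only illustrates the model and is nothing like Google’s distributed implementation:

```python
# A toy word count in the MapReduce style: a map step emits
# (word, 1) pairs, a shuffle groups them by key, and a reduce step
# sums each group.
from collections import defaultdict

def map_phase(document):
    for word in document.split():
        yield word.lower(), 1

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data is big", "data about data"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
print(reduce_phase(shuffle(pairs)))  # {'big': 2, 'data': 3, ...}
```

In the real system the map and reduce workers run on thousands of machines in parallel, which is what lets 20 petabytes a day get processed at all.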

It turns out Google uses a distributed file system, the Google File System (GFS), spread over many machines. It offers huge storage (hundreds of terabytes) across thousands of machines and thousands of disks. The advantages of this type of system are redundancy and low cost.
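A rough sketch of that redundancy idea: split a file into chunks and place each chunk on several machines, so losing one disk does not lose the data. The chunk size, machine names, and round-robin placement below are illustrative assumptions, not how GFS actually places replicas:

```python
# Minimal sketch of chunked, replicated storage.
CHUNK_SIZE = 4          # bytes, tiny for demonstration (GFS used 64 MB)
REPLICATION_FACTOR = 3  # each chunk stored on 3 machines
MACHINES = [f"machine-{i}" for i in range(6)]

def place_chunks(data: bytes):
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    placement = {}
    for idx, chunk in enumerate(chunks):
        # Round-robin placement; real systems also consider load and racks.
        replicas = [MACHINES[(idx + r) % len(MACHINES)]
                    for r in range(REPLICATION_FACTOR)]
        placement[idx] = (chunk, replicas)
    return placement

for idx, (chunk, replicas) in place_chunks(b"hello big data!").items():
    print(f"chunk {idx} {chunk!r} -> {replicas}")
```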

Facebook

Arguably the world’s most popular social media network with more than two billion monthly active users worldwide, Facebook stores enormous amounts of user data, making it a massive data wonderland.
Every 60 seconds, 136,000 photos are uploaded, 510,000 comments are posted, and 293,000 statuses are updated. That is a LOT of data.
