We all know there is a tremendous focus on Big Data today, but what exactly is Big Data anyway?

What is Big Data Anyway?

Many people make the mistake of thinking Big Data is only about advanced analytics, often of unstructured data. In my experience, this is a very limited definition: Big Data concerns reach virtually any advanced application with high growth demands. Databases tend to get larger with time, so more and more applications will come to face Big Data requirements.

Here is a trimmed-down version of how Wikipedia defines Big Data:

In information technology, big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools. The challenges include capture, curation [the preservation and maintenance of digital assets], storage, search, sharing, analysis, and visualization. The trend to larger data sets is due to the additional information derivable from analysis of a single large set of related data, as compared to separate smaller sets with the same total amount of data, allowing correlations to be found to “spot business trends, determine quality of research, prevent diseases, link legal citations, combat crime, and determine real-time roadway traffic conditions.”…

Big data is difficult to work with using relational databases and desktop statistics and visualization packages, requiring instead “massively parallel software running on tens, hundreds, or even thousands of servers”. What is considered “big data” varies depending on the capabilities of the organization managing the set. “For some organizations, facing hundreds of gigabytes of data for the first time may trigger a need to reconsider data management options. For others, it may take tens or hundreds of terabytes before data size becomes a significant consideration.”


As you can see, the focus of the definition is on analytics and trends, but from experience I believe Big Data concepts are much broader, and can be extended to many different database scenarios, including OLTP (online transaction processing), traditional data warehouse applications, and NoSQL engines.

Here is a more practical definition of Big Data:

A monolithic database meets the criteria for Big Data when it has a scalability and performance problem.

