Taming Big Data with Apache Spark and Python PDF
What is Big Data and why is it called the "new oil"?
In the near future, big data will become the main tool for decision-making - from networked businesses to entire states and international organizations.
The world leaders in collecting and analyzing big data are the United States and China. In the United States, as early as the Obama administration, the government launched six federal programs for the development of big data totaling $200 million. Large corporations are considered the main consumers of Big Data, but their data collection activities are restricted in some states, for example, in California.
China has over 200 laws and regulations regarding the protection of personal information. Since 2019, all popular smartphone apps have been checked and blocked if they collect user data in violation of the law. As a result, the government collects data through local services, many of which are inaccessible from outside the country.
Since 2018, the GDPR - General Data Protection Regulation - has been in effect in the European Union. It governs everything related to the collection, storage and use of data from online users. When the law went into effect in 2018, it was considered the world's toughest system for protecting people's privacy on the Internet. Read more in the book Taming Big Data with Apache Spark and Python by Frank Kane.
Read more in the article "Digital Wars: How Artificial Intelligence and Big Data Rule the World."
The big data market is just emerging. For example, mobile operators share information about potential borrowers with banks.
Big Data in business
Big data is good for business in three main ways:
- Launching products and services that most accurately meet the needs of the target audience;
- Analyzing customer experience in relation to a product or service in order to improve it;
- Customer acquisition and retention with analytics.
Big Data helps MasterCard prevent more than $3 billion in fraudulent transactions in customer accounts per year. It also allows advertisers to allocate budgets better and place ads that target a wide variety of consumers.
Big companies like Netflix, Procter & Gamble or Coca-Cola use big data to predict consumer demand. 70% of decisions in business and government are made on the basis of geodata. Read more in the article on how businesses profit from Big Data.
Challenges and perspectives of Big Data
- Big data is heterogeneous and therefore difficult to process for statistical inference. The more parameters a forecast requires, the more errors accumulate during the analysis;
- Working with large amounts of data online requires enormous computing power. Such resources are very expensive and are currently only available to large corporations;
- Storing and processing Big Data is associated with increased vulnerability to cyber attacks and all kinds of leaks. A prime example is the scandals over Facebook profile data;
- The collection of big data often raises a privacy issue: not everyone wants their every action to be monitored and passed on to third parties. Guests on the podcast "What's Changed" explain why there is no longer any privacy on the web and why the tech giants know everything about us;
- Not only corporations but also politicians use big data for their own purposes: for example, to influence elections.
Pros and cons:
- Big data is helping solve global problems - for example, fighting a pandemic, finding cures for cancer and preventing an environmental crisis;
- Big Data is a good tool for creating smart cities and solving transport problems;
- Big data helps save money even at the state level: for example, in Germany, about €15 billion was returned to the budget after it was discovered that some citizens were receiving unemployment benefits without grounds. They were identified by analyzing transactions.