What is data science and how is it improving the online advertising industry? We sat down with our senior data scientist, Larkin Liu, to try and make sense of the math all over our whiteboards.
Q: What is data science?
Data science is the application of statistical, mathematical, and machine learning methods to optimize and improve the business objectives set forth by an organization. We apply the most effective mathematical and statistical principles to analyze and identify problems and possible solutions, and then develop an end-to-end pipeline to put our theories into production.
The term that has become nearly synonymous with data science is “machine learning”. Machine learning is definitely a key part of it, but the idea that machine learning and data science are equivalent is a common misconception.
Oftentimes, data scientists will apply tools from applied statistics and mathematics, in addition to tools from machine learning, to analyze data, find areas for improvement, and design procedures that optimize business objectives.
In relation to ad tech, the applications of data science are endless. For example, by leveraging large-scale data analysis, we can identify which ad placements appeal most to which user segments. This allows us to serve ads that result in higher engagement. Higher engagement rates, of course, mean a more efficient ad spend.
The bonus is that all of these insights are detected, and the resulting actions executed, in real time. This goes beyond business intelligence: it starts with extracting insight and ends with execution to deliver immediate impact.
Q: How can data science improve the advertising technology landscape?
1. Bid Price Optimization: This project involves finding the optimal bid price depending on the user and the network being bid on. The main benefit is a more effectively spent advertising budget.
2. Fraud Detection: In an effort to combat fraud, especially click fraud caused by malicious bots, we compare the statistical properties of regular users' click behaviour with those of malicious bots, and develop algorithms to counter the bots' ever-evolving strategies.
3. User Segmentation: Customers are grouped into interest segments based on their browsing history, but we also apply third-party data to find relevant clusters (or look-alike groups) in which each customer shares the same properties. Identifying customers with the same properties allows advertisers to target similar audiences and prioritize high-value audience members.
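To make the fraud detection idea a little more concrete, here is a minimal sketch of one statistical property that could separate bots from people: how regular the gaps between clicks are. This is only an illustration, not the method described in the interview. The function names, the 0.1 threshold, and the sample timestamps are all hypothetical.

```python
from statistics import mean, pstdev

def interval_cv(click_times):
    """Coefficient of variation (std / mean) of the gaps between clicks.

    Returns None when there are too few clicks to say anything.
    """
    times = sorted(click_times)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    if len(gaps) < 2:
        return None
    avg_gap = mean(gaps)
    if avg_gap == 0:
        return 0.0  # every click at the same instant: perfectly regular
    return pstdev(gaps) / avg_gap

def looks_automated(click_times, cv_threshold=0.1):
    """Flag click streams whose timing is suspiciously regular.

    cv_threshold is an assumed value for illustration, not a tuned one.
    """
    cv = interval_cv(click_times)
    return cv is not None and cv < cv_threshold

# Hypothetical click timestamps, in seconds since the session started.
human_clicks = [0.0, 4.2, 5.1, 19.7, 23.0, 60.4]
bot_clicks = [0.0, 2.0, 4.0, 6.01, 8.0, 10.0]

print(looks_automated(human_clicks))
print(looks_automated(bot_clicks))
```

A real system would combine many such signals, since bots can add random jitter to defeat any single rule.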
Q: What are the biggest challenges data scientists face?
1. Big Data: The first major challenge is big data. When the size of the data is in the petabyte range (1 petabyte = 1 million gigabytes), even performing simple tasks such as aggregation and sorting becomes a big challenge. New technologies have allowed us to wrangle large data sets more efficiently, but the data scientist still needs to ensure that such operations are kept efficient.
2. From Design to Production: Like all projects in engineering, delivering an algorithm involves moving a design from prototype to production. It is an arduous process, moving from research into feasibility and effectiveness, to backtesting on historical data, to deploying live A/B tests.
The major challenge is deploying the A/B test at scale, because implementing an algorithm in real time involves fundamentally changing the code base of our platform. Strong collaboration between data science and software engineering is a necessity.
3. Learning as you go: When it comes to data science in practice, there is no definitive textbook or manual. The data scientist has many tools at his or her disposal and a variety of ways to tackle a problem. The data scientist needs to dive deep into understanding the problem from multiple perspectives: first from the business perspective, second from a mathematical perspective, and sometimes from an entirely new perspective.
Then the data scientist needs to determine which approach, from a quantitative standpoint, is best suited to meet those business objectives. This, combined with constantly changing real-world requirements, makes practical data science a very challenging task.
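On the Big Data point above, one common way to keep a simple aggregation efficient is to process events in a single streaming pass, so that memory grows with the number of distinct keys rather than the number of records. The sketch below assumes a hypothetical stream of (placement, clicked) events and computes a click-through rate per placement. The names and the sample data are made up for illustration.

```python
from collections import defaultdict

def aggregate_ctr(events):
    """Compute click-through rate per placement in one pass.

    `events` can be any iterable, including a generator that reads a
    large log lazily, so the full data set never sits in memory.
    """
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for placement_id, clicked in events:
        impressions[placement_id] += 1
        clicks[placement_id] += int(clicked)
    return {p: clicks[p] / impressions[p] for p in impressions}

def event_stream():
    # Hypothetical stand-in for reading an ad log line by line.
    yield ("banner_top", True)
    yield ("banner_top", False)
    yield ("sidebar", False)
    yield ("banner_top", False)
    yield ("sidebar", True)

ctr = aggregate_ctr(event_stream())
print(ctr)
```

At petabyte scale the same pattern is typically distributed across many machines, with each one aggregating its own share of the data before the partial counts are merged.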
Q: What excites you most about the future of data science?
Traditional statistical methods are highly theoretical in nature. But as computers become more powerful and data becomes more readily available, there has been a paradigm shift from theoretical verification to continuous verification. Modern data scientists put less emphasis on an academic and theoretical interpretation of the data and prefer a continuous A/B testing approach, an empirical style often associated with data mining.
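As a rough illustration of what evaluating a single A/B test can involve, the sketch below runs a standard two-proportion z-test on conversion counts from two ad variants. The interview does not say which test is used in practice, so treat this as one textbook option. The function name and the counts are hypothetical.

```python
from math import erfc, sqrt

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates.

    Uses the pooled-proportion standard error and a normal
    approximation, which is reasonable for large samples.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail probability
    return z, p_value

# Hypothetical counts: variant A converts 200 of 10,000, B converts 260 of 10,000.
z, p_value = two_proportion_z_test(200, 10_000, 260, 10_000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

Checking results repeatedly while a test is still running inflates the false positive rate of a fixed-sample test like this one, so continuous testing usually calls for sequential methods or corrected thresholds.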