I read an article today on the topic of Big Data. In the article, the author claims that the term Big Data is the most hyped technology ever, even compared to such things as cloud computing and Y2K. I thought this was a bold claim and one that is testable. Using Google Trends, I looked at the popularity of three IT terms to understand the relative hype of each (as measured by number of searches on the topic): Web 2.0, cloud computing, and Big Data. The chart from Google Trends appears below.
We can learn a couple of things from this graph. First, interest in Big Data has continued to grow since its first measurable growth appeared in early 2011. Still, the search volumes for the respective terms clearly show that Web 2.0 and cloud computing received more searches than Big Data. While we don't know if interest in Big Data will continue to grow, Google Trends, in fact, predicts a very slow growth rate for Big Data through the end of 2015.
Second, the growth rates of Web 2.0 and cloud computing are faster than the growth rate of Big Data, showing that public interest grew more quickly for those terms. Interest in Web 2.0 reached its maximum a little over two years after its initial ascent. Interest in cloud computing reached its peak in about 3.5 years. Interest in Big Data has been growing steadily for over 3.7 years.
One other point of interest: for these three technology terms, the growth of each later term began at the peak of the previous one. As one technology becomes commonplace, another takes its place.
So, is Big Data the most hyped technology ever? No.
Data Have Meaning
We live in a Big Data world in which everything is quantified. While discussions of Big Data tend to focus on distinguishing its three characteristics (the infamous three Vs), we need to be cognizant of the fact that data have meaning. That is, the numbers in your data represent something of interest, an outcome that is important to your business. The meaning of those numbers bears directly on the veracity of your data.
[ DATA SCIENCE Q&A]
Q: How do you handle missing data? What imputation techniques do you recommend?
A: * If data are missing at random, deletion introduces no bias, but it decreases the power of the analysis by reducing the effective sample size
* Recommended: KNN imputation, Gaussian mixture imputation
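A minimal sketch of the recommended KNN approach, using scikit-learn's `KNNImputer` on an invented toy matrix:

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy feature matrix with missing entries (np.nan); the values are invented.
X = np.array([
    [1.0, 2.0, np.nan],
    [3.0, 4.0, 3.0],
    [np.nan, 6.0, 5.0],
    [8.0, 8.0, 7.0],
])

# Each missing value is replaced by the mean of that feature across
# the k nearest rows, with distance measured on the observed features.
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)
print(X_filled)
```

Observed values pass through unchanged; only the `np.nan` entries are filled.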
Winter is coming, warm your Analytics Club
Yes, and yes! As we head into winter, what better time to talk about our increasing dependence on data analytics to help with our decision making. Data- and analytics-driven decision making is rapidly working its way into our core corporate DNA, yet we are not building proving grounds to test those models fast enough. Polished-looking models can hide flaws that cause unexpected pain if they go unchecked. Now is the right time to start thinking about setting up an Analytics Club [a Data Analytics CoE] in your workplace to incubate best practices and provide a test environment for those models.
[ DATA SCIENCE Q&A]
Q: Give examples of data that follow neither a Gaussian nor a log-normal distribution.
A: * Allocation of wealth among individuals
* Values of oil reserves among oil fields (many small ones, a small number of large ones)
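Both examples are classic heavy-tailed (power-law) data. A quick sketch, with an invented Pareto shape parameter, shows how concentrated such samples are:

```python
import numpy as np

rng = np.random.default_rng(42)

# Pareto (power-law) samples: many small values, a few enormous ones,
# qualitatively like wealth or oil-field sizes. The shape parameter is invented.
samples = 1 + rng.pareto(a=1.16, size=100_000)

# Share of the total held by the largest 1% of observations.
top_share = np.sort(samples)[-1_000:].sum() / samples.sum()
print(f"share held by the top 1%: {top_share:.1%}")
```

Under a Gaussian or log-normal model the top 1% would hold only a sliver of the total; here it holds a large share, which is the signature of these distributions.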
Realizing that you can only improve what you measure is a good way to think about KPIs. Often companies want to improve different aspects of their business all at once, but can't put a finger on what will measure their progress towards overarching company goals. Does it come down to comparing the growth of last year to this year? Or is it just about the cost of acquiring new customers?
If you're nervously wondering now, "wait, what is my cost per deal?", don't sweat it. Another growing pain of deciding on KPIs is discovering that there is a lot of missing information.
Defining Your KPIs
Choosing the right KPI is crucial to making effective, data-driven decisions. The right KPI helps concentrate the efforts of employees on a meaningful goal; choose incorrectly, and you could waste significant resources chasing vanity metrics.
In order to rally the efforts of your team and achieve your long-term objectives, you have to measure the right things. For example, if the goal is to increase revenue at a SaaS company by 25% over the next two quarters, you couldn't determine success by focusing on the number of likes your Facebook page got. Instead, you could ask questions like: Are we protecting our ARR by retaining our existing customers? Do we want to look at the outreach efforts of our sales development representatives, and whether that results in increased demos and signups? Should we look at the impact of increased training for the sales team on closed deals?
Similarly, if we wanted to evaluate the effectiveness of various marketing channels, we would need to determine more than an end goal of increasing sales or brand awareness. Instead, we'll need a more precise definition of success. This might include ad impressions, click-through rates, conversion numbers, new email list subscribers, page visits, bounce rates, and much more.
Looking at all these factors will allow us to determine which channels are driving the most traffic and revenue. If we dig a bit deeper, there will be even more insights to discover. In addition to discovering which channels produce traffic most likely to translate into a conversion, we can also learn if other factors such as timing make a difference to reach our target audience.
Of course, every industry and business is different. To establish meaningful KPIs, you'll need to determine what most clearly correlates with your company's goals. Here are a few examples:
Finance: Working capital, Operating cash flow, Return on equity, Quick ratio, Debt-to-equity ratio, Inventory turnover, Accounts receivable turnover, Gross profit margin
Marketing: Customer acquisition cost, Conversion rate of a particular channel, Percentage of leads generated from a particular channel, Customer churn, Dormant customers, Average spend per customer
Retail: Gross margin (as a percentage of selling price), Inventory turnover, Sell-through percentage, Average sales per transaction, Percentage of total stock not displayed
If your business is committed to data-driven decision making, establishing the right KPIs is crucial. Although the process of building a performance-driven culture is iterative, clearly defining the desired end result will go a long way towards helping you establish effective KPIs that focus the efforts of your team on that goal, whether it's to move product off shelves faster, create better patient outcomes, or increase your revenue per customer.
The good news is that in the business intelligence world, measuring performance can be especially precise, quick, and easy. Yet the first hurdle every data analyst faces is the initial struggle to choose and agree on company KPIs and KPI tracking. If you are about to embark on a BI project, here's a useful guide on deciding what it is that you want to measure:
Step 1: Isolate Pain Points, Identify Core Business Goals
A lot of companies start by trying to quantify their current performance. But again, as a data analyst, the beauty of your job and the power of business intelligence is that you can drill into an endless amount of very detailed metrics. From clicks, site traffic, and conversion rates, to service call satisfaction and renewals, the list goes on. So ask yourself: What makes the company better at what they do?
You can approach this question by focusing on stage growth, where a startup would focus most on metrics that validate business models, whereas an enterprise company would focus on metrics like customer lifetime value analysis. Or, you can examine this question by industry: a services company (consultancies) would focus more on quality of services rendered, whereas a company that develops products would focus on product usage.
Ready to dive in? Start by going top-down through each department to elicit requirements and isolate its pain points and health factors. Here are some examples of KPI metrics you may want to look at:
Step 2: Narrow Down Your KPIs
Once you choose a few important KPIs, break them down even further. Remember, while there's no magic number, less is almost always more when it comes to KPIs. If you track too many KPIs, as a data analyst you may start to lose your audience and the focus of the common business user. Seven to ten KPIs is a good number to aim for, and you can get there by breaking down your core business goals into much more specific metrics.
Remember, the point of a KPI is to gain focus and align goals for measurable improvement. Spend more time choosing your KPIs up front rather than throwing too many into the mix, which just pushes the question of focus further down the road (and requires more work!).
Step 3: Carefully Assess Your Data
After you have your main 7-10 elements, you can start digging into the data and begin data modeling. A good question to ask at this point is: How does the business currently make decisions? Counterintuitively, in order to answer that question you may want to look at where the company is currently not making its decisions based on data, or not collecting the right data.
This is where you get to flex your muscles as a "data hero" or a good analyst! Take every KPI and present it as a business question. Then break the business question into facts, dimensions, filters, and order.
Not every business question contains all of these elements, but there will always be a fact, because you have to measure something. You'll need to answer the following before moving on:
What are the data sources?
How complex will your data model be?
Which tools will you use to prepare, manage, and analyze the data (your BI solution)?
Do this by breaking each KPI into its data components, asking questions like: What do I need to count? What do I need to aggregate? Which filters need to be applied? For each of these questions, you have to know which data sources are being used and where the tables are coming from.
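As a sketch of this decomposition, consider a hypothetical KPI of average sales per closed transaction (the records and field names below are invented): the deal amount is the fact, the month is the dimension, and the deal status is the filter.

```python
from collections import defaultdict

# Hypothetical deal records; field names and values are invented for illustration.
deals = [
    {"month": "2024-01", "status": "closed", "amount": 1200.0},
    {"month": "2024-01", "status": "open",   "amount": 800.0},
    {"month": "2024-02", "status": "closed", "amount": 450.0},
    {"month": "2024-02", "status": "closed", "amount": 950.0},
]

revenue_by_month = defaultdict(float)
count_by_month = defaultdict(int)
for d in deals:
    if d["status"] == "closed":                          # filter
        revenue_by_month[d["month"]] += d["amount"]      # aggregate the fact
        count_by_month[d["month"]] += 1                  # count the fact

# KPI: average sales per closed transaction, ordered by month (the dimension).
kpi = {m: revenue_by_month[m] / count_by_month[m] for m in sorted(revenue_by_month)}
print(kpi)
```

In practice the same decomposition maps directly onto a SQL query: the fact becomes the aggregate, the dimension the GROUP BY, the filter the WHERE clause, and the order the ORDER BY.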
Consider that data will often come from multiple, disparate data sources. For example, for information on a marketing or sales pipeline, you'll probably need Google Analytics/AdWords data combined with your CRM data. As a data analyst, it's important to recognize that the most powerful KPIs often come from a combination of multiple data sources. Make sure you are using the right tools, such as a BI tool with built-in data connectors, to prepare and join data accurately and easily.
Step 4: Represent KPIs in an Accurate and Effective Fashion
Congrats! You've connected your KPI data to your business. Now you'll need to find a way to represent the metrics in the most effective way. Check out some of these different BI dashboard examples for some inspiration.
One tip to keep in mind is that the goal of your dashboard is to put everyone on the same page. Still, users will each have their own questions and areas they want to explore, which is why building interactive, highly visual BI dashboards is important. Your BI solution should offer interactive dashboards that allow users to perform basic analytical tasks, such as filtering views, drilling down, and examining underlying data, all with little training.
As a data analyst you should always look for what other insights you can achieve with the data that the business never thought of asking. People are often entrenched in their own processes, and as an analyst you offer an "outsider's perspective" of sorts, since you only see the data, while others are clouded by their day-to-day business tasks. Don't be afraid to ask the hard questions. Start with the most basic and you'll be surprised how often big companies don't know the answers, and you'll be a data hero just for asking.
According to CB Insights, 100 of the most promising private startups focused on Artificial Intelligence raised $11.7 billion in equity funding in 367 deals during 2017. Several of those companies focus on deep learning technologies, including the most well-funded, ByteDance, which accounts for over a fourth of 2017's private startup funding with $3.1 billion raised.
In the first half of last year alone, corporate venture capitalists contributed nearly $2 billion of disclosed equity funding in 88 deals to AI startups, which surpassed the total financing for AI startups for all of 2016. The single largest corporate venture capital deal in the early part of 2017 was the $600 million Series D funding awarded to NIO, an organization based in China that specializes in autonomous vehicles (among other types of craft), which rely on aspects of deep learning.
According to Forrester, venture capital funding activity in computer vision increased at a CAGR of 137% from 2015 to 2017. Most aspects of advanced pattern recognition, including speech, image, and facial recognition, hinge on deep learning. A Forbes post noted, "Google, Baidu, Microsoft, Facebook, Salesforce, Amazon, and all other major players are talking about, and investing heavily in, what has become known as 'deep learning'." Indeed, both Microsoft and Google have created specific entities to fund companies specializing in AI.
According to Razorthink CEO Gary Oliver, these developments are indicative of a larger trend: "If you look at where the investments are going from the venture community, if you look at some of the recent reports that have come out, the vast majority are focused on companies that are doing deep learning."
Deep learning is directly responsible for many of the valuable insights organizations can access via AI, since it can rapidly parse through data at scale to discern patterns that are otherwise too difficult to see or take too long to notice. In particular, deep learning actuates the unsupervised prowess of machine learning by detecting data-driven correlations to business objectives for variables on which it wasn't specifically trained. "That's what's kind of remarkable about deep learning," maintained Tom Wilde, CEO of indico, which recently announced $4 million in new equity seed funding. "That's why when we see it in action we're always like, whoa, that's pretty cool that the math can decipher that." Deep learning's capacity for unsupervised learning makes it extremely suitable for analyzing semi-structured and unstructured data. Moreover, when it's leveraged on the enormous datasets required for speech, image, or even video analysis, it provides these benefits at scale, at speeds that match modern business timeframes.
Although this unsupervised aspect of deep learning is one of its more renowned, it's important to realize that deep learning is actually an advanced form of classic machine learning. As such, it was spawned from the latter, even though its learning capabilities vastly exceed those of traditional machine learning. Nonetheless, there are still enterprise tasks which are suitable for traditional machine learning, and others which require deep learning. "People are aware now that there's a difference between machine learning and deep learning, and they're excited about the use cases deep learning can help with," Razorthink VP of Marketing Barbara Reichert posited. "We understand the value of hybrid models and how to apply both deep learning and machine learning so you get the right model for whatever problem you're trying to solve."
Whereas deep learning is ideal for analyzing big data sets with vast amounts of variables, classic machine learning persists in simpler tasks. A good example of this fact is its utility in data management staples such as data discovery, in which it can determine relationships between data and use cases. "Once the data is sent through those [machine learning algorithms] the relationships are predicted," commented Io-Tahoe Chief Technology Officer Rohit Mahajan. "That's where we have to fine-tune a patented data set that will actually predict the right relationships with the right confidence."
An examination of the spending on AI companies and their technologies certainly illustrates a prioritization of deep learningâs worth to contemporary organizations. It directly impacts some of the more sophisticated elements of AI including robotics, computer vision, and user interfaces based on natural language and speech. However, it also provides unequivocally tangible business value in its analysis of unstructured data, sizable data sets, and the conflation of the two. Additionally, by applying these assets of deep learning to common data modeling needs, it can automate and accelerate certain facets of data science that had previously proved perplexing to organizations.
"Applications in the AI space are making it such that you don't need to be a data science expert," Wilde said. "It's helpful if you kind of understand it at a high level, and that's actually improved a lot. But today, you don't need to be a data scientist to use these technologies."
Fix the Culture, spread awareness to get awareness
Adoption of analytics tools and capabilities has not yet caught up to industry standards. Talent has long been the bottleneck to broader enterprise adoption, and one of the primary reasons is a lack of understanding and knowledge among stakeholders. To facilitate wider adoption, data analytics leaders, users, and community members need to step up and create awareness within the organization. An aware organization goes a long way in helping secure quick buy-ins and better funding, which ultimately leads to faster adoption. So be the voice that you want to hear from leadership.
[ DATA SCIENCE Q&A]
Q: Which kernels do you know? How do you choose a kernel?
A: * Gaussian kernel
* Linear kernel
* Polynomial kernel
* Laplace kernel
* Esoteric kernels: string kernels, chi-square kernels
* If the number of features is large (relative to the number of observations): SVM with a linear kernel; e.g. text classification with lots of words and few training examples
* If the number of features is small and the number of observations is intermediate: Gaussian kernel
* If the number of features is small and the number of observations is small: linear kernel
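The rules of thumb above can be tried directly. A minimal sketch using scikit-learn's `SVC` on invented synthetic data (few features, intermediate number of observations, the regime where the Gaussian/RBF kernel is suggested):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Invented toy data: 8 features, 500 observations.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Compare the common kernels on a held-out split.
for kernel in ("linear", "poly", "rbf"):
    score = SVC(kernel=kernel).fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{kernel}: {score:.3f}")
```

In practice the kernel (and its hyperparameters, e.g. `C` and `gamma`) would be chosen by cross-validation rather than a single split; this sketch only illustrates the comparison.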
Predictive analytics works by learning the patterns that exist in your historical data, then using those patterns to predict future outcomes. For example, if you need to predict whether a customer will pay late, you'll feed data samples from customers who paid on time and data from those who have paid late into your predictive analytics algorithm.
The process of feeding in historical data for different outcomes and enabling the algorithm to learn how to predict is called the training process. Once your algorithm determines a pattern, you pass in information about a new customer and it will make a prediction. But the first step is deciding what predictive questions you want to answer.
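A minimal sketch of this train-then-predict loop, using a logistic regression classifier on invented customer records (the features, values, and labels are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented history: [avg days past due, invoices per year, credit score]
# and whether each customer later paid late (1) or on time (0).
X_history = np.array([
    [0, 12, 720], [2, 8, 690], [15, 3, 580], [22, 5, 540],
    [1, 20, 750], [18, 2, 600], [0, 15, 710], [30, 4, 520],
])
y_history = np.array([0, 0, 1, 1, 0, 1, 0, 1])

# Training: the algorithm learns the pattern from historical outcomes.
model = LogisticRegression().fit(X_history, y_history)

# Prediction: pass in a new customer's data to get late (1) or on time (0).
new_customer = np.array([[20, 3, 560]])
print(model.predict(new_customer))
```

Logistic regression stands in here for whatever algorithm your platform uses; the shape of the workflow (historical outcomes in, trained model, prediction out) is the point.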
How do you know which predictive questions to ask?
When determining a predictive question, the rule of thumb is to base it on what you want to do with the answer. Following that logic, if we want to predict the number of late payments in a certain time frame, instead of whether a particular person will pay late (as in the above example), our predictive question should be: "How many customers will make late payments next month?"
Let's look at a slightly more complex example. If we're forecasting volume for a call center, our predictive question might be: "How many calls will I get tomorrow?" That is a forecasting/regression question (like the one in the example above). However, we could also ask a binary question such as: "Will I get more than 200 calls tomorrow?" That is a classification question because the answer will be either yes or no.
The predictive question you should ask will depend on what you are going to do with the information. If you have the staff to handle 200 calls, then you will likely want to know whether you'll get 200 calls or not (so you'd ask the classification question). But if your goal is to identify how many calls you are going to get tomorrow so that you can staff accordingly, you would ask the forecasting question.
Let's apply this rule to a different industry. If you're in sales and your monthly goal is 250 sales referrals, you would ask a classification question such as: "Will I get 250 referrals or more next month?" But if you simply want to know your expected referral volume, without taking into consideration any monthly goals, then you'd ask the forecasting/regression question: "How many sales referrals will I get in the next month?"
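The distinction can be sketched in a few lines: the same underlying forecast answers both kinds of question (the number below is invented).

```python
# Hypothetical forecast of tomorrow's call volume.
forecast_calls = 215

# Forecasting/regression question: "How many calls will I get tomorrow?"
regression_answer = forecast_calls

# Classification question: "Will I get more than 200 calls tomorrow?"
classification_answer = "yes" if forecast_calls > 200 else "no"

print(regression_answer, classification_answer)
```

The regression answer is a number; the classification answer collapses it against a threshold tied to a business decision (here, staffing for 200 calls).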
Over time, you'll be able to run multiple algorithms to pick the one that works best with your data, or even use an ensemble of algorithms. You'll also want to regularly retrain your model to keep up with fluctuations in your data based on the time of year, what activities your business has underway, and other factors. Set a timeline, maybe once a month or once a quarter, to regularly retrain your predictive analytics model with updated information.
Analytics Strategy that is Startup Compliant
With the right tools, capturing data is easy, but not being able to handle that data can lead to chaos. One of the most reliable startup strategies for adopting data analytics is the TUM, or The Ultimate Metric: the metric that matters most to your startup. Some advantages of a TUM: it answers the most important business question, it cleans up your goals, it inspires innovation, and it helps you understand the entire quantified business.
[ DATA SCIENCE Q&A]
Q: What does NLP stand for?
A: * Natural Language Processing: the interaction between computers and human (natural) languages
* Involves natural language understanding
– Machine translation
– Question answering: what's the capital of Canada?
– Sentiment analysis: extract subjective information from a set of documents, identify trends or public opinions in social media
In this podcast, Dickson Tang shared his perspective on building a future-ready, open-mindset organization by working on its three Is: Individual, Infrastructure, and Ideas. He discussed the various organization types and individuals who could benefit from this 3I framework, elaborated in detail in his book, "Leadership for future of work: 9 ways to build career edge over robots with human creativity". This podcast is great for anyone seeking to learn about ways to be an open, innovative change agent within an organization.
Leadership for future of work: 9 ways to build career edge over robots with human creativity by Dickson Tang: amzn.to/2McxeIS
Dickson’s Recommended Read:
The Creative Economy: How People Make Money From Ideas by John Howkins: amzn.to/2MdLotA
Dickson Tang is the author of "Leadership for future of work: 9 ways to build career edge over robots with human creativity". He helps senior leaders (CEOs, MDs, and HR) build creative and effective teams in preparation for the future/robot economy. Dickson is a leadership ideas expert, focusing on how leadership will evolve in the future of work. He has 15+ years of experience in management, business consulting, marketing, organizational strategies, and training & development, including corporate experience with several leading companies such as KPMG Advisory, Gartner, and Netscape.
Dickson's expertise on leadership, creativity, and the future of work has earned him invitations and opportunities to work with leaders and professionals from organizations such as Cartier, CITIC Telecom, DHL, Exterran, Hypertherm, JVC Kenwood, Mannheim Business School, Montblanc, and others.
About the podcast: #JobsOfFuture is created to spark the conversation around the future of work, worker, and workplace. The podcast invites movers and shakers in the industry who are shaping or helping us understand the transformation of work.
If you or anyone you know wants to join in, register your interest at analyticsweek.com/