Sep 19, 19: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Insights (Source)

[ AnalyticsWeek BYTES]

>> Future of Public Sector and Jobs in #BigData World #FutureOfData #Podcast by v1shal

>> Black Friday is Becoming Irrelevant, How Retailers Should Survive by v1shal

>> Pascal Marmier (@pmarmier) of @SwissRe discusses running a data-driven innovation catalyst by v1shal

Wanna write? Click Here

[ FEATURED COURSE]

Statistical Thinking and Data Analysis


This course is an introduction to statistical data analysis. Topics are chosen from applied probability, sampling, estimation, hypothesis testing, linear regression, analysis of variance, categorical data analysis, and n… more

[ FEATURED READ]

Rise of the Robots: Technology and the Threat of a Jobless Future


What are the jobs of the future? How many will there be? And who will have them? As technology continues to accelerate and machines begin taking care of themselves, fewer people will be necessary. Artificial intelligence… more

[ TIPS & TRICKS OF THE WEEK]

Analytics Strategy that is Startup Compliant
With the right tools, capturing data is easy, but not being able to handle that data can lead to chaos. One of the most reliable startup strategies for adopting data analytics is TUM, or The Ultimate Metric: the metric that matters most to your startup. Some advantages of TUM: it answers the most important business question, it clarifies your goals, it inspires innovation, and it helps you understand the entire quantified business.

[ DATA SCIENCE Q&A]

Q: Provide examples of machine-to-machine communications.
A: Telemedicine
– Heart patients wear specialized monitors that gather information about the state of the heart.
– The collected data is sent to an implanted electronic device that delivers corrective electric shocks when abnormal rhythms are detected.

Product restocking
– Vending machines can message the distributor whenever an item is running low on stock.

Source

[ VIDEO OF THE WEEK]

@AnalyticsWeek Keynote: The CMO isn't satisfied: Judah Phillips

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

The goal is to turn data into information, and information into insight. – Carly Fiorina

[ PODCAST OF THE WEEK]

Solving #FutureOfOrgs with #Detonate mindset (by @steven_goldbach & @geofftuff) #FutureOfData #Podcast

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

More than 5 billion people are calling, texting, tweeting and browsing on mobile phones worldwide.

Sourced from: Analytics.CLUB #WEB Newsletter

Mastering Deep Learning with Self-Service Data Science for Business Users

The deployment of deep learning is frequently accompanied by a singular paradox that has traditionally proved difficult to redress. Its evolving algorithms are intelligent enough to solve business problems, but putting those algorithms to use depends on data science particulars that business users don't necessarily understand.

The paucity of data scientists exacerbates this situation, which traditionally results in one of two outcomes: either deep learning is limited in the number of use cases for which it's deployed throughout the enterprise, or its effectiveness is compromised. Both outcomes fail to actualize the full potential of deep learning or data science.

According to Mitesh Shah, MapR Senior Technologist, Industry Solutions: “The promise of AI is about injecting intelligence into operations so you are actively making customer engagement more intelligent.” Doing so productively implicitly necessitates business user involvement with these technologies.

In response to this realization, a number of solutions have arisen to provision self-service data science, so that lay business users can create deep learning models, monitor and adjust them accordingly, and even explain their results while solving some of their more intractable domain problems.

Most convincingly, there are a plethora of use cases in which deep learning facilitates these boons for “folks who are not data scientists by education or training, but work with data throughout their day and want to extract more value from data,” noted indico CEO Tom Wilde.

Labeled Training Data
The training data required for building deep learning's predictive models poses two major difficulties for data science: models need labeled output data, and they need massive data quantities to reach useful levels of accuracy. Typically, the first of these issues was addressed when "the data scientists would say to the subject matter experts or the business line, give us example data labeled in a way you hope the outcome will be predicted," Wilde maintained. "And the SME [would] say I don't know what you mean; what are you even asking for?"

Labeled output data is necessary for models to use as targets or goals for their predictions. Today, self-service platforms for AI make this data science requirement easy by giving users intuitive means of labeling training data for this very purpose. With simple browser-based interfaces "you can use something you're familiar with, like Microsoft Word or Google Docs," Wilde said. "The training example pops up in your screen, you underline a few sentences, and you click on a tag that represents the classification you're trying to do with that clause."

For instance, when ensuring contracts are compliant with the General Data Protection Regulation, users can highlight clauses for personally identifiable data with examples that both adhere to, and fail to adhere to, this regulation. “You do about a few dozen of each of those, and once you’ve done it you’ve built your model,” Wilde mentioned. The efficiency of this process is indicative of the effect of directly involving business users with AI. According to Shah, such involvement makes “production more efficient to reduce costs. This requires not only AI but the surrounding data logistics and availability to enable this…in a time-frame that enables the business impact.”
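To make the labeling step concrete, here is a minimal, hypothetical sketch in R of what the labeled output data behind such a compliance model might look like; the column names, clauses, and labels are illustrative and not taken from any particular platform.

# Hypothetical labeled training data: each row is a contract clause a business
# user has tagged through the browser interface described above.
labeled_clauses <- data.frame(
  clause = c(
    "Customer email addresses are retained indefinitely.",
    "Personal data is deleted within 30 days of account closure.",
    "Names and addresses may be shared with partners without consent."
  ),
  gdpr_compliant = c(FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# A few dozen rows like these become the labeled output data that the platform
# combines with a pre-trained general model to build the custom classifier.
str(labeled_clauses)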

Feature Engineering and Transfer Learning
In the foregoing GDPR example, users labeled output training data to build what Wilde referred to as a "customized model" for their particular use case. They are only able to do so this quickly, however, by leveraging a general model and the power of transfer learning to focus the former's relevant attributes on the business user's task, which ultimately affects the model's feature detection and accuracy. As previously indicated, a common data science problem for advanced machine learning is the inordinate amount of training data required. Wilde commented that a large part of this data is required for "featurization: that's generally why with deep learning you need so much training data, because until you get to this critical mass of featurization, it doesn't perform very robustly."

However, users can build accurate custom models with only negligible amounts of training data because of transfer learning. Certain solutions facilitate this process with "a massive generalized model with half a billion labeled records in it, which in turn created hundreds and hundreds of millions of features and vectors that basically creates a vectorization of language," Wilde remarked. Even better, such generalized models are constructed "across hundreds of domains, hundreds of verticals, and hundreds of use cases," Wilde said, which is why they are readily applicable to the custom models of self-service business needs via transfer learning. This approach allows the business to quickly implement process automation for use cases with unstructured data, such as reviewing contracts, dealing with customer support tickets, or evaluating resumes.
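As a rough illustration of the transfer-learning idea described above, the R sketch below treats a pre-trained general model as a frozen featurizer and fits only a small custom model on top of it. Everything here is assumed for illustration: general_model_features() is a stand-in for a vendor's general language model, and the clauses and labels are synthetic.

# Stand-in for a pre-trained general model that maps raw text to a dense
# feature vector learned across many domains (here: deterministic noise).
general_model_features <- function(texts, dim = 20) {
  t(sapply(texts, function(x) {
    set.seed(sum(utf8ToInt(x)))
    rnorm(dim)
  }))
}

set.seed(1)
clauses <- sprintf("synthetic clause %d", 1:60)  # a few dozen labeled examples
labels  <- rbinom(60, 1, 0.5)                    # synthetic compliance labels

X <- general_model_features(clauses)               # "transferred" featurization
custom_head <- glm(labels ~ X, family = binomial)  # only this small part is fit

# The heavy lifting (featurization) came from the general model; the custom
# model needs only the small labeled set supplied by the business user.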

Explainability
Another common data science issue circumscribing deep learning deployments is explainability, which can even hinder the aforementioned process automation use cases. As Shah observed, "AI automates tasks that normally require human intelligence, but does not remove the need for humans entirely. Business users in particular are still an integral part of the AI revolution." This statement applies to explainability in particular, since it's critical for people to understand and explain the results of deep learning models in order to gauge their effectiveness. Explainability alludes to the fact that most machine learning models simply generate a numerical output, usually a score, indicating how likely specific input data is to achieve the model's desired output. With deep learning models in particular, those scores can be confounding because deep learning often does its own feature detection, making it difficult for users to understand how models arrive at their particular scores for specific data.

Self-service AI options, however, address this dilemma in two ways. Firstly, they incorporate interactive dashboards so users can monitor the performance of their models with numerical data. Additionally, by clicking on various metrics reflected on the dashboard “it opens up the examples used to make that prediction,” Wilde explained. “So, you actually can track back and see what precisely was used as the training data for that particular prediction. So now you’ve opened up the black box and get to see what’s inside the black box [and] what it’s relying on to make your prediction, not just the number.”
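A bare-bones way to picture this "open up the black box" behavior, with entirely made-up data and names, is to pull back the labeled training examples that sit closest to a new prediction in the model's feature space:

# Hypothetical example-based explainability: for one new, featurized clause,
# surface the most similar labeled training examples (cosine similarity).
set.seed(7)
train_features <- matrix(rnorm(100 * 20), nrow = 100)  # featurized training set
train_labels   <- sample(c("compliant", "non-compliant"), 100, replace = TRUE)
new_clause     <- rnorm(20)                            # featurized new input

cosine_sim <- as.vector(train_features %*% new_clause) /
  (sqrt(rowSums(train_features^2)) * sqrt(sum(new_clause^2)))

top_k <- order(cosine_sim, decreasing = TRUE)[1:3]
data.frame(example    = top_k,
           label      = train_labels[top_k],
           similarity = round(cosine_sim[top_k], 3))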

Business Accessible Data Science
Explainability, feature engineering, transfer learning, and labeled output data are crucial data science prerequisites for deploying deep learning. The fact that there are contemporary options for business users to facilitate all of these intricacies suggests how essential the acceptance, and possibly even mastery, of this technology is for the enterprise today. It’s no longer sufficient for a few scarce data scientists to leverage deep learning; its greater virtue is in its democratization for all users, both technical and business ones. This trend is reinforced by training designed to educate users—business and otherwise—about fundamental aspects of analytics. “The MapR Academy on-demand Essentials category offers use case-driven, short, non-lab courses that provide technical topic introductions as well as business context,” Shah added. “These courses are intended to provide insight for a wide variety of learners, and to function as stepping off points to further reading and exploration.”

Ideally, options for self-service data science targeting business users could actually bridge the divide between the technically proficient and those who are less so. “There are two types of people in the market right now,” Wilde said. “You have one persona that is very familiar with AI, deep learning and machine learning, and has a very technical understanding of how do we attack this problem. But then there’s another set of folks for whom their first thought is not how does AI work; their first thought is I have a business problem, how can I solve it?”

Increasingly, the answers to those inquiries will involve self-service data science.

Source by jelaniharper

Sep 12, 19: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Productivity (Source)

[ AnalyticsWeek BYTES]

>> Validating a Lostness Measure by analyticsweek

>> 6 Factors to Consider Before Building a Predictive Model for Life Insurance by analyticsweek

>> 7 Deadly Sins of Total Customer Experience  by v1shal

Wanna write? Click Here

[ FEATURED COURSE]

Intro to Machine Learning


Machine Learning is a first-class ticket to the most exciting careers in data analysis today. As data sources proliferate along with the computing power to process them, going straight to the data is one of the most stra… more

[ FEATURED READ]

Data Science from Scratch: First Principles with Python


Data science libraries, frameworks, modules, and toolkits are great for doing data science, but they’re also a good way to dive into the discipline without actually understanding data science. In this book, you’ll learn … more

[ TIPS & TRICKS OF THE WEEK]

Fix the Culture, spread awareness to get awareness
Adoption of analytics tools and capabilities has not yet caught up to industry standards. Talent has always been the bottleneck to achieving comparable enterprise adoption, and one of the primary reasons is a lack of understanding and knowledge among stakeholders. To facilitate wider adoption, data analytics leaders, users, and community members need to step up and create awareness within the organization. An aware organization goes a long way toward securing quick buy-ins and better funding, which ultimately leads to faster adoption. So be the voice that you want to hear from leadership.

[ DATA SCIENCE Q&A]

Q: Provide examples of machine-to-machine communications.
A: Telemedicine
– Heart patients wear specialized monitors that gather information about the state of the heart.
– The collected data is sent to an implanted electronic device that delivers corrective electric shocks when abnormal rhythms are detected.

Product restocking
– Vending machines can message the distributor whenever an item is running low on stock.

Source

[ VIDEO OF THE WEEK]

@Schmarzo @DellEMC on Ingredients of healthy #DataScience practice #FutureOfData #Podcast

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

It’s easy to lie with statistics. It’s hard to tell the truth without statistics. – Andrejs Dunkels

[ PODCAST OF THE WEEK]

Understanding Data Analytics in Information Security with @JayJarome, @BitSight

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

According to estimates, the volume of business data worldwide, across all companies, doubles every 1.2 years.

Sourced from: Analytics.CLUB #WEB Newsletter

How big data is driving smarter cyber security tools


As big data changes and develops, it’s being used to create better, smarter cyber security tools. There is real value to using the big data approach to cyber security – especially when it can be used to identify dangerous malware and more persistent threats to the IT security of big companies that handle a lot of data. The number of data breaches in the news seems to grow all the time, and big data may play a big role in preventing much of that.

Data Storage

One of the ways in which big data can help with cyber security is through the storage of data. Because so much data is collected and stored easily, analytic techniques can be used to find and destroy malware. Smaller segments of data can be analyzed, of course, and were analyzed before big data got started in the cyber security area, but the more data that can be looked at all together, the easier it is to ensure that appropriate steps are taken to neutralize any threats. More data gets screened, and it gets analyzed faster, making big data a surprisingly good choice in the cyber security arena.

Malware Behaviors

In the past, malware was usually identified by signatures. Now that big data is involved, that's not realistic: the signature-identification concept doesn't work on a larger scale, so new ways of handling cyber security were needed as soon as big data appeared on the scene. Instead of signatures, big data approaches look at behaviors. How malware or any other type of virus behaves is a very important consideration, and a natural thing to focus on when working out how to keep data safe.

When something is flagged as exhibiting unique or unusual behavior, the associated data can be isolated so it can be determined whether it is safe. Piggybacking malware onto programs and data that seem innocuous is common, because it lets attackers slip things through before the problem is noticed. When behavior is properly tracked, though, the rate at which these threats get through is greatly reduced. There are no guarantees, because malware is always changing and new variants are constantly being developed, but the protection offered by big data is significant.
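A crude sketch of what behavior-based flagging can look like is shown below; the metrics, thresholds, and baseline data are all invented for illustration and are not drawn from any specific security product.

# Hypothetical behavior-based flagging: score each observed process by how far
# its behavior deviates from a learned baseline, rather than matching signatures.
set.seed(99)
baseline <- data.frame(
  bytes_out_per_min     = rlnorm(1000, meanlog = 8),
  files_touched_per_min = rpois(1000, lambda = 12)
)

observed <- data.frame(
  bytes_out_per_min     = c(3500, 250000),   # second process sends far more data out
  files_touched_per_min = c(10, 400)         # and touches far more files
)

z <- scale(observed,
           center = colMeans(baseline),
           scale  = apply(baseline, 2, sd))

flagged <- rowSums(abs(z) > 4) > 0  # crude rule: any metric 4+ SDs from baseline
flagged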

Computing Power

The computing power that comes with big data platforms is possibly the most significant reason they are so valuable for detecting and stopping malware. Fast, powerful systems can process data and information far more quickly than slower ones, which opens the door to more sophisticated malware-detection techniques. The models that can be built for identifying malware are substantial, and big data platforms are the place to build them.

With the power available, it is becoming easier than ever before to find problems before they get started, so malware can be stopped before it advances through a computer system or set of data. That protects the information contained there, and also the system itself from attack and infection. Those who produce malware continually try to change the game so they won’t be detected, but as computer power advances the chances of malware avoiding detection continue to shrink.

To read the original article on IT Learning Center, click here.

Source: How big data is driving smarter cyber security tools

Sep 05, 19: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Fake data (Source)

[ AnalyticsWeek BYTES]

>> Big Data Insights in Healthcare, Part I. Great Ideas Transcend Time by froliol

>> Top 10 ways in which Google Analytics help online businesses by thomassujain

>> Office Depot Stitches Together the Customer Journey Across Multiple Touchpoints by analyticsweek

Wanna write? Click Here

[ FEATURED COURSE]

Tackle Real Data Challenges


Learn scalable data management, evaluate big data technologies, and design effective visualizations…. more

[ FEATURED READ]

The Future of the Professions: How Technology Will Transform the Work of Human Experts


This book predicts the decline of today’s professions and describes the people and systems that will replace them. In an Internet society, according to Richard Susskind and Daniel Susskind, we will neither need nor want … more

[ TIPS & TRICKS OF THE WEEK]

Data Analytics Success Starts with Empowerment
Being data driven is not so much a technology challenge as an adoption challenge. Adoption has its roots in the cultural DNA of any organization, and great data-driven organizations weave the data-driven culture into their corporate DNA. A culture of connection, interaction, sharing, and collaboration is what it takes to be data driven. It is about being empowered more than it is about being educated.

[ DATA SCIENCE Q&A]

Q: Explain what a local optimum is and why it is important in a specific context, such as K-means clustering. What are specific ways of determining if you have a local optimum problem? What can be done to avoid local optima?

A: * A local optimum is a solution that is optimal within a neighboring set of candidate solutions, in contrast with the global optimum, which is the optimal solution among all candidates.

* K-means clustering context: it can be shown that the objective cost function always decreases until a local optimum is reached, so the results depend on the initial random cluster assignment.

* Determining whether you have a local optimum problem: a tendency toward premature convergence, or different initializations inducing different optima.

* Avoiding local optima in a K-means context: repeat K-means with different random initializations and take the solution that has the lowest cost (see the sketch below).
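A minimal R sketch of that last point, using kmeans() with the built-in nstart argument to repeat the algorithm from different random starts and keep the lowest-cost solution (the data here are synthetic):

set.seed(42)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 3), ncol = 2))

fit_single <- kmeans(x, centers = 2, nstart = 1)   # one random initialization
fit_multi  <- kmeans(x, centers = 2, nstart = 25)  # best of 25 initializations

# tot.withinss is the objective cost; the multi-start fit can only be equal or lower.
fit_single$tot.withinss
fit_multi$tot.withinss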

Source

[ VIDEO OF THE WEEK]

Understanding How Fitness Tracker Works via @SciThinkers #STEM #STEAM

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

Numbers have an important story to tell. They rely on you to give them a voice. – Stephen Few

[ PODCAST OF THE WEEK]

@JustinBorgman on Running a data science startup, one decision at a time #Futureofdata #Podcast

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Market research firm IDC has released a new forecast that shows the big data market is expected to grow from $3.2 billion in 2010 to $16.9 billion in 2015.

Sourced from: Analytics.CLUB #WEB Newsletter

20 Best Practices for Customer Feedback Programs: Strategy and Governance

Below is the next installment of the 20 Best Practices for Customer Feedback Programs. Today’s post covers best practices in Strategy and Governance.

Strategy/Governance Best Practices

Strategy
Strategy reflects the overarching, long-term plan of a company that is designed to help the company attain a specific goal. For customer-centric companies, the strategy is directed at improving the customer experience.

A successful customer feedback program is dependent on the support of top management. Any company initiative (including a customer-centric initiative) without the full support of senior executives will likely fail.

The company culture is directly impacted by senior executives. Because loyalty leaders understand that the formal company strategy and accompanying mission statement set the general culture of the company, they embed the importance of the customer into their mission statements. These customer-centric mission statements instill a set of company values and implicit performance standards about addressing customers’ needs. The customer-centric standards shared among the employees act as guidelines with respect to the behaviors that are expected of the employees.

Governance

Figure 2. Customer Feedback Program Governance Components

While strategy is necessary to build a customer-centric culture, companies need to create formal policy around the customer feedback program that supports the strategy. The governance surrounding the customer feedback program helps foster and maintain a customer-centric culture by operationalizing the strategy (See Figure 2).

Three important areas of governance are:

1. Guidelines and Rules. These reflect the set of processes, customs, and policies affecting the way the program is directed, administered, or controlled. They formalize processes around the customer feedback program and need to be directed at all of the company's constituents, including board members, senior executives, middle managers, and front-line employees. In a customer-centric company, the work-related behaviors of each constituency are aimed at satisfying customers' needs. As such, customer-centric metrics are used to set and monitor company goals, manage employee behavior, and incentivize employees.
2. Roles and Responsibilities. Define and clearly communicate roles and responsibilities across the diverse constituencies (e.g., board, executives, managers, individual contributors), including how data are used and by whom. Specifically, program guidelines cover how feedback data are used in different business decision-making processes (resource allocation, employee incentive compensation, account management), each of which requires specific employee groups to have access to different types of analytic reports of the customer feedback data.
3. Change Request. Define how changes to the customer feedback program will occur.

The quality of the policies around the use of the customer feedback data will have an impact on the success of the program. Vague policies regarding how the customer feedback program is executed, including analytical methods and goals, dissemination of results, and data usage of the customer feedback data, will ultimately lead to less than optimal effectiveness of the program.

Corporate strategy and governance of the customer feedback program are exhibited in a variety of ways by loyalty leaders, from allocating resources to support customer initiatives to using public forums to communicate the company's vision and mission to its constituents. Executive support and use of customer feedback data, as well as company-wide communication of the customer feedback program's goals and results, help embed the customer-centric culture into the company milieu. Loyalty-leading companies' use of customer feedback in setting strategic goals helps keep the company customer-focused from the top. Additionally, their use of customer feedback in executive dashboards and for executive compensation ensures the executive team's decisions will be guided by customer-centric issues. A list of best practices in Strategy and Governance is located in Table 2.

Table 2. Best Practices in Strategy/Governance (each best practice, followed by the specifics)
1. Incorporate a customer focus in the vision/mission statement: Support the company mission by presenting customer-related information (e.g., customer satisfaction/loyalty goals) in the employee handbook. Use customer feedback metrics to set and monitor company goals.
2. Identify an executive as the champion of the customer feedback program: A senior-level executive "owns" the customer feedback program and reports customer feedback results at executive meetings. Senior executives evangelize the customer feedback program in their communication with employees and customers. Senior executives receive training on the customer feedback program.
3. Incorporate customer feedback as part of the decision-making process: Include customer metrics in the company's balanced scorecard along with other, traditional scorecard metrics. This practice ensures executives and employees understand the importance of these metrics and are aware of current levels of customer satisfaction/loyalty. Present customer feedback results in company meetings and official documents.
4. Use customer feedback metrics in incentive compensation for executives and front-line employees: Use key performance indicators and customer loyalty metrics to measure progress and set performance goals. Ensure these measures can be impacted by employee behavior. Where possible, use objective business metrics that are linked to customer satisfaction as key performance indicators on which to build employee incentive programs (see Applied Research).
5. Build accountability for customer satisfaction/loyalty goals into the company: Incorporate customer feedback metrics into key performance measures for all employees. Include customer-centric goals in the company's performance management system and processes. Employees set customer satisfaction goals as part of their performance objectives.
Copyright © 2011 Business Over Broadway

Take the Customer Feedback Programs Best Practices Survey

You can take the best practices survey to receive free feedback on your company’s customer feedback program. This self-assessment survey assesses the extent to which your company adopts best practices throughout their program. Go here to take the free survey: http://businessoverbroadway.com/resources/self-assessment-survey.

 

Source: 20 Best Practices for Customer Feedback Programs: Strategy and Governance

5 Questions to Ask When Building a Cloud Data Lake Strategy

In my last blog post, I shared some thoughts on the common pitfalls when building a data lake. As the movement to the cloud becomes more and more common, I'd like to further discuss some of the best practices for building a cloud data lake strategy. When going beyond the scope of integration tools or platforms for your cloud data lake, here are five questions to ask that can serve as a checklist:

1. Does your Cloud Data Lake strategy include a Cloud Data Warehouse?

As many differences as there are between the two, people oftentimes compare the two technology approaches: data warehouses centralize structured data, while data lakes are oftentimes positioned as the holy grail for all types of data. (You can read more about the two approaches here.)

Don't confuse the two, though; these approaches should actually be brought together. You will need a data lake to accommodate all the types of data your business deals with today, be it structured, semi-structured, or unstructured, on-premises or in the cloud, or newer types of data such as IoT data. The data lake often has a landing zone and a staging zone for raw data; data at this stage is not yet consumable, but you may want to keep it for future discovery or data science projects. A cloud data warehouse, on the other hand, comes into the picture after data is cleansed, mapped, and transformed, so that it is more consumable for business analysts to access and use for reporting or other analytical purposes. Data at this stage is often highly processed to fit the data warehouse.

If your approach currently works only with a cloud data warehouse, then you are already losing raw data and some formats, which is not helpful for prescriptive or advanced analytics projects, or for machine learning and AI initiatives, since some of the meaning within the data is already lost. Vice versa, if you don't have a data warehouse alongside your data lake strategy, you will end up with a data swamp where all data is kept with no structure and is not consumable by analysts.

From the integration perspective, make sure your integration tool works with both data lake and data warehouse technologies, which leads us to the next question.

2. Does your integration tool have ETL & ELT?

As much as you may know about ETL in your current on-premises data warehouse, moving it to the cloud is a different story, not to mention in a cloud data lake context. Where and how data is processed really depends on what you need for your business.

Similar to what we described in the first question, sometimes you need to preserve more of the raw nature of the data, and other times you need more processing. This requires your integration tool to offer both ETL and ELT capabilities, so that data transformation can happen either before the data is loaded to its final target, e.g. a cloud data warehouse, or after the data lands there. ELT is more often leveraged when the speed of data ingestion is key to your project, or when you want to retain more intelligence about your data. Typically, cloud data lakes have a raw data store, then a refined (or transformed) data store. Data scientists, for example, prefer to access the raw data, whereas business users want the normalized data for business intelligence.

Another use of ELT takes advantage of the massively parallel processing capabilities that come with big data technologies such as Spark and Flink. If your use case demands that kind of processing power, ELT is the better choice because the processing scales further.
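As a rough, assumption-laden sketch of the ETL/ELT distinction, the R snippet below uses an in-memory SQLite database (via the DBI and RSQLite packages) as a stand-in for a cloud data warehouse: the ETL path cleanses rows before loading, while the ELT path lands the raw data first and pushes the transformation into the target engine. Table names and cleansing rules are purely illustrative.

library(DBI)
library(RSQLite)

con <- dbConnect(RSQLite::SQLite(), ":memory:")   # stand-in for a cloud warehouse

raw <- data.frame(order_id = 1:4,
                  amount   = c("10.5", "20", "n/a", "7.25"),
                  stringsAsFactors = FALSE)

# ETL: transform before loading -- only cleansed, typed rows reach the target.
clean <- raw[!is.na(suppressWarnings(as.numeric(raw$amount))), ]
clean$amount <- as.numeric(clean$amount)
dbWriteTable(con, "orders_etl", clean)

# ELT: land the raw data first (keeping it available for later discovery),
# then transform inside the target engine itself.
dbWriteTable(con, "orders_raw", raw)
dbExecute(con, "CREATE TABLE orders_elt AS
                SELECT order_id, CAST(amount AS REAL) AS amount
                FROM orders_raw
                WHERE amount NOT IN ('n/a')")

dbDisconnect(con)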

3. Can your cloud data lake handle both simple ETL tasks and complex big data ones?

This may look like an obvious question, but when you ask it, put yourself in the users' shoes and really think through whether your choice of tool can meet both requirements.

Not all of your data lake usage will be complex work that requires advanced processing and transformation; much of it can be simple activities such as ingesting new data into the data lake. Oftentimes the tasks go beyond the data engineering or IT team as well. So ideally the tool of your choice should handle simple tasks quickly and easily, yet also scale to the complexity of advanced use cases. Building a data lake strategy that can cope with both helps make your data lake more consumable and practical for various types of users and purposes.

4. How about batch and streaming needs?

You may think your current architecture and technology stack is good enough, and your business is not really in the Netflix business where streaming is a necessity. Get it? Well think again.

Streaming data has become a part of our everyday lives whether you realize it or not. The "Me" culture has put everything at the moment of now. If your business is on social media, you are in streaming. If IoT and sensors are the next growth market for your business, you are in streaming. If you have a website for customer interaction, you are in streaming. In IDC's 2018 Data Integration and Integrity End User Survey, 93% of respondents indicated plans to use streaming technology by 2020. Real-time and streaming analytics have become a must for modern businesses to create that competitive edge. So this naturally raises the questions: can your data lake handle both your batch and streaming needs? Do you have the technology and people to work with streaming, which is fundamentally different from typical batch processing?

Streaming data is particularly challenging to handle because it is continuously generated by an array of sources and devices as well as being delivered in a wide variety of formats.

One prime example of just how complicated streaming data can be comes from the Internet of Things (IoT). With IoT devices, the data is always on; there is no start and no stop, it just keeps flowing. A typical batch processing approach doesn’t work with IoT data because of the continuous stream and the variety of data types it encompasses.

So make sure your data lake strategy and data integration layer can be agile enough to work with both use cases.

You can find more tips on streaming data, here.

5. Can your data lake strategy help cultivate a collaborative culture?

Last but not least, collaboration.

It may take one person to implement the technology, but it will take a whole village to implement it successfully. The only way to make sure your data lake is a success is to have people use it, improving the workflow one way or another.

In a smaller scope, the workflows in your data lake should be reusable and shareable among data engineers, so less rework is needed and operationalization can be much faster. In a bigger scope, the data lake approach can help improve collaboration between IT and business teams. For example, your business teams are the experts on their data; they know the meaning and the context of the data better than anyone else. Data quality can be much improved if the business team can work on the data for business-rule transformations while IT still governs that activity. Defining such a line, with governance in place, is delicate work and no easy task. But think through whether your data lake approach is governed yet open at the same time, encouraging not only the final consumption and usage of the data but also the improvement of data quality in the process, so that it can be recycled and made available to the broader organization.

To summarize, those are the five questions I would recommend asking when thinking about building a cloud data lake strategy. By no means are these the only questions you should consider, but hopefully they spark some thinking outside of your typical technical checklist.

The post 5 Questions to Ask When Building a Cloud Data Lake Strategy appeared first on Talend Real-Time Open Source Data Integration Software.

Source: 5 Questions to Ask When Building a Cloud Data Lake Strategy by analyticsweekpick

Aug 29, 19: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Data interpretation (Source)

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Landscape of Big Data by v1shal

>> The Future of Big Data? Three Use Cases of Prescriptive Analytics by analyticsweekpick

>> Data Monetization Workshop 2018: Key Themes & Takeaways by analyticsweek

Wanna write? Click Here

[ FEATURED COURSE]

Artificial Intelligence


This course includes interactive demonstrations which are intended to stimulate interest and to help students gain intuition about how artificial intelligence methods work under a variety of circumstances…. more

[ FEATURED READ]

Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking


Written by renowned data science experts Foster Provost and Tom Fawcett, Data Science for Business introduces the fundamental principles of data science, and walks you through the “data-analytic thinking” necessary for e… more

[ TIPS & TRICKS OF THE WEEK]

Finding success in your data science career? Find a mentor
Yes, most of us don't feel the need, but most of us really could use one. Because most data science professionals work in isolation, getting an unbiased perspective is not easy. Many times it is also hard to see how your data science career will progress. A network of mentors addresses these issues easily: it gives data professionals an outside perspective and an unbiased ally. It's extremely important for successful data science professionals to build a mentor network and use it throughout their careers.

[ DATA SCIENCE Q&A]

Q: What is cross-validation? How to do it right?
A: It's a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction and one wants to estimate how accurately a model will perform in practice. The goal of cross-validation is to define a data set to test the model during the training phase (i.e., a validation data set) in order to limit problems like overfitting and to get insight into how the model will generalize to an independent data set.

Examples: leave-one-out cross-validation, k-fold cross-validation.

How to do it right?

– The training and validation data sets have to be drawn from the same population. When predicting stock prices, for instance, a model trained on a certain 5-year period cannot realistically treat the subsequent 5 years as a draw from the same population.
– A common mistake: tuning steps, such as choosing the kernel parameters of an SVM, should be cross-validated as well, inside the folds, rather than performed once on the full data.

Bias-variance trade-off for k-fold cross-validation:

Leave-one-out cross-validation (LOOCV) gives approximately unbiased estimates of the test error, since each training set contains almost the entire data set (n - 1 observations).

But we average the outputs of n fitted models, each of which is trained on an almost identical set of observations, so the outputs are highly correlated. Since the variance of a mean of quantities increases when the correlation between those quantities increases, the test error estimate from LOOCV has higher variance than the one obtained with k-fold cross-validation.

Typically, we choose k = 5 or k = 10, as these values have been shown empirically to yield test error estimates that suffer neither from excessively high bias nor from high variance.
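A minimal R sketch of 5-fold cross-validation on synthetic data, keeping the model fitting inside the fold loop as the answer recommends (the data, model, and choice of mean squared error are illustrative):

set.seed(123)
n <- 200
x <- matrix(rnorm(n * 3), ncol = 3)
y <- x %*% c(1, -2, 0.5) + rnorm(n)
d <- data.frame(y = as.vector(y), x)

k <- 5
folds <- sample(rep(1:k, length.out = n))   # random fold assignment

cv_mse <- sapply(1:k, function(i) {
  train <- d[folds != i, ]
  test  <- d[folds == i, ]
  fit <- lm(y ~ ., data = train)            # fit only on the training folds
  mean((predict(fit, test) - test$y)^2)     # evaluate on the held-out fold
})

mean(cv_mse)   # cross-validated estimate of out-of-sample error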
Source

[ VIDEO OF THE WEEK]

@BrianHaugli @The_Hanover on Building a #Leadership #Security #Mindset #FutureOfData #Podcast

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

Data matures like wine, applications like fish. – James Governor

[ PODCAST OF THE WEEK]

#GlobalBusiness at the speed of The #BigAnalytics

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

140,000 to 190,000: the projected shortfall of people with deep analytical skills needed to fill the demand for Big Data jobs in the U.S. by 2018.

Sourced from: Analytics.CLUB #WEB Newsletter

How to pick the right sample for your analysis

Unless we are lucky enough to have access to an entire population and the capacity to analyse all of that data, we have to make do with samples from our population to draw statistical inferences. Choosing a sample that is a good representation of your population is at the heart of a quality analysis, as all the fancy statistical tricks in the world can't produce accurate conclusions from bad or biased data. In this blog post, I will discuss some key concepts in selecting a representative sample.

First things first: What is your population of interest?

Obviously it is a bit tricky to get a representative sample if you’re not sure what population you’re trying to represent, so the first step is to carefully consider this question. Imagine a situation where your company wanted you to assess the mean daily number of page views their website receives. Well, what mean daily page views do they want to know about?

Let’s consider some complications to that question. The first is seasonality. Your website might receive more hits at certain times of year. Do we want to include or exclude these periods from our population? Another consideration is the demographic profile of the people visiting your website. Are we interested in all visitors? Or do we want visitors of a certain sex, age group or region? A final consideration is whether there has been some sort of change in condition that may have increased the visitors to the site. For example, was there an advertising campaign launched recently? Has the website added additional languages which mean a broader audience can access it? Does the website sell a product that now ships to additional places?

Let’s imagine our website is a retail platform that sells children’s toys. We see some seasonal spikes in page views every year at Easter, Christmas and two major sale periods every year (Black Friday and post-Christmas). No major advertising campaigns are planned outside these seasonal periods, nor any changes planned to the site. Our company want to know what the “typical” number of mean daily page views is outside these seasonal periods. They don’t care about individual demographic groups of visitors, they just want the visitors as a whole. Therefore, we need to find a sample that reflects this.

Choosing a representative sample

Sample size

Sample size is a key element to representative sampling as it increases your chances of gaining sufficient information about the population, rather than having your statistics influenced by anomalous observations. For example, imagine if by chance we sampled a much higher than average value. Let’s see how this influences a sample of 30 page views compared to a sample of 10. We’ll generate samples in R from a Poisson distribution with a mean of 220 views per day, and add an outlier of 260 views per day to each (by the way, we use a Poisson distribution as it is the most appropriate distribution to model count data like this):


set.seed(567)

# Sample of 30 (29 from the Poisson distribution and an outlier of 260)
sample1 <- c(rpois(29, lambda = 220), 260)

# Sample of 10 (9 from the Poisson distribution and an outlier of 260)
sample2 <- c(rpois(9, lambda = 220), 260)

Compared to the population mean, the mean of the sample of 30 is 221.5 whereas the mean of the sample of 10 is 224.4. As you can see, the smaller sample is far more influenced by the extreme value than the larger one.

A sufficient sample size depends on a lot of things. For example, if the event we were trying to describe was rare (e.g., 1 event per 100 days), a sample of 30 would likely be too small to assess its mean occurrence. When conducting hypothesis testing, the correct sample size is generally calculated using power calculations, something I won’t get into here as it can get veeeeery complicated.

An additional consideration is that overestimating the required sample size can also be undesirable as there may be time, monetary or even ethical reasons to limit the number of observations collected. For example, the company that asked us to assess page views would likely be unhappy if we spent 100 days collecting information on mean daily page views when the same question could be reliably answered from 30 days of data collection.

Representativeness

However, a sufficient sample size won’t be enough if the data are not representative. Representativeness means that the data are sampled from all observations in the population and excludes anything that is outside the population. Representativeness is violated when the sample is biased to a subset of the population or when the sample includes observations from outside the population of interest.

Let’s simulate the number of page views our website received per day in 2014. As you can see in the R code below, I’ve included increased page views for our peak periods of Easter, Black Friday/Christmas, and the post-Christmas sales.


# Simulate some data for 2014, with mean page views of 220 per day.
days <- seq(as.Date("2014/1/1"), as.Date("2014/12/31"), "days")
page.views <- rpois(365, lambda = 220)
views = "2014/01/01" & views$days <= "2014/01/10"] = "2014/04/01" & views$days <= "2014/04/21"] = "2014/11/28" & views$days <= "2014/12/31"] <- rpois(34, lambda = 500)

[Figure: simulated daily page views for 2014, with the non-peak mean of 220 shown as a dotted line and spikes during the Easter, Black Friday/Christmas, and post-Christmas sale periods]

If you look at the graph above, you can see that for most of the year the page views sit fairly consistently around a mean of 220 (as shown in the dotted black line). If we sampled any time outside of the peak periods, we would be pretty safe in assuming we have a representative sample. However, what if we sampled between March 15 and April 14? We would catch some of the Easter peak period in our sample and our sample would no longer represent typical (non-peak) daily page views – instead, we would overestimate our page views by including observations from the peak period population in our sample.
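Continuing the simulated example (and assuming the illustrative peak values used above), a quick check in R shows how the choice of sampling window changes the estimate:

# An off-peak window gives a mean close to 220; a window overlapping the
# Easter peak overestimates typical daily page views.
off_peak <- views$page.views[views$days >= "2014/06/01" & views$days <= "2014/06/30"]
peak_mix <- views$page.views[views$days >= "2014/03/15" & views$days <= "2014/04/14"]

mean(off_peak)   # roughly the non-peak mean
mean(peak_mix)   # inflated by the Easter observations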

A final thing to consider: the method of measurement

While not part of representative sampling per se, an extremely important and related concept is how the thing you are measuring relates to your concept of interest. Why does our company want to know how many page views we get? Do they specifically want to know how many visitors they receive a day in order to plan things like server demand? Or do they want to extrapolate from number of visitors to comment on something like the popularity of the page? It is important to consider whether the measurement you take is a good reflection of what you are interested in before you make inferences based on your data. This falls under the branch of statistics known as validity, which is again beyond the scope of this post but an extremely interesting topic.

The take away message

I hope this has been a helpful introduction to picking a good sample, and a reminder that even when you have really big data, you can’t escape basic considerations such as what your population is, and whether the variables you have can really answer your question!

Original post here. The full code used to create the figures in this post is located in this gist on my Github page.

Source: How to pick the right sample for your analysis