Development of the Customer Sentiment Index: Reliability, Validity and Usefulness

This is Part 3 of a series on the Development of the Customer Sentiment Index (see introduction, Part 1 and Part 2). The CSI measures the degree to which customers hold a positive or negative attitude toward your company/brand. The CSI is based on a single survey question that asks customers for the one word that best describes the company/brand. This post explores what the CSI is measuring and begins to identify how the CSI can be used to gain insight to improve the customer experience.

In my previous posts about the Customer Sentiment Index (see Part 1 and Part 2), I created five sentiment lexicons and found generally moderate to high agreement among them. While each lexicon assigns somewhat different sentiment values to specific words, each tells you essentially the same thing: words are ordered consistently along the sentiment continuum, even though mean sentiment values differ across lexicons. The bottom line is that extracting sentiment from words produces systematic, reliable differences among words along a sentiment continuum.

In this post, I will clarify what a single word can tell you about the attitude of your customers, that is, what the Customer Sentiment Index (CSI) is measuring. The study samples (B2B and B2C) also let me examine the usefulness of the metric in a business setting. Toward this end, I will use the two studies (the same samples used in the prior post) to examine the correlation of the CSI with other commonly used customer metrics, including likelihood to recommend (e.g., NPS), overall satisfaction and CX ratings of important customer touch points (e.g., product quality, customer service). Identifying what is and is not correlated with the CSI will help clarify its meaning.

Samples and Measures

In the first study, a B2B technology company collected customer feedback as part of its annual customer survey. The survey included questions that asked customers to provide ratings (0 = low to 10 = high) on measures of customer loyalty (e.g., overall satisfaction, likelihood to recommend, likelihood to buy different products, likelihood to renew) and satisfaction with the customer experience (e.g., product quality, sales process, ease of doing business, technical support).

Additionally, the survey included the question used to calculate the CSI: “Using one word, please describe COMPANY’S products/services.” Five CSI scores were calculated using the different sentiment lexicons developed earlier. Of 1,619 completed surveys, 894 customers answered the question. Many respondents used multiple words or the company’s name as their response, reducing the number of usable responses to 689. These respondents used a total of 251 usable unique words.

In the second study, as part of a customer relationship survey, I solicited responses from customers of wireless service providers (B2C sample). The sample was obtained using Mechanical Turk by recruiting English-speaking participants to complete a short customer survey about their experience with their wireless service provider. The survey included the aforementioned questions measuring customer loyalty and satisfaction with CX touch points.

The following question was used to generate the CSI: “What one word best describes COMPANY? Please answer this question using one word.” Five CSI scores were calculated using the different sentiment lexicons developed earlier. Of 469 completed surveys, 429 customers answered the question. Many respondents used multiple words or the company’s name as their response, reducing the number of usable responses to 319. These respondents used a total of 85 usable unique words.
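
To make the scoring step concrete, here is a minimal Python sketch of how a one-word response could be scored against a sentiment lexicon and how multi-word or company-name responses could be screened out. The lexicons, sentiment values and responses below are made up for illustration; they are not the study's actual lexicons or data.

```python
# Hypothetical lexicons: word -> sentiment value on a 0 (negative) to 10 (positive) scale.
lexicons = {
    "expert":    {"reliable": 8.9, "expensive": 2.6, "buggy": 1.5},
    "opentable": {"reliable": 8.2, "expensive": 6.5, "buggy": 2.0},
}

def csi_score(response, lexicon, company_name="acme"):
    """Return the sentiment value for a usable one-word response, else None."""
    words = response.strip().lower().split()
    if len(words) != 1 or words[0] == company_name:
        return None                      # multi-word and company-name answers are dropped
    return lexicon.get(words[0])         # None if the word is not in the lexicon

responses = ["Reliable", "Acme", "expensive but good", "Buggy"]
for name, lexicon in lexicons.items():
    usable = [s for s in (csi_score(r, lexicon) for r in responses) if s is not None]
    print(name, usable)                  # e.g. expert [8.9, 1.5]
```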

Results for B2B Sample

Table 1. Descriptive Statistics of and Correlations among Study Variables in a B2B Sample

The descriptive statistics for and correlations among the study variables in the first sample are shown in Table 1. Looking at the top part of Table 1, the five different CSI scores (each based on a different sentiment lexicon) are strongly correlated with each other (average r = .74), with correlations ranging from a low of r = .60 (Expert with IMDB) and r = .62 (Expert with Goodreads) to a high of r = .93 (IMDB with Goodreads) and r = .83 (OpenTable with Amazon/TripAdvisor). A factor analysis of the five CSI scores shows a clear one-factor solution (one eigenvalue over 1.0), with the first factor accounting for almost 82% of the common variance. This pattern of results shows that the CSI scores measure the same underlying construct.

There were statistically significant differences among the five CSI score means. All paired t-tests for the 10 pairings of CSI scores were statistically significant (p < .01). The absolute sentiment values you get for the CSI depend on the sentiment lexicon you use: CSI scores are highest when the Amazon/TripAdvisor lexicon is used and lowest when the Goodreads lexicon is used.
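
A rough sketch of these analyses (inter-lexicon correlations, the one-factor eigenvalue check, and the paired t-tests) using pandas, NumPy and SciPy is shown below. The data frame is simulated, so the numbers will not match Table 1; it only illustrates the mechanics.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)
common = rng.normal(5, 2, 500)                       # shared sentiment signal across lexicons
csi = pd.DataFrame({
    name: common + rng.normal(0, 1, 500)
    for name in ["Expert", "OpenTable", "Amazon_TripAdvisor", "IMDB", "Goodreads"]
})

corr = csi.corr()                                    # inter-lexicon correlation matrix
eigenvalues = np.linalg.eigvalsh(corr)[::-1]         # eigenvalues, largest first
print(corr.round(2))
print("first factor share:", eigenvalues[0] / eigenvalues.sum())

for a, b in combinations(csi.columns, 2):            # 10 paired t-tests on mean differences
    t, p = stats.ttest_rel(csi[a], csi[b])
    print(f"{a} vs {b}: t = {t:.2f}, p = {p:.3f}")
```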

Looking at the bottom part of Table 1, we see the descriptive statistics for the other study variables (loyalty and CX measures) and their correlations with each of the CSI scores. CSI scores were positively related to each of the customer loyalty measures. Customers who have higher CSI scores also report higher levels of customer loyalty compared to customers with lower CSI scores. Additionally, CSI scores were more highly correlated (average r = .37) with advocacy loyalty (i.e., overall satisfaction, recommend, buy same) than with purchasing (i.e., buy additional, expand use) and retention (i.e., renew) loyalty (average r = .18).

CSI scores were positively related to each of the CX measures. Customers who have higher CSI scores also report higher levels of satisfaction with different CX touch points compared to customers with lower CSI scores. Additionally, CSI scores were more highly correlated (average r = .41) with ease of doing business, product quality and communication from the company than with the other CX touch points (average r = .34).

Of the five different CSI scores, CSI-Expert scores showed higher correlations (average r = .42) with each customer loyalty and CX measure compared to the other CSI scores. CSI-Goodreads scores (average r = .26) and CSI-IMDB scores (average r = .27) had the lowest correlations with those same loyalty and CX measures.

Results for B2C Sample

Table 2. Descriptive Statistics of and Correlations among Study Variables in a B2C Sample.

The descriptive statistics for and correlations among the study variables in the second sample are shown in Table 2. The results for the B2C sample are similar to those for the B2B sample. Looking at the top part of Table 2, the five different CSI scores (each based on a different sentiment lexicon) are strongly correlated with each other (average r = .73), with correlations ranging from a low of r = .60 (Expert with IMDB) and r = .62 (Amazon/TripAdvisor with Goodreads) to a high of r = .92 (IMDB with Goodreads) and r = .82 (OpenTable with Amazon/TripAdvisor). A factor analysis of the five CSI scores shows a clear one-factor solution (one eigenvalue over 1.0), with the first factor accounting for almost 70% of the common variance. This pattern of results shows that the CSI scores measure the same underlying construct.

There were statistically significant mean differences across each of the five CSI scores. Paired t-tests for each of the 10 paired combinations of the CSI scores were statistically significant (p < .01). Similar to the B2B sample results, CSI scores are the highest when the Amazon/TripAdvisor lexicon is used and the lowest when the Goodreads lexicon is used.

Table 2 (bottom part) contains the descriptive statistics for the other study variables (loyalty and CX measures) and their correlations with each of the CSI scores. Results show that CSI scores were positively related to each of the customer loyalty measures. Customers who have higher CSI scores also report higher levels of customer loyalty compared to customers with lower CSI scores. CSI scores were more highly correlated (average r = .46) with advocacy loyalty (i.e., overall satisfaction, recommend, buy same) than with purchasing (i.e., buy additional, buy more expensive) and retention (i.e., switch – reverse coded for average) loyalty (average r = .27).

CSI scores were positively related to each of the CX measures. Customers who have higher CSI scores also report higher levels of satisfaction with different CX touch points compared to customers with lower CSI scores. Additionally, there is not a lot of variability in correlations across the different CX touch points within CSI scores. However, there are big differences across CSI scores; CSI-Expert scores showed higher correlations (average r = .49) with each customer loyalty and CX measure compared to the other CSI scores. CSI-Goodreads scores (average r = .31) and CSI-IMDB scores (average r = .27) had the lowest correlations with those same loyalty and CX measures.

Settling on a Final Sentiment Lexicon for the CSI

Figure 1. Scatterplot of Words’ Sentiment Values based on Different Sentiment Lexicons

There was generally high agreement between the Expert-derived and the OpenTable sentiment lexicons in scaling words along the sentiment continuum. Using these two lexicons as a starting point for calculating the CSI makes sense, as each provides roughly the same information about words. However, the sentiment values of a few words were wildly different (i.e., they fell on opposite sides of the sentiment continuum). For example (see Figure 1, upper plot), the experts classified the word “Expensive” as holding a negative sentiment (CSI-E = 2.6), while the empirically derived OpenTable lexicon classified the same word as holding a positive sentiment (CSI-OT = 6.48). A handful of words, like Finicky, Light, Pricey, Costly and Flakey, showed a similar pattern: the experts thought the words were negative, but the OpenTable lexicon scored them as positive. When comparing these words using the Amazon/TripAdvisor and OpenTable sentiment lexicons (Figure 1, bottom plot, red points), we see that these words are generally positive in both lexicons.

Because the other lexicons agreed with the OpenTable lexicon for this handful of words, I created a single sentiment lexicon from the OpenTable and Expert lexicons, using the OpenTable sentiment value as the default value for each word. For words that do not appear in the OpenTable lexicon, the Expert sentiment value was used instead. This combined sentiment lexicon was used to recalculate sentiment for the respondents in both studies (CSI-BOB).
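
As a minimal sketch of this fallback logic (with hypothetical lexicon entries, not the actual CSI lexicons): the OpenTable value is used when the word appears in that lexicon, and the Expert value fills the gaps.

```python
# Hypothetical entries; sentiment values are on a 0-10 scale.
opentable = {"expensive": 6.48, "pricey": 6.10, "reliable": 8.20}
expert    = {"expensive": 2.60, "reliable": 8.90, "dependable": 8.70}

def combined_sentiment(word, default_lexicon=opentable, fallback_lexicon=expert):
    word = word.lower()
    if word in default_lexicon:
        return default_lexicon[word]      # OpenTable value is the default
    return fallback_lexicon.get(word)     # Expert value fills the gaps; None if in neither

for w in ["Expensive", "Dependable", "Quirky"]:
    print(w, combined_sentiment(w))       # 6.48, 8.7, None
```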

As I continue to develop the sentiment lexicon(s) for the CSI, I will incorporate survey ratings provided by customers into the lexicon. In fact, the 19 customers who used the word “Expensive” in the B2B sample gave the company an average overall satisfaction rating of 6.58 (positive), similar to the OpenTable value and quite different from the Expert value.

Using CSI to Identify At-Risk Customers

Figure 2. Relationship between the CSI-BOB and Likelihood to Recommend for both (B2B and B2C) study samples.

Customer surveys can be used to identify at-risk customers. “At-risk” customers are those who will exhibit disloyal behaviors (e.g., switching) or who will not exhibit loyalty behaviors (e.g., not recommending). To use the CSI to identify at-risk customers, we need to identify a useful cutoff point along the CSI continuum. This cutoff point defines the line below which customers are likely to be disloyal and above which customers are likely to be loyal.

To illustrate the relationship between customer sentiment and customer loyalty, I examined the relationship between CSI-BOB and the likelihood to recommend question (see Figure 2). The recommend question was grouped into segments made popular by the Net Promoter Score: Detractors, Passives and Promoters.

The relationship between the CSI and recommendation intentions was more linear for the B2B sample (top plot) compared to the B2C sample (bottom plot). For the B2B sample, the CSI cutoff point at which most respondents become Detractors is around 3.0. For the B2C sample, the comparable cutoff point for the CSI is around 6.0.
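
The sketch below shows how such sample-specific cutoffs might be applied in practice: NPS-style segmentation of the recommend rating plus a simple at-risk flag based on a CSI cutoff. The customer values are invented; only the 3.0 (B2B) and 6.0 (B2C) cutoffs come from the text above.

```python
def nps_segment(likelihood_to_recommend):
    """Standard NPS grouping of a 0-10 recommend rating."""
    if likelihood_to_recommend <= 6:
        return "Detractor"
    if likelihood_to_recommend <= 8:
        return "Passive"
    return "Promoter"

def at_risk(csi_score, cutoff):
    return csi_score < cutoff             # below the cutoff -> likely disloyal

B2B_CUTOFF, B2C_CUTOFF = 3.0, 6.0         # cutoffs reported above

customers = [(2.1, 4), (5.5, 7), (8.9, 10)]   # invented (CSI-BOB, likelihood to recommend) pairs
for csi_value, recommend in customers:
    print(csi_value, nps_segment(recommend), "at risk (B2B):", at_risk(csi_value, B2B_CUTOFF))
```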

Summary

This series of articles covered the development of the Customer Sentiment Index (CSI), a methodology to capture customers’ sentiment using a single word.

The CSI was positively correlated with all customer loyalty and CX metrics. Customers with higher customer sentiment reported higher levels of customer loyalty and were more satisfied with their customer experience. The CSI was more closely associated with advocacy loyalty (e.g., overall satisfaction, likelihood to recommend) than with other types of loyalty and CX metrics, suggesting that the CSI measures customers’ general attitude toward the company.

As part of a customer experience management program, businesses need to identify at-risk customers so they can address their concerns immediately. I showed that the CSI can be used to reliably identify at-risk customers (e.g., those not likely to recommend). The cutoff point that best separates loyal from disloyal customers varied across the B2B and B2C samples; future research needs to confirm or disprove this finding.

Asking customers to describe your company using one word appears to hold some value in helping businesses understand and manage the customer relationship. The Customer Sentiment Index (CSI) provides reliable, valid information about customers’ sentiment. The results showed that the CSI measures customers’ general attitude about your company/brand. The CSI is predictive of important organizational metrics like recommendation, purchasing and switching intentions.

I will continue exploring different uses of the CSI to show how companies can use this metric in their customer experience management programs.

Originally Posted at: Development of the Customer Sentiment Index: Reliability, Validity and Usefulness by bobehayes

2016 Trends in Big Data: Insights and Action Turn Big Data Small

Big data’s salience throughout the contemporary data sphere is all but solidified. Gartner indicates its technologies are embedded within numerous facets of data management, from conventional analytics to sophisticated data science issues.

Consequently, expectations for big data will shift this year. It is no longer sufficient to justify big data deployments by emphasizing the amount and variety of data these technologies ingest; organizations must instead demonstrate the specific business value big data creates through targeted applications and use cases that, ideally, produce quantifiable results.

The shift in big data expectations, then, will go from big to small. That transformation in the perception and deployment of big data will be spearheaded by numerous aspects of data management, from the evolving role of Chief Data Officers to developments in the Internet of Things. Still, the most notable trends affecting big data pertain to three areas:

• Ubiquitous Machine Learning: Machine learning will prove one of the most valuable technologies for reducing time to insight and action for big data. Its propensity for generating future algorithms based on the demonstrated use and practicality of current ones can improve analytics and the value it yields. It can also expedite numerous preparation processes related to data integration, cleansing, transformation and others, while smoothing data governance implementation.
• Cloud-Based IT Outsourcing: The cloud benefits of scale, cost, and storage will alter big data initiatives by transforming IT departments. The new paradigm for this organizational function will involve a hybridized architecture in which all but the most vital and longstanding systems are outsourced to complement existing infrastructure.
• Data Science for Hire: Whereas some of the more complicated aspects of data science (tailoring solutions to specific business processes) will remain tenuous, numerous aspects of this discipline have become automated and accelerated. The emergence of a market for algorithms, Machine Learning-as-a-Service, and self-service data discovery and management tools will spur this trend.

From Machine Learning to Artificial Intelligence
The connection between these three trends is perhaps best typified by the increasing prevalence of machine learning, which is an integral part of many of the analytics functions that IT departments are outsourcing and of the aspects of data science that have become automated. Expectations for machine learning will truly blossom this year, with Gartner offering numerous predictions for the end of the decade in which elements of artificial intelligence are normative parts of daily business activities. The projected expansion of the IoT, and the automated predictive analytics required for its continued growth, will increase reliance on machine learning, while its applications in various data preparation and governance tools are equally vital.

Nonetheless, the chief way in which machine learning will help shift the focus of big data from sprawling to narrow is that it either removes or accelerates human involvement in all of the aforementioned processes, and in many others as well. Forrester predicted that: “Machine learning will replace manual data wrangling and data governance dirty work…The freeing up of time will accelerate the execution of data and analytics strategies, allowing organizations to get to the good stuff, taking actions and driving better business outcomes based on the data.” Machine learning will enable organizations to spend less time managing their data and more time creating action from the insights it provides.

Accelerating data management processes also enables users to spend more time understanding their data. John Rueter, Vice President of Marketing at Cambridge Semantics, noted the importance of establishing the context and meaning of data: “Everyone is in such a race to collect as much data as they can and store it so they can get to it when they want to, when oftentimes they really aren’t thinking ahead of time about what they want to do with it, and how it is going to be used. The fact of the matter is what’s the point of collecting all this data if you don’t understand it?”

Cloud-Based IT
The trend of outsourcing IT to the cloud is evinced in a number of ways, from a distributed model of data management to one in which IT resources are more frequently accessed through the cloud. The variety of basic data management services that the enterprise can outsource via the cloud (including analytics, integration, computation, CRM, etc.) is revamping typical architectural concerns, which increasingly involve the cloud. These facts are substantiated by IDC’s predictions that, “By 2018, at least 50% of IT spending will be cloud based. By 2018, 65% of all enterprise IT assets will be housed offsite and 33% of IT staff will be employed by third-party, managed service providers.”

The impact of this trend goes beyond merely extending the cloud’s benefits of decreased infrastructure, lower costs, and greater agility. It means that a number of pivotal facets of data management will require less daily manipulation on the part of the enterprise, and that end users can implement the results of those data-driven processes more quickly and for more specific use cases. Additionally, this trend heralds a fragmentation of the CDO role. The inherent decentralization involved in outsourcing IT functions through the cloud will be reflected in an evolution of this position. The aforementioned Forrester post notes that “We will likely see fewer CDOs going forward but more chief analytics officers, or chief data scientists. The role will evolve, not disappear.”

Self-Service Data Science
Data science is another realm in which the other two 2016 trends in big data coalesce. The predominance of machine learning helps improve the analytical insight gleaned from data science, just as a number of key attributes of this discipline are being outsourced and accessed through the cloud. Those include numerous facets of the analytics process, such as data discovery, source aggregation, multiple types of analytics and, in some instances, even analysis of the results themselves. As Forrester indicated, “Data science and real-time analytics will collapse the insights time-to-market.” The pairing of data science with real-time data capture and analytics will continue to close the gaps between data, insight and action. For 2016, Forrester predicts: “A third of firms will pursue data science through outsourcing and technology. Firms will turn to insights services, algorithms markets, and self-service advanced analytics tools, and cognitive computing capabilities, to help fill data science gaps.”

Self-service data science options for analytics take myriad forms, from providers that provision graph analytics to Machine Learning-as-a-Service and various forms of cognitive computing. The burgeoning algorithms market is a vital aspect of this automation of data science, and it enables companies to leverage existing algorithms with their own data. Some algorithms are stratified by use case, business unit or vertical industry. Similarly, Machine Learning-as-a-Service options provide excellent starting points for organizations to simply add their data and reap predictive analytics capabilities.

Targeting Use Cases to Shrink Big Data
The principal point of commonality between all of these trends is the furthering of the self-service movement and the ability it gives end users to home in on the uses of data, as opposed to merely focusing on the data itself and its management. The ramifications are that organizations and individual users will be able to tailor and target their big data deployments for individualized use cases, creating more value at the departmental and intradepartmental levels…and for the enterprise as a whole. The facilitation of small applications and uses of big data will justify this technology’s dominance of the data landscape.

Originally Posted at: 2016 Trends in Big Data: Insights and Action Turn Big Data Small by jelaniharper

Jan 25, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Data security

[ NEWS BYTES]

>> Mueller Investigation: Did Trump, Kushner and RNC Help Russia Use Big Data to Target US Voters? – Newsweek (Under: Big Data)

>> Personal loans, Beneficial loans, no credit check – Blue Ribbon News (Under: Risk Analytics)

>> Resilinc Awarded Patent for supply Chain Risk Analytics and … – Supply Chain Management Review (Under: Risk Analytics)


[ FEATURED COURSE]

Pattern Discovery in Data Mining


Learn the general concepts of data mining along with basic methodologies and applications. Then dive into one subfield in data mining: pattern discovery. Learn in-depth concepts, methods, and applications of pattern disc… more

[ FEATURED READ]

Superintelligence: Paths, Dangers, Strategies


The human brain has some capabilities that the brains of other animals lack. It is to these distinctive capabilities that our species owes its dominant position. Other animals have stronger muscles or sharper claws, but … more

[ TIPS & TRICKS OF THE WEEK]

Fix the Culture, spread awareness to get awareness
Adoption of analytics tools and capabilities has not yet caught up to industry standards. Talent has always been the bottleneck to achieving comparable enterprise adoption. One of the primary reasons is a lack of understanding and knowledge among stakeholders. To facilitate wider adoption, data analytics leaders, users, and community members need to step up to create awareness within the organization. An aware organization goes a long way in helping secure quick buy-ins and better funding, which ultimately leads to faster adoption. So be the voice that you want to hear from leadership.

[ DATA SCIENCE Q&A]

Q: Given two fair dice, what is the probability of getting scores that sum to 4? To 8?
A: * Total: 36 combinations
* Of these, 3 involve a score of 4: (1,3), (3,1), (2,2)
* So: 3/36=1/12
* Considering a score of 8: (2,6), (3,5), (4,4), (6,2), (5,3)
* So: 5/36
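
A quick brute-force check of these counts (a small Python sketch, not part of the original answer):

```python
from itertools import product

rolls = list(product(range(1, 7), repeat=2))        # all 36 equally likely outcomes
p4 = sum(a + b == 4 for a, b in rolls) / len(rolls)
p8 = sum(a + b == 8 for a, b in rolls) / len(rolls)
print(p4, p8)                                       # 3/36 = 0.0833..., 5/36 = 0.1388...
```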

Source

[ VIDEO OF THE WEEK]

@AnalyticsWeek Panel Discussion: Finance and Insurance Analytics


[ QUOTE OF THE WEEK]

Everybody gets so much information all day long that they lose their common sense. – Gertrude Stein

[ PODCAST OF THE WEEK]

#FutureOfData Podcast: Peter Morgan, CEO, Deep Learning Partnership


[ FACT OF THE WEEK]

Every second we create new data. For example, we perform 40,000 search queries every second (on Google alone), which makes roughly 3.5 billion searches per day and 1.2 trillion searches per year. In Aug 2015, over 1 billion people used Facebook in a single day.

Sourced from: Analytics.CLUB #WEB Newsletter

What “Gangnam Style” could teach about branding: 5 Lessons

What "Gangnam Style" could teach about branding: 5 LessonsNot sure, if anyone who exists online is not aware of “Gangnam Style” video that was posted a couple of months back. Sure, as a treat, I have embedded the video after this read. The video grew viral and had everyone talking. I should confess, I myself had watched the video for about a dozen times. I am still not done embracing the brilliance of it. I tried digging deeper into the video and found that there are some good lessons on product branding that could be utilized by all organizations.

5 Lessons:

Don’t be afraid to take risks: How many times do you see a guy in classy attire dancing in a cheesy way? Not many. It is not difficult to see the video as different and edgy. Consider that 48 hours’ worth of video is uploaded every minute; it is the unexpected and different that stands out. Keeping things routine and traditional might not cut it, so it’s important to be different and take risks. Different and unexpected brands tend to be retained more. Remember the suite of “Old Spice” ads.

Keep it simple: Another observation that stuck out was the simplicity of the video. It was not made more complicated than required to get the concept across. It is important not to bombard the audience with so much detail and complication that the fun is taken away from the central message. Product branding that stays simple is faster to understand and easier to retain.

Introduce the element of participation: One thing worth appreciating about this video, and many other viral videos, is the opening it provides for others to follow, iterate and participate. This not only lets others chime in and start participating in your work, but it gives you great kickbacks as well. As with Gangnam Style and other viral videos, several clones emerged that further added to the virality of the original. So it’s always smart to build an excuse into the system for others to participate.

Explain the execution well: Another thing worth noting from this video is the execution. The Gangnam Style video is built around a dance move that is easy to copy, and the video demonstrates a couple of examples showing the move in action. This is another way to minimize confusion. Doing this not only explains the execution well, but helps others imagine the possibilities. Its effect can be seen in the clone videos, where people copied the moves and backgrounds and added their own imagination, keeping the message the same but the outcome different. This strengthens the brand, because a few routine use cases are shown upfront.

Maintain the focus: This is another point that stood out for me in the video. Throughout its run, the focus was intact: a cheesy dance in a classy dress with good background music to back it up. It communicated the message loudly and clearly. After watching the video, think about what has stayed with you. If one or two things jump out, the branding is clear and focused; if too many things clutter the mind, it was not done properly. So, it is important to keep the focus of the brand on one or two key areas only. This results in stronger positioning and clarity for the brand.

Now the fun part, the video:

As an extra treat, for something more on branding, watch the TED talk by Morgan Spurlock.

Originally Posted at: What “Gangnam Style” could teach about branding: 5 Lessons

The 37 best tools for data visualization

Creating charts and infographics can be time-consuming. But these tools make it easier.

It’s often said that data is the new world currency, and the web is the exchange bureau through which it’s traded. As consumers, we’re positively swimming in data; it’s everywhere from labels on food packaging design to World Health Organisation reports. As a result, for the designer it’s becoming increasingly difficult to present data in a way that stands out from the mass of competing data streams.

One of the best ways to get your message across is to use a visualization to quickly draw attention to the key messages, and by presenting data visually it’s also possible to uncover surprising patterns and observations that wouldn’t be apparent from looking at stats alone.


Not a web designer or developer? You may prefer free tools for creating infographics.

As author, data journalist and information designer David McCandless said in his TED talk: “By visualizing information, we turn it into a landscape that you can explore with your eyes, a sort of information map. And when you’re lost in information, an information map is kind of useful.”

There are many different ways of telling a story, but everything starts with an idea. So to help you get started we’ve rounded up some of the most awesome data visualization tools available on the web.

01. Dygraphs

Help visitors explore dense data sets with JavaScript library Dygraphs

Dygraphs is a fast, flexible open source JavaScript charting library that allows users to explore and interpret dense data sets. It’s highly customizable, works in all major browsers, and you can even pinch to zoom on mobile and tablet devices.

02. ZingChart

ZingChart lets you create HTML5 Canvas charts and more

ZingChart is a JavaScript charting library and feature-rich API set that lets you build interactive Flash or HTML5 charts. It offers over 100 chart types to fit your data.

03. InstantAtlas

InstantAtlas enables you to create highly engaging visualisations around map data

If you’re looking for a data viz tool with mapping, InstantAtlas is worth checking out. This tool enables you to create highly-interactive dynamic and profile reports that combine statistics and map data to create engaging data visualizations.

04. Timeline

Timeline creates beautiful interactive visualizations

Timeline is a fantastic widget which renders a beautiful interactive timeline that responds to the user’s mouse, making it easy to create advanced timelines that convey a lot of information in a compressed space.

Each element can be clicked to reveal more in-depth information, making this a great way to give a big-picture view while still providing full detail.

05. Exhibit

Exhibit makes data visualization a doddle

Developed by MIT, and fully open-source, Exhibit makes it easy to create interactive maps, and other data-based visualizations that are orientated towards teaching or static/historical based data sets, such as flags pinned to countries, or birth-places of famous people.

06. Modest Maps

Integrate and develop interactive maps within your site with this cool tool

Modest Maps is a lightweight, simple mapping tool for web designers that makes it easy to integrate and develop interactive maps within your site, using them as a data visualization tool.

The API is easy to get to grips with, and offers a useful number of hooks for adding your own interaction code, making it a good choice for designers looking to fully customise their user’s experience to match their website or web app. The basic library can also be extended with additional plugins, adding to its core functionality and offering some very useful data integration options.

07. Leaflet

Use OpenStreetMap data and integrate data visualisation in an HTML5/CSS3 wrapper

Another mapping tool, Leaflet makes it easy to use OpenStreetMap data and integrate fully interactive data visualisation in an HTML5/CSS3 wrapper.

The core library itself is very small, but there are a wide range of plugins available that extend the functionality with specialist functionality such as animated markers, masks and heatmaps. Perfect for any project where you need to show data overlaid on a geographical projection (including unusual projections!).

08. WolframAlpha

Wolfram Alpha is excellent at creating charts

Billed as a “computational knowledge engine”, the Google rival WolframAlpha is really good at intelligently displaying charts in response to data queries without the need for any configuration. If you’re using publicly available data, it offers a simple widget builder to make it really easy to get visualizations onto your site.

09. Visual.ly

Visual.ly makes data visualization as simple as it can be

Visual.ly is a combined gallery and infographic generation tool. It offers a simple toolset for building stunning data representations, as well as a platform to share your creations. This goes beyond pure data visualisation, but if you want to create something that stands on its own, it’s a fantastic resource and an info-junkie’s dream come true!

10. Visualize Free

Make visualizations for free!

Visualize Free is a hosted tool that allows you to use publicly available datasets, or upload your own, and build interactive visualizations to illustrate the data. The visualizations go well beyond simple charts, and the service is completely free; while development work requires Flash, output can be done through HTML5.

11. Better World Flux

Making the ugly beautiful – that’s Better World Flux

Orientated towards making positive change to the world, Better World Flux has some lovely visualizations of some pretty depressing data. It would be very useful, for example, if you were writing an article about world poverty, child undernourishment or access to clean water. This tool doesn’t allow you to upload your own data, but does offer a rich interactive output.

12. FusionCharts

FusionCharts Suite XT
A comprehensive JavaScript/HTML5 charting solution for your data visualization needs

FusionCharts Suite XT brings you 90+ charts and gauges, 965 data-driven maps, and ready-made business dashboards and demos. FusionCharts comes with an extensive JavaScript API that makes it easy to integrate with any AJAX application or JavaScript framework. These charts, maps and dashboards are highly interactive, customizable and work across all devices and platforms. They also have a comparison of the top JavaScript charting libraries, which is worth checking out.

13. jqPlot

jqPlot is a nice solution for line and point charts

Another jQuery plugin, jqPlot is a nice solution for line and point charts. It comes with a few nice additional features such as the ability to generate trend lines automatically, and interactive points that can be adjusted by the website visitor, updating the dataset accordingly.

14. Dipity

Dipity has free and premium versions to suit your needs

Dipity allows you to create rich interactive timelines and embed them on your website. It offers a free version and a premium product, with the usual restrictions and limitations present. The timelines it outputs are beautiful and fully customisable, and are very easy to embed directly into your page.

15. Many Eyes

Many Eyes was developed by IBM

Developed by IBM, Many Eyes allows you to quickly build visualizations from publicly available or uploaded data sets, and features a wide range of analysis types, including the ability to scan text for keyword density and saturation. This is another great example of a big company supporting research and sharing the results openly.

16. D3.js

You can render some amazing diagrams with D3

D3.js is a JavaScript library that uses HTML, SVG, and CSS to render some amazing diagrams and charts from a variety of data sources. This library, more than most, is capable of some seriously advanced visualizations with complex data sets. It’s open source, and uses web standards so is very accessible. It also includes some fantastic user interaction support.

17. JavaScript InfoVis Toolkit

JavaScript InfoVis Toolkit includes a handy modular structure

A fantastic library written by Nicolas Belmonte, the JavaScript InfoVis Toolkit includes a modular structure, allowing you to only force visitors to download what’s absolutely necessary to display your chosen data visualizations. This library has a number of unique styles and swish animation effects, and is free to use (although donations are encouraged).

18. jpGraph

jpGraph is a PHP-based data visualization tool

If you need to generate charts and graphs server-side, jpGraph offers a PHP-based solution with a wide range of chart types. It’s free for non-commercial use, and features extensive documentation. By rendering on the server, this is guaranteed to provide a consistent visual output, albeit at the expense of interactivity and accessibility.

19. Highcharts

Highcharts has a huge range of options available

Highcharts is a JavaScript charting library with a huge range of chart options available. The output is rendered using SVG in modern browsers and VML in Internet Explorer. The charts are beautifully animated into view automatically, and the framework also supports live data streams. It’s free to download and use non-commercially (and licensable for commercial use). You can also play with the extensive demos using JSFiddle.

20. Google Charts

Google Charts has an excellent selection of tools available

The seminal charting solution for much of the web, Google Charts is highly flexible and has an excellent set of developer tools behind it. It’s an especially useful tool for specialist visualizations such as geocharts and gauges, and it also includes built-in animation and user interaction controls.

21. Excel

It isn’t graphically flexible, but Excel is a good way to explore data: for example, by creating ‘heat maps’ like this one

You can actually do some pretty complex things with Excel, from ‘heat maps’ of cells to scatter plots. As an entry-level tool, it can be a good way of quickly exploring data, or creating visualizations for internal use, but the limited default set of colours, lines and styles make it difficult to create graphics that would be usable in a professional publication or website. Nevertheless, as a means of rapidly communicating ideas, Excel should be part of your toolbox.

Excel comes as part of the commercial Microsoft Office suite, so if you don’t have access to it, Google’s spreadsheets – part of Google Docs and Google Drive – can do many of the same things. Google ‘eats its own dog food’, so the spreadsheet can generate the same charts as the Google Chart API. This will get you familiar with what is possible before stepping off and using the API directly for your own projects.

22. CSV/JSON

CSV (Comma-Separated Values) and JSON (JavaScript Object Notation) aren’t actual visualization tools, but they are common formats for data. You’ll need to understand their structures and how to get data in or out of them.
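
As a minimal illustration (file names are placeholders), the Python standard library is enough to get data in and out of both formats:

```python
import csv
import json

with open("data.csv", newline="") as f:
    rows = list(csv.DictReader(f))        # each row becomes a dict keyed by the header row

with open("data.json") as f:
    records = json.load(f)                # parsed into native lists/dicts

print(len(rows), type(records))
```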

23. Crossfilter

Crossfilter in action: by restricting the input range on any one chart, data is affected everywhere. This is a great tool for dashboards or other interactive tools with large volumes of data behind them

As we build more complex tools to enable clients to wade through their data, we are starting to create graphs and charts that double as interactive GUI widgets. JavaScript library Crossfilter can be both of these. It displays data, but at the same time, you can restrict the range of that data and see other linked charts react.

24. Tangle

Tangle creates complex interactive graphics. Pulling on any one of the knobs affects data throughout all of the linked charts. This creates a real-time feedback loop, enabling you to understand complex equations in a more intuitive way

The line between content and control blurs even further with Tangle. When you are trying to describe a complex interaction or equation, letting the reader tweak the input values and see the outcome for themselves provides both a sense of control and a powerful way to explore data. JavaScript library Tangle is a set of tools to do just this.

Dragging on variables enables you to increase or decrease their values and see an accompanying chart update automatically. The results are only just short of magical.

25. Polymaps

Aimed more at specialist data visualisers, the Polymaps library creates image and vector-tiled maps using SVG

Polymaps is a mapping library that is aimed squarely at a data visualization audience. Offering a unique approach to styling the maps it creates, analogous to CSS selectors, it’s a great resource to know about.

26. OpenLayers

It isn’t easy to master, but OpenLayers is arguably the most complete, robust mapping solution discussed here

OpenLayers is probably the most robust of these mapping libraries. The documentation isn’t great and the learning curve is steep, but for certain tasks nothing else can compete. When you need a very specific tool no other library provides, OpenLayers is always there.

27. Kartograph

Kartograph’s projections breathe new life into our standard slippy maps

Kartograph’s tag line is ‘rethink mapping’ and that is exactly what its developers are doing. We’re all used to the Mercator projection, but Kartograph brings far more choices to the table. If you aren’t working with worldwide data, and can place your map in a defined box, Kartograph has the options you need to stand out from the crowd.

28. CartoDB

CartoDB provides an unparalleled way to combine maps and tabular data to create visualisations

CartoDB is a must-know site. The ease with which you can combine tabular data with maps is second to none. For example, you can feed in a CSV file of address strings and it will convert them to latitudes and longitudes and plot them on a map, but there are many other uses. It’s free for up to five tables; after that, there are monthly pricing plans.

29. Processing

Processing provides a cross-platform environment for creating images, animations, and interactions

Processing has become the poster child for interactive visualizations. It enables you to write much simpler code which is in turn compiled into Java.

There is also a Processing.js project to make it easier for websites to use Processing without Java applets, plus a port to Objective-C so you can use it on iOS. It is a desktop application, but can be run on all platforms, and given that it is now several years old, there are plenty of examples and code from the community.

30. NodeBox

NodeBox is a quick, easy way for Python-savvy developers to create 2D visualisations

NodeBox is an OS X application for creating 2D graphics and visualizations. You need to know and understand Python code, but beyond that it’s a quick and easy way to tweak variables and see results instantly. It’s similar to Processing, but without all the interactivity.

31. R

A powerful free software environment for statistical computing and graphics, R is the most complex of the tools listed here

How many other pieces of software have an entire search engine dedicated to them? A statistical package used to parse large data sets, R is a very complex tool, and one that takes a while to understand, but it has a strong community and package library, with more and more being produced.

The learning curve is one of the steepest of any of these tools listed here, but you must be comfortable using it if you want to get to this level.

32. Weka

A collection of machine-learning algorithms for data-mining tasks, Weka is a powerful way to explore data

When you get deeper into being a data scientist, you will need to expand your capabilities from just creating visualizations to data mining. Weka is a good tool for classifying and clustering data based on various attributes – both powerful ways to explore data – but it also has the ability to generate simple plots.

33. Gephi

Gephi in action. Coloured regions represent clusters of data that the system is guessing are similar

When people talk about relatedness, social graphs and co-relations, they are really talking about how two nodes are related to one another relative to the other nodes in a network. The nodes in question could be people in a company, words in a document or passes in a football game, but the maths is the same.

Gephi, a graph-based visualiser and data explorer, can not only crunch large data sets and produce beautiful visualizations, but also allows you to clean and sort the data. It’s a very niche use case and a complex piece of software, but it puts you ahead of anyone else in the field who doesn’t know about this gem.

34. iCharts

iCharts can have interactive elements, and you can pull in data from Google Docs

The iCharts service provides a hosted solution for creating and presenting compelling charts for inclusion on your website. There are many different chart types available, and each is fully customisable to suit the subject matter and colour scheme of your site.

Charts can have interactive elements, and can pull data from Google Docs, Excel spreadsheets and other sources. The free account lets you create basic charts, while you can pay to upgrade for additional features and branding-free options.

35. Flot

Create animated visualisations with this jQuery plugin

Flot is a specialised plotting library for jQuery, but it has many handy features and crucially works across all common browsers including Internet Explorer 6. Data can be animated and, because it’s a jQuery plugin, you can fully control all the aspects of animation, presentation and user interaction. This does mean that you need to be familiar with (and comfortable with) jQuery, but if that’s the case, this makes a great option for including interactive charts on your website.

36. Raphaël

This handy JavaScript library offers a range of data visualisation options

This handy JavaScript library offers a wide range of data visualization options which are rendered using SVG. This makes for a flexible approach that can easily be integrated within your own web site/app code, and is limited only by your own imagination.

That said, it’s a bit more hands-on than some of the other tools featured here (a victim of being so flexible), so unless you’re a hardcore coder, you might want to check out some of the more point-and-click orientated options first!

37. jQuery Visualize

jQuery Visualize Plugin is an open source charting plugin

Written by the team behind jQuery’s ThemeRoller and jQuery UI websites, jQuery Visualize Plugin is an open source charting plugin for jQuery that uses HTML Canvas to draw a number of different chart types. One of the key features of this plugin is its focus on achieving ARIA support, making it friendly to screen-readers. It’s free to download from this page on GitHub.

Further reading

  • A great Tumblr blog for visualization examples and inspiration: vizualize.tumblr.com
  • Nicholas Felton’s annual reports are now infamous, but he also has a Tumblr blog of great things he finds.
  • From the guy who helped bring Processing into the world: benfry.com/writing
  • Stamen Design is always creating interesting projects: stamen.com
  • Eyeo Festival brings some of the greatest minds in data visualization together in one place, and you can watch the videos online.

Brian Suda is a master informatician and author of Designing with Data, a practical guide to data visualisation.

Originally posted via “The 37 best tools for data visualization”

 

Originally Posted at: The 37 best tools for data visualization

Jan 18, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Data security

[ AnalyticsWeek BYTES]

>> Fortune 100 CEOs And Their Path To Success by v1shal

>> Using Analytics to build A Big Data Workforce by v1shal

>> #FutureOfData Podcast: Peter Morgan, CEO, Deep Learning Partnership – Playcast – Data Analytics Leadership Playbook Podcast by v1shal


[ NEWS BYTES]

>> LRH recognized for improved performance – Dickinson County News (Under: Health Analytics)

>> STORM OVER STATISTICS: Manufacturing retained positive growth — Kale, Statistician General of the Federation – Vanguard (Under: Statistics)

>> Rival IQ Provides Social Media Analytics at No Cost to HubSpot Customers with New Integration Partnership – MarTech Series (Under: Social Analytics)


[ FEATURED COURSE]

Probability & Statistics


This course introduces students to the basic concepts and logic of statistical reasoning and gives the students introductory-level practical ability to choose, generate, and properly interpret appropriate descriptive and… more

[ FEATURED READ]

Thinking, Fast and Slow


Drawing on decades of research in psychology that resulted in a Nobel Prize in Economic Sciences, Daniel Kahneman takes readers on an exploration of what influences thought example by example, sometimes with unlikely wor… more

[ TIPS & TRICKS OF THE WEEK]

Grow at the speed of collaboration
Research by Cornerstone OnDemand pointed out the need for better collaboration within the workforce, and the data analytics domain is no different. A rapidly changing and growing industry like data analytics is very difficult for an isolated workforce to keep up with. A good collaborative work environment facilitates a better flow of ideas, improved team dynamics, rapid learning, and an increased ability to cut through the noise. So, embrace collaborative team dynamics.

[ DATA SCIENCE Q&A]

Q:What is the Central Limit Theorem? Explain it. Why is it important?
A: The CLT states that the arithmetic mean of a sufficiently large number of independent, identically distributed random variables will be approximately normally distributed, regardless of the underlying distribution; i.e., the sampling distribution of the sample mean is approximately normal.
– Used in hypothesis testing
– Used for confidence intervals
– Random variables must be iid: independent and identically distributed
– Finite variance
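
A short simulation sketch (not part of the original answer) that illustrates the theorem: sample means of a skewed exponential distribution are approximately normal for reasonably large samples.

```python
import numpy as np

rng = np.random.default_rng(42)
# 10,000 samples of size 50 from a skewed distribution (exponential, mean 1, sd 1)
sample_means = rng.exponential(scale=1.0, size=(10_000, 50)).mean(axis=1)

print(sample_means.mean())   # close to the population mean, 1.0
print(sample_means.std())    # close to sigma / sqrt(n) = 1 / sqrt(50) ~ 0.14
```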

Source

[ VIDEO OF THE WEEK]

Data-As-A-Service (#DAAS) to enable compliance reporting


[ QUOTE OF THE WEEK]

For every two degrees the temperature goes up, check-ins at ice cream shops go up by 2%. – Andrew Hogue, Foursquare

[ PODCAST OF THE WEEK]

Andrea Gallego(@risenthink) / @BCG on Managing Analytics Practice #FutureOfData #Podcast


[ FACT OF THE WEEK]

100 terabytes of data uploaded daily to Facebook.

Sourced from: Analytics.CLUB #WEB Newsletter

The Definitive Guide to Do Data Science for Good

You are a fully-equipped (or aspiring) data scientist and want to use your precious skills to solve problems the world is really itching to have solved? Welcome to the club. The good news is that there are many ways for data scientists to do good. However, the path is not always well beaten and you might need to show some initiative. This article will give you some insight into how you can get involved, either through group meetings and events, as a volunteer, or in paid positions.


Getting started — online data science competitions

A good place to start (without even having to leave your couch!) is online data science competitions. These competitions allow you to sharpen your skills and get familiar with different problem types before you actually get involved.
The home of data science competitions certainly is Kaggle. Watch out for competitions that tackle social problems. Examples are the diabetic retinopathy detection competition or the Africa soil property prediction challenge.
DrivenData is a rather new competition platform that focuses solely on social challenges. This makes it a perfect place to test your skills while doing good.
Occasionally, you will find other data science for good competitions. The IBM Big Data for Social Good Challenge was one of them (but beware, you are not free in the choice of tools here).

Another great way to get started is to replicate one of the projects in our #openimpact shortlist (magic ball icon = predictive analytics inside!).

Group meetings and events

A good opportunity to mingle with like-minded folks in person is attending (or starting) a meetup. The following table lists data science meetups around the world with a focus on social good:

Name | Creation year | Members | Past events | Location
Data for Good – Data Scientists & Devs doing GOOD | 2012 | 661 | 13 | Toronto
DataKind NYC | 2012 | 2041 | 22 | New York
DataKind UK | 2013 | 1288 | 9 | London
Data for Good – Calgary | 2013 | 356 | 14 | Calgary
Data for Good Montreal – Data Scientists & Devs doing GOOD | 2013 | 140 | 1 | Montréal
DataKind Dublin | 2013 | 483 | 15 | Dublin
Brussels Data Science Meetup | 2014 | 1277 | 35 | Brussels
DataKind DC | 2014 | 610 | 5 | Washington
DataKind SG | 2014 | 713 | 9 | Singapore
DataKind Bangalore | 2014 | 502 | 7 | Bangalore
Data for Good | 2014 | 576 | 2 | Paris

Source: Own compilation. Numbers are retrieved dynamically from meetup.com.

You should also keep your eyes and ears open for dedicated hackathons. Examples from the past are the Thorn hackathon in San Francisco and the Bayes Impact hackathon, which happens annually (also in San Francisco).

Volunteering

DataKind is a true pioneer in the field and does a phenomenal job of getting volunteers excited about harnessing the power of data science in the service of humanity. If you live close to one of the DataKind Chapters, you can attend their meetups and further engage in the following ways:

  1. Attend a DataDive:
    DataDives are weekend-long, marathon-style events where dozens of volunteers rally together to help 3-4 social change organizations do initial data analysis, exploration, and prototyping. These events are free for organizations, open to volunteers of all skill levels and take place around the world.
  2. Be among the ones selected into a DataCorps:
    DataCorps is DataKind’s signature program that brings together teams of pro bono data scientists with social change organizations on long-term projects that use data science to transform their work and their sector. These projects last between one to six months and are structured so that volunteers can work in their spare time.

DataKind also hosts a neat “Data Do-Gooding Calendar”.

Do you live in Brazil? Then you might want to check out Data4Good. This initiative works on creating a network of volunteers, produces content to educate people about the use of data for social good (mostly infographics), and provides consulting services for social organizations (more about Data4Good in this blog post).

What if you are not so much into meetups, or if you are living on a remote farm and all you have is a cat, an internet connection and “The Elements of Statistical Learning”?

Well, one thing you can do is look for job descriptions for skilled data volunteers on LinkedIn. At the time of writing, however, I got 0 results for “volunteer data scientist” and 1 result for “volunteer data analyst”. If “volunteer data entry” is what you are looking for, though, there is plenty to do.

If LinkedIn doesn’t get you hooked up with an exciting problem, you should check out the Digital Humanitarian Network. They leverage digital networks for humanitarian response to crises or disasters. It took me a bit to understand their “activation facilitation process”, but it’s a great idea (this diagram helps). You can volunteer through their member organizations who provide data science and coding tasks of different complexity (check out this diagram to see the members’ services).

Some people are even thinking about virtual marketplaces that match up non-profits or local governments with volunteer data scientists. In the same vein, we are currently thinking about how we can match up parties on datalook.io: on the one hand, non-profit organizations or government agencies that see a project on DataLook and think it can be replicated to solve their own problem but don’t have the necessary skills in-house; on the other hand, local or remote data scientists who would be interested in helping to realize the project. If you think this is a great idea or want to discuss it with us, please get in touch.

You see that there are quite a few opportunities for volunteering in the field. But what if you need some dough to pay the bills?

Paid jobs (temporary / part-time)

Such positions are usually organized as fellowships. The most prolific fellowship in the field is probably the Data Science for Social Good Fellowship at the University of Chicago. It was started in 2013 and is run as a 3-month summer program where fellows working in small teams partner with non-profits and local government authorities to tackle socially relevant problems using data science. The fellowship is sponsored by the Eric and Wendy Schmidt Foundation. [$11-16k, 12 weeks]

The program has a smaller sibling in Atlanta: Data Science for Social Good Atlanta. The summer internship program was launched in 2014. Students in the program work as paid interns on projects coming from the City of Atlanta and local non-profits. [$8k, 10 weeks]

If you are a college student in the New York City area, you are eligible for the Microsoft Research Data Science Summer School. In the past, students taking part in the summer school have worked on NYC-related challenges. [$5k, 8 weeks]

Code for America fellows are usually web/app developers, but a few of the fellows are data scientists working on problems in different U.S. cities. [$50k, 11 months]

All these fellowships are run by organizations that partner with non-profits and the government. There are also non-profits that offer their own fellowship. An example is the Thorn Innovation Lab where data scientists help fight child sexual exploitation. [$100k, 1 year]

Apart from fellowships you might become what I call a “data angel”, a full-time data scientist working at a company that partners with a non-profit. You help the non-profit for a limited time while receiving your salary from your company. Some companies that offer such Corporate Social Responsibility programs are Pivotal, Teradata, Cloudera, Palantir, and Informatica.

If your company wants to establish such a CSR program in Germany, get in touch with us.

Paid jobs (permanent / full-time)

DataKind announced in 2014 that it would create a full-time, in-house Data Science Team for Good in New York City. Their first data scientist was hired in early 2015 (see here), and you should check out DataKind’s careers page for upcoming positions. More general “Data for Good” job openings are also sometimes tweeted via @DataKind.

Bayes Impact is a Y Combinator backed non-profit in San Francisco. They launched in 2014 and their approach is to take on a few large projects at a time rather than spreading their resources across many smaller projects. Their vision is to build operational data science solutions for large-scale problems that affect millions of people. Project partners are large NGOs and the federal government. Bayes Impact is always looking for big-hearted data scientists, data engineers and software engineers. You can apply here.

As non-profits are beginning to understand that data science can help them achieve their goals, a few of them have already created full-time positions for data scientists. Examples are Change.org, Big Mountain Data, and Crisis Text Line.

The government sector, too, is slowly beginning to open positions for data scientists. On the local level, the team of the Mayor’s Office of Data Analytics in New York City has achieved some impressive impact with their projects. On the federal level, the White House recently appointed data science veteran DJ Patil as U.S. Chief Data Scientist.

You might also want to look for jobs at for-profit companies whose mission is to use cutting-edge data science to solve pressing societal problems. Examples are Enlitic in San Francisco, which wants to revolutionize diagnostic healthcare with deep learning, and Edgeflip in Chicago, which wants to enable non-profits and issue-based groups to better reach their online communities using data science. You should also have a look at consultancies like Real Impact Analytics (Brussels), SocialCops (New Delhi) or Civis Analytics (Chicago) that have interesting social good projects in their portfolios. But these are just examples and there are many more out there.

And then there is of course the vast field of academia and science, with opportunities to apply your data science skills to the greater good. Fields that produce enormous amounts of data, like astronomy, have a huge demand for data scientists. Check out an article by Jake Vanderplas for elaborate thoughts about data science in academia.

Off the beaten path

From my German perspective, it seems like the vast majority of opportunities to apply your skills for social good are in the U.S. I became interested in the field in 2013 and didn’t find an organization in my city that allowed me to use my skills for social good. Instead of giving up, I tried to convince a federal authority to use predictive analytics for prioritizing food security inspections. That didn’t work out, so I founded DataLook. Through DataLook, I’m now in touch with a lot of people in Germany and abroad who share my interests. It’s a long road, and we are still looking for non-profits and government agencies as project partners to realize projects (get in touch!). However, I hope that this article helps some of you get connected with existing initiatives – or to start your own and leave the beaten path in order to do what you want to do: use data science to tackle real problems.

To read the original article on DataLook, click here.

AnalyticsWeek note: Here is just one of our meetups.

Originally Posted at: The Definitive Guide to Do Data Science for Good

How CFOs Can Harness Analytics

Businesses are collecting more data than ever from their own operations, supply chains, production processes, employees, and customer interactions.

But information alone isn’t knowledge.

Data is nothing more than virtual garbage unless it can be translated into actionable insight and effective business outcomes.

CFOs and The Information Age

CFOs are the financial spokespeople of their organizations, historically responsible for financial planning, record-keeping, and reporting. Their role is to balance risk through budget management, cost-benefit analysis, forecasting, and funding knowledge.

As purveyors of finance, these basic duties haven’t changed much.

But as data volume and variety increase, CFOs and financial professionals must leverage these new information streams to identify trends and incorporate analytics into the decision making process. When it comes to seeking out strategic advice, CEOs turn to CFOs 72 percent of the time.

To help shape the strategic direction of the company, CFOs not only have to interpret vast troves of data, but they have to do it faster and more accurately than the competition.

Business Intelligence for CFOs

A Video Guide to Business Intelligence

Companies that adopt a data-driven culture are the most successful. Seventy-six percent of executives from top-performing companies cite data collection as very important or essential, and data-driven companies are three times more likely to rate themselves as substantially ahead of their peers in financial performance.

CFOs have always used valuable information to identify growth opportunities, whether from existing or new customers, products, and markets. This task has become more complex, however, due to the amount of sales, operations, employee, website, and customer data now being generated across multiple channels.

To bring these disparate silos together and eliminate late nights hunched over reports (that are often out of date by the time they’re analyzed), CFOs need business intelligence and analytics tools. In fact, Gartner research revealed 78 percent of CFOs consider BI and analytics as a top technology initiative for the finance department — beating out even financial management applications.

Though many organizations consider data a priority, they’re still struggling to make progress with BI and analytics. In order to avoid this common problem, let’s examine how business intelligence software helps CFOs harness data and analytics to increase customer engagement and grow profits.

Self-service Insight

Capturing data is important, but using information to propel the business forward is more important. BI software alleviates the headache of collecting and consolidating profit and loss numbers from every department or business unit for comparison and analysis. Integrating various ERP, CRM, marketing, HR, and back-end data sources into a BI platform compiles all company data in one central location. This means no more tracking company performance in spreadsheets or waiting on IT to run complex reports.

BI software puts data at your fingertips through the use of dashboards, which present easy-to-analyze views of selected metrics. Dashboards help improve the BI user experience and make business intelligence approachable. They simplify complex data sets, reveal patterns, and provide you with a way to monitor business performance at a glance, which enables fact-based decision-making.

Below is a live visualization of a mock CFO dashboard created with Tableau. This particular dashboard is used to view your business composition and monitor changes as they occur. Currently, the segment composition is calculated by net sales. By using the “Select Measure” filter in the top left, you can switch your measure to gross profit, operating income, or net profit. You can also use the rest of the filter panel or drill down by clicking anywhere in the visualization.

By aggregating important data, dashboards and self-service reporting tools help you identify hidden trends or missed opportunities. For example, rather than just a report that shows profits increased, BI lets you pinpoint why profits increased. To gain context, you can “drill down” to see exactly what is causing the spike in revenue. With a real-time financial view across the company, you can discover which efforts are underperforming and which are exceeding expectations, then allocate resources accordingly.

Financial Visualizations

End-to-end business intelligence is critical. CFOs are already trained to discern patterns and implications, so visualization tools that save time and simplify processes are important. BI dashboards let CFOs build interactive reports that provide fast answers to business questions such as “What’s driving sales growth?” or “Where are we spending resources?”

Business Intelligence and Analytics Software In Action

Data visualizations turn your financial key performance indicators (KPIs) into clear graphics, which help you track performance and assess risk. Visualization also helps quickly compare different pieces of data with auto-generated charts. These typically include: tables, pie charts, bar graphs, heat maps, scatter plots, and gauges. Our minds often respond better to pictures than to rows of numbers: business managers who use visual data discovery tools are 28 percent more likely to find timely information than peers who only use managed reporting and dashboards.

Advanced Analytics

Descriptive analytics uses historical data to pinpoint the reason behind a success or failure. Business intelligence software now goes beyond this to help you shift from a historical perspective to a forward-looking one.

Predictive analytics uses technologies such as forecasting, data mining, and simulations to tell you what is likely to happen. For example, predictive analytics can identify sales fluctuations and product popularity, which can be used to forecast inventory needs at a particular retail store. It can also identify purchase behavior and demographic information about your most valuable clients, which can be used to determine how much money should be invested to gain business from them or similar prospects. Advanced prescriptive analytics goes even further by recommending the best course of action based on the knowledge you currently have.

_____

Business analytics software uses the data you already have to open up new ways of looking at business and operational information. CFOs can use this technology to scale performance, assess future risk, and hone their financial metrics.

Don’t let your data become virtual garbage.

The right business intelligence tools can help make data your company’s most valuable asset.

To read the original article on Technology Advice, click here.

Originally Posted at: How CFOs Can Harness Analytics by analyticsweekpick

Apache Spark for Big Analytics

Apache Spark is hot.   Spark, a top-level Apache project, is an open source distributed computing framework for advanced analytics in Hadoop.  Originally developed as a research project at UC Berkeley’s AMPLab, the project achieved incubator status in Apache in June 2013 and top-level status in February 2014.

Spark seeks to address the critical challenges for advanced analytics in Hadoop.  First, Spark is designed to support in-memory processing, so developers can write iterative algorithms without writing out a result set after each pass through the data.  This enables true high performance advanced analytics; for techniques like logistic regression, project sponsors report runtimes in Spark 100X faster than what they are able to achieve with MapReduce.

Second, Spark offers an integrated framework for advanced analytics, including a machine learning library (MLLib), a graph engine (GraphX), a streaming analytics engine (Spark Streaming) and a fast interactive query tool (Shark). (Update: Databricks recently announced Alpha availability of Spark SQL.) This eliminates the need to support multiple point solutions, such as Giraph and GraphLab for graph engines, Storm and S4 for streaming, or Hive and Impala for interactive queries. A single platform simplifies integration and ensures that users can produce consistent results across different types of analysis.

At Spark’s core is an abstraction layer called Resilient Distributed Datasets, or RDDs.  RDDs are read-only partitioned collections of records created through deterministic operations on stable data or other RDDs.  RDDs include information about data lineage together with instructions for data transformation and (optional) instructions for persistence.  They are designed to be fault tolerant, so that if an operation fails it can be reconstructed.
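
To make the RDD abstraction a bit more tangible, here is a minimal Scala sketch of the pattern described above. The input path is made up for illustration, and the code targets the classic RDD API, so treat it as a sketch rather than a canonical example from the project docs:

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._

    object RddSketch {
      def main(args: Array[String]): Unit = {
        // Local context for illustration; on a cluster you would pass a real master URL.
        val sc = new SparkContext("local[2]", "rdd-sketch")

        // Each transformation returns a new read-only RDD and records its lineage,
        // so a lost partition can be recomputed from stable data if a node fails.
        val lines   = sc.textFile("hdfs:///logs/app.log")   // hypothetical input path
        val errors  = lines.filter(_.contains("ERROR"))     // deterministic transformation
        val lengths = errors.map(_.length)

        // Optional persistence keeps the intermediate RDD in memory.
        errors.cache()

        println("error lines:  " + errors.count())
        println("total length: " + lengths.reduce(_ + _))

        sc.stop()
      }
    }

The call to cache() is what lets an iterative algorithm reuse the errors dataset across passes without rereading the file each time, which is where the in-memory performance gains come from.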

For data sources, Spark works with any file stored in HDFS, or any other storage system supported by Hadoop (including local file systems, Amazon S3, Hypertable and HBase).  Spark supports text files, SequenceFiles and any other Hadoop InputFormat.
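
Assuming an existing SparkContext named sc (as in the sketch above), reading from these different storage systems looks essentially the same; the paths and the S3 bucket below are hypothetical:

    // Assumes an existing SparkContext named sc and the implicit conversions
    // from org.apache.spark.SparkContext._ for the SequenceFile example.
    val localFile = sc.textFile("file:///tmp/sample.txt")                   // local file system
    val s3Logs    = sc.textFile("s3n://my-bucket/logs/")                    // Amazon S3 (credentials configured elsewhere)
    val pairs     = sc.sequenceFile[String, String]("hdfs:///data/pairs")   // a Hadoop SequenceFile of key/value records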

Spark’s machine learning library, MLLib, is rapidly growing.   In the latest release it includes linear support vector machines and logistic regression for binary classification; linear regression; k-means clustering; and alternating least squares for collaborative filtering.  Linear regression, logistic regression and support vector machines are all based on a gradient descent optimization algorithm, with options for L1 and L2 regularization.  MLLib is part of a larger machine learning project (MLBase), which includes an API for feature extraction and an optimizer (currently in development with planned release in 2014).
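
As a rough illustration of how these gradient-descent-based models are invoked, here is a short Scala sketch. The training file and its layout (label first, then comma-separated numeric features) are assumptions, and the exact MLLib signatures have shifted a bit between releases, so take this as a sketch rather than copy-paste code:

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._
    import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint

    object MllibSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext("local[2]", "mllib-sketch")

        // Hypothetical training file: label first, then comma-separated numeric features.
        val training = sc.textFile("hdfs:///data/training.csv").map { line =>
          val parts = line.split(',').map(_.toDouble)
          LabeledPoint(parts.head, Vectors.dense(parts.tail))
        }.cache()

        // Binary classification trained with gradient descent, as provided by MLLib.
        val model = LogisticRegressionWithSGD.train(training, numIterations = 100)

        // Naive check against the training data, purely for illustration.
        val accuracy = training
          .map(p => if (model.predict(p.features) == p.label) 1.0 else 0.0)
          .mean()
        println("training accuracy: " + accuracy)

        sc.stop()
      }
    }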

GraphX, Spark’s graph engine, combines the advantages of data-parallel and graph-parallel systems by efficiently expressing graph computation within the Spark framework.  It enables users to interactively load, transform, and compute on massive graphs.  Project sponsors report performance comparable to Apache Giraph, but in a fault tolerant environment that is readily integrated with other advanced analytics.

Spark Streaming offers an additional abstraction called discretized streams, or DStreams.  DStreams are a continuous sequence of RDDs representing a stream of data; they are created from live incoming data or generated by transforming other DStreams.  Spark receives data, divides it into batches, then replicates the batches for fault tolerance and persists them in memory where they are available for mathematical operations.
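
A minimal word-count example in Scala shows the typical DStream workflow; the socket source on localhost:9999 is just an assumption for illustration (e.g. something you could feed with netcat):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.StreamingContext._

    object StreamingSketch {
      def main(args: Array[String]): Unit = {
        // Two local threads: one to receive the stream, one to process it.
        val conf = new SparkConf().setMaster("local[2]").setAppName("streaming-sketch")

        // Incoming data is sliced into 10-second batches, each backed by an RDD.
        val ssc = new StreamingContext(conf, Seconds(10))

        // Hypothetical source: a text stream on localhost:9999 (e.g. fed by netcat).
        val lines  = ssc.socketTextStream("localhost", 9999)
        val words  = lines.flatMap(_.split(" "))
        val counts = words.map(word => (word, 1)).reduceByKey(_ + _)

        counts.print()          // emit the counts computed for each batch
        ssc.start()             // start receiving and processing
        ssc.awaitTermination()  // keep running until stopped
      }
    }

Because each batch is simply an RDD, the same operations used for batch analytics apply to streaming data, which is the core idea behind DStreams.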

Currently, Spark supports programming interfaces for Scala, Java and Python.  For R users, the team at Berkeley’s AMPLab released a developer preview of SparkR in January.

There is an active and growing developer community for Spark; 83 developers contributed to Release 0.9. In the past six months, developers contributed more commits to Spark than to all of the other Apache analytics projects combined. In 2013, the Spark project published seven double-dot releases, including Spark 0.8.1 published on December 19; this release included YARN 2.2 support, high availability mode for cluster management, performance optimizations and improvements to the machine learning library and Python interface. The Spark team released 0.9.0 in February 2014, and 0.9.1, a maintenance release, in April 2014. Release 0.9 includes Scala 2.10 support, a configuration library, improvements to Spark Streaming, the Alpha release for GraphX, enhancements to MLLib and many other enhancements.

In a nod to Spark’s rapid progress, Cloudera announced immediate support for Spark in February. MapR recently announced that it will distribute the complete Spark stack, including Shark (Cloudera does not distribute Shark). Hortonworks also recently announced plans to distribute Spark for machine learning, though it plans to stick with Storm for streaming analytics and Giraph for graph engines. Databricks offers a certification program for Spark; participants currently include Adatao, Alpine Data Labs, ClearStory and Tresata.

In December, the first Spark Summit attracted more than 450 participants from more than 180 companies.  Presentations covered a range of applications such as neuroscience, audience expansion, real-time network optimization and real-time data center management, together with a range of technical topics.  The 2014 Spark Summit will be held in San Francisco this June 30-July 2.

In recognition of Spark’s rapid development, on February 27 Apache announced that Spark is a top-level project.  Developers expect to continue adding machine learning features and to simplify implementation.  Together with an R interface and commercial support, we can expect continued interest and application for Spark.   Enhancements are coming rapidly — expect more announcements before the Spark Summit.

Source by thomaswdinsmore