Biased Machine-Learning Models (Part 2): Impact on the Hiring Process

This is a follow-up to our previous blog post on biased machine-learning models, in which we explored how machine-learning algorithms can be susceptible to bias (and what you can do to avoid it). Now, we’ll examine the impact biased predictive models can have on specific processes such as hiring.

>> Related: Predictive Analytics 101 <<

Just recently, Amazon had to scrap a predictive recruiting tool because it was biased against women. How does something like this happen? Algorithms learn rules from past data, and if the historical data is biased, the model will be biased as well. An even larger problem: a machine-learning model then automates and perpetuates that bias.

When Does Bias Become an Issue?

A company itself may not intend to be biased in its hiring process, but the makeup of the current team will dictate how the algorithm scores applicants. The algorithm adds bias to its scores because of the imbalance that already exists in the input data, and this creates a problem when filtering candidates in the future.

Let’s look at hiring scenarios where bias might become an issue. Say your sales team consists mostly of 25-year-old white males. The algorithm will then interpret this as the ideal profile for a salesperson (age 25, white, male). If a woman or someone older than 25 applies, the algorithm will not give them a good score. Similarly, say your accounting team consists mostly of 35-year-old women. Any men or younger women who apply will also score low.
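To make this concrete, here is a minimal synthetic sketch (invented data, not any real recruiting system): a model trained on historical hires that skew toward young men learns to score an equally skilled older woman much lower.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
age = rng.integers(22, 60, n)
is_male = rng.integers(0, 2, n)
skill = rng.normal(0, 1, n)                          # the only job-relevant signal
# Historical labels reflect a team built mostly from young male hires.
hired = ((skill > 0) & (age < 30) & (is_male == 1)).astype(int)

model = LogisticRegression(max_iter=1000)
model.fit(np.column_stack([age, is_male, skill]), hired)

# Two applicants with identical skill, differing only in age and gender:
print(model.predict_proba([[25, 1, 1.0]])[0, 1])     # young male: high "hire" score
print(model.predict_proba([[45, 0, 1.0]])[0, 1])     # older female: much lower score
```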

[Image: Biased candidate model]

Outside of hiring, the same logic applies to companies renting out houses or financial institutions approving loans. There may be no bias in the manual business workflow, but the historical data can create bias in the automated predictions. For example, if an applicant lives in a neighborhood with a high concentration of young, educated Asians, the algorithm may penalize anyone who does not fit this demographic.

What Can We Do About It?

There are a variety of ways to deal with biased machine-learning models. First, look for and acknowledge any biased data. Next, apply resampling techniques such as under-sampling, over-sampling, SMOTE, or ROSE. You can also add class weights to counteract the imbalance. Or, to keep it simple, remove age, gender, and race as inputs to the model. The sketch below illustrates these options.
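Here is a hedged sketch of those options using scikit-learn and the imbalanced-learn package (the file name, column names, and the assumption of numeric features are illustrative, not from any real system):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from imblearn.over_sampling import SMOTE              # from the imbalanced-learn package

applicants = pd.read_csv("historical_applicants.csv")  # hypothetical training data
y = applicants["hired"]

# Option 1: drop protected attributes so the model never sees them directly
# (note: proxies such as zip code can still leak this information).
X = applicants.drop(columns=["hired", "age", "gender", "race"])

# Option 2: rebalance the training data. SMOTE is shown; random under-sampling,
# over-sampling, or ROSE are alternatives.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
model = LogisticRegression(max_iter=1000).fit(X_res, y_res)

# Option 3: skip resampling and weight the under-represented class instead.
model_weighted = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)
```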

[Image: Best candidate model]

To learn more techniques for handling biased data, see our previous blog on biased machine-learning models.

See how Logi can help with your predictive analytics needs. Sign up for a free demo of Logi Predict today >


Lavastorm Democratizing Big Data Analytics in Face of Skills Shortage

Democratizing Big Data refers to the growing movement to make Big Data products and services more accessible to staffers beyond data specialists, such as business analysts, along the lines of “self-service business intelligence” (BI).

In this case, the democratized solution is “the all-in-one Lavastorm Analytics Engine platform,” the Boston company said in today’s announcement of product improvements. It “provides an easy-to-use, drag-and-drop data preparation environment to provide business analysts a self-serve predictive analytics solution that gives them more power and a step-by-step validation for their visualization tools.”

It addresses one of the main challenges to successful Big Data deployments, as listed in study after study: lack of specialized talent.

“Business analysts typically encounter a host of core problems when trying to utilize predictive analytics,” Lavastorm said. “They lack the necessary skills and training of data scientists to work in complex programming environments like R. Additionally, many existing BI tools are not tailored to enable self-service data assembly for business analysts to marry rich data sets with their essential business knowledge.”

[Image: The Lavastorm Analytics Engine (source: Lavastorm Analytics)]

That claim has been confirmed many times. For example, a recent report by Capgemini Consulting, “Cracking the Data Conundrum: How Successful Companies Make Big Data Operational,” says that a lack of Big Data and analytics skills was reported by 25 percent of respondents as a key challenge to successful deployments. “The Big Data talent gap is something that organizations are increasingly coming face-to-face with,” Capgemini said.

Other studies indicate organizations haven’t been doing such a good job of facing the issue, as the promises of self-service BI remain unfulfilled.

Enterprises are trying many different approaches to solving the problem. Capgemini noted that some companies are investing more in training, while others try more unconventional techniques, such as partnering with other companies in employee exchange programs that share more skilled workers or teaming up with or outright acquiring startup Big Data companies to bring skills in-house.

Others, such as Altiscale Inc., offer Hadoop-as-a-Service solutions, or, like BlueData, provide self-service, on-premises private clouds with simplified analysis tools.

Lavastorm, meanwhile, uses the strategy of making the solutions simpler and easier to use. “Demand for advanced analytic capabilities from companies across the globe is growing exponentially, but data scientists or those with specialized backgrounds around predictive analytics are in short supply,” said CEO Drew Rockwell. “Business analysts have a wealth of valuable data and valuable business knowledge, and with the Lavastorm Analytics Engine, are perfectly positioned to move beyond their current expertise in descriptive analytics to focus on the future, predicting what will happen, helping their companies compete and win on analytics.”

The Lavastorm Analytics Engine comes in individual desktop editions or in server editions for use in larger workgroups or enterprise-wide.

New predictive analytics features added to the product as listed today by Lavastorm include:

  • Linear Regression: Calculate a line of best fit to estimate the values of a variable of interest.
  • Logistic Regression: Calculate probabilities of binary outcomes.
  • K-Means Clustering: Form a user-specified number of clusters out of data sets based on user-defined criteria.
  • Hierarchical Clustering: Form a user-specified number of clusters out of data sets by using an iterative process of cluster merging.
  • Decision Tree: Predict outcomes by identifying patterns from an existing data set.

These and other new features are available today, Lavastorm said, with more analytical component enhancements to the library on tap.

The company said its approach to democratizing predictive analytics gives business analysts drag-and-drop capabilities specifically designed to help them master predictive analytics.

“The addition of this capability within the Lavastorm Analytics Engine’s visual, data flow-driven approach enables a fundamentally new method for authoring advanced analyses by providing a single shared canvas upon which users with complementary skill sets can collaborate to rapidly produce robust, trusted analytical applications,” the company said.

About the Author: David Ramel is an editor and writer for 1105 Media.

Originally Posted at: Lavastorm Democratizing Big Data Analytics in Face of Skills Shortage by analyticsweekpick

Jul 25, 19: #AnalyticsClub #Newsletter (Events, Tips, News & more..)


[  COVER OF THE WEEK ]

[Image: Ethics]


[ AnalyticsWeek BYTES]

>> Data Storytelling: What’s Easy and What’s Hard by analyticsweek

>> How Big Data Is Transforming The Fight Against Cancer by analyticsweekpick

>> Oct 25, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..) by admin


[ FEATURED COURSE]

R Basics – R Programming Language Introduction


Learn the essentials of R Programming – R Beginner Level!… more

[ FEATURED READ]

Big Data: A Revolution That Will Transform How We Live, Work, and Think


“Illuminating and very timely . . . a fascinating — and sometimes alarming — survey of big data’s growing effect on just about everything: business, government, science and medicine, privacy, and even on the way we think… more

[ TIPS & TRICKS OF THE WEEK]

Grow at the speed of collaboration
Research by Cornerstone On Demand pointed out the need for better collaboration within the workforce, and the data analytics domain is no different. A rapidly changing and growing industry like data analytics is very difficult for an isolated workforce to keep up with. A good collaborative work environment facilitates a better flow of ideas, improved team dynamics, rapid learning, and a greater ability to cut through the noise. So, embrace collaborative team dynamics.

[ DATA SCIENCE Q&A]

Q: What is principal component analysis? Explain the sort of problems you would use PCA for. Also explain its limitations as a method.

A: PCA is a statistical method that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.

Reduce the data from n to k dimensions: find the k vectors onto which to project the data so as to minimize the projection error.
Algorithm:
1) Preprocessing (standardization): PCA is sensitive to the relative scaling of the original variables
2) Compute the covariance matrix Σ
3) Compute the eigenvectors of Σ
4) Choose the k principal components so as to retain x% of the variance (typically x = 99)

Applications:
1) Compression
– Reduce disk/memory needed to store data
– Speed up learning algorithm. Warning: mapping should be defined only on training set and then applied to test set

2) Visualization: 2 or 3 principal components, so as to summarize data

Limitations:
– PCA is not scale invariant
– The directions with largest variance are assumed to be of most interest
– Only considers orthogonal transformations (rotations) of the original variables
– PCA is only based on the mean vector and covariance matrix. Some distributions (multivariate normal) are characterized by this but some are not
– If the variables are correlated, PCA can achieve dimension reduction. If not, PCA just orders them according to their variances
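As a concrete illustration of the recipe above, here is a minimal NumPy sketch (not tied to any particular library’s PCA implementation):

```python
# Minimal PCA: standardize, covariance matrix, eigen-decomposition, keep the
# k components that retain ~99% of the variance, then project the data.
import numpy as np

def pca(X, variance_to_keep=0.99):
    X = (X - X.mean(axis=0)) / X.std(axis=0)         # 1) standardization
    cov = np.cov(X, rowvar=False)                    # 2) covariance matrix Sigma
    eigvals, eigvecs = np.linalg.eigh(cov)           # 3) eigenvectors of Sigma
    order = np.argsort(eigvals)[::-1]                # sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(explained, variance_to_keep)) + 1   # 4) choose k
    return X @ eigvecs[:, :k], eigvecs[:, :k]

X = np.random.default_rng(0).normal(size=(200, 10))
Z, components = pca(X)                               # Z: 200 x k projected data
```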


[ VIDEO OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with Juan Gorricho, @disney


[ QUOTE OF THE WEEK]

War is 90% information. – Napoleon Bonaparte

[ PODCAST OF THE WEEK]

@AnalyticsWeek #FutureOfData with Robin Thottungal(@rathottungal), Chief Data Scientist at @EPA


[ FACT OF THE WEEK]

Brands and organizations on Facebook receive 34,722 Likes every minute of the day.

Sourced from: Analytics.CLUB #WEB Newsletter

The New Analytics Professional: Landing A Job In The Big Data Era

Along with the usual pomp and celebration of college commencements and high school graduation ceremonies we’re seeing now, the end of the school year also brings the usual brooding and questions about careers and next steps. Analytics is no exception, and with the big data surge continuing to fuel lots of analytics jobs and sub-specialties, the career questions keep coming. So here are a few answers on what it means to be an “analytics professional” today, whether you’re just entering the workforce, you’re already mid-career and looking to make a transition, or you need to hire people with this background.

The first thing to realize is that analytics is a broad term, and there are a lot of names and titles that have been used over the years that fall under the rubric of what “analytics professionals” do: The list includes “statistician,” “predictive modeler,” “analyst,” “data miner” and — most recently — “data scientist.” The term “data scientist” is probably the one with the most currency – and hype – surrounding it for today’s graduates and upwardly mobile analytics professionals. There’s even a backlash against over-use of the term by those who slap it loosely on resumes to boost salaries and perhaps exaggerate skills.


Labeling the Data Scientist

In reality, if you study what successful “data scientists” actually do and the skills they require to do it, it’s not much different from what other successful analytics professionals do and require. It is all about exploring data to uncover valuable insights, often using very sophisticated techniques. Much like success in different sports depends on a lot of the same fundamental athletic abilities, so too does success with analytics depend on fundamental analytic skills. Great analytics professionals exist under many titles, but all share some core skills and traits.

The primary distinction I have seen in practice is that data scientists are more likely to come from a computer science background, to use Hadoop, and to code in languages like Python and R. Traditional analytics professionals, on the other hand, are more likely to come from a statistics, math or operations research background, are likely to work in relational or analytics server environments, and to code in SAS and SQL.

Regardless of the labels or tools of choice, however, success depends on much more than specific technical abilities or focus areas, and that’s why I prefer the term “data artist” to get at the intangibles like good judgment and boundless curiosity around data. I wrote an article on the data artist for the International Institute for Analytics (IIA). I also collaborated with the IIA and Greta Roberts from Talent Analytics to survey a wide range of analytics professionals. One of our chief goals in that 2013 quantitative study was to find out whether analytics professionals have a unique, measurable mind-set and raw talent profile.

A Jack-of-All Trades

Our survey results showed that these professionals indeed have a clear, measurable raw talent fingerprint that is dominated by curiosity and creativity; these two ranked very high among 11 characteristics we measured. They are the qualities we should prioritize alongside the technical bona fides when looking to fill jobs with analytics professionals. These qualities also happen to transcend boundaries between traditional and newer definitions of what makes an analytics professional.

This is particularly true as we see more and more enterprise analytics solutions getting built from customized mixtures of multiple systems, analytic techniques, programming languages and data types. All analytics professionals need to be creative, curious and adaptable in this complex environment that lets data move to the right analytic engines, and brings the right analytic engines to where the data may already reside.

Given that the typical “data scientist” has some experience with Hadoop and unstructured data, we tend to ascribe the creativity and curiosity characteristics automatically (You need to be creative and curious to play in a sandbox of unstructured data, after all). But that’s an oversimplification, and our Talent Analytics/International Institute of Analytics survey shows that the artistry and creative mindset we need to see in our analytics professionals is an asset regardless of what tools and technologies they’ll be working with and regardless of what title they have on their business card. This is especially true when using the complex, hybrid “all-of-the-above” solutions that we’re seeing more of today and which Gartner calls the Logical Data Warehouse.

Keep all this in mind as you move forward. The barriers between the worlds of old and new; open source and proprietary; structured and unstructured are breaking down. Top quality analytics is all about being creative and flexible with the connections between all these worlds and making everything work seamlessly. Regardless of where you are in that ecosystem or what kind of “analytics professional” you may be or may want to hire, you need to prioritize creativity, curiosity and flexibility – the “artistry” – of the job.

To read the original article on Forbes, click here.

Source by analyticsweekpick

Validating a Lostness Measure

No one likes getting lost. In real life or digitally.

One can get lost searching for a product to purchase, finding medical information, or clicking through a mobile app to post a social media status.

Each link, button, and menu leads to decisions. And each decision can result in a mistake, leading to wasted time, frustration, and often the inability to accomplish tasks.

But how do you measure when someone is lost? Is this captured already by standard usability metrics or is a more specialized metric needed? It helps to first think about how we measure usability.

Measuring Usability

We recommend a number of usability measures to assess the user experience, both objectively and subjectively (which come from the ISO 9241 standard of usability). Task completion, task time, and number of errors are the most common types of objective task-based measures. Errors take time to operationalize (“What is an error?”), while task completion and time can often be collected automatically (for example, in our MUIQ platform).

Perceived ease and confidence are two common task-based subjective measures—simply asking participants how easy or difficult or how confident they are they completed the task. Both tend to correlate (r ~ .5) with objective task measures [pdf]. But do any of these objective or subjective measures capture what it means to be lost?

What Does It Mean to Be Lost?

How do you know whether someone is lost? In real life you could simply ask them. But maybe people don’t want to admit they’re lost (you know, like us guys). Is there an objective way to determine lostness?

In the 1980s, as “hypertext” systems were being developed, a new dimension was added to information-seeking behavior. Designers wanted to know whether people were getting lost when clicking all those links. Earlier, Elm and Woods (1985) argued that being lost was more than a feeling (no Boston pun intended); it was a degradation of performance that could be objectively measured. Inspired by this idea, in 1996 Patricia Smith sought to define lostness objectively and described a way to measure when people were lost in hypertext. But not much has been done with it since (at least that we could find).

Smith’s work has received a bit of a resurgence after Tomer Sharon cited it in Validating Product Ideas and was consequently mentioned in online articles.

While there have been other methods for quantitatively assessing navigation, in this article we’ll take a closer look at how Smith quantified lostness and how the measure was validated.

A Lostness Measure

Smith proposed a few formulas to objectively assess lostness. The measure is essentially a function of what the user does (how many screens visited) relative to the most efficient path a user could take through a system. It requires first finding the minimum number of screens or steps it takes to accomplish a task—a happy path—and then comparing that to how many total screens and unique screens a user actually visits. She settled on the following formula using these three inputs to account for two dimensions of lostness:

N=Unique Pages Visited

S=Total Pages Visited

R=Minimum Number of Pages Required to Complete Task
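The published formula itself appeared as an image in the original post, so the sketch below reconstructs it from the Pythagorean-theorem framing described next; treat the exact form as my reading of Smith (1996) rather than a verbatim reproduction.

```python
# A sketch of the lostness measure from the three inputs above. The formula is
# reconstructed from the Pythagorean-theorem framing (verify against Smith, 1996):
# L = sqrt((N/S - 1)^2 + (R/N - 1)^2)
from math import sqrt

def lostness(unique_pages: int, total_pages: int, minimum_pages: int) -> float:
    """0 = perfectly efficient navigation; values near 1 = very lost."""
    n, s, r = unique_pages, total_pages, minimum_pages
    return sqrt((n / s - 1) ** 2 + (r / n - 1) ** 2)

# A participant who needed 4 screens but visited 12 in total, 8 of them unique:
print(round(lostness(unique_pages=8, total_pages=12, minimum_pages=4), 2))  # ~0.6
```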

The lostness measure ranges from 0 (absence of lostness) to 1 (being completely lost). Formulas can be confusing and they sometimes obscure what’s being represented, so I’ve attempted to visualize this metric and show how it’s derived with the Pythagorean theorem in Figure 1 below. 

Figure 1: Visualization of the lostness measure. The orange lines labeled “C” show an example of how a score from one participant can be converted into lostness using the Pythagorean theorem.

Smith then looked to validate the lostness measure with data from a previous study of 20 students (16 and 17 year olds) from the UK. Participants were asked to look for information on a university department hypertext system. Measures collected included the total number of nodes (pages), deviations, and unique pages accessed.

After reviewing videos of the users across tasks, she found that her lostness measure did correspond to lost behavior. She identified the threshold of lostness scores above .5 as being lost, while scores below .4 as not lost, and the scores between .4 and .5 as indeterminate.

The measures were also used in another study, reported in Smith, that involved eight participants and more nodes. In that study by Cardle (a 1994 dissertation), similar findings for lostness and efficiency were reported. But of the eight users, one had a score above .5 (indicating lost) when he was not really lost but exploring, suggesting a possible confound with the measure.

Replication Study

Given the small amount of data used to validate the lostness measure (and the dearth of information since), we conducted a new study to collect more data, to confirm thresholds of lostness, and see how this measure correlates with other widely used usability measures.

Between September and December 2018 we reviewed 73 videos of users attempting to complete 8 tasks from three studies. The studies included consumer banking websites, a mobile app for making purchases, and a findability task on the US Bank website that asked participants to find the name of the VP and General Counsel (an expected difficult task). Each task had a clear correct solution and “exploring” behavior wasn’t expected, thus minimizing possible confounds with natural browsing behavior that may look like lostness (e.g., looking at many pages repeatedly).

Sample sizes ranged from 5 to 16 for each task experience. We selected tasks that we hoped would provide a good range of lostness. For each task, we identified the minimum number of screens needed to complete each task for the lostness measure (R), and reviewed each video to count the total number of screens (S) and number of unique screens (N). We then computed the lostness score for each task experience. Post-task ease was collected using the SEQ and task time and completion rates were collected in the MUIQ platform.

Study Results

Across the 73 task experiences we had a good range of lostness, from a low of 0 (perfect navigation) to a high of .86 (very lost) and a mean lostness score of .34. We then aggregated the individual experiences by task.

Table 1 shows the lostness score, post-task ease, task time, and completion rate aggregated across the tasks, with lostness scores ranging from .16 to .72 (higher lostness scores mean more lostness).

Task Lostness Ease Time Completion % Lost
1 0.16 6.44 196 100% 6%
2 0.26 6.94 30 94% 19%
4 0.33 6.00 272 40% 40%
3 0.34 6.19 83 100% 44%
6 0.37 4.60 255 80% 60%
7 0.51 4.40 193 100% 40%
8 0.66 2.20 339 60% 100%
5 0.72 2.40 384 60% 100%

Table 1: Lostness, ease (7-point SEQ scale), time (in seconds), completion rates, and % lost (> .5) for the eight tasks. Tasks sorted by lostness score, from least lost to most lost.

Using the Smith “lost” threshold of .5, we computed a binary metric of lost/not lost for each video and computed the average percent lost per task (far right column in Table 1).

Tasks 8 and 5 have both the highest lostness scores and percent being lost. All participants had lostness scores above .5 and were considered “lost.” In contrast, only 6% and 19% of participants were “lost” on tasks 1 and 2.

You can see a pattern between lostness and the ease, time, and completion rates in Table 1. As users get more lost (lostness goes up), the perception of ease goes down, time goes up. The correlations between lostness and these task-level measures are shown in Table 2 at both the task level and individual level.

Metric   Task Level (r)   Individual Level (r)
Ease     -0.95*           -0.52*
Comp     -0.46            -0.17
Time      0.72*            0.51*

Table 2: Correlations between lostness and ease, completion rates, and time at the task level (n=8) and individual level (n = 73). * indicates statistically significant at the p < .05 level

As expected, correlations are higher at the task level as the individual variability is smoothed out through the aggregation, which helps reveal patterns. The correlation between ease and lostness is very high (r = -.95) at the task level and to a lesser extent at the individual level r = -.52. Interestingly, despite differing tasks, the correlation between lostness and task time is also high and significant at r= .72 and r = .51 at the task and individual levels.
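To illustrate why aggregation strengthens the correlation, here is a small synthetic sketch (invented data, not the study’s):

```python
# Simulated participant-by-task data: the task-level correlation between lostness
# and ease is typically much stronger than the individual-level correlation,
# because averaging within tasks smooths out individual noise.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
tasks = np.repeat(np.arange(8), 9)                     # 8 tasks, ~9 participants each
true_lostness = rng.uniform(0.1, 0.7, 8)[tasks]        # each task's underlying difficulty
lostness = np.clip(true_lostness + rng.normal(0, 0.2, tasks.size), 0, 1)
ease = np.clip(7 - 6 * lostness + rng.normal(0, 1.0, tasks.size), 1, 7)
df = pd.DataFrame({"task": tasks, "lostness": lostness, "ease": ease})

individual_r = df["lostness"].corr(df["ease"])         # noisier, closer to zero
by_task = df.groupby("task")[["lostness", "ease"]].mean()
task_r = by_task["lostness"].corr(by_task["ease"])     # typically much more negative
print(round(individual_r, 2), round(task_r, 2))
```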

The correlation with completion rate, while in the expected direction, is more modest and not statistically significant (see the “Comp” row in Table 2). This is likely a consequence of both the coarseness of this metric (binary) and a restriction in range with most tasks in our dataset having high completion rates.

The strong relation between perceived ease and lostness can be seen in the scatter plot in Figure 2, with users’ perception of the task ease accounting for a substantial ~90% of the variance in lostness. At least with our dataset, it appears that average lostness is well accounted for by ease. That is, participants generally rate high lostness tasks as difficult.

Figure 2: Relationship between lostness and ease (r = -.95) for the 8 tasks; p < .01. Dots represent the 8 tasks.

 

Ease N % Lost Mean Lostness
1 6 1.00 0.73
2 4 1.00 0.72
3 4 1.00 0.62
4 3 0.00 0.17
5 3 0.33 0.26
6 13 0.31 0.35
7 40 0.23 0.24

Table 3: Percent of participants “lost” and mean lostness score for each point on the Single Ease Question (SEQ).

Further examining the relationship between perceived ease and lostness, Table 3 shows the average percent of participants that were marked as lost (scores above .5) and the mean lostness score for each point on the Single Ease Question (SEQ) scale. More than half the task experiences were rated 7 (the easiest task score), which corresponds to low lostness scores (below .4). SEQ scores at 4 and below all have high lostness scores (above .6), providing an additional point of concurrent validity for the lostness measure. Table 4 further shows an interesting relationship. The threshold when lostness scores go from not lost to lost happens around the historical SEQ average score of 5.5, again suggesting that below average ease is associated with lostness. It also reinforces the idea that the SEQ (a subjective score) is a good concurrent indicator of behavior (objective data).

Lostness N Mean SEQ Score
0 28 6.6
0.3 12 6.3
0.4 4 6.5
0.5 6 5.7
0.6 5 4.4
0.7 8 3.5
0.8 10 4.1

Table 4: Lostness scores aggregated into deciles with corresponding mean SEQ scores at each decile.

Validating the Lostness Thresholds

To see how well the thresholds identified by Smith predicted actual lostness, we reviewed the videos again and made a judgment as to whether the user was struggling or giving any indication of lostness (toggling back and forth, searching, revisiting pages). Of the 73 videos, two analysts independently reviewed 55 (75%) of the videos and made a binary decision whether the participant was lost or not lost (similar to the characterization described by Smith).

Lost Example: For example, one participant, when looking for the US Bank General Counsel, kept going back to the home page, scrolling to the bottom of the page multiple times, and using the search bar multiple times. This participant’s lostness score was .64 and was marked as “lost” by the evaluator.

Not Lost Example: In contrast, another participant, when looking for checking account fees, clicked a checking account tab, inputted their zip code, found the fees, and stopped the task. This participant’s lostness score was 0 (perfect) and was marked as “not lost” by the evaluator.

Table 5 shows the number of participants identified as lost by the evaluators corresponding to their lostness score grouped into deciles.

Lostness Score N # Lost % Lost
0 28 1 4%
0.3 6 2 33%
0.4 1 1 100%
0.5 3 1 33%
0.6 4 4 100%
0.7 5 5 100%
0.8 8 6 75%

Table 5: Percent of participants characterized as lost or not lost from evaluators watching the videos.

For example, of the 28 participant videos with a lostness score of 0, only 1 (4%) was considered lost. In contrast, 6 out of the 8 (75%) participants with lostness scores between .8 and .9 were considered lost. We do see good corroboration with the Smith thresholds. Only 9% (3 of 34) of participants with scores below .4 were considered lost. Similarly, 89% (16 of 18) participants were considered lost who had scores above .5.

Another way to look at the data: participants who were lost had a lostness score more than 5 times as high as those who weren’t lost (.61 vs. .11; p < .01).

 

Summary and Takeaways

An examination of a method for measuring lostness revealed:

Lostness as path taken relative to the happy path. An objective lostness measure was proposed over 20 years ago that combines two ratios: the number of unique pages relative to the total pages visited, and the minimum number of required pages relative to the unique pages visited. Computing this lostness measure requires identifying the minimum number of pages or steps needed to complete a task (the happy path) as well as counting all screens and the number of unique screens (a time-consuming process). A score of 0 represents perfectly efficient navigation (not lost) while a score of 1 indicates being very lost.

Thresholds are supported but not meant for task failure. Data from the original validation study suggested that lostness values below .4 indicated participants weren’t lost and values above .5 indicated they were lost. Our data corroborated these thresholds, as 91% of participants with scores below .4 were not considered lost and 89% of participants with scores above .5 were lost. The thresholds and score, however, become less meaningful when a user fails or abandons a task and visits only a subset of the essential screens, which decreases their lostness score. This suggests lostness may be best as a secondary measure to other usability metrics, notably task completion.

Perceived ease explains lostness. In our data, we found that average task-ease scores (participant ratings on the 7-point SEQ) explained roughly 90% of the variance in lostness scores (r = -.95). At least with our data, in general, when participants were lost, they knew it and rated the task harder (at least when aggregated across tasks). While subjective measures aren’t a substitute for objective measures, they do correlate, and post-task ease is quick to ask and analyze. Lower SEQ scores already indicate a need to look further for problems, and this data suggests participants getting lost may be a culprit for some tasks.

Time-consuming process is ideally automated. To collect the data for this validation study we had to review participant videos several times to compute the lostness score (counting screens and unique screens). It may not be worth the effort to review videos just to identify a lostness score (especially if you’re able to more quickly identify the problems users are having with a different measure). However, a lostness score can be computed using software (something we are including in our MUIQ platform). Researchers will still need to input the minimal number of steps (i.e., the happy path) per task but this measure, like other measures such as clicking non-clickable elements, may help quickly diagnose problem spots.

There’s a distinction between browsing and being lost. The tasks used in our replication study all had specific answers (e.g. finding a checking account’s fees). These are not the sort of tasks participants likely want to spend any more time (or steps) on than they need to. For these “productivity” tasks where users know exactly what they need to do or find, lostness may be a good measure (especially if it’s automatically collected). However, for more exploratory tasks where only a category is defined and not a specific item, like browsing for clothing, electronics, or the next book to purchase, the natural back-and-forth of browsing behavior may quantitatively look like lostness. A future study can examine how well lostness holds up under these more exploratory tasks.


Sign-up to receive weekly updates.

Source: Validating a Lostness Measure

Jul 18, 19: #AnalyticsClub #Newsletter (Events, Tips, News & more..)


[  COVER OF THE WEEK ]

[Image: Accuracy check]


[ FEATURED COURSE]

CS109 Data Science


Learning from data in order to gain useful predictions and insights. This course introduces methods for five key facets of an investigation: data wrangling, cleaning, and sampling to get a suitable data set; data managem… more

[ FEATURED READ]

Storytelling with Data: A Data Visualization Guide for Business Professionals


Storytelling with Data teaches you the fundamentals of data visualization and how to communicate effectively with data. You’ll discover the power of storytelling and the way to make data a pivotal point in your story. Th… more

[ TIPS & TRICKS OF THE WEEK]

Data Have Meaning
We live in a Big Data world in which everything is quantified. While the emphasis of Big Data has been focused on distinguishing the three characteristics of data (the infamous three Vs), we need to be cognizant of the fact that data have meaning. That is, the numbers in your data represent something of interest, an outcome that is important to your business. The meaning of those numbers is about the veracity of your data.

[ DATA SCIENCE Q&A]

Q: What is a POC (proof of concept)?
A: * A realization of a certain method to demonstrate its feasibility
* In engineering: a rough prototype of a new idea is often constructed as a proof of concept


[ VIDEO OF THE WEEK]

Ashok Srivastava(@aerotrekker @intuit) on Winning the Art of #DataScience #FutureOfData #Podcast


[ QUOTE OF THE WEEK]

Data beats emotions. – Sean Rad, founder of Ad.ly

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData with Jon Gibs(@jonathangibs) @L2_Digital


[ FACT OF THE WEEK]

Data production will be 44 times greater in 2020 than it was in 2009.

Sourced from: Analytics.CLUB #WEB Newsletter

HP boosts Vertica big data capabilities with streaming analytics

HP chases big data strategy with Vertica additions and startup accelerator

HP has revealed an updated version of its Vertica big data analytics platform in a bid to fulfil a data-oriented strategy that benefits businesses and non-data scientists.

HP Vertica will gain data streaming capabilities and advanced log file text searching to enable high-speed analytics on big data collected from sources such as the Internet of Things (IoT).

The new version of Vertica, codenamed Excavator, will offer support for Apache Kafka, an open source distributed messaging system, to allow organisations to harvest and analyse streaming data in near real time.

HP claimed that this new capability allows Excavator to be used in a wide range of monitoring and process control deployments in sectors such as manufacturing, healthcare and finance.

The addition of advanced machine-log text search in Excavator will allow companies to collect and organise large log-file datasets generated by systems and applications, and will provide more scope for predicting and identifying application failures and cyber attacks, along with visibility into authorised and unauthorised access to apps.

HP showed its commitment to big data-driven businesses by announcing the Haven Startup Accelerator, a programme designed to expand HP’s ecosystem of developers by offering free access to community versions of Vertica, and affordable access to the firm’s big data software and services.

Embracing open source

HP has added native integration of Vertica with Apache Spark in a move to embrace the scalability of open source software and platforms for big data analytics. The firm has also enabled Vertica to support SQL queries on native files and popular formats found in Hadoop data and deployments.

HP will integrate Vertica with Apache Spark to allow data to be transferred between the database platform and the cluster computing framework, giving developers the option to build data models in Spark and run them through Vertica’s analytics capabilities.

Furthermore, the company is making its Flex Zone available as open source software, which allows companies to analyse semi-structured data without needing to carry out intensive coding to prepare a system for the data ahead of analysis.

HP appears to be bolstering its portfolio of enterprise-grade products in preparation for its split into two separate companies, Hewlett Packard Enterprise and HP Inc.

Note: This article originally appeared in V3. Click for link here.


The Importance of Workforce Analytics

Although organizations must make decisions based on a variety of factors and perspectives, few are as important as human resources when it comes to taking action. A company’s workforce is vitally important but also one of the more complex sides of operating a business. Instead of clean, hard data, employees present a variety of qualitative factors that are hard to put into numbers that work for analytics.

Even so, an organization’s human capital is perhaps its most important asset. Building an in-depth understanding of your staff can, therefore, deliver better answers and give you a competitive edge. More than acting as a way of punishing employees, however, workforce analytics—sometimes called people analytics—can empower your team by providing better insights as to what works and doesn’t. Furthermore, it can help uncover all the tools employees need to succeed. Let’s begin by breaking down the meaning of workforce analytics.

What are Workforce Analytics?

Workforce analytics, a subset of HR analytics, is used to track and measure employee-related data and to optimize organizations’ human resource management and decision-making. The field focuses on much more than hiring and firing, also concentrating on the return on value for every hire. Moreover, it highlights more specific data that assists with identifying workplace trends such as potential risk factors, satisfaction with decisions, and more.

Additionally, workforce analytics can evaluate more than just existing staff by also analyzing the trends that surround employment. For instance, companies can see which periods of the year have a higher number of applicants and adjust their recruitment efforts, or measure diversity efforts as well as employee engagement without having to resort to more invasive or subjective methods that may provide false positives.

What Are Some Key Benefits of Workforce Analytics?

More so than tracking the number of employees and what they’re making, workforce analytics provides a comprehensive view of your organization’s workers designed to interpret historic trends and create predictive models that lead to insights and better decisions in the future. Some of the key benefits of workforce analytics include:

  • Find areas where efficiency can be improved with automation – While workers are an asset to a company, sometimes the tasks they do can reduce their productivity or provide minimal returns. Workforce analytics can discover areas where tasks can be relegated to machines via automation, allowing workers to instead dedicate their efforts to more important and valuable activities.
  • Improve workers’ engagement by understanding their needs and satisfaction – More than simply looking for firing and hiring information, workforce and people analytics can help a company understand why their employees are not performing their best, and the factors that are impacting productivity. This is more to maintain the current workforce instead of replacing it. The goal is to uncover those factors affecting performance and engagement and to overcome them by fostering better conditions.
  • Create better criteria for hiring new staff and provide a better hiring process – Finding new talent is always complex regardless of a company’s size or scope. Workforce analytics can shed light exactly on what is needed from a new hire by a department based on previous applicants, their success, and the company’s needs. More importantly, they can understand new candidates based on this historical data to determine whether they would be a good fit or not. For instance, a company seeking to hire a new developer may think twice about hiring a server-side programmer after several previous hires with similar experience didn’t work out.

What Key Metrics Should I Track for Workforce Analytics?

  • Employee productivity – We still talk about the 9 to 5 work day, but the current reality for many employees dictates that work hours tend to be more flexible and variable. As such, measuring productivity by the number of hours worked is no longer fully accurate. Instead, creating a productivity index which includes a few different data points will give a much better idea of how employees are performing.
  • Early turnover – Another important area that is often neglected when measuring satisfaction is how quickly employees are leaving on their own. A high early turnover rate is an indicator that things are not working, both in terms of meeting expectations and employee satisfaction (one way to compute this rate is sketched after this list).
  • Engagement – This may seem superfluous, but employees who are engaged with their work are more likely to be productive. Measuring engagement includes tracking employee satisfaction, stress levels, and employees’ belief in the company’s ideals. High engagement is a great sign that HR is doing its job.
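Here is a minimal pandas sketch of one way to compute an early-turnover rate (the 90-day window, column names, and dates are illustrative assumptions):

```python
import pandas as pd

hires = pd.DataFrame({
    "hire_date": pd.to_datetime(["2019-01-07", "2019-02-04", "2019-03-11"]),
    "exit_date": pd.to_datetime(["2019-02-15", None, "2019-09-30"]),  # None = still employed
})
tenure_days = (hires["exit_date"] - hires["hire_date"]).dt.days
early_turnover_rate = (tenure_days < 90).mean()      # NaT tenure counts as not an early leaver
print(f"Early turnover: {early_turnover_rate:.0%}")  # 33% in this toy cohort
```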

Conclusion

Focusing your data gathering internally can help you improve your company’s productivity. By homing in on your human resources and finding ways to empower your team, people analytics can boost your company’s efficiency, leading to happier and more productive colleagues.


Jul 11, 19: #AnalyticsClub #Newsletter (Events, Tips, News & more..)


[  COVER OF THE WEEK ]

[Image: Insights]


[ AnalyticsWeek BYTES]

>> The Challenges Canadian Companies Face When Implementing Big Data by analyticsweekpick

>> Voices in AI – Episode 89: A Conversation with Doug Lenat by analyticsweekpick

>> BARC Survey Shows New Benefits from Embedded Analytics by analyticsweek


[ FEATURED COURSE]

Statistical Thinking and Data Analysis


This course is an introduction to statistical data analysis. Topics are chosen from applied probability, sampling, estimation, hypothesis testing, linear regression, analysis of variance, categorical data analysis, and n… more

[ FEATURED READ]

The Signal and the Noise: Why So Many Predictions Fail–but Some Don’t


People love statistics. Statistics, however, do not always love them back. The Signal and the Noise, Nate Silver’s brilliant and elegant tour of the modern science-slash-art of forecasting, shows what happens when Big Da… more

[ TIPS & TRICKS OF THE WEEK]

Keeping Biases Checked during the last mile of decision making
Today, a data-driven leader, data scientist, or data-driven expert is constantly put to the test by helping their team solve problems using their skills and expertise. Believe it or not, part of that decision tree is derived from intuition, which adds a bias to our judgement and taints the suggestions. Most skilled professionals understand and handle these biases well, but in a few cases we give in to tiny traps and can find ourselves caught in biases that impair our judgement. So, it is important to keep intuition bias in check when working on a data problem.

[ DATA SCIENCE Q&A]

Q: Is it better to spend 5 days developing a 90% accurate solution, or 10 days for 100% accuracy? Does it depend on the context?
A: * “Premature optimization is the root of all evil”
* At the beginning: quick-and-dirty model is better
* Optimization later
Other answer:
– Depends on the context
– Is error acceptable? Fraud detection, quality assurance


[ VIDEO OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with @MPFlowersNYC, @enigma_data


[ QUOTE OF THE WEEK]

The data fabric is the next middleware. – Todd Papaioannou

[ PODCAST OF THE WEEK]

Understanding #FutureOfData in #Health & #Medicine - @thedataguru / @InovaHealth #FutureOfData #Podcast


[ FACT OF THE WEEK]

A quarter of decision-makers surveyed predict that data volumes in their companies will rise by more than 60 per cent by the end of 2014, with the average of all respondents anticipating a growth of no less than 42 per cent.

Sourced from: Analytics.CLUB #WEB Newsletter

Webinar: Improving the Customer Experience Using Big Data, Customer-Centric Measurement and Analytics

I recently gave a talk on how to improve the customer experience using Big Data, customer-centric measurement and analytics. My talk was hosted by the good people at Pivotal (recently Cetas).

You can view the webinar by registering here or you can view the slides below. In this webinar, Improving the Customer Experience Using Big Data, Customer-Centric Measurement and Analytics, I include content from my new book “TCE – Total Customer Experience: Building Business Through Customer-Centric Measurement and Analytics.” I discuss three areas: measuring the right customer metrics, integrating disparate data silos and using Big Data to answer strategic business questions. Using the right customer metrics in conjunction with other business data, businesses will be able to extract meaningful results that help executives make the right decisions to move their company forward.

In the book, I present best practices in measurement and analytics for customer experience management (CEM) programs.  Drawing on decades of research and practice, I illustrate analytical best practices in the field of customer experience management that will help you increase the value of all your business data to help improve the customer experience and increase customer loyalty.

 

Originally Posted at: Webinar: Improving the Customer Experience Using Big Data, Customer-Centric Measurement and Analytics by bobehayes