Data Science Programming: Python vs R

“Data Scientist: The Sexiest Job of the 21st Century.” – Harvard Business Review

If you are already in a big data related career, then you must be familiar with the set of big data skills that you need to master to grab the sexiest job of the 21st century. With every industry generating massive amounts of data, the need to crunch that data calls for powerful and sophisticated programming tools like Python and R. Python and R are among the most popular programming languages that a data scientist must know to pursue a lucrative career in data science.

Python is popular as a general purpose programming language for the web, whereas R is popular for its strong data visualization features, as it was developed specifically for statistical computing. At DeZyre, our career counsellors often get questions from prospective students about what they should learn first: Python programming or R programming. If you are unsure which programming language to learn first, then you are on the right page.

Python and R top the list of basic tools for statistical computing among data scientist skills. Data scientists often debate which one is more valuable, R programming or Python programming; however, both languages have specialized key features that complement each other.

Data Science with Python Language

Data science consists of several interrelated but distinct activities such as computing statistics, building predictive models, accessing and manipulating data, building explanatory models, creating data visualizations, integrating models into production systems, and much more. Python programming provides data scientists with a set of libraries that helps them perform all of these operations on data.

Python is a general purpose, multi-paradigm programming language for data science that has gained wide popularity because of its simple syntax and operability across different ecosystems. Python lets programmers do almost anything they need with data: data munging, data wrangling, website scraping, web application building, data engineering, and more. Python also makes it easy for programmers to write maintainable, large-scale, robust code.

“Python programming has been an important part of Google since the beginning, and remains so as the system grows and evolves. Today dozens of Google engineers use Python language, and we’re looking for more people with skills in this language.” – said Peter Norvig, Director at Google.

Unlike R, Python does not have built-in statistical packages, but it has support for libraries like scikit-learn, NumPy, pandas, SciPy, and Seaborn that data scientists can use to perform useful statistical and machine learning tasks. Python code reads like pseudocode and makes sense almost immediately, much like plain English. The expressions and characters used in the code can be mathematical, but the logic can be easily followed from the code.
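
To make this concrete, here is a minimal sketch, not taken from any particular course or product, of how these libraries typically divide the work: pandas for accessing and manipulating data, SciPy for statistics, and scikit-learn for a simple predictive model. The file name and column names are hypothetical placeholders.

```python
# A minimal sketch, assuming a hypothetical "sales.csv" with "ad_spend" and
# "revenue" columns; pandas handles data access and munging, SciPy the
# statistics, and scikit-learn the predictive model.
import pandas as pd
from scipy import stats
from sklearn.linear_model import LinearRegression

df = pd.read_csv("sales.csv").dropna()            # load and do basic data cleaning

print(df["revenue"].describe())                   # descriptive statistics with pandas
r, p = stats.pearsonr(df["ad_spend"], df["revenue"])
print(f"correlation r={r:.2f}, p={p:.3f}")        # a simple statistical test with SciPy

model = LinearRegression().fit(df[["ad_spend"]], df["revenue"])
print(model.coef_, model.intercept_)              # a basic predictive model with scikit-learn
```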

What makes Python language the King of Data Science Programming Languages?

“In Python programming, everything is an object. It’s possible to write applications in Python language using several programming paradigms, but it does make for writing very clear and understandable object-oriented code.”- said Brian Curtin, member of Python Software Foundation

1) Broadness

The public package index for Python, popularly known as PyPI, has approximately 40K add-ons listed under 300 different categories. So, if a developer or a data scientist has to do something with Python, there is a high probability that someone has already done it, and they need not begin from scratch. Python is used extensively for tasks ranging from CGI and web development, system testing and automation, and ETL to gaming.

2) Efficient

Developers these days spend a lot of time defining and processing big data. With the increasing amount of data that needs to be processed, it becomes extremely important for programmers to manage in-memory usage efficiently. Python has generators, both as functions and as expressions, which help with iterative processing, i.e., handling one item at a time. When a large number of processing steps must be applied to a set of data, generators in Python prove to be a great advantage: they grab the source data one item at a time and pass it through the entire processing chain.
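
As a quick illustration of that idea, the sketch below (the file and record layout are hypothetical) chains generators so that each record flows through the processing steps one at a time, keeping memory use constant no matter how large the input is.

```python
# Each stage is a generator: it pulls one record at a time from the previous
# stage, so the whole chain runs in constant memory regardless of file size.
# The file name and record layout are hypothetical.
def read_records(path):
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")        # yield one raw line at a time

def parse(lines):
    for line in lines:
        yield line.split(",")              # lazily split each record into fields

def keep_valid(rows):
    for row in rows:
        if len(row) == 3:                  # drop malformed rows as they stream by
            yield row

# Nothing is read until iteration begins; items flow through one by one.
for row in keep_valid(parse(read_records("events.csv"))):
    print(row)
```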

The generator-based migration tool collective.transmogrifier helps make complex and interdependent updates to the data as it is read from the old site, and then allows programmers to create and store objects in constant memory at the new site. The transmogrifier plays a vital role in Python programming when dealing with larger data sets.

3) Can be Easily Mastered Under Expert Guidance-Read It, Use it with Ease

Python has gained wide popularity because its syntax is clear and readable, making it easy to learn under expert guidance. Data scientists can gain expert knowledge and master Python programming for scientific computing by taking industry-oriented Python courses. The readability of the syntax makes it easier for peer programmers to update existing Python programs quickly and also helps them write new programs faster.

Applications of Python Language

  • Python programming is used by Mozilla for exploring their broad code base. Mozilla releases several open source packages built using Python.
  • Dropbox, a popular file hosting service, was founded by Drew Houston because he kept forgetting his USB drive. The project was started to fulfill his personal needs, but it turned out so well that others started using it too. Dropbox is written almost entirely in Python and now has close to 150 million registered users.
  • Walt Disney uses Python to enhance the power of its creative processes.
  • Some other exceptional products written in Python language are –

i. Cocos2d – a popular open source 2D gaming framework

ii. Mercurial – a popular cross-platform, distributed code revision control tool used by developers

iii. BitTorrent – file sharing software

iv. Reddit – an entertainment and social news website

Limitations of Python Programming

  • Python is an interpreted language and is thus often slower than compiled languages.
  • “A possible disadvantage of Python is its slow speed of execution. But many Python packages have been optimized over the years and execute at C speed.”- said Pierre Carbonnelle, a Python programmer who runs the PyPL language index.
  • Being a dynamically typed language, Python poses certain design restrictions. It requires rigorous testing because errors show up only at runtime.
  • Python is popular on desktop and server platforms but is still weak on mobile computing, as very few mobile apps are developed with Python. Python is also rarely found on the client side of web applications.


Data Science with R Language

Millions of data scientists and statisticians use R programming to tackle challenging problems in statistical computing and quantitative marketing. R has become an essential tool for finance and business analytics-driven organizations like LinkedIn, Twitter, Bank of America, Facebook, and Google.

R is an open source programming language and environment for statistical computing and graphics, available on Linux, Windows, and Mac. R has an innovative package system that allows developers to extend its functionality by providing cross-platform distribution and testing of data and code. With more than 5K publicly released packages available for download, it is a great language for exploratory data analysis, and it integrates easily with other programming languages like C, C++, and Java. R's array-oriented syntax makes it easier for programmers to translate math to code, in particular for professionals with minimal programming background.

Why use R programming for data science?

1. R is one of the best tools for data scientists in the world of data visualization. It has virtually everything that a data scientist needs: statistical models, data manipulation, and visualization charts.

2. Data scientists can create unique and beautiful data visualizations with R that go far beyond outdated line plots and bar charts. With R, data scientists can draw meaningful insights from data in multiple dimensions using 3D surfaces and multi-panel charts. The Economist and The New York Times exploit the custom charting capabilities of R to create stunning infographics.

3. One great feature of R programming is its support for reproducible research: the code and data can be given to an interested third party, which can then reproduce the same results. Data scientists therefore write code that extracts the data, analyses it, and generates an HTML, PDF, or PPT report. When a third party is interested, the original author can share the code and data so that similar results can be reproduced.

4. R is designed specifically for data analysis, with the flexibility to mix and match various statistical and predictive models for the best possible outcomes. R scripts can also be automated with ease to support production deployments and reproducible research.

5. R has a rich community of approximately 2 million users and thousands of developers, drawing on the talents of data scientists spread across the world. The community maintains packages across actuarial analysis, finance, machine learning, web technologies, and pharmaceuticals that can help predict component failure times, analyse genomic sequences, and optimize portfolios. All of these resources, created by experts in various domains, can be accessed for free online.

Applications of R Language

  • Ford uses open source tools like R programming and Hadoop for data driven decision support and statistical data analysis.
  • The popular insurance giant Lloyd’s uses R language to create motion charts that provide analysis reports to investors.
  • Google uses R programming to analyse the effectiveness of online advertising campaigns, predict economic activities and measure the ROI of advertising campaigns.
  • Facebook uses R language to analyse the status updates and create the social network graph.
  • Zillow makes use of R programming to estimate housing prices.

Limitations of R Language

  • R programming has a steep learning curve for professionals who do not come from a programming background (for example, professionals hailing from a GUI world like that of Microsoft Excel).
  • Working with R can at times be slow if the code is written poorly; however, there are solutions to this, such as FastR, pqR, and Renjin.

Data Science with Python or R Programming- What to learn first?

There are certain strategies that will help professionals decide their course of action on whether to begin learning data science with Python or with R –

  • If professionals know what kind of project they will be working on, then they can decide which language to learn first. If the project requires working with jumbled or scraped data from files, websites, or other sources, then professionals should start their learning with Python. On the other hand, if the project requires working with clean data, then professionals should focus first on the data analysis part, which calls for learning R first.
  • It is always better to be on par with your team, so find out which data science programming language they are using, R or Python. Collaboration and learning become much easier if you and your teammates work in the same language paradigm.
  • Trends in data scientist job postings can help you make a better decision on what to learn first, R or Python.
  • Last but not least, consider your personal preferences: what interests you more and which is easier for you to grasp.

Having understood Python and R briefly, the bottom line is that it is difficult to choose just one language to learn first, Python or R, to crack data scientist jobs at top big data companies. Each has its own advantages and disadvantages depending on the scenarios and tasks to be performed. Thus, the best solution is to make a smart move based on the strategies listed above, decide which language to learn first to land a job with a big data scientist salary, and later add to your skill set by learning the other language.

To read the original article on DeZyre, click here.

Source: Data Science Programming: Python vs R

@DarrWest / @BrookingsInst on the Future of Work: AI, Robots & Automation #JobsOfFuture

[youtube https://www.youtube.com/watch?v=aEfVIY09p3o]

In this podcast, Darrell West (@DarrWest) from @BrookingsInst talks about the future of work, workers, and the workplace. He sheds light on his research into the changing work landscape and the importance of policy design to compensate for technology disruption. Darrell shares his thoughts on how businesses, professionals, and governments could come together to minimize technology-driven joblessness and help stimulate the economy by placing everyone in future-ready roles.

Darrell’s Recommended Read:
Einstein: His Life and Universe by Walter Isaacson https://amzn.to/2JA4hsK

Podcast Link:
iTunes: http://math.im/itunes
GooglePlay: http://math.im/gplay

Darrell’s BIO:
Darrell West is the Vice President of Governance Studies and Director of the Center for Technology Innovation at the Brookings Institution. Previously, he was the John Hazen White Professor of Political Science and Public Policy and Director of the Taubman Center for Public Policy at Brown University. His current research focuses on technology policy, the Internet, digital media, health care, education, and privacy and security.

The Center that he directs examines a wide range of topics related to technology innovation including technology policy, public sector innovation; legal and Constitutional aspects of technology; digital media and social networking; health information technology; virtual education, and green technology. Its mission is to identify key developments in technology innovation, undertake cutting-edge research, disseminate best practices broadly, inform policymakers at the local, state, and federal levels about actions needed to improve innovation, and enhance the public’s and media’s understanding of the importance of technology innovation.

West is the author of 23 books. His most recent, The Future of Work: Robots, AI, and Automation (Brookings Institution Press), was published two weeks ago; some of his past work includes: How Technology Can Transform Education (Brookings, 2012); The Next Wave: Using Digital Technology to Further Social and Political Innovation (Brookings, 2011); Brain Gain: Rethinking U.S. Immigration Policy (Brookings, 2010); Digital Medicine: Health Care in the Internet Era (Brookings, 2009); Digital Government: Technology and Public Sector Performance (Princeton University Press, 2005); and Air Wars: Television Advertising in Election Campaigns (Congressional Quarterly Press, 2005), among others. He is the winner of the American Political Science Association’s Don K. Price award for best book on technology (for Digital Government) and the American Political Science Association’s Doris Graber award for best book on political communications (for Cross Talk).

About #Podcast:
#JobsOfFuture is a conversation-starter podcast that brings leaders, influencers, and lead practitioners on the show to discuss their journeys in creating the data driven future.

Wanna Join?
If you or anyone you know wants to join in,
Register your interest @ http://play.analyticsweek.com/guest/

Want to sponsor?
Email us @ info@analyticsweek.com

Keywords:
#JobsOfFuture #Leadership #Podcast #Future of #Work #Worker & #Workplace

Originally Posted at: @DarrWest / @BrookingsInst on the Future of Work: AI, Robots & Automation #JobsOfFuture

The Beginner’s Guide to Predictive Workforce Analytics

Greta Roberts, CEO
Talent Analytics, Corp.

Human Resources Feels Pressure to Begin Using Predictive Analytics
Today’s business executives are increasingly applying pressure to their Human Resources departments to “use predictive analytics”.  This pressure isn’t unique to Human Resources as these same business leaders are similarly pressuring Sales, Customer Service, IT, Finance and every other line of business (LOB) leader, to do something predictive or analytical.

Every line of business (LOB) is clear on their focus. They need to uncover predictive analytics projects that somehow affect their bottom line. (Increase sales, increase customer service, decrease mistakes, increase calls per day and the like).

Human Resources Departments have a Different, and Somewhat Unique, Challenge not Faced By Most Other Lines of Business
When Human Resources analysts begin a predictive analytics initiative, what we see mirrors what every other line of business does. For HR, however, instead of producing a great outcome, it can be potentially devastating.

Unless the unique challenge HR faces is understood, it can trip up an HR organization for a long time, cause them to lose analytics project resources and funding, and continue to perplex HR as they have no idea how they missed the goal of the predictive initiative so badly.

Human Resources’ Traditional Approach to Predictive Projects
Talent Analytics’ experience has been that (like all other lines of business) when Human Resources focuses on predictive analytics projects, they look around for interesting HR problems to solve; that is, problems inside of the Human Resources departments. They’d like to know if employee engagement predicts anything, or if they can use predictive work somehow with their diversity challenges, or predict a flight risk score that is tied to how much training or promotions someone has, or see if the kind of onboarding someone has relates to how long they last in a role. Though these projects have tentative ties to other lines of business, these projects are driven from an HR need or curiosity.

HR (and everyone else) Needs to Avoid the “Wikipedia Approach” to Predictive Analytics
Our firm is often asked if we can “explore the data in the HR systems” to see if we can find anything useful. We recommend avoiding this approach as it is exactly the same as beginning to read Wikipedia from the beginning (like a book) hoping to find something useful.

When exploring HR data (or any data) without a question, what you’ll find are factoids that will be “interesting but not actionable”. They will make people say “really, I never knew that”, but nothing will result.  You’ll pay an external consultant a lot of money to do this, or have a precious internal resource do this – only to gain little value without any strategic impact.  Avoid using the Wikipedia Approach – at least at first.  Start with a question to solve.  Don’t start with a dataset.

Human Resources Predictive Project Results are Often Met with Little Enthusiasm
Like all other Lines of Business, HR is excited to show results of their HR focused predictive projects.

The important disconnect. HR shows results that are meaningful to HR only.

Perhaps there is a prediction that ties # of training classes to attrition, or correlates performance review ratings with how long someone would last in their role. This is interesting information to HR but not to the business.

Here’s what’s going on.

Business Outcomes Matter to the Business.  HR Outcomes Don’t.
Human Resources departments can learn from the Marketing Department who came before them on the predictive analytics journey. Today’s Marketing Departments, that are using predictive analytics successfully, are arguably one of the strongest and most strategic departments of the entire company.

Today’s Marketing leaders predict customers who will generate the most revenue (have high customer lifetime value). Marketing Departments did not gain any traction with predictive analytics when they were predicting how many prospects would “click”. They needed to predict how many customers would buy.

Early predictive efforts in the Marketing Department used predictive analytics to predict how many webinars they’d need to conduct to get 1,000 new prospects in their prospect database, or how much they’d need to spend on marketing campaigns to get prospects to click on a coupon. (Adding new prospect names to a prospect database is a marketing goal, not a business goal; clicking on a coupon is a marketing goal, not a business goal.) Or, they could predict that customer engagement would go up if they gave a discount on a Friday (again, a marketing goal, not a business goal). The business doesn’t care about any of these “middle measures” unless they can be proved and tracked to the end business outcome.

Marketing Cracked the Code
Business wants to reliably predict how many people would buy (not click) using this coupon vs. that one.  When marketing predicted real business outcomes, resources, visibility and funding quickly became available.

When Marketing was able to show a predictive project that could identify what offer to make so that a customer bought and sales went up – business executives took notice. They took such close notice that they highlighted what Marketing was able to do, they gave Marketing more resources and funding and visibility. Important careers were made out of marketing folks who were / are part of strategic predictive analytics projects that delivered real revenue and / or real cost savings to the business’s bottom line.

Marketing stopped being “aligned” with the business, Marketing was the business.

Human Resources needs to do the same thing.

Best Approach for Successful and Noteworthy Predictive Workforce Projects
Many people get tangled up in definitions. Is it people analytics, workforce analytics, talent analytics or something else? It doesn’t matter what you call it – the point is that predictive workforce projects need to address and predict business outcomes not HR outcomes.

Like Marketing learned over time, when Human Resources begins predictive analytics projects, they need to approach the business units they support and ask them what kinds of challenges they are having that might be affected by the workforce.

There are 2 critical categories for strategic predictive workforce projects:

  • Measurably reducing employee turnover / attrition in a certain department or role

  • Measurably increasing specific employee performance (real performance, not performance review scores) in one role or department or another (i.e., more sales, fewer mistakes, higher customer service scores, fewer accidents).

I say “measurably” because to be credible, the predictive workforce initiative needs to measure and show business results both before and after the predictive model.

For Greatest ROI: Businesses Must Predict Performance or Flight Risk Pre-Hire
Once an employee is hired, the business begins pouring significant cost into that employee, typically made up of a) their salary and benefits and b) training time while they ramp up to speed and deliver little to no value. Our analytics work measuring true replacement costs shows us that even for very entry level roles, a conservative replacement estimate for a single employee (Call Center Rep, Bank Teller, and the like) is over $6,000.

A great example is to consider the credit industry. Imagine extending credit to someone for a mortgage, and then applying analytics after the mortgage has been extended to predict which mortgage holders are a good credit risk. It’s preposterous.

The only thing the creditor can do after the relationship has begun is try to coach, train, encourage, change the payment plan, and the like. It’s too late once the relationship has begun.

Predicting credit risk (who will pay their bills) – is predicting human behavior.  Predicting who will make their sales quota, who will make happy customers, who will make mistakes, who will drive their truck efficiently – also is predicting human behavior.

HR needs to realize that predicting human behavior is a mature domain with decades of experience and time to hone approaches, algorithms and sensitivity to private data.

What is Human Resources’ Role in Predictive Analytics Projects?
The great news is that typically the Human Resources Department will already be aware of both of these business challenges. They just hadn’t considered that Human Resources could be a part of helping to solve these challenges using predictive analytics.

Many articles discuss how Human Resources needs to be an analytics culture, and that all Human Resources employees need to learn analytics. Though I appreciate the realization that analytics is here to stay, Human Resources of all people should know that there are some people with the natural mindset to “get” and love analytics and there are some that don’t and won’t.

As I speak around the world and talk to folks in HR, I can feel the fear felt today by people in HR who have little interest in this space. My recommendation would be to breathe, take a step back and realize that not everyone needs to know how to perform predictive analytics.  Realize there are many traditional HR functions that need to be accomplished. We recommend a best practice approach of identifying who does have the mindset and interest in the analytics space and let them partner with someone who is a true predictive analyst.

For those who know they are not cut out to be the person doing the predictive analytics there are still many roles where they can be incredibly useful in the predictive process. Perhaps they could identify problem areas that predictive analytics can solve, or perhaps they could be the person doing more of the traditional Human Resources work. I find this “analytics fear” paralyzes and demoralizes employees and people in general.

Loosely Identified, but Important Roles on a Predictive Workforce Analytics Project

  1. Someone to identify high turnover roles in the lines of business, or identify where there are a lot of employees not performing very well in their jobs

  2. A liaison: Someone to introduce the HR predictive analytics team to the lines of business with turnover or business performance challenges

  3. Someone to help find and access the data to support the predictive project

  4. Someone to actually “do” the predictive analytics work (the workforce analyst or data scientist)

  5. Someone who creates a final business report to show the results of the work (both positive and negative)

  6. Someone who presents the final business report

  7. A high level project manager to help keep the project moving along

  8. The business and HR experts that understand how things work and need to be consulted all along the way

These roles can sometimes all be the same person, and sometimes they can be many different people depending on the complexity of the project, the size of the predictive workforce organization, the number of lines of business that are involved in the project and / or the multiple areas where data needs to be accessed.

The important thing to realize is that there are several non-analytics roles inside predictive projects. Not every role in a predictive project requires a predictive specialist or even an analytics-savvy person.

High Value Predictive Projects Don’t Deliver HR Answers
Should predictive projects start with HR’s own questions? We recommend not, at least not to begin with. We started by describing how business leaders are pressuring Human Resources to do predictive analytics projects. There is often little or no guidance given to HR about which predictive projects to do.

Here is my prediction and you can take it to the bank. I’ve seen it happen over and over again.

When HR departments use predictive analytics to solve real, Line of Business challenges that are driven by the workforce, HR becomes an instant hero. These Human Resources Departments are given more resources, their projects are funded, they receive more headcount for their analytics projects – and like Marketing, they will turn into one of the most strategic departments of the entire company.

Feeling Pressure to Get Started with Predictive?
If you’re feeling pressure from your executives to start using predictive analytics strategically and have a high volume role like sales or customer service you’d like to optimize, get in touch.

Want to see more examples of “real” predictive workforce business outcomes? Attend Predictive Analytics World for Workforce in San Francisco, April 3-6, 2016.

Greta Roberts is the CEO & Co-founder of Talent Analytics, Corp. She is the Program Chair of Predictive Analytics World for Workforce and a Faculty member of the International Institute for Analytics. Follow her on twitter @gretaroberts.

Source: The Beginner’s Guide to Predictive Workforce Analytics

Analytic Exploration: Where Data Discovery Meets Self-Service Big Data Analytics

Traditionally, the data discovery process was a critical prerequisite to, yet a distinct aspect of, formal analytics. This fact was particularly true for big data analytics, which involved extremely diverse sets of data types, structures, and sources.

However, a number of crucial developments have recently occurred within the data management landscape that resulted in increasingly blurred lines between the analytics and data discovery processes. The prominence of semantic graph technologies, combined with the burgeoning self-service movement and increased capabilities of visualization and dashboard tools, has resulted in a new conception of analytics in which users can dynamically explore their data while simultaneously gleaning analytic insight.

Such analytic exploration denotes several things: decreased time to insight and action, a democratization of big data and analytics fit for the users who need these technologies most, and an increased reliance on data for the pervasiveness of data-centric culture.

According to Ben Szekely, Vice President of Solutions and Pre-sales at Cambridge Semantics, it also means much more–a new understanding of the potential of analytics, which necessitates that users adopt:

“A willingness to explore their data and be a little bit daring. It is sort of a mind-bending thing to say, ‘let me just follow any relationship through my data as I’m just asking questions and doing analytics’. Most of our users, as they get in to it, they’re expanding their horizons a little bit in terms of realizing what this capability really is in front of them.”

Expanding Data Discovery to Include Analytics
In many ways, the data discovery process was widely viewed as part of the data preparation required to perform analytics. Data discovery was used to discern which data were relevant to a particular query and for solving a specific business problem. Discovery tools provided this information, which was then cleansed, transformed, and loaded into business intelligence or analytics options to deliver insight in a process that was typically facilitated by IT departments and exceedingly time consuming.

However, as the self-service movement has continued to gain credence throughout the data sphere, these tools have evolved to become more dynamic and responsive. Today, any number of vendors offer tools that regularly publish the results of analytics in interactive dashboards and visualizations. These platforms enable users to manipulate those results, display them in the ways most meaningful for their objectives, and actually use those results to answer additional questions. As Szekely observed, oftentimes users are simply: “Approaching a web browser asking questions, or even using a BI or analytics tool they’re already familiar with.”

The Impact of Semantic Graphs for Exploration
The true potential for analytic exploration is realized when combining data discovery tools and visualizations with the relationship-based, semantic graph technologies that are highly effective on widespread sets of big data. By placing these data discovery platforms atop stacks predicated on an RDF graph, users are able to initiate analytics with the tools that they previously used to merely refine the results of analytics.

Szekely mentioned that: “It’s the responsibility of the toolset to make that exploration as easy as possible. It will allow them to navigate the ontology without them really knowing they’re using RDF or OWL at all…The system is just presenting it to them in a very natural and intuitive way. That’s the responsibility of the software; it’s not the responsibility of the user to try to come down to the level of RDF or OWL in any way.”

The underlying semantic components of RDF, OWL, and vocabularies and taxonomies that can link disparate sets of big data are able to contextualize that data to give them relevance for specific questions. Additionally, semantic graphs and semantic models are responsible for the upfront data integration that occurs prior to analyzing different data sets, structures and sources. By combining data discovery tools with semantic graph technologies, users are able to achieve a degree of profundity in their analytics that would have previously either taken too long to achieve or not have been possible.
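
As a rough illustration of how such relationship-based queries look in practice, here is a minimal sketch using the open source Python rdflib library; this is an assumption for illustration only, not the tooling that Cambridge Semantics or Franz provide, and every URI and field name is hypothetical.

```python
# All URIs and field names below are hypothetical; rdflib is used only to show
# the idea of triples plus a SPARQL query, not any vendor's stack.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/")
g = Graph()

# Two records that might come from different sources, linked by a shared URI.
g.add((EX.cust42, RDF.type, EX.Customer))
g.add((EX.cust42, EX.name, Literal("Acme Corp")))
g.add((EX.order7, EX.placedBy, EX.cust42))
g.add((EX.order7, EX.amount, Literal(1200)))

# Follow the relationship from orders back to the customers who placed them.
results = g.query("""
    SELECT ?name ?amount WHERE {
        ?order <http://example.org/placedBy> ?cust .
        ?cust  <http://example.org/name>     ?name .
        ?order <http://example.org/amount>   ?amount .
    }
""")
for name, amount in results:
    print(name, amount)
```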

The Nature of Analytic Exploration
On the one hand, that degree of analytic profundity is best described as the ability of the layman business end user to ask many more questions of his or her data, in quicker time frames than before. On the other hand, the true utility of analytic exploration is realized in the types of questions that user can ask. These questions are frequently ad hoc, include time-sensitive and real-time data, and are often based on the results of previous questions and the conclusions that one can draw from them.

As Szekely previously stated, the sheer freedom and depth of analytic exploration lends itself to so many possibilities on different sorts of data that it may require a period of adjustment to conceptualize and fully exploit. The possibilities enabled by analytic exploration are largely based on the visual nature of semantic graphs, particularly when combined with competitive visualization mechanisms that capitalize on the relationships they illustrate for users. According to Craig Norvell, Franz Vice President of Global Sales and Marketing, such visualizations are an integral “part of the exploration process that facilitates the meaning of the research” for which an end user might be conducting analytics.

Emphasizing the End User
Overall, analytic exploration is reliant upon the relationship-savvy, encompassing nature of semantic technologies. Additionally, it depends upon contemporary visualizations to fuse data discovery and analytics. Its trump card, however, lies in its self-service nature which is tailored for end users to gain more comfort and familiarity with the analytics process. Ultimately, that familiarity can contribute to a significantly expanded usage of analytics, which in turn results in more meaningful data driven processes from which greater amounts of value are derived.

Source

Biased Machine-Learning Models (Part 2): Impact on the Hiring Process

This is a follow-up to our previous blog on Biased Machine-Learning Models, in which we explored how machine-learning algorithms can be susceptible to bias (and what you can do to avoid that bias). Now we’ll examine the impact biased predictive models can have on specific processes such as hiring.

>> Related: Predictive Analytics 101 <<

Just recently, Amazon had to scrap a predictive recruiting tool because it was biased against women. How does something like this happen? Because algorithms learn rules by looking at past data—and if the historical data is biased, the model is going to be biased as well. An even larger problem is that a machine-learning model will continue to automate the process of being biased.

When Does Bias Become an Issue?

A company itself may not be biased in the hiring process, but the current constituents on the team will dictate how the algorithm scores applicants. The algorithm is adding bias to its score because of the imbalance that already exists in the input data, and this creates a problem for filtering out candidates in the future.

Let’s look at hiring scenarios where bias might become an issue. Say your sales team is comprised of mostly 25-year-old white males. The algorithm will then interpret this as the ideal profile for a salesperson (age 25, white, male). If a female or someone older than 25 applies, the algorithm will not give them a good score. Similarly, say your accounting team is comprised of mostly 35-year-old women. Any males or younger women who apply will also score low.
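
To see how this happens mechanically, here is a contrived sketch on synthetic data; the feature names and numbers are invented purely for illustration. A model trained on a hiring history dominated by young male hires ends up scoring a young male applicant much higher than an older, more skilled female applicant.

```python
# Synthetic data only: past "hired" decisions are driven almost entirely by
# being male and under 30, not by skill. The model faithfully learns that bias.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
age = rng.integers(22, 60, n)
is_male = rng.integers(0, 2, n)
skill = rng.normal(0, 1, n)
hired = ((is_male == 1) & (age < 30)).astype(int)   # biased historical labels

X = np.column_stack([age, is_male, skill])
model = LogisticRegression(max_iter=1000).fit(X, hired)

print(model.predict_proba([[25, 1, 0.0]])[0, 1])    # young male, average skill: high score
print(model.predict_proba([[45, 0, 2.0]])[0, 1])    # older female, high skill: low score
```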

Biased Candidate Model

Outside of the hiring process, the same logic can be applied to retail establishments renting out houses or even financial institutions approving loans. There may not be bias in the manual business workflow, but the historical data can create a bias in the automated predictions. For example, if an applicant lives in a neighborhood with a high concentration of young, educated Asians, the algorithm may penalize anyone who does not fit this demographic.

What Can We Do About It?

There are a variety of ways to deal with biased machine-learning models, as sketched below. First, look for and acknowledge any biased data. Next, apply sampling techniques such as under-sampling, over-sampling, SMOTE, or ROSE. You can also add class weights to address such problems, especially to increase diversity. Or, to keep it simple, just remove age, gender, and race as inputs to the model.
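
Here is a hedged sketch of those options expressed with scikit-learn and the separate imbalanced-learn package; the dataset and column names are assumptions for illustration.

```python
# Assumes a hypothetical "applicants.csv" with a binary "hired" outcome plus
# age, gender, and race columns. SMOTE comes from the separate
# imbalanced-learn package (pip install imbalanced-learn).
import pandas as pd
from sklearn.linear_model import LogisticRegression
from imblearn.over_sampling import SMOTE

df = pd.read_csv("applicants.csv")

# 1) Simply drop the sensitive attributes from the model inputs.
X = df.drop(columns=["hired", "age", "gender", "race"])
y = df["hired"]

# 2) Re-balance the training data with an over-sampling technique (SMOTE).
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)

# 3) And/or weight the classes so the minority outcome is not drowned out.
model = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_res, y_res)
```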

Best Candidate Model

To learn more techniques for handling biased data, see our previous blog on biased machine-learning models.

See how Logi can help with your predictive analytics needs. Sign up for a free demo of Logi Predict today >

Source

Lavastorm Democratizing Big Data Analytics in Face of Skills Shortage

Democratizing Big Data refers to the growing movement of making products and services more accessible to other staffers, such as business analysts, along the lines of “self-service business intelligence” (BI).

In this case, the democratized solution is “the all-in-one Lavastorm Analytics Engine platform,” the Boston company said in an announcement today of product improvements. It “provides an easy-to-use, drag-and-drop data preparation environment to provide business analysts a self-serve predictive analytics solution that gives them more power and a step-by-step validation for their visualization tools.”

It addresses one of the main challenges to successful Big Data deployments, as listed in study after study: lack of specialized talent.

“Business analysts typically encounter a host of core problems when trying to utilize predictive analytics,” Lavastorm said. “They lack the necessary skills and training of data scientists to work in complex programming environments like R. Additionally, many existing BI tools are not tailored to enable self-service data assembly for business analysts to marry rich data sets with their essential business knowledge.”

The Lavastorm Analytics Engine (source: Lavastorm Analytics)

That affirmation has been confirmed many times. For example, a recent report by Capgemini Consulting, “Cracking the Data Conundrum: How Successful Companies Make Big Data Operational,” says that lack of Big Data and analytics skills was reported by 25 percent of respondents as a key challenge to successful deployments. “The Big Data talent gap is something that organizations are increasingly coming face-to-face with,” Capgemini said.

Other studies indicate they haven’t been doing such a good job facing the issue, as the self-service BI promises remain unfulfilled.

Enterprises are trying many different approaches to solving the problem. Capgemini noted that some companies are investing more in training, while others try more unconventional techniques, such as partnering with other companies in employee exchange programs that share more skilled workers or teaming up with or outright acquiring startup Big Data companies to bring skills in-house.

Others, such as Altiscale Inc., offer Hadoop-as-a-Service solutions, or, like BlueData, provide self-service, on-premises private clouds with simplified analysis tools.

Lavastorm, meanwhile, uses the strategy of making the solutions simpler and easier to use. “Demand for advanced analytic capabilities from companies across the globe is growing exponentially, but data scientists or those with specialized backgrounds around predictive analytics are in short supply,” said CEO Drew Rockwell. “Business analysts have a wealth of valuable data and valuable business knowledge, and with the Lavastorm Analytics Engine, are perfectly positioned to move beyond their current expertise in descriptive analytics to focus on the future, predicting what will happen, helping their companies compete and win on analytics.”

The Lavastorm Analytics Engine comes in individual desktop editions or in server editions for use in larger workgroups or enterprise-wide.

New predictive analytics features added to the product as listed today by Lavastorm include:

  • Linear Regression: Calculate a line of best fit to estimate the values of a variable of interest.
  • Logistic Regression: Calculate probabilities of binary outcomes.
  • K-Means Clustering: Form a user-specified number of clusters out of data sets based on user-defined criteria.
  • Hierarchical Clustering: Form a user-specified number of clusters out of data sets by using an iterative process of cluster merging.
  • Decision Tree: Predict outcomes by identifying patterns from an existing data set.

These and other new features are available today, Lavastorm said, with more analytical component enhancements to the library on tap.
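
For readers who think in code rather than drag-and-drop components, the techniques listed above map loosely onto standard scikit-learn estimators. The sketch below is only an analogy on synthetic data and says nothing about how the Lavastorm Analytics Engine implements them.

```python
# Synthetic data; these are generic scikit-learn estimators, not Lavastorm's
# drag-and-drop components.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y_cont = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=200)  # continuous target
y_bin = (y_cont > 0).astype(int)                                # binary target

LinearRegression().fit(X, y_cont)              # line of best fit for a variable of interest
LogisticRegression().fit(X, y_bin)             # probabilities of binary outcomes
KMeans(n_clusters=3, n_init=10).fit(X)         # user-specified number of clusters
AgglomerativeClustering(n_clusters=3).fit(X)   # clusters formed by iterative merging
DecisionTreeClassifier().fit(X, y_bin)         # outcome prediction from learned patterns
```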

The company said its approach to democratizing predictive analytics gives business analysts drag-and-drop capabilities specifically designed to help them master predictive analytics.

“The addition of this capability within the Lavastorm Analytics Engine’s visual, data flow-driven approach enables a fundamentally new method for authoring advanced analyses by providing a single shared canvas upon which users with complementary skill sets can collaborate to rapidly produce robust, trusted analytical applications,” the company said.

About the Author: David Ramel is an editor and writer for 1105 Media.

Originally posted via “Lavastorm Democratizing Big Data Analytics in Face of Skills Shortage”

Originally Posted at: Lavastorm Democratizing Big Data Analytics in Face of Skills Shortage by analyticsweekpick

The New Analytics Professional: Landing A Job In The Big Data Era

Along with the usual pomp and celebration of college commencements and high school graduation ceremonies we’re seeing now, the end of the school year also brings the usual brooding and questions about careers and next steps. Analytics is no exception, and with the big data surge continuing to fuel lots of analytics jobs and sub-specialties, the career questions keep coming. So here are a few answers on what it means to be an “analytics professional” today, whether you’re just entering the workforce, you’re already mid-career and looking to make a transition, or you need to hire people with this background.

The first thing to realize is that analytics is a broad term, and there are a lot of names and titles that have been used over the years that fall under the rubric of what “analytics professionals” do: The list includes “statistician,” “predictive modeler,” “analyst,” “data miner” and — most recently — “data scientist.” The term “data scientist” is probably the one with the most currency – and hype – surrounding it for today’s graduates and upwardly mobile analytics professionals. There’s even a backlash against over-use of the term by those who slap it loosely on resumes to boost salaries and perhaps exaggerate skills.


Labeling the Data Scientist

In reality, if you study what successful “data scientists” actually do and the skills they require to do it, it’s not much different from what other successful analytics professionals do and require. It is all about exploring data to uncover valuable insights often using very sophisticated techniques. Much like success in different sports depends on a lot of the same fundamental athletic abilities, so too does success with analytics depend on fundamental analytic skills. Great analytics professionals exist under many titles, but all share some core skills and traits.

The primary distinction I have seen in practice is that data scientists are more likely to come from a computer science background, to use Hadoop, and to code in languages like Python and R. Traditional analytics professionals, on the other hand, are more likely to come from a statistics, math or operations research background, are likely to work in relational or analytics server environments, and to code in SAS and SQL.

Regardless of the labels or tools of choice, however, success depends on much more than specific technical abilities or focus areas, and that’s why I prefer the term “data artist” to get at the intangibles like good judgment and boundless curiosity around data. I wrote an article on the data artist for the International Institute for Analytics (IIA). I also collaborated jointly with the IIA and Greta Roberts from Talent Analytics to survey a wide number of analytics professionals. One of our chief goals in that 2013 quantitative study was to find out whether analytics professionals have a unique, measurable mind-set and raw talent profile.

A Jack-of-All Trades

Our survey results showed that these professionals indeed have a clear, measurable raw talent fingerprint that is dominated by curiosity and creativity; these two ranked very high among 11 characteristics we measured. They are the qualities we should prioritize alongside the technical bona fides when looking to fill jobs with analytics professionals. These qualities also happen to transcend boundaries between traditional and newer definitions of what makes an analytics professional.

This is particularly true as we see more and more enterprise analytics solutions getting built from customized mixtures of multiple systems, analytic techniques, programming languages and data types. All analytics professionals need to be creative, curious and adaptable in this complex environment that lets data move to the right analytic engines, and brings the right analytic engines to where the data may already reside.

Given that the typical “data scientist” has some experience with Hadoop and unstructured data, we tend to ascribe the creativity and curiosity characteristics automatically (you need to be creative and curious to play in a sandbox of unstructured data, after all). But that’s an oversimplification, and our Talent Analytics/International Institute of Analytics survey shows that the artistry and creative mindset we need to see in our analytics professionals is an asset regardless of what tools and technologies they’ll be working with and regardless of what title they have on their business card. This is especially true when using the complex, hybrid “all-of-the-above” solutions that we’re seeing more of today and which Gartner calls the Logical Data Warehouse.

Keep all this in mind as you move forward. The barriers between the worlds of old and new; open source and proprietary; structured and unstructured are breaking down. Top quality analytics is all about being creative and flexible with the connections between all these worlds and making everything work seamlessly. Regardless of where you are in that ecosystem or what kind of “analytics professional” you may be or may want to hire, you need to prioritize creativity, curiosity and flexibility – the “artistry” – of the job.

To read the original article on Forbes, click here.

Source by analyticsweekpick

Validating a Lostness Measure

No one likes getting lost. In real life or digitally.

One can get lost searching for a product to purchase, finding medical information, or clicking through a mobile app to post a social media status.

Each link, button, and menu leads to decisions. And each decision can result in a mistake, leading to wasted time, frustration, and often the inability to accomplish tasks.

But how do you measure when someone is lost? Is this captured already by standard usability metrics or is a more specialized metric needed? It helps to first think about how we measure usability.

Measuring Usability

We recommend a number of usability measures to assess the user experience, both objectively and subjectively (which come from the ISO 9241 standard of usability). Task completion, task time, and number of errors are the most common types of objective task-based measures. Errors take time to operationalize (“What is an error?”), while task completion and time can often be collected automatically (for example, in our MUIQ platform).

Perceived ease and confidence are two common task-based subjective measures—simply asking participants how easy or difficult or how confident they are they completed the task. Both tend to correlate (r ~ .5) with objective task measures [pdf]. But do any of these objective or subjective measures capture what it means to be lost?

What Does It Mean to Be Lost?

How do you know whether someone is lost? In real life you could simply ask them. But maybe people don’t want to admit they’re lost (you know, like us guys). Is there an objective way to determine lostness?

In the 1980s, as “hypertext” systems were being developed, a new dimension was added to information-seeking behavior. Designers wanted to know whether people were getting lost when clicking all those links. Earlier, Elm and Woods (1985) argued that being lost was more than a feeling (no Boston pun intended); it was a degradation of performance that could be objectively measured. Inspired by this idea, in 1996 Patricia Smith sought to objectively define lostness and described a way to measure when people were lost in hypertext. But not much has been done with it since (at least that we could find).

Smith’s work has received a bit of a resurgence after Tomer Sharon cited it in Validating Product Ideas and was consequently mentioned in online articles.

While there have been other methods for quantitatively assessing navigation, in this article we’ll take a closer look at how Smith quantified lostness and how the measure was validated.

A Lostness Measure

Smith proposed a few formulas to objectively assess lostness. The measure is essentially a function of what the user does (how many screens visited) relative to the most efficient path a user could take through a system. It requires first finding the minimum number of screens or steps it takes to accomplish a task—a happy path—and then comparing that to how many total screens and unique screens a user actually visits. She settled on the following formula using these three inputs to account for two dimensions of lostness:

N=Unique Pages Visited

S=Total Pages Visited

R=Minimum Number of Pages Required to Complete Task

The lostness measure ranges from 0 (absence of lostness) to 1 (being completely lost). Formulas can be confusing and they sometimes obscure what’s being represented, so I’ve attempted to visualize this metric and show how it’s derived with the Pythagorean theorem in Figure 1 below. 

Figure 1: Visualization of the lostness measure. The orange lines with the “C” is an example of how a score from one participant can be converted into lostness using the Pythagorean theorem.
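
Because the formula itself appeared as an image in the original article, here is a short sketch of the calculation as it is commonly reproduced from Smith (1996): the two ratios N/S and R/N are each compared to their ideal value of 1, and the two deviations are combined with the Pythagorean theorem. Treat the exact form as a reconstruction rather than a quotation.

```python
import math

def lostness(n_unique, s_total, r_minimum):
    """N = unique pages visited, S = total pages visited,
    R = minimum number of pages required to complete the task."""
    return math.sqrt((n_unique / s_total - 1) ** 2 +
                     (r_minimum / n_unique - 1) ** 2)

# A participant who needs 4 pages but visits 10 in total, 7 of them unique:
print(round(lostness(7, 10, 4), 2))   # ~0.52, above Smith's "lost" threshold of .5

# A perfectly efficient participant visits only the required pages, once each:
print(lostness(4, 4, 4))              # 0.0, the absence of lostness
```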

Smith then looked to validate the lostness measure with data from a previous study of 20 students (16 and 17 year olds) from the UK. Participants were asked to look for information on a university department hypertext system. Measures collected included the total number of nodes (pages), deviations, and unique pages accessed.

After reviewing videos of the users across tasks, she found that her lostness measure did correspond to lost behavior. She identified lostness scores above .5 as lost, scores below .4 as not lost, and scores between .4 and .5 as indeterminate.

The measure was also used in another study, reported in Smith, that involved eight participants and more nodes. In that study by Cardle (a 1994 dissertation), similar findings for lostness and efficiency were reported. But of the eight users, one had a score above .5 (indicating lost) when he was not really lost but exploring, suggesting a possible confound with the measure.

Replication Study

Given the small amount of data used to validate the lostness measure (and the dearth of information since), we conducted a new study to collect more data, to confirm thresholds of lostness, and see how this measure correlates with other widely used usability measures.

Between September and December 2018 we reviewed 73 videos of users attempting to complete 8 tasks from three studies. The studies included consumer banking websites, a mobile app for making purchases, and a findability task on the US Bank website that asked participants to find the name of the VP and General Counsel (expected to be a difficult task). Each task had a clear correct solution and “exploring” behavior wasn’t expected, thus minimizing possible confounds with natural browsing behavior that may look like lostness (e.g., looking at many pages repeatedly).

Sample sizes ranged from 5 to 16 for each task experience. We selected tasks that we hoped would provide a good range of lostness. For each task, we identified the minimum number of screens needed to complete each task for the lostness measure (R), and reviewed each video to count the total number of screens (S) and number of unique screens (N). We then computed the lostness score for each task experience. Post-task ease was collected using the SEQ and task time and completion rates were collected in the MUIQ platform.

Study Results

Across the 73 task experiences we had a good range of lostness, from a low of 0 (perfect navigation) to a high of .86 (very lost) and a mean lostness score of .34. We then aggregated the individual experiences by task.

Table 1 shows the lostness score, post-task ease, task time, and completion rate aggregated across the tasks, with lostness scores ranging from .16 to .72 (higher lostness scores mean more lostness).

Task | Lostness | Ease | Time | Completion | % Lost
1    | 0.16     | 6.44 | 196  | 100%       | 6%
2    | 0.26     | 6.94 | 30   | 94%        | 19%
4    | 0.33     | 6.00 | 272  | 40%        | 40%
3    | 0.34     | 6.19 | 83   | 100%       | 44%
6    | 0.37     | 4.60 | 255  | 80%        | 60%
7    | 0.51     | 4.40 | 193  | 100%       | 40%
8    | 0.66     | 2.20 | 339  | 60%        | 100%
5    | 0.72     | 2.40 | 384  | 60%        | 100%

Table 1: Lostness, ease (7-point SEQ scale), time (in seconds), completion rates, and % lost (> .5) for the eight tasks. Tasks sorted by lostness score, from least lost to most lost.

Using the Smith “lost” threshold of .5, we computed a binary metric of lost/not lost for each video and computed the average percent lost per task (far right column in Table 1).

Tasks 8 and 5 have both the highest lostness scores and percent being lost. All participants had lostness scores above .5 and were considered “lost.” In contrast, only 6% and 19% of participants were “lost” on tasks 1 and 2.

You can see a pattern between lostness and the ease, time, and completion rates in Table 1. As users get more lost (lostness goes up), the perception of ease goes down, time goes up. The correlations between lostness and these task-level measures are shown in Table 2 at both the task level and individual level.

Metric | Task Level r | Individual Level r
Ease   | -0.95*       | -0.52*
Comp   | -0.46        | -0.17
Time   | 0.72*        | 0.51*

Table 2: Correlations between lostness and ease, completion rates, and time at the task level (n=8) and individual level (n = 73). * indicates statistically significant at the p < .05 level

As expected, correlations are higher at the task level because individual variability is smoothed out through aggregation, which helps reveal patterns. The correlation between ease and lostness is very high at the task level (r = -.95) and somewhat lower at the individual level (r = -.52). Interestingly, despite differing tasks, the correlation between lostness and task time is also high and significant, at r = .72 and r = .51 at the task and individual levels respectively.

The correlation with completion rate, while in the expected direction, is more modest and not statistically significant (see the “Comp” row in Table 2). This is likely a consequence of both the coarseness of this metric (binary) and a restriction in range, with most tasks in our dataset having high completion rates.
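A sketch of how the task-level and individual-level correlations in Table 2 could be computed, assuming one record per task experience with lostness, SEQ ease, completion, and time; the values below are hypothetical placeholders, not the study data.

import pandas as pd
from scipy.stats import pearsonr

# One row per task experience (hypothetical values).
experiences = pd.DataFrame({
    "task":      [1, 1, 2, 2, 3, 3],
    "lostness":  [0.10, 0.22, 0.05, 0.48, 0.71, 0.63],
    "ease":      [7, 6, 7, 5, 3, 2],        # SEQ, 1-7
    "completed": [1, 1, 1, 1, 0, 1],
    "time_sec":  [120, 180, 35, 60, 310, 400],
})

# Individual level: correlate across all task experiences.
r_individual, p_individual = pearsonr(experiences["lostness"], experiences["ease"])

# Task level: aggregate to task means first, then correlate.
by_task = experiences.groupby("task").mean(numeric_only=True)
r_task, p_task = pearsonr(by_task["lostness"], by_task["ease"])

print(round(r_individual, 2), round(r_task, 2))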

The strong relationship between perceived ease and lostness can be seen in the scatterplot in Figure 2, with users’ perception of task ease accounting for roughly 90% of the variance in lostness. At least with our dataset, average lostness is well accounted for by ease; that is, participants generally rate high-lostness tasks as difficult.

Figure 2: Relationship between lostness and ease (r = -.95) for the 8 tasks; p < .01. Dots represent the 8 tasks.


Ease N % Lost Mean Lostness
1 6 1.00 0.73
2 4 1.00 0.72
3 4 1.00 0.62
4 3 0.00 0.17
5 3 0.33 0.26
6 13 0.31 0.35
7 40 0.23 0.24

Table 3: Percent of participants “lost” and mean lostness score for each point on the Single Ease Question (SEQ).

Further examining the relationship between perceived ease and lostness, Table 3 shows the percentage of participants marked as lost (scores above .5) and the mean lostness score for each point on the Single Ease Question (SEQ) scale. More than half of the task experiences were rated 7 (the easiest score), which corresponds to a low mean lostness score (below .4). SEQ scores of 3 and below all have high mean lostness scores (above .6), providing an additional point of concurrent validity for the lostness measure. Table 4 shows another interesting relationship: the transition from not lost to lost happens around the historical SEQ average of 5.5, again suggesting that below-average ease is associated with lostness. It also reinforces the idea that the SEQ (a subjective rating) is a good concurrent indicator of behavior (objective data).

Lostness N Mean SEQ Score
0 28 6.6
0.3 12 6.3
0.4 4 6.5
0.5 6 5.7
0.6 5 4.4
0.7 8 3.5
0.8 10 4.1

Table 4: Lostness scores aggregated into deciles with corresponding mean SEQ scores at each decile.
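Table 4 could be reproduced with a simple grouping step; the sketch below bins lostness scores into deciles and averages the SEQ ratings within each bin (the per-participant records are hypothetical).

import pandas as pd

# Hypothetical per-participant records.
df = pd.DataFrame({
    "lostness": [0.00, 0.05, 0.31, 0.44, 0.52, 0.66, 0.74, 0.81],
    "seq":      [7,    7,    6,    6,    5,    4,    4,    3],
})

# Truncate each score to its decile (e.g., 0.66 -> 0.6), then average SEQ per bin.
df["decile"] = (df["lostness"] * 10).astype(int) / 10
table4 = df.groupby("decile")["seq"].agg(["count", "mean"])
table4.columns = ["N", "Mean SEQ Score"]
print(table4)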

Validating the Lostness Thresholds

To see how well the thresholds identified by Smith predicted actual lostness, we reviewed the videos again and judged whether the user was struggling or showing signs of lostness (toggling back and forth, searching, revisiting pages). Two analysts independently reviewed 55 of the 73 videos (75%) and made a binary decision about whether the participant was lost or not lost (similar to the characterization described by Smith).

Lost Example: One participant, when looking for the US Bank General Counsel, kept going back to the home page, scrolling to the bottom of the page multiple times, and using the search bar multiple times. This participant’s lostness score was .64, and the evaluator marked them as “lost.”

Not Lost Example: In contrast, another participant, when looking for checking account fees, clicked a checking account tab, entered their zip code, found the fees, and stopped the task. This participant’s lostness score was 0 (perfect), and the evaluator marked them as “not lost.”

Table 5 shows the number of participants the evaluators identified as lost, grouped by lostness-score decile.

Lostness Score N # Lost % Lost
0 28 1 4%
0.3 6 2 33%
0.4 1 1 100%
0.5 3 1 33%
0.6 4 4 100%
0.7 5 5 100%
0.8 8 6 75%

Table 5: Number and percent of participants characterized as lost by evaluators watching the videos, by lostness-score decile.

For example, of the 28 participant videos with a lostness score of 0, only 1 (4%) was considered lost. In contrast, 6 of the 8 participants (75%) with lostness scores between .8 and .9 were considered lost. We see good corroboration with the Smith thresholds: only 9% (3 of 34) of participants with scores below .4 were considered lost, while 89% (16 of 18) of participants with scores above .5 were.
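The corroboration figures above amount to checking each threshold against the evaluators’ judgments. A minimal sketch of that check, using hypothetical records rather than the study data:

# Each record pairs a computed lostness score with the evaluator's judgment.
records = [
    {"lostness": 0.00, "judged_lost": False},
    {"lostness": 0.35, "judged_lost": False},
    {"lostness": 0.62, "judged_lost": True},
    {"lostness": 0.81, "judged_lost": True},
    {"lostness": 0.55, "judged_lost": False},
]

below_04 = [r for r in records if r["lostness"] < 0.4]
above_05 = [r for r in records if r["lostness"] > 0.5]

pct_not_lost_below = sum(not r["judged_lost"] for r in below_04) / len(below_04)
pct_lost_above = sum(r["judged_lost"] for r in above_05) / len(above_05)
print(pct_not_lost_below, round(pct_lost_above, 2))  # 1.0 0.67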

Put another way, participants judged as lost had a mean lostness score more than five times that of those who weren’t (.61 vs. .11; p < .01).


Summary and Takeaways

An examination of a method for measuring lostness revealed:

Lostness as path taken relative to the happy path. An objective lostness measure proposed over 20 years ago combines two ratios: the number of unique pages visited relative to the total pages visited, and the minimum number of pages required relative to the unique pages visited. Computing this lostness measure requires identifying the minimum number of pages or steps needed to complete a task (the happy path), as well as counting all screens and the number of unique screens visited (a time-consuming process). A score of 0 represents perfectly efficient navigation (not lost), while a score of 1 indicates being very lost.

Thresholds are supported but not meant for task failure. Data from the original validation study suggested that lostness values below .4 indicated participants weren’t lost and values above .5 indicated they were. Our data corroborated these thresholds: 91% of participants with scores below .4 were not considered lost, and 89% of participants with scores above .5 were. The thresholds and score, however, become less meaningful when a user fails or abandons a task and visits only a subset of the essential screens, which artificially lowers their lostness score. This suggests lostness may be best used as a secondary measure alongside other usability metrics, notably task completion.

Perceived ease explains lostness. In our data, average task-ease scores (participant ratings on the 7-point SEQ) explained roughly 90% of the variance in task-level lostness scores. At least with our data, when participants were lost, they generally knew it and rated the task harder (at least when aggregated across tasks). While subjective measures aren’t a substitute for objective ones, they do correlate, and post-task ease is quick to ask and analyze. Lower SEQ scores already signal a need to look further for problems, and this data suggests that participants getting lost may be a culprit for some tasks.

Time-consuming process is ideally automated. To collect the data for this validation study, we had to review participant videos several times to compute the lostness score (counting screens and unique screens). It may not be worth the effort to review videos just to compute a lostness score (especially if you can identify the problems users are having more quickly with a different measure). However, a lostness score can be computed using software (something we are including in our MUIQ platform). Researchers will still need to input the minimum number of steps (i.e., the happy path) per task, but this measure, like other measures such as clicks on non-clickable elements, may help quickly diagnose problem spots.

There’s a distinction between browsing and being lost. The tasks used in our replication study all had specific answers (e.g. finding a checking account’s fees). These are not the sort of tasks participants likely want to spend any more time (or steps) on than they need to. For these “productivity” tasks where users know exactly what they need to do or find, lostness may be a good measure (especially if it’s automatically collected). However, for more exploratory tasks where only a category is defined and not a specific item, like browsing for clothing, electronics, or the next book to purchase, the natural back-and-forth of browsing behavior may quantitatively look like lostness. A future study can examine how well lostness holds up under these more exploratory tasks.


Source: Validating a Lostness Measure

HP boosts Vertica big data capabilities with streaming analytics

HP chases big data strategy with Vertica additions and startup accelerator

HP has revealed an updated version of its Vertica big data analytics platform in a bid to fulfil a data-oriented strategy that benefits businesses and non-data scientists.

HP Vertica will gain data streaming capabilities and advanced log file text searching to enable high-speed analytics on big data collected from sources such as the Internet of Things (IoT).

The new version of Vertica, codenamed Excavator, will offer support for Apache Kafka, an open source distributed messaging system, to allow organisations to harvest and analyse streaming data in near real time.

HP claimed that this new capability allows Excavator to be used in a wide range of monitoring and process control deployments in sectors such as manufacturing, healthcare and finance.
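As a rough illustration of the kind of streaming ingestion such a pipeline involves, here is a generic Python sketch using the kafka-python package; this is not Vertica’s own loader, and the topic name and broker address are hypothetical.

import json
from kafka import KafkaConsumer

# Consume IoT sensor events from a Kafka topic and buffer them for bulk loading.
consumer = KafkaConsumer(
    "iot-sensor-events",                              # hypothetical topic
    bootstrap_servers="broker.example.com:9092",      # hypothetical broker
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

batch = []
for message in consumer:
    batch.append(message.value)
    if len(batch) >= 1000:
        # In an Excavator-style deployment, this is where records would be
        # bulk-loaded into the analytics database for near-real-time queries.
        batch.clear()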

The addition of advanced machine log text search in Excavator will allow companies to collect and organise large log file datasets generated by systems and applications. This provides more scope for predicting and identifying application failures and cyber attacks, along with visibility into authorised and unauthorised access to apps.

HP showed its commitment to big data-driven businesses by announcing the Haven Startup Accelerator, a programme designed to expand HP’s ecosystem of developers by offering free access to community versions of Vertica, and affordable access to the firm’s big data software and services.

Embracing open source

HP has added native integration of Vertica with Apache Spark in a move to embrace the scalability of open source software and platforms for big data analytics. The firm has also enabled Vertica to support SQL queries made on native files and popular formats found in Hadoop data and deployments.

HP will integrate Vertica with Apache Spark to allow data to be transferred between the database platform and the cluster computing framework, giving developers the option to build data models in Spark and run them through Vertica’s analytics capabilities.

Furthermore, the company is making its Flex Zone available as open source software, which allows companies to analyse semi-structured data without needing to carry out intensive coding to prepare a system for the data ahead of analysis.

HP appears to be bolstering its portfolio of enterprise-grade products in preparation for its split into two separate companies, Hewlett Packard Enterprise and HP Inc.

Note: This article originally appeared in V3.

Source

The Importance of Workforce Analytics

Although organizations must weigh many factors and perspectives when making decisions, few are as important as human resources when it comes to taking action. A company’s workforce is vitally important but also one of the more complex sides of operating a business: instead of clean, hard data, employees present a variety of qualitative factors that are hard to turn into numbers that work for analytics.

Even so, an organization’s human capital is perhaps its most important asset, and building an in-depth understanding of your staff can deliver better answers and give you a competitive edge. Far from being a way to punish employees, workforce analytics (sometimes called people analytics) can empower your team by providing better insight into what works and what doesn’t, and can help uncover the tools employees need to succeed. Let’s begin by breaking down what workforce analytics means.

What are Workforce Analytics?

Workforce analytics, a subset of HR analytics, is used to track and measure employee-related data and to optimize organizations’ human resource management and decision-making. The field focuses on much more than hiring and firing, also concentrating on the value returned by every hire. Moreover, it surfaces more specific data that helps identify workplace trends such as potential risk factors, satisfaction with decisions, and more.

Additionally, workforce analytics can evaluate more than just existing staff by also analyzing the trends that surround employment. For instance, companies can see which periods of the year have a higher number of applicants and adjust their recruitment efforts, or measure diversity efforts as well as employee engagement without having to resort to more invasive or subjective methods that may provide false positives.

What Are Some Key Benefits of Workforce Analytics?

More so than tracking the number of employees and what they’re making, workforce analytics provides a comprehensive view of your organization’s workers designed to interpret historic trends and create predictive models that lead to insights and better decisions in the future. Some of the key benefits of workforce analytics include:

  • Find areas where efficiency can be improved with automation – While workers are an asset to a company, sometimes the tasks they do can reduce their productivity or provide minimal returns. Workforce analytics can discover areas where tasks can be relegated to machines via automation, allowing workers to instead dedicate their efforts to more important and valuable activities.
  • Improve workers’ engagement by understanding their needs and satisfaction – More than simply looking for firing and hiring information, workforce and people analytics can help a company understand why their employees are not performing their best, and the factors that are impacting productivity. This is more to maintain the current workforce instead of replacing it. The goal is to uncover those factors affecting performance and engagement and to overcome them by fostering better conditions.
  • Create better criteria for hiring new staff and provide a better hiring process – Finding new talent is always complex regardless of a company’s size or scope. Workforce analytics can shed light exactly on what is needed from a new hire by a department based on previous applicants, their success, and the company’s needs. More importantly, they can understand new candidates based on this historical data to determine whether they would be a good fit or not. For instance, a company seeking to hire a new developer may think twice about hiring a server-side programmer after several previous hires with similar experience didn’t work out.

What Key Metrics Should I Track for Workforce Analytics?

  • Employee productivity – We still talk about the 9-to-5 work day, but the current reality for many employees is that work hours are more flexible and variable. As such, measuring productivity by the number of hours worked is no longer fully accurate. Instead, creating a productivity index that blends a few different data points gives a much better idea of how employees are performing (a minimal sketch follows this list).
  • Early turnover – Another important area that is often neglected when measuring satisfaction is how quickly employees are leaving on their own. A high early turnover rate is an indicator that things are not working both in terms of meeting expectations and employee satisfaction.
  • Engagement – This may seem superfluous, but employees who are engaged with their work are more likely to be productive. Measuring engagement includes tracking employee satisfaction, stress levels, and employees’ belief in the company’s ideals. High engagement is a great sign that HR is doing its job.
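As a rough illustration of the first two metrics, here is a small Python sketch; the field names, weights, and thresholds are assumptions made for the example, not a standard formula.

from dataclasses import dataclass

@dataclass
class EmployeeMonth:
    tasks_completed: int
    goals_met_pct: float      # 0..1
    peer_review_score: float  # 0..1

def productivity_index(e: EmployeeMonth) -> float:
    """Blend a few normalized data points instead of relying on hours worked."""
    tasks_norm = min(e.tasks_completed / 40, 1.0)   # 40 tasks/month as a cap (assumption)
    return round(0.4 * tasks_norm + 0.4 * e.goals_met_pct + 0.2 * e.peer_review_score, 2)

def early_turnover_rate(left_within_first_year: int, hires: int) -> float:
    """Share of new hires who left within their first year."""
    return left_within_first_year / hires if hires else 0.0

print(productivity_index(EmployeeMonth(30, 0.8, 0.9)))               # 0.8
print(early_turnover_rate(left_within_first_year=4, hires=25))       # 0.16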

Conclusion

Focusing your data gathering internally can help you improve your company’s productivity. By homing in on your human resources and finding ways to empower your team, people analytics can boost your company’s efficiency, leading to happier and more productive colleagues.

Source