Jan 30, 20: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Image: Conditional Risk

[ AnalyticsWeek BYTES]

>> AI has bested chess and Go, but it struggles to find a diamond in Minecraft by administrator

>> 6 Mistakes to Avoid When Building and Embedding Operational Reports by analyticsweek

>> Big data: managing the legal and regulatory risks by analyticsweekpick

Wanna write? Click Here

[ FEATURED COURSE]

CS229 – Machine Learning


This course provides a broad introduction to machine learning and statistical pattern recognition. … more

[ FEATURED READ]

The Future of the Professions: How Technology Will Transform the Work of Human Experts


This book predicts the decline of today’s professions and describes the people and systems that will replace them. In an Internet society, according to Richard Susskind and Daniel Susskind, we will neither need nor want … more

[ TIPS & TRICKS OF THE WEEK]

Winter is coming, warm your Analytics Club
Yes and yes! As we head into winter, what better time to talk about our increasing dependence on data analytics to help with our decision making. Data- and analytics-driven decision making is rapidly working its way into our core corporate DNA, yet we are not building practice grounds to test those models fast enough. Snug-looking models can hide nails that cause uncharted pain if they go unchecked. This is the right time to start thinking about putting an Analytics Club [Data Analytics CoE] in your workplace to incubate best practices and provide a test environment for those models.

[ DATA SCIENCE Q&A]

Q:What is cross-validation? How to do it right?
A: It’s a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction and one wants to estimate how accurately a model will perform in practice. The goal of cross-validation is to define a data set to test the model during the training phase (i.e., a validation data set) in order to limit problems like overfitting and to gain insight into how the model will generalize to an independent data set.

Examples: leave-one-out cross validation, K-fold cross validation

How to do it right?

– The training and validation data sets have to be drawn from the same population.
– Counter-example, predicting stock prices: a model trained on a certain 5-year period cannot realistically treat the subsequent 5-year period as a draw from the same population.
– Common mistake: every modeling step counts; for instance, choosing the kernel parameters of an SVM should be cross-validated as well.
Bias-variance trade-off for k-fold cross validation:

Leave-one-out cross-validation: gives approximately unbiased estimates of the test error, since each training set contains almost the entire data set (n − 1 observations).

But: we average the outputs of n fitted models, each of which is trained on an almost identical set of observations, so the outputs are highly correlated. Since the variance of a mean of quantities increases when the correlation among those quantities increases, the test error estimate from LOOCV has higher variance than the one obtained with k-fold cross-validation.

Typically, we choose k=5 or k=10, as these values have been shown empirically to yield test error estimates that suffer neither from excessively high bias nor high variance.
Source
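To make the k-fold and nested cross-validation points above concrete, here is a minimal sketch in Python using scikit-learn on synthetic data; the dataset, model, and parameter grid are illustrative assumptions, not part of the original Q&A.

from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, GridSearchCV, cross_val_score
from sklearn.svm import SVC

# Synthetic data standing in for a real prediction problem.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Outer 5-fold split estimates generalization error.
outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)

# The SVM kernel parameters are tuned inside an inner CV (nested CV),
# so every outer test fold stays unseen by the parameter search
# (the "common mistake" above, done right).
inner_cv = KFold(n_splits=5, shuffle=True, random_state=1)
search = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},
    cv=inner_cv,
)

scores = cross_val_score(search, X, y, cv=outer_cv)
print(f"estimated accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")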

[ VIDEO OF THE WEEK]

George (@RedPointCTO / @RedPointGlobal) on becoming an unbiased #Technologist in #DataDriven World #FutureOfData #Podcast

Subscribe to YouTube

[ QUOTE OF THE WEEK]

Data matures like wine, applications like fish. – James Governor

[ PODCAST OF THE WEEK]

#FutureOfData with @CharlieDataMine, @Oracle discussing running analytics in an enterprise

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

As recently as 2009 there were only a handful of big data projects and total industry revenues were under $100 million. By the end of 2012 more than 90 percent of the Fortune 500 will likely have at least some big data initiatives under way.

Sourced from: Analytics.CLUB #WEB Newsletter

Machine Learning Tutorial: The Max Entropy Text Classifier

In this tutorial we will discuss the Maximum Entropy text classifier, also known as the MaxEnt classifier. The Max Entropy classifier is a discriminative classifier commonly used in Natural Language Processing, Speech, and Information Retrieval problems. Implementing Max Entropy in a standard programming language such as Java, C++, or PHP is non-trivial, primarily due to the […]
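Since the tutorial’s own implementation is not reproduced in this excerpt, here is a minimal, illustrative sketch in Python instead: a Max Entropy classifier is equivalent to multinomial logistic regression over bag-of-words features, so an off-the-shelf pipeline can stand in for a from-scratch build. The example texts and labels are made up.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training corpus; any labeled text collection works the same way.
train_texts = [
    "the pitcher threw a fastball",
    "stocks fell sharply today",
    "the batter struck out swinging",
    "the market rallied on strong earnings",
]
train_labels = ["sports", "finance", "sports", "finance"]

maxent = make_pipeline(
    CountVectorizer(),                  # bag-of-words feature functions
    LogisticRegression(max_iter=1000),  # maximizes conditional log-likelihood (MaxEnt)
)
maxent.fit(train_texts, train_labels)

print(maxent.predict(["earnings beat expectations"]))  # expected: ['finance']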

Originally Posted at: Machine Learning Tutorial: The Max Entropy Text Classifier by administrator

How Sports Data Analytics Is Upsetting The Game All Over Again

One or two games in MLB are often the difference between advancing to the post-season and staying home, and an entire season can be determined by a couple of good or bad pitches. There is a huge competitive advantage to knowing the opponent’s next step. That’s one reason sports analytics is a booming field. And it explains why data scientists, both fans and professionals, are figuring out how to do more accurate modeling than ever before.

One notable example is Ray Hensberger, a baseball-loving technologist in the Strategic Innovation Group at Booz Allen Hamilton.

At a workshop during the GigaOm Structure conference, Hensberger shared his next-level data crunching and the academic paper his team prepared for the MIT Sloan Sports Analytics Conference. His team modeled MLB data to show with 74.5% accuracy what a pitcher is going to throw—and when.

Hensberger’s calculations are more accurate than anything else published to date. But as Hensberger knows, getting the numbers right isn’t easy. The problem: How to build machine-learning models that understand baseball decision-making? And how to make them solid enough to actually work with new data in real-time game situations?

“We started with 900 pitchers,” says Hensberger. “By excluding players having thrown less than 1,000 pitches total over the three seasons considered, we drew an experimental sample of about 400,” he says. “We looked at things like the number of people on base, a right-handed batter versus a left-handed batter.”

They also looked at the current at-bat (pitch type and zone history, ball-strike count); the game situation (inning, number of outs, and number and location of men on base); and pitcher/batter handedness; as well as other features from observations on pitchers that vary across ball games, such as curveball release point, fastball velocity, general pitch selection, and slider movement.

The final result? A set of pitcher-customized models and a report about what those pitchers would throw in a real game situation.

“We took the data, looked at the most common pitches they threw, then built a model that said ‘In this situation, this pitcher will throw this type of pitch’—be that a slider, curveball, or split-finger. We took the top four favorite pitches of that pitcher, and we built models for each one of those pitches for each one of those pitchers,” Hensberger said.

These are methods he and his team outline in their book, The Field Guide to Data Science. “Most of [the data],” he says, “was PITCHf/x data from MLB. There’s a ton of data out there.”

Image: Modern Baseball Analytics (Booz Allen Hamilton)

Cross Validation Is Key

“Each pitcher-specific model was trained and tested by five-fold cross-validation testing,” Hensberger says. Cross-validation is an important part of training and testing machine learning models. Its purpose, in plain English: to ensure that the models aren’t biased by the data they’re trained on.

“The cross-validation piece, the goal of it, you’re defining a data set you can test the model with,” says Hensberger. “You’ve got to have a way of testing the model out when you’re training it, and to provide insight on how the model will generalize to an unknown data set. In this case, that would be real-time pitches.”

“You don’t want to just base your model purely 100% on what was done historically. If we just put out this model without doing that cross-validation piece, people would probably say your model is overfit for the data that you have.”

Once the models were solid, Hensberger and his team used a machine-learning strategy known as “one-versus-rest” to run experiments to predict the type of the next pitch for each pitcher. It is based on an algorithm that allowed them to establish an “index of predictability” for a given pitcher. Then they looked at the data in three different ways:

  1. Predictability by pitch count, looking at pitcher predictability: When the batter is ahead (more balls than strikes), when the batter is behind (more strikes than balls), and when the pitch count is even.
  2. Predictability by “platooning” which looks at how well a right-handed batter will fare against a left-handed pitcher, and vice versa.
  3. Out-of-sample test, a test to verify the predictions by running trained models with new data to make sure they work. “We performed out-of-sample predictions by running trained classifier models using previously unseen examples from the 2013 World Series between the Boston Red Sox and the St. Louis Cardinals.”

“Overall our rate was about 74.5% predictability across all pitchers, which actually beats the previous published result at the MIT Sloan Sports Analytics conference. That was 70%,” says Hensberger. The report published by his team was also able to predict exact pitch type better than before. “The other study only said if a fastball or not a fastball that’s going to come out of a pitcher’s hand,” says Hensberger. “The models we built were for the top four pitches, so [they show] what the actual pitches were going to be.”
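As a rough illustration of the setup described above (one-versus-rest models over game-state features, scored by five-fold cross-validation), here is a Python sketch. It is not the Booz Allen model: the features and pitch labels below are randomly generated, so its score will hover near chance rather than the 74.5% reported from real PITCHf/x data.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)
n_pitches = 2000

# Stand-in game-state features: balls, strikes, outs, inning, runners on, batter handedness.
X = np.column_stack([
    rng.integers(0, 4, n_pitches),
    rng.integers(0, 3, n_pitches),
    rng.integers(0, 3, n_pitches),
    rng.integers(1, 10, n_pitches),
    rng.integers(0, 4, n_pitches),
    rng.integers(0, 2, n_pitches),
])
# Stand-in labels: one pitcher's four favorite pitch types.
y = rng.choice(["fastball", "slider", "curveball", "split-finger"], n_pitches)

# One-versus-rest: one binary model per pitch type, as in the study's design.
model = OneVsRestClassifier(LogisticRegression(max_iter=1000))

scores = cross_val_score(model, X, y, cv=5)  # five-fold cross-validation
print(f"predictability estimate: {scores.mean():.1%}")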

Hensberger’s team also made some other interesting discoveries.

“Some pitchers, just given the situation, were more predictable than others,” he says. “There is no correlation between predictability and ERA. With less predictable pitchers, you would expect them to be more effective. But that’s not true. We also found that eight of the 15 most predictable pitchers came from two teams: the Cardinals and the Reds.”

This may be a result of the catchers calling the game, influencing the pitchers and their decisions. But it also may be attributed to pitching coaches telling pitchers what to do in certain situations. “Either way,” Hensberger says, “it’s interesting to consider.”

His findings around platoon advantage are worth thinking about as well. Statistically in baseball, platoon advantage means that the batter usually has the advantage: They have better stats when they face the opposite-handed pitcher.

“What we found [in that situation] is the predictability of pitchers was around 76%. If you look at the disadvantage, the overall predictability was about 73%,” Hensberger says. “So, pitchers are a little more predictable, we found, when the batter’s at the advantage. That could play into why the stats kind of favor them.”

This work was done over a historical corpus of data, but Hensberger says you could run the models in real time during a game, using the time interval between pitches to compute new stats and make predictions according to the current game situation.

According to Jessica Gelman, cofounder and co-chair of the MIT Sloan Sports Analytics Conference, that type of real-time, granular data crunching is where sports analytics is headed. The field is changing fast. And Gelman proves it. Below, her overview on how dramatically it has evolved from where it was just a couple of years ago.

How Sports Data Science Has Evolved

“If you’ve read Moneyball or watched the movie, at that point in time it was no different than what bankers do in looking for an undervalued asset. Now, finding those undervalued assets is much harder. There’s new stats that are being created all the time. It’s so much more advanced,” Gelman says.

Though it may surprise data geeks, Gelman says that formalized sport analytics still isn’t yet mainstream—not every sport or team uses data. The NHL is still lagging in analytics, with the most notable exception of the Boston Bruins. The NFL is slow to adopt as well, though more teams like the Buffalo Bills are investing in the space.

However, most other leagues are with the program. And that is accelerating. In a big way. In Major League Soccer, formal analytics are now happening. Data analysis is now standard in English Premier League football, augmented in global football by fan sites. And almost every baseball and basketball team has an analytics team.

“Some sports have been quicker to accept it than others,” says Gelman. “But it’s widely accepted at this point in time that there’s significant value to having analytics to support decision making.”

So how are analytics used in sports? Gelman says there’s work happening on both the team side and on the business side.

“On the team side, some leagues do a lot with, for example, managing salaries and using analytics for that. Other leagues use it for evaluating the players on the field and making decisions about who’s going to play or who to trade. Some do both,” says Gelman.

On the business side, data science increasingly influences a number of front office decisions. “That’s ticketing, pricing, and inventory management. It’s also customer marketing, enhancing engagement and loyalty, fandom, and the game-day experience,” Gelman explains. A lot of data science work looks at how people react to what happens in the stadium and how you keep them coming back—versus watching at home on TV. “And then,” Gelman says, “the most recent realm of analytics is wearable technology,” which means more data will soon be available to players and coaches.

Hensberger sees this as a good thing. Ultimately, he says, the biggest winners will be the fans.

“Data science is about modeling and predicting. When this gets in the hands of everyone across the leagues, the viewing experience will get better for everybody,” he says. “You want to see competition. You don’t want to see a blowout, you want to see close games. Excitement and heart-pounding experience. That’s what brings us back to the sport.”

Originally posted via “How Sports Data Analytics Is Upsetting The Game All Over Again”

Source: How Sports Data Analytics Is Upsetting The Game All Over Again by analyticsweekpick

Jan 23, 20: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Image: Trust the data

[ AnalyticsWeek BYTES]

>> Why “Big Data” Is a Big Deal by analyticsweekpick

>> Jun 01, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..) by admin

>> Cloud Data Warehouse Performance Testing by analyticsweekpick

Wanna write? Click Here

[ FEATURED COURSE]

Hadoop Starter Kit


Hadoop learning made easy and fun. Learn HDFS, MapReduce, and an introduction to Pig and Hive with FREE cluster access…. more

[ FEATURED READ]

The Industries of the Future


The New York Times bestseller, from leading innovation expert Alec Ross, a “fascinating vision” (Forbes) of what’s next for the world and how to navigate the changes the future will bring…. more

[ TIPS & TRICKS OF THE WEEK]

Analytics Strategy that is Startup Compliant
With the right tools, capturing data is easy, but not being able to handle it can lead to chaos. One of the most reliable startup strategies for adopting data analytics is TUM, or The Ultimate Metric. This is the metric that matters the most to your startup. Some advantages of TUM: it answers the most important business question, it cleans up your goals, it inspires innovation, and it helps you understand the entire quantified business.

[ DATA SCIENCE Q&A]

Q:What is root cause analysis? How to identify a cause vs. a correlation? Give examples
A: Root cause analysis:
– Method of problem solving used for identifying the root causes or faults of a problem
– A factor is considered a root cause if removal of it prevents the final undesirable event from recurring

Identify a cause vs. a correlation:
– Correlation: statistical measure that describes the size and direction of a relationship between two or more variables. A correlation between two variables doesn’t imply that the change in one variable is the cause of the change in the values of the other variable
– Causation: indicates that one event is the result of the occurrence of the other event; there is a causal relationship between the two events
– Differences between the two types of relationships are easy to identify, but establishing a cause and effect is difficult

Example: sleeping with one’s shoes on is strongly correlated with waking up with a headache. Correlation-implies-causation fallacy: therefore, sleeping with one’s shoes on causes headaches.
More plausible explanation: both are caused by a third factor: going to bed drunk.

Identify a cause vs. a correlation: use a controlled study
– In medical research, one group may receive a placebo (control) while the other receives a treatment. If the two groups have noticeably different outcomes, the different experiences may have caused the different outcomes

Source
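A tiny simulation can make the shoes/headache example concrete. The sketch below (my own illustration in Python, not part of the original Q&A) builds in a hidden confounder, shows the resulting correlation, and then mimics a controlled comparison by holding the confounder fixed.

import numpy as np

rng = np.random.default_rng(0)
n = 10_000

drunk = rng.random(n) < 0.2  # hidden third factor: went to bed drunk
shoes_on = np.where(drunk, rng.random(n) < 0.70, rng.random(n) < 0.05)
headache = np.where(drunk, rng.random(n) < 0.80, rng.random(n) < 0.10)

# Strong correlation, even though shoes do not cause headaches here.
print("corr(shoes_on, headache):", round(float(np.corrcoef(shoes_on, headache)[0, 1]), 2))

# Controlled comparison: hold the confounder fixed (sober group only).
sober = ~drunk
print("P(headache | shoes on, sober):", round(float(headache[sober & shoes_on].mean()), 2))
print("P(headache | no shoes, sober):", round(float(headache[sober & ~shoes_on].mean()), 2))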

[ VIDEO OF THE WEEK]

@BrianHaugli @The_Hanover on Building a #Leadership #Security #Mindset #FutureOfData #Podcast

Subscribe to YouTube

[ QUOTE OF THE WEEK]

Data that is loved tends to survive. – Kurt Bollacker, Data Scientist, Freebase/Infochimps

[ PODCAST OF THE WEEK]

Discussing #InfoSec with @travturn, @hrbrmstr(@rapid7) @thebearconomist(@boozallen) @yaxa_io

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

And one of my favourite facts: At the moment less than 0.5% of all data is ever analysed and used, just imagine the potential here.

Sourced from: Analytics.CLUB #WEB Newsletter

October 17, 2016 Health and Biotech analytics news roundup

The latest health and biotech analytics news and opinion:

3M and Verily Life Sciences Collaborate to Develop Population Health Measurement Technology for Healthcare Providers and Payers: The partnership will make population data sets manageable for providers.

Arivale Brings Gene Sequencing Driven Wellness To Los Angeles: The Seattle-based company has expanded into another city. It combines sequencing with other measures of health to recommend lifestyle improvements.

‘Poring over’ DNA—Advancing nanopore sensing towards lower cost and more accurate DNA sequencing: Researchers at Harvard, Columbia, and Genia Technologies have improved the “nanopore-based sequencing-by-synthesis” technique, which sequences DNA by measuring electrical current changes as it passes through a membrane.

When Genetic Autopsies Go Awry: “Molecular autopsies” can help identify the causes of sudden death. However, the technique may spur surviving relatives to take drastic action based on weak evidence.

Can deep learning debug biology? 20n, a synthetic biology company, uses deep learning to identify which genetic changes in cells change their biochemical output.

Source: October 17, 2016 Health and Biotech analytics news roundup by pstein

Building a Customer First Insurance Experience

Bajaj Allianz Life Insurance Company recently hosted a unique insurance summit focused on putting customer experience first. The event, titled “Future Perfect – Customer First Insurance Industry Summit 2018,” was a full-day gathering attended by 21 of the 24 insurance carriers that operate in India. While the focus was life insurance, the summit also saw participation from a few general insurance carriers.

The central theme of the summit was building and optimizing processes that keep customer experience at the forefront of every customer interaction and transaction. In line with the theme, experts from the insurance industry, regulatory bodies, and the cyber crime divisions of the police spoke about upcoming trends in the insurance industry, fraud mitigation using advanced analytics, fighting cyber fraud, and more.

Dr. Karnik, Chief Data Scientist at Aureus, spoke about how artificial intelligence and machine learning are being used to predict and prevent fraud in the insurance industry. His presentation was based on his experience designing predictive models for early claims and fraud prevention across some of the largest life insurance carriers.

Dr. Karnik talking about ML in Insurance

 

Dr. Karnik Presenting at Insurance Summit

The complete presentation can be viewed below.

Using Machine Learning to Find a Needle in a Haystack by Dr. Nilesh Karnik

 

Dr. Karnik’s talk was very well received, as it drew on learnings from real-life scenarios. The complete presentation can be downloaded from here.

 

Future Perfect – Customer First Insurance Industry Summit 2018 put into focus the immediate challenges that the insurance industry as a whole is facing. The perspectives from multiple stakeholders – regulators, carriers, technology partners, and cyber teams – will help shape the next best action to make the end-consumer experience epic and safe.

Originally Posted at: Building a Customer First Insurance Experience

Jan 16, 20: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Image: Insights

[ AnalyticsWeek BYTES]

>> Is It Time to Jump-Start Your Data Offense? by analyticsweek

>> Data Lakes: The Enduring Case for Centralization by jelaniharper

>> Development of the Customer Sentiment Index: Lexical Differences by bobehayes

Wanna write? Click Here

[ FEATURED COURSE]

Pattern Discovery in Data Mining


Learn the general concepts of data mining along with basic methodologies and applications. Then dive into one subfield in data mining: pattern discovery. Learn in-depth concepts, methods, and applications of pattern disc… more

[ FEATURED READ]

The Misbehavior of Markets: A Fractal View of Financial Turbulence


Mathematical superstar and inventor of fractal geometry, Benoit Mandelbrot, has spent the past forty years studying the underlying mathematics of space and natural patterns. What many of his followers don’t realize is th… more

[ TIPS & TRICKS OF THE WEEK]

Strong business case could save your project
Like anything in corporate culture, a project is oftentimes about the business, not the technology. The same thinking applies to data analysis: it’s not always about the technicalities but about the business implications. Data science project success criteria should include project management success criteria as well. This will ensure smooth adoption, easy buy-ins, room for wins, and cooperating stakeholders. So, a good data scientist should also possess some qualities of a good project manager.

[ DATA SCIENCE Q&A]

Q:What is your definition of big data?
A: Big data is high-volume, high-velocity and/or high-variety information assets that require new forms of processing
– Volume: big data doesn’t sample; it observes and tracks everything that happens
– Velocity: big data is often available in real time
– Variety: big data comes from texts, images, audio, video…

Difference between big data and business intelligence:
– Business intelligence uses descriptive statistics on data with high information density to measure things, detect trends, etc.
– Big data uses inductive statistics (statistical inference) and concepts from non-linear system identification to infer laws (regression, classification, clustering) from large data sets with low information density, in order to reveal relationships and dependencies or to predict outcomes and behaviors

Source
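As a rough, hedged illustration of that split (my own example, not from the Q&A): business intelligence answers “what happened” with descriptive aggregates, while the big-data style infers a relationship and uses it to predict unseen cases.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
ad_spend = rng.uniform(1, 100, 1000)
revenue = 3.0 * ad_spend + rng.normal(0, 10, 1000)

# Business-intelligence style: descriptive statistics over historical data.
print("average revenue per record:", round(float(revenue.mean()), 1))

# Big-data / inferential style: infer a law (here, a regression) and predict a new outcome.
model = LinearRegression().fit(ad_spend.reshape(-1, 1), revenue)
print("predicted revenue at spend = 120:", round(float(model.predict([[120.0]])[0]), 1))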

[ VIDEO OF THE WEEK]

#FutureOfData with @CharlieDataMine, @Oracle discussing running analytics in an enterprise

Subscribe to YouTube

[ QUOTE OF THE WEEK]

The temptation to form premature theories upon insufficient data is the bane of our profession. – Sherlock Holmes

[ PODCAST OF THE WEEK]

#FutureOfData with @CharlieDataMine, @Oracle discussing running analytics in an enterprise

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

571 new websites are created every minute of the day.

Sourced from: Analytics.CLUB #WEB Newsletter

Big Data’s big libertarian lie: Facebook, Google and the Silicon Valley ethical overhaul we need

The tech world talks of liberty and innovation while invading privacy and surveilling us all. It must end now.

Why has Big Data so quickly become a part of industry dogma? Beyond the tremendous amounts of money being thrown at Big Data initiatives, both in research dollars and marketing efforts designed to convince enterprise clients of Big Data’s efficacy, the analytics industry plays into long-held cultural notions about the value of information. Despite Americans’ overall religiosity, our embrace of myth and superstition, our surprisingly enduring movements against evolution, vaccines, and climate change, we are a country infatuated with empiricism. “A widespread revolt against reason is as much a feature of our world as our faith in science and technology,” as Christopher Lasch said. We emphasize facts, raw data, best practices, instruction manuals, exact directions, instant replay, all the thousand types of precise knowledge. Even our love for gossip, secrets, and conspiracy theories can be seen as a desire for more privileged, inside types of information— a truer, more rarified knowledge. And when this knowledge can come to us through a machine—never mind that it’s a computer program designed by very fallible human beings—it can seem like truth of the highest order, computationally exact. Add to that a heavy dollop of consumerism (if it can be turned into a commodity, Americans are interested), and we’re ready to ride the Big Data train.

Information is comforting; merely possessing it grounds us in an otherwise unstable, confusing world. It’s a store to draw on, and we take threats to it seriously. Unlike our European brethren, we evince little tolerance for the peculiarities of genre or the full, fluid spectrum between truth and lies. We regularly kick aside cultural figures (though, rarely, politicians) who we’ve determined have misled us.

Our bromides about information—it’s power, it wants to be free, it’s a tool for liberation—also say something about our enthusiasm for it. The smartphone represents the coalescing of information into a single, personal object. Through the phone’s sheer materiality, it reminds us that data is now encoded into the air around us, ready to be called upon. We live amid an atmosphere of information. It’s numinous, spectral, but malleable. This sense of enchantment explains why every neoliberal dispatch from a remote African village must note the supposedly remarkable presence of cell phones. They too have access to information, that precious resource of postindustrial economies.
All of this is part of what I call the informational appetite. It’s our total faith in raw data, in the ability to extract empirical certainties about life’s greatest mysteries, if only one can deduce the proper connections. When the informational appetite is layered over social media, we get the messianic digital humanitarianism of Mark Zuckerberg. Connectivity becomes a human right; Facebook, we are told, can help stop terrorism and promote peace. (Check out Facebook.com/peace to see this hopeless naïveté in action.) More data disclosure leads to a more authentic self. Computerized personal assistants and ad networks and profiling algorithms are here to learn who we are through information. The pose is infantilizing: we should surrender and give them more personal data so that they can help us. At its furthest reaches, the informational appetite epitomizes the idea that we can know one another and ourselves through data alone. It becomes a metaphysic. Facebook’s Graph Search, arguably the first Big Data–like tool available to a broad, nonexpert audience, shows “how readily what we ‘like’ gets translated into who we are.”

Zuckerberg is only one exponent of what has become a folkway in the age of digital capitalism. Ever connected, perhaps fearing disconnection itself more than the fear of missing out, we live the informational appetite. We have internalized and institutionalized it by hoarding photos we’ll never organize, much less look at again; by tracking ourselves relentlessly; by feeling a peculiar anxiety whenever we find ourselves without a cell phone signal. We’ve learned to deal with information overload by denying its existence or adopting it as a sociocultural value, sprinkled with a bit of the martyrdom of the Protestant work ethic. It’s a badge of honor now to be too busy, always flooded with to-do items. It’s a problem that comes with success, which is why we’re willing to spend so much time online, engaging in, as Ian Bogost called it, hyperemployment.

There’s an inherent dissonance to all this, a dialectic that becomes part of how we enact the informational appetite. We ping-pong between binge-watching television and swearing off new media for rustic retreats. We lament our overflowing in-boxes but strive for “in-box zero”—temporary mastery over tools that usually threaten to overwhelm us. We subscribe to RSS feeds so as to see every single update from our favorite sites—or from the sites we think we need to follow in order to be well-informed members of the digital commentariat—and when Google Reader is axed, we lament its loss as if a great library were burned. We maintain cascades of tabs of must-read articles, while knowing that we’ll never be able to read them all. We face a nagging sense that there’s always something new that should be read instead of what we’re reading now, which makes it all the more important to just get through the thing in front of us. We find a quotable line to share so that we can dismiss the article from view. And when, in a moment of exhaustion, we close all the browser tabs, this gesture feels both like a small defeat and a freeing act. Soon we’re back again, turning to aggregators, mailing lists, Longreads, and the essential recommendations of curators whose brains seem somehow piped into the social-media firehose. Surrounded by an abundance of content but willing to pay for little of it, we invite into our lives unceasing advertisements and like and follow brands so that they may offer us more.

In the informational appetite, we find the corollary of digital detox and its fetishistic response to the overwhelming tide of data and stimulus. Information is power, particularly in this enervated economy, so we take on as much of it as we can handle. We succumb to what Evgeny Morozov calls “the many temptations of information consumerism.” This posture promises that anything is solvable—social and environmental and economic problems, sure, but more urgent, the problem of our (quantified) selves, since the appetite for information is ultimately a self-serving one. How can information make us richer? Smarter? Happier? Safer? How can we get better deals on gadgets and kitchenware? The informational appetite is the hunger for self-help in disguise.

Viral media thrives because it insists on both its newness and relevance—two weaknesses of the informational glutton. A third weakness: because it’s recent and seemingly unfiltered, it must be accurate. Memes have the lifespan and cultural value of fruit flies, but they’re infinite, and they satisfy our obsession with shared reference and cheap parody. We consume them with the uncaring aggression of those who blow up West Virginia Mountains to get at coal underneath. They exhaust themselves quickly, messily, when the glistening viral balloon is deflated by the revelation that the ingredients of the once-tidy story don’t add up. But no matter. There is always another to move onto, as well as someone (Wikipedia, Know Your Meme, Urban Dictionary, et al.) to catalog it.

The informational appetite is the never-ending need for more page views. It’s the irresistible compulsion to pull out your phone in the middle of a conversation to confirm some point of fact, because it’s intolerable not to know right now. It’s the smartphone as a salve for loneliness amid the crowd. It’s the “second screen” habit, in which we watch TV while playing games on our iPhone, tweeting about what we’re seeing, or looking up an actor on IMDB. It’s Google Glass and the whole idea of augmented reality, a second screen over your entire life. It’s the phenomenon of continuous partial attention, our focus split among various inputs because to concentrate on one would reduce our bandwidth, making us less knowledgeable citizens.

The informational appetite, then, is a cultural and metaphysical attitude as much as it is a business and technological ethic. But it also has three principal economic causes: the rapid decrease in the cost of data storage, the rising belief that all data is potentially useful, and the consolidation of a variety of media and communication systems into one global network, the Internet. With the ascension of Silicon Valley moguls to pop culture stardom, their philosophy has become our aspirational ideal—the key to business success, the key to self-improvement, the key to improving government and municipal services (or doing away with them entirely). There is seemingly no problem, we are told, that cannot be solved with more information and no aspect of life that cannot be digitized. As Katherine Losse noted, “To [Zuckerberg] and many of the engineers, it seemed, more data is always good, regardless of how you got it. Social graces—and privacy and psychological well-being, for that matter—are just obstacles in the way of having more information.”

The CIA’s chief technology officer isn’t immune. “We fundamentally try to collect everything and hang on to it forever,” he said at a 2013 conference sponsored by GigaOm, the technology Web site. He too preached the Big Data gospel, telling the crowd: “It is really very nearly within our grasp to be able to compute on all human generated information.” How far gone must you be to see this as beneficial?

Compared to this kind of talk, Google’s totalizing vision—“to organize the world’s information and make it universally accessible and useful”—sounds like a public service, rather than a grandiose, privacy-destroying monopoly. Google’s mission statement, along with its self-inoculating “Don’t Be Evil” slogan, has made it acceptable for other companies to speak of world-straddling ambitions. LinkedIn’s CEO describes his site thusly: “Imagine a platform that can digitally represent every opportunity in the world.” Factual wants to identify every fact in the world. Whereas once we hoped for free municipal Wi-Fi networks, now Facebook and Cisco are providing Wi-Fi in thousands of stores around the United States, a service free so long as you check into Facebook on your smartphone and allow Facebook to know whenever you’re out shopping. “Our vision is that every business in the world who have people coming in and visiting should have Facebook Wi-Fi,” said Erick Tseng, Facebook’s head of mobile products.* Given that Mark Zuckerberg has said that connectivity is a human right, does requiring patrons to log into Facebook to get free Wi-Fi impinge on their rights, or does it merely place Facebook access on the same level of humanitarianism?

All of the world’s information, every opportunity, every fact, every business on earth. Such widely shared self-regard has made it seem embarrassing to claim more modest goals for one’s business. A document sent out to members of Y Combinator, the industry’s most sought-after start-up incubator, instructed would-be founders: “If it doesn’t augment the human condition for a huge number of people in a meaningful way, it’s not worth doing.”

As long as we have the informational appetite, more data will always seem axiomatic—why wouldn’t one collect more, compute more? It’s the same absolutism found in the mantra “information wants to be free.” We won’t consider whether some types of data should be harder to find or whether the creation, preservation, and circulation of some data should be subjected to a moral calculus. Nor will we be able to ignore the data sitting in front of us; it would be irresponsible not to leverage it. If you think, as the CEO of ZestFinance does, that “all data is credit data, we just don’t know how to use it yet,” then why would you not incorporate anything—anything— you can get on consumers into your credit model? As a lender, you’re in the business of risk management, precisely the field which, in David Lyon’s view, is so well attuned to Big Data. The question of collecting and leveraging consumer data, and making decisions based on it, passes from the realm of ethics to a market imperative. It doesn’t matter if you’re forcing total transparency on a loan recipient while keeping your algorithm and data-collection practices totally hidden from view. It’s good for the bottom line. You must protect your company. That’s business.

This is how Big Data becomes a religion, and why, as long as you’re telling yourself and your customers that you’re against “evil,” you can justify building the world’s largest surveillance company. You’re doing it for the customers. The data serves them. It adds relevance to their informational lives, and what could be more important than that?

Ethics, for one thing. A sense of social responsibility. A willingness to accept your own fallibility and that, just as you can create some pretty amazing, world-spanning technological systems, these systems might produce some negative outcomes, too—outcomes that can’t be mitigated simply by collecting more data or refining search algorithms or instituting a few privacy controls.

Matt Waite, the programmer who built the mugshots site for the Tampa Bay Times, only later to stumble upon some complications, introduced some useful provocations for every software engineer. “The life of your data matters,” he wrote. “You have to ask yourself, Is it useful forever? Does it become harmful after a set time?”

Not all data is equal, nor are the architectures we create to contain them. They have built-in capacities—“affordances” is the sociologist’s term d’art—and reflect the biases of their creators. Someone who’s never been arrested, who makes $125,000 a year, and who spends his days insulated in a large, suburban corporate campus would approach programming a mugshot site much differently from someone who grew up in an inner-city ghetto and has friends behind bars. Whatever his background, Waite’s experience left him cognizant of these disparities, and led him to some important lessons. “What I want you to think about, before you write a line of code, is what does it mean to put your data on the Internet?” he said. “What could happen, good and bad? What should you do to be responsible about it?”

The problems around reputation, viral fame, micro-labor, the lapsing of journalism into a page-view-fueled horse race, the intrusiveness of digital advertising and unannounced data collection, the erosion of privacy as a societal value—nearly all would be improved, though not solved, if we found a way to stem our informational appetite, accepting that our hunger for more information has social, economic, and cultural consequences. The solutions to these challenges are familiar but no easier to implement: regulate data brokers and pass legislation guarding against data-based discrimination; audit Internet giants’ data collecting practices and hand down heavy fines, meaningful ones, for unlawful data collection; give users more information about where their data goes and how it’s used to inform advertising; don’t give municipal tax breaks (Twitter) or special privileges at local airfields (Google) to major corporations that can afford to pay their share of taxes and fees to the cities that have provided the infrastructure that ensures their success. Encrypt everything.

Consumers need to educate themselves about these industries and think about how their data might be used to their disadvantage. But the onus shouldn’t lie there. We should be savvy enough, in this stage of late capitalism, to be skeptical of any corporate power that claims to be our friend or acting in our best interests. At the same time, the rhetoric and utility of today’s personal technology is seductive. One doesn’t want to think that a smartphone is also a surveillance device, monitoring our every movement and communication. Consumers are busy, overwhelmed, lacking the proper education, or simply unable to reckon with the pervasiveness of these systems. When your cell carrier offers you a heavily subsidized, top-of-the-line smartphone—a subsidy that comes from locking you into a two-year contract, throughout which they’ll make a bundle off of your data production—then you take it. It’s hard not to.

“If you try to keep up with this stuff in order to stop people from tracking you, you’d go out of your mind,” Joseph Turow said. “It’s very complicated and most people have a difficult time just getting their lives in order. It’s not an easy thing, and not only that, many people when they get online, they just want to do what they do and get off.”

He’s right, except that, when we now think we’re offline, we’re not. The tracking and data production continues. And so we shouldn’t be too hard on one another, particularly those less steeped in this world, when they suddenly raise fears over privacy or data collection. Saying “What did you expect?” is not the response of a wizened technology user. It’s a cynical displacement of blame.

It’s long past time for Silicon Valley to take an ethical turn. The industry’s increasing reliance on bulk data collection and targeted advertising is at odds with its rhetoric of individual liberty and innovation. Adopting other business models is both an economic and ethical imperative; perpetual surveillance of customers cannot endure, much less continue growing at such a pace.

The tech companies might have to give up something in turn, starting with their obsession with scale. Katherine Losse, the former Facebook employee, again: “The engineering ideology of Facebook itself: Scaling and growth are everything, individuals and their experiences are secondary to what is necessary to maximize the system.” Scale is its own self-sustaining project. It can’t be sated, because it doesn’t need to be. The writer and environmentalist Edward Abbey said, “Growth for the sake of growth is the ideology of the cancer cell.” The informational appetite is cancerous; it devours.

When the CEO of Factual speaks of forming a repository of all of the world’s facts, including all human beings’ “genetic information, what they ate, when and where they exercised,” he may indeed be on the path to changing the world, as he claims to be. But for whose benefit? If the world’s information is measured, collected, computed, and privatized en masse, what is being improved, beyond the bottom lines of a coterie of venture capitalists and tech industry executives? What social, economic, or political problem can be solved by this kind of computation? Will Big Data end a war? Will it convince us to convert our economy to sustainable energy sources, even if it raises gas prices?

Will it stop right-wing politicians from gutting the social safety net? Which data-driven insight will convince voters and legislators to accept gay marriage or to stop imprisoning nonviolent drug offenders? For all their talk of changing the world, technology professionals rarely get into even this basic level of specificity. Perhaps it’s because “changing the world” simply means creating a massive, rich company. These are small dreams. The dreamers “haven’t even reached the level of hypocrisy,” as the avuncular science fiction author Bruce Sterling told the assembled faithful at SXSW Interactive, the industry’s premier festival, in March 2013. “You’re stuck at the level of childish naïveté.”

Adopting a populist stance, some commentators, such as Jaron Lanier, say that to escape the tyranny of a data-driven society, we must expect to be paid for our data. We should put a price on it. Never mind that this is to give into the logic of Big Data. Rather than trying to dismantle or reform the system—one which, as Lanier acknowledges in his book Who Owns the Future?, serves the oligarchic platform owners at the expense of customers—they wish to universalize it. We should all have accounts from which we can buy and sell our data, they say. Or companies will be required to pay us market rates for using our private information. We could get personal data accounts and start entertaining bids. Perhaps we’d join Miinome, “the first member-controlled, portable human genomics marketplace,” where you can sell your genomic information and receive deals from retailers based on that same information. (Again, I think back to HSBC’s “Your DNA will be your data” ad, this time recognizing it not as an attempt at imparting a vaguely inspirational, futuristic message, but as news of a world already here.) That beats working with 23andMe, right? That company already sells your genetic profile to third parties—and that’s just in the course of the (controversial, non-FDA- compliant) testing they provide, for which they also charge you.

Tellingly, a version of this proposal for a data marketplace appears in the World Economic Forum (WEF) paper that announced data as the new oil. Who could be more taken with this idea than the technocrats of Davos? The paper, written in collaboration with Bain & Company, described a future in which “a person’s data would be equivalent to their ‘money.’ It would reside in an account where it would be controlled, managed, exchanged and accounted for just like personal banking services operate today.”

Given that practically everything we do now produces a digital record, this model would make all of human life part of one vast, automated dataveillance system. “Think of personal data as the digital record of ‘everything a person makes and does online and in the world,’ ” the WEF says. The pervasiveness of such a system will only increase with the continued development and adoption of the “Internet of things”—Internet- connected, sensor-rich devices, from clothing to appliances to security cameras to transportation infrastructure. No social or behavioral act would be immune from the long arms of neoliberal capitalism. Because everything would be tracked, everything you do would be part of some economic exchange, benefiting a powerful corporation far more than you. This isn’t emancipation through technology. It’s the subordination of life, culture, and society to the cruel demands of the market and making economic freedom unavailable to average people, because just walking down the street produces data about them.

Signs of this future have emerged. A company called Datacoup has offered people $8 per month to share their personal information, which Datacoup says it strips of identifying details before selling it on to interested companies. Through its ScreenWise program, Google has offered customers the ability to receive $5 Amazon gift cards in exchange for installing what is essentially spy software. The piece of software, a browser extension, was only available for Google’s own Chrome browser, which surely didn’t hurt its market share. Setting its sights on a full range of a family’s data, Google also offered a more expansive version of the program. Working with the market research firm GfK, Google began sending brochures presenting the opportunity to “be part of an exciting and very important new research study.” Under this program, Google would give you a new router and install monitoring software on anything with a Wi-Fi connection, even Blu-Ray players. The more you added, the more money you’d make. But it doesn’t add up to much. One recipient estimated that he could earn $945 from connecting all of his family’s devices for one year. A small sum for surrendering a year of privacy, for accepting a year of total surveillance. It is, however, more than most of us are getting now for much the same treatment.

Making consumers complicit—yet far from partners—in the data trade would only increase these inequities. We would experience data-based surveillance in every aspect of our lives (a dream for intelligence agencies). Such a scheme would also be open to manipulation or to the kind of desperate micro-labor and data accumulation exemplified by Mechanical Turk and other online labor marketplaces. You’d wonder if your friend’s frequent Facebook posts actually represented incidents from his life or whether he was simply trying to be a good, profitable data producer. If we were to lose our jobs, we might not go on welfare, if it’s even still available; we’d accept a dole in the form of more free devices and software from big corporations to harvest our data, or more acutely personal data, at very low rates. A gray market of bots and data-generating services would appear, allowing you to pay for automated data producers or poorly paid, occasionally blocked ones working, like World of Warcraft gold miners, in crowded apartments in some Chinese industrial center.

Those already better off, better educated, or technically adept would have the most time, knowledge, and resources to leverage this system. In another paper on the data trade, the WEF (this time working with the Boston Consulting Group) spoke of the opportunity for services to develop to manage users’ relations with data collectors and brokers—something on the order of financial advisors, real estate agents, and insurers. Risk management and insurance professionals should thrive in such a market, as people begin to take out reputational and data-related insurance; both fields depend as much on the perception of insecurity as the reality. Banks could financialize data and data-insurance policies, creating complex derivatives, securities, and other baroque financial instruments. The poor, and those without the ability to use the tools that’d make them good data producers, would lose out. Income inequality, already driven in part by the collapse of industrial economies and the rise of postindustrial, post-employment information economies, would increase ever more.

This situation won’t be completely remedied by more aggressive regulation, consumer protections, and eliminating tax breaks. Increasing automation, fueled by this boom in data collection and mining, may lead to systemic unemployment of a kind we’ve never seen. Those contingent workers laboring for tech companies through Elance or Mechanical Turk will soon enough be replaced by automated systems. It’s clear that, except for an elite class of managers, engineers, and executives, human labor is seen as a problem that technology can solve. In the meantime, those whose sweat this industry still relies upon find themselves submitting to exploitative conditions, whether as a Foxconn worker in Shenzhen or a Postmates courier in San Francisco. As one Uber driver complained to a reporter: “We have a real person performing a function, not a Google automatic car. We have become the functional end of the app.” It might not be long before he is traded in for a self-driving car. They don’t need breaks, they don’t worry about safety conditions or unions, they don’t complain about wages. Compared to a human being, automatic cars are perfectly efficient.

And who will employ him then? Who will be interested in someone who’s spent a few years bouncing between gray-market transportation facilitation services, distributed labor markets, and other hazy digital makework? He will have no experience, no connections, and little accrued knowledge. He will have lapsed from subsistence farming in the data fields to something worse and more desultory—a superfluous machine.

Automation, then, should ensure that power, data, and money continue to accrue in tandem. Through ubiquitous surveillance and data collection, now stretching from computers to cell phones to thermostats to cars to public spaces, a handful of large companies have successfully socialized our data production on their behalves. We need some redistribution of resources, which ultimately means a redistribution of power and authority. A universal basic income, paid for in part by taxes and fees levied on the companies making fabulous profits out of the quotidian materials of our lives, would help to reintroduce some fairness into our technologized economy. It’s an idea that’s had support from diverse corners—liberals and leftists often cite it as a pragmatic response to widespread inequality, while some conservatives and libertarians see it as an improvement over an imperfect welfare system. As the number of long-term unemployed, contingent, and gig workers increases, a universal basic income would restore some equity to the system. It would also make the supposed freedom of those TaskRabbit jobs actually mean something, for the laborer would know that even if the company cares little for his welfare or ability to make a living, someone else does and is providing the resources to make sure that economic precarity doesn’t turn into something more dire.

These kinds of policies would help us to begin to wean ourselves off of our informational appetite. They would also foment a necessary cultural transformation, one in which our collective attitude toward our technological infrastructure would shift away from blind wonder and toward a learned skepticism. I’m optimistic that we can get there. But to accomplish this, we’ll have to start supporting and promoting the few souls who are already doing something about our corrupted informational economy. Through art, code, activism, or simply a prankish disregard for convention, the rebellion against persistent surveillance, dehumanizing data collection, and a society administered by black-box algorithms has already begun.

Excerpted from “Terms of Service: Social Media and the Price of Constant Connection” by Jacob Silverman. Published by Harper, a division of HarperCollins. Copyright 2015 by Jacob Silverman. Reprinted with permission of the author. All rights reserved.

Originally posted via “Big Data’s big libertarian lie: Facebook, Google and the Silicon Valley ethical overhaul we need”

Source: Big Data’s big libertarian lie: Facebook, Google and the Silicon Valley ethical overhaul we need by analyticsweekpick

Sep 28, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Fake data  Source

[ NEWS BYTES]

>> Microsoft Azure to Feature New Big Data Analytics Platform – MeriTalk (blog) Under Big Data Analytics

>> Digital Guardian Declares a New Dawn for Data Loss Prevention – insideBIGDATA Under Big Data Security

>> The DCIM tool and its place in the modern data center – TechTarget Under Data Center

More NEWS ? Click Here

[ FEATURED COURSE]

Statistical Thinking and Data Analysis

image

This course is an introduction to statistical data analysis. Topics are chosen from applied probability, sampling, estimation, hypothesis testing, linear regression, analysis of variance, categorical data analysis, and n… more

[ FEATURED READ]

How to Create a Mind: The Secret of Human Thought Revealed

image

Ray Kurzweil is arguably today’s most influential—and often controversial—futurist. In How to Create a Mind, Kurzweil presents a provocative exploration of the most important project in human-machine civilization—reverse… more

[ TIPS & TRICKS OF THE WEEK]

Data Analytics Success Starts with Empowerment
Being data driven is not as much a tech challenge as it is an adoption challenge. Adoption has its roots in the cultural DNA of any organization. Great data-driven organizations work the data-driven culture into their corporate DNA. A culture of connection, interaction, sharing, and collaboration is what it takes to be data driven. It’s about being empowered more than it’s about being educated.

[ DATA SCIENCE Q&A]

Q:What is: lift, KPI, robustness, model fitting, design of experiments, 80/20 rule?
A: Lift:
It’s a measure of the performance of a targeting model (or a rule) at predicting or classifying cases as having an enhanced response (with respect to the population as a whole), measured against a random-choice targeting model. Lift is simply: target response / average response.

Suppose a population has an average response rate of 5% (for a mailing campaign, for instance). If a certain model (or rule) identifies a segment with a response rate of 20%, then lift = 20/5 = 4.

Typically, the modeler divides the population into quantiles and ranks the quantiles by lift. Considering each quantile, and weighing the predicted response rate against the cost, the modeler can then decide whether or not to market to that quantile.
For example: “if we use the probability scores on customers, we can get 60% of the total responders we’d get mailing randomly by only mailing the top 30% of the scored customers”.
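
As a rough illustration of the calculation above, here is a minimal Python sketch that computes lift per decile from model scores; the synthetic data and column names are assumptions for illustration only, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Synthetic example: model scores and actual responses for 10,000 customers
rng = np.random.default_rng(0)
scores = rng.uniform(0, 1, 10_000)
# Response is more likely for higher scores; overall rate is roughly 5%
responded = rng.uniform(0, 1, 10_000) < (0.10 * scores)

df = pd.DataFrame({"score": scores, "responded": responded})
overall_rate = df["responded"].mean()

# Rank customers into deciles by score (decile 1 = highest scores)
df["decile"] = pd.qcut(df["score"].rank(method="first", ascending=False),
                       10, labels=range(1, 11))

lift_table = df.groupby("decile", observed=True)["responded"].mean().to_frame("response_rate")
lift_table["lift"] = lift_table["response_rate"] / overall_rate
print(lift_table)  # the top decile should show a lift well above 1, the bottom one below 1
```

Ranking customers into score deciles like this is also the usual starting point for gains and lift charts.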

KPI:
– Key performance indicator
– A type of performance measurement
– Examples: 0 defects, 10/10 customer satisfaction
– Relies upon a good understanding of what is important to the organization

More examples:

Marketing & Sales:
– New customers acquisition
– Customer attrition
– Revenue (turnover) generated by segments of the customer population
– Often done with a data management platform

IT operations:
– Mean time between failure
– Mean time to repair
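
As a small worked example of the IT-operations KPIs above, this Python sketch computes mean time between failures (MTBF) and mean time to repair (MTTR) from a hypothetical incident log; the timestamps are invented.

```python
from datetime import datetime

# Hypothetical incident log: (failure_time, repair_time) pairs
incidents = [
    (datetime(2017, 9, 1, 8, 0),   datetime(2017, 9, 1, 9, 30)),
    (datetime(2017, 9, 10, 14, 0), datetime(2017, 9, 10, 15, 0)),
    (datetime(2017, 9, 25, 2, 0),  datetime(2017, 9, 25, 6, 0)),
]

# MTTR: average time from failure to repair, in hours
repair_hours = [(fix - fail).total_seconds() / 3600 for fail, fix in incidents]
mttr = sum(repair_hours) / len(repair_hours)

# MTBF: average operating time between the end of one repair and the next failure
uptimes = [(incidents[i + 1][0] - incidents[i][1]).total_seconds() / 3600
           for i in range(len(incidents) - 1)]
mtbf = sum(uptimes) / len(uptimes)

print(f"MTTR: {mttr:.1f} h, MTBF: {mtbf:.1f} h")
```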

Robustness:
– Statistics with good performance even if the underlying distribution is not normal
– Statistics that are not affected by outliers
– A learning algorithm that can reduce the chance of fitting noise is called robust
– Median is a robust measure of central tendency, while mean is not
– Median absolute deviation is also more robust than the standard deviation
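
To make the robustness point concrete, here is a short Python sketch on synthetic data showing how a single outlier moves the mean and standard deviation far more than the median and the median absolute deviation (MAD).

```python
import numpy as np

rng = np.random.default_rng(1)
clean = rng.normal(loc=10, scale=1, size=100)
dirty = np.append(clean, 1_000)          # one extreme outlier

def mad(x):
    """Median absolute deviation: median of |x - median(x)|."""
    return np.median(np.abs(x - np.median(x)))

for name, data in [("clean", clean), ("with outlier", dirty)]:
    print(f"{name:13s} mean={np.mean(data):8.2f} median={np.median(data):6.2f} "
          f"std={np.std(data):8.2f} MAD={mad(data):5.2f}")
# The mean and std explode when the outlier is added; the median and MAD barely move.
```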

Model fitting:
– How well a statistical model fits a set of observations
– Examples: AIC, R2, Kolmogorov-Smirnov test, Chi 2, deviance (glm)
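
As a hedged illustration of the goodness-of-fit measures listed above, the following Python sketch fits a least-squares line with NumPy and computes R² and a Gaussian-likelihood AIC by hand; the data are synthetic, and the AIC shown is only defined up to an additive constant.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.5, size=x.size)

# Ordinary least squares fit: y ≈ a*x + b
a, b = np.polyfit(x, y, deg=1)
y_hat = a * x + b

rss = np.sum((y - y_hat) ** 2)            # residual sum of squares
tss = np.sum((y - np.mean(y)) ** 2)       # total sum of squares
r2 = 1 - rss / tss

n, k = x.size, 2                          # k = number of fitted parameters
aic = n * np.log(rss / n) + 2 * k         # Gaussian AIC, up to an additive constant

print(f"R^2 = {r2:.3f}, AIC = {aic:.1f}")
```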

Design of experiments:
The design of any task that aims to describe or explain the variation of information under conditions that are hypothesized to reflect the variation.
In its simplest form, an experiment aims at predicting the outcome by changing the preconditions, the predictors.
– Selection of the suitable predictors and outcomes
– Delivery of the experiment under statistically optimal conditions
– Randomization
- Blocking: grouping similar experimental units (e.g. runs conducted on the same equipment) so that known, unwanted sources of variation do not obscure the effects of interest
– Replication: performing the same combination run more than once, in order to get an estimate for the amount of random error that could be part of the process
- Interaction: when an experiment has three or more variables, the situation in which the combined effect of two variables on a third is not additive
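
The sketch below is a hypothetical Python illustration of several of these ideas at once: a two-factor full factorial design with replication, a randomized run order, and a block label per piece of equipment. The factor names and machines are made up.

```python
import itertools
import random

random.seed(42)

factors = {
    "temperature": ["low", "high"],   # factor A
    "pressure":    ["low", "high"],   # factor B
}
replicates = 3
blocks = ["machine_1", "machine_2"]   # blocking on equipment

# Full factorial: every combination of factor levels, replicated
runs = [
    {"temperature": t, "pressure": p, "replicate": r}
    for (t, p), r in itertools.product(
        itertools.product(*factors.values()), range(1, replicates + 1)
    )
]

# Assign a block to each run, then randomize the run order
for i, run in enumerate(runs):
    run["block"] = blocks[i % len(blocks)]
random.shuffle(runs)

for order, run in enumerate(runs, start=1):
    print(order, run)
```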

80/20 rule:
– Pareto principle
– 80% of the effects come from 20% of the causes
– 80% of your sales come from 20% of your clients
- 80% of a company’s complaints come from 20% of its customers
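
To put numbers to the rule, here is a minimal Python sketch on synthetic, heavily skewed revenue data that checks what share of total sales comes from the top 20% of customers.

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic, heavily skewed customer revenue (Pareto-like distribution)
revenue = rng.pareto(a=1.2, size=1_000) * 100

revenue_sorted = np.sort(revenue)[::-1]          # largest customers first
top_20pct = revenue_sorted[: int(0.2 * revenue.size)]
share = top_20pct.sum() / revenue.sum()

print(f"Top 20% of customers account for {share:.0%} of revenue")
# With a heavy-tailed distribution, this share typically lands in the region of 80%.
```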

Source

[ VIDEO OF THE WEEK]

Data-As-A-Service (#DAAS) to enable compliance reporting

Subscribe to YouTube

[ QUOTE OF THE WEEK]

Everybody gets so much information all day long that they lose their common sense. – Gertrude Stein

[ PODCAST OF THE WEEK]

#DataScience Approach to Reducing #Employee #Attrition

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

We are seeing massive growth in video and photo data: every minute, up to 300 hours of video are uploaded to YouTube alone.

Sourced from: Analytics.CLUB #WEB Newsletter

Source

Jan 09, 20: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Statistics  Source

[ AnalyticsWeek BYTES]

>> October 30, 2017 Health and Biotech analytics news roundup by pstein

>> 9th edition of Aegis Graham Bell Award nominations open by administrator

>> Logi is Named #1 in Embedded Business Intelligence in The BI Survey 19 by analyticsweek

Wanna write? Click Here

[ FEATURED COURSE]

The Analytics Edge

image

This is an Archived Course
EdX keeps courses open for enrollment after they end to allow learners to explore content and continue learning. All features and materials may not be available, and course content will not be… more

[ FEATURED READ]

On Intelligence

image

Jeff Hawkins, the man who created the PalmPilot, Treo smart phone, and other handheld devices, has reshaped our relationship to computers. Now he stands ready to revolutionize both neuroscience and computing in one strok… more

[ TIPS & TRICKS OF THE WEEK]

Fix the Culture, spread awareness to get awareness
Adoption of analytics tools and capabilities has not yet caught up to industry standards. Talent has always been the bottleneck to achieving comparable enterprise adoption, and one of the primary reasons is a lack of understanding and knowledge among stakeholders. To facilitate wider adoption, data analytics leaders, users, and community members need to step up and create awareness within the organization. An aware organization goes a long way toward quick buy-ins and better funding, which ultimately leads to faster adoption. So be the voice that you want to hear from leadership.

[ DATA SCIENCE Q&A]

Q:Examples of NoSQL architecture?
A: * Key-value: all of the data consists of an indexed key and a value. Examples: Cassandra, DynamoDB
* Column-based: designed for storing data tables as sections of columns of data rather than as rows of data. Examples: HBase, SAP HANA
* Document database: maps a key to a document that contains structured information; the key is used to retrieve the document. Examples: MongoDB, CouchDB
* Graph database: designed for data whose relations are well represented as a graph, with interconnected elements and an undetermined number of relations between them. Example: Neo4j
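
As a language-level illustration only (no real database drivers), the Python sketch below contrasts how the same record might be shaped under the four models; all keys and field names are made up.

```python
# Key-value: an opaque value stored behind an indexed key (Redis/DynamoDB style)
kv_store = {"user:42": b'{"name": "Ada", "city": "London"}'}

# Document: the value is structured and queryable by field (MongoDB/CouchDB style)
doc_store = {"user:42": {"name": "Ada", "city": "London",
                         "orders": [{"id": 1, "total": 30.0}]}}

# Column-family: data grouped by column families rather than by row (HBase style)
column_store = {
    "profile":  {"user:42": {"name": "Ada", "city": "London"}},
    "activity": {"user:42": {"last_login": "2020-01-09"}},
}

# Graph: explicit nodes and relations between them (Neo4j style)
graph = {
    "nodes": {"user:42": {"name": "Ada"}, "product:1": {"name": "Keyboard"}},
    "edges": [("user:42", "BOUGHT", "product:1")],
}

print(doc_store["user:42"]["orders"][0]["total"])   # field-level access in a document
```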

Source

[ VIDEO OF THE WEEK]

Using Topological Data Analysis on your BigData

Subscribe to YouTube

[ QUOTE OF THE WEEK]

I keep saying that the sexy job in the next 10 years will be statisticians. And I’m not kidding. – Hal Varian

[ PODCAST OF THE WEEK]

Ashok Srivastava(@aerotrekker @intuit) on Winning the Art of #DataScience #FutureOfData #Podcast

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Every person in the US tweeting three tweets per minute for 26,976 years.

Sourced from: Analytics.CLUB #WEB Newsletter