[ COVER OF THE WEEK ]
Human Resource Source
[ AnalyticsWeek BYTES]
>> Riding on Data Driven Journey by d3eksha
>> Leveraging Virtualization to Streamline Data Management by analyticsweekpick
>> Geeks Vs Nerds [Infographics] by v1shal
Wanna write? Click Here
[ NEWS BYTES]
Research Reveals Predictive and Prescriptive Analytics Best Practices – CPAPracticeAdvisor.com Under Analytics
How Companies Get Value from Data Science Production – insideBIGDATA Under Data Science
Connected Enterprise Market Size to Reach $400.87 Billion By 2021 with Accelerite, Cisco Systems, General Electric … – Yahoo Finance Under Streaming Analytics
More NEWS? Click Here
[ FEATURED COURSE]
Lean Analytics Workshop – Alistair Croll and Ben Yoskovitz
[ FEATURED READ]
Machine Learning With Random Forests And Decision Trees: A Visual Guide For Beginners
[ TIPS & TRICKS OF THE WEEK]
Analytics Strategy that is Startup Compliant
With the right tools, capturing data is easy, but not being able to handle it can lead to chaos. One of the most reliable startup strategies for adopting data analytics is TUM, or The Ultimate Metric: the single metric that matters most to your startup. Some advantages of TUM: it answers the most important business question, it clarifies your goals, it inspires innovation, and it helps you understand the entire quantified business.
[ DATA SCIENCE Q&A]
Q: What is cross-validation? How to do it right?
A: Cross-validation is a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction and one wants to estimate how accurately a model will perform in practice. The goal of cross-validation is to set aside part of the data during the training phase (the validation data set) in order to limit problems like overfitting and to gain insight into how the model will generalize to an independent data set.
Examples: leave-one-out cross-validation, k-fold cross-validation
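The k-fold split itself is simple to sketch. Below is a minimal, illustrative version in plain Python; the helper name `k_fold_indices` is invented for this example, and real projects would normally use an existing utility such as scikit-learn's `KFold`:

```python
def k_fold_indices(n, k):
    """Partition indices 0..n-1 into k contiguous folds and yield
    (train_indices, validation_indices) pairs, one per fold.
    Every observation lands in exactly one validation fold."""
    size, rem = divmod(n, k)
    start = 0
    for i in range(k):
        stop = start + size + (1 if i < rem else 0)
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n))
        yield train, validation
        start = stop

# With n = 10 and k = 5, each validation fold holds 2 observations.
folds = list(k_fold_indices(10, 5))
```

Setting k = n recovers leave-one-out cross-validation, which is why the two examples above are really points on the same spectrum.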
How to do it right?
The training and validation data sets must be drawn from the same population.
Example: predicting stock prices. A model trained on a certain 5-year period cannot realistically treat the subsequent 5 years as a draw from the same population.
A common mistake: model-selection steps, such as choosing the kernel parameters of an SVM, must be cross-validated as well; tuning them on the full data set and then cross-validating only the final fit leaks information from the validation data into the model.
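The SVM point generalizes: any data-dependent choice (kernel parameters, feature selection, regularization strength) belongs inside the loop, which leads to nested cross-validation. Here is a self-contained sketch in plain Python; the one-parameter "model" (shrink the training mean by a factor alpha) and all function names are invented purely to illustrate the pattern:

```python
def folds(n, k):
    """Contiguous k-fold split: yields (train_idx, val_idx) pairs."""
    size, rem = divmod(n, k)
    start = 0
    for i in range(k):
        stop = start + size + (1 if i < rem else 0)
        yield (list(range(0, start)) + list(range(stop, n)),
               list(range(start, stop)))
        start = stop

def fit(ys, alpha):
    """Toy model: predict alpha * mean(training targets)."""
    return alpha * sum(ys) / len(ys)

def mse(pred, ys):
    return sum((pred - y) ** 2 for y in ys) / len(ys)

def select_alpha(ys, alphas, k=3):
    """Inner loop: choose alpha by k-fold CV on the training data ONLY."""
    scores = {}
    for a in alphas:
        errs = [mse(fit([ys[i] for i in tr], a), [ys[i] for i in va])
                for tr, va in folds(len(ys), k)]
        scores[a] = sum(errs) / len(errs)
    return min(scores, key=scores.get)

def nested_cv_error(ys, alphas, outer_k=5):
    """Outer loop: each fold re-runs the alpha search on its own
    training split, so the outer validation data never influences
    the hyperparameter choice."""
    errs = []
    for tr, va in folds(len(ys), outer_k):
        train_y = [ys[i] for i in tr]
        a = select_alpha(train_y, alphas)          # inner CV
        errs.append(mse(fit(train_y, a), [ys[i] for i in va]))
    return sum(errs) / len(errs)
```

The mistaken version would call `select_alpha` once on the full data set before the outer loop; structurally the fix is just moving that call inside it.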
Bias-variance trade-off for k-fold cross validation:
Leave-one-out cross-validation: gives approximately unbiased estimates of the test error, since each training set contains almost the entire data set (n − 1 observations).
But we average the outputs of n fitted models, each trained on an almost identical set of observations, so the outputs are highly correlated. Since the variance of a mean of quantities increases as the correlation between those quantities increases, the test error estimate from LOOCV has higher variance than the one obtained with k-fold cross-validation.
Typically, we choose k=5 or k=10, as these values have been shown empirically to yield test error estimates that suffer neither from excessively high bias nor high variance.
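The variance argument can be made concrete with the standard formula for the variance of a mean of n equicorrelated quantities, Var(mean) = (σ²/n)(1 + (n − 1)ρ). A small illustrative calculation follows; the σ², n, and ρ values are made up for the illustration, not taken from any study:

```python
def var_of_mean(sigma2, n, rho):
    """Variance of the mean of n quantities that share variance
    sigma2 and pairwise correlation rho:
        (sigma2 / n) * (1 + (n - 1) * rho)
    As rho approaches 1 this tends to sigma2: averaging highly
    correlated quantities stops reducing variance."""
    return (sigma2 / n) * (1 + (n - 1) * rho)

# LOOCV-like regime: many folds, highly correlated fold errors.
loocv_like = var_of_mean(1.0, 100, 0.9)   # ~0.90
# k-fold-like regime: fewer folds, weakly correlated fold errors.
kfold_like = var_of_mean(1.0, 10, 0.2)    # ~0.28
```

Even though the LOOCV-like setting averages ten times as many error estimates, their strong correlation leaves it with the higher-variance estimate in this illustration.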
[ VIDEO OF THE WEEK]
@AnalyticsWeek: Big Data at Work: Paul Sonderegger
Subscribe to Youtube
[ QUOTE OF THE WEEK]
You can have data without information, but you cannot have information without data. – Daniel Keys Moran
[ PODCAST OF THE WEEK]
Discussing #InfoSec with @travturn, @hrbrmstr(@rapid7) @thebearconomist(@boozallen) @yaxa_io
[ FACT OF THE WEEK]
Decoding the human genome originally took 10 years to process; now it can be achieved in one week.