Nov 30, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

[Cover image: Trust the data – Source]

[ AnalyticsWeek BYTES]

>> Linking Operational and VoC Metrics by bobehayes

>> Emergent Trends in Machine Learning: Business Autonomy by jelaniharper

>> Malaysia opens digital government lab for big data analytics by analyticsweekpick

Wanna write? Click Here

[ NEWS BYTES]

>> Bunch wants to be ‘Google Analytics for company culture … – TechCrunch Under Analytics

>> California State University Moves Administrative Systems to Hybrid … – Campus Technology Under Hybrid Cloud

>> How new Google Trends features can make your business smarter – TechRepublic Under Social Analytics

More NEWS? Click Here

[ FEATURED COURSE]

Deep Learning Prerequisites: The Numpy Stack in Python


The Numpy, Scipy, Pandas, and Matplotlib stack: prep for deep learning, machine learning, and artificial intelligence… more

[ FEATURED READ]

Hypothesis Testing: A Visual Introduction To Statistical Significance


Statistical significance is a way of determining whether an outcome occurred by random chance or whether something caused that outcome to differ from the expected baseline. Statistical significance calculations find their … more

[ TIPS & TRICKS OF THE WEEK]

Strong business case could save your project
Like anything in corporate culture, a project is often about the business, not the technology. The same thinking applies to data analysis: it’s not always about the technicalities but about the business implications. Data science project success criteria should therefore include project management success criteria as well. This will ensure smooth adoption, easy buy-ins, room for wins and cooperative stakeholders. So, a good data scientist should also possess some qualities of a good project manager.

[ DATA SCIENCE Q&A]

Q: You are compiling a report on user content uploaded every month and notice a spike in uploads in October, in particular a spike in picture uploads. What might you think is the cause of this, and how would you test it?
A: * Halloween pictures?
* Look at uploads in countries that don’t observe Halloween as a sort of counter-factual analysis
* Compare mean uploads in October with mean uploads in September: hypothesis testing

Source
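A minimal sketch of that last step in Python, using Welch’s two-sample t-test from SciPy. The daily upload counts below are synthetic placeholders, not real data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical daily picture-upload counts (synthetic placeholders)
september = rng.poisson(lam=1000, size=30)
october = rng.poisson(lam=1150, size=31)

# Welch's two-sample t-test; H0: mean daily uploads are equal in both months
t_stat, p_value = stats.ttest_ind(october, september, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject H0: the October spike is unlikely to be pure chance.")
else:
    print("Fail to reject H0: no significant difference detected.")
```

The counterfactual check from the first bullet would run the same test restricted to countries that don’t observe Halloween.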

[ VIDEO OF THE WEEK]

Big Data Introduction to D3

Subscribe on YouTube

[ QUOTE OF THE WEEK]

The data fabric is the next middleware. – Todd Papaioannou

[ PODCAST OF THE WEEK]

Unconference Panel Discussion: #Workforce #Analytics Leadership Panel

 Unconference Panel Discussion: #Workforce #Analytics Leadership Panel

Subscribe: iTunes | GooglePlay

[ FACT OF THE WEEK]

Big data is a top business priority and drives enormous opportunity for business improvement. Wikibon’s own study projects that big data will be a $50 billion business by 2017.

Sourced from: Analytics.CLUB #WEB Newsletter

Nov 23, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

[Cover image: SQL Database – Source]

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> 20 Best Practices for Customer Feedback Programs: Applied Research by bobehayes

>> Two Underutilized Heroes of Data & Innovation: Correlation & Covariance by v1shal

>> 80/20 Rule For Startups by v1shal

Wanna write? Click Here

[ FEATURED COURSE]

Process Mining: Data science in Action


Process mining is the missing link between model-based process analysis and data-oriented analysis techniques. Through concrete data sets and easy-to-use software the course provides data science knowledge that can be ap… more

[ FEATURED READ]

Big Data: A Revolution That Will Transform How We Live, Work, and Think


“Illuminating and very timely . . . a fascinating — and sometimes alarming — survey of big data’s growing effect on just about everything: business, government, science and medicine, privacy, and even on the way we think… more

[ TIPS & TRICKS OF THE WEEK]

Data aids, not replaces, judgement
Data is a tool and a means to help build consensus and facilitate human decision-making, not replace it. Analysis converts data into information; information, via context, leads to insight. Insights lead to decisions, which ultimately lead to outcomes that bring value. So, data is just the start; context and intuition play a role too.

[ DATA SCIENCE Q&A]

Q: Do you know a few “rules of thumb” used in statistics or computer science? Or in business analytics?

A: Pareto rule:
– 80% of the effects come from 20% of the causes
– 80% of the sales come from 20% of the customers

Computer science: “simple and inexpensive beats complicated and expensive” – Rod Elder

Finance, rule of 72:
– Estimates the time needed for an investment to double
– $100 at a rate of 9%: 72/9 = 8 years

Rule of three (economics):
– There are always three major competitors in a free market within one industry

Source
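As a quick sanity check on the rule of 72, a small Python sketch comparing the estimate with the exact doubling time under annual compounding, ln(2)/ln(1 + r):

```python
import math

def doubling_time_rule_of_72(rate_percent: float) -> float:
    """Rule-of-72 estimate of the years needed to double an investment."""
    return 72 / rate_percent

def doubling_time_exact(rate_percent: float) -> float:
    """Exact doubling time under annual compounding: ln(2) / ln(1 + r)."""
    return math.log(2) / math.log(1 + rate_percent / 100)

for rate in (3, 6, 9, 12):
    print(f"{rate}%: rule of 72 = {doubling_time_rule_of_72(rate):.1f} years, "
          f"exact = {doubling_time_exact(rate):.1f} years")
```

At 9%, the rule gives 8.0 years against an exact 8.04, which is why it survives as a mental shortcut.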

[ VIDEO OF THE WEEK]

@AnalyticsWeek: Big Data at Work: Paul Sonderegger

Subscribe on YouTube

[ QUOTE OF THE WEEK]

Data really powers everything that we do. – Jeff Weiner

[ PODCAST OF THE WEEK]

@BrianHaugli @The_Hanover on Building a #Leadership #Security #Mindset #FutureOfData #Podcast

Subscribe: iTunes | GooglePlay

[ FACT OF THE WEEK]

According to a global survey of executives by Avanade, the influx of data is putting a strain on IT infrastructure, with 55 percent of respondents reporting a slowdown of IT systems and 47 percent citing data security problems.

Sourced from: Analytics.CLUB #WEB Newsletter

Telangana government to set up India’s first data analytics park

[Image: Telangana Data Analytics Park]

Minister for IT, Government of Telangana, K.T. Rama Rao has announced a proposal to build a data analytics park in Hyderabad. Speaking at the inauguration of Techwave Consulting’s office there, he also stressed the importance of SMEs. The government is also planning to set up an SME Tower at Kondapur to house SMEs, which he believes would attract investment for them.
Speaking further, he added that the Government of Telangana has proposed to set up a data analytics park in Gachibowli in the city. “This kind of park does not exist in India yet. We have made a proposal to Engineering Staff College of India (ESCI), an autonomous organ of India’s largest body of professional engineers. Its Board of Directors will meet in September and decide on the same,” informed K.T. Rama Rao.

Rama Rao described how, over the last twenty years, Hyderabad has transformed into ‘Cyberabad’, and said the city is now considered one of the best options for low-cost, high-value human resources. Hyderabad is India’s second largest IT-exporting city and contributes 12% of the country’s IT exports.

Addressing the gathering, Dr. BVR Mohan Reddy, Chairman, NASSCOM, said the industry continues to grow. It is a $148 billion industry growing at 14 per cent annually. The IT sector employs 3.1 million people in India, and 2.5 lakh IT professionals were recruited last year. “We expect similar or better growth this year,” he said.

“While this industry is generating a lot of employment, there is a word of caution for budding professionals. You will not have better prospects until you have domain knowledge and vertical specialisation. Make yourself distinctly different. Learn continuously. Catch up with ever-changing technologies. And have the right skills,” BVR Mohan Reddy added.

Read more at: http://www.firstpost.com/business/telangana-government-set-indias-first-data-analytics-park-2376298.html

Source

A Big Data Cheat Sheet: What Executives Want to Know

In April, I was given the opportunity to present An Executive’s Cheat Sheet on Hadoop, the Enterprise Data Warehouse and the Data Lake at the SAS Global Forum Executive Conference in Dallas. During this standing-room-only session, I addressed these five questions:

What can Hadoop do that my data warehouse can’t?
We’re not doing “big” data, so why do we need Hadoop?
Is Hadoop enterprise-ready?
Isn’t a data lake just the data warehouse revisited?
What are some of the pros and cons of a data lake?
Following is a recap of my comments, along with a few screenshots. See what you think.

1. WHAT CAN HADOOP DO THAT MY DATA WAREHOUSE CAN’T?

The short answer is: (1) Store any and all kinds of data more cheaply and (2) process all this data more quickly (and cheaply).

The longer answer is: I made reference to my opening “soapbox” statement – “Big data is not new.” They say that 20% of the data we deal with today is structured data (see examples in orange boxes below). I also call this traditional, relational data. The other 80% is semi-structured or unstructured data (examples in blue boxes), and this is what I call “big” data.

[Figure: structured data examples (orange boxes) vs semi-structured and unstructured data examples (blue boxes)]

Are any of these example blue-box data types new? Of course not. We’ve been collecting, processing, storing, and analyzing all this data for decades. What we haven’t been able to do very well, however, if at all, is mix the orange- and blue-box data together.

So here’s what’s new: We now have the technologies to collect, process, store, and analyze all this data together. In other words, we can now mix-&-match the orange- and blue-box data together – at a fraction of the cost and time of our traditional, relational systems.
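As a toy illustration of that mixing, here is a pandas sketch joining a relational “orange box” table with semi-structured “blue box” JSON events. The field names and values are hypothetical, and at Hadoop scale you would of course use distributed tooling rather than a single laptop:

```python
from io import StringIO
import pandas as pd

# "Orange box": traditional relational data, e.g. a customers table
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "segment": ["retail", "enterprise", "retail"],
})

# "Blue box": semi-structured clickstream events arriving as JSON
clicks = pd.read_json(StringIO(
    '[{"customer_id": 1, "page": "/pricing"},'
    ' {"customer_id": 3, "page": "/docs"},'
    ' {"customer_id": 1, "page": "/signup"}]'
))

# Mixing the two: click volume per customer segment
mixed = clicks.merge(customers, on="customer_id")
print(mixed.groupby("segment")["page"].count())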

2. WE’RE NOT DOING “BIG” DATA, SO WHY DO WE NEED HADOOP?

I proposed six common Hadoop use cases—three of which don’t require “big” data at all to take full advantage of Hadoop. These use cases come from my white paper called The Non-Geek’s Big Data Playbook: Hadoop and the Enterprise Data Warehouse.

Here’s a brief summary of each use case:

Stage structured data. Use Hadoop as a data staging platform for your data warehouse (see the sketch after this list).
Process structured data. Use Hadoop to update data in your data warehouse and/or operational systems.
Archive all data. Use Hadoop to archive all your data on-premises or in the cloud.
Process any data. Use Hadoop to take advantage of non-integrated and unstructured data that’s currently unavailable in your data warehouse.
Access any data (via data warehouse). Use Hadoop to extend your data warehouse and keep it at the center of your organization’s data universe.
Access any data (via Hadoop). Use Hadoop as the landing platform for all data and exploit the strengths of both the data warehouse and Hadoop.
If you’d like to see these use cases further explained and demonstrated with some easy-to-understand visuals, I invite you to download the white paper.
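For instance, the first use case (staging structured data) might look something like the following PySpark sketch. The paths, column names, and cleansing rules are hypothetical, offered only to make the staging idea concrete:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("stage-orders").getOrCreate()

# Land raw CSV extracts on Hadoop storage (hypothetical path and schema)
raw = spark.read.csv("hdfs:///landing/orders/*.csv", header=True, inferSchema=True)

# Light cleansing before the warehouse load
staged = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_date", F.to_date("order_date"))
       .filter(F.col("amount") > 0)
)

# Write columnar files for the downstream warehouse ETL to pick up
staged.write.mode("overwrite").parquet("hdfs:///staged/orders/")
```

The point of the pattern is that the heavy lifting happens on cheap Hadoop storage and compute, leaving the warehouse to do what it does best.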

3. IS HADOOP ENTERPRISE-READY?

I have two answers to this question:

For your organization: Maybe.
For all organizations: No.
It all depends on what and how you want to use Hadoop in your organization. If you simply want to use it as an additional (or alternative) storage repository and/or as a short-term data processor, then by all means, Apache Hadoop is ready for you.

However, if you want to go beyond data storage and processing and are looking for some of the same data management and analysis capabilities you currently have with your existing relational systems, you will first need to explore the vast ecosystem of Hadoop-related open source and proprietary projects and products. This will not be a small undertaking.

Many of these newer Hadoop-related technologies are still maturing—quite rapidly, I might add—which is why I say Hadoop, as in the Hadoop ecosystem, isn’t 100% ready for the enterprise.

4. ISN’T A DATA LAKE JUST THE DATA WAREHOUSE REVISITED?

Many of us have been learning more about the data lake, especially in the last 6 months. Some suggest that the data lake is just a reincarnation of the data warehouse—in the spirit of “been there, done that.” Others focus on how much better this “shiny, new” data lake is, while others are standing on the shoreline screaming, “Don’t go in! It’s not a lake—it’s a swamp!”

All kidding aside, the commonality I see is that they are both data storage repositories. Beyond that, the table below highlights some key differences. This is, by no means, an exhaustive list, but it does get us past this “been there, done that” mentality. A data lake is not a data warehouse.

[Table: key differences between the data lake and the data warehouse]

5. WHAT ARE SOME OF THE PROS AND CONS OF A DATA LAKE?

Some of you may be aware of the Data Lake Debate blog series I recently participated in with my colleague, Anne Buff, on SmartData Collective. I took the Pro stance, Anne took the Con stance, and our boss, Jill Dyché, moderated.

It was an intense 8 weeks of discussion—loosely structured like a Lincoln-Douglas debate—and many key points about the data lake were addressed. During my presentation, I summed up these key points using a SWOT diagram:

[Figure: SWOT diagram summarizing key points about the data lake]

Note: Read original post here.

Originally Posted at: A Big Data Cheat Sheet: What Executives Want to Know

Nov 16, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

[Cover image: Human resource – Source]

[ AnalyticsWeek BYTES]

>> Automating Data Modeling for the Internet of Things: Accelerating Transformation and Data Preparation by jelaniharper

>> The End of Transformation: Expediting Data Preparation and Analytics with Edge Computing by jelaniharper

>> The Total Customer Experience: How Oracle Builds their Business Around the Customer by bobehayes

Wanna write? Click Here

[ NEWS BYTES]

>> Mercy Builds a Healthy Data Framework – CIO Insight Under Analytics

>> One Click Retail Publishes Amazon Holiday Strategy Report for Brand Manufacturers – Broadway World Under Sales Analytics

>> HPE, Hedvig announce hybrid cloud storage partnership – Network World Under Hybrid Cloud

More NEWS? Click Here

[ FEATURED COURSE]

Baseball Data Wrangling with Vagrant, R, and Retrosheet


Analytics with the Chadwick tools, dplyr, and ggplot…. more

[ FEATURED READ]

Antifragile: Things That Gain from Disorder


Antifragile is a standalone book in Nassim Nicholas Taleb’s landmark Incerto series, an investigation of opacity, luck, uncertainty, probability, human error, risk, and decision-making in a world we don’t understand. The… more

[ TIPS & TRICKS OF THE WEEK]

Strong business case could save your project
Like anything in corporate culture, a project is often about the business, not the technology. The same thinking applies to data analysis: it’s not always about the technicalities but about the business implications. Data science project success criteria should therefore include project management success criteria as well. This will ensure smooth adoption, easy buy-ins, room for wins and cooperative stakeholders. So, a good data scientist should also possess some qualities of a good project manager.

[ DATA SCIENCE Q&A]

Q: What is better: good data or good models? And how do you define ‘good’? Is there a universal good model? Are there any models that are definitely not so good?
A: * Good data is definitely more important than good models
* If data quality weren’t important, organizations wouldn’t spend so much time cleaning and preprocessing it!
* Even for scientific purposes, good data (reflected in the design of experiments) is very important

How do you define good?
– good data: data relevant to the project/task at hand
– good model: a model relevant to the project/task
– good model: a model that generalizes to external data sets

Is there a universal good model?
– No, otherwise there wouldn’t be an overfitting problem!
– An algorithm can be universal, but a model cannot
– A model built on a specific data set in a specific organization could be ineffective on another data set from the same organization
– Models have to be updated on a somewhat regular basis

Are there any models that are definitely not so good?
– “All models are wrong, but some are useful” – George E.P. Box
– It depends on what you want: predictive power or explanatory power
– If both are bad: bad model

Source
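A minimal scikit-learn sketch of the “generalizes to external data sets” point: on synthetic data, an unconstrained model wins on the training set but loses on held-out data. Illustrative only, with arbitrary parameters:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=300)  # signal + noise

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unconstrained tree memorizes the training noise
overfit = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
# A depth-limited tree captures the signal and generalizes better
simple = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_train, y_train)

for name, model in [("unconstrained", overfit), ("depth-limited", simple)]:
    print(f"{name}: train R^2 = {model.score(X_train, y_train):.2f}, "
          f"test R^2 = {model.score(X_test, y_test):.2f}")
```

The unconstrained tree scores a perfect R² on training data yet worse on the test set, which is exactly why there is no universal good model.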

[ VIDEO OF THE WEEK]

Jeff Palmucci @TripAdvisor discusses managing a #MachineLearning #AI Team

Subscribe on YouTube

[ QUOTE OF THE WEEK]

In God we trust. All others must bring data. – W. Edwards Deming

[ PODCAST OF THE WEEK]

#FutureOfData with @CharlieDataMine, @Oracle discussing running analytics in an enterprise

Subscribe: iTunes | GooglePlay

[ FACT OF THE WEEK]

Three-quarters of decision-makers (76 per cent) surveyed anticipate significant impacts in the domain of storage systems as a result of the “Big Data” phenomenon.

Sourced from: Analytics.CLUB #WEB Newsletter

Surge in real-time big data and IoT analytics is changing corporate thinking

Big data that can be immediately actionable in business decisions is transforming corporate thinking. One expert cautions that a mindset change is needed to get the most from these analytics.

Gartner reported in September 2014 that 73% of respondents in a third quarter 2014 survey had already invested or planned to invest in big data in the next 24 months. This was an increase from 64% in 2013.

The big data surge has fueled the adoption of Hadoop and other big data batch processing engines, but it is also moving beyond batch and into a real-time big data analytics approach.

Organizations want real-time big data and analytics capability because of an emerging need for big data that can be immediately actionable in business decisions. An example is the use of big data in online advertising, which immediately personalizes ads for viewers when they visit websites based on their customer profiles that big data analytics have captured.

“Customers now expect personalization when they visit websites,” said Jeff Kelly, a big data analytics analyst from Wikibon, a big data research and analytics company. “There are also other real-time big data needs in specific industry verticals that want real-time analytics capabilities.”

The financial services industry is a prime example. “Financial institutions want to cut down on fraud, and they also want to provide excellent service to their customers,” said Kelly. “Several years ago, if a customer tried to use his debit card in another country, he was often denied because of fears of fraud in the system processing the transaction. Now these systems better understand each customer’s habits and the places that he is likely to travel to, so they do a better job at preventing fraud, but also at enabling customers to use their debit cards without these cards being locked down for use when they travel abroad.”

Kelly believes that in the longer term this ability to apply real-time analytics to business problems will grow as the Internet of Things (IoT) becomes a bigger factor in daily life.

“The Internet of Things will enable sensor tracking of consumer-type products in businesses and homes,” he said. “You will be able to collect and analyze data from various pieces of equipment and appliances and optimize performance.”

The process of harnessing IoT data is highly complex, and companies like GE are now investigating the possibilities. If this IoT data can be captured in real time and acted upon, preventive maintenance analytics can be developed to preempt performance problems on equipment and appliances, and it might also be possible for companies to deliver more rigorous sets of service level agreements (SLAs) to their customers.
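A toy sketch of that preventive-maintenance idea: flag a sensor reading for attention when it drifts far from a rolling baseline. The readings, window size, and threshold are synthetic and purely illustrative:

```python
from collections import deque
import random

WINDOW, THRESHOLD = 20, 3.0  # illustrative tuning values
window = deque(maxlen=WINDOW)

def out_of_range(reading: float) -> bool:
    """Flag a reading that deviates sharply from the rolling baseline."""
    if len(window) == WINDOW:
        mean = sum(window) / WINDOW
        std = (sum((x - mean) ** 2 for x in window) / WINDOW) ** 0.5
        if std > 0 and abs(reading - mean) / std > THRESHOLD:
            return True  # don't fold anomalies into the baseline
    window.append(reading)
    return False

random.seed(1)
readings = [random.gauss(70.0, 1.0) for _ in range(200)]
readings[150] = 85.0  # simulated overheating event
for t, temp in enumerate(readings):
    if out_of_range(temp):
        print(f"t={t}: {temp:.1f} out of range, schedule preventive maintenance")
```

Real deployments would run this logic on a streaming platform against thousands of sensors, but the core act-on-data-as-it-arrives loop is the same.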

Kelly is excited at the prospects, but he also cautions that companies have to change the way they view themselves and their data to get the most out of IoT advancement.

“There is a fundamental change of mindset,” he explained, “and it will require different ways of approaching application development and how you look at the business. For example, a company might have to redefine itself from a company that only ‘makes trains’ to one that also ‘services trains with data.'”

The service element (warranties, service contracts, how you interact with the customer, and what you learn from those interactions that could feed into predictive selling) covers areas that companies might need to rethink and realign as more IoT analytics come online. The end result could be a reformation of customer relationship management (CRM) into a strictly customer-centric model that takes into account every aspect of the customer’s “life cycle” with the company — from initial product purchases, to servicing, to end-of-product-life considerations and a new beginning of the sales cycle.

Originally posted via “Surge in real-time big data and IoT analytics is changing corporate thinking”

Source by analyticsweekpick

Nov 09, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

[Cover image: Pacman – Source]

[ AnalyticsWeek BYTES]

>> July 24, 2017 Health and Biotech analytics news roundup by pstein

>> 8 Ways Big Data and Analytics Will Change Sports by analyticsweekpick

>> What “Gangnam Style” could teach about branding: 5 Lessons by d3eksha

Wanna write? Click Here

[ NEWS BYTES]

>> Is your business too complacent about cyber security? – Information Age Under Cyber Security

>> Trilliant buys payer analytics play – Nashville Post Under Health Analytics

>> Gran Tierra Appoints New Director – Markets Insider Under Risk Analytics

More NEWS? Click Here

[ FEATURED COURSE]

A Course in Machine Learning


Machine learning is the study of algorithms that learn from data and experience. It is applied in a vast variety of application areas, from medicine to advertising, from military to pedestrian. Any area in which you need… more

[ FEATURED READ]

Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython


Python for Data Analysis is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python. It is also a practical, modern introduction to scientific computing in Python, tailored f… more

[ TIPS & TRICKS OF THE WEEK]

Fix the culture, spread awareness to get adoption
Adoption of analytics tools and capabilities has not yet caught up to industry standards. Talent has always been the bottleneck to achieving comparable enterprise adoption. One of the primary reasons is a lack of understanding and knowledge among stakeholders. To facilitate wider adoption, data analytics leaders, users, and community members need to step up to create awareness within the organization. An aware organization goes a long way in helping to get quick buy-ins and better funding, which ultimately leads to faster adoption. So be the voice that you want to hear from leadership.

[ DATA SCIENCE Q&A]

Q: How do you assess the statistical significance of an insight?
A: * Is this insight just observed by chance, or is it a real insight?
Statistical significance can be assessed using hypothesis testing:
– State a null hypothesis, which is usually the opposite of what we wish to test (classifiers A and B perform equivalently; treatment A is equal to treatment B)
– Then choose a suitable statistical test and test statistic used to reject the null hypothesis
– Also choose a critical region for the statistic to lie in that is extreme enough for the null hypothesis to be rejected (p-value)
– Calculate the observed test statistic from the data and check whether it lies in the critical region

Common tests:
– One sample Z test
– Two-sample Z test
– One sample t-test
– paired t-test
– Two sample pooled equal variances t-test
– Two sample unpooled unequal variances t-test and unequal sample sizes (Welch’s t-test)
– Chi-squared test for variances
– Chi-squared test for goodness of fit
– ANOVA (for instance: are two regression models equal? F-test)
– Regression F-test (i.e., is at least one of the predictors useful in predicting the response?)

Source
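To make those steps concrete, a minimal one-sample t-test in SciPy. The baseline and observations are synthetic; H0 is that the sample mean equals the historical baseline:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

BASELINE = 50.0                       # historical mean under H0
sample = rng.normal(52.0, 5.0, 40)    # synthetic observations

# One-sample t-test: does the sample mean differ from the baseline?
t_stat, p_value = stats.ttest_1samp(sample, popmean=BASELINE)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
print("Reject H0" if p_value < 0.05 else "Fail to reject H0")
```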

[ VIDEO OF THE WEEK]

Data-As-A-Service (#DAAS) to enable compliance reporting

Subscribe on YouTube

[ QUOTE OF THE WEEK]

Everybody gets so much information all day long that they lose their common sense. – Gertrude Stein

[ PODCAST OF THE WEEK]

Understanding Data Analytics in Information Security with @JayJarome, @BitSight

Subscribe: iTunes | GooglePlay

[ FACT OF THE WEEK]

The volume of data created daily has been likened to every person in the world having more than 215m high-resolution MRI scans a day.

Sourced from: Analytics.CLUB #WEB Newsletter

Big data: managing the legal and regulatory risks

Too many organisations enter into the hype of big data without a comprehensive view of the legal and regulatory minefield they’re about to navigate.

In the rush to embrace the opportunities that big data brings, we must not forget history. One characteristic of the dotcom boom (and bust) was that many new Internet businesses did not address the basics of doing business. For example, many of them found out too late, to their cost, that they had not secured their asset values through appropriate legal protection and intellectual property rights. After all, anyone can set up a website. That meant that funding from private equity and venture capital sources was hard to come by.

When adopting a new and potentially disruptive technology such as Big Data, just as with any new venture, all the risks need to be identified and managed. That includes securing asset values and addressing the other legal and regulatory risks. As the Information Commissioner recently observed, Big Data is ‘not a game played by different rules’ (The Information Commissioner’s Office, Big Data and Data Protection, 2014). Among other things, a failure to address legal and regulatory risk in relation to Big Data could result in a serious regulatory breach, attracting fines, reputational damage and loss of business. In this article we consider how to identify and manage such risks.

Big Data consists of large, complex data sets generated from sensors (for example, via the Internet of Things), Internet transactions, mobile payments, email, click streams and other digital interactions. Small and unconnected pieces of data generated from these sources, when amalgamated and subjected to powerful Big Data analytics, can reveal useful information about the user.

Why use big data?

Big Data analytics is predictive in character, allowing a business to interact with its customers as individuals, on a bespoke basis (reflecting customer preferences) through tailored advice, offers and related products, with the objectives of obtaining a market advantage and engendering customer loyalty. Beyond customer interactions, Big Data is used to make market predictions and will increasingly inform business strategy.

In the technology arena, Big Data will spark economic activity as diverse as joint ventures and collaborations, monetising data sets (by licensing, including by data aggregators), software and app development, supply of hardware for processing capacity, consultancy services (for contextualising data and analytics), sourcing and outsourcing, supply of connectivity (communications and data carriage), and the provision of new infrastructure (such as data storage and management). In the public sector, Big Data will be used to implement public policy by delivering public sector efficiencies.

Controlling use of big data

Data privacy law is one area of law that any business is going to have to take very seriously indeed in relation to the use of Big Data. While these laws vary from country to country, in Europe there are certain commonalities. Big Data typically involves the reuse of data originally collected for another purpose. Among other things, such reuse would need to be ‘not incompatible’ with the original purpose for which the data was collected for the reuse to be permissible. The Article 29 Working Party (consisting of the data privacy regulators across the EU) has set out a four-stage test to determine when this requirement is met.

The four stage test includes a requirement that safeguards are put in place to ensure fair processing and to prevent undue impact on the relevant individual. This could include ‘functional separation’ (that is, anonymising / pseudonymising or aggregating the results).

Functional separation may be difficult to achieve in relation to Big Data (where the sheer volume of data may make identification possible when large data sets are brought together). On the other hand, reuse is more likely to be compatible with the original purpose if it is impossible to take decisions regarding any particular individual based on the reused data.
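As a rough illustration of functional separation, here is a salted-hash pseudonymization plus aggregation step in Python. The field names and salt handling are hypothetical; a real deployment would need proper key management and a considered re-identification risk assessment:

```python
import hashlib
import hmac

SECRET_SALT = b"rotate-me-and-store-securely"  # hypothetical; use a managed secret

def pseudonymize(user_id: str) -> str:
    """Replace a direct identifier with a keyed, non-reversible token."""
    return hmac.new(SECRET_SALT, user_id.encode(), hashlib.sha256).hexdigest()[:16]

records = [
    {"user_id": "alice@example.com", "region": "UK", "spend": 120},
    {"user_id": "bob@example.com",   "region": "UK", "spend": 80},
    {"user_id": "carol@example.com", "region": "DE", "spend": 95},
]

# Pseudonymize before the data leaves its original processing context
tokens = [{**r, "user_id": pseudonymize(r["user_id"])} for r in records]

# Aggregate so that no output can be used to target a single individual
by_region = {}
for r in tokens:
    by_region[r["region"]] = by_region.get(r["region"], 0) + r["spend"]
print(by_region)
```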

In many cases, the only way to overcome data privacy concerns in relation to Big Data will be by way of adequate consent notifications. To obtain effective consent in relation to Big Data analytics is not straightforward.

The possession of large data sets can confer market power and exclude other market entrants. Competition regulators (and competitors) aggrieved by lack of access to such data may attempt to deploy competition law to force such access. Aggregations of data sets by merger and acquisition activity may also attract the attention of competition regulators.

Tax laws may also have an impact on Big Data projects. For example, the OECD’s Centre for Tax Policy and Administration is currently considering a proposal (called Base Erosion and Profit Shifting) to control the way digital businesses structure their profit flows internationally to limit tax exposure.

Likewise, discrimination laws in the UK and across the EU may need to be considered. They may be relevant where, say, the outcome of Big Data analytics is to offer goods and services selectively in a way that is discriminatory.

How do you protect rights in big data?

Across the EU, the intellectual property right that could provide the most protection is the database protection regime. It has limitations, as do copyright and patents in relation to Big Data. The law of confidentiality may provide some protection, depending on the particular information and its source. As the law in this area may provide only limited protection, it may sometimes be necessary to return to the basics: ensure that any disclosure is coupled with adequate contractual confidentiality provisions limiting further use and disclosure.

Conversely it will be essential to check that the compilation of a Big Data data set has not infringed a third party’s intellectual property or contractual rights.

What are the other potential liabilities?

Among the potential liabilities that need to be addressed is the question of data reliability. Data sourced from publicly available sources, from another business, or collated by the business itself, may contain errors. Such errors may be processing errors or may arise at source (for example, from mistakes in field coding and other inputs). These errors may flow through to the outputs of the data analytics processes (such as trend analysis and predictions), upon which a business’s strategic and investment decisions may depend.

Data sets may have their origin in several different sources. So-called ‘open data’ is typically licensed on terms similar to those applicable to open source software. Such terms usually give little or no comfort in relation to the reliability (and non-infringing nature) of the licensed material.

Public providers of such data sets (such as local authorities or central government) are seldom willing to accept liability for losses arising from reliance on the data (particularly when the data are provided free or for a nominal charge).

Businesses who on-supply such data, or who provide services dependent on that data, could potentially face claims in contract, in tort (for example, for negligent misstatement) or for some other form of liability (this could include consumer claims based on statutory rights). They will need to ensure that they circumscribe their own liability on a back-to-back basis with their own supplier where possible, or insure against the risks.

What technical and organisational measures should be considered?

Interception, appropriation and corruption of data remain an issue for businesses possessing Big Data data sets, just as with any other data. The data privacy laws in many countries require that the data controller implements appropriate technical and organisational measures to safeguard the security of personal data. Such laws typically require the data controller to flow down these requirements in contractual relations with their suppliers. These requirements will apply to Big Data data sets held by businesses that contain personal data.

Businesses will also need to take into account the new EU Data Protection Regulation, which will require that technical and organisational measures ought to be provided for by design and default. Purely technical solutions, implemented in the absence of a more comprehensive approach to information governance, may not be adequate.

Businesses whose business models depend on creating and exploiting Big Data will need to develop an approach to information governance that is capable of addressing the risks presented by unstructured Big Data data sets. Compliance with information retention requirements will need to be reconciled with the legal and commercial imperatives to regularly purge unwanted data as part of a business’s risk management strategy.

The need for expertise

A recent survey by Accenture (Big Success with Big Data Survey, April 2014) found that 41% of businesses reported a lack of appropriately skilled resources to implement a Big Data project. Such expertise will need to include a legal and regulatory compliance review. It is simply a case of taking steps to address these issues early on.

Originally posted via “Big data: managing the legal and regulatory risks”

Originally Posted at: Big data: managing the legal and regulatory risks by analyticsweekpick