Data center location – your DATA harbour


When it comes to storing and processing your business's data with a third-party data center or hosting provider, there are many factors to consider, and not all of them can be verified by you with complete certainty. One aspect, however, is easy to investigate and gives you a decent picture. Anyone who owns a home or is familiar with the real estate business knows that one factor has a tremendous influence on the price of a property, and you have probably guessed it by now: "location, location, location" is what it is all about.

Almost no business today can operate without depending on data processing, storage and mobile data access, and since information technology infrastructure has become a commodity, we tend to rely on third-party providers to host our data in their facilities. Where your data resides has therefore become a vital factor in choosing a colocation or cloud provider. Have a look at the "Critical decision making points when choosing your IT provider" and let's focus on the location factor in deciding whom to entrust with your data.

Considering the location
There are certain key factors to consider about the location of a provider's facilities in order to make the most suitable choice for your business (a minimal scoring sketch follows the list):

  1. Natural disasters: How likely are environmental calamities such as hurricanes, tornadoes, catastrophic hail, major flooding and earthquakes in the area, historically and statistically? Natural hazards are a serious threat because we cannot always forecast them and have no control over them. Placing a disaster recovery or failover site in a location that is prone to natural disasters is dangerous and defeats the purpose of the precaution. If your primary data center is located in a high-risk area, make sure that your disaster recovery and backup sites are outside of the high-risk zones.
  2. Connectivity and latency: The location of a data center has a tremendous impact on the selection and number of carriers providing network services. Remote and hard-to-reach locations suffer from a smaller selection. Data centers in the vicinity of Internet exchanges and network access points enjoy a rich selection of carriers and thus often lower latency and higher bandwidth at lower cost. Ideally, a multi-tenant data center should be carrier-neutral, meaning the company that owns the data center is entirely independent of any network provider and therefore not in direct competition with them; this is usually a strong incentive for carriers to offer their services in such a facility.
  3. Security: Are the facilities in an area that is easy to secure and designated for business? Considering crime and accident statistics, is it a low-risk area? Is the facility unmarked and hard to spot in passing? Data center structures should be hard for passersby to detect and should sit in areas that are easy to secure and monitor. High-traffic, high-crime areas increase the risk of your data being stolen through physical access.
  4. Political and economic stability: Both are critical factors in choosing a location. Countries with a track record of civil unrest and economic struggle can prove high risk, whether through politically motivated seizure of facilities or a higher risk of provider bankruptcy. The threat of a government being overthrown and your data seized, or of your colocation provider filing for bankruptcy because of currency devaluation, is a no-go any way you look at it. Stability is key to guaranteeing your business continuity.
  5. Local laws and culture: Both can hurt your business, from losing ownership of data to being unable to operate to the standards you are used to and expect. Make sure you are not breaking any laws in the country where your data resides; what is allowed in one country can be illegal elsewhere. Copyright and infringement laws, for example, vary greatly between countries, and some of your customers' data could put you in a tight spot. Also make sure that language and cultural barriers will not become showstoppers in your daily operations or when you need troubleshooting.
  6. Future growth: You might think it is hasty and pointless to look into the expansion and growth your provider can sustain, but nothing is further from the truth. Finding out that your provider cannot accommodate your growth when you need to expand can turn into a very costly endeavor, forcing you to split your infrastructure across multiple locations or even switch providers. Always make sure the data center has room to grow, not only in space but, most importantly, in power; today's data centers run out of power long before they run out of space. Find out their growth potential in both space and power and how quickly it can be realized, because in a multi-tenant facility you need to know how fast they can absorb the growth of all their customers.
  7. Redundancy: The location of the facility also affects its redundancy. Running mission-critical applications requires, among other things, continuous access to power from multiple sources in case of outages. Multiple fiber entries into the facility increase network redundancy, and redundant cooling and environmental controls are a must to keep the facility operational. These are the basic redundancy factors; depending on your specific availability requirements you may need to look much deeper than facility infrastructure alone. Talk to your provider about this; they will offer advice.
  8. Accessibility: This matters for several reasons, from security concerns to daily operations and disaster situations. Access for specialized maintenance staff and emergency services in a crisis, and the ability to transport equipment and supplies within a reasonable time, are vitally important. Facilities outside the immediate reach of such amenities carry an increased risk of failure and longer recovery times after incidents. There may also be a question of your own staff needing physical access to the equipment, but with today's providers offering remote hands and, usually, managed services, you can avoid such complications and have the provider's local personnel take care of physical repairs and troubleshooting.
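To make the comparison concrete, the factors above can be rolled into a simple weighted scoring exercise. The sketch below is only illustrative: the factor weights, the candidate facilities and their scores are hypothetical and would have to be replaced with numbers that reflect your own requirements.

```python
# Illustrative weighted scoring of candidate data center locations.
# Weights and scores (1 = poor, 5 = excellent) are hypothetical examples.

FACTORS = {
    "natural_disasters": 0.20,
    "connectivity": 0.20,
    "security": 0.10,
    "stability": 0.15,
    "laws_and_culture": 0.10,
    "future_growth": 0.10,
    "redundancy": 0.10,
    "accessibility": 0.05,
}

candidates = {
    "Facility A": {"natural_disasters": 4, "connectivity": 5, "security": 4,
                   "stability": 5, "laws_and_culture": 4, "future_growth": 3,
                   "redundancy": 4, "accessibility": 5},
    "Facility B": {"natural_disasters": 2, "connectivity": 3, "security": 5,
                   "stability": 4, "laws_and_culture": 5, "future_growth": 5,
                   "redundancy": 3, "accessibility": 3},
}

def weighted_score(scores: dict) -> float:
    """Weighted sum of factor scores; higher is better."""
    return sum(FACTORS[f] * scores[f] for f in FACTORS)

for name, scores in sorted(candidates.items(),
                           key=lambda kv: weighted_score(kv[1]), reverse=True):
    print(f"{name}: {weighted_score(scores):.2f}")
```

In practice the weights would come from your compliance, availability and budget requirements rather than being picked by hand.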

Consequences
The choice of your provider and its location will have severe consequences for your business if things go wrong, and things do go wrong. Data Center Knowledge has compiled an overview of last year's top 10 outages, and as you can see, some big players in the industry have been brought to their knees.
Such disruptions of service carry tremendous costs for the data centers, and those expenses have been increasing yearly. To give you an idea of the kind of losses I am talking about, take a look at the findings of the Emerson Network Power and Ponemon Institute study.

“The study of U.S.-based data centers quantifies the cost of an unplanned data center outage at slightly more than $7,900 per minute.”
– Emerson Network Power, Ponemon Institute

The companies included in this study listed various reasons for outages, from faulty equipment to human error, but 30 percent named weather-related causes as the root cause.
You can bet that these losses need to be compensated somewhere, be it through price increases or lower staff pay (which usually means hiring less qualified personnel), to name just a few corners that could be cut. If, at the same time, you are trying to accomplish more with a flat or shrinking IT budget, such actions on the provider's side will prove counterproductive to achieving your goals.

These are the average costs of data center outages for the companies running them, and we have not even touched on the damage to the businesses that suffer from such outages. Your services being unavailable could cost you money directly, whether through missed SLAs or through customers' orders lost when your operation comes to an abrupt stop. The long-term impact may be a hit to your reputation and thus reduced trust in your business's abilities. Additionally, if you actively run marketing campaigns and invest in trade shows and other public promotional activities, a service outage on your part can blunt their effect on your brand's popularity. A number of factors influence just how much downtime actually costs your business; they depend on how much your entire operation relies on information technology and how much of it is affected by an outage, which is outside the scope of this article.
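As a rough back-of-the-envelope exercise, you can combine a per-minute outage cost with the downtime implied by an availability level to size your exposure. The sketch below is illustrative only; the availability figure is a hypothetical input, and the per-minute cost simply reuses the study average quoted above.

```python
# Back-of-the-envelope downtime cost estimate (all inputs are hypothetical).
cost_per_minute = 7_900          # the Ponemon/Emerson average cited above, in USD
availability = 0.999             # the availability level you expect from the facility
minutes_per_year = 365 * 24 * 60

expected_downtime_minutes = (1 - availability) * minutes_per_year
expected_annual_cost = expected_downtime_minutes * cost_per_minute

print(f"Expected downtime: {expected_downtime_minutes:.0f} minutes/year")
print(f"Expected outage cost: ${expected_annual_cost:,.0f}/year")
```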

Bottom line
Depending on the dynamics of your business, the requirements of compliance, law, budgeting and even the personal bias of the decision makers will make some of the factors above more or less important, but in the end this decision will have a short- and long-term impact on your business continuity. So if you are involved in the decision making process, my advice is: do your homework, talk to the providers, visit the facilities if possible, and take the points above into account. If you are expecting growth, or want to move away from legacy systems (if they are part of the technology supporting your current operations) and are considering Infrastructure-as-a-Service models, talk to the providers on your list and see whether they offer such services and can accommodate your needs. By following these steps you can reduce, and even completely avoid, the risk of your data center location choice hurting your business's bottom line!


Jul 27, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Data interpretation

[ AnalyticsWeek BYTES]

>> BigData At Work: Used Cases [Infographic] by v1shal

>> The Rise of Automated Analytics by anum

>> Secret Sauce to Sustain Business through Technology Driven Era by d3eksha


[ NEWS BYTES]

>> HEC plans to promote data science activities – The Nation Under Data Science

>> 'We're competing against crap': The race is on to provide influencer marketing analytics – Digiday Under Marketing Analytics

>> Traffic From Google Home App Considered Direct Traffic in Google Analytics – Search Engine Journal Under Analytics


[ FEATURED COURSE]

CPSC 540 Machine Learning


Machine learning (ML) is one of the fastest growing areas of science. It is largely responsible for the rise of giant data companies such as Google, and it has been central to the development of lucrative products, such … more

[ FEATURED READ]

On Intelligence


Jeff Hawkins, the man who created the PalmPilot, Treo smart phone, and other handheld devices, has reshaped our relationship to computers. Now he stands ready to revolutionize both neuroscience and computing in one strok… more

[ TIPS & TRICKS OF THE WEEK]

Strong business case could save your project
Like anything in corporate culture, the project is oftentimes about the business, not the technology. The same thinking applies to data analysis: it is not always about the technicality but about the business implications. Data science project success criteria should include project management success criteria as well. This will ensure smooth adoption, easy buy-ins, room for wins and cooperating stakeholders. So a good data scientist should also possess some qualities of a good project manager.

[ DATA SCIENCE Q&A]

Q:What are the drawbacks of linear model? Are you familiar with alternatives (Lasso, ridge regression)?
A: * Assumption of linearity of the errors
* Can’t be used for count outcomes, binary outcomes
* Can’t vary model flexibility: overfitting problems
* Alternatives: see question 4 about regularization
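As a hedged illustration of those alternatives, the sketch below fits ordinary least squares, ridge and lasso on the same synthetic data with scikit-learn; the dataset and the regularization strengths are arbitrary choices, not recommendations.

```python
# Compare OLS with ridge and lasso regularization on synthetic data (illustrative only).
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))            # 20 features, many of them irrelevant
true_coef = np.zeros(20)
true_coef[:3] = [2.0, -1.5, 0.5]          # only 3 features actually matter
y = X @ true_coef + rng.normal(scale=1.0, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, model in [("OLS", LinearRegression()),
                    ("Ridge(alpha=1.0)", Ridge(alpha=1.0)),
                    ("Lasso(alpha=0.1)", Lasso(alpha=0.1))]:
    model.fit(X_train, y_train)
    print(f"{name:18s} R^2 on holdout: {model.score(X_test, y_test):.3f}")
```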


[ VIDEO OF THE WEEK]

Making sense of unstructured data by turning strings into things


[ QUOTE OF THE WEEK]

Information is the oil of the 21st century, and analytics is the combustion engine. – Peter Sondergaard

[ PODCAST OF THE WEEK]

@AnalyticsWeek #FutureOfData with Robin Thottungal(@rathottungal), Chief Data Scientist at @EPA


[ FACT OF THE WEEK]

Bad data or poor data quality costs US businesses $600 billion annually.

Sourced from: Analytics.CLUB #WEB Newsletter

Join View from the Top 500 (VFT-500) to Share with and Learn from your CEM Peers

Knowledge is power in the world of customer service. How this knowledge is acquired, understood and consumed will have a substantial impact on your business success. To help customer experience professionals succeed, Omega Management Group and its strategic partner, Anthony & Alexander Group, are providing a solution with their View From the Top (VFT-500) Research Panel, an exclusive membership organization for service industry executives devoted to customer experience management (CEM) best practices. You can learn more about the VFT-500 and how to join this exclusive group here.

My Role

Omega has invited me to lead the VFT-500 Advisory Board as their Program Chairperson. I will manage the overall process, ensuring that the Advisory Board stays on course and continues to provide meaningful research data to VFT-500 members. I will be working closely with the VFT-500 Advisory Board to craft the quarterly surveys, identifying general areas of study as well as deep dives on specific topics related to customer experience management.

VFT-500 Program Specifics

The objective of the VFT Research Panel is to provide members with an opportunity to respond to questions and provide their opinions on service business-related topics that reveal valuable information, knowledge, trends and best practices in customer satisfaction and loyalty. Different types of analyses will be applied to these data (e.g., segmentation, correlational) to uncover deep insight. These insights will be summarized in confidential reports that are only available to VFT-500 members.

Research is conducted on key topics that will be of value to your organization. As a VFT-500 member, you’ll be able to let us know what key research topics are a priority to you. Your input will be critical in helping us design the research topic surveys, which will be sent online to all VFT-500 members on a quarterly basis. We will then analyze the survey results and publish an executive summary report of our findings.

VFT-500 members will focus on vital areas of service business operations, such as:

  • Customer Experience Management (CEM) best practices and strategies that drive revenue and profits
  • Service operations, strategies and CRM technologies
  • Metrics and analytics
  • Employee compensation linked to customer satisfaction and loyalty
  • Employee soft skills training in building a customer-centric culture

VFT-500 Benefits

As a VFT-500 member, you will participate in a quarterly online research survey that will result in a comprehensive report containing data from participating members—allowing your organization to measure its performance against other VFT-500 members. These reports will provide you with the key drivers for increasing customer satisfaction and loyalty that you can apply to benchmark your organization’s performance and establish a baseline for continuous improvement. Further, your organization will be licensed to use rankings in marketing materials and marketing campaigns. The research results will contain the aggregate compilations only.

Here’s your opportunity to share your own thoughts on current service business-related topics and also learn from your peers. To learn more about the VFT-500, click here.

 

Originally Posted at: Join View from the Top 500 (VFT-500) to Share with and Learn from your CEM Peers by bobehayes

Machine consciousness: Big data analytics and the Internet of Things

Companies aim to squeeze more efficiency from operations by cloud-connecting everything.

During my visit to General Electric’s Global Research Centers in San Ramon, California, and Niskayuna, New York, last month, I got what amounts to an end-to-end tour of what GE calls the “Industrial Internet.” The phrase refers to the technologies of cloud computing and the “Internet of Things” applied across a broad swath of GE’s businesses in an effort to squeeze better performance and efficiency from the operations of everything from computer-controlled manufacturing equipment to gas turbine engines and power plants. It’s an ambitious effort that GE is hoping to eventually sell to other companies as a cloud service—branded as Predix.

GE is not alone in trying to harness cloud computing and apply it to the rapidly growing universe of networked systems in energy, manufacturing, health care, and aviation. IBM has its own Internet of Things cloud strategy, and other companies—including SAP, Siemens, and startups such as MachineShop—are hoping to tie their business analytic capabilities to the vast volumes of data generated by machines and sensors. That data could fuel what some have called the next industrial revolution: manufacturing that isn’t just automated, but is driven by data in a way that fundamentally changes how factories work.

Eventually, analytical systems could make decisions about logistics, plant configuration, and other operational details with little human intervention other than creativity, intuition, and fine motor skills. And even in industries where there is no production plant, analytics could make people more efficient by getting them where they need to be at the right time with the right tools.

Creating that world requires some demanding management of data, and modeling of the physical-world systems and processes that create that data in order to give it meaningful shape. In other words, analytics of industrial operations requires both a schema for all the things and the computing power to translate and understand real-time data streams while discovering trends in deep lakes of historical data.

Such data comes in many forms—and it can come from many places. In manufacturing and other traditional industrial environments, many systems have already been instrumented for computer control through SCADA (supervisory control and data acquisition) systems for a computer-based HMI (human-machine interface) console. In these cases, it’s relatively straightforward to tap into the telemetry from those systems.

But other systems that haven't been connected to SCADA in the past can be an important part of analytic data, too. For example, GE's Connected Experience Lab Technology Lead Arnie Lund demonstrated for Ars an analytic system built for Hydro Quebec. The system pulled in not just information from the power grid, but weather sensor data and even information on historic and projected tree growth in areas around power lines to help predict in advance where there might be outages caused by wind or fallen branches. Similar analytic systems included geospatial data on railroad lines and track surveys, aiming to prioritize track maintenance to prevent derailments and other incidents. In both cases, much of the data was pulled from devices that weren't networked live; instead, they were only occasionally or opportunistically connected to networks.

It’s when data from networked sensors is fused with other sources (like the tree survey) that it becomes valuable. On the lower end of the analytics space, there are tools like Wolfram’s Data Drop, which can take in data from anything that can send it via HTTP and add semantic structure to it for analysis. For larger systems, like GE’s Predix, it all goes into a “data lake”—a giant cloud storage pool of structured and unstructured data that can be programmatically accessed by analytics tools.
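The ingestion pattern being described, devices or gateways pushing readings over HTTP into a shared pool, can be sketched in a few lines. The endpoint URL and payload schema below are hypothetical placeholders, not the actual Wolfram Data Drop or Predix APIs.

```python
# Minimal sketch of pushing a sensor reading over HTTP into a data lake / collection
# service. The endpoint and payload schema are hypothetical placeholders.
import json
import time
import urllib.request

INGEST_URL = "https://example.com/ingest/sensor-readings"   # hypothetical endpoint

reading = {
    "device_id": "turbine-042",
    "timestamp": time.time(),
    "metric": "bearing_temperature_c",
    "value": 71.4,
}

req = urllib.request.Request(
    INGEST_URL,
    data=json.dumps(reading).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print("ingest status:", resp.status)
```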

But all that data is useless without good analytics, and simply matching raw sensor numbers by timestamps isn’t enough to understand what’s going on historically or in realtime. That’s where data science comes in. “Data science is all about building models on any kind of data that represents physical phenomena,” Christina Brasco, a data scientist at GE Software in San Ramon, told Ars. “We’re building analytic engines that might be working on numbers a mathematical model produced, and not on raw data.”

Brasco is focused on aviation systems right now, specifically the tens of thousands of GE gas turbine engines that the company manages in air fleets around the world. “GE is trying to move toward predictive maintenance, so data science comes in as we try to build predictive models that replace things that are more hands on,” she said. “I’m producing one of many apps that try to do predictive scheduling of maintenance at the fleet level so we never have unscheduled downtime.”

That means creating models based on the thousands of terabytes of maintenance history data and remote diagnostic data recorded from every jet engine in GE’s managed fleet—data periodically dumped into the cloud during between-flight maintenance. Models can then calculate projected wear on turbine blades and other components over time, figuring out when it’s time to pull them. Other models being built by data scientists at GE for other lines of business could also make calculations based on live streams of data to determine whether systems are configured efficiently or are edging toward dangerous conditions.
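A toy version of that idea, fitting a wear trend against operating hours and projecting when it crosses a maintenance threshold, is sketched below. The data is synthetic and the approach is purely illustrative; it is not GE's actual modeling.

```python
# Toy predictive-maintenance sketch: fit wear vs. operating hours and project
# when a maintenance threshold will be crossed. Synthetic data, illustrative only.
import numpy as np

rng = np.random.default_rng(1)
hours = np.linspace(0, 5_000, 50)                      # operating hours since last overhaul
wear = 0.004 * hours + rng.normal(scale=0.5, size=50)  # noisy, roughly linear wear metric

# Least-squares fit: wear ~ slope * hours + intercept
slope, intercept = np.polyfit(hours, wear, deg=1)

WEAR_LIMIT = 25.0                                      # hypothetical "pull the part" threshold
hours_at_limit = (WEAR_LIMIT - intercept) / slope
print(f"Projected maintenance due at ~{hours_at_limit:,.0f} operating hours")
```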

Harel Kodesh, GE Software's chief technology officer, told Ars that the goal of Predix is to essentially create "a cloud operating system" for industrial analytic apps based on sensor data. He envisions the system as a "digital Switzerland" where companies can control access to their industrial data while being able to leverage analytic software written by third-party developers.

The advantage of using a cloud platform to deliver the analytics as well as the data, Kodesh explained, was that "doing it in the cloud makes it much faster to get new systems out." Cloud APIs such as those for Predix and other platforms take the issues of building a data store or provisioning for more computing power out of the picture. "Developers shouldn't be worried about how to access data," Kodesh explained. "The idea is that we want developers to be able to build analytics software capable of solving the problems they set out for themselves."

Kodesh is betting that many of GE's customers will buy into the Predix platform because it will already have models for the equipment they use, and it will offer a platform for customers to build their own models for other systems. Potentially, Kodesh even foresees an "app store" with models from third-party developers and equipment manufacturers.

GE won’t be alone in that game, though. IBM, Amazon, and others’ existing cloud services are likely trying to draw developers of their own for “Internet of things” analytics and other cloud-based processing of industrial data. IBM is also looking to bring its Watson “cognitive cloud” service to help people understand data from IoT devices, according to IBM’s vice president of Watson products and services Alexa Swainson-Barreveld. “We’ve actually been having a fairly in depth conversation about IoT recently,” she told Ars. “It’s a place where we think we can help deal with the massive amounts of content and separate the signal from the noise. We see the industrial data space as a major opportunity area.”

For now, models that feed back into control systems from the cloud to modify their operations aren’t in play. But systems like the company’s wind turbines already use local analytics to change configuration based on sensor data (some turbines change the pitch of their blades when wind gusts are detected to prevent damage to the system, faster than a human could respond to the change). Considering GE’s hopes and the company’s progress so far, “closed loop” analytic systems for more of the industry are likely not that far off.

Sean Gallagher / Sean is Ars Technica’s IT Editor. A former Navy officer, systems administrator, and network systems integrator with 20 years of IT journalism experience, he lives and works in Baltimore, Maryland.

Originally posted via “Machine consciousness: Big data analytics and the Internet of Things”

Source: Machine consciousness: Big data analytics and the Internet of Things by anum

Jul 20, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Accuracy check

[ AnalyticsWeek BYTES]

>> The 5 Types of Healthcare Analytics Solutions by analyticsweekpick

>> Apr 27, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..) by admin

>> Big data: are we making a big mistake? by anum


[ NEWS BYTES]

>> Using the Cloud to Protect the Cloud | CSO Online – CSO Online Under Cloud Security

>> How Sauce Labs Analytics Accelerates Software Testing – eWeek Under Analytics

>> The Rise of Network Functions Virtualization – Virtualization Review Under Virtualization


[ FEATURED COURSE]

Process Mining: Data science in Action


Process mining is the missing link between model-based process analysis and data-oriented analysis techniques. Through concrete data sets and easy to use software the course provides data science knowledge that can be ap… more

[ FEATURED READ]

The Future of the Professions: How Technology Will Transform the Work of Human Experts


This book predicts the decline of today’s professions and describes the people and systems that will replace them. In an Internet society, according to Richard Susskind and Daniel Susskind, we will neither need nor want … more

[ TIPS & TRICKS OF THE WEEK]

Save yourself from a zombie apocalypse of unscalable models
One living, breathing zombie in today's analytical models is the absence of error bars. Not every model is scalable or holds up as data grows. The error bars attached to almost every model should be duly calibrated; as business models take in more data, error bars keep them sensible and in check. If error bars are not accounted for, our models become susceptible to failures we never want to see.
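One simple way to attach the error bars this tip calls for is to bootstrap whatever metric you report. The sketch below is a minimal example on synthetic data; the metric, the sample and the interval width are arbitrary.

```python
# Minimal bootstrap error bar for a reported metric (here, a mean), on synthetic data.
import numpy as np

rng = np.random.default_rng(42)
observations = rng.exponential(scale=3.0, size=500)    # stand-in for any metric's inputs

boot_means = [rng.choice(observations, size=observations.size, replace=True).mean()
              for _ in range(2_000)]
low, high = np.percentile(boot_means, [2.5, 97.5])

print(f"mean = {observations.mean():.2f}, 95% bootstrap interval = [{low:.2f}, {high:.2f}]")
```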

[ DATA SCIENCE Q&A]

Q:How do you test whether a new credit risk scoring model works?
A: * Test on a holdout set
* Kolmogorov-Smirnov test

Kolmogorov-Smirnov test:
– Non-parametric test
– Compare a sample with a reference probability distribution or compare two samples
– Quantifies a distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution
– Or between the empirical distribution functions of two samples
– Null hypothesis (two-samples test): samples are drawn from the same distribution
– Can be modified as a goodness of fit test
– In our case: cumulative percentages of good, cumulative percentages of bad
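As a concrete illustration of the bullets above, the sketch below runs a two-sample Kolmogorov-Smirnov test on the holdout score distributions of good and bad accounts; the scores are synthetic stand-ins, not real credit data.

```python
# Two-sample KS test comparing holdout score distributions of good vs. bad accounts.
# Scores are synthetic; a well-separated model gives a large KS statistic.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
scores_good = rng.normal(loc=680, scale=50, size=800)   # scores of accounts that repaid
scores_bad = rng.normal(loc=610, scale=55, size=200)    # scores of accounts that defaulted

statistic, p_value = ks_2samp(scores_good, scores_bad)
print(f"KS statistic = {statistic:.3f}, p-value = {p_value:.3g}")
```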


[ VIDEO OF THE WEEK]

#FutureOfData Podcast: Peter Morgan, CEO, Deep Learning Partnership


[ QUOTE OF THE WEEK]

The temptation to form premature theories upon insufficient data is the bane of our profession. – Sherlock Holmes

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with Juan Gorricho, @disney


[ FACT OF THE WEEK]

According to estimates, the volume of business data worldwide, across all companies, doubles every 1.2 years.

Sourced from: Analytics.CLUB #WEB Newsletter

BigData At Work: Used Cases [Infographic]

Big data is a term every company endorses. They have massive amounts of data, yet fail to use it in the most optimal way to generate consistent value. Big data holds big insights, so digging into it presents a great opportunity. Consider a case where you could predict your churn, customer satisfaction, product demand and business outcomes: how would you react? What if your models could bring certainty and predictability into the decision-making process?
Participants at the Smarter Analytics Leadership Summit were asked some questions about their use of big data and analytics. Respondents are finding highly intelligent and profitable answers in clever analytics software and services that can process all the different kinds of data and make it more useful in key business decisions and processes, with impressive results. The following infographic explains some of the most common use cases.

BigData At Work: Used Cases [Infographic]

Originally Posted at: BigData At Work: Used Cases [Infographic]

Jul 13, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Data security

[ AnalyticsWeek BYTES]

>> Insurance market still slow on adopting ‘big data’ by anum

>> Three final talent tips: how to hire data scientists by analyticsweekpick

>> Word For Social Media Strategy for Brick-Mortar Stores: “Community” by v1shal


[ NEWS BYTES]

>> Flash and Virtualization Give Customers More Reasons to Appreciate the Cisco/HDS Partnership – Cisco Blogs (blog) Under Virtualization

>> Why IoT really means the “Integration of Things” – ReadWrite Under IOT

>> Innovation, The Cloud And Cisco’s Fight To Maintain Market Leadership – Forbes Under Cloud


[ FEATURED COURSE]

Learning from data: Machine learning course


This is an introductory course in machine learning (ML) that covers the basic theory, algorithms, and applications. ML is a key technology in Big Data, and in many financial, medical, commercial, and scientific applicati… more

[ FEATURED READ]

On Intelligence


Jeff Hawkins, the man who created the PalmPilot, Treo smart phone, and other handheld devices, has reshaped our relationship to computers. Now he stands ready to revolutionize both neuroscience and computing in one strok… more

[ TIPS & TRICKS OF THE WEEK]

Keeping Biases Checked during the last mile of decision making
Today a data-driven leader, data scientist or data-driven expert is constantly put to the test, helping their team solve problems with their skills and expertise. Believe it or not, part of that decision tree is derived from intuition, which adds a bias to our judgment and taints the suggestions. Most skilled professionals understand and handle these biases well, but in a few cases we give in to tiny traps and find ourselves caught in biases that impair our judgment. So it is important to keep intuition bias in check when working on a data problem.

[ DATA SCIENCE Q&A]

Q:You have data on the durations of calls to a call center. Generate a plan for how you would code and analyze these data. Explain a plausible scenario for what the distribution of these durations might look like. How could you test, even graphically, whether your expectations are borne out?
A: 1. Exploratory data analysis
* Histogram of durations
* histogram of durations per service type, per day of week, per hours of day (durations can be systematically longer from 10am to 1pm for instance), per employee…
2. Distribution: lognormal?

3. Test graphically with a QQ plot: sample quantiles of log(durations) vs. normal quantiles
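A minimal sketch of that plan, on synthetic data, is shown below: a histogram of the durations plus a QQ plot of log(durations) against normal quantiles to check the lognormal assumption. The distribution parameters are arbitrary.

```python
# Illustrative check of a lognormal assumption for call durations (synthetic data).
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
durations = rng.lognormal(mean=4.0, sigma=0.8, size=2_000)   # seconds, synthetic

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.hist(durations, bins=50)
ax1.set_title("Call durations")
ax1.set_xlabel("seconds")

# If durations are lognormal, log(durations) should be normal: points near the line.
stats.probplot(np.log(durations), dist="norm", plot=ax2)
ax2.set_title("QQ plot of log(durations)")

plt.tight_layout()
plt.show()
```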


[ VIDEO OF THE WEEK]

Surviving Internet of Things


[ QUOTE OF THE WEEK]

If you can’t explain it simply, you don’t understand it well enough. – Albert Einstein

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with Juan Gorricho, @disney


[ FACT OF THE WEEK]

For a typical Fortune 1000 company, just a 10% increase in data accessibility will result in more than $65 million additional net income.

Sourced from: Analytics.CLUB #WEB Newsletter

10 Things Your Customers WISH You Knew About Them [Infographic]


Understanding your customers is an integral part of any successful business. It is instrumental in building a loyal customer base. Here is an infographic listing 10 research studies that reveal the things your customers WISH you knew.

10 Things Your Customers WISH You Knew About Them

Source: 10 Things Your Customers WISH You Knew About Them [Infographic]

Smart Data Modeling: From Integration to Analytics

There are numerous reasons why smart data modeling, which is predicated on semantic technologies and open standards, is one of the most advantageous means of effecting everything from integration to analytics in data management.

  • Business-Friendly—Smart data models are innately understood by business users. These models describe entities and their relationships to one another in terms that business users are familiar with, which serves to empower this class of users in myriad data-driven applications.
  • Queryable—Semantic data models are able to be queried, which provides a virtually unparalleled means of determining provenance, source integration, and other facets of regulatory compliance.
  • Agile—Ontological models readily evolve to include additional business requirements, data sources, and even other models. Thus, modelers are not responsible for defining all requirements upfront, and can easily modify them at the pace of business demands.

According to Cambridge Semantics Vice President of Financial Services Marty Loughlin, the most frequently cited benefit of this approach to data modeling is operational: "There are two examples of the power of semantic modeling of data. One is being able to bring the data together to ask questions that you haven't anticipated. The other is using those models to describe the data in your environment to give you better visibility into things like data provenance."

Implicit in those advantages is an operational efficacy that pervades most aspects of the data sphere.

Smart Data Modeling
The operational applicability of smart data modeling hinges on its flexibility. Semantic models, also known as ontologies, exist independently of infrastructure, vendor requirements, data structure, or any other characteristic related to IT systems. As such, they can incorporate attributes from all systems or data types in a way that is aligned with business processes or specific use cases. “This is a model that makes sense to a business person,” Loughlin revealed. “It uses terms that they’re familiar with in their daily jobs, and is also how data is represented in the systems.” Even better, semantic models do not necessitate all modeling requirements prior to implementation. “You don’t have to build the final model on day one,” Loughlin mentioned. “You can build a model that’s useful for the application that you’re trying to address, and evolve that model over time.” That evolution can include other facets of conceptual models, industry-specific models (such as FIBO), and aspects of new tools and infrastructure. The combination of smart data modeling’s business-first approach, adaptable nature and relatively rapid implementation speed is greatly contrasted with typically rigid relational approaches.
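To make the idea of a business-friendly, queryable model concrete, here is a minimal sketch using the open-source rdflib library. The vocabulary is invented for illustration and is not Cambridge Semantics' product, FIBO, or any other real ontology.

```python
# Minimal semantic model sketch with rdflib: business-friendly terms, queryable as a graph.
# The vocabulary below is invented for illustration.
from rdflib import Graph, Namespace, Literal, RDF, RDFS

BIZ = Namespace("http://example.com/biz/")
g = Graph()
g.bind("biz", BIZ)

# Describe entities and their relationships in business terms.
g.add((BIZ.Customer, RDF.type, RDFS.Class))
g.add((BIZ.Account, RDF.type, RDFS.Class))
g.add((BIZ.holdsAccount, RDFS.domain, BIZ.Customer))
g.add((BIZ.holdsAccount, RDFS.range, BIZ.Account))

g.add((BIZ.acme, RDF.type, BIZ.Customer))
g.add((BIZ.acme, RDFS.label, Literal("ACME Corp")))
g.add((BIZ.acct42, RDF.type, BIZ.Account))
g.add((BIZ.acme, BIZ.holdsAccount, BIZ.acct42))

# The model itself is queryable, e.g. "which customers hold which accounts?"
results = g.query("""
    SELECT ?customer ?account WHERE {
        ?customer a biz:Customer ;
                  biz:holdsAccount ?account .
    }""", initNs={"biz": BIZ})
for row in results:
    print(row.customer, "->", row.account)
```

The point of the sketch is simply that the schema and the data live in the same graph and can evolve together, rather than being fixed upfront as in a relational design.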

Smart Data Integration and Governance
Perhaps the most cogent application of smart data modeling is its deployment as a smart layer between any variety of IT systems. By utilizing platforms reliant upon semantic models as a staging layer for existing infrastructure, organizations can simplify data integration while adding value to their existing systems. The key to integration frequently depends on mapping. When mapping from source to target systems, organizations have traditionally relied upon experts from each of those systems to create what Loughlin called “a source to target document” for transformation, which is given to developers to facilitate ETL. “That process can take many weeks, if not months, to complete,” Loughlin remarked. “The moment you’re done, if you need to make a change to it, it can take several more weeks to cycle through that iteration.”

However, since smart data modeling involves common models for all systems, integration merely includes mapping source and target systems to that common model. “Using common conceptual models to drive existing ETL tools, we can provide high quality, governed data integration,” Loughlin said. The ability of integration platforms based on semantic modeling to automatically generate the code for ETL jobs not only reduces time to action, but also increases data quality while reducing cost. Additional benefits include the relative ease in which systems and infrastructure are added to this process, the tendency for deploying smart models as a catalog for data mart extraction, and the means to avoid vendor lock-in from any particular ETL vendor.
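A stripped-down illustration of "map each system to the common model once" follows: two source records with different field names are translated through per-system mappings into one shared vocabulary. All the system names, field names and mappings are hypothetical.

```python
# Illustrative source-to-common-model mapping: each system is mapped once to shared terms,
# instead of pairwise source-to-target ETL. All names are hypothetical.
COMMON_MODEL_MAPPINGS = {
    "crm": {"cust_nm": "customerName", "acct_no": "accountId"},
    "billing": {"CUSTOMER": "customerName", "ACCOUNT_NUMBER": "accountId"},
}

def to_common_model(system: str, record: dict) -> dict:
    """Translate a source record into the shared vocabulary for that system."""
    mapping = COMMON_MODEL_MAPPINGS[system]
    return {mapping[field]: value for field, value in record.items() if field in mapping}

crm_row = {"cust_nm": "ACME Corp", "acct_no": "42"}
billing_row = {"CUSTOMER": "ACME Corp", "ACCOUNT_NUMBER": "42", "AMOUNT_DUE": "100.00"}

print(to_common_model("crm", crm_row))       # both rows land in the same shared shape
print(to_common_model("billing", billing_row))
```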

Smart Data Analytics—System of Record
The components of data quality and governance that are facilitated by deploying semantic models as the basis for integration efforts also extend to others that are associated with analytics. Since the underlying smart data models are able to be queried, organizations can readily determine provenance and audit data through all aspects of integration—from source systems to their impact on analytics results. “Because you’ve now modeled your data and captured the mapping in a semantic approach, that model is queryable,” Loughlin said. “We can go in and ask the model where data came from, what it means, and what conservation happened to that data.” Smart data modeling provides a system of record that is superior to many others because of the nature of analytics involved. As Loughlin explained, “You’re bringing the data together from various sources, combining it together in a database using the domain model the way you described your data, and then doing analytics on that combined data set.”

Smart Data Graphs
By leveraging these models on a semantic graph, users are able to reap a host of analytics benefits that they otherwise couldn’t because such graphs are focused on the relationships between nodes. “You can take two entities in your domain and say, ‘find me all the relationships between these two entities’,” Loughlin commented about solutions that leverage smart data modeling in RDF graph environments. Consequently, users are able to determine relationships that they did not know existed. Furthermore, they can ask more questions based on those relationships than they otherwise would be able to ask. The result is richer analytics results based on the overarching context between relationships that is largely attributed to the underlying smart data models. The nature and number of questions asked, as well as the sources incorporated for such queries, is illimitable. “Semantic graph databases, from day one have been concerned with ontologies…descriptions of schema so you can link data together,” explained Franz CEO Jans Aasman. “You have descriptions of the object and also metadata about every property and attribute on the object.”
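The "find me all the relationships between these two entities" style of question maps naturally onto a graph query. Continuing with an invented vocabulary (not a real product's API), one way to ask it with rdflib and SPARQL looks like this.

```python
# Ask a semantic graph which predicates directly connect two entities (either direction).
# The biz: vocabulary is invented for illustration.
from rdflib import Graph, Namespace

BIZ = Namespace("http://example.com/biz/")
g = Graph()
g.add((BIZ.acme, BIZ.holdsAccount, BIZ.acct42))
g.add((BIZ.acct42, BIZ.billedTo, BIZ.acme))

query = """
    SELECT ?relationship WHERE {
        { biz:acme ?relationship biz:acct42 . }
        UNION
        { biz:acct42 ?relationship biz:acme . }
    }"""
for row in g.query(query, initNs={"biz": BIZ}):
    print("relationship:", row.relationship)
```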

Modeling Models
When one considers the different facets of modeling that smart data modeling includes—business models, logical models, conceptual models, and many others—it becomes apparent that the true utility in this approach is an intrinsic modeling flexibility upon which other approaches simply can’t improve. “What we’re actually doing is using a model to capture models,” Cambridge Semantics Chief Technology Officer Sean Martin observed. “Anyone who has some form of a model, it’s probably pretty easy for us to capture it and incorporate it into ours.” The standards-based approach of smart data modeling provides the sort of uniform consistency required at an enterprise level, which functions as means to make data integration, data governance, data quality metrics, and analytics inherently smarter.

Source by jelaniharper

Jul 06, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Human resource

[ AnalyticsWeek BYTES]

>> The First and Only – Big Data Search Engine Powered by Apache® Spark™ by analyticsweekpick

>> New Mob4Hire Report “The Impact of Mobile User Experience on Network Operator Customer Loyalty” Ranks Performance Of Global Wireless Industry by bobehayes

>> How big data can improve manufacturing by anum


[ FEATURED COURSE]

Pattern Discovery in Data Mining


Learn the general concepts of data mining along with basic methodologies and applications. Then dive into one subfield in data mining: pattern discovery. Learn in-depth concepts, methods, and applications of pattern disc… more

[ FEATURED READ]

Superintelligence: Paths, Dangers, Strategies


The human brain has some capabilities that the brains of other animals lack. It is to these distinctive capabilities that our species owes its dominant position. Other animals have stronger muscles or sharper claws, but … more

[ TIPS & TRICKS OF THE WEEK]

Save yourself from a zombie apocalypse of unscalable models
One living, breathing zombie in today's analytical models is the absence of error bars. Not every model is scalable or holds up as data grows. The error bars attached to almost every model should be duly calibrated; as business models take in more data, error bars keep them sensible and in check. If error bars are not accounted for, our models become susceptible to failures we never want to see.

[ DATA SCIENCE Q&A]

Q:Do you know / used data reduction techniques other than PCA? What do you think of step-wise regression? What kind of step-wise techniques are you familiar with?
A: data reduction techniques other than PCA?:
Partial least squares: like PCR (principal component regression) but chooses the principal components in a supervised way. Gives higher weights to variables that are most strongly related to the response

step-wise regression?
– the choice of predictive variables are carried out using a systematic procedure
– Usually, it takes the form of a sequence of F-tests, t-tests, adjusted R-squared, AIC, BIC
– at any given step, the model is fit using unconstrained least squares
– can get stuck in local optima
– Better: Lasso

step-wise techniques:
– Forward-selection: begin with no variables, adding them when they improve a chosen model comparison criterion
– Backward-selection: begin with all the variables, removing them when it improves a chosen model comparison criterion

Better than reduced data:
Example 1: If all the components have a high variance: which components to discard with a guarantee that there will be no significant loss of the information?
Example 2 (classification):
– One has 2 classes; the within class variance is very high as compared to between class variance
– PCA might discard the very information that separates the two classes

Better than a sample:
– When number of variables is high relative to the number of observations
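As a hedged illustration of the techniques named above, the sketch below compares principal component regression, partial least squares and simple forward stepwise selection on the same synthetic data with scikit-learn; the data and the number of components retained are arbitrary.

```python
# Compare PCA regression, partial least squares, and forward stepwise selection
# on synthetic data (illustrative only).
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.decomposition import PCA
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 15))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=1.0, size=300)   # only 2 informative features

pcr = make_pipeline(PCA(n_components=5), LinearRegression())   # unsupervised components
pls = PLSRegression(n_components=5)                            # supervised components
forward = make_pipeline(
    SequentialFeatureSelector(LinearRegression(), n_features_to_select=5,
                              direction="forward"),
    LinearRegression(),
)

for name, model in [("PCR", pcr), ("PLS", pls), ("Forward selection", forward)]:
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name:18s} mean CV R^2: {scores.mean():.3f}")
```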


[ VIDEO OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with  John Young, @Epsilonmktg


[ QUOTE OF THE WEEK]

Getting information off the Internet is like taking a drink from a firehose. – Mitchell Kapor

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with  John Young, @Epsilonmktg


[ FACT OF THE WEEK]

According to Twitter’s own research in early 2012, it sees roughly 175 million tweets every day, and has more than 465 million accounts.

Sourced from: Analytics.CLUB #WEB Newsletter