5 Surprising Skills Tomorrow’s Chief Analytics Officers Need to Develop

image credit: yuko honda via flickr/cc

In 2013, McKinsey predicted a massive talent gap in the analytics and big data space: by 2018, it estimated, the shortage would reach some 140,000-190,000 individuals with quantitative skill sets, plus 1.5 million analytics-savvy management professionals. Today’s analytics experts are clearly wise to strive for management and leadership positions, because there will be no shortage of demand.

Why Analytical Leadership Matters

Is there a need for analytics leadership in the future? Absolutely. While only 4% of organizations perform sophisticated, predictive analytics, those companies enjoy 30% higher stock market returns and leadership pipelines that are 2.5 times healthier. Deloitte reports that some 96% of professionals believe analytics will become increasingly important to their organizations in the months to come.

In this blog, you’ll learn some of the skills needed to take your career from analyst or junior data scientist to Chief Analytics Officer (CAO). The skills mentioned here are curated from existing job postings for C-level analytics professionals, as well as the professional backgrounds of newly appointed leadership at Hospital Corporation of America (HCA), the White House, and other highly visible organizations. This isn’t meant to be a comprehensive roadmap, but rather a first step towards understanding what it may mean to be a Chief Analytics Officer in the years to come:

1. Well-Established Thought Leadership

Aspiring analytical leaders cannot afford to limit their achievements to in-house projects. Newly appointed White House Chief Data Scientist DJ Patil has long been an influential voice in the data science community, with an impressive list of influential articles and books under his belt.

Patil’s publications include Building Data Science Teams from O’Reilly Media and the famous Harvard Business Review (HBR) article “Data Scientist: The Sexiest Job of the 21st Century.” A recent C-level posting from tech organization Rocket Fuel similarly requested a candidate with “credentials as evidenced through education, publications, presentations, collaborations and service to the research community.”

Individuals aspiring to high-profile leadership roles should start establishing their position as thought leaders as soon as possible through both academic and non-academic publications, conference presentations, and active social media usage. While you may not immediately win an O’Reilly book deal or a coveted spot on the cover of HBR, each contribution to the domain of data science will enhance your personal brand.

2. Tool-Agnostic Data Wrangling

Today’s data scientists can’t afford to take a tool-specific approach to solving big data problems. Many leading data science teams are self-described “tool-agnostic” shops that bring together individuals with a broad variety of technical backgrounds. While it’s absolutely critical to have a solid grounding in Python, Hadoop, and related tools, you can’t afford to develop a basic skill set and stop there.

A recent job posting for a Chief Data Scientist at financial giant Capital One requested an individual who can act as a “wrangler.” The posting suggested this meant some nine languages and ecosystems, but the point is clear: you can’t afford to be limited by your technical capabilities. Develop the competency to operate seamlessly across many tools and platforms, and you’ll never be limited by what you can’t do.

3. Aggressive, Self-Directed Professional Development

Very few of today’s chief data scientists have graduate coursework in analytics. In fact, many of the first graduates of data science programs are just now entering the workforce. Today’s leaders and highly visible scientists have backgrounds that range from machine learning to computer science, with a few mathematics, statistics, and engineering specialties thrown in. Regardless of how relevant or irrelevant your formal education is, a commitment to self-directed professional development is positively critical.

A recent posting from Chartbeat specified a candidate with a “strong knowledge of the media space and research topics” and an “interest in growing as an engineer.” In fast-changing economic and analytics environments, it’s critical to stay on top of business context, tools, languages, and best practices. Demonstrating a commitment to growing as a professional, regardless of your organization’s support or requirements, will aid your appointment to executive leadership roles. Chief Analytics Officers are inherently educators, responsible for informing the enterprise and leading best-in-class data science practices. There’s simply no room for anyone who isn’t continually pursuing personal improvement.

4. Social Selling

The first – and in many cases, even second and third – wave of Chief Analytics Officers may not be welcomed wholeheartedly by every member of their enterprise’s executive leadership team. While the potential benefit of predictive analytics is clear to most CIOs and CMOs, gaining support and funding from CFOs and others could be a surprising requirement of the role that was never listed in the job posting. Depending on the sophistication of their colleagues and the enterprise, Chief Analytics Officers could find themselves responsible not only for education on the value of analytics, but also for internal marketing initiatives and social selling to convince others of the value of the analytics organization.

Edmund Jackson, appointed Chief Data Scientist at HCA in February 2014, has taken a highly visible role in both his organization and the greater Nashville, Tennessee community as one of the first C-level analytics professionals in his region. For Jackson, this means taking on a great many guest speaking engagements and panels to educate local professionals on the value of data science. While tomorrow’s leaders can hope they won’t have to launch internal campaigns to convince colleagues of the value of data science, gaining experience with social selling theory and community education can only help your candidacy.

5. Mentorship and Talent Management

While it may be difficult to predict the tools and types of big data problems tomorrow’s analytical leaders will face in their organizations, one thing can safely be predicted: a talent gap and a shortage of adequately qualified employees will be a theme for many data executives.

Fifth Third Bank’s posting for a Chief Data Officer highlighted many of the soft skills required for successful leadership in a field filled with competition for limited talent. Their ideal candidate will be focused on employee development, recognition and reward, performance feedback, and inspiration. Personnel management and development is likely to be a theme, and today’s sharpest aspiring leaders will gain experience in these arenas, even if they aren’t currently in a management capacity. Mentoring analytical undergraduate students, serving in advisory capacities to career centers, and undergoing voluntary education in human resources topics can only benefit your future.

While these five skills are just a few of the qualities required to be appointed Chief Analytics or Data Officer at a major organization, they are likely to be major themes as companies increasingly seek high-level analytical leadership. Regardless of where your career path takes you, developing these competencies should have a major positive impact on your trajectory.

What are some of the quantitative and interpersonal skills you feel will be critical for aspiring analytical leaders to develop in the years to come?

What’s the True Cost of a Data Breach?

The direct hard costs of a data breach are typically easy to calculate. An organization can assign a value to the human-hours and equipment costs it takes to recover a breached system. Those costs, however, are only a small part of the big picture.

Every organization that has experienced a significant data breach knows this firsthand. Beyond direct financial costs, there are lost business, third-party liabilities, legal expenses, regulatory fines, and damaged goodwill. The true cost of a data breach encompasses much more than just direct losses.

Forensic Analysis. Hackers have learned to disguise their activity in ways that make it difficult to determine the extent of a breach. An organization will often need forensic specialists to determine how deeply hackers have infiltrated a network. Those specialists charge between $200 and $2,000 per hour.

Customer Notifications. A company that has suffered a data breach has a legal and ethical obligation to send written notices to affected parties. Those notices can cost between $5 and $50 apiece.

Credit Monitoring. Many companies will offer credit monitoring and identity theft protection services to affected customers after a data breach. Those services cost between $10 and $30 per customer.

Legal Defense Costs. Customers will not hesitate to sue a company if they perceive that the company failed to protect their data. Legal costs between $500,000 and $1 million are typical for significant data breaches affecting large companies. Companies often mitigate these high costs with data breach insurance because it covers liability and notification costs, among others.

Regulatory Fines and Legal Judgments. Target paid $18.5 million after a 2013 data breach that exposed the personal information of more than 41 million customers. Advocate Health Care paid a record $5.5 million fine after thieves stole an unsecured hard drive containing patient records. Fines and judgments of this magnitude can be ruinous for a small or medium-sized business.

Reputational Losses. Quantifying the value of lost goodwill and standing within an industry after a data breach is impossible. That lost goodwill can translate into losing more than 20 percent of regular customers, plus revenue depletions exceeding 30 percent. There’s also the cost of missing new business opportunities.

The total losses that a company experiences following a data breach depend on the number of records lost. The average per-record loss in 2017 was $225. Thus, a small or medium-sized business that loses as few as 1,000 customer records can expect to realize a loss of $225,000. This explains why more than 60 percent of SMBs close their doors permanently within six months of experiencing a data breach.
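The per-record arithmetic above can be sanity-checked in a couple of lines; the helper function is purely illustrative, using only the article’s 2017 figure:

```python
PER_RECORD_LOSS = 225  # average per-record loss in 2017, per the article

def estimated_loss(records_lost: int) -> int:
    """Rough total loss estimate from the number of records exposed."""
    return records_lost * PER_RECORD_LOSS

print(estimated_loss(1_000))  # 225000 — matches the SMB example above
```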

Knowing the risks, companies can focus on devoting their cyber security budget to prevention and response. The first line of defense is technological, including network firewalls and regular employee training. However, hackers can still slip through the cracks, as they’re always devising new strategies for stealing data. A smart backup plan includes a savvy response and insurance to cover the steep costs if a breach occurs. After all, the total costs are far greater than just business interruption and fines; your reputation is at stake, too.

Source: What’s the True Cost of a Data Breach?

How to Implement a Job Metadata Framework using Talend

Today, data integration projects are not just about moving data from point A to point B; there is much more to them. The ever-growing volume of data and the speed at which data changes present many challenges in managing the end-to-end data integration process. To address these challenges, it is paramount to track the data’s journey from source to target in terms of start and end timestamps, job status, business area, subject area, and the individuals responsible for a specific job. In other words, metadata is becoming a major player in data workflows. In this blog, I want to review how to implement a job metadata framework using Talend. Let’s get started!

Metadata Framework: What You Need to Know

The centralized management and monitoring of this job metadata is crucial to data management teams. An efficient and flexible job metadata framework architecture requires two things: a metadata-driven model and the job metadata itself.

A typical Talend Data Integration job performs the following tasks to extract data from source systems and load it into target systems:

  1. Extracting data from source systems
  2. Transforming the data involves:
    • Cleansing source attributes
    • Applying business rules
    • Data Quality
    • Filtering, Sorting, and Deduplication
    • Data aggregations
  3. Loading the data into target systems
  4. Monitoring, Logging, and Tracking the ETL process
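The four steps above can be sketched in miniature. This is a hedged illustration in plain Python rather than an actual Talend job; the function names and the in-memory rows are assumptions made for the sketch:

```python
from datetime import datetime

def extract(source_rows):
    return list(source_rows)                  # 1. extract from the source

def transform(rows):
    rows = [r.strip().lower() for r in rows]  # 2. cleanse source attributes
    rows = [r for r in rows if r]             #    filter out empty values
    return sorted(set(rows))                  #    sort and deduplicate

def load(rows, target):
    target.extend(rows)                       # 3. load into the target

def run_job(source_rows, target, log):
    log.append(("start", datetime.now()))     # 4. monitor/log/track the run
    load(transform(extract(source_rows)), target)
    log.append(("end", datetime.now()))

target, log = [], []
run_job(["  B ", "a", "b", ""], target, log)
print(target)  # ['a', 'b']
```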

Figure 1: ETL process

Over the past few years, job metadata has evolved into an essential component of any data integration project. What happens when you don’t have job metadata in your data integration jobs? It can lead to incorrect ETL statistics and logging, and make it difficult to handle errors that occur during the data integration process. A successful Talend Data Integration project depends on how well the job metadata framework is integrated with the enterprise data management process.

Job Metadata Framework

The job metadata framework is a metadata-driven model that integrates well with the Talend product suite. Talend provides a set of components for capturing statistics and logging information while a data integration process is in flight.

Remember, the primary objective of this blog is to provide an efficient way to manage the ETL operations with a customizable framework. The framework includes the Job management data model and the Talend components that support the framework.

Figure 2: Job metadata model

Primarily, the Job Metadata Framework model includes:

  • Job Master
  • Job Run Details
  • Job Run Log
  • File Tracker
  • Database High Water Mark Tracker for extracting the incremental changes
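One way to picture this model is as a set of plain records. The class and field names below are illustrative stand-ins for the sketch, not the framework’s exact schema:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class JobMaster:
    """One 'Job Master' inventory entry per job."""
    job_id: int
    job_name: str
    business_unit: str

@dataclass
class JobRunDetail:
    """One row per (sub)job execution, with run statistics."""
    run_id: int
    job_id: int
    start_time: datetime
    end_time: Optional[datetime] = None
    status: str = "Pending"       # Pending / Complete

@dataclass
class HighWaterMark:
    """Tracker used to extract only incremental changes."""
    job_id: int
    table_name: str
    last_value: datetime
```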

This framework is designed to allow production support to monitor the job refresh cycle and look for issues relating to job failures and any discrepancies while processing data loads. Let’s go through each piece of the framework step-by-step.

Talend Jobs

Talend_Jobs is a Job Master Repository table that manages the inventory of all the jobs in the Data Integration domain. Its fields include:

  • A unique identifier for a specific job
  • The job name, as per the naming convention
  • The business unit / department or application area
  • Job author information
  • Additional information related to the job
  • The last updated date

Talend Job Run Details

Talend_Job_Run_Details registers every run of a job and its sub jobs, with statistics and run details such as job status, start time, end time, and total duration for the main job and sub jobs. Its fields include:

  • A unique identifier for a specific job run
  • The business unit / department or application area
  • Job author information
  • A unique identifier for the job
  • The job name, as per the naming convention
  • A unique identifier for the sub job
  • The sub job name, as per the naming convention
  • The main job start timestamp, end timestamp, and total execution duration
  • The sub job start timestamp, end timestamp, and total execution duration
  • The sub job status (Pending / Complete)
  • The main job status (Pending / Complete)
  • The last updated date

Talend Job Run Log

Talend_Job_Run_Log logs all the errors that occur during a particular job execution. It extracts the details from the Talend components specially designed for catching logs (tLogCatcher) and statistics (tStatCatcher).

Figure 3: Error logging and Statistics

The tLogCatcher component in Talend operates as a log function triggered during the process by Java exceptions, tDie, or tWarn. In order to catch exceptions coming from the job, the tCatch function needs to be enabled on all the components.

The tStatCatcher component gathers the job processing metadata at the job level.

The log fields captured include:

  • A unique identifier for a specific job run
  • A unique identifier for the job
  • The time when the message is caught
  • The process ID of the job
  • The parent process ID
  • The root process ID
  • The system process ID
  • The name of the project
  • The name of the job
  • The ID of the job file stored in the repository
  • The version of the current job
  • The name of the current context
  • The priority sequence
  • The name of the component, if any
  • A Begin or End marker
  • The error message generated by the component when an error occurs (an After variable that functions only if the Die on error checkbox is cleared)

The statistics captured when the tStatCatcher Statistics check box is selected include:

  • The execution time of a job or component
  • Record counts
  • Job references
  • Log thresholds for managing error-handling workflows

Talend High Water Marker Tracker

Talend_HWM_Tracker helps in processing the delta and incremental changes of a particular table. The high water mark tracker is helpful when Change Data Capture is not enabled and changes are extracted based on specific conditions such as “last_updated_date_time” or “revision_date_time.” In some cases, the high water mark relates to the highest sequence number when records are processed based on sequence number. Its fields include:

  • A unique identifier for a specific source table
  • A unique identifier for the job
  • The name of the job
  • The name of the source table
  • The source table environment
  • The source table database type
  • The high water field (datetime)
  • The high water field (number)
  • The high water SQL statement
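The high-water-mark pattern itself fits in a few lines: pull only the rows whose timestamp is newer than the stored mark, then advance the mark. The row layout and the `last_updated` column name are assumptions for this sketch, not part of the framework:

```python
from datetime import datetime

def extract_incremental(rows, hwm):
    """Return rows changed since `hwm`, plus the new high water mark.

    `rows` is an iterable of dicts, each with a 'last_updated' datetime.
    """
    new_rows = [r for r in rows if r["last_updated"] > hwm]
    new_hwm = max((r["last_updated"] for r in new_rows), default=hwm)
    return new_rows, new_hwm

rows = [
    {"id": 1, "last_updated": datetime(2024, 1, 1)},
    {"id": 2, "last_updated": datetime(2024, 2, 1)},
]
delta, hwm = extract_incremental(rows, datetime(2024, 1, 15))
print([r["id"] for r in delta])  # [2]
```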

Talend File Tracker

Talend_File_Tracker registers all the transactions related to file processing. The transaction details include source file location, destination location, file name pattern, file name suffix, and the name of the last file processed. Its fields include:

  • A unique identifier for a specific source file
  • A unique identifier for the job
  • The name of the job
  • The file server environment
  • The file name pattern
  • The source file location
  • The target file location
  • The file suffix
  • The name of the last file processed for a specific file
  • The override flag to re-process a file with the same name
  • The last updated date


This brings us to the end of implementing a job metadata framework using Talend. The key takeaways from this blog:

  1. The need for and importance of a job metadata framework
  2. The data model that supports the framework
  3. The ways the data model can be customized to support different types of job patterns

As always – let me know if you have any questions below and happy connecting!


Source: How to Implement a Job Metadata Framework using Talend

What Is an ID Graph and How Can It Benefit Cross-Device Tracking?

Modern marketing is a complex multichannel and multi-device world. On average, 40% of adult users use more than one device to connect to the Internet. This offers more paths for customer engagement, more ways of making purchases, and more chances to gather data – a myriad of possibilities for both advertisers and customers. At the same time, it’s a double-edged sword because with these possibilities come challenges. Recognizing users across multiple channels, devices, and touchpoints, then stitching that information into a full dimensional profile is tricky indeed.

Without a well-rounded view of customers, marketing campaigns have to start from ground zero every time a user jumps from one device to another. Understanding customer identity becomes a key point in every marketing strategy. The question is, how can we achieve this goal with fragmented data scattered across multiple devices? The answer is by using an identity graph (ID graph).

If you’re not familiar with this notion yet, no worries – we’re here to shed some light and provide details on how it can benefit your organization. But before that, let’s take a quick look behind the scenes of graph analytics.

Visualize your customer data

Data scientists, analysts, and marketers all seek the best ways to visualize and understand data. The graph method is one of them. You can use graphs to filter meaningful signals from customers and users that would otherwise get lost in the noise of irrelevant data streams.

Graph databases, together with linked data, reflect today’s reality, showing us how people live and act. You can employ interest graphs, customer graphs, social graphs, product graphs, identity graphs, and many other types of graphs. The power of this technology comes from its capacity to manage links across billions of identities with millisecond latencies.

The core of your graph is a user profile, and the rest is the context you embed it into. You can design one that best fits your organization’s business needs and requirements.

What makes an ID graph?

First of all: what is an ID graph? It’s a tool you use to get a single customer view, not just user likes or interests. It’s a database that connects all the data on individual users across channels and devices, housing all the identifiers associated with individual visitors. Throughout the whole customer journey you collect various personal identifiers such as:

  • Email or physical address
  • Account usernames
  • Device IDs
  • Phone numbers
  • IP address
  • Cookies
  • … and many more

These identifiers are gathered and then sewn together into a customer profile that reflects the ID graph. The graph also contains behavioral data, such as browsing history, past transactions, etc. The identity graph is fueled by data from different sources, whether it be CRM, marketing and ad platforms, or e-commerce software. In brief, it connects data across the offline and online landscapes.
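As a toy illustration of this stitching, identifiers observed together (say, on the same login event) can be merged into one profile with a simple union-find structure. Real ID graphs are far more involved; the identifier strings below are invented for the example:

```python
class IdentityGraph:
    """Minimal identity-stitching sketch using union-find."""

    def __init__(self):
        self.parent = {}

    def _find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def link(self, a, b):
        """Record that identifiers a and b belong to the same person."""
        self.parent[self._find(a)] = self._find(b)

    def same_person(self, a, b):
        return self._find(a) == self._find(b)

g = IdentityGraph()
g.link("email:ann@example.com", "cookie:abc123")   # seen on a login event
g.link("cookie:abc123", "device:ios-7f2")          # seen on the same device
print(g.same_person("email:ann@example.com", "device:ios-7f2"))  # True
```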

Representation of an ID graph. Source: Signal

With all these data points you can create a single customer view that lets you better understand and target customers. The primary focus here is the individual customer, not a device. This cross-device identity is a powerful tool in your marketing arsenal.

What’s more, with the ID graph you gain the possibility to link online data with offline data then reach your customers at every touchpoint. It helps you improve your personalization strategy and lift customer engagement.


Why control matters

Connecting the dots of customer identifiers is not the only issue. When you decide to handle a customer’s personal data – which is what some of these identifiers are – you are walking through a minefield of legal issues. That’s why your primary focus should be on data ownership and profile authenticity.

You can acquire ID graphs offered by onboarding providers, ad partners, or social networks. However, this solution means working with third-party data, and that comes with certain restrictions.

First, there’s the matter of legal compliance. The identifiers we’re talking about are usually personal data, which you need users’ consent to process. This might be tricky considering the complexity of data flows between different parties.

Second, when using third-party data, a marketer’s strategy is limited to the capabilities and insights offered by data providers. This can come at the cost of the relevance and precision of information. Consequently, marketers can have trouble measuring their impact and building better relationships with customers.

If you’d like to get a better picture of third-party data, we recommend reading:
Why First-Party Data is the Most Valuable to Marketers

However, there is a solution. You can, and even should, build your own ID graph. With control and ownership over the graph, and the data in it, your marketing team gets a complete picture of the customer journey. Cross-screen identification of your users, which means recognizing their preferences, wishes, desires, and past activities, finally lets you build well-rounded user profiles.

Authenticated or not?

Visitor profiles come in two types: authenticated and non-authenticated. The former are more beneficial as they’re persistent, being built on authenticated IDs. They consist of login IDs, email addresses, customer IDs, and the like. Basically, when a user buys something with their credit card or logs on to a site, the identity graph connects the provided first-party data with other pieces of data used to identify this user across devices.

These kinds of profiles grow with every interaction as they accumulate more and more data. The data deepens and enriches such profiles, making them more advantageous for marketing strategies. That’s why the right ID graph must be linked to persistent user profiles.

In contrast, non-authenticated profiles are created from temporary identifiers like cookies and channel- or device-specific IDs that can’t fit into a multi-device strategy. Their application is limited to a single campaign or a single channel.

When cookies don’t cut it

For years cookies have been successfully used in tracking online desktop users. However, with the proliferation of mobile devices cookies don’t pass the test in this game. The thing is, on smartphones, tablets, or other similar devices, whenever a user closes the browser, their cookies are reset. They simply weren’t designed for a multi-device reality. They can’t be passed from one device to another, nor be shared between apps.

What this means for identification in practice is that if a visitor comes to your website first on their desktop and later on their smartphone, that one person will be recorded as two different visitors.

But there are other issues. For instance, the cookies of someone browsing the Internet in incognito or private mode (desktop/laptop) won’t be saved. And there’s also the trouble with third-party cookies being constantly blocked either by users changing their cookie settings or through privacy features like Intelligent Tracking Prevention. Consequently, cookies are becoming obsolete.

Matchmaking to the rescue

As we’ve already said, when people switch from tablet to laptop and then reach for their smartphones, it means that consumer data is scattered instead of being held on one device. So if cookies can’t do the tracking in the cross-device world, what can you do about it?

You can take advantage of deterministic matching and probabilistic matching. These two methods are applied to match data and help you build cross-device identities of your customers. In this case, there’s no single right or wrong choice. In a nutshell, the former provides a more accurate match, while the latter offers better scalability. Everything comes down to your business goals and the types of data you have access to. You may even find instances where they complement each other and you can apply both. But before you decide, let’s dive into the details of each.

Deterministic matching

Deterministic matching lets you recognize the same person across multiple devices by matching the same user profiles together. These profiles are built around various bits of user data with every user having their own profile on different devices.

To recognize users across multiple screens, deterministic matching searches through data sets and ties all profiles of a particular user together with a common identifier. The matching is usually achieved through login details.

For instance, if someone regularly types their username and password into a site on their mobile phone, the brand recognizes this user profile and can link the user with that particular smartphone. If the same credentials are later used to log in on a different device, such as a laptop, the brand identifies the same user there too. The data is authenticated, which significantly increases its certainty.

One of the biggest assets of this method is its accuracy (about 80-90%). When it comes to scale, however, it’s not the best option, because not all websites or applications compel users to log in or submit other information.
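Deterministic matching reduces to a small sketch: device profiles that share a strong identifier are grouped into one person. The `login_email` and `device_id` field names here are hypothetical, chosen only for illustration:

```python
from collections import defaultdict

def deterministic_match(profiles):
    """Group device profiles by their shared login identifier."""
    people = defaultdict(list)
    for p in profiles:
        people[p["login_email"]].append(p["device_id"])
    return dict(people)

profiles = [
    {"device_id": "phone-1", "login_email": "ann@example.com"},
    {"device_id": "laptop-9", "login_email": "ann@example.com"},
    {"device_id": "tablet-3", "login_email": "bob@example.com"},
]
print(deterministic_match(profiles))
# {'ann@example.com': ['phone-1', 'laptop-9'], 'bob@example.com': ['tablet-3']}
```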

Probabilistic matching

If you’re looking for better scalability, you can use probabilistic matching. It utilizes distinct data sets (as mentioned earlier) together with algorithms. Thanks to this formula you can identify the same user across the whole multi-screen digital landscape. Probabilistic matching uses data like:

  • IP address
  • Device type
  • Browser type
  • Operating system
  • WiFi network
  • Location

This method creates probable statistical connections between a user and any number of devices they use. Compared to the deterministic method it’s not as accurate, but it can be trained on the same data sets to increase its accuracy over time.
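A bare-bones sketch of the idea: count how many weak signals two device profiles share, and treat a high score as a probable match. Real systems use trained models; the signal list, weights, and threshold below are arbitrary assumptions for illustration:

```python
SIGNALS = ["ip", "wifi_ssid", "location", "os"]

def match_score(a, b):
    """Fraction of weak signals on which two device profiles agree."""
    shared = sum(1 for s in SIGNALS if a.get(s) and a.get(s) == b.get(s))
    return shared / len(SIGNALS)

dev1 = {"ip": "203.0.113.7", "wifi_ssid": "HomeNet", "location": "Krakow", "os": "iOS"}
dev2 = {"ip": "203.0.113.7", "wifi_ssid": "HomeNet", "location": "Krakow", "os": "macOS"}

score = match_score(dev1, dev2)
print(score)  # 0.75 — 3 of 4 signals agree, likely the same user
```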

One of the key advantages of this approach is scalability, giving organizations greater possibilities to scale personalization beyond their authenticated user base. You’ll be able to recognize users across multiple screens without having to gather email addresses or other personal data.

However, the drawback of probabilistic matching is poor transparency of matching methods and algorithms.

Another difficulty that comes with both methods is the privacy issue present in the legal landscape. If you want to process personal data, including IP addresses, you need to make sure you do it in compliance with regulations like the GDPR. In other words, you have to obtain a user’s consent to proceed.

Why you need ID graphs

Now that you know the ins and outs of ID graphs, it’s time to have a look at how this technology can be of benefit to your organization. Since ID graphs stitch customer identities into a single customer view, they provide you with an array of potential applications. Let’s review the most important ones.

Optimizing personalization

In a personalization strategy the key focus is on identifying the user: knowing their preferences, desires, and wishes, and then providing them with content that truly resonates. ID graphs help you achieve this by giving you a complete customer profile that blends the online and offline worlds of their interactions with your brand.

Understanding each user and customer at the individual level allows marketers to provide more engaging content. In addition, you can constantly enrich and update identity data to optimize the results of your marketing campaigns.

Take the example of giants like Amazon or Netflix that serve the most relevant product recommendations. They take advantage of ID graphs tracking browsing history across users’ devices to serve highly engaging recommendations.

Addressing customer needs in real-time

The digital world speeds up all user interactions. This trend forces marketers to recognize and meet customers’ requirements without delay. It means organizations need real-time technology to gather, match, and activate data.

Enhancing customer engagement

Thanks to the capacity to match offline data with digital identifiers and behavioral data within ID graphs, marketers are able to better engage with customers. With well-rounded customer profiles you can predict their future needs, plan strategies, improve up-selling and cross-selling campaigns, and find better opportunities to re-engage lost customers.

Improving cross-device attribution

By taking advantage of deterministic identity graphs, you can identify the role every channel performs in user conversion. Once you have a bigger picture of the customer journey you’ll be able to accurately attribute conversions.

You can tailor messages to each customer individually and reduce budget waste resulting from generic campaigns. Your aim is to ensure that customers engage with your content in an optimal way no matter how often they jump from one device to another.
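The attribution benefit is easy to see in a small sketch. Assuming the touches below have already been stitched to one person by an identity graph, a simple linear model splits conversion credit equally across the whole cross-device journey (the journey data and channel names are hypothetical):

```python
from collections import Counter

def attribute_linear(journey):
    """Split one conversion's credit equally across every touch in a
    cross-device journey that has been stitched to a single person."""
    credit = Counter()
    share = 1.0 / len(journey)
    for touch in journey:
        credit[touch["channel"]] += share
    return credit

# Hypothetical stitched journey for one converting customer:
journey = [
    {"channel": "paid_search", "device": "phone"},
    {"channel": "email",       "device": "laptop"},
    {"channel": "email",       "device": "phone"},
    {"channel": "direct",      "device": "laptop"},  # converting touch
]
print(attribute_linear(journey))
```

Without identity stitching, the phone and the laptop would look like two different people, and a per-device last-touch model would hand all the credit to the "direct" visit, undervaluing the earlier paid and email touches.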


Final thoughts

Companies operating within the digital ecosystem need to shift their focus from customers' actions to customers' identities. To achieve this, they should apply technology across multiple screens and channels to build direct relationships with their users. A solution like an ID graph lets marketing teams keep up with customers who expect immediate, relevant brand experiences across all touchpoints in their multi-screen journey.

Although we've only scratched the surface of the complex topic of identity graphs, we hope you've found answers to some tough questions. If you have any, reach out to our team and we'll be glad to answer them without delay!


The post What Is an ID Graph and How Can It Benefit Cross-Device Tracking? appeared first on Piwik PRO.

Originally Posted at: What Is an ID Graph and How Can It Benefit Cross-Device Tracking? by analyticsweek

Oracle releases database integration tool to ease big data analytics

Oracle has bolstered its database portfolio with the Oracle Data Integrator (ODI), a piece of middleware designed to help analysts sift through big data across a variety of sources.

As the name suggests, the ODI effectively eases the process of linking data in different formats and from diverse databases and clusters, such as Hadoop, NoSQL and relational databases.

This enables Oracle customers to conduct analysis on large and varied datasets without dedicating time and resources to preparing big data in an integrated and secure way prior to analysis.

In effect, the ODI allows huge pools of data to be treated as just another data source to be used alongside more regularly accessed data warehouses and structured databases.

Jeff Pollock, vice president of product management at Oracle, claimed that ODI lets customers work like experts with extract, transform and load (ETL) tools without learning the code needed to carry out those operations.

“Oracle is the only vendor that can automatically generate Spark, Hive and Pig transformations from a single mapping which allows our customers to focus on business value and the overall architecture rather than multiple programming languages,” he said.

Avoiding the need for proprietary code means that the ODI can be run natively with a company’s existing Hadoop cluster, bypassing the need to invest in additional development.

Cluster databases like Hadoop and Spark have traditionally been geared towards programmers with knowledge of the coding needed to manipulate them. On the flipside, analysts would mostly use software tools to carry out enterprise-level data analytics.

The ODI gives the non-code savvy analyst the ability to harness Hadoop and other data sources without requiring the coding knowledge to do so.

It also means that a company’s developers need not retrain to handle multiple databases. Oracle is touting this as a way for companies to save money and time on big data analysis.

Oracle’s move to build out its portfolio around delivering direct data insights for its customers is indicative of the business-focused direction big data analytics is heading, underlined by Visa’s head of analytics saying that big data projects must focus on making money.

Originally posted via “Oracle releases database integration tool to ease big data analytics”

Source by analyticsweekpick

May 8, 2017 Health and Biotech analytics news roundup

HHS’s Price Affirms Commitment to Health Data Innovation: Secretary Price emphasized the need to decrease the burden on physicians.

Mayo Clinic uses analytics to optimize laboratory testing: The company Viewics makes software for the facility, which uses it to look for patterns and increase efficiency.

Nearly 10,000 Global Problem Solvers Yield Winning Formulas to Improve Detection of Lung Cancer in Third Annual Data Science Bowl: The winners of the competition, which challenged contestants to accurately diagnose lung scans, were announced.

Gene sequencing at Yale finding personalized root of disease; new center opens in West Haven: The Center for Genomic Analysis at Yale opened and is intended to help diagnose patients.

Source by pstein