Doctors store 1,600 digital hearts for big data study

Doctors in London have stored 1,600 beating human hearts in digital form on a computer.

The aim is to develop new treatments by comparing the detailed information on the hearts and the patients’ genes.

It is the latest project to make use of advances in storing large amounts of information.

The study is among a wave of new “big data” ventures that are transforming the way in which research is carried out.

Scientists at the Medical Research Council’s Clinical Sciences Centre at Hammersmith Hospital are scanning detailed 3D videos of the hearts of 1,600 patients and collecting genetic information from each volunteer.

Dr Declan O’Regan, who is involved in the heart study, said that this new approach had the potential to reveal much more than normal clinical trials, in which relatively small amounts of health information are collected from patients over the course of several years.

He added: “There is a really complicated relationship between people’s genes and heart disease, and we are still trying to unravel what that is. But by getting really clear 3D pictures of the heart we hope to be able to get a much better understanding of the cause and effect of heart disease and give the right patients the right treatment at the right time.”

Subtle signs

The idea of storing so much information on so many hearts is to compare them and to see what the common factors are that lead to illnesses. Dr O’Regan believes that this kind of analysis will increasingly become the norm in medicine.

“There are often subtle signs of early disease that are really difficult to pick up even if you know what to look for. A computer is very sensitive to picking up subtle signs of a disease before they become a problem.”


The Big Data idea is sweeping across a range of scientific research fields, and, as you would expect, there are some very large numbers involved.

Computers at the European Bioinformatics Institute (EBI) in Cambridge store the entire genetic code of tens of thousands of different plants and animals. The information occupies the equivalent of more than 5,000 laptops.

And to find out how the human mind works, researchers at the Institute for Neuroimaging and Informatics at the University of Southern California are storing 30,000 detailed 3D brain scans, requiring the space equivalent to 10,000 laptops.

The Square Kilometre Array, a radio telescope being built in Africa and Australia, will collect enough data in one year to fill 300 million million laptops. That is 150 times the current total annual global internet traffic.

Data revolution

Researchers at the American Association for the Advancement of Science (AAAS) meeting in San Jose are discussing just how they are going to store and sift through this mass of data.

According to Prof Ewan Birney at the EBI, big data is already beginning to transform the way research is done.

“Suddenly, we don’t have to be afraid of measuring lots and lots of things – about humans, about oceans, about the Universe – because we know we can be confident that we can collect that data and extract some knowledge from it,” he told BBC News.


The falling cost of storage has helped those developing systems to manage big data research, but when faced with an imminent tsunami of information, they will have to run to stand still and find ever more intelligent ways to compress and store the information.

The other main issue is how to organise and label the data.

Just as librarians have found ways to classify books by subject or by author, a whole new science is emerging in how to classify research data logically so that teams can find the things they want to find. But when one considers the trillions of pieces of information involved and the complexity of the scientific fields involved, the task is much harder than organising a library.

At the AAAS meeting, the UK’s Biotechnology and Biological Sciences Research Council (BBSRC) announced a £7.45m investment in the design of big data infrastructures.

The emergence of big data can be thought of as being similar to the development of the microscope: a powerful new tool for scientists to study intricate processes in nature that they have never been able to see before.

Approaching omniscience

Those involved in developing big data infrastructure believe that the investment will lead to a radical shift in the way research across a variety of disciplines is carried out. They sense that a step toward omniscience is within reach: a way of seeing the Universe as it really is, rather than the distorted view even scientists have through the filter of our limited brains and senses.

According to Paul Flicek of the EBI, big data could potentially lift a veil that has been shrouding important avenues of research.

“One of the things about science is that you don’t always discover the important things; you discover what you can discover. But by using larger amounts of data, we can discover new things, and so what will be found? That is an open question,” he told BBC News.

The challenge is for scientists to find new ways to manage this data and new ways to analyse it. Just collecting data does not solve any problems by itself.

But properly organised and managed, it could enable scientists to identify rare, subtle events that occur only occasionally in nature but have a big effect on our lives. The Higgs boson was discovered in this way.

“We are not going to slow down generating new data,” says Prof Flicek. “The fact that we have demonstrated that we can generate a lot of this data; we can sequence these genomes. We are never going to stop doing that and so it opens up so many more exciting things.

“We can learn new things and we can see things we have never seen before.”

Originally posted at: Doctors store 1,600 digital hearts for big data study by analyticsweekpick

To Trust A Bot or Not? Ethical Issues in AI

Given that we see fake profiles, and chatbots that misfire and miscommunicate, we would like your thoughts on whether there should be some sort of government registry for robots so that consumers know whether they are legitimate. If we had a registry for trolls and/or chatbots, would that ensure that people could feel more comfortable that they are dealing with a legitimate business, or would know whether a profile, troll, or bot is fake? Is it time for a Good Housekeeping seal of approval for AI?

These are all provocative questions and questions that are so new I am not sure there is one answer as they are so undefined. What do you think? Who should create such standards? Perhaps we should start by categorizing the types of AI?

Source: To Trust A Bot or Not? Ethical Issues in AI by tony

5 Surprising Skills Tomorrow’s Chief Analytics Officers Need to Develop

image credit: yuko honda via flickr/cc

In 2013, McKinsey predicted a massive talent gap in the analytics and big data space, estimating that by 2018 the shortage would reach some 140,000-190,000 individuals with quantitative skill sets and 1.5 million analytics-savvy management professionals. It’s apparent that today’s analytics experts are wise to strive for management and leadership positions, because there will likely be no shortage of demand.

Why Analytical Leadership Matters

Is there a need for analytics leadership in the future? Absolutely. While only 4% of organizations perform sophisticated predictive analytics, these companies see 30% higher stock market returns and have 2.5 times healthier leadership pipelines. Deloitte reports that some 96% of professionals believe analytics will become increasingly important to their organizations in the months to come.

In this blog, you’ll learn some of the skills needed to take your career from analyst or junior data scientist to Chief Analytics Officer (CAO). The skills mentioned here are curated from existing job postings for C-Level analytics professionals, as well as the professional backgrounds of newly-appointed leadership at Hospital Corporation of America (HCA), the White House, and other highly-visible organizations. This isn’t meant to be a comprehensive roadmap, but rather a first step towards developing an understanding of what it may mean to be a Chief Data Officer in the years to come:

1. Well-Established Thought Leadership

Aspiring analytical leaders cannot afford to limit their achievements to in-house projects. Newly-appointed White House Chief Data Scientist DJ Patil has long been an influential voice in the data science community, and has an incredibly impressive list of influential articles and books under his belt.

Patil’s publications include “Building Data Science Teams” from O’Reilly Media and the famous Harvard Business Review (HBR) article “Data Scientist: The Sexiest Job of the 21st Century.” A recent C-level posting from tech organization Rocket Fuel similarly requested a candidate with “credentials as evidenced through education, publications, presentations, collaborations and service to the research community.”

Individuals aspiring to high-profile leadership roles should start establishing their position as thought leaders as soon as possible through both academic and non-academic publications, conference presentations, and active social media usage. While you may not immediately win an O’Reilly book deal or a coveted spot on the cover of HBR, each contribution to the domain of data science will enhance your personal brand.

2. Tool-Agnostic Data Wrangling

Today’s data scientists can’t afford to take a tools-specific approach to solving big data problems. Many leading data science teams are self-described “tool-agnostic” shops that may bring together individuals with a broad variety of technical backgrounds. While it’s absolutely critical to have a solid groundwork in Python, Hadoop, and related tools, you can’t afford to develop a basic skill set and stop there.

A recent job posting for a Chief Data Scientist at financial giant Capital One requested an individual who can act as a “wrangler.” The posting suggested this probably meant some nine languages and ecosystems, but the point is clear: you can’t afford to be limited by your technical capabilities. Develop the competency to operate seamlessly across many tools and platforms, and you’ll never be limited by what you can’t do.

3. Aggressive, Self-Directed Professional Development

Very few of today’s Chief Data Scientists have graduate coursework in Analytics. In fact, many of the first graduates of Data Science programs are just now entering the workforce. Today’s leaders and highly visible Scientists have backgrounds that may range from Machine Learning to Computer Science, with a few Mathematics, Statistics, and Engineering specialties thrown in. Regardless of how relevant or irrelevant your professional education is, it’s positively critical to have a commitment to self-directed professional development.

A recent posting from Chartbeat specified a candidate with a “strong knowledge of the media space and research topics” and “interest in growing as an engineer.” In fast-changing economic and analytics environments, it’s critical to stay on top of business context, tools, languages, and best practices. Demonstrating a commitment to growing yourself as a professional, regardless of your organization’s support or requirements, will aid you in being appointed to executive leadership roles. Chief Analytics Officers are inherently educators, with a responsibility of informing the enterprise and leading best-in-class data science practices. There’s simply no room for anyone who isn’t continually pursuing personal improvement.

4. Social Selling

The first – and in many cases, even second and third – wave of Chief Analytics Officers may not be welcomed wholeheartedly by every member of the executive leadership teams at their enterprises. While the potential benefit of predictive analytics is clear to most CIOs and CMOs, gaining support and funding for new initiatives from CFOs and others could be a surprising requirement of the role, one not necessarily listed in the job posting. Depending on the sophistication level of their colleagues and the enterprise, Chief Analytics Officers could find themselves responsible for not only education on the value of analytics, but also internal marketing initiatives and social selling to convince others of the value of the analytics organization.

Edmund Jackson, appointed Chief Data Scientist at HCA in February 2014, has taken a highly visible role in both his organization and the greater Nashville, Tennessee community as one of the first C-level analytics professionals in his region. For Jackson, this means participating in a great number of guest-speaking opportunities and panels to educate local professionals on the value of data science. While tomorrow’s leaders can hope they won’t have to launch internal campaigns to convince their colleagues of the value of data science, gaining experience with social selling theory and community education can only help your candidacy.

5. Mentorship and Talent Management

While it may be difficult to predict the tools and types of big data problems tomorrow’s analytical leaders could face in their organizations, one thing can safely be predicted. It’s almost certain that a talent gap and shortage of adequately qualified employees will be a theme for many data executives.

Fifth Third Bank’s posting for a Chief Data Officer highlighted many of the soft skills required for successful leadership in a field filled with competition for limited talent. Their ideal candidate will be focused on employee development, recognition and reward, performance feedback, and inspiration. Personnel management and development is likely to be a theme, and today’s sharpest aspiring leaders will gain experience in these arenas, even if they aren’t currently in a management capacity. Mentoring analytical undergraduate students, serving in advisory capacities to career centers, and undergoing voluntary education in human resources topics can only benefit your future.

While these five skills are just a few of the qualities required to be appointed Chief Analytics or Data Officer at a major organization, they are likely to be major themes as companies increasingly seek high-level analytical leadership. Regardless of where your career path takes you, developing these competencies should have a major positive impact on your trajectory.

What are some of the quantitative and interpersonal skills you feel will be critical for aspiring analytical leaders to develop in the years to come?

What’s the True Cost of a Data Breach?

The direct hard costs of a data breach are typically easy to calculate. An organization can assign a value to the human-hours and equipment costs it takes to recover a breached system. Those costs, however, are only a small part of the big picture.

Every organization that has experienced a significant data breach knows this firsthand. Beyond direct financial costs, there are lost business, third-party liabilities, legal expenses, regulatory fines, and damaged goodwill. The true cost of a data breach encompasses much more than just direct losses.

Forensic Analysis. Hackers have learned to disguise their activity in ways that make it difficult to determine the extent of a breach. An organization will often need forensic specialists to determine how deeply hackers have infiltrated a network. Those specialists charge between $200 and $2,000 per hour.

Customer Notifications. A company that has suffered a data breach has a legal and ethical obligation to send written notices to affected parties. Those notices can cost between $5 and $50 apiece.

Credit Monitoring. Many companies will offer credit monitoring and identity theft protection services to affected customers after a data breach. Those services cost between $10 and $30 per customer.

Legal Defense Costs. Customers will not hesitate to sue a company if they perceive that the company failed to protect their data. Legal costs between $500,000 and $1 million are typical for significant data breaches affecting large companies. Companies often mitigate these high costs with data breach insurance because it covers liability and notification costs, among others.

Regulatory Fines and Legal Judgments. Target paid $18.5 million after a 2013 data breach that exposed the personal information of more than 41 million customers. Advocate Health Care paid a record $5.5 million fine after thieves stole an unsecured hard drive containing patient records. Fines and judgments of this magnitude can be ruinous for a small or medium-sized business.

Reputational Losses. Quantifying the value of goodwill and industry standing lost after a data breach is impossible. Still, that lost goodwill can translate into losing more than 20 percent of regular customers, plus revenue declines exceeding 30 percent. There’s also the cost of missed new business opportunities.

The total losses that a company experiences following a data breach depend on the number of records lost. The average per-record loss in 2017 was $225. Thus, a small or medium-sized business that loses as few as 1,000 customer records can expect to realize a loss of $225,000. This explains why more than 60 percent of SMBs close their doors permanently within six months of experiencing a data breach.
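The per-record and per-item figures above can be folded into a rough estimator. This is only an illustrative sketch: the per-record figure is the 2017 average quoted above, and the notification and monitoring rates are midpoints of the ranges given earlier, not actual costs for any specific breach.

```python
# Rough data-breach cost estimator built from the figures quoted above.
# The default rates are illustrative midpoints, not real quotes.
def estimate_breach_cost(records_lost,
                         per_record=225,     # 2017 average per-record loss
                         notification=27.5,  # midpoint of $5-$50 per notice
                         monitoring=20):     # midpoint of $10-$30 per customer
    """Return a rough total cost for a breach of `records_lost` records."""
    return records_lost * (per_record + notification + monitoring)

# A 1,000-record breach: $225,000 in per-record losses alone, plus
# notification and credit-monitoring costs on top.
print(estimate_breach_cost(1000))  # 272500.0
```

Even this simple model makes the point: forensic and legal costs aside, a modest breach quickly runs into six figures.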

Knowing the risks, companies can focus on devoting their cyber security budget to prevention and response. The first line of defense is technological, including network firewalls and regular employee training. However, hackers can still slip through the cracks, as they’re always devising new strategies for stealing data. A smart backup plan includes a savvy response and insurance to cover the steep costs if a breach occurs. After all, the total costs are far greater than just business interruption and fines; your reputation is at stake, too.

Source: What’s the True Cost of a Data Breach?

How to Implement a Job Metadata Framework using Talend

Today, data integration projects are not just about moving data from point A to point B; there is much more to them. The ever-growing volume of data and the speed at which it changes present a lot of challenges in managing the end-to-end data integration process. In order to address these challenges, it is paramount to track the data journey from source to target in terms of start and end timestamps, job status, business area, subject area, and the individuals responsible for a specific job. In other words, metadata is becoming a major player in data workflows. In this blog, I want to review how to implement a job metadata framework using Talend. Let’s get started!

Metadata Framework: What You Need to Know

The centralized management and monitoring of this job metadata are crucial to data management teams. An efficient and flexible job metadata framework architecture requires a number of things, namely a metadata-driven model and the job metadata itself.

A typical Talend Data Integration job performs the following tasks to extract data from source systems and load it into target systems.

  1. Extracting data from source systems
  2. Transforming the data involves:
    • Cleansing source attributes
    • Applying business rules
    • Data Quality
    • Filtering, Sorting, and Deduplication
    • Data aggregations
  3. Loading the data into target systems
  4. Monitoring, Logging, and Tracking the ETL process

Figure 1: ETL process
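As a rough illustration of the transform steps listed above, here is a minimal Python sketch over hypothetical records; the field names, business rule, and values are invented for the example:

```python
# Minimal sketch of the transform steps: cleansing, a business rule,
# deduplication, sorting, and aggregation. All data is hypothetical.
raw = [
    {"id": 1, "region": " east ", "amount": 120},
    {"id": 1, "region": "east",   "amount": 120},   # duplicate record
    {"id": 2, "region": "WEST",   "amount": -5},    # fails business rule
    {"id": 3, "region": "west",   "amount": 80},
]

# 1. Cleanse source attributes (trim whitespace, normalize case)
cleansed = [{**r, "region": r["region"].strip().lower()} for r in raw]
# 2. Apply a business rule / data quality filter (amounts must be positive)
valid = [r for r in cleansed if r["amount"] > 0]
# 3. Deduplicate on the business key, then sort
deduped = {r["id"]: r for r in valid}.values()
ordered = sorted(deduped, key=lambda r: r["id"])
# 4. Aggregate by region
totals = {}
for r in ordered:
    totals[r["region"]] = totals.get(r["region"], 0) + r["amount"]

print(totals)  # {'east': 120, 'west': 80}
```

In a real Talend job each step would be a component (tMap, tFilterRow, tUniqRow, tAggregateRow); the sketch just shows the data flow.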

Over the past few years, job metadata has evolved to become an essential component of any data integration project. What happens when you don’t have job metadata in your data integration jobs? It may lead to incorrect ETL statistics and logging, as well as errors that are difficult to handle during the data integration process. A successful Talend Data Integration project depends on how well the job metadata framework is integrated with the enterprise data management process.

Job Metadata Framework

The job metadata framework is a metadata-driven model that integrates well with the Talend product suite. Talend provides a set of components for capturing statistics and logging information while the data integration process is in flight.

Remember, the primary objective of this blog is to provide an efficient way to manage the ETL operations with a customizable framework. The framework includes the Job management data model and the Talend components that support the framework.

Figure 2: Job metadata model

Primarily, the Job Metadata Framework model includes:

  • Job Master
  • Job Run Details
  • Job Run Log
  • File Tracker
  • Database High Water Mark Tracker for extracting the incremental changes

This framework is designed to allow production support to monitor the job cycle refresh and look for issues relating to job failures and any discrepancies while processing the data loads. Let’s go through each piece of the framework step by step.
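To make the model concrete before walking through each table, here is a minimal sketch of two of the framework’s tables as Python dataclasses. The field names are illustrative assumptions, not the framework’s actual column names:

```python
# Sketch of the job metadata model. Field names are illustrative
# assumptions standing in for the framework's real column names.
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class JobMaster:                     # stands in for Talend_Jobs
    job_id: int
    job_name: str
    business_unit: str
    author: str
    description: str
    last_updated: datetime

@dataclass
class JobRunDetail:                  # stands in for Talend_Job_Run_Details
    run_id: int
    job_id: int
    start_time: datetime
    end_time: Optional[datetime]
    status: str                      # "Pending" / "Complete"

    def duration_seconds(self) -> Optional[float]:
        """Total execution duration, or None while the run is pending."""
        if self.end_time is None:
            return None
        return (self.end_time - self.start_time).total_seconds()

run = JobRunDetail(1, 42, datetime(2019, 1, 1, 2, 0),
                   datetime(2019, 1, 1, 2, 5), "Complete")
print(run.duration_seconds())  # 300.0
```

The same shapes map naturally onto relational tables, with `job_id` as the foreign key from run details back to the job master.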

Talend Jobs

Talend_Jobs is the Job Master repository table that manages the inventory of all the jobs in the data integration domain. Its columns include:

  • Unique identifier to identify a specific job
  • Job name, as per the naming convention (__
  • Business unit / department or application area
  • Job author information
  • Additional information related to the job
  • The last updated date

Talend Job Run Details

Talend_Job_Run_Details registers every run of a job and its sub jobs, with statistics and run details such as job status, start time, end time, and the total duration of the main job and sub jobs. Its columns include:

  • Unique identifier to identify a specific job run
  • Business unit / department or application area
  • Job author information
  • Unique identifier to identify a specific job
  • Job name, as per the naming convention (__
  • Unique identifier to identify a specific sub job
  • Sub job name, as per the naming convention (__
  • Main job start timestamp
  • Main job end timestamp
  • Main job total execution duration
  • Sub job start timestamp
  • Sub job end timestamp
  • Sub job total execution duration
  • Sub job status (Pending / Complete)
  • Main job status (Pending / Complete)
  • The last updated date

Talend Job Run Log

Talend_Job_Run_Log logs all the errors that occur during a particular job execution. Talend_Job_Run_Log extracts its details from the Talend components specially designed for catching logs (tLogCatcher) and statistics (tStatCatcher).

Figure 3: Error logging and Statistics

The tLogCatcher component in Talend operates as a log function triggered during the process by Java exceptions, tDie, or tWarn. In order to catch exceptions coming from the job, the tCatch function needs to be enabled on all the components.

The tStatCatcher component gathers the job processing metadata at the job level.

The Talend_Job_Run_Log columns include:

  • Unique identifier to identify a specific job run
  • Unique identifier to identify a specific job
  • The time when the message is caught
  • The process ID of the job
  • The parent process ID
  • The root process ID
  • The system process ID
  • The name of the project
  • The name of the job
  • The ID of the job file stored in the repository
  • The version of the current job
  • The name of the current context
  • The priority sequence
  • The name of the component, if any
  • Begin or End
  • The error message generated by the component when an error occurs (an After variable, which functions only if the Die on error checkbox is cleared)

The statistics side, captured when the tStatCatcher Statistics check box is selected, covers:

  • Time for the execution of a job or a component
  • Record counts
  • Job references
  • Log thresholds for managing error handling workflows

Talend High Water Mark Tracker

Talend_HWM_Tracker helps in processing delta and incremental changes for a particular table. The high water mark tracker is helpful when Change Data Capture is not enabled and changes are extracted based on specific conditions, such as “last_updated_date_time” or “revision_date_time”. In some cases, the high water mark relates to the highest sequence number, when records are processed based on a sequence number. Its columns include:

  • Unique identifier to identify a specific source table
  • Unique identifier to identify a specific job
  • The name of the job
  • The name of the source table
  • The source table environment
  • The source table database type
  • High water field (datetime)
  • High water field (number)
  • High water SQL statement
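A minimal sketch of how a stored high water mark drives an incremental extract: only rows updated after the mark are pulled, and the mark then advances to the newest value seen. Table contents and column names are hypothetical; ISO-8601 timestamp strings compare correctly as plain strings:

```python
# High-water-mark incremental extraction sketch over hypothetical rows.
rows = [
    {"id": 1, "last_updated": "2019-03-01T10:00"},
    {"id": 2, "last_updated": "2019-03-02T09:30"},
    {"id": 3, "last_updated": "2019-03-03T08:15"},
]

def extract_incremental(rows, high_water_mark):
    """Return rows newer than the mark, plus the advanced mark.

    ISO-8601 strings sort chronologically, so plain comparison is safe here.
    """
    changed = [r for r in rows if r["last_updated"] > high_water_mark]
    new_mark = max((r["last_updated"] for r in changed),
                   default=high_water_mark)
    return changed, new_mark

changed, mark = extract_incremental(rows, "2019-03-01T23:59")
print([r["id"] for r in changed], mark)  # [2, 3] 2019-03-03T08:15
```

In the framework, the stored mark would come from Talend_HWM_Tracker’s datetime (or number) field, and `new_mark` would be written back after a successful load.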

Talend File Tracker

Talend_File_Tracker registers all the transactions related to file processing. The transaction details include the source file location, destination location, file name pattern, file name suffix, and the name of the last file processed. Its columns include:

  • Unique identifier to identify a specific source file
  • Unique identifier to identify a specific job
  • The name of the job
  • The file server environment
  • The file name pattern
  • The source file location
  • The target file location
  • The file suffix
  • The name of the last file processed for a specific file
  • The override flag to re-process a file with the same name
  • The last updated date


This brings us to the end of implementing a job metadata framework using Talend. The key takeaways from this blog:

  1. The need for, and importance of, a job metadata framework
  2. The data model to support the framework
  3. The customizable data model to support different types of job patterns

As always – let me know if you have any questions below and happy connecting!

Source: How to Implement a Job Metadata Framework using Talend

What Is an ID Graph and How Can It Benefit Cross-Device Tracking?

Modern marketing is a complex multichannel and multi-device world. On average, 40% of adult users use more than one device to connect to the Internet. This offers more paths for customer engagement, more ways of making purchases, and more chances to gather data – a myriad of possibilities for both advertisers and customers. At the same time, it’s a double-edged sword because with these possibilities come challenges. Recognizing users across multiple channels, devices, and touchpoints, then stitching that information into a full dimensional profile is tricky indeed.

Without a well-rounded view of customers, marketing campaigns have to start from ground zero every time a user jumps from one device to another. Understanding customer identity becomes a key point in every marketing strategy. The question is, how can we achieve this goal with fragmented data scattered across multiple devices? The answer is by using an identity graph (ID graph).

If you’re not familiar with this notion yet, no worries – we’re here to shed some light and provide details on how it can benefit your organization. But before that, let’s take a quick look behind the scenes of graph analytics.

Visualize your customer data

Data scientists, analysts, and marketers all seek the best way to visualize and understand data, and the graph method is one of them. You can use graphs to filter out meaningful signals about customers and users that would otherwise get lost in the noise of irrelevant data streams.

Graph databases together with linked data present today’s reality, showing us how people live and act. You can employ interest graphs, customer graphs, social graphs, product graphs, identity graphs, and many other types of graphs. The power of this technology comes from its capacity to manage links across billions of identities with millisecond latencies.

The core of your graph is a user profile, and the rest is the context you embed it into. You can design one that best fits your organization’s business needs and requirements.

What makes an ID graph?

First of all: what is an ID graph? It’s a tool that gives you a single customer view, not just user likes or interests. It’s a database that connects all the data on individual users across channels and devices, housing all the identifiers associated with individual visitors. Throughout the whole customer journey you collect various personal identifiers such as:

  • Email or physical address
  • Account usernames
  • Device IDs
  • Phone numbers
  • IP address
  • Cookies
  • … and many more

These identifiers are gathered and then sewn together into a customer profile that reflects the ID graph. The graph also contains behavioral data, such as browsing history, past transactions, etc. The identity graph is fueled by data from different sources, whether it be CRM, marketing and ad platforms, or e-commerce software. In brief, it connects data across the offline and online landscapes.
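Stitching identifiers into profiles is essentially a graph-connectivity problem. As a sketch (all identifiers below are invented for the example), a tiny union-find merges every pair of identifiers observed together, e.g. when a login links an email address to a cookie, so each connected cluster becomes one customer profile:

```python
# Union-find sketch of identity stitching: identifiers observed together
# are merged, so connected identifiers resolve to one profile.
parent = {}

def find(x):
    """Return the representative identifier of x's profile."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving keeps lookups fast
        x = parent[x]
    return x

def union(a, b):
    """Record that identifiers a and b belong to the same person."""
    parent[find(a)] = find(b)

# Observed identifier pairs (hypothetical):
observations = [
    ("email:ann@example.com", "cookie:abc123"),
    ("cookie:abc123", "device:tablet-77"),
    ("email:bob@example.com", "cookie:zzz999"),
]
for a, b in observations:
    union(a, b)

# Ann's email and her tablet land in the same profile; Bob's does not.
print(find("email:ann@example.com") == find("device:tablet-77"))  # True
print(find("email:ann@example.com") == find("cookie:zzz999"))     # False
```

Production ID graphs layer confidence scores, timestamps, and consent state on top of this basic connectivity idea, but the clustering core is the same.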

Representation of an ID graph. Source: Signal

With all these data points you can create a single customer view that lets you better understand and target customers. The primary focus here is the individual customer, not a device. This cross-device identity is a powerful tool in your marketing arsenal.

What’s more, with the ID graph you gain the possibility to link online data with offline data then reach your customers at every touchpoint. It helps you improve your personalization strategy and lift customer engagement.


Why control matters

Connecting the dots of customer identifiers is not the only issue. When you decide to handle a customer’s personal data – which is what some of these identifiers are – you are walking through a minefield of legal issues. That’s why your primary focus should be on data ownership and profile authenticity.

You can acquire ID graphs offered by onboarding providers, ad partners, or social networks. However, this solution means working with third-party data, and that comes with certain restrictions.

First, there’s the matter of legal compliance. The identifiers we’re talking about are usually personal data, which you need users’ consent to process. This might be tricky considering the complexity of data flows between different parties.

Second, when using third-party data, a marketer’s strategy is limited to the capabilities and insights offered by data providers. This can come at the cost of the relevance and precision of information. Consequently, marketers can have trouble measuring their impact and building better relationships with customers.

If you’d like to get a better picture of third-party data, we recommend reading:
Why First-Party Data is the Most Valuable to Marketers

However, there is a solution. You can and even should obtain your own ID graph. Having control and ownership over the graph – over data – your marketing team gets a complete picture of the customer journey. Because you can achieve cross-screen identification of your users, which means recognizing their preferences, wishes, desires, and past activities, you can finally build well-rounded user profiles.

Authenticated or not?

Visitor profiles come in two types: authenticated and non-authenticated. The former are more beneficial as they’re persistent, being built on authenticated IDs. They consist of login IDs, email addresses, customer IDs, and the like. Basically, when a user buys something with their credit card or logs on to a site, the identity graph connects the provided first-party data with other pieces of data used to identify this user across devices.

These kinds of profiles grow with every interaction as they accumulate more and more data. The data deepens and enriches such profiles, making them more advantageous for marketing strategies. That’s why the right ID graph must be linked to persistent user profiles.

In contrast, non-authenticated profiles are created from temporary identifiers like cookies and channel- or device-specific IDs that can’t fit into a multi-device strategy. Their application is limited to a single campaign or a single channel.
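The identity stitching described in this section can be pictured as a small union-find structure over identifiers: any two identifiers ever observed for the same person end up in one cluster, which is the persistent profile. This is only an illustrative sketch, not any vendor’s actual implementation, and all identifiers below are made up:

```python
class IdentityGraph:
    """Toy ID graph: a union-find over identifiers, so any two
    identifiers ever observed together end up in one profile."""

    def __init__(self):
        self.parent = {}  # identifier -> parent identifier

    def _find(self, x):
        # Path-halving find; unseen identifiers start as their own root.
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def link(self, id_a, id_b):
        # Record that two identifiers belong to the same person,
        # e.g. a login email and the cookie of the current session.
        self.parent[self._find(id_a)] = self._find(id_b)

    def profile(self, identifier):
        # Every identifier stitched into the same cluster.
        root = self._find(identifier)
        return {i for i in list(self.parent) if self._find(i) == root}

graph = IdentityGraph()
graph.link("email:anna@example.com", "cookie:desktop-123")  # login on desktop
graph.link("email:anna@example.com", "device:iphone-789")   # login in mobile app
print(sorted(graph.profile("cookie:desktop-123")))
# ['cookie:desktop-123', 'device:iphone-789', 'email:anna@example.com']
```

Querying the graph with any of the three identifiers returns the same cluster, which is exactly the “complete picture” of one user that an ID graph promises.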

When cookies don’t cut it

For years, cookies were used successfully to track online desktop users. With the proliferation of mobile devices, however, cookies no longer pass the test. On smartphones, tablets, and similar devices, a user’s cookies are often reset whenever they close the browser. Cookies simply weren’t designed for a multi-device reality: they can’t be passed from one device to another, nor shared between apps.

What this means for identification is that if a visitor comes to your website first on their desktop and later on their smartphone, that one person will be recorded as two different visitors.

But there are other issues. For instance, the cookies of someone browsing the Internet in incognito or private mode on a desktop or laptop won’t be saved. And third-party cookies are increasingly blocked, either by users changing their cookie settings or through privacy features like Intelligent Tracking Prevention. Consequently, cookies are becoming obsolete.

Matchmaking to the rescue

As we’ve already said, when people switch from tablet to laptop and then reach for their smartphones, consumer data ends up scattered across devices instead of being held on one. So if cookies can’t do the tracking in a cross-device world, what can you do about it?

You can take advantage of deterministic matching and probabilistic matching. These two methods match data to help you build cross-device identities of your customers. There’s no single right or wrong choice here: in a nutshell, the former provides a more accurate match, while the latter offers better scalability. Everything comes down to your business goals and the types of data you have access to. You may even find instances where they complement each other and you can apply both. But before you decide, let’s dive into the details of each.

Deterministic matching

Deterministic matching lets you recognize the same person across multiple devices by tying their user profiles together. These profiles are built around various bits of user data, with each user having a separate profile on each device.

To recognize users across multiple screens, deterministic matching searches through data sets and ties all profiles of a particular user together with a common identifier. The matching is usually achieved through login details.

For instance, if someone regularly types their username and password into a site on their mobile phone, the brand can link that user profile with this particular smartphone. If the same credentials are later used to log in on a different device, say a laptop, the brand identifies the same user there too. Because the data is authenticated, its certainty increases significantly.

One of the biggest assets of this method is its accuracy (about 80–90%). When it comes to scale, however, it’s not the best option, because not all websites or applications compel users to log in or submit other identifying information.
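At its core, deterministic matching boils down to grouping devices by a shared authenticated identifier, such as a login email. The event log, field names, and values below are hypothetical, purely to sketch the mechanic:

```python
# Hypothetical event log: (device_id, authenticated login_id or None).
events = [
    ("desktop-123", "anna@example.com"),  # logged in on a desktop browser
    ("iphone-789", "anna@example.com"),   # same credentials in a mobile app
    ("tablet-456", None),                 # anonymous visit: no deterministic link
]

def deterministic_match(events):
    """Group devices by the authenticated identifier they share."""
    profiles = {}
    for device_id, login_id in events:
        if login_id is None:
            continue  # no authenticated ID, nothing to match on
        profiles.setdefault(login_id, set()).add(device_id)
    return profiles

matches = deterministic_match(events)
print(matches["anna@example.com"])  # both of Anna's devices, matched with certainty
```

Note how the anonymous tablet visit is simply dropped: that is the scalability limit of the deterministic approach in miniature, since only logged-in traffic can be matched.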

Probabilistic matching

If you’re looking for better scalability, you can use probabilistic matching. It combines the data sets mentioned earlier with statistical algorithms, letting you identify the same user across the whole multi-screen digital landscape. Probabilistic matching uses data like:

  • IP address
  • Device type
  • Browser type
  • Operating system
  • WiFi network
  • Location

This method creates probable statistical connections between a user and any number of devices they use. It’s not as accurate as the deterministic method, but the same data sets can be used to train the algorithms and improve accuracy over time.
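A heavily simplified way to picture probabilistic matching is a weighted similarity score over the signals listed above, with two devices linked only when the score clears a confidence threshold. The weights, threshold, and fingerprint values below are invented for illustration; production systems derive them statistically:

```python
# Invented weights and threshold; real systems learn these from data.
WEIGHTS = {"ip": 0.4, "wifi": 0.3, "os_family": 0.1, "location": 0.2}
THRESHOLD = 0.6  # link two devices only above this confidence

def match_score(fp_a, fp_b):
    """Sum the weights of the signals two device fingerprints share."""
    return sum(w for key, w in WEIGHTS.items() if fp_a.get(key) == fp_b.get(key))

laptop = {"ip": "203.0.113.7", "wifi": "HomeNet", "os_family": "macOS", "location": "Berlin"}
phone = {"ip": "203.0.113.7", "wifi": "HomeNet", "os_family": "iOS", "location": "Berlin"}

score = match_score(laptop, phone)
print(round(score, 2), score >= THRESHOLD)  # 0.9 True: probably the same user
```

The two devices share an IP address, a WiFi network, and a location but run different operating systems, so the link is made with high, though never certain, confidence. That residual uncertainty is the trade-off against the deterministic method.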

One of the key advantages of this approach is scalability: it gives organizations greater possibilities to scale personalization beyond their authenticated user base. So you’ll be able to recognize users across multiple screens without having to gather email addresses or other personal data.

However, the drawback of probabilistic matching is poor transparency of matching methods and algorithms.

Another difficulty, which applies to both methods, is the privacy issues present in the legal landscape. If you want to process personal data, including IP addresses, you need to make sure you do it in compliance with regulations like the GDPR. In other words, you have to obtain a user’s consent to proceed.

Why you need ID graphs

Now that you know the ins and outs of ID graphs, it’s time to have a look at how this technology can be of benefit to your organization. Since ID graphs stitch customer identities into a single customer view, they provide you with an array of potential applications. Let’s review the most important ones.

Optimizing personalization

In a personalization strategy, the key focus is on identifying the user: knowing their preferences, desires, and wishes, and then providing them with content that truly resonates. ID graphs help you achieve this by giving you a full, multidimensional customer profile that blends the online and offline worlds of their interactions with your brand.

Understanding each user and customer at the individual level allows marketers to provide more engaging content. In addition, you can constantly enrich and update identity data to optimize the results of your marketing campaigns.

Take the example of giants like Amazon and Netflix: they use ID graphs that track browsing history across users’ devices to serve highly relevant, engaging recommendations.

Addressing customer needs in real-time

The digital world speeds up all user interactions. This trend forces marketers to recognize and meet customers’ requirements without delay. It means organizations need real-time technology to gather, match, and activate data.

Enhancing customer engagement

Thanks to the capacity to match offline data with digital identifiers and behavioral data within ID graphs, marketers are able to better engage with customers. With well-rounded customer profiles you can predict their future needs, plan strategies, improve up-selling and cross-selling campaigns, and find better opportunities to re-engage lost customers.

Improving cross-device attribution

By taking advantage of deterministic identity graphs, you can identify the role every channel plays in user conversion. Once you have the bigger picture of the customer journey, you’ll be able to accurately attribute conversions.

You can tailor messages to each customer individually and reduce budget waste resulting from generic campaigns. Your aim is to ensure that customers engage with your content in an optimal way no matter how often they jump from one device to another.


Final thoughts

Companies operating within the digital ecosystem need to shift their focus from customers’ actions to customers’ identities. To achieve this goal, they should wield technology across multiple screens and channels to create direct relationships with their users. An optimal solution like ID graphs lets marketing teams keep up with customers who expect immediate and relevant brand experiences across all touchpoints in their multi-screen journey.

Although we’ve just scratched the surface of the complex topic of identity graphs, we hope you’ve found some key solutions to tough issues. If you have any questions, reach out to our team and we’ll be glad to answer them without delay!


The post What Is an ID Graph and How Can It Benefit Cross-Device Tracking? appeared first on Piwik PRO.

Originally Posted at: What Is an ID Graph and How Can It Benefit Cross-Device Tracking? by analyticsweek