Best Practices for Using Context Variables with Talend – Part 2

First off, a big thank you to all those who have read the first part of this blog series! If you haven’t read it, I invite you to read it now before continuing, as Part 2 will build upon it and dive a bit deeper. Ready to get started? Let’s kick things off by discussing the implicit context load.

The Implicit Context Load

The Implicit Context Load is one of those pieces of functionality that can very easily be ignored but is incredibly valuable.

Simply put, the implicit context load is just a way of linking your jobs to a hardcoded file path or database connection to retrieve your context variables. That’s great, but you still have to hardcode your file path/connection settings, so how is it of any use here if we want a truly environment agnostic configuration?

Well, what is not shouted about as much as it probably should be is that the Implicit Context Load configuration variables can not only be hardcoded, but they can be populated by Talend Routine methods. This opens up a whole new world of environment agnostic functionality and makes Contexts completely redundant for configuring Context variables per environment.
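To make that a little more concrete, here is a minimal sketch of what such a routine might look like. It is only an illustration: the class name ContextLocation, the TALEND_CONTEXT_DIR environment variable and the context.properties file name are assumptions made for this example, not anything Talend defines. The idea is simply that each machine sets its own environment variable, so the same job finds the right file wherever it runs.

```java
import java.nio.file.Paths;
import java.util.Map;

// Hypothetical routine. In Talend Studio a routine like this would normally
// sit in the "routines" package; the package line is left out so the sketch
// compiles on its own.
final class ContextLocation {

    // Assumed name of an environment variable holding the config folder.
    static final String ENV_VAR = "TALEND_CONTEXT_DIR";

    // Assumed name of the context file inside that folder.
    static final String FILE_NAME = "context.properties";

    private ContextLocation() {
    }

    // Returns the context file path for whichever machine the job runs on.
    public static String getContextFilePath() {
        return resolve(System.getenv(), System.getProperty("user.home"));
    }

    // Kept separate so the lookup can be tried out with any map of settings.
    public static String resolve(Map<String, String> env, String fallbackDir) {
        String dir = env.get(ENV_VAR);
        if (dir == null || dir.isEmpty()) {
            // No variable set on this machine: fall back to the given folder.
            dir = fallbackDir;
        }
        return Paths.get(dir, FILE_NAME).toString();
    }

    public static void main(String[] args) {
        System.out.println(getContextFilePath());
    }
}
```

With a routine along these lines, the file path field of the Implicit Context Load could hold the expression ContextLocation.getContextFilePath() instead of a quoted, hardcoded path.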

You can find the Talend documentation for the Implicit Context Load here. You will notice that it doesn’t say (at the moment…maybe an amendment is due :)) that each of the fields shown in the screenshot below can be populated by Talend routine methods instead of being hardcoded.


Before I go any further, it makes sense to go off on a slight tangent and mention JASYPT. JASYPT is a Java library which allows developers to add basic encryption capabilities to their projects with minimum effort, and without needing deep knowledge of how cryptography works. JASYPT is supplied with Talend, so there is no need to hunt around and download all sorts of Jars to use it. All you need to be able to do is write a little Java to obfuscate your values and prevent others from reading them in clear text.

Now, you won’t necessarily want all of your values to be obfuscated. This might actually be a bit of a pain. However, JASYPT makes this easy as well. JASYPT comes built-in with some functionality which will allow it to ingest a file of parameters and decrypt only the values which are surrounded by ….


This means a file with values such as below (example SQL server connection settings)…..








…..will only have the “TalendContextPassword” variable decrypted, the rest will be left as they are.
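The selective behaviour described above can be sketched in a few lines of plain Java. This is not JASYPT’s own code. It assumes the ENC(...) wrapper that JASYPT’s encryptable properties support uses to mark encrypted values, and the decryptor is passed in as a function so the sketch runs without any extra Jars. The class name SelectiveDecrypt, the TalendContextServer key and the string-reversing stand-in "decryptor" are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical helper: decrypts only the values wrapped in ENC(...).
final class SelectiveDecrypt {

    static final String PREFIX = "ENC(";
    static final String SUFFIX = ")";

    private SelectiveDecrypt() {
    }

    // Returns a copy of the settings in which marked values have been passed
    // through the decryptor and everything else is left exactly as it was.
    public static Map<String, String> decryptMarked(Map<String, String> raw,
                                                    UnaryOperator<String> decryptor) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : raw.entrySet()) {
            String value = entry.getValue().trim();
            if (value.startsWith(PREFIX) && value.endsWith(SUFFIX)) {
                // Strip the wrapper and decrypt what is inside it.
                String cipher = value.substring(PREFIX.length(),
                        value.length() - SUFFIX.length());
                result.put(entry.getKey(), decryptor.apply(cipher));
            } else {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> raw = new LinkedHashMap<>();
        raw.put("TalendContextServer", "localhost");     // hypothetical clear-text value
        raw.put("TalendContextPassword", "ENC(dlrow)");  // marked as encrypted

        // Stand-in for real decryption: it just reverses the text.
        UnaryOperator<String> pretendDecrypt =
                s -> new StringBuilder(s).reverse().toString();

        System.out.println(decryptMarked(raw, pretendDecrypt));
    }
}
```

In a real job, the function passed in would wrap a call to decrypt(...) on a configured StandardPBEStringEncryptor rather than the reversing stand-in used here.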

This piece of functionality is really useful in a lot of ways and often gets overlooked by people looking to hide values which need to be made easily available to Talend Jobs. I will demonstrate precisely how to make use of this functionality later, but first I’ll show you how simple using JASYPT is if you simply want to encrypt and decrypt a String.

Simple Encrypt/Decrypt Talend Job

In the example I will give you in Part 3 of this blog series (I have to have something to keep you coming back), the code will be a little harder than below. Below is an example job showing how simple it is to use the JASYPT functionality. This job could be used for encrypting whatever values you may wish to encrypt manually. Its layout is shown below….


Two components. A tLibraryLoad to load the JASYPT Jar and a tJava to carry out the encryption/decryption.

The tLibraryLoad is configured as below. Your included version of JASYPT may differ from the one I have used. Use whichever comes with your Talend version.

The tJava needs to import the relevant class we are using from the JASYPT Jar. This import is shown below…..

The actual code is….

import org.jasypt.encryption.pbe.StandardPBEStringEncryptor;

Now to make use of the StandardPBEStringEncryptor I used the following configuration….

The actual code (so you can copy it) is shown below….

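//Configure encryptor class

StandardPBEStringEncryptor encryptor = new StandardPBEStringEncryptor();

encryptor.setAlgorithm("PBEWithMD5AndDES");

encryptor.setPassword("BOB");

//Set the String to encrypt and print it

String stringToEncrypt = "Hello World";

System.out.println(stringToEncrypt);

//Encrypt the String and store it as the cipher String. Then print it

String cipher = encryptor.encrypt(stringToEncrypt);

System.out.println(cipher);

//Decrypt the String just encrypted and print it out

System.out.println(encryptor.decrypt(cipher));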

In the above it is all hardcoded. I am encrypting the String “Hello World” using the password “BOB” and the algorithm “PBEWithMD5AndDES”. When I run the job, I get the following output….

Starting job TestEcryption at 07:47 19/03/2018.

[statistics] connecting to socket on port 3711

[statistics] connected

Hello World


Hello World

[statistics] disconnected

Job TestEcryption ended at 07:47 19/03/2018. [exit code=0]

These snippets of information are useful, but how do you knit them together to provide an environment agnostic Context framework to base your jobs on? I’ll dive into that in Part 3 of my best practices blog. Until next week!

The post Best Practices for Using Context Variables with Talend – Part 2 appeared first on Talend Real-Time Open Source Data Integration Software.

Source by analyticsweekpick

The Updated Piwik PRO Marketing Suite 6.2.0 is here!

With the beginning of the month we’re glad to announce that Piwik PRO Marketing Suite has been upgraded to version 6.2.0. The official release to our customers was on July 31 this year. The software update brings various new capabilities along with performance improvements, the result of numerous meetings, discussions, and significant input from both our customers and staff.

In this post, we’ll give you a rundown of all the major changes and fixes so you don’t miss a beat. So, here we go.

What can you expect from the refreshed Tag Manager?

With the latest update to Tag Manager, our product has expanded its library of DoubleClick tag templates with Floodlight Counter and Floodlight Sales to let you more efficiently track conversion activities. The first one enables you to count how many times your users visited your website after they either clicked on or saw one of your ads. Thanks to the second one, you can record how many items users have bought and the value of the whole purchase.

What’s more, our team fixed issues concerning variables and even expanded their functionality. You can now use refactored variables that cover various types, such as string, integer, boolean, and object, depending on the usage context.

Next, we made some changes regarding cookies. Namely, the cookie’s default expiry date has been reduced to one year, and that can’t be changed by the user.

What can you expect from the refreshed Consent Manager?

The recent update to Piwik PRO has introduced several new functionalities to Consent Manager. First of all, you can now manage consents with a JavaScript API that enables you to:

  • get consent types
  • get new consent types
  • get consent settings
  • send data subject request

All in an easier and more convenient way.

Then, you can get a better view into visitors’ consent decisions with newly included Consent Manager reports. In this way you can see, for instance, if a user viewed the consent form, provided consent or just left the page without submitting any consent decision.

A view of one of the consent manager reports.

Furthermore, we added a new functionality so users with Edit & Publish authorization have the ability to easily manage all consents.

Consent Manager’s visual editor has been upgraded with an HTML elements tree for a better user experience. It gives you an easy and convenient way to track and visualize changes in your consent form. Moreover, with the product update you can easily see the history of all modifications to the copy in the consent form.

Lastly, you’ll be able to ask your visitors for consent again 6 months after their first consent decision was recorded. This can be used to encourage users to provide consent if they didn’t do so the first time, or if they changed their consent decisions at some point in time.

What can you expect from the refreshed Audience Manager?

Another product in our stack that also got a makeover is Audience Manager (Customer Data Platform). One of the most significant features was the addition of two API endpoints. You can now pull lists of created audiences and easily export all profiles into a CSV file from Audience Manager via API. This is particularly useful for automating data transfers from Audience Manager to your other marketing tools, such as your marketing automation platform.

What can you expect from the refreshed Analytics?

Last but not least, our flagship product — Analytics — has got a significant enhancement with row evolution reports for funnel steps. It’s a big asset as you can now take a closer look at each funnel step individually on each row of your report. This will come in handy as you can view how metrics change throughout time, for instance, due to modifications to the site or an increase in traffic. What’s more, you can apply annotations to charts on a particular date to mark the exact moment when a change occurs.

A view of row evolution report for each step of the funnel.

To round out

As you can see, our team has introduced a host of improvements with the new update. Some are major changes, while others are small upgrades and fixes. We are constantly working on our products so they’ll run smoothly and help you address all your analytics issues on the spot. Naturally, we’ll be releasing more advancements, tweaks, and new features again soon, so stay tuned! If you have any questions or suggestions, we’re here for you so…

Contact us

The post The Updated Piwik PRO Marketing Suite 6.2.0 is here! appeared first on Piwik PRO.

Source: The Updated Piwik PRO Marketing Suite 6.2.0 is here! by analyticsweek

Compelling Use Cases for Creating an Intelligent Internet of Things

The industrial sector of the Internet of Things, the Industrial Internet, is far more advanced in terms of adoption rates, technological applications, and business value generation than the consumer side of the IoT is. Perhaps the most pronounced advantage between these two sectors is in the utilization of machine learning, which is routinely deployed in the Industrial Internet for equipment asset monitoring and predictive maintenance.

Although the use cases for this aspect of Artificial Intelligence differ on the consumer side, they still provide the same core functionality with which they’ve improved the industrial sector for years—identifying patterns and presaging action to benefit the enterprise.

On the industrial side, those benefits involve sustaining productivity, decreasing costs, and increasing efficiency. On the consumer side, the prudent deployment of machine learning on the IoT’s immense, continuously generated datasets results in competitive advantage and increased revenues.

“The thing about the IoT in general is that the amount of data from these things is enormous,” Looker Chief Data Evangelist Daniel Mintz noted. “So when you’re selling thousands, tens of thousands or hundreds of thousands of devices, that’s a lot of data that gives you a lot of guidance for these devices on what’s working, what’s not working, and how to [best] tune the system.”

Aggregation Analytics
There are multiple ways in which the IoT is primed for machine learning deployments to produce insights that would otherwise go unnoticed due to the scale of data involved. The large datasets are ideal for machine learning or deep learning’s need for copious, labeled training data. Moreover, IoT data stem from the field, offering an unparalleled view into how surroundings are impacting functionality—which is critical on the consumer side. “If you’re trying to understand what the relationship is between failure rates or misbehavior and the environment, you’re going to absolutely be using machine learning to uncover those linkages,” Mintz said. The basic precept for which machine learning is deployed is for user behavior or aggregate analytics in which massive quantities of data from IoT devices are aggregated and analyzed for patterns about performance. “At this scale, it’s not very easy to do any other way,” Mintz observed.

However, there’s a capital difference in the environments pertaining to the industrial versus the consumer side of the IoT, which makes aggregate analytics particularly useful for the latter. “Industrial applications are more controlled and more uniform,” Mintz said. “With the consumer side, you’re selling devices that are going out into the real world which is a much less controlled environment, and trying to figure out what’s happening out there. Your devices are undoubtedly encountering situations you couldn’t have predicted. So you can find out when they’re performing well, when they’re not, and make changes to your system based on that information.”

User Behavior
That information also provides the means for product adjustments and opportunities to increase profits, simply by understanding what users are doing with current product iterations. It’s important to realize that the notion of user behavior analytics is not new to the IoT. “The thing that’s new is that the items that are producing that feedback are physical items in the real world [with the IoT] rather than websites or mobile apps,” Mintz commented. “System monitoring and collating huge amounts of data to understand how people are using your product is a relatively old idea that people who run websites have been doing for decades.” When machine learning is injected into this process, organizations can determine a number of insights related to feature development, marketing, and other means of capitalizing on user behavior.

“You might be using machine learning to understand who is likely to buy another device because they’re really using the device a lot,” Mintz said. “Or, you might use machine learning to improve the devices.” For example, organizations could see which features are used most and find ways to make them better, or see which features are rarely used and improve them so they provide a greater user experience. The possibilities are illimitable and based on the particular device, the data generated, and the ingenuity of the research and development team involved.

Integrating Business Semantics
Because IoT data is produced in real time via streaming or sensor data technologies, those attempting to leverage it inherently encounter integration issues when combining it with other data sources for a collective view of its meaning. When properly architected, the use of semantic technologies (standard data models, taxonomies and vocabularies) can “allow business analysts to take all of their knowledge of what data means to the business, how it’s structured and what it means, and get that knowledge out of their heads and into [the semantic] software,” Mintz mentioned. When dealing with low-latency data at the IoT’s scale, such business understanding is critical for incorporating that data alongside that of other sources, and even expanding the base of users of such data. “The reality is raw data, particularly raw data coming off these IoT devices, is really [daunting],” Mintz said. “It’s just a stream of sort of logs that’s really not going to help anybody do anything. But if you start to collect that data and turn it into something that makes sense to the business, now you’re talking about something very different.”

Anomaly Detection
The most immediate way AI is able to enhance the Internet of Things is via the deployment of machine learning to identify specific behaviors, and offer ways to make system, product or even enterprise improvements based on them. Perhaps the most eminent of these is machine learning’s capacity for anomaly detection, which delivers numerous advantages in real-time IoT systems. “That’s a huge use case,” Mintz acknowledged. “It’s not the only one by any means, but I do think it’s a huge use case. That comes straight out of the manufacturing world where you’re talking about predictive maintenance and preventative maintenance. What they’re looking for is anomalous behavior that indicates that something is going wrong.” When one considers the other use cases for intelligent IoT applications associated with performance, environments, product development, and monetization opportunities, they’re an ideal fit for machine learning.

Originally Posted at: Compelling Use Cases for Creating an Intelligent Internet of Things by jelaniharper

Benefits of IoT for Hospitals and Healthcare

Undoubtedly, Internet of Things technology has been significantly transforming the healthcare industry by revamping the way devices, apps, and users connect and interact with each other to deliver healthcare services. IoT is continuously introducing innovative tools and capabilities, such as IoT-enabled medical app development, that build up an integrated healthcare system with the vision of assuring better patient care at reduced costs.

Consequently, it opens up numerous opportunities that hospitals and wellness promoters can consider while they optimize resources with automated workflows. For example, many hospitals use IoT to manage assets and to control humidity and temperature within operating areas. Moreover, IoT applications offer enormous benefits to healthcare providers and patients, considerably improving healthcare services.

The Impact of IoT on Healthcare Industry

Check Out Some of the Best IoT Applications That Are Impacting Healthcare Services:

Real-Time Remote Monitoring

IoT makes it possible to connect multiple monitoring devices and thus monitor patients in real time. Further, these connected devices can also send out signals from home, thereby reducing the time patients need to spend in hospital for care.

Blood Pressure Monitoring

A sensor-based intelligent system, such as a Bluetooth-enabled coagulation system, can be used to monitor the blood pressure levels of patients with hypertension. Such monitoring devices also help to reduce the possibility of cardiac arrest in critical cases.

Smart Pills

Some pharmaceutical companies, such as Proteus Digital Health, WuXi PharmaTech, and TruTag, have been making edible IoT “smart” pills that help to monitor health issues, medication controls, and adherence. Such smart pills will help drug development organizations lower their risks.

Smart Watches

IoT-enabled wearable devices such as the Apple Watch can effectively monitor and evaluate people’s mood and report the information to a server. Moreover, some apps are being built to monitor fitness activities and sleep cycles.

Let’s Have a Look at Key Benefits of IoT in Healthcare Industry

Reduced Cost

By leveraging the connectivity of healthcare solutions, healthcare providers can improve real-time patient monitoring and thereby noticeably reduce needless visits by doctors. Specifically, advanced home care services are reducing readmissions and hospital stays.

Better Results of Treatment

Connected healthcare solutions with cloud computing or other virtual infrastructure enable care providers to obtain real-time information that helps them make informed decisions and provide evidence-based treatments. This ensures timely healthcare provision and improved treatment results.

Enhanced Disease Management

This is one of the biggest benefits of IoT in the healthcare sector. IoT empowers healthcare providers to monitor patients and access real-time data continuously. This helps to treat diseases early, before they develop into serious conditions.

Reduced Faults

Precise data collection and automated workflows, along with data-driven decisions, greatly help to reduce waste and system costs and, most notably, to diminish errors.

Enhanced Patient Experience

The Internet of Things mainly focuses on the patient’s needs. This results in more accurate diagnoses, proactive treatments, timely intervention by doctors, and improved treatment results, giving rise to better patient trust and experience.

Improved Drugs Management

The manufacture and management of drugs is a major expense in the healthcare sector. Here, too, IoT plays a huge role. With IoT devices and processes, it is possible to manage these costs better.


IoT-enabled solutions, such as IoT-enabled medical app development and connected healthcare solutions, are proving to be a game changer in the healthcare industry. With its enormous range of applications, IoT has been helping healthcare providers, including doctors, hospitals, and clinics, to care for patients with accurate treatment services and strategies.

Integrating IoT solutions into healthcare services is going to be essential to keep up with the increasing needs of the digital world. If you are looking to digitize your healthcare services, then IoT should be your first choice. Contact us to know more about different IoT solutions and applications.


Source: Benefits of IoT for Hospitals and Healthcare by analyticsweek

Removing Silos & Operationalizing Your Data: The Key to Analytics

Enterprise information workers have more data analytics choices than ever, in terms of paradigms, technologies, vendors, and products. From one perspective, we are presently in a golden age of data analytics: there’s innovation everywhere, and intense competition that is beneficial to enterprise customers. But from another point of view, we’re in the data analytics dark ages, as the technology stack is volatile, vendors are prone to consolidation and shakeout, and the overwhelming variety and choice in technologies and products is an obstacle to analytics adoption, progress, and success.

This is a stressful time for enterprise IT and lines of business alike. No one wants to make the wrong decision and be held responsible for leading their organization down the wrong data analytics path. Yet the responsibility to act, and act soon, is palpable. The pressure is great, and opportunities for evasion and procrastination are receding. It’s a perfect technology storm.

What should you do? Stick with “the vendor you know:” old school data warehousing and business intelligence (BI), running on-premises? Or go the no guts/no glory route and dive in head-first to open source big data technologies and run them in the cloud? Most people won’t want to go to either extreme and would instead prefer a middle-ground strategy, but there are a lot of options within that middle-range.

Even understanding all your choices—let alone making a decision—is daunting and can bring about a serious paralysis, right at the dawn of real digital transformation in business, where we realize data is a competitive asset. What’s needed is an organizing principle with which to understand the crucial difference in products and technologies. Such a framework is the only way organizations can understand their options, and make the right choice.

Consider this: one of the biggest challenges in analytics is the issue of data “silos,” wherein the data you need for successful insights is scattered – in different databases, applications, file systems, and individual files. And while that may only seem to add to the pressure, there is in this circumstance an opportunity. The structural challenge of siloed data, the many ways it manifests, and the various ways to mitigate and resolve it can act as the organizing principle with which to understand vendors, technologies, and products in the data analytics space. This principle will help you understand your own requirements and criteria in evaluating products, making a buying decision, and proceeding with implementation.

Data silos aren’t really a “defect,” but rather a steady state for operational data. That is to say that data, in its equilibrium state, is siloed. But analytics is ideally executed over a fully integrated data environment. As such, a big chunk of the analytics effort is to coalesce siloed data, and a careful investigation of vendors, product categories, and products in the analytics arena will show all of them to be addressing this task.

They don’t all do it the same way though. In fact, what distinguishes each analytics product category is the point along the data lifecycle where it causes data silos to coalesce. And that’s why the concept of removing the silos in the data landscape is such an integral part of an organizing principle for the analytics market.

Whether under the heading of data ingestion; data integration; data blending; data harmonization; data virtualization; extract, transform and load (ETL) or extract, load and transform (ELT); data modeling; data curation; data discovery; or even just plain old data analysis, every single vendor is in some way, shape, or form focused on unifying and melding data into a unified whole.

Surely, analysis and insights, technology aside, are about putting pieces of the puzzle together, from different parts of the business and from different activities in which the business is engaged. Each part of the business and each activity involves its own data and often its own software and database. Viewed this way, we can start to see that removing silos shouldn’t be viewed as an inconvenience; rather, it’s an activity inextricable from the very process of understanding the data.

Once this concept is acknowledged, understood, accepted, and embraced, it can bring about insights of its own. How a product brings disparate data together tells us a lot about the product’s philosophy, approach and value. And, again, it can also tell us a lot about how the product aligns with an enterprise’s requirements, as types of silo and degrees of data segregation will be different for different users and organizations.

In this report, we will look at an array of analytics product categories from this point of view. We will name and describe each product category, identify several companies within it, then explore the way in which they approach the union of data and the elimination of the silos within it.

The product categories we analyze in this report are:

  1. Data Connectors
  2. Virtualized Data Layers
  3. Data Integration
  4. In-Memory Database/Grid Platforms
  5. Data Warehouse Platforms
  6. Business Intelligence
  7. Business Intelligence on Big Data/Data Lakes
  8. Big Data/Data Lakes Platforms
  9. Data Management and Governance

For some product categories, that characterization will be obvious, or at least logical. For others, the silo removal analysis may be subtle or seem a bit of a stretch. By the end of the report, though, we hope to convince the reader that each category has a legitimate silo elimination mission, and that viewing the market in this way will provide enterprises with an intuitive framework for understanding the myriad products in that market.

Contrast this to the approach of viewing so many products in a one-at-a-time, brute-force manner, a rote method that is doomed to failure. Understanding a market and a group of technologies requires a schema, much like data itself. And with a set of organizing principles in place, that schema becomes strong and intuitive. In fact, such a taxonomy itself eliminates silos and contributes to comprehensive understanding, by connecting the categories across a continuum, instead of leaving them as mutually-exclusive islands of products.

With all that said, let’s proceed with our first product category and how products within it address and eliminate data silos.

Perhaps the best way to progress through the analytics product categories is to go bottom-up, starting with products that mostly deal in the nuts and bolts of data analysis, and then work our way up to broader platforms that provide higher layers of abstraction.


Assess Your Analytics UX in 3 Questions

An application can live or die by its embedded analytics. It doesn’t matter if the rest of your product is perfectly designed. If your dashboards and reports have a disappointing user experience (UX), user adoption and customer satisfaction can plummet.

“User experience matters,” writes Gartner in their recent report, 5 Best Practices for Choosing an Embedded Analytics Platform Provider. “Embedded analytics [should] not only support embedding of charts and visualizations, but also go deeper and integrate the data and analytics into the fabric of the application. This ‘seamless’ approach means that users don’t even know they are using a multiproduct application.”

How solid is your analytics UX? Ask yourself these three questions to gauge where and how you can improve your analytics experience:

#1. Do you have a deep understanding of your users?

A lack of understanding about what users need from their dashboards and reports is a challenge that plagues product teams. Many companies fill their analytics with data they think their users want, and never do the due diligence to find out what users actually need.

Don’t assume what your business intelligence users want. Take the time to research how users will interact with your application, so you can build it with them in mind. It’s a seemingly obvious but often-missed point: Different end users want to use your application’s embedded analytics in different ways.

#2. Does your embedding stop at the visualizations?

Embedded analytics involves more than white-labeling some charts and graphs. Application teams need to look at the complete experience, not just the visuals, to ensure end users can’t tell where your application ends and the embedded analytics begin.

A truly seamless experience “allows users to take immediate action from within the application, without shifting context,” notes Gartner in their report. Ideally, you want to integrate the analytics into your users’ workflows by letting them take action from the analytics, write back to the database, and share insights in context.

#3. Do the visualizations match your data?

Another common problem is choosing the wrong data visualizations to illustrate your datasets. Most visualizations are good for some types of data, but not every type. For example, a scatter chart works well to display two variables from a dataset, but it’s only useful when there is a number value on each axis; without that, it will appear to be a line chart without the line. Or consider the common pie chart, which is great for four or five values—but completely breaks down when sliced into dozens of sections. These are just two examples of how poor UI/UX can make information difficult for a user to understand.

If your answers point to gaps in any of these areas, it’s time to update your analytics before your customers start abandoning your product for the competition. Learn how to take the next steps in our Blueprint to Modern Analytics guide.

Originally Posted at: Assess Your Analytics UX in 3 Questions by analyticsweek

Visualization’s Twisted Path

Visualization is not a straight path from vision to reality. It is full of twists and turns, rabbit trails and road blocks, foul-ups and failures. Initial hypotheses are often wrong, and promising paths are frequently dead ends. Iteration is essential. And sometimes you need to change your goals in order to reach them.

We are as skilled at pursuing the wrong hypotheses as anyone. Let us show you.

We had seen the Hierarchical Edge Bundling implemented by Mike Bostock in D3. It really clarified patterns that were almost completely obfuscated when straight lines were used. 

Edge Bundling

We were curious if it might do the same thing with geographic patterns. Turns out Danny Holten, creator of the algorithm, had already done something similar. But we needed to see it with our own data.

We grabbed some state-to-state migration data from the US Census Bureau, then found Corneliu Sugar’s code for doing force directed edge bundling and got to work.

To start, we simply put a single year’s (2014) migration data on the map. Our first impression: sorrow, dejection and misery. It looked better than a mess of straight lines, but not much better. Chin up, though. This didn’t yet account for how many people were flowing between each of the connections — only whether there was a connection or not. 

Unweighted edge bundled migration

With edge bundling, each path between two points can be thought to have some gravity pulling other paths toward it while itself being pulled by those other paths. In the first iteration, every part of a path has the same gravity. By changing the code to weight the bundling, we add extra gravity to the paths more people move along.

Weighted edge bundled migration
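To make the "extra gravity" idea concrete, here is a minimal sketch of a weighted pull on one sample point of a path. It is an illustration only, not the code we ran: the function name, the (position, weight) input format, and the choice of a pull that falls off as 1/distance are all assumptions, with the weight standing in for the number of people moving along the neighbouring path.

```python
import math


def weighted_attraction(point, neighbours, eps=1e-9):
    """Net pull on `point` from sample points on other paths.

    `neighbours` is a list of ((x, y), weight) pairs. The weight is a
    hypothetical stand-in for the number of migrants on that path.
    Each neighbour pulls with strength weight / distance (an assumption).
    """
    px, py = point
    fx = fy = 0.0
    for (qx, qy), weight in neighbours:
        dx, dy = qx - px, qy - py
        dist = math.hypot(dx, dy)
        if dist < eps:
            continue  # coincident points exert no pull
        # Unit direction (dx/dist, dy/dist) times magnitude weight/dist.
        fx += weight * dx / (dist * dist)
        fy += weight * dy / (dist * dist)
    return fx, fy


# Two neighbours at equal distance on opposite sides: the heavier one wins.
print(weighted_attraction((0, 0), [((1, 0), 3.0), ((-1, 0), 1.0)]))
```

With every weight set to 1 this reduces to the unweighted case, where all paths pull equally.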

Alas, things didn’t change much. And processing was taking a long time with all those flows. When the going gets tough, simplify. We cut the data into two halves, comparing westward flows to eastward flows.

East to west migration

West to east migration

Less data meant cleaner maps. We assumed there would be some obvious difference between these two, but these maps could be twins. We actually had to flip back and forth between them to see that there was indeed a difference.

So our dreams of mind-blowing insight on a migration data set using edge bundling were a bust. But seeing one visualization regularly leads to ideas about another. We wondered what would happen if we animated the lines from source to destination. For simplicity, we started with just eastward migration.



Cool, it’s like laser light leisurely streaming through invisible fibre optic cables. But there’s a problem. Longer flows appear to indicate higher volume (which is misleading as their length is not actually encoding volume, just distance). So we tried using differential line lengths to represent the number of people, sticking with just eastward flows. 

Star Wars blasters

Here we get a better sense of the bigger sources, especially at the beginning of the animation. However, for some paths, like California to Nevada, we end up with a solid line for most of the loop. The short geographic distance obscures the large migration of people. We wondered if using dashed lines would fix this, particularly in links like California to Nevada.

Machine gun bursts

This gives us a machine gun burst at the beginning with everything draining into 50 little holes at the end. We get that sense of motion for geographically close states, but the visual doesn’t match our mental model of migration. Migrants don’t line up in a queue at the beginning of the year, leaving and arriving at the same time. Their migration is spread over the year.

What if we instead turn the migration numbers into a rate of flow? We can move dots along our edge-bundled paths, have each dot represent 1,000 people, and watch as they migrate. The density of the dots along a path will represent the volume. This also has the convenience of being much simpler to explain.

Radar signals

We still have a burst of activity (like radar signals) at the beginning of the loop, so we’ll stagger the start times to remove this pulsing effect.

Staggered starts
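As a rough sketch of the dots-and-stagger idea, the snippet below turns one path's migrant count into a set of dot start times. The names are hypothetical, and drawing start times uniformly at random over the loop is an assumption made for illustration; it is one simple way to stagger, not necessarily the one used in the animation.

```python
import random

PEOPLE_PER_DOT = 1000   # each dot stands for 1,000 people
LOOP_SECONDS = 10.0     # one year of migration plays in 10 seconds


def dot_start_times(migrants, rng, stagger=True):
    """Return the start time (in seconds) of every dot on one path."""
    n_dots = round(migrants / PEOPLE_PER_DOT)
    if not stagger:
        # Everything leaves at once: this is the pulsing "radar" effect.
        return [0.0] * n_dots
    # Assumption: spread departures uniformly at random over the loop.
    return sorted(rng.uniform(0.0, LOOP_SECONDS) for _ in range(n_dots))


rng = random.Random(0)  # seeded so the example is repeatable
print(dot_start_times(5200, rng, stagger=False))
print(dot_start_times(5200, rng))
```

Because the number of dots on a path is proportional to its migrant count, busier paths end up visibly denser without any extra encoding.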

Voilà. This finally gives us a visual that matches our mental model: people moving over the period from one state to another. Let’s add back westward movement.



Very cool, but with so much movement it’s difficult to tell who’s coming and who’s going. We added a gradient to the paths to make dots appear blue as they leave a state and orange as they arrive.

Coloured ants

Let’s be honest, this looks like a moderately organized swarm of ants. But it is a captivating swarm that people can identify with. Does it give us any insight? Well, not of the sort we were originally looking for. There is no simple way to compare years and no clear statement about the inflows and outflows. If we want to make sense of the data and draw specific conclusions… well, other tools might be more effective.

But it is an enchanting overview of migration. It shows the continuous and overwhelming amount of movement across the country and highlights some of the higher volume flows in either direction. It draws you in and provides you with a perspective not readily available in a set of bar charts. So we made an interactive with both.

Each dot represents 1,000 people and the year’s migration happens in 10 seconds. Or if you’d prefer, each dot can represent 1 person, and you can watch the year play out in just over 2 hours and 45 minutes. If you’re on a desktop you can interact with it to view a single state’s flow. And of course for mobile and social media, we made the obligatory animated gif.
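The two playback figures are consistent with each other if we assume the duration scales linearly with the number of dots, so that 1,000 times as many dots take 1,000 times as long. A quick check of that arithmetic (the function name is just for illustration):

```python
def playback_duration(base_seconds, scale):
    """Scale a playback time and split it into hours, minutes, seconds."""
    total = base_seconds * scale
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return hours, minutes, seconds


# 10 seconds at 1,000 people per dot, scaled up to 1 person per dot.
hours, minutes, seconds = playback_duration(10, 1000)
print(f"{hours} h {minutes} min {seconds} s")  # 2 h 46 min 40 s
```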

And just when we thought we’d finished, new data was released and we were obliged to update things for 2015.

Glowing ants

Building a visualization that is both clear and engaging is hard work. Indeed, sometimes it doesn’t work at all. In this post we’ve only highlighted a fraction of the steps we took.  We also fiddled with algorithm settings, color, transparency and interactivity.  We tested out versions with net migration. We tried overlaying choropleths and comparing the migration to other variables like unemployment and birth rate. None of these iterations even made the cut for this blog post.

An intuitive, engaging, and insightful visualization is rare precisely because of how much effort it takes. We continue to believe that the effort is worthwhile.

Originally Posted at: Visualization’s Twisted Path

Data Management Rules for Analytics

With analytics taking a central role in most companies’ daily operations, managing the massive data streams organizations create is more important than ever. Effective business intelligence is the product of data that is scrubbed, properly stored, and easy to find. When your organization uses raw data without proper management procedures, your results suffer.

The first step towards creating better data for analytics starts with managing data the right way. Establishing clear protocols and following them can help streamline the analytics process, offer better insights, and simplify the process of handling data. You can start by implementing these five rules to manage your data more efficiently.

1. Establish Clear Analytics Goals Before Getting Started

As the amount of data organizations produce each day grows exponentially, sorting through terabytes of information can become problematic and reduce the efficiency of analytics. Such large data sets take significantly longer to scrub and organize properly. For companies that deal with multiple high-volume streams, a clear line of sight to business and analytics goals can help reduce inflows and prioritize relevant data.

It’s important to establish clear objectives for data and create parameters that filter out data points that are irrelevant or unclear. This facilitates pre-screening datasets and makes scrubbing and sorting easier by reducing noise. Additionally, you can focus on measuring specific KPIs to further isolate the right data from the stream.

6 crucial steps of preparing data for analysis

2. Simplify and Centralize Your Data Streams

Another problem analytics suites face is reconciling disparate data from multiple streams. Organizations have internal, third-party, customer, and other data that must be considered as part of a larger whole instead of viewed in isolation. Leaving data as-is can be damaging to insights, as different sources may use unique formats or different styles.

Before allowing multiple streams to connect to your data analytics software, your first step should be establishing a process to collect data centrally and unify it. This centralization not only makes it easier to feed data seamlessly into analytics tools, but also makes it simpler for users to find and manipulate data. Consider how best to set up your data streams to reduce the number of sources and eventually produce more unified sets.

3. Scrub Your Data Before Warehousing

The endless stream of data raises questions about quality and quantity. While having more information is preferable, data loses its usefulness when it’s surrounded by noise and irrelevant points. Unscrubbed data sets make it harder to uncover insights, properly manage databases, and access information later.

Before worrying about data warehousing and access, consider the processes in place to scrub data to produce clean sets. Create phases that ensure data relevance is considered while effectively filtering out data that is not pertinent. Additionally, make sure the process is as automated as possible to reduce wasted resources. Implementing functions such as data classification and pre-sorting can help expedite the cleaning process.

4. Establish Clear Data Governance Protocols

One of the biggest emerging issues facing data management is data governance. Because of the sensitive nature of many sources—consumer information, sensitive financial details, and so on—concerns about who has access to information are becoming a central topic in data management. Moreover, allowing free access to datasets and storage can lead to manipulation, mistakes, and deletions that could prove damaging.

It’s vital to establish clear and explicit rules about who can access data, when, and how. Creating tiered permission systems (read, read/write, admin) can help limit the exposure to mistakes and danger. Additionally, sorting data in ways that facilitate access to different groups can help manage data access better without the need to give free rein to all team members.
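A tiered permission scheme like the one described above can be sketched in a few lines. This is an illustration under stated assumptions, not a production access-control system: the action names and the mapping of each action to a minimum tier are hypothetical.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Permission tiers, ordered from least to most privileged."""
    READ = 1
    READ_WRITE = 2
    ADMIN = 3


# Hypothetical mapping: the minimum tier each action requires.
REQUIRED_TIER = {
    "read": Tier.READ,
    "write": Tier.READ_WRITE,
    "delete": Tier.ADMIN,
    "grant_access": Tier.ADMIN,
}


def is_allowed(user_tier, action):
    """Return True if `user_tier` is high enough to perform `action`."""
    required = REQUIRED_TIER.get(action)
    if required is None:
        return False  # deny anything that is not explicitly listed
    return user_tier >= required


print(is_allowed(Tier.READ, "write"))        # False
print(is_allowed(Tier.READ_WRITE, "write"))  # True
print(is_allowed(Tier.ADMIN, "delete"))      # True
```

Denying unknown actions by default keeps a typo or a newly added operation from silently granting access.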

5. Create Dynamic Data Structures

Many times, storing data is reduced to a single database that limits how you can manipulate it. Static data structures are effective for holding data, but they are restrictive when it comes to analyzing and processing it. Instead, data managers should place greater emphasis on creating structures that encourage deeper analysis.

Dynamic data structures present a way to store real-time data that allows users to connect points better. Using three-dimensional databases, finding methods to reshape data rapidly, and creating more inter-connected data silos can help contribute to more agile business intelligence. Generate databases and structures that simplify accessing and interacting with data rather than isolating it.

The fields of data management and analytics are constantly evolving. For analytics teams, it’s vital to create infrastructures that are future-proofed and offer the best possible insights for users. By establishing best practices and following them as closely as possible, organizations can significantly enhance the quality of the insights their data produces.



Office Depot Stitches Together the Customer Journey Across Multiple Touchpoints

In January 2017, the AURELIUS Group (Germany) acquired the European operations of Office Depot, creating Office Depot Europe. Today, Office Depot Europe is the leading reseller of workplace products and services, with customers in 14 countries throughout Europe, selling everything from paper, pens, and flip charts to office furniture and computers.

Centralizing Data to Respond to Retail Challenges

Traditionally, Office Depot’s European sales were primarily sourced through an offline, mail-order catalog model driven by telemarketing activities. The company has since moved to a hybrid retail model, combining offline and online shopping, which required a data consolidation strategy that optimized the different channels. Additionally, the company’s myriad backend systems and disparate supply chain data collected from across Europe had become difficult to analyze.

Using Talend, Office Depot can now ingest data from its vast collection of operational systems. The architecture includes an on-premise Hadoop cluster using Hortonworks, Talend Data Integration, and Data Quality to perform checks and quality control on data before ingesting it into the Hub’s data lake.

Powering Use Cases from Supply Chain to Finance

Integrating online and offline data results in a unified, 360-degree view of the customer and a clear picture of the customer journey. Office Depot can now create more specific audience segments based on how customers prefer to buy, and tailor strategies to reach the most valuable consumers whether they buy online or in-store. They can compare different offline customer experiences to see how they are influenced by digital ads. Customer service operators have complete information on each customer, so they can speak with them knowing their details.

Office Depot’s data hub approach also provides high-quality data to all back-office functions throughout the organization, including supply chain and finance. Office Depot can now integrate data from the range of supply chain back-end systems in use in various countries, and answer questions such as which distribution center has the most efficient pick-line and why; or which center is in the risky position of having the least amount of stock for the best-selling products.

The post Office Depot Stitches Together the Customer Journey Across Multiple Touchpoints appeared first on Talend Real-Time Open Source Data Integration Software.

Originally Posted at: Office Depot Stitches Together the Customer Journey Across Multiple Touchpoints