From Data Protection to Data Management and Beyond

Just three weeks into 2019, Veeam announced a $500M funding round. The company is privately held, profitable, and has a solid revenue stream coming from hundreds of thousands of happy customers. But still, they raised $500M!

I didn’t see it coming, but if you look at what is happening in the market, it’s not a surprising move. The market valuations of companies like Rubrik and Cohesity are off the charts, and it is pretty clear that while they are spending boatloads of money to fuel their growth, they are also developing platforms that go well beyond traditional data protection.

Backup Is Boring

Backup is one of the most tedious, yet critical, tasks performed in IT. You need to protect your data and save a copy of it in a secure place in case of a system failure, human error or worse, as in the case of natural disasters and cyberattacks. But as critical as it is, the differentiation between backup solutions is getting thinner and thinner.

Vendors like Cohesity got it right from the very beginning of their existence. It is quite difficult, if not impossible, to consolidate all your primary storage systems in a single large repository, but if you concentrate backups on a single platform then you have all of your data in a single logical place.

In the past, backup was all about throughput and capacity with very little CPU, and media devices were designed for a few sequential data streams (tapes and deduplication appliances are perfect examples). Why are companies like Rubrik and Cohesity so different, then? Well, from my point of view, they designed an architecture that enables you to do much more with backups than was possible in the past.

Next-gen Backup Architectures

Adding a scale-out file system to this picture was the real game changer. Every time you expand the backup infrastructure to store more data, the new nodes also contribute additional CPU power and memory capacity. With all these resources at your disposal, and the data that can be collected through backups and other means, you’ve just built a big data lake … and with all that CPU power available, you are just one step away from transforming it into a very effective big data analytics cluster!

From Data Protection to Analytics and Management

Starting from this background it isn’t difficult to explain the shift that is happening in the market and why everybody is talking more about the broader concept of data management rather than data protection.

Some may argue that it’s wrong to associate data protection with data management and in this particular case the term data management is misleading and inappropriately applied. But, there is much to be said about it and it could very well become the topic for another post. Also, I suggest you take a look at the report I recently wrote about unstructured data management to get a better understanding of my point of view.

Data Management for Everybody

Now that we have the tool (a big data platform), the next step is to build something useful on top of it, and this is the area where everybody is investing heavily. Even though Cohesity is leading the pack and started showing the potential of this type of architecture years ago with its analytics workbench, the race is open and everybody is working on out-of-the-box solutions.

In my opinion these out-of-the-box solutions, which will be nothing more than customizable big data jobs with a nice, easy-to-use UI on top, will put data management within reach of everyone in your organization. This means that data governance, security and many business roles will benefit from it.

A Quick Solution Roundup

As mentioned earlier, Cohesity is in a leading position at the moment and they have all the features needed to realize this kind of vision, but we are just at the beginning and other vendors are working hard on similar solutions.

Rubrik, which has a similar architecture, has chosen a different path. They’ve recently acquired Datos IO and started offering NoSQL DB data management. Even though NoSQL is growing steadily in enterprises, this is a niche use case at the moment and I expect that sooner or later Rubrik will add features to manage data they collect from other sources.

Not long ago I spoke highly of Commvault, and Activate is another great example of their change in strategy. This is a tool that can be a great companion to their backup solution, but it can also stand alone, enabling end users to analyze, get insights and take action on data. They’ve already demonstrated several use cases in fields like compliance, security, e-discovery and so on.

Getting back to Veeam … I really loved their DataLabs and what it can theoretically do for data management. Still not at its full potential, this is an orchestration tool that allows you to take backups, create a temporary sandbox, and run applications against them. It is not fully automated yet, and you have to bring your own application. If Veeam can make DataLabs ready to use with out-of-the-box applications, it will become a very powerful tool for a broad range of use cases, including e-discovery, ransomware protection, index & search and so on.

These are only a few examples of course, and the list is getting longer by the day.

Closing the Circle

Data management is now key in several areas. We’ve already lost the battle against data growth and consolidation, and at this point finding a way to manage data properly is the only way to go.

With ever larger storage infrastructures under management, and sysadmins who now have to manage petabytes instead of hundreds of terabytes, there is a natural shift towards automating basic operations, and the focus is moving to what is actually stored in the systems.

Furthermore, with the increasing amount of data, expanding multi-cloud infrastructures, new demanding regulations like GDPR, and ever evolving business needs, the goal is to maintain control over data no matter where it is stored. And this is why data management is at the center of every discussion now.

Originally posted on Juku.it


Review of Autoencoders (Deep Learning)

An auto-encoder, autoassociator or Diabolo network is an artificial neural network used for learning efficient codings. The aim of an auto-encoder is to learn a compressed, distributed representation (encoding) for a set of data, typically for the purpose of dimensionality reduction. Architecturally, the simplest form of the auto-encoder is a feedforward, non-recurrent neural net that is very similar to the multilayer perceptron (MLP), with an input layer, an output layer and one or more hidden layers connecting them.

An auto-encoder is often trained using one of the many backpropagation variants (conjugate gradient method, steepest descent, etc.). Though often reasonably effective, there are fundamental problems with using backpropagation to train networks with many hidden layers: by the time the errors are propagated back to the first few layers, they are minuscule and quite ineffectual (the vanishing gradient problem). This causes the network to almost always learn to reconstruct little more than the average of all the training data.
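
To make the idea concrete, here is a minimal sketch of a fully connected auto-encoder trained with plain backpropagation. PyTorch is assumed, and the 784-dimensional input, layer sizes and dummy data are arbitrary choices for illustration only.

```python
# Minimal fully connected auto-encoder sketch (assumes PyTorch is installed).
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, input_dim=784, code_dim=32):
        super().__init__()
        # Encoder compresses the input to a small code; decoder reconstructs it.
        self.encoder = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU(),
                                     nn.Linear(128, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 128), nn.ReLU(),
                                     nn.Linear(128, input_dim), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # plain gradient descent
loss_fn = nn.MSELoss()                                   # reconstruction error

x = torch.rand(64, 784)  # dummy batch standing in for real, flattened data
for epoch in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(x), x)  # the target is the input itself
    loss.backward()              # backpropagation
    optimizer.step()
```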

Piotr Mirowski (of Microsoft Bing London) presented a Review of Auto-Encoders at the Computational Intelligence Unconference 2014, in our Deep Learning stream. These are his slides. The slides are a bit old, but they do a great job of comparing some of the models relevant today.

Original link here: https://piotrmirowski.files.wordpress.com/2014/08/piotrmirowski_ciunconf_2014_reviewautoencoders.pptx

He also has a Matlab-based tutorial on auto-encoders available here:
https://github.com/piotrmirowski/Tutorial_AutoEncoders/

Originally Posted at: Review of Autoencoders (Deep Learning)

8 Best Practices to Maximize ROI from Predictive Analytics

Back in 2010, Forbes.com forecasted that something new and interesting called predictive analytics was emerging as a “game changer.” Well, fast forward a handful of years, and we can easily see that the prediction was an understatement – because predictive analytics hasn’t just changed the game for marketing professionals: it has fundamentally reinvented it.

That’s because predictive analytics isn’t just a method of leveraging customer, prospect and other meaningful data to launch timely, micro-targeted communications and campaigns; on an even deeper level, it’s an engine that transcends marketing and is driving overall business strategy and vision. As Eric Siegel, Ph.D., chairman of Predictive Analytics World, the leading cross-vendor event for predictive analytics professionals, notes: “business is becoming a numbers game, and predictive analytics is the way to play it.”

However, while many organizations are indeed playing this game quite well — such as Macy’s, Walmart, Netflix and eBay — there are many others that aren’t as pleased with their results. This isn’t to say that they aren’t seeing gains in some areas (e.g. an uptick in customer retention rates, improved engagement scores, better sales campaign numbers, etc.). Rather, it means they aren’t reaping the full revenue and profit potential because they aren’t maximizing their ROI from predictive analytics.

What’s behind the underperformance? Typically, it’s that organizations of all sizes — from start-ups to enterprises — aren’t applying some, or sometimes even all, of these eight best practices:

1. Define A Clear Objective

Organizations need to proactively define their objective when implementing a predictive analytics platform. Are you looking to activate prospects, reactivate lapsed customers, or increase customer lifetime value (or donor value for non-profits)? Having a clear objective will help marketing departments craft a more concise strategy and tactical plan going forward.

2. Validate Existing Data Sets

Aside from capturing more data on customers, prospects and donors, organizations need to validate that their data is accurate and reliable. That means making sure their database is full of unique contacts and enriched with meaningful demographic, transactional, product/service and email marketing data on each contact. Click here for more information on ways marketers can capture and gather more data on their contacts.

3. Get Training and Knowledge on Analytics

While predictive analytics is not difficult to grasp — which is part of its value and popularity — it is nevertheless a distinct skillset. As such, organizations need to provide their marketing professionals and other relevant staff with training and knowledge on how to make predictive analytics work in their specific environment and marketplace, as well as on fundamental concepts such as A/B testing, data hygiene methodologies, and so on.

4. Add Necessary Staff & Resources

It is essential to have a marketing team that is primed and ready to leverage a predictive analytics platform, which includes key tasks like collecting and governing data, creating content in-house based on predictive analytics insights, and so on.

5. Add Necessary Tools and IT infrastructure

Organizations need to ensure that their CRM, email marketing and/or marketing automation systems are reliable, optimized and integrated with their predictive analytics platform to ensure that all relevant data is captured and uploaded into the platform.

6. Get Senior Management Buy-In & Commitment

While predictive analytics can lead to some significant wins in the short-term, it is essentially a long-term commitment that will need ongoing testing, adjusting and refining. As such, it is important that senior management buys into this data-driven approach to marketing and allocates an appropriate budget for the long-term.

7. Develop Marketing Plans, Processes & Tools

If predictive analytics is the engine that drives prospect and customer engagement, then content is the fuel. As such, organizations need to develop marketing plans, processes and tools – such as editorial calendars, buyer personas, etc. – that will enable them to create exceptional, relevant content and launch it to the right people, at the right time and through the most effective channel.

8. Benchmark & Measure

The ultimate value of predictive analytics is that it promises to take the guesswork out of marketing, and replace it with measurable, actionable data. However, this promise is only fulfilled when organizations effectively and regularly measure results using appropriate, pre-defined metrics or KPIs such as customer lifetime value, sales revenue or conversion rates.
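
As a toy illustration of such pre-defined metrics, the short Python snippet below computes a conversion rate and a very simple customer-lifetime-value estimate. The figures and the simplified formula are hypothetical, for illustration only, not benchmarks from the article.

```python
# Illustrative only: hypothetical campaign figures, not real benchmarks.
visitors = 12_500
conversions = 340
revenue = 51_000.0

conversion_rate = conversions / visitors   # share of visitors who converted
avg_order_value = revenue / conversions    # revenue per conversion

# A deliberately simple customer-lifetime-value estimate:
# average order value * purchases per year * expected years retained.
purchases_per_year = 2.5
years_retained = 3
clv = avg_order_value * purchases_per_year * years_retained

print(f"Conversion rate: {conversion_rate:.2%}")
print(f"Average order value: ${avg_order_value:,.2f}")
print(f"Estimated CLV: ${clv:,.2f}")
```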

The Bottom-Line

While capturing rich and relevant data is a big factor in maximizing ROI from predictive analytics, it is not the full story. It is also critical for organizations to apply all of the above-noted best practices.

Article originally appeared HERE

The Mainstream Adoption of Blockchain: Internal and External Enterprise Applications

The surging interest in blockchain initially pertained to its utility as the underlying architecture for the cryptocurrency phenomenon. Nonetheless, its core attributes (its distributed ledger system, immutability, and requisite permissions) are rapidly gaining credence in an assortment of verticals for numerous deployments.

Blockchain techniques are routinely used in several facets of supply chain management, insurance, and finance. In order to realize the widespread adoption rates many believe this technology is capable of, however, blockchain must enhance the very means of managing data-driven processes, similar to how applications of Artificial Intelligence are attempting to do so.

Today, there are myriad options for the enterprise to improve operations by embedding blockchain into fundamental aspects of data management. If properly architected, this technology can substantially impact facets of Master Data Management, data governance, and security. Additionally, it can provide these advantages not only between organizations, but also within them, operating as what Franz CEO Jans Aasman termed “a usability layer on top” of any number of IT systems.

Customer Domain Management
A particularly persuasive use case for the horizontal adoption of blockchain is deploying it to improve customer relations. Because blockchain essentially functions as a distributed database in which transactions between parties must be validated for approval (via a consensus approach bereft of centralized authority), it’s ideal for preserving the integrity of interactions between the enterprise and valued customers. In this respect it can “create trusted ledgers for customers that are completely invisible to the end user,” Aasman stated. An estimable example of this use case involves P2P networks, in which “people just use peer-to-peer databases that record transactions,” Aasman mentioned. “But these peer-to-peer transactions are checked by the blockchain to make sure people aren’t cheating.” Blockchain is used to manage transactions between parties in supply chains in much the same way. Blockchain aids organizations with this P2P customer use case because without it, “it’s very, very complicated for normal people to get it done,” Aasman said about traditional approaches to inter-organization ledger systems. With each party operating on a single blockchain, however, transactions become indisputable once they are sanctioned between the participants.
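
As a toy-level sketch of why a chained ledger makes tampering with recorded transactions detectable, the generic Python below builds a hash-chained list of transactions and verifies it. This is only an illustration of the principle, not any vendor's implementation or a real consensus protocol.

```python
# Toy hash-chained ledger: each block commits to the previous block's hash,
# so altering any earlier transaction invalidates every later block.
import hashlib
import json

def block_hash(block):
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

ledger = []

def append_transaction(tx):
    prev = ledger[-1]["hash"] if ledger else "0" * 64
    block = {"tx": tx, "prev_hash": prev}
    block["hash"] = block_hash({"tx": tx, "prev_hash": prev})
    ledger.append(block)

def verify(ledger):
    prev = "0" * 64
    for block in ledger:
        if block["prev_hash"] != prev:
            return False
        if block["hash"] != block_hash({"tx": block["tx"], "prev_hash": prev}):
            return False
        prev = block["hash"]
    return True

append_transaction({"from": "alice", "to": "bob", "amount": 10})
append_transaction({"from": "bob", "to": "carol", "amount": 4})
print(verify(ledger))           # True
ledger[0]["tx"]["amount"] = 99  # attempt to rewrite history
print(verify(ledger))           # False: tampering is detected
```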

Internal Governance and Security
Perhaps the most distinguishable feature of the foregoing use case is the fact that in most instances, end users won’t even know they’re working with blockchain. What Aasman called an “invisible” characteristic of the blockchain ledger system is ideal for internal use to monitor employees in accordance with data governance and security procedures. Although blockchain supports internal intelligence or compliance for security and governance purposes, it’s most applicable to external transactions between organizations. In finance—just like in supply chain or in certain insurance transactions—“you could have multiple institutions that do financial transactions between each other, and each of them will have a version of that database,” Aasman explained. Those using these databases won’t necessarily realize they’re fortified by blockchain, and will simply use them as they would any other transactional system. In this case, “an accountant, a bookkeeper or a person that pays the bills won’t even know there’s a blockchain,” commented Aasman. “He will just send money or receive money, but in the background there’s blockchain making sure that no one can fool with the transactions.”

Master Data Management
Despite the fact that individual end users may be ignorant of the deployment of blockchain in the above use cases, it’s necessary for underlying IT systems to be fully aware of which clusters are part of this ledger system. According to Aasman, users will remain unaware of blockchain’s involvement “unless, of course, someone was trying to steal money, or trying to delete intermediate transactions, or deny that he sent money, or sent the same money twice. Then the system will say hey, user X has engaged in a ‘confusing’ activity.” In doing so, the system will help preserve adherence to company policies related to security or data governance issues.

Since organizations will likely employ other IT systems without blockchain, Master Data Management hubs will be important for “deciding for which transactions this applies,” Aasman said. “It’s going to be a feature of MDM.” Mastering the data from blockchain transactions with centralized MDM approaches can help align this data with others vital to a particular business domain, such as customer interactions. Aasman revealed that “the people that make master data management have to specify for which table this actually is true. Not the end users: the architects, the database people, the DBAs.” Implementing the MDM schema for which to optimize such internal applications of blockchain alongside those for additional databases and sources can quickly become complex with traditional methods, and may be simplified via smart data approaches.

Overall Value
The rapidity of blockchain’s rise will ultimately be determined by the utility the enterprise can derive from its technologies, as opposed to simply limiting its value to financial services and cryptocurrency. There are just as many telling examples of applying blockchain’s immutability to various facets of government and healthcare, or leveraging smart contracts to simplify interactions between business parties. By using this technology to better customer relations, reinforce data governance and security, and assist specific domains of MDM, organizations get a plethora of benefits from incorporating blockchain into their daily operations. The business value reaped in each of these areas could contribute to the overall adoption of this technology in both professional and private spheres of life. Moreover, it could help normalize blockchain as a commonplace technology for the contemporary enterprise.

Originally Posted at: The Mainstream Adoption of Blockchain: Internal and External Enterprise Applications by jelaniharper

#BigData #BigOpportunity in Big #HR by @MarcRind #JobsOfFuture #Podcast

[youtube https://www.youtube.com/watch?v=3f9PiTfmxFw]

In this podcast, Marc Rind from ADP talked about big data in HR. He shared some of the best practices and opportunities that reside in HR data. Marc also shared some tactical steps to help build better data-driven teams that can execute data-driven strategies. This podcast is great for folks looking to explore the depth of HR data and the opportunities that reside in it.

Podcast Link:
iTunes: http://math.im/jofitunes
GooglePlay: http://math.im/jofgplay

Marc’s BIO:
Marc is responsible for leading the research and development of Automatic Data Processing’s (ADP’s) Analytics and Big Data initiative. In this capacity, Marc drives the innovation and thought leadership behind ADP’s Client Analytics platform. ADP Analytics provides clients not only the ability to read the pulse of their own human capital, but also information on how they stack up within their industry, along with the best courses of action to achieve their goals through quantifiable insights.

Marc was also an instrumental leader behind the small business market payroll platform, RUN Powered by ADP®. Marc led a number of the technology teams responsible for delivering this critically acclaimed product, focused on an innovative user experience for small business owners.

Prior to joining ADP, Marc’s innovative spirit and fascination with data were forged at Bolt Media, a dot-com start-up based in NY’s “Silicon Alley”. The company was an early predecessor to today’s social media outlets. As an early ‘data scientist’, Marc focused on the patterns and predictions of site usage by harnessing the data from its 10+ million user profiles.

About #Podcast:
#FutureOfData podcast is a conversation starter that brings leaders, influencers and leading practitioners on the show to discuss their journeys in creating the data-driven future.

Want to sponsor?
Email us @ info@analyticsweek.com

Keywords:
#FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

Originally Posted at: #BigData #BigOpportunity in Big #HR by @MarcRind #JobsOfFuture #Podcast by v1shal

6 things that you should know about VMware vSphere 6.5

vSphere 6.5 offers a resilient, highly available, on-demand infrastructure that is the perfect groundwork for any cloud environment. It provides innovation that will assist digital transformation for the business and make the job of the IT administrator simpler. This means that most of their time will be freed up so that they can focus on innovation instead of maintaining the status quo. Furthermore, vSphere is the foundation of VMware’s hybrid cloud strategy and is necessary for cross-cloud architectures. Here are essential features of the new and updated vSphere.

vCenter Server appliance

vCenter is an essential backend tool that controls VMware’s virtual infrastructure. vCenter 6.5 has lots of innovative upgraded features. It has a migration tool that aids in shifting from vSphere 5.5 or 6.0 to vSphere 6.5. The vCenter Server appliance also includes VMware Update Manager, which eliminates the need to run external VM tasks or use pesky plugins.

vSphere client

In the past, the front-end client used for accessing the vCenter Server was quite old-fashioned and clunky. The vSphere client has now undergone a much-needed HTML5 overhaul. Aside from the foreseeable performance upgrades, the change also makes this tool cross-browser compatible and more mobile-friendly. Plugins are no longer needed, and the UI has been switched to a more modern aesthetic based on the VMware Clarity UI.

Backup and restore

The backup and restore capability of vSphere 6.5 is an excellent feature that enables clients to back up the vCenter Server or any Platform Services Controller appliance directly from the Virtual Appliance Management Interface (VAMI) or the Application Programming Interface (API). In addition, it is able to back up both VUM and Auto Deploy embedded within the appliance. The backup mainly consists of files that are streamed to a storage device of your choice using the SCP, FTP(S), or HTTP(S) protocols.

Superior automation capabilities

When it comes to automation, VMware vSphere 6.5 works beautifully thanks to the new upgrades. The updated PowerCLI has been an excellent addition on the VMware side because it is completely module-based, and its APIs are currently in very high demand. This enables IT administrators to fully automate tasks down to the virtual machine level.
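
The article highlights PowerCLI, but the same API-driven style of automation can be sketched in Python with the open-source pyVmomi SDK. This is only one possible approach; the vCenter host and credentials below are placeholders, not real values.

```python
# Hedged sketch using the open-source pyVmomi SDK (pip install pyvmomi).
# Host and credentials are placeholders for illustration.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

context = ssl._create_unverified_context()  # lab use only; verify certs in production
si = SmartConnect(host="vcenter.example.local",
                  user="administrator@vsphere.local",
                  pwd="changeme", sslContext=context)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    for vm in view.view:
        # Report VMs that are powered off, a typical starting point for automation.
        if vm.runtime.powerState == vim.VirtualMachinePowerState.poweredOff:
            print(vm.name)
    view.DestroyView()
finally:
    Disconnect(si)
```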

Secure boot

The secure boot element of vSphere 6.5 covers secure boot-enabled virtual machines. This feature is available for both Linux and Windows VMs, and it allows secure boot to be enabled by ticking a simple checkbox in the VM properties. Once enabled, only properly signed VMs can boot in the virtual environment.

Improved auditing

vSphere 6.5 offers clients improved, audit-quality logging. This provides more forensic detail about user actions, making it easier to determine what was done, when, and by whom, and whether any investigation of anomalies or security threats is needed.

VMware’s vSphere grew out of the complexity and necessity of an expanding virtualization market. The earlier server products were not robust enough to deal with the increasing demands of IT departments. As businesses invested in virtualization, they had to consolidate and simplify their physical server farms into virtualized ones, and this triggered the need for virtual infrastructure. With these vSphere 6.5 features in mind, you can unleash its full potential. Make the switch today to the new and innovative VMware vSphere 6.5.

 

Originally Posted at: 6 things that you should know about vMwarevSphere 6.5 by thomassujain

How the NFL is Using Big Data

Your fantasy football team just went high tech.

Like many businesses, the National Football League is experimenting with big data to help players, fans, and teams alike.

The NFL recently announced a deal with tech firm Zebra to install RFID data sensors in players’ shoulder pads and in all of the NFL’s arenas. The chips collect detailed location data on each player, and from that data, things like player acceleration and speed can be analyzed.
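
As a rough illustration of what can be derived from that location data, the snippet below computes speed and acceleration from timestamped (x, y) samples. The 10 Hz sampling rate, yard units and position values are invented for the example, not Zebra's actual feed.

```python
# Sketch: deriving speed and acceleration from timestamped (x, y) samples.
import numpy as np

dt = 0.1  # seconds between position samples (assumed 10 Hz)
# Hypothetical positions in yards for one player over one second of play.
positions = np.array([[0.0, 0.0], [0.3, 0.1], [0.7, 0.2], [1.2, 0.4], [1.8, 0.7],
                      [2.5, 1.0], [3.3, 1.4], [4.2, 1.9], [5.2, 2.5], [6.3, 3.2]])

velocity = np.diff(positions, axis=0) / dt   # yards per second, per axis
speed = np.linalg.norm(velocity, axis=1)     # scalar speed per interval
acceleration = np.diff(speed) / dt           # change in speed per second

print(f"peak speed: {speed.max():.1f} yd/s")
print(f"peak acceleration: {acceleration.max():.1f} yd/s^2")
```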

The NFL plans to make the data available to fans and teams, though not during game play. The thought is that statistics-mad fans will jump at the chance to consume more data about their favorite players and teams.

In the future, the data collection might be expanded. In last year’s Pro Bowl, sensors were installed in the footballs to show exactly how far they were thrown.

Big data on the gridiron
Of course, this isn’t the NFL’s first foray into big data. In fact, like other statistics-dependent sports leagues, the NFL was crunching big data before the term even existed.

However, in the last few years, the business has embraced the technology side, hiring its first chief information officer, and developing its own platform available to all 32 teams. Individual teams can create their own applications to mine the data to improve scouting, education, and preparation for meeting an opposing team.

It’s also hoped that the data will help coaches make better decisions. They can review real statistics about an opposing team’s plays or how often one of their own plays worked rather than relying solely on instinct. They will also, in the future, be able to use the data on an individual player to determine if he is improving.

Diehard fans can, for a fee, access this same database to build their perfect fantasy football team. Because, at heart, the NFL believes that the best fans are engaged fans. They want to encourage the kind of obsessive statistics-keeping that many sport fans are known for.


Will big data change the game?

It’s hard to predict how this flood of new data will impact the game. Last year, only 14 stadiums and a few teams were outfitted with the sensors. And this year, the NFL decided against installing sensors in all footballs after the politics of last year’s “Deflategate,” when the Patriots were accused of under-inflating footballs for an advantage.

Still, it seems fairly easy to predict that the new data will quickly make its way into TV broadcast booths and instant replays. Broadcasters love to have additional data points to examine between plays and between games.

And armchair quarterbacks will now have yet another insight into the game, allowing them access (for a fee) to the same information the coaches have. Which will, of course, mean they can make better calls than the coaches. Right?

Bernard Marr is a best-selling author, keynote speaker and business consultant in big data, analytics and enterprise performance. His new books are ‘Big Data’ and ‘Key Business Analytics’.

Source by analyticsweekpick

Mastering Deep Learning with Self-Service Data Science for Business Users

The deployment of deep learning is frequently accompanied by a singular paradox which has traditionally proved difficult to redress. Its evolving algorithms are intelligent enough to solve business problems, but utilizing those algorithms is based on data science particularities business users don’t necessarily understand.

The paucity of data scientists exacerbates this situation, which traditionally results in one of two outcomes. Either deep learning is limited in the number of use cases for which it’s deployed throughout the enterprise, or the quality of its effectiveness is compromised. Both of these situations fail to actualize the full potential of deep learning or data science.

According to Mitesh Shah, MapR Senior Technologist, Industry Solutions: “The promise of AI is about injecting intelligence into operations so you are actively making customer engagement more intelligent.” Doing so productively implicitly necessitates business user involvement with these technologies.

In response to this realization, a number of different solutions have arisen to provide self-service data science, so that non-specialist business users understand how to create deep learning models, monitor and adjust them accordingly, and even explain their results while solving some of their more intractable domain problems.

Most convincingly, there are a plethora of use cases in which deep learning facilitates these boons for “folks who are not data scientists by education or training, but work with data throughout their day and want to extract more value from data,” noted indico CEO Tom Wilde.

Labeled Training Data
The training data required for building deep learning’s predictive models pose two major difficulties for data science. They require labeled output data and massive data quantities to suitably train models for useful levels of accuracy. Typically, the first of these issues was addressed when “the data scientists would say to the subject matter experts or the business line, give us example data labeled in a way you hope the outcome will be predicted,” Wilde maintained. “And the SME [would] say I don’t know what you mean; what are you even asking for?” Labeled output data is necessary for models to use as targets or goals for their predictions. Today, self-service platforms for AI make this data science requisite easy by enabling users to leverage intuitive means of labeling training data for this very purpose. With simple browser-based interfaces “you can use something you’re familiar with, like Microsoft Word or Google Docs,” Wilde said. “The training example pops up in your screen, you underline a few sentences, and you click on a tag that represents the classification you’re trying to do with that clause.”

For instance, when ensuring contracts are compliant with the General Data Protection Regulation, users can highlight clauses for personally identifiable data with examples that both adhere to, and fail to adhere to, this regulation. “You do about a few dozen of each of those, and once you’ve done it you’ve built your model,” Wilde mentioned. The efficiency of this process is indicative of the effect of directly involving business users with AI. According to Shah, such involvement makes “production more efficient to reduce costs. This requires not only AI but the surrounding data logistics and availability to enable this…in a time-frame that enables the business impact.”

Feature Engineering and Transfer Learning
In the foregoing GDPR example, users labeled output training data to build what Wilde referred to as a “customized model” for their particular use case. They are only able to do so this quickly, however, by leveraging a general model and the power of transfer learning to focus the former’s relevant attributes for the business user’s task—which ultimately affects the model’s feature detection and accuracy. As previously indicated, a common data science problem for advanced machine learning is the inordinate amounts of training data required. Wilde commented that a large part of this data is required for “featurization: that’s generally why with deep learning you need so much training data, because until you get to this critical mass of featurization, it doesn’t perform very robustly.” However, users can build accurate custom models with only negligible amounts of training data because of transfer learning. Certain solutions facilitate this process with “a massive generalized model with half a billion labeled records in it, which in turn created hundreds and hundreds of millions of features and vectors that basically creates a vectorization of language,” Wilde remarked. Even better, such generalized models are constructed “across hundreds of domains, hundreds of verticals, and hundreds of use cases” Wilde said, which is why they are readily applicable to the custom models of self-service business needs via transfer learning. This approach allows the business to quickly implement process automation for use cases with unstructured data such as reviewing contracts, dealing with customer support tickets, or evaluating resumes.
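
A minimal sketch of that idea follows, assuming a generic open-source stack (a pretrained sentence-transformers encoder standing in for the "general model", with a small scikit-learn classifier as the "custom model") rather than any particular vendor's platform. The clauses, labels and model name are illustrative only.

```python
# Transfer learning with a small labeled set: a general-purpose pretrained
# encoder provides features, and a light classifier is trained on a handful
# of labeled clauses. Library and model choices are one possible stack.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

# Hypothetical labeled clauses: 1 = contains personal data, 0 = does not.
clauses = [
    "The processor shall store customer names and email addresses.",
    "This agreement is governed by the laws of the State of New York.",
    "User location history may be shared with third-party advertisers.",
    "Either party may terminate this agreement with 30 days notice.",
]
labels = [1, 0, 1, 0]

# The pretrained encoder carries the "general model"; no further training needed.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
features = encoder.encode(clauses)

# The "custom model" is just a small classifier on top of those features.
clf = LogisticRegression(max_iter=1000).fit(features, labels)

new_clause = ["Support tickets include the requester's phone number."]
print(clf.predict(encoder.encode(new_clause)))  # e.g. [1]
```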

Explainability
Another common data science issue circumscribing deep learning deployments is the notion of explainability, which can even hinder the aforementioned process automation use cases. As Shah observed, “AI automates tasks that normally require human intelligence, but does not remove the need for humans entirely. Business users in particular are still an integral part of the AI revolution.” This statement applies to explainability in particular, since it’s critical for people to understand and explain the results of deep learning models in order to gauge their effectiveness. The concept of explainability alludes to the fact that most machine learning models simply generate a numerical output—usually a score—indicative of how likely specific input data will achieve the model’s desired output. With deep learning models in particular, those scores can be confounding because deep learning often does its own feature detection. Thus, it’s difficult for users to understand how models create their particular scores for specific data.

Self-service AI options, however, address this dilemma in two ways. Firstly, they incorporate interactive dashboards so users can monitor the performance of their models with numerical data. Additionally, by clicking on various metrics reflected on the dashboard “it opens up the examples used to make that prediction,” Wilde explained. “So, you actually can track back and see what precisely was used as the training data for that particular prediction. So now you’ve opened up the black box and get to see what’s inside the black box [and] what it’s relying on to make your prediction, not just the number.”
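
Outside of any particular vendor's dashboard, one common, generic way to peek inside the "black box" is to measure how much a model's validation score depends on each input feature. The sketch below uses scikit-learn's permutation importance on synthetic data; it illustrates the idea rather than the product described above.

```python
# Generic explainability illustration: permutation importance shows which
# input features a trained model's scores actually rely on. Data is synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier().fit(X_train, y_train)

# Shuffle each feature in turn and measure how much validation accuracy drops.
result = permutation_importance(model, X_val, y_val, n_repeats=10, random_state=0)
for i, importance in enumerate(result.importances_mean):
    print(f"feature {i}: importance {importance:.3f}")
```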

Business Accessible Data Science
Explainability, feature engineering, transfer learning, and labeled output data are crucial data science prerequisites for deploying deep learning. The fact that there are contemporary options for business users to facilitate all of these intricacies suggests how essential the acceptance, and possibly even mastery, of this technology is for the enterprise today. It’s no longer sufficient for a few scarce data scientists to leverage deep learning; its greater virtue is in its democratization for all users, both technical and business ones. This trend is reinforced by training designed to educate users—business and otherwise—about fundamental aspects of analytics. “The MapR Academy on-demand Essentials category offers use case-driven, short, non-lab courses that provide technical topic introductions as well as business context,” Shah added. “These courses are intended to provide insight for a wide variety of learners, and to function as stepping off points to further reading and exploration.”

Ideally, options for self-service data science targeting business users could actually bridge the divide between the technically proficient and those who are less so. “There are two types of people in the market right now,” Wilde said. “You have one persona that is very familiar with AI, deep learning and machine learning, and has a very technical understanding of how do we attack this problem. But then there’s another set of folks for whom their first thought is not how does AI work; their first thought is I have a business problem, how can I solve it?”

Increasingly, the answers to those inquiries will involve self-service data science.

Source by jelaniharper

How big data is driving smarter cyber security tools


As big data changes and develops, it’s being used to create better, smarter cyber security tools. There is real value to using the big data approach to cyber security – especially when it can be used to identify dangerous malware and more persistent threats to the IT security of big companies that handle a lot of data. The number of data breaches in the news seems to grow all the time, and big data may play a big role in preventing much of that.

Data Storage

One of the ways in which big data can help with cyber security is through the storage of data. Because so much data is collected and stored easily, analytic techniques can be used to find and destroy malware. Smaller segments of data can be analyzed, of course, and were analyzed before big data got started in the cyber security area, but the more data that can be looked at all together, the easier it is to ensure that appropriate steps are taken to neutralize any threats. More data gets screened, and it gets analyzed faster, making big data a surprisingly good choice in the cyber security arena.

Malware Behaviors

In the past, malware was usually identified by its signature. Now that big data is involved, that’s no longer realistic: signature-based identification doesn’t work at such a large scale, so new ways of handling cyber security were needed as soon as big data appeared on the scene. Instead of signatures, big data looks at behaviors. How malware or any other type of virus behaves is a very important consideration, and a key thing to focus on when it comes to keeping data safe.

When something is flagged as having a unique or different behavior, it’s possible to isolate the data exhibiting that behavior so it can be determined whether it is safe. Piggybacking malware onto seemingly innocuous programs and data is common, because it lets attackers slip malicious code through before the problem is realized. When behavior is properly tracked, though, the rate at which these threats get through is greatly reduced. There are no guarantees, because malware is always changing and new variants are being developed, but the protection offered by big data is significant.
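
A minimal sketch of behavior-based screening follows, assuming scikit-learn and invented per-process behavior features; it is illustrative of the approach, not a production detector.

```python
# Behavior-based screening sketch: flag processes whose runtime behavior looks
# unlike the rest. Feature names and values are hypothetical.
import numpy as np
from sklearn.ensemble import IsolationForest

# Each row: [files written per min, outbound connections, registry edits, CPU %]
normal_behaviour = np.random.default_rng(0).normal(
    loc=[5, 2, 1, 10], scale=[2, 1, 0.5, 5], size=(500, 4))

suspect = np.array([[250, 40, 30, 85]])  # e.g. mass file writes plus beaconing

detector = IsolationForest(contamination=0.01, random_state=0)
detector.fit(normal_behaviour)

print(detector.predict(suspect))  # -1 means "anomalous behavior"
```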

Computing Power

The computing power offered by big data platforms is possibly the most significant reason they are so valuable when it comes to detecting and stopping malware. Fast, powerful clusters can process data and information far faster than systems that cannot harness that level of power. Because of that, more sophisticated techniques for detecting malware become possible when big data is used. The models that can be built for identifying malware are significant, and big data is the place to build them.

With the power available, it is becoming easier than ever before to find problems before they get started, so malware can be stopped before it advances through a computer system or set of data. That protects the information contained there, and also the system itself from attack and infection. Those who produce malware continually try to change the game so they won’t be detected, but as computer power advances the chances of malware avoiding detection continue to shrink.

To read the original article on IT Learning Center, click here.

Source: How big data is driving smarter cyber security tools