Find out more about DataPA at datapa.com

Tuesday, October 3, 2017

The Democratization of Big Data and Artificial Intelligence

The market chatter about Big Data and AI is relentless. For Big Data, the statistics many of us in the tech industry see bandied about are certainly eye-catching: 2.7 zettabytes of data exist in the digital universe today, 571 new websites are created every minute of the day, and by 2020 business transactions on the internet will reach 450 billion per day. For AI, they are no less impressive: more than $300 million in venture capital was invested in AI startups in 2014, a 300% increase over the year before; by 2018, 75% of developer teams will include AI functionality in one or more applications or services; and by 2020, 30% of all companies will employ AI to augment at least one of their primary sales processes.

However, for many people not directly involved in the tech industry, or in the IT department of a huge multinational, it’s difficult to see how these grandiose claims have any relevance to their day-to-day tasks. The real issue is that, until recently, doing anything innovative with Big Data or AI required highly skilled data scientists versed in seemingly impenetrable technologies like NoSQL, R, MapReduce or Scala. Those specialists are hard to come by, expensive, and not getting cheaper: IBM predicts that demand for data professionals in the US alone will reach 2.7 million by 2020.

However, that’s not the complete picture. Computers entered the business world as the preserve of large corporations like J Lyons & Company and the U.S. Census Bureau; they were later used more widely as the companies that could afford the huge cost of buying them provided services to others; and finally the productization of computers by the likes of IBM allowed almost every organisation to buy their own. Big Data and AI are going through the same process of democratization.

The three major cloud providers, Microsoft, Google and Amazon, are amongst a host of vendors that now offer scalable, affordable Big Data platforms that can be spun up in seconds. In the last few years all three have also started offering API-driven AI services bound into their cloud platforms. More importantly, those Big Data platforms and AI APIs are now becoming easily accessible from more traditional development environments like .NET. This means that millions of traditional developers can now leverage Big Data and AI without leaving the comfort of their familiar development environment.
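To make that concrete, here is a minimal sketch of what calling a cloud-hosted AI service can look like from a general-purpose language. The endpoint URL, authentication header and response shape are hypothetical placeholders rather than any specific vendor’s API, and Python is used purely for brevity.

```python
# A minimal sketch of calling a cloud-hosted AI service over REST.
# The endpoint URL, API key and response fields below are hypothetical
# placeholders, not any specific vendor's API.
import requests

API_URL = "https://example-cloud.com/v1/vision/analyze"  # hypothetical endpoint
API_KEY = "your-api-key"                                  # issued by the cloud provider

def describe_image(image_url: str) -> dict:
    """Send an image URL to a hosted image-analysis service and return its JSON result."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"url": image_url},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    print(describe_image("https://example.com/storefront.jpg"))
```

The pattern is the point: a single authenticated HTTP call returning JSON, which is just as approachable from .NET, OpenEdge ABL or any other environment with an HTTP client.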

The natural consequence will be an explosion of products that put Big Data and AI technologies within reach of even the smallest organisations, allowing the huge opportunities to filter down to all. In fact, here at DataPA we have spent the last twelve months working hard on a new automated analytics product leveraging Big Data and AI techniques, which we are hugely excited about launching in the coming months. The world is on the cusp of change that will come to rival the industrial revolution, and we are excited about sharing that journey with all our customers and partners in the coming months and years.

Tuesday, July 4, 2017

Maps look great on a dashboard, but use them sparingly

As an analytics vendor, we’re always keen to respond to our customers’ requests. So recently we’ve been working hard on mapping functionality, which we will be releasing in the next few months. Before then, however, we thought it might be useful to look at the role of geographic mapping in dashboards and explore when and how to use maps.

Maps are visually engaging and more exciting than a chart, so it’s tempting to assume that if your data is grouped by some geographical measure, or includes location data, you should be plotting it on a map. However, this is far from always the case. Consider the two displays below, both showing UK regional sales data. We already know where each area is, so plotting the data on a map doesn’t add anything. What we’re most interested in is comparing the value sold in each area. Both visualizations display this, the map with color and the chart with the height of the bar, but it’s much easier to get an instantly clear comparison from the chart.
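As a rough sketch of the chart-based alternative described above, the snippet below plots regional sales as a sorted bar chart so the comparison can be read at a glance. The region names and figures are illustrative placeholders, not real sales data.

```python
# A minimal sketch: comparing UK regional sales with a bar chart rather than a map.
# The figures below are illustrative placeholders.
import matplotlib.pyplot as plt

regions = ["Scotland", "North East", "North West", "Midlands", "Wales", "South East", "South West"]
sales = [420, 310, 530, 610, 250, 880, 390]   # illustrative values, in £k

# Sort so the comparison reads top-to-bottom at a glance.
pairs = sorted(zip(regions, sales), key=lambda p: p[1], reverse=True)
labels, values = zip(*pairs)

plt.barh(labels, values, color="steelblue")
plt.gca().invert_yaxis()          # largest region at the top
plt.xlabel("Sales (£k)")
plt.title("UK sales by region")
plt.tight_layout()
plt.show()
```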


In contrast, take the example below, which maps average temperature by country. From a quick glance, we can see that the coldest regions are the large land masses to the north, Western Europe is milder than Eastern Europe, and Africa is the warmest continent overall. In the context of the data we’re looking at, this is all hugely useful information that would not be apparent at a glance with any other representation of the data. This is a clear example of when plotting the data on a map adds to our understanding.


For a more practical example, take the dashboard below. It’s designed to inform the decision as to where a retailer should open their next store. The measure shown on both the map and the chart is the population of each state divided by the number of stores, giving the population per store. Given just the chart, we might choose to open a new store in California. However, the map suggests a different decision. With a circle centered on each state, and the population per store represented by the diameter of the circle, the area surrounding New York contains a high number of overlapping circles. From the map, it’s clear that locating a new store here would potentially cater for more customers than one opened in California.
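For illustration, here is a minimal sketch of how the population-per-store measure might be computed and drawn as proportional circles. The populations, store counts and coordinates are placeholder values, and the scaling factor is arbitrary; this is not the dashboard shown above.

```python
# A minimal sketch of the population-per-store measure, drawn as circles whose
# diameter is proportional to the measure. All values are illustrative placeholders.
import matplotlib.pyplot as plt

states = {
    # name: (population_millions, stores, longitude, latitude)
    "California": (39.0, 120, -119.4, 36.8),
    "New York":   (19.5,  30,  -75.0, 43.0),
    "New Jersey": ( 8.9,  12,  -74.4, 40.1),
    "Texas":      (29.0,  90,  -99.0, 31.0),
}

for name, (pop_m, stores, lon, lat) in states.items():
    per_store = pop_m * 1_000_000 / stores        # population served per store
    diameter = per_store / 10_000                 # arbitrary display scale
    plt.scatter(lon, lat, s=diameter ** 2, alpha=0.4)   # marker area ~ diameter squared
    plt.annotate(f"{name}\n{per_store:,.0f} per store", (lon, lat), ha="center")

plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.title("Population per store (marker diameter proportional to the measure)")
plt.show()
```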


Indeed, the functionality of the map would also let us progressively zoom in and perhaps choose a specific location such as New Brunswick in New Jersey.


So, when deciding whether to use a map to represent your data, ask yourself the following question: does visually expressing the physical location of the data elements add to our understanding of the data? If not, some other object would likely be a better choice. This is in fact just a specific instance of a wider rule we should apply when designing any dashboard: if a visual element does not add to our understanding of the data, it is just clutter that makes the dashboard harder to understand, and as such should not be there.

If you’d like to find out more about DataPA OpenAnalytics, or our forthcoming support for mapping, please contact us.

Tuesday, March 7, 2017

Live analytics for streamed data with Kafka & DataPA OpenAnalytics

Apache Kafka™ is a massively scalable publish-and-subscribe messaging system. Originally developed at LinkedIn and open-sourced in 2011, it is horizontally scalable, fault tolerant and extremely fast. In the last few years its popularity has grown rapidly, and it now provides critical data pipelines for a huge array of companies, including internet giants such as LinkedIn, Twitter, Netflix, Spotify, Pinterest, Uber, PayPal and AirBnB.

So, when the innovative, award-winning ISP Exa Networks approached us to help deliver a live analytics solution that would consume, analyse and visualise over 100 million messages a day (up to 6,500 a second at peak times) from their Kafka™ feed, it was a challenge we couldn’t turn down.

The goal was to provide analytics for schools using Exa’s content filtering system, SurfProtect®. Information on every web request from every user in over 1,200 schools would be sent via Kafka™ to the analytics layer. The resulting dashboards would need to provide each school with a clear overview of the activity on their connection, allowing them to monitor usage and identify users based on rejected searches or requests.

The first task was to devise a way of consuming such a large stream of data efficiently. We realised some time ago that our customers would increasingly want to consume data from novel architectures and an ever-increasing variety of formats and structures. So we built the Open API query, which allows the rapid development and integration of bespoke data connectors. For Exa, we had a data connector built to consume the Kafka™ feed efficiently within a few days.
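To give a flavour of what consuming such a feed involves, here is a minimal sketch of a Kafka consumer using the open-source kafka-python client. The topic name, broker addresses and message layout are assumptions for illustration; the actual DataPA connector is not shown here.

```python
# A minimal sketch of a bespoke connector consuming a Kafka feed with kafka-python.
# Topic name, brokers and message fields are illustrative assumptions.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "surfprotect-requests",                       # hypothetical topic name
    bootstrap_servers=["broker1:9092", "broker2:9092"],
    group_id="analytics-connector",
    auto_offset_reset="latest",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for message in consumer:
    event = message.value                         # one web-request event per message
    school_id = event.get("school_id")
    rejected = event.get("rejected", False)
    # Hand the event to the analytics layer here for aggregation.
    print(school_id, rejected)
```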

The rest of the implementation was straightforward. DataPA OpenAnalytics allows the refresh and data preparation process for dashboards to be distributed across any network, reducing the load on the web server. In Exa’s case, a single web server and a single processing server were sufficient to allow the dashboards to be refreshed constantly, so data is never more than a few minutes old. To help balance the processing, the schools are distributed amongst 31 dashboards, and each dashboard is filtered to a single school when a user logs in.
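The post doesn’t describe exactly how schools are assigned to the 31 dashboards; one simple way to spread them evenly, sketched below purely as an illustration, is a stable hash of the school identifier.

```python
# A minimal sketch of spreading schools across a fixed number of dashboards.
# The assignment scheme is an assumption for illustration only.
import hashlib

NUM_DASHBOARDS = 31

def dashboard_for(school_id: str) -> int:
    """Map a school to one of the dashboards, consistently across restarts."""
    digest = hashlib.md5(school_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_DASHBOARDS

print(dashboard_for("school-0042"))   # always the same dashboard for this school
```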



The final solution gives each school a dashboard, with data never more than a few minutes old, showing figures that accumulate over the day. Each dashboard allows the school to monitor web traffic and any rejected requests or searches on their connection.

We're really excited about what we delivered for Exa Networks, and we think that, with the versatility and scalability the latest release of DataPA OpenAnalytics offers, we can achieve even more. If you have large amounts of data, whether from Kafka™ or any other data source, and would like to explore the possibility of adding live analytics, please get in touch; we'd love to show you what we can do.

Friday, February 24, 2017

A quick guide to building great dashboards

The guiding principle when designing any dashboard should be to ensure your users understand key information at a glance. Ideally, a user taking their first look at your dashboard should be able to understand the key information within a few seconds. Achieving this is not rocket science; applying a few simple principles to your decision making will transform your dashboards.

First off, make sure your dashboard is focused. If I’m looking at a dashboard that contains information for five different roles in the organisation, I need to filter out or navigate around views for all the others to get to the information that is relevant to me. That’s going to take more than a few seconds. Step one of building a dashboard should be to decide who it is for and to understand in detail what information is key to their performance. Limit your dashboard to just this information. Remember, five simple dashboards are always more useful than one complex one.

Next, always avoid clutter. The more complex the view you offer the user, the longer it will take them to glean the information they require. Carefully consider every object and ask yourself: “Do I really need this? Is the information this object represents unique and important?” If not, it’s just clutter; get rid of it.

A little more daunting at face value, but simple in practice, is the use of visual cues to help the user quickly recognise what they are looking at. Two principles of design are particularly useful for dashboards: similarity and proximity. Let’s take similarity first. Say I have a sales dashboard that shows total sales and profit in several different charts, say by sales rep, by region and by date. Design principles tell us that things that look similar are perceived to be more related than things that look dissimilar. So if I make sure total sales is always plotted in blue and profit in green, the user is likely to recognise these values across the different charts quickly, without having to read the legend. This principle applies to more than just colour; for instance, I may always plot sales over time as a line chart, sales by region as a bar chart and sales by person as a column chart. The second principle, proximity, tells us that things that are close to one another are perceived to be more related than things that are spaced farther apart. Implementing this is simple: place objects that are related close together, giving the user another visual cue as to their meaning.
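As a small sketch of the similarity principle, the snippet below fixes one colour per measure and reuses it across two different charts, so the same series is recognisable without reading a legend. The figures are illustrative placeholders.

```python
# A minimal sketch of the 'similarity' principle: one colour per measure,
# reused on every chart. All figures are illustrative placeholders.
import matplotlib.pyplot as plt

COLOURS = {"Total sales": "tab:blue", "Profit": "tab:green"}   # one colour per measure, everywhere

months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 135, 150, 160]
profit = [30, 32, 41, 45]

fig, (by_date, by_region) = plt.subplots(1, 2, figsize=(10, 4))

# Sales over time as a line chart...
by_date.plot(months, sales, color=COLOURS["Total sales"], label="Total sales")
by_date.plot(months, profit, color=COLOURS["Profit"], label="Profit")
by_date.set_title("Sales by date")
by_date.legend()

# ...and sales by region as a bar chart, using the same colour for the same measure.
regions = ["North", "South", "East", "West"]
by_region.bar(regions, [150, 120, 180, 115], color=COLOURS["Total sales"])
by_region.set_title("Sales by region")

plt.tight_layout()
plt.show()
```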

The final tip for creating great dashboards is to think about when the data needs to be refreshed, and let this inform the design of your dashboard. An operational dashboard is likely to require real-time data, so keep it simple enough to update quickly. A strategic dashboard is more likely to be updated periodically, so you can afford to (and often want to) add more detail.

There are obviously many more considerations when building dashboards, not least your choice of visual objects and when and where to prepare data. However, these are more particular decisions that deserve a blog post in their own right. My hope is that by following the simple design principles above, you’ll quickly be creating more effective dashboards.

Friday, November 27, 2015

Application vendors need to address Analytics

Speak to pretty much any application vendor who’s been around for some time and they’ll likely tell you their resources are focused heavily on modernization. That’s understandable, given how rapidly our industry has been changing over the last few years, and the constant barrage of social chatter around cloud, mobility and SaaS.

Yet look at any recent survey of CIO investment priorities and you’ll find modernization of enterprise applications near the bottom. We think this is because disruptive change within industry is not being driven by changes to core business applications (which are often very expensive and present huge risk), but by new technologies and services integrated with these applications and other data sources.

Analytics, on the other hand, is pretty much consistently the top priority, and for good reason. Innovations within business analytics, such as mobile, alerts and collaboration, together with the emergence of new technologies such as Hadoop, MapReduce and Spark that are driving down the cost of big data analytics, are opening up huge opportunities for disruptive change. As these technologies mature and are applied to more aspects of industry, the pace of this disruption is set to rise dramatically.

So we think that if you’re a successful application vendor focused purely on modernization, and you’re not already addressing analytics, you’re missing a huge opportunity to drive revenue from your software. Here at DataPA we’re dedicated to building partnerships with application vendors. We use our expertise in analytics to build industry-leading technology that can be integrated seamlessly with our partners’ applications. We’d love you to join us.

Monday, November 23, 2015

LexisNexis delivers powerful Analytics for Visualfiles

Workflow is a key component for any legal practice – increasing efficiencies, improving customer service and coping with evolving regulatory requirements – which is why legal case management has always been a key focus area for software developers.

The market leader in legal case management is Lexis® Visualfiles from LexisNexis. The Visualfiles “toolkit” allows organisations to expand the standard solution, adding their own entities and workflows to match any business requirement, automating even the most complex processes. Today, Visualfiles is the most widely used case and matter management system in the UK with more than 25,000 registered users in firms ranging from 5 to well over 1,000 employees.

This “ultimate flexibility”, however, posed a particular challenge when LexisNexis came to provide an embedded analytics solution to their customers. For most business applications, the process of transforming raw data in the database into meaningful information for an analytics solution is the same for every customer. Everyone uses the same system, so the transformation can be understood, designed and implemented once for all customers. With Lexis Visualfiles this is not the case. The unique power of Visualfiles allows each customer to evolve their system, and by definition the underlying data set, to match their specific business needs. Whilst this provides fantastic flexibility to ensure the system evolves as the business develops, it creates a huge challenge for analytics.

However, at DataPA we understand that application developers have already designed and implemented this transformation process; otherwise the business application would be of little use. We believe developers should be able to reuse their valuable business logic assets for analytics, not be forced to reengineer them for another platform. So with DataPA OpenAnalytics, the LexisNexis development engineers were able to reuse their existing OpenEdge ABL code to deliver beautiful, accurate, live analytics embedded seamlessly into their application.

The result is the best of both worlds – a powerful business solution married to sparkling analytics – so everyone wins.

If you have equally valuable business logic developed in OpenEdge, why not talk to us today to find out how you can leverage this valuable asset to deliver beautiful, live intelligence to mobile and web?

Friday, October 30, 2015

Join us next week in Copenhagen, we’re looking to the future of analytics

Look up pretty much any survey on IT priorities over the last few years, and analytics is consistently number one. Not only that, the number of decision makers reporting that it is their highest priority is growing year on year.

Why? Well, we think the major reason is that modern analytics offers real disruptive change for organisations, rather than just the iterative efficiencies afforded by traditional BI. This change has been driven by the shift from passive reporting to technology that allows users to actively discover, share and collaborate on insight about their organisation.

In our presentation at PUG Challenge EMEA in Copenhagen next week, we’ll explore these ideas further and show how some of our customers are using our technology to radically change how they do business. Here at DataPA we believe this is just the start of a hugely exciting cycle of innovation in analytics, offering huge potential for us and our partners. We’ll also discuss the innovations we’re introducing to our software in the next few months and beyond. Innovations that we’re convinced will keep us and our partners at the forefront of this revolution. We’d love you to join us.