Are Your Data Pipelines Ready For the Cloud?

Louise Westoby: Welcome, everyone. We'll get started in just a few minutes.

Louise Westoby: I just wanted to go through a little housekeeping before we begin. If you have any questions during the presentation, please type them into the Q&A box at any time. We'll do our best to address them at the end during the Q&A session or live during the actual event.

Louise Westoby: Again we'll get started in just a minute, please stand by.

Louise Westoby: Hi everyone, welcome to our webinar today, "Are Your Data Pipelines Ready for the Cloud?" My name is Louise Westoby, I'm Incorta's VP of product marketing, and I'll be your moderator today.

Louise Westoby: I'd like to introduce our speakers. So first, Matt Aslett. Matt leads digital technology at Ventana Research, covering applications and technology that improve the readiness and resilience of business and IT operations.

Louise Westoby: His specialization is in operational and analytical use of data and how businesses can modernize their approach to business to accelerate the value realization of technology investments in support of hybrid and multi-cloud architectures.

Louise Westoby: Matt has been an industry analyst for more than a decade and has pioneered the coverage of emerging data platforms, including NoSQL and NewSQL databases, data lakes, and cloud-based data processing.

Louise Westoby: Our next speaker will be Matthew Halliday. Matthew is a veteran software engineer and a data analytics expert. He co-founded Incorta in 2013,

Louise Westoby: After more than 15 years at Oracle and several years managing products at Microsoft.

Louise Westoby: With over 20 years of experience developing products and taking them to market, Matthew has served in several key roles across the company, playing a hand in nearly every aspect of Incorta's growth and product development.

Louise Westoby: A quick look at our agenda. So first, as I mentioned, Matt is going to kick it off by talking about some of the key considerations for migrating data pipelines to the cloud.

Louise Westoby: Matthew, then, is going to talk about building agile data pipelines in the cloud and then go through a demo, talking through data loading and querying 3NF and bronze data.

Louise Westoby: And then, as I mentioned, we'll wrap it up with a live Q&A, so do make sure to add your questions to that Q&A box.

Louise Westoby: So with that I will hand it over to Matt Aslett to kick us off. Matt?

Matt Aslett: Thanks, Louise. Let me share my screen.

Matt Aslett: Hopefully you can see that. And yeah, pleasure to be here; thank you, everybody, for joining this session. So I'm VP and research director covering data and analytics, as we said, so I'll be kicking things off here talking about

Matt Aslett: data and the cloud data pipelines in the cloud, and in particular in relation to some of the migration trends that we're seeing, so I think you know, first of all.

Matt Aslett: You know, the cloud has obviously been an important part of the business landscape for many years now. It's almost assumed that it will be some part of an organization's data landscape.

Matt Aslett: Public we see an increasing number of workloads and applications being deployed on cloud infrastructure.

Matt Aslett: And managed cloud services. I think this is particularly relevant as it relates to data and analytics workloads because, you know, whilst

Matt Aslett: data and analytics workloads have of course been deployed in the cloud for many years, we've seen they've been slower to shift to the cloud compared to perhaps some less mission-critical workloads, shall we say.

Matt Aslett: That's thanks to, you know, an ongoing reliance on on-premises data centers and things like data sovereignty and security and privacy, and lots of other reasons.

Matt Aslett: But we've seen, particularly in the last couple of years, a real shift and acceleration in terms of adoption of data and analytics workloads

Matt Aslett: In the cloud. And, you know, one of our market assertions for this year is that through 2025,

Matt Aslett: We see that seven in 10 organizations will be migrating on-premises workloads to cloud data platforms, shifting their focus to solving business needs and delivering business value, rather than the ongoing maintenance of existing on-premises infrastructure.

Matt Aslett: And, of course, when we think about you know the acceleration of.

Matt Aslett: Migration to the cloud, there's multiple trends going on. We already see almost three quarters of organizations currently using or planning to use cloud for analytics and data workloads.

Matt Aslett: And what we see is organizations increasingly storing data, both structured and unstructured, in cloud-based platforms, including object storage, of course, in the form of data lakes,

Matt Aslett: And, and then utilizing data engines data platforms to enable high performance processing.

Matt Aslett: Of that data and there's you know there's a number of benefits or potential benefits, perhaps you know in in in using cloud for data environments, we see that cloud can.

Matt Aslett: lower the upfront data infrastructure purchasing and configuration delays, enabling businesses to get started.

Matt Aslett: You know, much quicker with data projects, and that ability to just spin up workloads faster can really facilitate analytic experimentation and innovation, and also self-service, of course, in an organization.

Matt Aslett: And really you know we see the agility is perhaps the key driver, above all, that we see for organizations adopting cloud computing generally but.

Matt Aslett: You know, in particular in relation to data and analytics, as I say, we've seen that accelerate in recent years.

Matt Aslett: In particular, which is highlighted more than ever, you know the differences between organizations that can turn data.

Matt Aslett: into insight and are agile enough to act upon it and those that are you know they're they're incapable of of seeing or responding to the need for change.

Matt Aslett: And you know people talk about a lot about being data driven it's become a bit of a cliche, but we do see that organizations that you know that.

Matt Aslett: Are built around data, that have data as part of their lifeblood, really stand to gain competitive advantage, responding faster to both worker and customer demands for more innovative, data-rich applications and personalized experiences.

Matt Aslett: So this need to become more data-driven has become a mantra amongst large and small organizations.

Matt Aslett: They recognize the potential that there is to transform business processes, develop new products and services, and also improve operational efficiency through the accelerated and automated processing of data.

Matt Aslett: One part of that that trend is the shift towards or faster and more frequent analysis.

Matt Aslett: You know, we see today that nearly a quarter of organizations are analyzing data in real time, almost a third analyze data at least every hour and.

Matt Aslett: more frequent analysis requires organizations to reduce the time taken to make decisions based on data shrinking the time between the generation of data and the insights that they can produce from that data.

Matt Aslett: You know, absolutely you know this, one of the key factors, enabling business agility is the ability to make.

Matt Aslett: decisions, as I said, not just faster, but also more frequently and not surprisingly, obviously we see this greater focus on real time data processing and analytics as a result.

Matt Aslett: Because, increasing the frequency of analysis is is easier said than done, and it relies on having data pipelines that are themselves agile and especially given the increasing importance of the cloud able to adapt to evolving data infrastructure.

Matt Aslett: So if we think about you know the the dimensions of this path from data.

Matt Aslett: To insights fundamentally here what we're talking about is accelerating you know that that that path, reducing the time it takes from the the data is being generated.

Matt Aslett: By you know, an application to insight, you know, based on business, intelligence and or data science and that's really key to this opportunity to gain competitive advantage.

Matt Aslett: And you know that traditional path of data insights involves you know.

Matt Aslett: Simplifying somewhat, essentially a three-step process, which involves data ingestion and integration, data storage and processing,

Matt Aslett: And then data visualization and analysis. And, you know, this illustration is, of course, a simplification of that process, not least because data storage and processing

Matt Aslett: is not a single stage many organizations utilize multiple data storage and processing platforms, of course, and not least if we're thinking about the the ultimate outcome.

Matt Aslett: To run separate workloads for traditional business intelligence and for data science, predictive analytics, and machine learning.

Matt Aslett: So you've really got you know, again, this is still a simplification, but you've you've already got you know additional complication here when we think about this, how this actually looks in reality.

Matt Aslett: Nevertheless, you know each of these three stages, there are opportunities to reduce the time taken from data generation.

Matt Aslett: To insight, so you know we'll we'll take a look at those one, at a time, although slightly out of order.

Matt Aslett: In relation to how they look on the on the chart here and we'll start with data storage and processing so.

Matt Aslett: You know, as previously noted, many organizations utilize multiple data storage and processing platforms,

Matt Aslett: As I said, not least to run separate workloads for business intelligence and data science. That, of course, leads to multiple challenges: data redundancy, data duplication,

Matt Aslett: and additional cost, as well as fragmentation and time delays and configuring and maintaining multiple platforms so there's a whole there's a whole heap of things that are immediately.

Matt Aslett: Present presenting potential delays in that that you know that path from from data to insight, so we see that you know organizations are increasingly looking.

Matt Aslett: To making use of data enrichment and preparation for data science and machine learning via notebook interfaces, alongside data exploration

Matt Aslett: And visualization interfaces for charts and dashboards adding to you know the the breadth and the depth of the the insight, they can gather from their data.

Matt Aslett: And this means not only consolidating the number of data platforms, by potentially leveraging a single platform

Matt Aslett: For business intelligence and data science, but also compressing the path from data to insight.

Matt Aslett: By taking advantage of visualization and machine learning functionality within the data platform itself that doesn't, of course, mean you know, there is no role for standalone business intelligence tools and data science.

Matt Aslett: tools and platforms, but increasingly we see you know both that data storage and processing layer

Matt Aslett: It has a richness of functionality and inbuilt capabilities to at least you know get started and fulfill you know, a portion of the requirements for generating insight and and performing data science.

Matt Aslett: And you know, to also that you know with the self service analytics capabilities also built in at that data storage and processing layer, you can see there that the acceleration, the potential for acceleration in terms of driving innovation through through self service as well.

Matt Aslett: So again, another of our assertions, as we look at what we think is happening and how the market will evolve: we assert that through 2024,

Matt Aslett: Analytic data platform vendors, serving that middle tier if you like,

Matt Aslett: will continue to accelerate the delivery of actionable insight by integrating native data integration data management analytics and machine learning functionality with that core persistence and processing capability.

Matt Aslett: Which obviously is a bit of a preview, if you like, of the next step, which is how we see that path being truncated at both ends. So you've got

Matt Aslett: An increased focus on reducing the need for upfront data modeling and transformation, focusing instead on high-speed data ingestion

Matt Aslett: and flexible approaches to accelerate analysis of data without up front data modeling preparation and transformation traditionally associated with data warehousing and again.

Matt Aslett: You know, increasingly we see data platform vendors offering functionality that will enable this. That doesn't necessarily mean organizations won't also look at other external

Matt Aslett: Tools that are part of an ecosystem that can also serve the same purpose, but either way it has the potential to significantly reduce the amount of time taken by organizations to generate value from that data.

Matt Aslett: And that's particularly important in terms of that upfront data modeling and data preparation, because we see from

Matt Aslett: Ventana's data benchmark research that preparing data accounts for the majority of time spent analyzing data, according to more than two-thirds of respondents: 69%

Matt Aslett: Say that preparing data is one of the most time-consuming aspects of analyzing data. Things like data quality and consistency are also significantly

Matt Aslett: Time-consuming as well. And what we see is that this is particularly challenging, and I know Matthew is going to go on to talk about this, in relation to

Matt Aslett: Third, normal form data in enterprise applications, which you know really needs to be translated for storage and analysis in traditional data warehousing environments.

Matt Aslett: There are also challenges, of course, in terms of aggregating data and copying data to make it ready for analysis so.

Matt Aslett: You know a lot of the time that can be saved can be done at this sort of upfront stage before the data is actually is actually stored and and processed and then obviously analyzed.

Matt Aslett: And again, in one of our assertions, we see that by 2024 six in 10 organizations will adopt data engineering processes that span

Matt Aslett: Data integration transformation preparation as well, producing repeatable data pipelines that create more agile information.

Matt Aslett: architecture and that, ultimately, is what we've been talking about here.

Matt Aslett: Is the need, reinforcing the need really for agile and automated data pipelines that enable organizations to transform their business processes, develop new products and services, and improve operational efficiency through the accelerated and automated processing of data.

Matt Aslett: So that's a quick, rapid run through some of the key trends that we see shaping the space. I'll finish on that with a sort of recommendation here.

Matt Aslett: You know, for organizations that haven't already done so, or are still in the midst

Matt Aslett: of doing so, you know prepare for cloud transformation, the migration of data and analytics workloads to the cloud provides opportunities for business transformation, innovation and facilitation of self service analytics.

Matt Aslett: All of which could provide competitive differentiation. And we've seen in particular, in the last couple of years, due to, obviously, multiple reasons, socioeconomic changes and their impact on businesses,

Matt Aslett: You know, some, not all, companies making really rapid changes in terms of transforming their business, accelerating migration to the cloud, accelerating

Matt Aslett: Data pipeline change as well. And it's those organizations that really have already gained and stand to gain even further, obviously, as we move forward.

Matt Aslett: In order to deliver that data pipelines, need to be agile and automated to enable organizations to take advantage of this shift to the cloud by overcoming.

Matt Aslett: You know some of the traditional data preparation and schema transformation challenges, and you know, ultimately, as I said, accelerating that path from data generation to the realization of insight.

Matt Aslett: So there's a couple of links here, or more than a couple, if you want to explore further our

Matt Aslett: analytics and data research or find us on social media, you can do so follow any of the links here, but with that i'll Thank you very much for your time and i'll hand you back to Matthew to continue with the presentation.

Matthew Halliday: Thanks matt alright so i'm going to just share a little bit more here.

Matthew Halliday: Around the approach that was taken to this problem and how we've been able to help customers on some of the things that matt's been talking about, and so, first of all I want to pick up on something that matt kind of.

Matthew Halliday: raised a little bit and said, I will speak about, and that is about the different types of data that we think about now traditionally when you ask someone.

Matthew Halliday: Or if someone says to you i've got a data set probably the first question, out of your mouth might be, what is it and then second is probably how big is it.

Matthew Halliday: And that's generally how a lot of people think about data it's always the most complex piece of it is associated to be the size of the data and historically that has been that has been true.

Matthew Halliday: Actually, let me go back here a second. Here we go. If we look at the two different types of data that really exist, there's this idea of

Matthew Halliday: data that is you know impression data click data iot data that's come, this really is huge volumes of data.

Matthew Halliday: Generally it's a single-table kind of approach. When you think about it, it might have some small tables joined around it, but the whole transaction fits in one area. And then there's the 3NF data,

Matthew Halliday: Where you've got application data, business problems that you're looking at, and these tend to be relational data sets. And there are two different approaches, right? So MPP,

Matthew Halliday: MapReduce, whatever flavor it might be, being able to run against that raw impression data, that click data, those data volumes that can be petabytes of data. We know about

Matthew Halliday: You know, social media companies processing 300 petabytes of data on their data platforms, but when it comes to 256 gigs of relational data that same data platform does not perform.

Matthew Halliday: And you might be shocked to think, well, hang on a minute, if we can process 300 petabytes in our data platform and do that with relative ease, we're not going to say it's easy, but

Matthew Halliday: be able to get very successful results from it, why is this data set such a problem for us, why is like 256 gig which, compared to.

Matthew Halliday: 300 petabytes is you know nothing.

Matthew Halliday: Why is that a challenge or a problem? It's that second category, this 3NF data set, that Incorta has really looked at: how do we help and accelerate that journey to the cloud? Because

Matthew Halliday: Taking that data to the cloud on a 300-petabyte data platform doesn't fix the problem, so you need something else to do it. It can't just be, let's throw more resource at it; actually, at a certain point that doesn't help, it makes it even worse the more you do.

Matthew Halliday: So typical pipelines are really not architected for business data right a lot of what we've seen and the great innovation we've seen in the market that's been really exciting has been around a different set of data.

Matthew Halliday: And when you go to a data lake or you go to the cloud, what you'll find is that you still go through something that, in some respects, feels very similar to what we've been doing for 25, 30 years,

Matthew Halliday: But just use some some new technologies, and that is this whole idea of getting to a new data model and getting the data change from its original shape.

Matthew Halliday: Into something that actually will perform. So what does that really mean? It honestly means taking 3NF data and reversing it and just making it flat, kind of making it into just a big, fat, wide table.

Matthew Halliday: And that is the most complex, difficult, slow thing to do, for a couple of reasons. One is you've got to know exactly how you want to flatten the data: what grain of record do I want to flatten the data to?

Matthew Halliday: What's the detail I need? Is it at the lowest level, or is header information enough? There's all these questions that come into play, and then it's like, well, what columns do I need in that?

Matthew Halliday: Because the expense of pulling a column from a new table is actually pretty high so you don't want to just pull in things, just in case you really need to know there's value there, so this has been like the additional challenge.
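
To make the flattening step concrete, here is a minimal sketch using Python's built-in sqlite3 module. The schema (customers, orders, order_lines) and every table and column name are hypothetical and chosen only for illustration; they do not come from the webinar. The sketch shows the two choices described above: the grain of the flat table (line level versus header level) and which columns get pulled in (region is deliberately left out, so it cannot be queried later without rebuilding the table).

```python
import sqlite3

# Hypothetical 3NF source schema: customers, order headers, order lines.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);
CREATE TABLE order_lines (line_id INTEGER PRIMARY KEY, order_id INTEGER,
                          product TEXT, qty INTEGER, price REAL);
INSERT INTO customers VALUES (1, 'Acme', 'West'), (2, 'Globex', 'East');
INSERT INTO orders VALUES (10, 1, '2022-01-05'), (11, 2, '2022-01-06');
INSERT INTO order_lines VALUES
  (100, 10, 'widget', 2, 5.0),
  (101, 10, 'gadget', 1, 20.0),
  (102, 11, 'widget', 4, 5.0);
""")

# Choice 1: flatten at line grain (one row per order line).
# Note that customers.region is not selected, so it is lost downstream.
conn.execute("""
CREATE TABLE wide_line_grain AS
SELECT l.line_id AS line_id, o.order_id AS order_id, o.order_date AS order_date,
       c.name AS customer, l.product AS product, l.qty AS qty, l.price AS price
FROM order_lines l
JOIN orders o ON o.order_id = l.order_id
JOIN customers c ON c.customer_id = o.customer_id
""")

# Choice 2: flatten at header grain (one row per order, lines rolled up).
conn.execute("""
CREATE TABLE wide_header_grain AS
SELECT o.order_id AS order_id, o.order_date AS order_date, c.name AS customer,
       SUM(l.qty * l.price) AS order_total
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
JOIN order_lines l ON l.order_id = o.order_id
GROUP BY o.order_id, o.order_date, c.name
""")

line_rows = conn.execute("SELECT COUNT(*) FROM wide_line_grain").fetchone()[0]
header_rows = conn.execute("SELECT COUNT(*) FROM wide_header_grain").fetchone()[0]
line_columns = [row[1] for row in conn.execute("PRAGMA table_info(wide_line_grain)")]
order_totals = dict(conn.execute("SELECT order_id, order_total FROM wide_header_grain"))
```

The header-grain table is smaller but can no longer answer product-level questions, and neither flat table can answer a question about region, which is the kind of up-front decision the speaker describes as expensive to get wrong.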

Matthew Halliday: And even though we're using these new technologies that are really great for other things such as like iot click data that I mentioned.

Matthew Halliday: They don't work for this relational data set, and so you have to go through all this process when, in reality, you dump your data from, like, maybe a full extract of the data: you've got everything, you've got all the records and trends and the details you need.

Matthew Halliday: And then you run it through some transform processes i've had customers say that they spend 50% plus of their consumption in the past on these you know cloud query platforms, just to do this, transform piece before they can even get it to a point that they can run queries against it.

Matthew Halliday: And in that during that process, you literally lose a lot of the fidelity of the data that's that process of saying here's the columns I don't think we're going to use but probably more importantly, the question is asked like are you sure you really need it.

Matthew Halliday: And that is always a tough question to ask because, as you investigate and look at data, we always know that new questions come up, and so it really inhibits your ability to be agile and to have some curiosity around your data.

Matthew Halliday: Once you get through that process, though, you still have to aggregate it, because if you've got 2 billion records, but you've got rid of a bunch of columns okay i've got rid of the columns, but I still have some you know large data sets here to work with so.

Matthew Halliday: A lot of systems aggregate it, because you don't want to put Tableau on top of 2 billion records. It's not going to perform; it's not going to get the results you want.

Matthew Halliday: And so what ends up happening is you aggregate it and then you deal in the aggregates. Once you deal in the aggregates, it actually becomes very difficult then to know exactly what is behind that number.
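
A small illustration of why an aggregate alone is hard to validate, written as a hedged sketch in plain Python. The rows, field layout, and function names are hypothetical. Once only the per-region totals are kept, the credit hidden inside the West total is invisible; drilling down is possible only if the row-level data is still available.

```python
from collections import defaultdict

# Hypothetical raw rows: (region, invoice_id, amount).
raw_rows = [
    ("West", "INV-1", 120.0),
    ("West", "INV-2", -40.0),  # a credit that disappears inside the total
    ("East", "INV-3", 75.0),
]


def aggregate_by_region(rows):
    """Roll rows up to one total per region, discarding invoice detail."""
    totals = defaultdict(float)
    for region, _invoice_id, amount in rows:
        totals[region] += amount
    return dict(totals)


def drill_down(rows, region):
    """Return the row-level records behind a region's total."""
    return [row for row in rows if row[0] == region]


totals = aggregate_by_region(raw_rows)
west_detail = drill_down(raw_rows, "West")
```

With only `totals` in hand, a user sees 80.0 for West and cannot tell whether that is one invoice or several offsetting ones; `drill_down` answers that, but it needs `raw_rows`, which an aggregate-only pipeline has already thrown away.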

Matthew Halliday: And everything has a process that sits in between, which becomes, for the business users at least, a black box

Matthew Halliday: And very difficult to understand exactly what is going on. So these finance users, business users, looking at this data: yes, they're getting some insight. Do they have the power of all the data that's in the raw data? No.

Matthew Halliday: Do they have a version of that data, yes, they do, can it always be trusted well, they have to find out.

Matthew Halliday: drill into it look into it and that is really problematic, and so a lot of cases they're unable to validate these numbers like is it correct.

Matthew Halliday: I don't know there's no other system, I can run these queries on I can't go back to the source system and run them.

Matthew Halliday: I can, maybe if I apply a ton of filters to look at a very specific area, but then, how do I match that very specific.

Matthew Halliday: filtered data set that I run an application to what's in my data Mart and may be able to you know do some aggregation of that and then compare but it becomes very, very complex.

Matthew Halliday: And it's just very, very slow and brittle so there's nothing about speed agility just in time kind of reporting here, it becomes feeling, much like.

Matthew Halliday: Kind of some of the systems of the past, when we would use ETL and build those out; so very much a similar approach.

Matthew Halliday: If we look at.

Matthew Halliday: The next slide here apologies here my computer seems to be not cooperating well.

Matthew Halliday: Incorta's approach is simplifying this, right? And so one of the reasons we're able to do this, again, is we focused on that data problem: the application data, the 3NF data

Matthew Halliday: Sets, the relational data. And so, when we look at that, we made a whole ton of business decisions and technology decisions along the way that enable us to say, can we take that data as is?

Matthew Halliday: Because there's huge value in that. We know that, because people want to run operational reports; they want to look at the data at the row level.

Matthew Halliday: We want to have it as near real time as possible, and we know if there isn't a huge data pipeline between the two, where things are being transformed and aggregated.

Matthew Halliday: We can get much closer to real time because we don't have to recalculate every single kind of aggregation bucket that we might create.

Matthew Halliday: And so, by doing this, what Incorta's done is say, well, let's take the application data as it is, but land it in the cloud,

Matthew Halliday: Make it available to Incorta, so that we can actually run these aggregations and queries against that data set and analyze everything.

Matthew Halliday: Not just 10% of my data or even 25% of my data, but the whole hundred percent.

Matthew Halliday: Being able to just say, hey, I want this column, and it doesn't need a data pipeline exercise to come along; you've already done it. So you've really future-proofed

Matthew Halliday: Your data pipeline here in this cloud environment, to make sure that whatever the business will throw at you, you're able to respond, and you're able to respond so fast

Matthew Halliday: That we have customers who are saying it used to take 16 weeks to reply to this and now we're doing it in the same hour, or in minutes in some cases.

Matthew Halliday: So really it's making this data usable and actionable, and really empowering the business to be curious and to engage in a platform that works for them. So it's a real win-win

Matthew Halliday: When you think about the flexibility it gives to the business, but also how it enables the IT teams to provide and to build in a way that doesn't become frustrating, very brittle, and require a ton of maintenance to support.

Matthew Halliday: So at this point, kind of pulling back the curtain just a little bit more here and showing a little bit of the architecture. And I will be getting into a demo, so

Matthew Halliday: you'll kind of get to see some of this in action and you will see some of these things they'll they'll be there, behind the scenes right, so we made it very easy and seamless for you to get.

Matthew Halliday: All of these things which you might look at and go hey This is great, you know i've heard about some of these things so recognize some of those logos.

Matthew Halliday: That sounds like it looks like a modern platform and, to some degree you'd be right, that is the case, but I want to just pinpoint a couple of things that make this absolutely unique and different from everyone else out there.

Matthew Halliday: And so, starting at the bottom there, that's our cloud storage, right? You want to get your data into the cloud. You're like, yes, how do I get it there? How do I get it out of these business applications? Well, you need data acquisition.

Matthew Halliday: And so to get it there, Incorta provides the capability of connecting to those common business applications that you would have:

Matthew Halliday: Databases, or even your Salesforce, NetSuite, products like that in the cloud, to be able to get that data and land it inside of a Parquet file format.

Matthew Halliday: The Parquet file format, if you're not familiar with it,

Matthew Halliday: is purely a an open standard that enables us to treat Amazon cloud storage file system as almost like a database in terms of storage now it's not for the query layer of a database, but a storage layer

Matthew Halliday: That can be leveraged by different systems, which is great because different systems have been built to solve different technical challenges. As I mentioned, 3NF

Matthew Halliday: Is a different technical challenge than if i've got 300 petabytes of data and it's largely like one data type structure and so.

Matthew Halliday: This is where having that data in that format gives you options to run things like Spark. Spark has obviously got a lot of

Matthew Halliday: Press and a lot of coverage here machine learning being able to run those spark jobs.

Matthew Halliday: Within that and be able to do things from a data science perspective, but also what we've even seen is people migrating from PL/SQL packages in Oracle databases

Matthew Halliday: And implementing those as PySpark inside of a Spark cluster, and being able to leverage that in the platform to do business enrichment of that data and to bring additional value to that data.

Matthew Halliday: This is not necessarily Center, this is what we're doing to like data transformations and when I say data transformations i'm referring to the largely.

Matthew Halliday: That the negative ones, the ones where you're losing the quality of your data and not the ones where you might be taking a value, such as.

Matthew Halliday: Oh there's a status code that says a C T and you're like I don't know what those mean like transformations where you make the data more usable for the business.

Matthew Halliday: A value add yes absolutely you want to be able to add those things and there's a number of places, you can do that in the platform.

Matthew Halliday: But we're not talking about those we're talking about the transformations that are really just simplifying your data to get it to a point where an analytical query engine can run against them.

Matthew Halliday: Now we're able to remove that entire stack, right? Remove the prerequisite or requirement to say, I've got to get this data in a shape that a SQL query has a shot at being successful at.

Matthew Halliday: And what we're able to do to do that is to take the Parquet data, in conjunction with something we refer to as a direct data map. I'll talk a little bit more about that in a moment.

Matthew Halliday: But this direct data map, you can think of it as a data layer, or an extra bit of metadata about your data, specifically about the relationship piece of it, the 3NF part of it.

Matthew Halliday: It's kind of like a join at the level of the schema, versus just individual table joins, so it really understands how joins connect to each other.

Matthew Halliday: And at that point we can actually reverse your data as if it had been flattened without ever flattening it, so you get the same performance as if you'd flattened it, without spending the hours, and sometimes days, flattening the data sets.
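
The idea of keeping join relationships as schema-level metadata, rather than materializing a flat table, can be sketched roughly as follows. This is not Incorta's implementation, which the webinar does not describe in detail; it is only a toy model under stated assumptions. The table names, the JOINS dictionary, and the functions build_graph and join_path are all hypothetical. The sketch stores each relationship once and uses a breadth-first search to find the chain of joins connecting any two tables on demand.

```python
from collections import deque

# Hypothetical join metadata: (left_table, right_table) -> join condition.
JOINS = {
    ("order_lines", "orders"): "order_lines.order_id = orders.order_id",
    ("orders", "customers"): "orders.customer_id = customers.customer_id",
    ("order_lines", "products"): "order_lines.product_id = products.product_id",
}


def build_graph(joins):
    """Turn the join list into an undirected adjacency map of tables."""
    graph = {}
    for (left, right), condition in joins.items():
        graph.setdefault(left, []).append((right, condition))
        graph.setdefault(right, []).append((left, condition))
    return graph


def join_path(graph, start, goal):
    """Breadth-first search for the shortest chain of join conditions
    linking start to goal; returns None if the tables are not connected."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        table, conditions = queue.popleft()
        if table == goal:
            return conditions
        for neighbor, condition in graph.get(table, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, conditions + [condition]))
    return None


GRAPH = build_graph(JOINS)
path = join_path(GRAPH, "products", "customers")
```

Here `path` holds the three join conditions that connect products to customers through order_lines and orders, resolved at query time from the metadata instead of from a pre-flattened table. A real engine would also need precomputed join indexes or similar structures to deliver the performance the speaker mentions; this sketch covers only the lookup of how tables relate.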

Matthew Halliday: That's the magic; that's where you can actually start to use the data in its raw format, otherwise known as bronze,

Matthew Halliday: If you come from data lake experience, or just really the data as it was in the source system.

Matthew Halliday: And then being able to query that and engage with that, whether you're using Incorta or other tools or the visualization products in the marketplace.

Matthew Halliday: Incorta is not just a visualization tool. If you come away from this and think, oh, it's a visualization tool, you missed it. This is a data platform that enables you to have data in a format that will work for your business, for your users, right out of the gate.

Matthew Halliday: So there are some other things that we do to help you on this journey. If you're saying, this is great, Matthew, I can point directly at these applications that we have,

Matthew Halliday: But I can't make sense of them, some of them are very complicated. If you haven't looked at the 3NF data model of an ERP lately,

Matthew Halliday: You haven't had your head explode. Just the join relationships and how complex these things get, it's astronomical.

Matthew Halliday: One of the projects I worked on was Oracle E-Business Suite and Oracle Cloud ERP.

Matthew Halliday: Those had over 50,000 tables that were used now what might surprise you, is when you would pull up like an invoice or something.

Matthew Halliday: You would look at and go oh I recognize these fields, my address my bill to location, the products shipping and things like that, but in reality there could be 30 to 40 tables behind the one screen that you're looking at.

Matthew Halliday: That have been joined to pull the data together to represent it, and there are really good reasons, which I won't get into, about why we do 3NF when we build applications.

Matthew Halliday: But what Incorta has provided is a way for you to demystify all of that, and just say, you know what,

Matthew Halliday: If all these tables are what I need in some of these common business applications, can you help? And absolutely.

Matthew Halliday: So we've specialized in building out these blueprints that really have: here are the raw tables you need,

Matthew Halliday: Here are also the relationships and how they connect, and that's the tricky bit, trying to figure all that stuff out. So these really just enable you to know, okay, I'm going to get the results I need,

Matthew Halliday: I know how these tables relate one to another, and then also some business logic where we enhance and enrich the data to make it more consumable and usable for the business user, think status code changes and other things like that, and then be able to create a view which is,

Matthew Halliday: In essence, not an aggregated view, but really you could think of it as aggregated without it being aggregated behind the scenes, so you can start now to build on this semantic abstraction layer.

Matthew Halliday: So you can look at a visualization and say, this is great, I can see all these numbers, but I can also drill into detail as well.

Matthew Halliday: This really gets people moving very quickly so that, in most cases when you install a blueprint you're looking at your data that same day, even if it's huge volumes of data.

Matthew Halliday: And so what we've got here is actually a slide that was done by one of our customers, and this is after they had used Incorta. This is how they explained it internally and justified why,

Matthew Halliday: You know, they loved Incorta, quite honestly. So traditional, on the left side, that was their workflow.

Matthew Halliday: That's what they had to go through whenever someone would identify a piece of data or wanted to do some analysis.

Matthew Halliday: And it would take months to get through that process; this is kind of the way that they would live.

Matthew Halliday: And whenever someone had a new question, you can see it's back to the drawing board, it's back to extraction, to staging, and working on that data.

Matthew Halliday: On the right side was the way that they did it inside of Incorta.

Matthew Halliday: And you can see that really they were just pulling the data from staging and being able to run it, come look at that data, and turn it around in four to seven days.

Matthew Halliday: As I mentioned, we've had customers who literally do it on the same day. We've heard stories of CFOs walking into a room,

Matthew Halliday: Asking a question and getting the answer within the same five minutes, and it actually was not just, oh, we already have that report; there were key changes and new pieces of data being considered as part of that.

Matthew Halliday: But being able to do it because the data was already there they didn't have to put it through a new pipeline they already have it available at their disposal.

Matthew Halliday: Here's an example of a Fortune 50 media company, and when you look at just the ecosystem they had, they were on Oracle Exadata, which is no slouch in terms of how it performs and the compute and the capacity of those machines.

Matthew Halliday: But they were using OBIEE and Informatica; they were taking this Oracle EBS financial data and bringing it into their environment, and you can see on the left side a whole bunch of different metrics

Matthew Halliday: Around what that looked like, just how long it would take to even look at a dashboard and then filter it by a collector name. So if you want to say, I want to look at how Joe has done,

Matthew Halliday: Well, you're taking over a minute to get that answer. The data refreshes were just 12 a day, all right.

Matthew Halliday: Actually, I would say that's pretty good for most companies, as at a lot of companies it's once a day,

Matthew Halliday: Maybe two or three times a day, but they were actually quite proud of the fact that they got it to 12, and to be fair,

Matthew Halliday: That was a good accomplishment given everything that they had and the challenges they faced.

Matthew Halliday: On the right side you can see that these numbers just get shattered, right, it's like dropping things down to one second. And when you think about what that one second really means,

Matthew Halliday: It actually isn't just, oh, my report's faster; that's a behavioral change.

Matthew Halliday: That's when you start to ask different questions. When something is one second versus one minute: when it's one minute, you generally run it, get your response, and move on.

Matthew Halliday: When it's one second you generally ask more questions. One of our customers had a very similar structure to the left side here, with Oracle, and they were running 400 queries a day.

Matthew Halliday: You know, looking at some reports, probably being emailed out, and most people looking at them. After Incorta, it wasn't just that those 400 reports went faster; they did, significantly so.

Matthew Halliday: What was really interesting is they went from 400 queries a day to 70,000 queries a day,

Matthew Halliday: Which I think really just signifies and showcases that there was a behavioral change; this was an organization that actually changed the way they leverage and use data.

Matthew Halliday: It wasn't just, oh, this is a better experience; now they wanted to use the data, they wanted to engage with it.

Matthew Halliday: And so it is really amazing when you see these kind of transformations happen and it's not an incremental improvement.

Matthew Halliday: it's not Oh, this is 20% faster or this is 40% faster, which you might say, wow 40% faster that's that's significant this is orders of magnitude faster that it does change the way people even think and approach data within the business.

Matthew Halliday: So when we think about transitioning to the cloud, one of the key things here is how do I simplify this, how do I consolidate and understand it.

Matthew Halliday: When you look at some of the challenges that you're facing in this, right, app migrations require a ton of rework for existing investments.

Matthew Halliday: there's just a lot that you need to consider when you're doing this.

Matthew Halliday: Many times, if you're moving your applications from on-prem to the cloud, that's done in pieces,

Matthew Halliday: Where you will take individual phases of these projects and move them across, but then you lose

Matthew Halliday: The ability to report on that data, because you're like, how do I consolidate it? I've got some in, let's say, Oracle Cloud ERP here, and I've got some data still residing in Oracle EBS because this module hasn't

Matthew Halliday: Been transformed yet or hasn't been migrated to the cloud yet. How do I get insight into that? And that's where a platform like Incorta's unified data analytics platform

Matthew Halliday: Can really help, by saying we can take that data, make it available, and bring it together to give you a unified

Matthew Halliday: View of that data, in a way that you can leverage and drill down and, even when you find a problem, pinpoint it in the source system.

Matthew Halliday: That's probably one of the key things, being able to understand that when you can find data at this level,

Matthew Halliday: It maps directly to how you fix it in the source, or

Matthew Halliday: How you can help identify it. When you're talking in aggregates, it generally means you need to send it off and say, can someone try and figure out why this aggregate number's not correct? There's a transaction, there's something in there that's causing a problem.

Matthew Halliday: So Incorta, really, with these comprehensive

Matthew Halliday: Blueprints, bringing that data in, making it available in an open file format, enabling you to have it connected to other systems, to be able to land it in cloud storage,

Matthew Halliday: To be able to leverage not just the data you're bringing in but also data enrichment, machine learning, and analytics on top,

Matthew Halliday: To have it with compliance, with governance, and be able to understand exactly how your users are using that,

Matthew Halliday: Really does help on this journey and transformation to the cloud, which we're seeing across the board with companies moving, I would say, their data warehouses, their central data locations, to cloud infrastructure.

Matthew Halliday: So with that, let's look at what this looks like. You're probably saying, okay, prove it to me, Matthew, let's see exactly what this looks like.

Matthew Halliday: A couple of things I'm going to show in this demo. The first part I'm going to look at is data ingest: how to load the data in.

Matthew Halliday: And then I'm going to look at how we actually query this data, what it looks like, and how I can query a fairly large data set without the need for transformations. So with that I'm going to stop sharing here and start sharing my application.

Matthew Halliday: All right, so here I am in the Incorta environment. You can go ahead and set one of these up: just go to incorta.com and click on Get Started, and within less than five minutes,

Matthew Halliday: On average about one and a half minutes, you'll be in an Incorta environment of your own, and we've published it and created it in a way that you can actually

Matthew Halliday: do some of the things i'm showing you today, right here so first of all i'm going to show you what does it take to get data into the system.

Matthew Halliday: And so, here we have a set of data connectors you can see here we've got external data sources, we got local data files and data destinations.

Matthew Halliday: And I can go ahead and come in here and look at my different data connections, I have set up, you can see here there's a whole bunch of.

Matthew Halliday: Other ones available that I can put in my environment here and leverage and you can also have custom connectors as well, so you can upload your own custom connector.

Matthew Halliday: If you so wish. In this particular case, though, I'm going to use a MySQL database, and you can see here it's just a very straightforward JDBC connection.

Matthew Halliday: And that could be your Oracle database, it could be whatever you might have, but I'm going to show you what it takes to actually get that open file format, that Parquet file format,

Matthew Halliday: And bring data in. So we'll call this cloud migration; we're going to use this online store that we had.

Matthew Halliday: And so on the left side here, you can see this is my database, and this is how I would go and look through all the different objects available.

Matthew Halliday: And I can click on address and bring over the definition of that table, click on product, look at that, and change the labels. What this is actually doing is just creating a very straightforward SQL statement, which is: select the column names from the table.
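
The "select the column names from the table" statement described here can be sketched in a few lines. This is only an illustration of the idea; `build_select` and the column names are hypothetical and are not taken from the product.

```python
def build_select(table, columns):
    """Build a plain SELECT over the chosen columns, with no transformation."""
    if not columns:
        raise ValueError("at least one column is required")
    column_list = ", ".join(columns)
    return f"SELECT {column_list} FROM {table}"


# Hypothetical columns for the demo's address table.
statement = build_select("address", ["address_id", "city", "postal_code"])
print(statement)
```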

Matthew Halliday: I'll get into a little bit more of an advanced case, when these get large,

Matthew Halliday: In a moment, but here I'm just going to say let's select everything. You can see here it's gone out and identified all the objects and the number of columns, and now I have these 32 objects selected.

Matthew Halliday: I also want to create the joins between these, so I'm going to go ahead and create the schema.

Matthew Halliday: And once I create the schema I'll just show you what this looks like before I load the data. I haven't loaded it yet; I've just done all of the

Matthew Halliday: Information gathering, if you like. So here, this isn't the most complex 3NF model by far.

Matthew Halliday: They're definitely a lot more complex when you get into business applications, but it's enough for you to see, okay, I can see here that

Matthew Halliday: You know, the sales order, there are details around the sales order, sales order detail down here as well; sales order header connects to an address,

Matthew Halliday: A shipment method, a salesperson, a territory, et cetera, so I can start to see exactly how this data is dispersed across many objects.

Matthew Halliday: And when I start to run queries against this, it becomes difficult for databases to perform once this turns into large volumes.

Matthew Halliday: We'll go ahead and just kick off a full load. You can see here we have incremental loads as well, so you can actually say,

Matthew Halliday: Once you've done this load the first time, I could set this up to do incremental loads and it will just go ahead and start grabbing what's new, what's changed, and go ahead and update.

Matthew Halliday: What's nice about that is Parquet is actually an append file format, which originated from cases where you just want to keep appending data, and so there is a process

Matthew Halliday: That we refer to as compaction, and compaction in essence is deduplication. So whenever an insert, an upsert, or an update happens in the source system,

Matthew Halliday: We need to rewrite that new row. Well, there's no way to go and rewrite an individual row within Parquet; you actually have to rewrite the Parquet file.

Matthew Halliday: Now the good thing is the Parquet files are actually many; if you look at a table, many files make up that table, so you just rewrite the file that contains that row.

Matthew Halliday: And so that's a compaction process that takes place, and Incorta does this for you, without having to do any coding.

Matthew Halliday: So you can just go in, connect it, and then you'll get a set of Parquet files that will always match and mirror what you find in your system, so you don't have to try and figure that out in your queries and say, oh,

Matthew Halliday: i've got to make sure I don't you know double count any of these things and I need to pick the latest version if it finds two versions, we kind of take that headache away.
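
To make the deduplication idea behind compaction concrete, here is a minimal sketch in plain Python. It assumes each row carries a key and a version marker (`id` and `updated_at` are hypothetical column names) and keeps only the newest version of each key. It works on in-memory lists rather than real Parquet files and is not a description of how Incorta implements compaction.

```python
def compact(rows, key="id", version="updated_at"):
    """Keep only the newest version of each row, identified by `key`."""
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        # A later (or equal) version replaces whatever was seen before.
        if current is None or row[version] >= current[version]:
            latest[row[key]] = row
    return sorted(latest.values(), key=lambda r: r[key])


# Rows from the initial full load.
base_file = [
    {"id": 1, "status": "open", "updated_at": 1},
    {"id": 2, "status": "open", "updated_at": 1},
]
# Rows appended by a later incremental load.
incremental_file = [
    {"id": 2, "status": "closed", "updated_at": 2},  # update to an existing row
    {"id": 3, "status": "open", "updated_at": 2},    # brand-new row
]

compacted = compact(base_file + incremental_file)
for row in compacted:
    print(row)
```

After compaction there is exactly one row per key, so a query never has to pick the latest version itself or guard against double counting.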

Matthew Halliday: So you see here i've just loaded some data loaded about 25 million rows in about a minute and I could actually start querying against this now, and so.

Matthew Halliday: This data is in my system, just to show that this actually did work i'll just come in here and i'll just do a very basic query i'll go off my detailed table, which is my most detailed it's got the most number of records and i'll just kind of look at maybe category of product.

Matthew Halliday: And here i'll just change this to be like an aggregate table can you see here i've got that I can come in here, and maybe say you know subcategory, for example, and drop that underneath.

Matthew Halliday: I can go ahead and change, and you can start to you know whoops I can see.

Matthew Halliday: A breakdown and these aggregations being performed on the fly right as i'm going through grabbing different values.

Matthew Halliday: I can throw them in, and this is being done on that 3NF data, or the bronze data. I didn't go in, I didn't say, hey,

Matthew Halliday: What are the categories, what are the fields, what are the dimensions that you're going to need to want to look at.

Matthew Halliday: And then create a model that can support it, I just kind of went out of the gate and got this, which is, in my mind, I think this is fantastic because you know I struggled with this for years in both Microsoft and in Oracle.

Matthew Halliday: So this to me just really empowers people to very easily get access to the data now, this is 25 million rows and you might say, well that's not earth shattering.

Matthew Halliday: You know 25 million rows is enough to kind of show the performance that if you ran this on another system, even at this volume, you would see minutes, in some cases.

Matthew Halliday: Maybe you could get it into some seconds, but you won't see what i'm about to show you so i'm going to move now into let's look at this at a little more scale.

Matthew Halliday: So i'm going to look at a data set here.

Matthew Halliday: That I have that is actually 2 billion records in size and it actually is a real world Oracle EBS data set that has all of the data that relates to receivables so collections you think of that and I want to be able to go in and build out.

Matthew Halliday: A report on that data, so you can see here, I actually got a whole ton of schemas, but in this particular one i'm going to open up my accounts receivable now if any of you.

Matthew Halliday: Are familiar with Oracle EBS, which is this particular case, raise your hand if you recognize any of these tables. So let's look at AR payment schedules all; this is a real common table

Matthew Halliday: In receivables, and what we actually have, if we look at how we load this, is really just a simple

Matthew Halliday: Select the columns from, and we put them in. You can add columns just by checking what you want, and add them in, no problem, and you have them.

Matthew Halliday: But there's no transformation taking place; it's just a straightforward bring the table in as is. Now when these tables get large we have

Matthew Halliday: Some ways to enhance this with things like chunking, so if you've got records that are very, very large,

Matthew Halliday: And you want to say, I need to get this table in, I can't have this single threaded, I need multiple threads to cover it, you can.

Matthew Halliday: Here you can see a payment schedule ID, and you can create a chunk size and then go through a single table and chunk it up into, what is that, like 8 million rows I think, and then

Matthew Halliday: Use this order column of payment schedule ID to go through and bring that in. So we provide these abilities for you to take some pretty huge tables and bring them in very efficiently and effectively.

Matthew Halliday: And again, this is one of the beauties of the cloud: you can apply additional resources to it, so if you need more ingest compute, you can allocate more, add more workers to go and do the chunking.
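
The chunking approach described above, splitting one large table into key ranges on an ordered column and extracting them with several workers, might be sketched as follows. The table name `payment_schedules`, the ID range, and the worker count are assumptions made for illustration, and `extract_chunk` only builds the range query instead of running it against a database.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(min_id, max_id, chunk_size):
    """Split the inclusive key range [min_id, max_id] into (low, high) chunks."""
    ranges = []
    low = min_id
    while low <= max_id:
        high = min(low + chunk_size - 1, max_id)
        ranges.append((low, high))
        low = high + 1
    return ranges


def extract_chunk(bounds):
    """Stand-in for one worker's extract: returns the range query it would run."""
    low, high = bounds
    return (
        "SELECT * FROM payment_schedules "
        f"WHERE payment_schedule_id BETWEEN {low} AND {high} "
        "ORDER BY payment_schedule_id"
    )


# Hypothetical key range of 20 million IDs, chunked 8 million at a time.
ranges = chunk_ranges(1, 20_000_000, 8_000_000)
with ThreadPoolExecutor(max_workers=4) as pool:
    queries = list(pool.map(extract_chunk, ranges))

for query in queries:
    print(query)
```

Each range is independent of the others, which is why adding more workers (more ingest compute) shortens the load of a single very large table.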

Matthew Halliday: But if I come out of here now.

Matthew Halliday: In totality, here we've got 2 billion rows, and this data set has pretty significant joins between the tables.

Matthew Halliday: So you can see here 471 million rows, 300 million, 170 million. If I look again at the diagram you can see these are the join relationships.

Matthew Halliday: These are the things where, if I was joining close to 500 million rows to 300 million rows to 180 million rows,

Matthew Halliday: The joins and sorts that I would have to do in a regular database would kill it; I would have to apply a ton of filters to make sure this works.

Matthew Halliday: So let me show you what the experience is like inside of Incorta. These are the raw tables; again, think 3NF models or your bronze tables before they've gone through transformations, sitting in Parquet on your cloud data lake.

Matthew Halliday: And you can get this kind of experience. So let's just go straight against this and I'll put together a basic pivot table here.

Matthew Halliday: And so i'll start with my revenue amount.

Matthew Halliday: As my measure, and I can bring in obviously different ones i'll look at my sales channel.

Matthew Halliday: I'll also bring in my party type.

Matthew Halliday: Let me just do a year first; let's bring in the year.

Matthew Halliday: So fiscal year. You know, I can just create a pivot table just like that, and again this is significant data volume that we're looking at, but let's make it a little more interesting; let's bring in party type.

Matthew Halliday: can drag that in.

Matthew Halliday: I could bring in you know the month name, for example, a month number let's say after the year.

Matthew Halliday: and start to you know, look at this now on my dashboard so i'll create a.

Matthew Halliday: And now i've created a report that will run against you know that that data set.

Matthew Halliday: and be able to look at you know all the different numbers and the different categories within that and be able to drill down into these individual numbers.

Matthew Halliday: See, I kind of went a little crazy creating huge numbers of columns here, you can see.

Matthew Halliday: You know the data is a little more sparse at the beginning, and here it gets a little more dense as we get into these.

Matthew Halliday: These years over here, but this is something a finance user potentially might want to come in to say you know I want to drill into this particular number, what is this $504.

Matthew Halliday: Now, because we're on the bronze tables, we've done that aggregation on the fly for you, but I can actually get in and see what that transaction is. So if I go in here, and maybe just for ease of use, I'll just duplicate this.

Matthew Halliday: So I created a copy of the same table, but i'm just going to move things around a little bit here i'm going to change it from a pivot table into a detailed listing table so did that i'm just going ahead and.

Matthew Halliday: bring this down here bring that down and then I will bring this down this down here and just change this to a listing table.

Matthew Halliday: And so now at this point there are no

Matthew Halliday: Aggregations being done, right, this is just looking at the raw data. I could go in and bring in some of these additional fields if I so wish, any of these things, just drag and drop, but in the interest of time I'm just going to go ahead and save it as is.

Matthew Halliday: And now, when I go back to my dashboard, it's running again against the live data; this isn't caching the data at this point.

Matthew Halliday: Certainly Incorta is smart, so that if you keep asking the same question and the data has not changed, we don't keep running the query needlessly.

Matthew Halliday: But let's see here, there's 501.50. What is this number? I need to understand what it is. I can say, let me filter by all of these values.

Matthew Halliday: I filter by them, and then you'll see my pivot table just gets that one point, and then here are the individual rows

Matthew Halliday: That make up that number, right. So if you add these up between the debits and credits, you come out at 501.50.

Matthew Halliday: So you've got that tie back between the two, so it becomes really powerful when you want to navigate through this level of financial data in this particular use case,

Matthew Halliday: Which I think showcases very well the power of having that bronze data or that 3NF data always at your fingertips and being able to query it

Matthew Halliday: Without going through all those data steps and pipelines that we talked about. So simplifying and taking advantage of cloud technology is exactly what we've tried to achieve here, and hopefully you've seen that in the demo.

Matthew Halliday: I'll stop here and just go back to some slides, and then I think we move towards the Q&A time.
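
The tie-back between a pivot cell and the rows behind it can be shown with a small sketch. The transactions below are invented so that one cell comes to 501.50, matching the figure in the demo; the column names and the `pivot` and `drill_down` helpers are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal

# Invented detail rows: debits are positive, credits are negative.
transactions = [
    {"year": 2019, "channel": "Direct", "amount": Decimal("700.00")},
    {"year": 2019, "channel": "Direct", "amount": Decimal("-198.50")},
    {"year": 2019, "channel": "Partner", "amount": Decimal("120.00")},
    {"year": 2020, "channel": "Direct", "amount": Decimal("90.25")},
]


def pivot(rows):
    """Aggregate on the fly: total amount per (year, channel) cell."""
    totals = defaultdict(Decimal)
    for row in rows:
        totals[(row["year"], row["channel"])] += row["amount"]
    return dict(totals)


def drill_down(rows, year, channel):
    """Return the unaggregated rows behind one pivot cell."""
    return [r for r in rows if r["year"] == year and r["channel"] == channel]


totals = pivot(transactions)
detail = drill_down(transactions, 2019, "Direct")

print(totals[(2019, "Direct")])
for row in detail:
    print(row["amount"])
```

Because the aggregate is computed from the same detail rows that the drill-down returns, the listing always sums back to the pivot cell, which is what makes the number verifiable.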

Matthew Halliday: So yeah here we go just share my screen.

Matthew Halliday: Right, so I really hit on some of these you know these three things.

Matthew Halliday: You know unrivaled data access having all of that data at my fingertips and being able to make any business question you know, a reality and get answers to it is huge.

Matthew Halliday: Hopefully you saw that in the DEMO you know, trusting numbers is key if you don't trust numbers it doesn't matter if they're fast.

Matthew Halliday: It's: can I have it fast, and can I verify it? When those two things come together, that's when you see transformation and change in the way that people think and engage with their data.

Matthew Halliday: Faster time to insights: can I get to data curiosity, can I ask new questions, can I see the data a lot closer to real time? That becomes a possibility when you don't have to transform the data all the time before you can actually do anything with it.

Matthew Halliday: I wanted to make sure that you're aware that there is a free trial. You can go sign up; just go to incorta.com or the link on the bottom and go ahead,

Matthew Halliday: Provide an email address, and we'll send you a link. Click on that link, which you'll get within probably 10 to 15 seconds of doing that.

Matthew Halliday: You will have an environment that you can go into and experience exactly what I showed for yourself, and we're certainly happy to help with that. And with that I'll move over to Louise and see if we have any questions.

Louise Westoby: All right, thanks, Matt and Matthew. As a reminder, please type your questions into the Q&A box and we'll address them live.

Louise Westoby: So, first question is for Matt: what are the reasons for not deploying data workloads in the cloud?

Matt Aslett: Yeah, so I touched on this earlier, and it's interesting. I think we've seen this question evolve, especially over recent years. Previously it was, why would you put workloads in the cloud? Now it's much more, why would you not?

Matt Aslett: Always, when we talk to people about this, security comes top of the list; it's always one of the first things people think about.

Matt Aslett: I think the actual importance of that has diminished over the years. I think people,

Matt Aslett: And businesses, have got much more comfortable with the security capabilities of the various cloud providers, and they've adapted

Matt Aslett: Their processes to take those into account. It's still something many people mention, but I think, more realistically, in terms of preventing workloads moving to the cloud,

Matt Aslett: Data sovereignty is clearly significant: regulatory reasons why data has to remain in a specific

Matt Aslett: Location. That doesn't mean it can't be in a cloud in a certain location, but it definitely puts a

Matt Aslett: Block on the ease of moving that data around. And then obviously performance, again regional requirements,

Matt Aslett: Latency, and then also, definitely, what we see as a significant influence still is the existing investments that organizations have made in

Matt Aslett: On-premises infrastructure. It may not be that they intend to keep those workloads on premises forever, but certainly it's often throughout the lifecycle of

Matt Aslett: Their existing platforms. But again, as I said in my presentation, they should still be thinking clearly about

Matt Aslett: Those workloads potentially moving to the cloud, or the next version of those applications potentially being in the cloud, because clearly the overall trajectory is strong

Matt Aslett: Towards the cloud, even if there are some workloads where it still makes sense, for various reasons, to be on premises. Okay.

Louise Westoby: Thanks, Matt. Matthew, question for you: can you please go into a bit more detail on the data enrichment that Incorta does or can do?

Matthew Halliday: There are a number of ways that you can do this inside of Incorta. I'll talk about what the platform provides and then I'll talk a little bit about what the blueprints provide for you. So the platform enables you to do this in

Matthew Halliday: Three, four places actually. So the first one is you can define things at ingestion, so at the time when we're actually creating that Parquet file.

Matthew Halliday: We showed in the example today that we were just creating Parquet based upon a select statement. You can actually,

Matthew Halliday: Outside of that select statement, create formula columns, which enable you to create a definition using a formula builder, very much like Excel,

Matthew Halliday: That will create and calculate fields at the time that we're ingesting the data. So as we bring the data in,

Matthew Halliday: It'll create those calculations and store them in the Parquet file, and for everyone using it from that point on, it will almost feel like that data was in the source system. It will be in Parquet

Matthew Halliday: And it will reside there. So this is super great if you wanted to say, we don't want to keep the status code as is, we're going to do a transformation and replace things; you can do it at that point and store it.
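
An ingestion-time formula column of the kind described here, where a cryptic status code is decoded once and stored alongside the source data, could look roughly like this. The code meanings in `STATUS_LABELS` are invented for illustration (the talk does not say what A, C, and T stand for), and `add_status_label` is a hypothetical helper, not the product's formula builder.

```python
# Hypothetical meanings; the real ones depend on the source application.
STATUS_LABELS = {"A": "Active", "C": "Closed", "T": "Terminated"}


def add_status_label(row):
    """Return a copy of the row with a derived, human-readable status column."""
    enriched = dict(row)
    enriched["status_label"] = STATUS_LABELS.get(row["status_code"], "Unknown")
    return enriched


source_rows = [
    {"order_id": 1, "status_code": "A"},
    {"order_id": 2, "status_code": "C"},
    {"order_id": 3, "status_code": "X"},  # code with no known label
]

# The derived column is computed once, as rows are ingested, and then stored.
ingested = [add_status_label(row) for row in source_rows]
for row in ingested:
    print(row)
```

The original `status_code` is kept next to the derived label, so the enrichment adds usability without discarding source detail.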

Matthew Halliday: You can also, inside of Incorta, provide the enrichment through what we refer to as materialized views. Materialized views are an optional step, where you can go in and say,

Matthew Halliday: I want to do some processing on my data, and this is more generally programmatic, so maybe it's a machine learning model, maybe it's,

Matthew Halliday: Well, we even have one customer who did fixed asset depreciation inside of our PySpark MVs rather than running it on the ERP system.

Matthew Halliday: So you see a lot of different creative ways that people are using that, leveraging it, and enriching that data to provide more value to the business.

Matthew Halliday: That's an area where really whatever you can imagine you could build in Python, you can do, so that really opens up a rich capability.

Matthew Halliday: The third area is you can do the same formula columns, not persisted in the Parquet, but within the business schema. The business schema, I didn't really get into that today,

Matthew Halliday: But I wouldn't put business users directly on top of a 3NF data model.

Matthew Halliday: I would create these semantic views that sit in between, which Incorta provides and refers to as business schemas,

Matthew Halliday: Which are just metadata that says here are the columns and the categories in a way that you'd want to look at them, and in there you can actually create these formula columns as well. Now those are not persisted, so they're calculated on the fly when you.

Matthew Halliday: And then the fourth one is you can do these formula columns in the visualization layer as well.

Matthew Halliday: So those are just things where you can say, hey, I just need to do this right now. Can I transform or change this data, can I enrich this data and make it more valuable?

Matthew Halliday: You can also upload additional data sets and join them in. We showed a very primitive case, in the sense that we were just going to one data source.

Matthew Halliday: Most of our customers are not just taking one data source; they have a lot more than one, and they join and enrich that data by getting customer 360 views,

Matthew Halliday: By looking at multiple systems, like maybe their collections, but also looking at maybe Salesforce and their ITSM and their ticketing systems, to really understand what is going on with
