WEBINAR Series

Rightsizing Your Salesforce ETL Solution

After outgrowing the free tools, it’s important to plan the next best steps. This session explains how to choose the right ETL tool for your needs.
Host
Leonard Linde
Keynote Speaker
Mark Smallcombe
Chief Technology Officer at Integrate.io
Areas We Will Cover
Salesforce, Integrations

After outgrowing Salesforce Data Loader and the Salesforce Data Import Wizard, it’s important to plan the next best steps. This session, featuring Leonard Linde (Salesforce Specialist) and Mark Smallcombe (CTO of Integrate.io), explores the state of ETL tools as they’re available today, focusing on how one goes about choosing the right tool for their ETL task. 

This Xforce Data Summit talk was a free-wheeling session drawing on the expertise of both the guest and the host. Core topics include the unique challenges users face on the Salesforce platform, current integration trends in the market, how to select the best integration platform for your needs, the total cost of ownership of various integration solutions, and key tips for integration projects.

“Rightsizing Your Salesforce ETL Solution” is ideal for any organization that is in need of a more customized ETL solution — especially those who aim to ETL their Salesforce data. If you’re looking for a more optimal solution in terms of data warehousing and analytics, this resource will guide you along your ETL journey. From industry trends to tips on how to select the best integration platform for your unique needs, keep this guide on-hand to better support upcoming integration projects.

TRANSCRIPT
  • Unique challenges of the Salesforce platform (e.g., issues with native tools, ETL limitations) (2:55)
  • Integration trends seen in the market (10:35)
    • Privacy and compliance (10:45)
    • Total cost of ownership (13:59)
    • API “platforms” (17:05)
    • Decentralization of data (17:45)
  • How do you select an integration platform? (20:33)
    • Core systems (20:45)
    • Team capability (22:36)
    • User experience (23:26)
    • Cost (24:07)
    • Data privacy (25:58)
    • Scalability (26:46)
    • Real time or batch (28:00)
  • TCO of bi-directional Salesforce integration solutions (35:09)
  • Top tips for integration projects (43:54)
    • Scope (44:34)
    • Systems (45:55)
    • Data flows (46:40)
    • Transformations (46:52)
    • Incremental development (47:43)
    • Project management (49:24)

[00:00:00] Welcome to another X-Force Data Summit session. Today, it's going to be me and Mark Smallcombe. I'm Leonard Linde — I've got a lot of experience in a lot of different platforms, including Salesforce. I've been a software developer for 30 years, I think. I've also been the CIO of a small company.

[00:00:37] I founded my own software company and I’ve done a lot of different things. And with me today is Mark Smallcombe. He's the Chief Technical Officer at Integrate.io. And I'll let Mark tell you a little more about himself. 

[00:00:47] Yeah. Thank you, Leonard. Um, yeah, I'm CTO at Integrate.io. I am head of development, support, and operations at Integrate.io. I've worked in Silicon Valley for a number of years, for big companies such as Pinterest and Funny or Die and Citysearch. I've moved my way back to Sydney where I live now and manage our global team. I've been involved in a lot of data projects over my time. I’ve worked at Deloitte in Sydney.

[00:01:14] I also worked at NewsCorp and some very, very small companies and very large companies, and have seen a number of things and have worked with a number of different Salesforce environments as well during that time. Thank you, Leonard.

[00:01:29] Alright, well let's, let's get this party started and take a look at our agenda today. We've done the introductions. Our topics are, this is going to be more of a freewheeling session, but the first one is going to be the unique challenges of the Salesforce platform. There's quite a lot there. We're just gonna touch the surface of that — but the important point there is that Salesforce is a unique platform.

[00:01:51] It's not like, you know, the database or data warehouse environments that a lot of different integration tools deal with. Mark's going to talk a little bit about the integration trends we see in the market, and I'll pitch in a bit on that. How do you select an integration platform? What are the factors you use to select that?

[00:02:10] We’re going to talk about that and we've got a little bit of experience-based knowledge on that. For integration solutions, we think that total cost of ownership is a real big factor, especially in the Salesforce arena. And we're going to go over a chart that shows you the wide range of integration that's available on Salesforce from free to very expensive, and try to give you some insight into how you might want to right-size your integration choice.

[00:02:41] Then we're going to close out with some tips and tricks on integration projects. And then we'll finish out with a few final words. So with that, let's start talking about the Salesforce platform. So, Salesforce is interesting. For one thing, I think it's probably lagged a lot of other, what you might call database platforms in terms of getting integration tools that are purpose-built for the platform, simply because Salesforce itself provided a lot of really good, or good enough, tools for free with the platform.

[00:03:18] We start with the new tool in town, which is the Salesforce Data Import Wizard, which basically through a web interface allows you to load data directly into objects in Salesforce. Prior to that, and still existing, is the Salesforce Data Loader, which lets you take data from Excel spreadsheets and push it into Salesforce and also take data out of Salesforce and put it into an Excel spreadsheet.

[00:03:42] And of course, you can't forget reporting. Salesforce has always let you export reports into CSV files, and you can actually do quite a lot with that in integrating your data. *Cough* Alright, well, that's the first thing that I'm going to have to write down, get rid of that cough.

[00:04:10] Right, uh, another interesting challenge of the platform is that it's unlike most relational database platforms. Even if you're running SAP, you're running SAP on Oracle or SQL Server or some other database, and almost every ETL tool supports that. Salesforce has a fairly unique object database that isn't supported by every tool.

[00:04:30] And that's important because, and this is another factor, some of that support is unidirectional. They'll take data out of Salesforce, but they won't put it back in. That's something to think about when you're looking at ETL tools, especially when your organization is buying an ETL tool. And that brings me to the next dilemma or conundrum that you come up against with Salesforce, which is that if your organization's buying an ETL tool, sometimes you'll be left in the dust, because Salesforce is often implemented

[00:05:01] in what I call a Rebel Alliance implementation, which means that somebody has been fed up with whatever CRM tool, or lack of CRM tool, that centralized corporate IT has decreed would be the product for the company. And they went out and bought Salesforce, they got some money and they bought it, and they hired an administrator, and that company ends up having a DIY culture.

[00:05:25] By that I mean that you're not going to be relying on central IT to support you. With a software-as-a-service platform like Salesforce, a cloud platform, you can often support yourself. And that leads to some more interesting cultural factors about Salesforce. And I'm gonna use Ian Gotts's terms for this. Ian made a really interesting presentation, which I would recommend to everybody as part of the conference.

[00:05:50] And he came up with two terms. One is accidental administrator. That's not original to Ian, but it's a really common Salesforce cultural factor, which means that people who are Salesforce administrators often grow into the job. They might've been in sales to start with, or sales ops, and then they become a Salesforce administrator.

[00:06:10] They don't have an engineering background, and that's not a knock against those administrators. They can be very effective. But what often happens to an administrator, as a side effect of not having an engineering background, is what Ian calls uncontrolled agility, what I call a kid in the candy store.

[00:06:27] Which means that you have this tool that is very powerful. You can do anything with it. And you do. You end up building tons of custom objects, tons of custom fields. So, what happens is you have kind of an out-of-control Salesforce instance that you have to roll back, and that makes integration hard. If you have tons of custom objects and tons of custom fields, a lot of which you might've abandoned, then when you pull up your integration tool and point it at Salesforce, you get this giant list of objects and you're wondering which of those objects you really want to target or really want to take data out of.

[00:07:06] And you might be the second or third administrator in that instance, that Rebel Alliance instance, and you have no idea which of those objects are relevant to today's task and which aren't.

[00:07:17] Another interesting or problematic part of the, uh, Salesforce equation is governor limits. It's very easy to point a tool at Salesforce and just roll through your API limits very quickly.

[00:07:30] Now Integrate.io, not to plug our tool too much, but Integrate.io, like any good Salesforce integration tool, uses the Salesforce Bulk API. Because it allows you to be a good citizen in a multi-tenant software-as-a-service platform like Salesforce, the Bulk API is much easier to use.

[00:07:55] It doesn't use up all your governor limits right away. But if you're doing a roll-your-own Salesforce integration, don't do what I did the first time I ever pointed something at Salesforce, and blow through your governor limits right away because you don't know what you're doing and you're running queries and retrieving one row at a time with every API call.
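
To put rough numbers on the one-row-per-call mistake described above, here is a minimal sketch in Python. It does not call Salesforce at all: the helper `api_calls_needed`, the record count, and the batch size of 10,000 are illustrative assumptions, not actual Salesforce limits.

```python
import math


def api_calls_needed(record_count: int, records_per_call: int) -> int:
    """Return how many API calls it takes to move `record_count` records
    when each call carries at most `records_per_call` records."""
    if records_per_call < 1:
        raise ValueError("records_per_call must be at least 1")
    return math.ceil(record_count / records_per_call)


# Hypothetical workload: 100,000 records to retrieve.
records = 100_000

# Retrieving one row per call spends one API call per record.
row_at_a_time = api_calls_needed(records, 1)

# Batching (the idea behind a bulk-style API) spends far fewer calls.
# The batch size is an assumption chosen for illustration only.
batched = api_calls_needed(records, 10_000)

print(f"one row per call: {row_at_a_time} calls")
print(f"batched:          {batched} calls")
```

The point of the comparison is only the ratio: whatever the real daily allowance is for a given org, a row-at-a-time loop consumes it in proportion to the number of records, while a batched approach consumes it in proportion to the number of batches.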

[00:08:17] Uh, it's fine if you learn that lesson by doing it once. It's very painful if you do it in a production environment when you're on deadline. And the governor limits exist for a very good reason: Salesforce is a multi-tenant object-oriented database. Now, a multi-tenant object-oriented database is going to be optimized for transactional performance.

[00:08:42] In other words, it's optimized to throw a screen up to a user. It's not optimized to do bulk queries, and also, as an object-oriented database, it doesn't have native adapters. Again, if you're in an environment where you have a traditional relational database, or even a columnar data warehouse database like Amazon Redshift, you can use ODBC or JDBC, JDBC in the case of Redshift, to go and query that database.

[00:09:09] You don't have that in Salesforce. All Salesforce queries have to go through the API. So, there's no — there's no SQL. Yeah. It's all SOQL, which is certainly not SQL. It's not, it's much more limited. And I want to, I want to recommend just for pure interest, another presentation that we have in the, in the conference by a guy named Daniel Peter.

[00:09:35] And Daniel's a guy who has been saddled with doing some very unnatural acts with Salesforce in terms of getting performance out of the object-oriented database. And Daniel has really jumped through some hoops for customers who had requirements like, oh, we need to bring down a hundred thousand records and do something with them in almost real-time.

[00:09:53] Now, if you had a relational database, and it's on a reasonably functional platform, bringing down a hundred thousand records is nothing, really. But in Salesforce, if you watch Daniel's presentation, you'll see what he had to do to get that accomplished. So, I think the take-home from this slide, or this topic, is that when you're looking at an integration tool for Salesforce, there's a lot of things that are different from integrating other platforms, things that, even as an experienced integrator, you might not expect when you come to the Salesforce platform.

[00:10:31] Mm, okay. Thank you.

[00:10:35] Now on this slide, we want to talk about four integration trends that we see in the market. So we're seeing this from our customers coming in. The big one is privacy and compliance. This has been going on for a while. You know, GDPR and other standards are coming into the market. The next ones we want to talk about are total cost of ownership, API platforms, and then decentralization of data.

[00:10:59] So if we talk about privacy and compliance, there's a number of big standards. As I mentioned, there's GDPR. There's also CCPA, which is the California Consumer Privacy Act, that's coming into force. I think that came in this year, actually. So a lot of people are focused on that and figuring out what to do.

[00:11:17] There's also Brexit, which I think is pretty interesting. So obviously the UK was part of Europe and under GDPR, and now with Brexit, that all gets turned on its head and they've got to figure out what they're doing next. There is the UK Data Privacy Act, which they have in place. And so the rumor is they're going to almost do like a UK GDPR, which hopefully will have no material difference, but it's certainly something to watch.

[00:11:43] And the reason why it's important for integration is, you know, one of the big things you have to do is keep your data in Europe or protect it in Europe. And so if you want to take it out of Europe, you need to start, you know, encrypting or removing data. And so common things we see are people wanting to anonymize data.

[00:12:03] They want to remove data, they want to mask data, or actually, they want to encrypt it. And we see that for GDPR, and we also see that for customers who have PII data or even PHI data, which is for healthcare industries. And so they’re important trends that we see, because these governance regulations have, you know, very material fines.
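
As a rough illustration of the removing and masking steps mentioned above, here is a minimal Python sketch that scrubs a record before it leaves the pipeline. The field names (`ssn`, `email`, `name`), the salt, and the helper functions are all hypothetical. The salted hash shown is pseudonymization rather than full anonymization, and real encryption is left out because it needs a dedicated cryptography library.

```python
import hashlib


def mask_email(email: str) -> str:
    """Keep the first character and the domain; star out the rest."""
    local, _, domain = email.partition("@")
    return local[:1] + "*" * (len(local) - 1) + "@" + domain


def pseudonymize(value: str, salt: str) -> str:
    """Replace a value with a salted SHA-256 hex digest (64 characters)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def scrub(record: dict, salt: str) -> dict:
    """Return a copy of `record` that is safer to move out of region."""
    clean = dict(record)                                # leave the input untouched
    clean.pop("ssn", None)                              # remove the field entirely
    clean["email"] = mask_email(clean["email"])         # mask
    clean["name"] = pseudonymize(clean["name"], salt)   # pseudonymize
    return clean


# Hypothetical source record.
record = {
    "name": "Jane Doe",
    "email": "jane.doe@example.com",
    "ssn": "000-00-0000",
    "plan": "gold",
}

print(scrub(record, salt="demo-salt"))
```

Non-sensitive fields such as `plan` pass through unchanged, which is usually what analytics needs downstream.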

[00:12:29] What we're seeing is that this is a trend that goes right the way up to the board level of these companies, and they want to manage the risk of that, so the decisions are coming in from the CTO. So, it's less about the data engineer making the decision.

[00:12:44] It's actually being made at the CFO, CTO level, that they want to basically minimize their risk and want to make sure all the data is compliant with the standards that they’ve applied to their business. To give you an example, a recent example for us, actually, we had a customer come in. The company's based in Frankfurt, and for a compliance reason, I'm not sure what the reason was, maybe it was because it was a financial services company.

[00:13:14] But they need to keep all their data in Frankfurt. And the beautiful thing about the cloud now is, that's actually possible. A few years back, that would be a complete nightmare. You know, you'd have to set up a data center and hire all the people and have all the infrastructure in Frankfurt.

[00:13:27] We were lucky, we could spin that up with Kubernetes and Hadoop in a new AWS region. We had it done within a couple of weeks, which is fantastic, and something that you couldn't have done probably even five years ago. It wouldn't have been possible. Now with all the cloud vendors like AWS and Google and all the other ones out there, they all have multiple regions and it makes this much, much easier.

[00:13:54] So for people who are doing integration work, that is a huge benefit, I think. For total cost of ownership, I think there's a few things people are doing — they're looking at different options. So there's lots of different solutions for integration.

[00:14:09] You can pick a low end, point to point solution. And that's obviously low cost, it's got no functionality, probably doesn't do many transformations — but if your only goal is to get data, maybe out of Salesforce, you know, just a single pull, that might be a good fit and it might be a really good way of doing your ETL.

[00:14:30] Another thing we see is people who want to do ETL with transformations. So they want to do all this before the data is stored, and it's often done for compliance reasons. So they need to make sure that the transformations are done. As we talked about, you know, removing data or encrypting data. They want to do that before the data gets into their warehouse and before it gets dispersed really within the company, and before all that data is written to logs. The other trend we see is ELT, where the transformations are done in the warehouse. And that's another valid path to take as well, where you have a very basic extract, and then you push that into a warehouse and then you run your transformations later on in your warehouse to remove data or to reformat dates and things like that.
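
The difference between the two paths can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the warehouse, and the table names, sample rows, and the `to_iso` helper are hypothetical.

```python
import sqlite3

# Hypothetical extracted rows: (id, email, signup date as DD/MM/YYYY).
RAW_ROWS = [
    (1, "ann@example.com", "03/01/2020"),
    (2, "bob@example.com", "17/02/2020"),
]


def to_iso(date_ddmmyyyy: str) -> str:
    """Reformat DD/MM/YYYY as YYYY-MM-DD."""
    day, month, year = date_ddmmyyyy.split("/")
    return f"{year}-{month}-{day}"


def run_etl(conn: sqlite3.Connection) -> None:
    """ETL: transform in the pipeline, so the email never reaches the warehouse."""
    conn.execute("CREATE TABLE customers_etl (id INTEGER, signup_date TEXT)")
    transformed = [(row_id, to_iso(date)) for row_id, _email, date in RAW_ROWS]
    conn.executemany("INSERT INTO customers_etl VALUES (?, ?)", transformed)


def run_elt(conn: sqlite3.Connection) -> None:
    """ELT: load the raw rows first, then transform with SQL in the warehouse."""
    conn.execute(
        "CREATE TABLE customers_raw (id INTEGER, email TEXT, signup_date TEXT)"
    )
    conn.executemany("INSERT INTO customers_raw VALUES (?, ?, ?)", RAW_ROWS)
    conn.execute(
        """
        CREATE TABLE customers_elt AS
        SELECT id,
               substr(signup_date, 7, 4) || '-' ||
               substr(signup_date, 4, 2) || '-' ||
               substr(signup_date, 1, 2) AS signup_date
        FROM customers_raw
        """
    )


conn = sqlite3.connect(":memory:")
run_etl(conn)
run_elt(conn)

etl_rows = conn.execute(
    "SELECT id, signup_date FROM customers_etl ORDER BY id"
).fetchall()
elt_rows = conn.execute(
    "SELECT id, signup_date FROM customers_elt ORDER BY id"
).fetchall()
print(etl_rows)
print(elt_rows)
```

Both paths end with the same cleaned table. The practical difference is that in the ELT path the raw `email` column has been stored in the warehouse (`customers_raw`), which is why the ETL path is often chosen when compliance is the driver.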

[00:15:25] And then the other trend we see is people wanting to build it themselves. And this could be a really good fit if you've got a very strong team. So, if you have a very large data engineering team, that could be a great way of doing this. And it means that you can centralize everything. But the — there's lots of downsides obviously about building yourself.

[00:15:47] You've got to maintain it — you've got to make sure that you can hire all those people for your team. So we see that happening with, you know, some very large orgs, where they have the ability to hire large engineering teams. And that gives them the flexibility because they can almost do whatever they want, whatever that is out there, and they can choose whatever product they want.

[00:16:09] So when we talk about total cost of ownership, it's really not one-size-fits-all, and there's not one-size-fits-all from the vendor point of view. There's lots of great vendors out there. Obviously, Integrate.io included, and I'm biased on that. But there's lots of different paths that you can pick, which actually depend on where you are on your integration journey.

[00:16:33] Like I mentioned, you might start off with, point-to-point might be a great solution at the beginning, and then later on you might decide that you need to do some transformations or you need to write stuff into Salesforce — and then you can go down that journey. And, sometimes people start with, in house development, that's another path where they start.

[00:16:52] And then they find that it gets to be a maintenance problem for them and they — they need a vendor. And that can also be a valid path as well. The other thing we see is this, the emergence of API platforms, which I think is quite interesting. Everyone wants to call themselves an API platform now, and it feels like, almost like the new word for enterprise service bus, which is probably an old term.

[00:17:19] And Leonard and I probably have worked with that many times in other projects, but it's now — API platforms is the big thing. And the very large vendors like MuleSoft, I think, and other ones, and Informatica and vendors like that, are promoting their API platforms, which — which is interesting, I think. 

[00:17:43] And the fourth trend we see is this decentralization of data, which we've seen with a number of customers. And what's happening is, originally everything was centralized into IT. So all integrations were done in IT, probably with a big product like Informatica or maybe even custom-built work.

[00:18:05] And what we see, is that that worked well, but it meant that people in the business, like the business analysts, had to always go back to IT to do all their pipelines and it made them slow — it made them less agile. They couldn't, they couldn't get this work done quickly enough. And we'll have some examples of that.

[00:18:24] And yeah, what they're finding now, is they want to almost push all this work out to the data scientists and the business analysts who are working with the business units because there's been a decentralization of data science as well, and visualization. And so these, these, BAs are actually sitting in the business unit and they don't want to wait for IT to do this stuff.

[00:18:44] Almost like you mentioned Leonard about Salesforce, and how there is a culture of getting stuff done and almost getting around IT — and just, I need sales information today. What they're finding is that BAs want to do stuff immediately, they want to build their own pipelines. They don't want to wait for central IT to do it.

[00:19:02] They want to have a tool that can do that, and that changes a bit the tool they choose. So they don't want to have a development platform where they have to write code. They are sort of self-selecting platforms that are low-code, drag-and-drop type platforms that they can run and maintain and look after, but that also fit, you know, IT's needs. So, you know, there's strong security, they know how to do encryption and all those sorts of things, so that there's a bit of a blend of development as well as decentralization of the creation of pipelines as the data teams are being decentralized.

[00:19:43] Yeah, I just want to add Mark, in the presentations in the conference from Integrate.io customers, except for, except for one, it's all data analysts who are doing a DIY solution. And I also wanted to plug when you're talking about privacy and compliance of PHI in the United States, you mentioned private health insurance, private health information.

[00:20:08] Matt Gladscow has a presentation where he talks about Salesforce for healthcare that people are interested in. You know, healthcare and some of the issues around privacy and so forth. He does touch on that in that. 

[00:20:24] Oh that’s great. I'll make sure I check that one out as well. Great, so the next slide is, is really to go through some of the things that we see about ‘how do you select an integration platform’ and some of the decisions that you need to make. So when you look at the core systems, you know, the first step is to look at the core systems that you want to integrate to and basically look deeply at them and figure out what, what opportunities are there?

[00:20:54] You know, you often have a business requirement to do an integration and get some data out, but there might actually be other opportunities, other business opportunities that come up because of this new platform or this new integration you're going to do.

[00:21:07] And it's important to check the vendor support for the integrations you want to perform. We see a lot of people who have integration platforms that, for instance, read out of Salesforce, but they can't write into Salesforce. Which is fine at the beginning — you think, oh, that's fine. I only need to read.

[00:21:27] You know, this is the only thing I want at the moment. But you know, two years down the track, you probably find that you actually want to write stuff in Salesforce as well, and update with some information from your ERP system or something like that. And to do that, you'll actually have to probably change vendors.

[00:21:42] Which is an enormous total cost of ownership issue because most of these vendors, it doesn't matter which vendor you pick, the pipelines are normally proprietary, and they need to be rewritten when you switch vendors, which is a big cost. So, recommendation there is, check all the systems you want to integrate with — not just the ones you want to do today, but also the ones you want to do in the future. 

[00:22:08] Just make sure that you have that covered and they can do those things, because it can be painful if you learn a year later that you need to, let's say, write to Salesforce. And the other thing with Salesforce is, like you mentioned, you need to make sure it can handle the governor limits or uses the Bulk API. That's another important criterion, I think, for Salesforce.

[00:22:31] And not all the vendors do that. So worth checking into that. And then team capability I think is important. And there's, you know, there's not a one-size-fits-all on this either, but it's, you know, who's going to build and maintain these pipelines.

[00:22:47] You know, is this going to be a developer? If you've got a team of Python developers who are really skilled at that, they're maybe happy working in a proprietary language for a vendor. You know, or is it going to be a BA or a data scientist to do this work?

[00:23:06] And so, yeah, looking at your team, you know, the ability and also looking at whether this is central IT or is this going to be like you mentioned, distributed amongst all the business units — cause that can make a huge difference to the platform you choose as well. And then related to this as well is actually the user experience.

[00:23:27] And what I mean, is that it's more the user experience of the integration platform. So if you're, if you are a business analyst, you probably want a low code, drag and drop type platform that's easy to maintain — quick to use. If you're a developer, you probably don't want that at all. You want, you know, SDKs and workflows, and you want integration into Airflow, and more hardcore tech stuff. And neither is —  one isn't better than the other. 

[00:23:55] It's just that they're different user needs and you have to pick the platform that fits your users, your customers in your company — and so it's worth thinking about that. And then on the cost side for these integration platforms, often cost is aligned with, breadth of functionality is how I see it.

[00:24:15] So, you know, if you want low functionality like you want, point-to-point, there's lots of great vendors out there that will do that. And, they have a low price point, and they have limited functionality. And then on the other extreme, there are really massive companies like, you know, MuleSoft and Informatica, Dell Boomi, companies like that, that have tons of functionality, way beyond ETL and integration — you know, data governance and data lineage, and all sorts of things.

[00:24:46] Which is fantastic if you need all that. But it's good to know what you really need, rather than taking this massive thing when you might only use a sliver of the functionality. So again, just figure out where you are on that continuum. One extreme is point-to-point. The other extreme is very large enterprise vendors that have everything in one package.

[00:25:12] And then, the other thing to think about when you pick a vendor is that, yeah, you probably want to think of like a three- to five-year timeline. I think maybe even longer, because it's a lot of work to do integrations, and these pipelines take time to build and, you know, you have to investigate systems and it's an investment for a company. So you don't want to switch them out. You don't want to have a vendor for one year and then have to replace everything the year after. So when you work with these solutions, find a solution that you think you're going to be comfortable with for the next, you know, three to five years, where the vendor can grow with you and you can grow into this solution.

[00:25:53] And you feel comfortable with it. I think that's important. And then on the data privacy side, we mentioned, you know, GDPR, if it's important for you, you know, make sure that the vendor can do a data privacy agreement, a DPA, you probably want a vendor that can, that has security certifications like SOC 2.

[00:26:15] And then if you want to do things like HIPAA, you need to have a vendor that can handle that and has the, you know, the security standards that you feel comfortable giving them your HIPAA data. So I think data privacy is important. And then the other thing we see is some companies want to actually encrypt data.

[00:26:31] And so, encrypted fields of data. And so if you want to do that again, pick a vendor that can do things like field-level encryption, and you feel comfortable that they can, they can handle that level of sophistication. And then on the scalability side, this again is something that's worth thinking about right from the start.

[00:26:53] Because how big are your data sets? Like how much data do you actually want to, move in and out of your systems? And some, again, like that can actually impact which vendor you pick. So if you have like, absolutely enormous data sets, then you need to sort of make that as part of the requirements right off the bat, to make sure they can handle that.

[00:27:14] And the other thing is data latency, which is a bit of a subtle one that you might not think about. It's almost like, where are the locations of the data centers for the integration? And so say I'm on-prem in, say, New York, and I have a database there.

[00:27:32] And, you know, I want to do some integration work, and maybe the other system is in a different region. Now, that might not be possible with some vendors. Some vendors might not have a data center in New York or not be able to write data locally in New York. And so understanding the location of the data centers in relation to your data and your warehouses can be important for latency.

[00:28:00] And the next one I've got is real-time or batch. So this is another trend that we are starting to see a little bit, is that people are talking about the streaming of data and, you know, there's Kafka and Flink, and other solutions like that. And there's some vendors like Alooma, which just got bought by, I think, Google, that's one of the vendors in this space.

[00:28:27] So the question is, do you really need real-time? Is that critical for your business? And for some it is — they need, you know, to move data around within a minute or 30 seconds or something like that. And that could be a business use case that reduces the number of vendors that are capable of doing it.

[00:28:47] Often, most companies can work in a batch. Probably not a day batch now, I think, where everything is moving a lot faster, but you know, five- to 10-minute batches are actually almost like real-time for 99% of all companies. And so again, when you are selecting an integration platform, you know, don’t necessarily fall in love with real-time if you don't need real-time.

[00:29:10] Pick a platform — pick the tool that you really need and you might find that batch is actually exactly what you need for your organization. The other recommendation I have is to create a matrix of your requirements before you actually talk to vendors, and so it's very easy to talk to vendors and get swayed on what's important because that's what the vendor has in that product set.

[00:29:37] And I think it's much better to independently go through what you need from an integration platform, or, in fact, whatever technology you're picking, is to go through that — create a spreadsheet, create the columns in a spreadsheet with the different features you want, you think of what’s important and why they're important.

[00:29:55] Get buy-in from your team that that is actually true, and that's really what you all want. And understand that. And then when you go to market, you can talk to the vendors and actually see how they rate. Because unless you're going to go for the really, really huge vendors, with most of the vendors there'll be a set of trade-offs.

[00:30:14] You know, some will have, you know, it might be data centers versus user experience or even a, you know, some other thing might come up — and so it’s good to have all those requirements in a matrix so you can actually go through them together. 
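The requirements matrix described above can be kept in a spreadsheet, but a minimal sketch in Python shows the arithmetic. The requirement names, weights, and vendor ratings below are made up for illustration; nothing here comes from the talk beyond the idea of agreeing on weighted requirements before scoring vendors.

```python
# Hypothetical requirement weights (1-5), agreed with the team BEFORE vendor calls.
WEIGHTS = {
    "transformations": 5,
    "salesforce_bidirectional": 5,
    "five_minute_batches": 3,
    "data_center_region": 2,
}

# Hypothetical ratings (0-5) filled in after talking to each vendor.
RATINGS = {
    "Vendor A": {"transformations": 4, "salesforce_bidirectional": 5,
                 "five_minute_batches": 3, "data_center_region": 2},
    "Vendor B": {"transformations": 5, "salesforce_bidirectional": 3,
                 "five_minute_batches": 5, "data_center_region": 5},
}


def score(vendor_ratings, weights):
    """Weighted average rating; a requirement the vendor was not rated on counts as 0."""
    total_weight = sum(weights.values())
    weighted = sum(weights[name] * vendor_ratings.get(name, 0) for name in weights)
    return weighted / total_weight


def rank(ratings, weights):
    """Return (vendor, score) pairs, best score first."""
    scored = [(vendor, round(score(r, weights), 2)) for vendor, r in ratings.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for vendor, value in rank(RATINGS, WEIGHTS):
        print(vendor, value)
```

Fixing the weights first is the point of the exercise: a vendor demo can change a rating, but it should not change what the team decided was important.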

[00:30:32] I was gonna talk about a couple of things I've seen, which are quite interesting, just a couple of big stories from a customer point of view. So on the agile side, I've seen, this is almost like an example of decentralization. So I went and visited one company and they actually received all these text files from banking customers, and the text files would all come in, in different formats.

[00:30:55] And they found it very hard with these big companies to get them all to, agree to a format and stick to a format. And so they'd find these, these files would come in, you know, and all sorts of stuff. And they were a very big —  a global company with a U.S. head office that had a U.S. data team and U.S. data integration team there.

[00:31:15] And what they found was it was really hard and time consuming when a new file format came in, to work with the U.S. team because of time zones. And, this is in Sydney, and they actually had to get the U.S. team to make the changes and then send them back the updated pipeline. And so that cycle probably took two days to do.

[00:31:36] And so, they actually picked, you know, Integrate.io for this solution, because what they wanted to do — was they needed the team to be empowered to make their own pipeline changes and make them in real-time. So when they got a file in from a bank, they could immediately make that change, and, and do it independent of central IT.

[00:31:57] And so, I thought that was a really interesting use case and very decentralized. And then on the scale side, what I've seen again with Integrate.io was, we have a big customer, a big pharmaceutical customer who's working with one of the big four consulting companies, and they're rolling out an Asia-wide implementation of Salesforce, using Integrate.io.

[00:32:23] And what they needed, in that case, is a sort of a scale thing —  they needed a lot of data out of their ERP system. So they needed to do complicated joins across multiple systems, and then they wanted to push that data in near real-time into Salesforce. So every five minutes they're getting data pushed into their Salesforce instances around the world.

[00:32:45] And I thought that was a pretty interesting, you know, example of how we've seen customers wanting to do that with Salesforce at scale. 
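A five-minute batch like the one described above is commonly built around a "watermark": each run picks up only the records changed since the previous run. The sketch below is one generic way to do that and is not how Integrate.io or the customer implemented it; the `modified_at` field and the function names are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed batch interval, taken from the "every five minutes" example.
BATCH_INTERVAL = timedelta(minutes=5)


def is_due(watermark, now):
    """True when at least one full interval has passed since the last run."""
    return now - watermark >= BATCH_INTERVAL


def select_batch(records, watermark, now):
    """Return records modified in (watermark, now] plus the new watermark.

    `records` is any iterable of dicts with a timezone-aware `modified_at`
    (a hypothetical field name).
    """
    batch = [r for r in records if watermark < r["modified_at"] <= now]
    return batch, now


if __name__ == "__main__":
    start = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
    rows = [
        {"id": 1, "modified_at": start - timedelta(minutes=1)},  # already sent
        {"id": 2, "modified_at": start + timedelta(minutes=2)},  # in this batch
        {"id": 3, "modified_at": start + timedelta(minutes=7)},  # next batch
    ]
    run_at = start + timedelta(minutes=5)
    if is_due(start, run_at):
        batch, new_watermark = select_batch(rows, start, run_at)
        print([r["id"] for r in batch], new_watermark.isoformat())
```

Using a half-open window means a record is never sent twice and never skipped, as long as the watermark is stored only after the push succeeds.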

[00:32:56] Yeah. I wanted to add on the real-time or batch thing. I did a, I talked to Thomas Speronic, I think his name is? I just murdered the pronunciation of his name, but he, he gave a presentation about, integrating from Segment — Segment is a massive data integrator that basically, it takes data from, he works for a company called Bitwise — takes data from all different systems, including internal systems that pushes, it basically pushes out a stream of JSON, from what I can tell. 

[00:33:28] And he uses Integrate.io, and his company uses Integrate.io — he implemented, he was a data analyst, and his implementation runs, I think, once a day or twice a day. But there's one specific event that he wants to capture in near real-time, which is when a customer — when there's an inquiry from a potential customer, he wants to push it into Salesforce as a lead immediately.

[00:33:51] And so it can be acted on immediately. And his choice to do that was to just write a Python script that used the Segment API and created a lead through the Salesforce API. And essentially, you know, I don't want to put words in his mouth, but what it seemed like to me was basically, look, the complexity of having a real-time solution for everything was unnecessary.

[00:34:18] And instead, we'll just write one for this very simple use case and push that in and basically run that. I don't even think it was, you know, triggered, I think they were just going to run it very frequently. So, you know, a hybrid approach is possible too in this real-time thing, because most of your data in any business situation is not going to need to be in real-time.
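The hybrid approach described above can be sketched as a small router: one event type is turned into a Salesforce lead right away and everything else waits for the scheduled batch. The actual script was not shown in the talk, so the event shape, the event name, and the function names below are hypothetical, and the sketch only builds the payload rather than calling either API. `LastName` and `Company` are required fields on the standard Salesforce Lead object, which is why they get placeholders.

```python
# Hypothetical name of the one event that should not wait for the batch.
IMMEDIATE_EVENTS = {"Inquiry Submitted"}


def route(event):
    """Decide whether an event is pushed right away or left for the batch run."""
    return "immediate" if event.get("event") in IMMEDIATE_EVENTS else "batch"


def lead_from_inquiry(event):
    """Map an inquiry event (assumed shape) to standard Salesforce Lead fields."""
    props = event.get("properties", {})
    return {
        "FirstName": props.get("first_name"),
        # LastName and Company are required on Lead, so fall back to a placeholder.
        "LastName": props.get("last_name") or "Unknown",
        "Company": props.get("company") or "Unknown",
        "Email": props.get("email"),
        "LeadSource": "Web",
    }


if __name__ == "__main__":
    inquiry = {
        "event": "Inquiry Submitted",
        "properties": {"first_name": "Ada", "last_name": "Lovelace",
                       "company": "Analytical Engines", "email": "ada@example.com"},
    }
    if route(inquiry) == "immediate":
        # A real script would send this payload to the Salesforce API here.
        print(lead_from_inquiry(inquiry))
```

Keeping the mapping separate from the API call makes the one real-time path easy to test without credentials for either system.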

[00:34:40] Yeah, I agree. And it's real-time in real-time, you know? 

[00:34:45] Yeah, there’s real soon time and then real, real-time.

[00:34:58] Exactly. Yeah. I mean, if you're in the military, then yeah, you need real-time. But most businesses, I mean, we used to do everything in, you know, 24-hour windows or even longer.

[00:34:58] And so real-time to most businesses is if it happens within an hour, I'm happy, I’m thrilled. And that's much better than I've got today. 

[00:35:11] Alright, we're going to talk a little bit about the total cost of ownership, and remember, these are the bi-directional Salesforce integration solutions. So if you don't see your favorite on here or one you've heard of, it's probably because it doesn't support a full, robust, bi-directional Salesforce integration. So, this chart goes from left to right, from a total cost of ownership of essentially zero over and above the Salesforce platform to, you know, infinity and beyond for all I know.

[00:35:38] But on the left side, we can see Salesforce is there because of the Salesforce tools I mentioned a few minutes ago. It comes with the free Data Loader and Data Import Wizard. And then there's Zapier. Now, Zapier is an interesting example, and there might be others, because Zapier has what are called zaps, which are very purpose-built integrations.

[00:35:59] For instance, you want to enter a lead into Salesforce from one of the, you know, 40 other systems that Zapier supports. Zapier can do that. So if that's what you need, if you have a, I don't know, I'm going to pick a system off the top of my head, Constant Contact or something like that.

[00:36:17] Bad example, but you have some kind of a web form that you need to take a lead from and push it into Salesforce, and there's a zap for that web form, and then from there, you're done. A bit further over in complexity is Dataloader.io, which I think many of the Salesforce administrators who are watching this presentation have heard of, because it's the number one integration app on the Salesforce AppExchange.

[00:36:45] Dataloader.io was purchased by MuleSoft. It's a completely different solution than MuleSoft. Essentially, it lets you pull data out from Salesforce and put it into a file, or take a Data Loader file and put it into Salesforce.

[00:36:55] How it's different from the native tools is it allows you to schedule those pushes and pulls, but other than that, it is pretty simple. And what, what distinguishes Integrate.io, which is a little farther down the TCO line here. What distinguishes Integrate.io from those to the left, is that Integrate.io data pipelines let you do transformations on the data, for one thing. 

[00:37:17] You bring data in from Salesforce, you do some work on it and you push it into one of the many targets Integrate.io supports. The other thing about Integrate.io is it does support a number of different targets, including databases, including REST APIs.

[00:37:33] So, as you go farther to the right on TCO, you tend to get a tool that has more power, has the power to transform data, and it has many more targets. And that's the case for Talend as we go to the right. And then, we get to Jitterbit and MuleSoft, which I'll talk about together because, as Mark was saying in that prior slide, Jitterbit and MuleSoft, and this is going to sound a little cynical, but remember, I'm a person that bought software and recommended software purchases.

[00:38:07] When you're a software vendor, at some point, you have to justify your existence. So Jitterbit and MuleSoft have decided that integration isn't enough. Where they want to go from integration is to be an API platform.

[00:38:19] So what they mean by an API platform is pretty much, as Mark said, an enterprise service bus. Basically, the idea is you're a company, you have many different integration points, some of them might be home-built, and you want to expose an API that others of your systems can use too, to access the services of those platforms.

[00:38:40] So not only do MuleSoft and Jitterbit allow you to just push data back and forth between platforms and transform it, they also let you build APIs. Do you need that? Is there a best of breed API builder and a best of breed integration solution that you might want to buy instead of buying one bundle?

[00:39:00] That's definitely something you should think about, and Boomi I think is similar in their strategy, but I don't know as much about them as MuleSoft. Let’s look at Informatica for a minute, cause they've taken kind of a different approach.

[00:39:17] It's very interesting. Instead of going down the API path, they've gone down the MDM path, master data management — and they want to help you build an enterprise data dictionary, and do enterprise data cleansing. So if you have Salesforce and you have homegrown systems and you have SAP or whatever your ERP is, Informatica promises that you can take all the data from all the systems.

[00:39:44] And put that into an Informatica data dictionary, manage it, and have a master data management scheme where you don't have siloed systems. You have one integrated one, so that even though you might have systems that were once in a silo, you'd have an updated data dictionary. The question I've raised to people considering Informatica as both an integration tool and an MDM tool is to compare the MDM toolkit or the enterprise data dictionary in Informatica with purpose-built Salesforce tools.

[00:40:16] And the one I'll pick out is DemandTools, which I think many administrators listening have heard of. DemandTools is a purpose-built data cleansing tool for Salesforce. It's incredibly powerful, I’m not selling it, obviously, but I've seen it in action. In the hands of a good administrator, you can do some really powerful transforms on your data and do a lot of data cleansing. And DemandTools is built just for Salesforce, and it handles all the oddities of Salesforce, like, you know, the governor limits. So, again, the message I'd give is, as you get to the right of the TCO line, to justify the cost of those products, they've added a lot of functionality. The question I would ask if I were in a procurement situation with those products is, are all the checkmarks you're going to get from that vendor, cause they'll give you a list of marks to check.

[00:41:22] Are all those checkmarks things that, A, you need, and B, are they best of breed? Or are they good enough for your situation? If the answer's yes, then those are the tools for you. If the answer is no, looking at a point solution, going to the left on the TCO graph, might make more sense.

[00:41:43] Also, just to note, one thing that isn't on this TCO chart is, homegrown API because that can be way over to the right or it can be way over to the left. If you've made it — if you built one little integration, and that's all the integration you'll ever need, that's very cheap. If you've — I'll, I'll talk about my experience.

[00:42:06] I was integrating Salesforce with a data warehouse, and Salesforce has an API, and in my defense, this was before there were as many great tools. I just went out and started writing code to the Salesforce API. Worked great, but then they wanted to integrate another object and another object, another object, and we had to write code to do each of them.

[00:42:28] And I had to rewrite my code to make it a little bit more of a framework so I could plug in new objects more easily. In the end, I'm writing, you know, a half-baked Integrate.io or Talend, or whatever, and not doing it as well because I don't have the budget, or, you know, I don't have a big team to do it.

[00:42:48] It's just me. That's definitely something you should think about. You know, where if, if you're thinking about writing an API solution, where are you going to be in three years? Where are you gonna be in five years? Yeah. You need to think about that. Don’t just go on, sit down at your keyboard, and hack out a piece of code and feel happy with yourself.

[00:43:08] Cause, you might've just bought yourself a headache. 
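The "little bit of a framework" described above usually means moving per-object code into configuration, so that adding an object is a new mapping entry rather than a new script. The sketch below illustrates that idea only; it is not the speaker's code. `Id`, `Name`, `Email`, and `AccountId` are standard Salesforce fields, while the warehouse column names and function names are invented.

```python
# One entry per Salesforce object: source field -> hypothetical warehouse column.
OBJECT_MAPPINGS = {
    "Account": {"Id": "account_id", "Name": "account_name"},
    "Contact": {"Id": "contact_id", "Email": "email", "AccountId": "account_id"},
}


def to_warehouse_row(object_name, record):
    """Translate one Salesforce record into a warehouse row using the mapping."""
    mapping = OBJECT_MAPPINGS[object_name]  # KeyError means the object is not configured
    return {column: record.get(field) for field, column in mapping.items()}


def register_object(object_name, mapping):
    """Plug in a new object without touching the transform code."""
    OBJECT_MAPPINGS[object_name] = dict(mapping)


if __name__ == "__main__":
    register_object("Opportunity", {"Id": "opportunity_id", "Amount": "amount"})
    print(to_warehouse_row("Opportunity", {"Id": "006xx", "Amount": 1200, "Name": "ignored"}))
```

This is also where the maintenance cost the speakers warn about starts to show: retries, API limits, schema changes, and scheduling all still have to be written and owned by someone.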

[00:43:12] Yeah, or even someone in your team, a headache as well. Like, we've had cases where, you know, customers build all the pipelines and then they find it hard to maintain them, you know, find it hard to hire great data engineers or Salesforce administrators to actually maintain that code.

[00:43:37] And so they want somebody that removes the maintenance burden from them, but still allows them to keep moving forward and developing.

[00:43:53] Okay. What's next? 

[00:43:53] Alright, what are some tips for integration projects? So some of the things I've seen, in fact, we're actually doing an integration project right now for our parent company. And so we're trying to integrate with Salesforce, Google Analytics, Intercom, Stripe for monthly recurring revenue across a whole range of companies.

[00:44:16] And the way we're doing it is we're, we're actually of course using Integrate.io, but where we're writing to BigQuery, which is our data warehouse, and then we're using Data Studio to do the visualization. The steps we take when we do these types of projects — first of all, you know, figuring out the scope.

[00:44:34] It's really digging into that and defining the scope. Define the objectives. And, related to project management, really get buy-in. Like you need to have executive buy-in to these types of projects. It's very hard to be successful if, you know, it's a Skunk Works thing that's done by one individual engineer.

[00:44:52] We really need to have like the executive owner of the system or systems involved and passionate about the project. Because there will be speed bumps and with executive buy-in, they help, get you over those speed bumps, which may be more budget or more resources or more time or whatever was causing, causing that speed bump.

[00:45:19] And then look at the tech as well when you're doing the scope — really look at what you need the tech to do. Like almost say what are the technical requirements? So how fast you'd need the data to be integrated, what sort of data volume are you going to pass between different systems, because that can make a significant difference to the solution you pick.

[00:45:39] And the, and the size of the scope — you know, if you, you know, a huge data volume, that might mean a completely different technical solution to be able to handle that. And then on the system side, really understand your integration point capability. So what are these systems — what is their API?

[00:46:01] Which API should you use? What are the requirements for the API, maybe on the security side —  how are you going to authenticate, maybe, on the security side, what type of data are you allowed to pull out of that system? Maybe there are data compliance issues that you need to adhere to.

[00:46:19] Maybe you need to anonymize data or mask it or delete it before it even moves out of a system for security reasons. And, most of those are internal and you need to sort of see, talk to your IT teams to understand their data compliance concerns or, or requirements for you when you're doing your integration work.
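Masking or dropping sensitive fields before data leaves a system, as described above, can be sketched with the standard library. The field names and the choice of a keyed hash (HMAC-SHA256) are assumptions for illustration; which fields must be masked and which technique is acceptable is a decision for your own compliance and IT teams.

```python
import hashlib
import hmac


def mask_record(record, secret_key, hash_fields=("email",), drop_fields=("ssn",)):
    """Return a copy of `record` that is safer to move between systems.

    Fields in `drop_fields` are removed entirely. Fields in `hash_fields` are
    replaced by a keyed hash, so the same input still joins to the same value
    downstream without exposing the original. Field names are hypothetical.
    """
    masked = {}
    for field, value in record.items():
        if field in drop_fields:
            continue
        if field in hash_fields and value is not None:
            normalized = str(value).strip().lower().encode("utf-8")
            masked[field] = hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()
        else:
            masked[field] = value
    return masked


if __name__ == "__main__":
    key = b"replace-with-a-managed-secret"  # placeholder, not a real key
    print(mask_record({"id": 7, "email": "Ada@Example.com", "ssn": "000-00-0000"}, key))
```

A keyed hash is pseudonymization rather than full anonymization: anyone holding the key can recompute the mapping, so the key needs the same protection as the data.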

[00:46:40] And then we move on to data flows and that's really mapping out the data flows, understanding what fields are going to be needed, what APIs you should call. And then related to that is the transformations. You know, during that time — during that journey, what transformations are you going to need for the data?

[00:47:00] How are you going to join the different data sets? What's the, the primary key or that you'll use to do that? And also what format, do you need to have that data? So you'll probably be exporting. It could be to Salesforce, which may be, it has one format, but it could also be to a data warehouse or to a data lake.

[00:47:20] You know, what format of date and time do they need? You know, what ID do you need to pass across, or whatever it is? But really understand the format that's needed because that might require you to do some transformations along the way. And most of the time it does, because one system is slightly different than the other in how they handle things like that.
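The questions above about join keys and target formats can be made concrete with a small sketch: two data sets are joined on a shared key, and a Unix timestamp from one system is converted to an ISO 8601 string for the target. The record shapes, the choice of email as the key, and the field names are assumptions for illustration only.

```python
from datetime import datetime, timezone


def to_iso_utc(epoch_seconds):
    """Convert Unix epoch seconds to an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()


def join_on_key(crm_rows, billing_rows, key="email"):
    """Inner-join two lists of dicts on `key`, compared case-insensitively."""
    billing_by_key = {row[key].lower(): row for row in billing_rows}
    joined = []
    for crm in crm_rows:
        billing = billing_by_key.get(crm[key].lower())
        if billing is None:
            continue  # no match in billing: left out of an inner join
        joined.append({
            key: crm[key].lower(),
            "crm_id": crm["id"],
            "mrr": billing["mrr"],
            "created_at": to_iso_utc(billing["created"]),
        })
    return joined


if __name__ == "__main__":
    crm = [{"id": "003A", "email": "Ada@Example.com"}, {"id": "003B", "email": "bob@example.com"}]
    billing = [{"email": "ada@example.com", "mrr": 99, "created": 1577836800}]
    print(join_on_key(crm, billing))
```

Normalizing the key before joining matters in practice, since two systems rarely agree on casing or whitespace for the same value.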

[00:47:43] And then the other thing I see is incremental development. I remember those death march waterfall projects of the eighties, where they were, you know, everything was like six months and nothing really came in under a year, you know, everything was under scoped. Everyone promised the world and never delivered it.

[00:48:06] Stuff like that. I know, I'm a big believer in Agile. It's just. Lots of quick wins, small wins. So if you're going to do a big integration project, do it almost like one pipeline at a time. Just deliver one thing, one integration into Salesforce. Get it out to the business, get feedback on it, deliver it. Move on to the next one.

[00:48:28] Cause every time you do it, every time you get one of these wins, you're getting, almost like the benefit from the business. You know, they see the advantage that you've done. They support the future work. They get excited about it. But if you do it in isolation over a long period of time.

[00:48:45] It's likely you'll miss your deadlines and your, your executive sponsors won't be excited about the project anymore. They would have moved on and you probably won't understand the requirements. Cause I think often when you do these quick wins, you get some more learnings about what the scope is like.

[00:49:02] You, you thought it was this and the business guy thought it was this, but when you did the integration it was like, ah, I need to have this other thing as well, or I need this other integration. And you get that feedback immediately and then you can move on that, and actually get them what they need in the end.

[00:49:17] So I recommend launching, you know, pipelines and functionality incrementally as you're doing this work. And then on the project management side, I've done a lot of projects in big companies and small. The more I do it, the more I realize you really need a strong project champion. You need someone in the business who's passionate about what you're doing, who will be your backer — it’s really hard to build, to do any project if you're doing it just from a technical point of view without that sponsor.

[00:49:50] So I'm a big believer in trying to get, almost like the higher, the highest person you can that is passionate about what you're doing. To sponsor the project and to, to be the person who's going to be the protector of the project and the salesman for the project. And then another thing I, I'm not, I don't like, big company stuff, but the thing that I do like is actually steering committee, I’m a firm believer in steering committees.

[00:50:17] And the reason is that you have a very quick way, it can be half an hour every week or something like that. But you can get the executive sponsor in there, so you're constantly on top of mind for them. They see the deliverable as they see what's happening. If you end up with any roadblocks, they will be there, you know, covering your back and helping you, pushing the project along.

[00:50:39] And so I've, I've really liked having a steering committee, even in tiny companies. I still have steering committees just cause I think it's important to get stakeholders together and, and get them brought into the problems and the solutions as they evolve and emerge out of the project. And then I'm a big believer in constant communication about a project.

[00:50:58] So having — I do think having artifacts about the project, so like, you know, what happened this week? What's going to happen next week? What are the risks you have, what are the risks that have changed? And then making sure everyone's aware of the decisions that have been made, so they kind of, you know, down the track — people can’t say, well, I didn't, I didn't know about that, or I, I never agreed to that thing.

[00:51:20] If you have it all written down and you have clear, constant communication with everyone, all the stakeholders should be all on the same page and understanding exactly what they're getting. And then I'm a big believer in clear timelines and deliverables as well. Again, cause you just want to, at the end of the project, make sure everyone's happy and things will change and things will come up.

[00:51:46] And if you have, yeah, clear timelines and deliverables, people will be forgiving if things come up and they're in the loop. But no one wants to be surprised. You know, they don't want to come into a meeting in two months' time and be told, oh yeah, the project's actually three months late, and these are the reasons. You know, much better to be involved in it and maybe have some shared accountability. And also have some steps that we agree together to maybe get things back on track.

[00:52:17] Yeah. I just want to underline — and you're absolutely right. Strong champion, especially for an integration project, because those are, those are generally breaking down the walls of silos and yeah, there's a lot of, there's a lot of corporate politics around who owns the data in those silos. You have to get somebody above silo owners, otherwise, it's just going to be a fight. 

[00:52:40] It is. Yeah, exactly. And often, you know, in a very large company, often there's an owner for each one of these big systems. And so, you know, ideally your project champion, you know, these people either report to them or they're somewhere in the hierarchy, and they can break those potential conflicts you might have between teams and stuff like that.

[00:53:05] So the project champion, yeah, they create the shared vision for the project and get everyone aligned to help and work towards getting integrations done across the organization.

[00:53:16] Great. Well, thank you. I mean, this is the end of a brief discussion, but I wanted to thank everyone for attending the Xforce Virtual Summit, and hopefully, you've found all the talks useful and you've found our discussion useful today. What I think is really exciting for all of us is that companies are now leveraging data more and more for their competitive advantage.

[00:53:45] I see that all the time now. It used to be a thing of the — it was only the very big companies like Facebook and Google that were into the, into the data and using data, and now it's really becoming a commodity and it's becoming a point where every company has to do it to remain competitive. So it's a great time to be in the industry for all of us.

[00:54:06] You know, it's a great time to be in Salesforce, great time to be in the integration space. And what we're seeing is emerging data strategies and data platforms and storage solutions, coming out all the time. Things are evolving, and our work is really becoming business-critical, which is very exciting.

[00:54:26] So, I'm very happy to be in this space and happy to be involved in this Xforce conference, and hopefully next year, COVID-19 goes away. We can actually have this in person next year and have a real conference and have speakers there and meet and discuss one-on-one. But really appreciate everyone for joining and look forward to your questions.

[00:54:51] And if anyone wants to learn more about our platform, you know, you can reach out through our sales team as well. 

[00:55:00] Yeah, and I just wanted to plug that, I've written a few articles on Salesforce for the Integrate.io blog. You can go out and look at that and if you have some more questions — and I invite everybody to attend as many sessions as you can in the conference because — and especially Integrate.io customers. Because as Mark said, it is interesting how these customers are not using Integrate.io for nice to have — they're mission-critical. They run their business on integrated systems, which is something that 20 years ago would not be the case. So thank you for watching and I invite you to participate in some of the other conference sessions.

[00:55:55] Yeah. Thank you.
