April 08, 2023

Smart & Easy Data Delivery: Shipyard is for Every Data Professional : Podcast

Blog > Smart & Easy Data Delivery: Shipyard is for Every Data Professional : Podcast


 

Show Notes

Carl Lewis: Welcome to the Connected Enterprise Podcast. I’m Carl Lewis, your host from Vision33, and my guest is Blake Burch, a co-founder of Shipyard. Blake, welcome to the podcast. Please tell us about yourself and Shipyard.

Blake Burch:

Thanks for having me, Carl. I'm the co-founder and CEO of Shipyard. We’re a modern data orchestration platform that helps data teams connect tools, move data between those tools, verify that data is being delivered, and identify issues that happen that they get notifications/alerts about and have observability into everything that's going on under the hood.

Carl Lewis:

You specialize in helping people use their data, analyze it, make sense of it, and drive automations from it.

Blake Burch:

It’s usually about trying to deliver data to stakeholders. So, whether you're trying to get something out of a file storage system like Google Drive or Box or Dropbox and deliver it, or trying to get data out of a warehouse solution like Snowflake or BigQuery, or maybe it's in Google Sheets, Smartsheet, or Notion. We help people easily deliver data that way with the low-code templates we’ve built, and we see people using it to build out proactive alerts.

That could be letting the sales team know when a high-priority lead recently came back to the site or did some activity that seems valuable. You can set up Slack or email alerts for that. You can let the data team know when there are active data issues. You could ensure the HR team knows when someone hasn't filled out a survey internally. There are many use cases, but it comes down to making people aware of what they can use their data for.

Carl Lewis:

Good stuff. You just described several systems where data might be, and it’s amazing that in today's world, people still have data in sheets. I talked to a company that just put in an ERP system—they ran the entire company in Excel for the last 20 years.

Blake Burch:

It’s crazy.

Carl Lewis:

It boggles my mind people can still do that. So much effort goes into that. But others have had ERP systems for 20 years, and their databases have grown large and cumbersome, so they’re not as productive as they were. Transactions slow down, etc. And they talk about using other solutions. You mentioned data warehousing solutions like Snowflake and BigQuery. Tell us about these data warehousing systems.

Blake Burch:

There's been a rise in cloud data warehouses—although that's not necessarily the best term because there are other databases, like MySQL, SQL Server, and PostgreSQL, that can be in the cloud. But the real differentiator between the databases is that they primarily serve as analytical databases. Anyone on the team can actively query to get insights out of the data rather than something transactional to make the system work, update permissions, update records of how much inventory you should have left, or stuff like that. And regarding these analytical databases living in the cloud, the big thing they do that makes it easy to manage is automatically scale things up and down based on the query load.

Often, for traditional data warehouses, you might have to be in a situation where you’re actively managing the servers themselves, so you're provisioning new servers because you ran out of CPU capacity or have too much data in the storage. So now you have to split it up across multiple systems and say, "This query needs to access data in this system, and this other query needs to access data in another system." And they abstract that away so you can just throw as many queries as you want at it, and it will automatically scale itself up and down.

They have some insane stat—but don’t quote me—that one terabyte of data can be processed in 10 seconds on something like BigQuery. That's unfathomable on other platforms. So that's what people use if they're trying to handle big workloads and don’t want to worry about who's querying it, when, at what time, and how much data is being processed.

Carl Lewis:

What kind of data do companies put in a data warehouse?

Blake Burch:

Everything under the sun that relates to business. If you're using support tools like Intercom or Zendesk-like ticketing systems, you can have information about the tickets customers have had with you over time and analyze that data, which can live in there. Marketing platforms like Google Ads, Bing Ads, programmatic providers like Trade Desk, Meta, and those folks—all that data is something you could have in a warehouse to analyze.

I used to work with different product feeds, so retailers I worked with had hundreds of thousands of products. Live inventory data for every hour could live there. What was the stock of every item? What's the description associated with that? What's the category associated with that? There are so many examples you could have, but the real goal is to make a data warehouse your central source of truth. It doesn't matter where the data comes from or what aspect of the business it relates to—it should live under one roof.

Carl Lewis:

Would it be correct to think of the data warehouse as history, not necessarily transactions happening now?

Blake Burch:

Yes, a lot of it is about history—how things performed, what they look like over time.

Carl Lewis:

Do you have examples of companies that have reached this critical mass point where they need a data warehouse? Are there triggers that drive that decision?

Blake Burch:

I believe that's one of the first things that should come into play, especially if you're starting a new business from scratch. Before customers have a data warehouse, they typically rely on what you said earlier: a bunch of disparate systems they have to log into to pull analytic numbers for internal reports. Or it's a bunch of spreadsheets where that data is being copied and pasted into a single sheet and then aggregated in another sheet.

We see that frequently, and it's a real sign you're probably not doing things the best/most efficient way. And if you're relying on multiple files or sheets, it should probably live in a database. But it's difficult to overcome that and figure out the tipping point of, "This is probably bad; I should set up a system to make sure this data is dumped into a warehouse instead of relying on my brittle sheet-based system."

Carl Lewis:

Exactly. You mentioned you can put almost anything you think has value, and you might want to look at its history into a data warehouse. But a lot of these disparate systems have different structures. How does a data warehouse overcome that issue where, let's say, I have information from my ERP system, Salesforce, a website, and Zendesk—and I want to put it all in one place? How does it get structured so it's not just a mess?

Blake Burch:

You must also consider how you want to build your data warehouse. As you're creating tables where the data will live, use a consistent naming system. A simple example is naming your tables related to the department, then a separator like an underscore, then a name related to the tool. Then relate it to the level of data granularity you're getting. For example, on Salesforce it might be sales_Salesforce_. Then you have contacts, deals, and companies. You must consider every report you might want to pull from that system. Where does that live?

And then, wherever you make it live, it's a matter of determining what information is valuable there. So, regarding Salesforce, there are hundreds of fields you could have that are related to a company, but which of those will drive business value? Which will you need analytics for? Some people operate on a dump-it-all philosophy, which is acceptable but overwhelming. It's nice to critically assess, "Am I actually going to use this data for some analysis? If not, is it worth storing the history of it?" Because the more data you store the history of, the more costly it will be. But that's the general structure I would follow.

Carl Lewis:

You're definitely an advocate for cloud-based systems.

Blake Burch:

Absolutely.

Carl Lewis:

Although people do some of this on-premises too. Why do you believe cloud-based systems are the smart choice now?

Blake Burch:

They can also be the expensive choice, and I like to be mindful of that because I never want to say, "This is the absolute best thing for your business." I’m not in the weeds of someone else's business; they could need a different choice. But often it saves you the human capital of maintaining the systems that power your warehouse. It ultimately gets you into a spot where your team can service themselves, analyze the data, apply multiple filters, and have numerous queries running where you're not worrying, “Do I have the capacity for them to actively use the data?” You're basically able to give them a free pass that says, "Here’s the data that's available, and your analysts can query it directly via SQL.”

Your business users can query it using drag-and-drop filters and business intelligence tools like Tableau or ThoughtSpot." That's really the reason for the cloud—it eliminates human capital. The other part is speed. Nobody wants to sit around waiting for a query for 10 hours. There are many legacy systems where the data can get so bogged down you can't get answers. What usually happens is that someone accidentally didn't query the right thing, or one little filter was off, so you waited 10 hours for the wrong answer and must wait 10 more hours for the right answer. A cloud warehouse avoids those issues.

Carl Lewis:

We're in SaaS mode now. Are these tools people use to analyze the data also SaaS-cloud driven?

Blake Burch:

A lot of the tools I've mentioned are effectively SaaS-based tools. That's what I'm most familiar with because those are the tools we typically build out integrations for ourselves at Shipyard.

Carl Lewis:

You said you could dump everything into it, but it sounds like there’s a more intelligent approach.

Blake Burch:

Asking what business problem I'm trying to solve and determining the minimum of information I need to solve this problem is better than dumping all the data in there. When you asked about how you structure this data when it's so different between each tool, you figure out, “How can I marry that data together? What’s a common factor?” For things like your CRM system, it will typically be email. For other systems, you must figure out, “Can I map people up based on IP addresses, addresses, or customer IDs? You need to make sure there are some unifying elements so the data between these two systems can actively be mapped.

Carl Lewis:

Some concept of master data must come into place to help you relate all this information from all these systems, whether that's customer ID, vendor ID, etc. Something that gives you relational capabilities. Because I can think of a scenario where invoice information in the ERP system might involve 15 tables for various things. I might not care about the addresses we shipped to, but I might care about the inventory item, quantity, description, and purchase date.

That's probably a smaller data set that would make my data warehouse operate better. Also, if I don't conglomerate it too much—maybe I want a data warehouse where I throw everything so I have a copy just in case. Or maybe I also want one that's a little more thoughtful in that way. One reason our customers’ ERP systems become frustrating for reporting is that there's everything there. And trying to understand those 15 to 2000 tables with hundreds of fields is overwhelming if you're not a consultant working in that application all the time.

Blake Burch:

People have a little FOMO (fear of missing out). If I don't store this data, what happens if I eventually need it? Well, when you need it, you can start storing it and analyzing it—it’s not worth getting yourself into a state where everything is overwhelming because you've stored everything, and you've been paying for it. Recent studies said 90% of companies only use data from the last six months. You can use things for year-over-year reporting, but most queries and analyses are often on much more recent data—and on much smaller data sets than are stored.

Carl Lewis:

Let's say a company's decided they're a good fit for data warehousing. What recommendations would you make for warehousing tools they should consider?

Blake Burch:

I strongly recommend Snowflake and BigQuery. They’re easy to get started with. You can sign up and start using them immediately, and both have usage-based pricing based on the quantity of data you're actively processing or how much time servers are up and running against your queries. And they're relatively approachable to get data into. But having a data warehouse is easy. What's more difficult is getting the data into those systems.

If you're trying to find something because you don't know how to code and you want access to data and have it dumped into a system, there are tools like Portable, Fivetran, Stitch, or Airbyte. What matters is, “Can they dump the data in my system into that warehouse?” Almost all of them support the warehouses I mentioned, but they don't all support the same tools. You just mix and match to get the data where you want it.

Carl Lewis:

What basic analytical tools would you recommend for non-programmers?

Blake Burch:

In terms of business intelligence and displaying the data, a great free option is always Google Looker Studio. Other tools like Power BI or Tableau let you access the data in those warehouses, start building out reports, and verify how things look. Those take advantage of the two opposite spectrums: tools that load the data and tools that let you look at the data. Another important part is making sure data is consistently running through the system and that you can build alerts and everything with it. That’s the orchestration aspect we focus on at Shipyard.

Carl Lewis:

So, you have these tools to take the data from one system and put it into the warehouse. Is that something you do periodically, or is it in real-time?

Blake Burch:

It depends on the frequency of your business. You'll see plenty of articles saying, "Why real-time is the way to go." But real-time only makes sense for things where the information must be up to date at that moment. So, for eCommerce situations where you’ll be handling thousands of orders per second, yes, that must be real-time. For example, a ticketing system for Taylor Swift's concert tickets must be real-time because people need to know if they can get a ticket.

Carl Lewis:

That needs to be real-time times 10!

Blake Burch:

Exactly. But most businesses can usually deal with batch processing. And again, we must think of it in terms of analytical databases versus transactional databases. Transactional needs to be as close to real-time as possible, but analytical can be delayed by three, six, twelve hours. Most brands I work with only do things daily. It’s the question, “Am I going to get any insight out of that data I can react to fast enough to warrant getting a real-time database?” If you can't answer that question, it's okay to do daily batch processing.

Carl Lewis:

Here’s something businesses will ask: “Do I have anybody who can get value out of this for me?” What are the basic skills someone on my staff needs if I go down this road?

Blake Burch:

The most basic skill you need to get the most value from your data is being able to read and write SQL. That's the language of most databases. Each has its own unique flavor and slight nuances, but at my previous job, I taught many people how to read and write SQL. It’s not too different from Excel formulas—not overly complicated. It seems scary, but you can self-teach some of that stuff.

So, SQL is the most important to make sure you can access that data. Some Python knowledge also helps. SQL will be great for getting the data out, analyzing it, understanding what it means. But if you ever want to react to it and have it direct some other system to do some specific thing if it reaches a certain threshold, you must know Python to figure out how to work with these external systems and make those calls.

Carl Lewis:

That means I'm not a candidate. I could probably do my own queries, but Python—no, don't give me that job.

Blake Burch:

That's mostly if you're trying to do super technical things and the tools I mentioned aren’t natively supported. If those tools are ingesting data, you'll probably have to write a script to grab that data and automate it on an orchestration platform. But for most tools and businesses, you could get away with none of the skills I mentioned. Point-and-click solutions will analyze the data—you just won't be able to go as deep.

Carl Lewis:

So, solutions like Tableau and Microsoft Power BI. They have connectors to most data warehouse systems.

Blake Burch:

Yes, and you don't have to know SQL to use them.

Carl Lewis:

Right. You just have to point to the right thing. That's good information. As I mentioned earlier, we have customers who’ve been running our ERP system for 20 years. Their databases grow and get a little unwieldy, and data warehousing is something they should think about.

So, it’s good to know this and know there are folks like you who specialize in helping companies get the most out of data. We live in a world where everybody wants to keep all the data forever and recognize its value. Everybody is building their own library of Alexandria. That's the way I think of it. Data is power and revenue, so they hang on.

Thank you, Blake. I appreciate your being with us. And for everyone else out there, we'll see you again on the next episode of The Connected Enterprise podcast. Stay connected.