Amazon Web Services' Mai-Lan Tomsen Bukovec is preparing for a future in which data lives forever

Mai-Lan Tomsen Bukovec - GeekWire Cloud Summit 2019 — Mai-Lan Tomsen Bukovec, vice president and general manager of Amazon Web Services’ S3 division, speaks at the 2019 GeekWire Cloud Summit. (GeekWire Photo / Kevin Lisota)

Freeing themselves from the burden of managing hardware is one of the primary reasons companies are moving their technology infrastructure into cloud computing. But removing that burden also unlocks new possibilities, now that computing and storage are purchased and managed separately.

That’s been one of the most exciting developments brought on by the shift to cloud computing, said Mai-Lan Tomsen Bukovec, vice president and general manager of Amazon Web Services’ S3 data storage service, at our 2019 GeekWire Cloud Summit last week in Bellevue. The ability to scale compute and storage separately, rather than buying bigger and more powerful boxes for your own data center every year even if you only need to upgrade one of those things, allows companies to spend their money more carefully and design new applications around this reality.

But companies that are new to the cloud often need help understanding how to take advantage of this freedom, and they also need help understanding how to manage the security of their data on the cloud. Like all cloud companies, AWS uses a “shared responsibility” model for security, which means it’s on the customer to make sure they’re using security tools provided by storage services like S3.

We talked about those topics and more during the session, which was part of a great day at the Cloud Summit. A lightly edited transcript of our conversation follows below.

GeekWire Cloud and Enterprise Editor Tom Krazit: Following the announcement that Microsoft just made (about its Women in Cloud initiative), I was wondering if you could talk a little bit about how opportunities for women in cloud and the technology industry in general, have changed over the time you’ve been working in this industry. And what more needs to be done?

Mai-Lan Tomsen Bukovec (Amazon Web Services Photo)

Mai-Lan Tomsen Bukovec: Well, I think we have made some good strides with women in tech. Primarily in terms of education, Tom. If you think about the graduating classes for computer science for both undergrad and graduate, a lot of the major tech colleges now are turning out roughly half women in their graduating classes.

Carnegie Mellon: great university and they have a great tech program. They are pretty steadily at 50 percent, sometimes a little higher, women out of their graduating classes. So if I think about the strides that we’ve made in education, it’s really been phenomenal. If I think about the strides that we’ve made in high school education and middle school education (that’s) actually pretty good as well. A lot of the focus on STEM activities has made a big difference.

It’s something that AWS recognizes as well. We have an initiative called AWS Educate, that goes and runs events from ages 14 and higher, all over the world. Where we’re either working with for example, a women’s technical university in Korea, or we’re working with young people in the Philippines to do hackathons, to do contests, to do all kinds of education around cloud skills. Because we know in order to change the demographics, in order to grow more diversity in the technical workforce, you’ve got to start early. You’ve got to start at middle school and go on up from there.

I think the place though, Tom, where we really need to do better as an industry, is those transitions that women go through when they start having a family, as an example. We’re getting these women straight out of school, because the schools are doing a great job at getting more equal.

But then when they have their first kid, how are we as an industry being more welcoming to new parents? And I’m not talking just women. I’m talking men as well. For me, that just goes right back to the behavior that everybody in this room does, you and I.

When we come back and we see somebody who has come back from paternity leave or maternity leave and they aren’t getting much sleep, they’re back at work and they’re trying to get their feet under them? A simple gesture like, “welcome back,” goes a long way.

A lot of that is cultural. That’s something that we believe certainly in my experience at Amazon, but everybody in the room can help build that culture that welcomes not just new moms, but new parents back into work. And help them with what we think of as these life transition stages.

Tom Krazit: Yeah, I think that will make for a better overall tech industry in general as well.

Mai-Lan Tomsen Bukovec: I think so.

Tom Krazit: Jumping in to some more of what we hope to talk about today. Kevin (Scott, Microsoft CTO) referenced this during his talk (earlier in the morning), but talking about the immense amount of data that has been generated over… compared to 10 years ago, what’s being generated today. I was looking up an estimate last night and by 2020, all of us on the planet will generate 44 zettabytes of data. A zettabyte for those of you in the audience, who like me can’t count that high without help, is one trillion gigabytes. Did I do that right? (Mai-Lan nods.)

Okay. Where is all this data coming from, and what are we going to do with it?

An Amazon Web Services data center (Amazon Web Services Photo)

Mai-Lan Tomsen Bukovec: Yeah, I think it’s amazing to watch. Really, I think the growth, the explosion that we’ve been seeing for data is just driven by the explosions in the source of the data.

You have IoT sensors: John Deere tractors are sending IoT data about field conditions back up into the cloud, stored in S3 and analytics are run on top of that. You have human genome sequencing, which is just a fascinating field, (and) it’s exploding right now because of the cost of cloud storage.

So if I think about for example, in S3 we (have) one customer, HLI, which is Human Longevity, Inc., and their mission is to sequence one million human genomes. Now a genome has all the information that you need to build and maintain an organism. If you sequence a human genome, it is 100 gigabytes of data. That’s raw data, unannotated, without phenotype sources, nothing like that. If you’re just starting from 100 gigabytes of data per human genome and you’re going from there, you can see how something like medical research and genome sequencing in itself is driving some of the volume that you’re talking about.
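To put those figures in perspective, here is a rough back-of-the-envelope calculation. It assumes decimal (SI) units and uses only the two numbers quoted above (100 gigabytes per raw genome, one million genomes); the function and constant names are illustrative, and real datasets with annotations would be larger.

```python
# Rough sizing of the raw data behind a one-million-genome goal.
# Assumes decimal units: 1 petabyte = 1,000,000 gigabytes.
GB_PER_GENOME = 100        # raw, unannotated figure quoted in the interview
GENOMES = 1_000_000        # the stated sequencing goal
GB_PER_PB = 1_000_000


def total_petabytes(genomes: int = GENOMES, gb_each: int = GB_PER_GENOME) -> float:
    """Return the raw storage needed, in petabytes."""
    return genomes * gb_each / GB_PER_PB


if __name__ == "__main__":
    print(f"{total_petabytes():.0f} PB of raw sequence data")
```

Under these assumptions the raw sequences alone come to about 100 petabytes, before any annotation or derived data is added.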

But the list goes on: autonomous cars. Every time those autonomous cars come back and they park, they upload a ton of data. That data is necessary and important for safety, for analysis, for all sorts of things. I think we’re just at the start, to be honest, Tom. We’re talking about this explosion of data, the numbers that you’re talking about. But really, as these sources start to multiply, not just in the U.S. or in Europe, but also across southeast Asia, east Asia and it just goes global, that is a phenomenal amount of data.

Cruise Chevy Bolt — GM-owned Cruise Automation is developing autonomous driving capabilities using the Chevy Bolt as a testbed. (Cruise Automation Photo)

Tom Krazit: How have people changed the way that they want to work with this data? I mean, we’re not just throwing this (data) on a tape drive and throwing it in the closet somewhere and forgetting about it anymore. I mean, there’s a much more active relationship with your data. In your experience, what do people want to do as they generate higher and higher amounts of data in their businesses?

Mai-Lan Tomsen Bukovec: Well, I think there’s a couple of really interesting trends that we’re seeing right now. I think one of the trends that we’re seeing is that customers are now able, particularly with cloud technologies like S3, they’re able to separate the compute and the storage. This is a really fundamental concept. Because if you can separate your compute and your storage, you can independently scale them.

So the explosion of data that we’re talking about right now, that explosion of data can grow elastically, which is what we do for S3, is that we’re purpose-built for elastic scale. It can grow elastically, and then depending on what you need today for your compute, whether it’s analytics or what have you, you can build that. But you can separate the building of the compute applications from the growth of the storage itself.

What we find, is that customers do like to think about that as two separate models. Because at the end of the day, they don’t know what the compute applications they’re going to build in 10 years are going to look like. But they want to make sure that the data is there for them to do it. Machine learning is a great example of that.

We have plenty of customers who are saying, “Today I need analytics. Today I need some type of log diving on top of my data lake. But I want to get to machine learning in a year, and I need to know that my data is there for that.” I think that separation of compute and storage is what customers want.

The other thing that we have, is we have customers that have exabytes of storage. And what we find is that it’s incredibly important to build capabilities into our storage system that let you manage a terabyte of storage or an exabyte of storage equally easily. So as that data explodes, how can you make sure that you are evolving the simplicity of your capabilities such that the growth of data doesn’t mean additional complexity in having to manage it?

Tom Krazit: What enabled this separation of compute and data? How did that come about?

Mai-Lan Tomsen Bukovec: Well, I think inherent in the cloud is this idea … certainly for AWS when we think about it, we think about purpose-built systems. So we have well over 140 different services now. One of the things that we find is that when we build a particular service that’s really good at something — S3 is really good at storing the table stakes of data, which is security, durability, availability, reliability and performance — that is a core competency that you continue to develop.

And what I think customers are finding is that they also have their core competencies. When they are able to separate the growth of storage from the business logic they have on some aspect of compute, they’re able to iterate faster, they’re able to innovate faster, and they’re able to experiment, which is the heart of AWS. They’re able to try things and really make a name for themselves in an industry.

We have a lot of examples of that. Airbnb in travel, there’s a whole bunch of them. But really the heart of it is that this separation lets companies do what they’re trying to do, which is innovate the new patterns of tomorrow that make them a long-term business in their field.

Tom Krazit: What do people want to do with data that we can’t do today? What are some customer requests, I guess, for things that they want to do with their data, but the technology isn’t there, or the business model isn’t right? What do you find?

Mai-Lan Tomsen Bukovec: We don’t hear a lot of what customers can’t do with their data. I think what we hear a lot of is customers are coming to us and they’re saying, “I want to do this thing with my data.”

And when we think about this, you’ve probably heard that customer obsession is in our DNA.

Tom Krazit: I have heard that.

Mai-Lan Tomsen Bukovec: I’ll give you an example. Digital Globe collects and provides satellite imagery. Digital Globe came to us a few years ago. And they said, “Look, I have this 18 year archive of satellite imagery, it’s 100 petabytes. I don’t want to build an application that uploads all that data into S3, what do I do?”

So we built the Snowmobile. We built a truck, a data truck, if you will. Digital Globe was one of the first customers that the data truck had, and Digital Globe has a great blog and pictures about this. We drove up to Digital Globe’s data centers, hooked up the truck, put all the data they needed to move on the truck, and brought it in to the AWS region that they really wanted.

Amazon Web Services Snowmobile — If you’re a potential AWS customer that is hesitant about moving a lot of data, the company will show up at your data center with Amazon Snowmobile, a truck-sized storage-transfer unit. (Amazon Web Services Photo)

That idea or that pattern just repeats itself over and over again in storage, or compute, or database, or machine learning. Where a customer will come to us and say, “I want to do this. How do we do it?” Really one of the fun things, you’ve seen this, for working in the cloud, is that the spirit of possibility is very strong right now. And the reason for that is because all of these technologies are evolving so quickly, you can just pretty much do anything you want if you put your mind to it, and you build that purpose-built application for it.

Tom Krazit: Do you roll that truck up to your data centers if people want to move their data to another location?

Mai-Lan Tomsen Bukovec: We don’t right now.

Tom Krazit: Maybe one day. Let’s talk about storage buckets, because I feel that that’s one thing that comes up all the time when I hear people talk about S3. And most of you have probably seen stories of unfortunate or negligent people who have left their storage buckets on S3 open, discoverable on the internet resulting in breach of customer data or personal data. I know that AWS has done a lot to help people deal with this issue, but it keeps happening. First of all, why do people leave buckets open? And what more can you do to help them?

Mai-Lan Tomsen Bukovec: This one I think it’s a tough situation for customers when they get into that spot. I feel like if you think about the model that we have for security, when we architect, we architect from the ground up for security. We operate in the shared responsibility model where we on AWS are responsible for the host OS, and the virtualization layer, all the way down to the physical assets in the data center. The customer is responsible for using the tools and applying the permissions that are appropriate for their application.

So when we created S3 13 years ago, we started off and continue to maintain that when you first create a bucket, it’s locked down to just the owner of that account. That’s true today. You go create a bucket, the only one who can get in is you.

What happens over time, is that customers have other people come in and maybe modify the permissions. And sometimes they’ll modify permissions in such a way that they leave the permissions in a spot where people can do what you’re talking about. While we know that it’s a shared responsibility model, we’re also doing a ton to help customers with this.

As an example, late last year we built a capability called block public access. The reason why we built this is because we want everybody, everybody except for people who are doing web-based assets that they need to have for their application, everybody else should be using block public access. Everybody. The reason for that is because when you put block public access at an account level, you are locking down all public access for every S3 resource, object, and bucket in that account now and in the future.

That is incredibly powerful. It is the only capability in any cloud storage that lets you do this today. It’s future proofing. It means that any bucket or object created under that account ownership going forward in time, will not have public access.
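For readers who want to see what turning this on at the account level might look like, here is a minimal sketch using the AWS SDK for Python (boto3). It is an illustration rather than AWS's recommended procedure: the helper names and the account ID are hypothetical, boto3 is a third-party package that must be installed separately, and the call requires credentials with permission to change the account's settings.

```python
# Sketch: enable all four "block public access" settings for a whole account.
# Helper names are hypothetical; the account ID below is a placeholder.


def full_block_config() -> dict:
    """Build the configuration that turns on every public-access block."""
    return {
        "BlockPublicAcls": True,        # reject new public ACLs
        "IgnorePublicAcls": True,       # ignore any existing public ACLs
        "BlockPublicPolicy": True,      # reject new public bucket policies
        "RestrictPublicBuckets": True,  # restrict access to buckets with public policies
    }


def block_public_access_for_account(account_id: str) -> None:
    """Apply the configuration account-wide through the S3 Control API."""
    import boto3  # imported lazily so the config helper works without the SDK

    client = boto3.client("s3control")
    client.put_public_access_block(
        AccountId=account_id,
        PublicAccessBlockConfiguration=full_block_config(),
    )


if __name__ == "__main__":
    block_public_access_for_account("111122223333")  # placeholder account ID
```

Because the setting lives on the account rather than on an individual bucket, it also covers buckets created later, which is the future-proofing described above.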

Amazon Web Services S3 block public access — Last year after a wave of data breaches caused by S3 storage buckets mistakenly left open on the internet, Amazon Web Services created a new feature called “block public access” that shuts off access to all S3 storage buckets linked to an account with one click. (Amazon Web Services Photo)

That’s a type of capability we build into S3 to help customers. It’s a control if you will. But we also have services like CloudTrail, which does monitoring as you know, of access. And we have Amazon Macie, which is a machine-learning driven service that looks for personally identifiable information or intellectual property. We built this whole ecosystem at the resource layer for S3 on up to other services, to help customers with it. But at the end of the day, the application developer has to understand the permissions and apply them appropriately for their application.

Tom Krazit: Why do people leave these buckets open? What are the types of applications or the types of businesses that require that kind of access?

Mai-Lan Tomsen Bukovec: Well, when this first started happening, we went out and talked to a lot of customers. Because we said, “Look, we have all these capabilities for you to control your permissions. What’s going on here?” We found a couple of things.

One is we found that as cloud storage is being adopted by more and more companies, the ownership of the bucket starts to be shared between say different groups. And somebody might get the idea of opening up bucket permissions temporarily in order to exchange files, to share files back and forth. When we heard that, we built SFTP. Yes, it’s true. It’s an oldie but goodie. Okay, at least it’s SFTP. And we built a whole service that was dedicated towards helping customers transfer files more securely. That we launched last year or the year before, and it’s been enormously popular.

Mai-Lan Tomsen Bukovec, VP and general manager of Amazon S3 speaking at Grace Hopper x1 in Redmond, Wash. (GeekWire Photo / Clare McGrane)

So file exchange was one reason. Then people would sometimes forget to change the permissions on it. Another one is that maybe you would have a contractor or somebody else who’s not part of the company come in and maybe they don’t understand the permissions model. That was one of the main reasons we built block public access, because we think of it as a control. And if it’s your bucket and your storage, no matter who works on it, you want to apply a control.

So we’ll keep on going down that path of building more controls, just like we build services that help for auditing, like CloudTrail, or analysis like, Macie. But at the end of the day, a lot of this also comes back to a customer understanding “what is the security I want on my data, and is it in place?”

Tom Krazit: It feels like a lot of this is born of migration. People who are new to the cloud and don’t necessarily understand the best way to operate or the most secure way to operate. But as more and more companies who are not familiar with the cloud migrate to it, what are some of the things you’re thinking about to help them make this transition and keep their data secure?

Mai-Lan Tomsen Bukovec: Security is job one for AWS, just like it is for all of our customers. It’s job one. You ask any CIO, CEO, they will tell you this. This is the most important thing: if you’re storing customer data you have to protect it.

We’ve already built a great foundation. Every service that stores data in S3 has encryption by default as an option. By default or as an option for customers. We’ll keep on building more services in the security space that let people do this type of control. I think the thing that I’ve seen when working with customers who are at the beginning of their cloud journey, the thing that I see as being incredibly useful, is when you architect for it as part of the first step in your cloud migration.

I will tell you, one of the remarkable things that happens is that when the security office of some company sits down and starts to really learn and break down all the security models at AWS, one of the things that actually happens is that the security office ends up being a champion of the cloud initiatives. Not somebody who is worried or double checking.

When they deeply understand what the depth of for example AWS security is, it actually is fairly transformative to your cloud journey. Because they can put in the controls across all the different groups and the different applications migrating to the cloud. And it removes this point of friction. Because every one of those different groups is going to have to think about it.

If a central model is put in place, because the security office says this is the most secure place to put our data, just like Capital One did. Capital One did an analysis of AWS security and said, “You run your data centers more securely than we can. So we are putting everything on top of you.” That’s the type of thing that really unlocks the speed of migration, but also removes the friction of all those different applications that are moving too. Think of it upfront and think of it as something that applies across the company, and you can get both of those benefits.

Tom Krazit: If you look at the future of storage technology, I feel like storage technology gets a little bit of a bad rap. There’s not necessarily a Moore’s Law for the way that that progress has been made. But obviously you’re working on ideas, you’re working on things that you think will deliver breakthrough storage gains for customers into the future. What are some of the things you’re looking at? And how do you think those will be implemented?

Mai-Lan Tomsen Bukovec: Well, I think there’s two parts of this. One is going back Tom, to this idea, the separation of storage and compute.

We have a whole set of initiatives that make it easier to manage storage at scale, because we are seeing exabytes and exabytes of storage. I find that a lot of the ways that we think about what we build there are really governed by what customers are asking for. For example, customers want more flexibility around replicating storage from one region to another. So you’re going to see a lot more capabilities around replication on any storage class coming from AWS.

I think the other major area of innovation for us; and again this is not something that you’re going to see in any other cloud storage provider. I think one of the things about S3 is that because we’ve been around for 13 years, and we’ve worked with so many customers, we’ve heard a lot about where they’re going. Just like you were saying earlier, that, “Hey, I want to do this. How do I get there?” The place where we’re really innovating right now is this idea of bringing compute into storage.

We talked about the separation of compute and storage in terms of customer applications and customer data. But what we’re observing is that there are certain types of compute that customers do over and over again to their storage.

Tom Krazit: Like what?

Mai-Lan Tomsen Bukovec: I’ll give you an example. A few years back, we found that a lot of customers of data lakes, for example, were pulling out a lot of storage, hundreds of petabytes, in order to filter down to 10 percent of that data that they actually wanted to do their analytics on. They were spending money on compute clusters to filter, essentially. So a couple years ago, we launched an S3 API called S3 Select, which is a native S3 API that lets you filter within an object using SQL statements, in an object storage API, to (natively) bring back what you need.

So if you have an object that’s an Apache Parquet file, you can actually use an S3 API to pull back data from a column in that object. And it’s incredibly powerful. That is one aspect of taking the compute you would do from that cluster and bringing it in to S3. So we do the calculation for you, or we extract the data for you and we bring it back to you as part of a retrieval for storage.
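As a rough illustration of the column pull-back described above, the sketch below builds an S3 Select request for a Parquet object with the AWS SDK for Python (boto3). The bucket, key, and column names are hypothetical, the helper functions are not part of any AWS library, and feature availability may have changed since this 2019 conversation, so treat it as a sketch under those assumptions.

```python
# Sketch: ask S3 to return one column of a Parquet object instead of the whole file.
# Bucket, key, and column names used here are hypothetical placeholders.


def build_select_request(bucket: str, key: str, column: str) -> dict:
    """Build keyword arguments for the S3 SelectObjectContent call."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": f'SELECT s."{column}" FROM S3Object s',
        "InputSerialization": {"Parquet": {}},  # the stored object is Parquet
        "OutputSerialization": {"CSV": {}},     # results stream back as CSV
    }


def fetch_column(bucket: str, key: str, column: str) -> str:
    """Run the query and concatenate the streamed record chunks."""
    import boto3  # third-party SDK, imported lazily

    s3 = boto3.client("s3")
    response = s3.select_object_content(**build_select_request(bucket, key, column))
    chunks = []
    for event in response["Payload"]:  # event stream of Records, Stats, End, etc.
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    return b"".join(chunks).decode("utf-8")


if __name__ == "__main__":
    print(fetch_column("example-data-lake", "trips/2019.parquet", "fare"))
```

The filtering happens inside the storage service, so only the selected column crosses the network rather than the full object.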

Another way that we’re doing this, is that this year we launched basically a batch. We call it Batch Operations. A batch engine on S3 where you can run batch jobs on S3 in the service, without having to pay for any compute on your own. To do things like set ACLs, copy storage, or run a Lambda function. If you think about that, you’re running your Lambda function on S3 as part of the S3 operation. Rather than having to pay for it on your own. That is incredibly powerful.

An overview of Amazon S3 Batch Operations. (Amazon Web Services Image)

Tom Krazit: That’s an interesting thing to think about. Will serverless change the equation for data? If you think about serverless in terms of functions and events, I mean obviously that’s a lot of data that’s being generated. Does that create an exponential increase in what data storage requirements will require?

Mai-Lan Tomsen Bukovec: Yeah, I think what’s really interesting about serverless, is that I don’t know if it has as much of an impact on the growth of storage. It has a huge impact on the usage. Because what serverless lets you do is experiment. It lets you build applications that connect with business workflows, and it lets you take advantage of your storage in ways that you might not have thought of before. A lot of this container and serverless, other forms of compute are operating on that same data lake philosophy that people are building their next generation applications on.

Tom Krazit: What has surprised you most about the way that people use data on S3? Like, “I can’t believe Customer A did that.” Are there things that Amazon necessarily didn’t even think of, and then saw a customer who wanted to do something along those lines and was like, “Wow! That’s really something interesting.”

Mai-Lan Tomsen Bukovec: Well, I feel that happens every day. It’s incredibly fun to work on AWS because we have so many different types of customers doing so many different types of things. I’ll give you one thing that surprised me a few years back. And then I’ll give you one more recent example.

A few years back, FINRA, which is the regulatory body of the U.S. stock exchanges, their mission is to be a consumer watchdog. It’s to look for instances of fraud in the daily transactions that are done all over the U.S. for the stock markets. One of the things that they did phenomenally well is that they recognized pretty early in their evolution a few years back that they were going to re-architect their mission-critical application.

This will surprise you, how many companies start first with their mission-critical application. They don’t start with a smaller app that’s off in some department as an experiment. They actually start with the main one because they recognize that that main application is going to get the most benefit. Their mission needs the benefit for what the AWS cloud provides.

The New York Stock Exchange (Bigstock Photo)

Now, FINRA is doing validations on over 500 billion stock transactions a day. All operating out of S3 and EMR because they made that decision fairly early on. And I continually am amazed at the amount of innovation and really just leadership that a lot of these companies are doing by going to the cloud on their mission critical applications.

And more recently, I think some of the really interesting use cases are around machine learning. People put data in S3, and some of them do machine learning now and some of them want to do machine learning in a year. But Marinus Analytics is a company that’s based out in Boston, and they build AI tools. What they’re doing right now is that they are running Amazon Rekognition, which is an image recognition machine learning service, off images stored in S3 to help identify and find victims of human trafficking.

It’s stories like that, it’s stories like the sequencing of the human genome, and what that means for preventative healthcare. That’s the stuff that’s to me, incredibly inspiring. Because it’s not just about technology, it’s about bringing technology and human beings together and making just the human condition better. Tom, I was in the Peace Corps, that speaks to me as an ex-Peace Corps volunteer. I think it’s one of the great things about technology that’s happening today.

Tom Krazit: So there’s a lot of customers who I’m sure are really excited about some of these future possibilities. Then there’s another class of customer that’s just like, “I want this to be cheaper.” How do you think about that, as we talk about this explosion of data, as we talk about all this data that’s going to be absorbed. It’s going to cost customers something. How will prices adjust to this explosion of data over time?

Mai-Lan Tomsen Bukovec: For whatever the customer’s goal, I would actually say that for all of our customers, cost is a priority. Now sometimes we see customers that come to AWS initially because they’re lured by the attraction of the lower cost. Then they find the benefit of the agility.

But at the end of the day, cost matters. And cost matters a lot when you have the growth of data. I think you’re going to continue to see more and more options for lower cost storage.

We launched, earlier this year, Deep Archive. And Deep Archive is less than a tenth of a cent for a gigabyte of storage per month. It’s that type of price point that makes people go, “Why should I delete my data?”

When we launched Deep Archive, we had a ton of customers that came to us and said, “At that price point I’m just not going to delete. Because what I’m going to do is just keep it around. It’s cheaper than tape, it’s better than tape. I’m just going to use the S3 API when I need it anyway. I can put it in my data lake. I’m just not going to delete it.”

I think what you’re going to find is the lowest cost price points, like Deep Archive, are going to change how people think about the evolution of that data lake and what applications they can do in the future, and keep that door open because they don’t have to delete. They have something active at their fingertips, not locked away in tape anymore.
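To see why that price point changes the keep-or-delete decision, here is a small worked calculation. It uses only the upper bound quoted above (a tenth of a cent per gigabyte per month) and decimal units; actual pricing varies by region, and retrieval and request fees are deliberately left out, so the result is a ceiling on storage cost alone rather than a full bill. The function and constant names are illustrative.

```python
# Ceiling on monthly storage cost at "less than a tenth of a cent" per GB-month.
# Retrieval and request charges are not modeled.
from decimal import Decimal

PRICE_CEILING_PER_GB_MONTH = Decimal("0.001")  # dollars; upper bound from the interview
GB_PER_PB = 1_000_000                          # decimal units


def monthly_cost_ceiling(gigabytes: int) -> Decimal:
    """Return the most the storage could cost per month, in dollars."""
    return PRICE_CEILING_PER_GB_MONTH * Decimal(gigabytes)


if __name__ == "__main__":
    for petabytes in (1, 100):
        cost = monthly_cost_ceiling(petabytes * GB_PER_PB)
        print(f"{petabytes} PB: under ${cost:,.0f} per month")
```

At that ceiling a petabyte stays under $1,000 a month and a 100-petabyte archive under $100,000, which helps explain why customers say they would rather keep the data than delete it.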

Tom Krazit: Data that lives forever, that’s an…

Mai-Lan Tomsen Bukovec: Data that lives forever.

Tom Krazit: …interesting and scary thing. We’re about out of time. Thank you very much for being with us today.

Mai-Lan Tomsen Bukovec: Thank you.
