In this episode of The View with Vizard, Cloudian CTO Gary Ogasawara joins host Mike Vizard to explain the challenges of Kubernetes storage. The video is below, followed by a transcript of the conversation.
Mike Vizard: Hey guys. Thanks for the throw. We’re here with Gary Ogasawara, who is the CTO for Cloudian. They are a maker of an object-based storage system that gets deployed frequently in on-premises environments, and we’re going to be talking about storage and Kubernetes and microservices and all kinds of fun stuff. Gary, welcome to the show.
Gary Ogasawara: Thanks. Nice to be with you.
Vizard: What exactly is the challenge with storage and containers these days? We see a lot more stateful apps starting to show up on Kubernetes environments. For the longest time, everybody thought it was just going to be stateless. What’s driving that and what are the challenges?
Ogasawara: Right. So, like you said, Kubernetes started a lot with stateless apps. That’s a lot easier, especially to scale out. You could think of web servers that are just doing routing, but as you get to more interesting applications, it’s all about the data. It’s unique data to do analysis, unique data to make decisions, and in order to have that, you need some persistent storage. And stateless and persistent don’t match, so Kubernetes really had to go back and build some of the infrastructure that’s needed to support stateful applications.
Vizard: So what exactly did that involve? I mean, was that – we put a container storage interface together and that was it, or is there deeper work going on?
Ogasawara: So, part of the advantage and beauty of Kubernetes is that it was built to be fault tolerant, so if something goes down, then the Kubernetes scheduler can look around and find where to reschedule that. Now for storage, you know, it’s – that’s not as simple as just spinning up another VM or another machine anywhere. So for example, if you have some data or database and you have that data there and that machine goes down, you can’t automatically move that data from one place to another very easily and respin it out.
So, in order to support that, Kubernetes defined different concepts like storage classes, which describe how redundantly and durably data needs to be stored, and you need to accommodate things like multiple replicas or some redundancy and availability strategy to maintain that.
So, that’s a lot of the work. I mean, Kubernetes is really an overlay on top of the distributed storage technology that’s been out there that we see in distributed databases, NewSQL databases and the like. But it’s been moving very, very quickly and I think we’re starting to see some very good Kubernetes-based storage applications now.
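To make the storage class idea concrete, here is a minimal sketch of how a replicated storage class and a claim against it might be expressed. It builds the two Kubernetes manifests as plain Python dictionaries and prints them as JSON. The field names follow the standard StorageClass and PersistentVolumeClaim objects, but the provisioner name `csi.example.com` and the `replicas` parameter are assumptions made for illustration; real values depend on the storage driver installed in the cluster.

```python
import json

# Hypothetical driver name. A real cluster would use the provisioner
# string published by its installed storage driver.
PROVISIONER = "csi.example.com"


def storage_class(name, replicas):
    """Build a StorageClass manifest as a plain dict."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": PROVISIONER,
        # Parameters are passed through to the driver, which interprets them.
        # The "replicas" key is an assumption for illustration only.
        "parameters": {"replicas": str(replicas)},
        "reclaimPolicy": "Retain",
        "volumeBindingMode": "WaitForFirstConsumer",
    }


def volume_claim(name, class_name, size):
    """Build a PersistentVolumeClaim that requests storage from a class."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": class_name,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    sc = storage_class("replicated", replicas=3)
    pvc = volume_claim("db-data", sc["metadata"]["name"], "10Gi")
    print(json.dumps([sc, pvc], indent=2))
```

The point of the split is that the application only names a class in its claim, while the durability policy lives in the class definition and can differ from one environment to another.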
Vizard: Part of the equation then in my mind is there’s a lot of people who have existing storage systems, so why should I think about replacing those systems to deal with Kubernetes? Is there some reason why I might need an object model system versus just my traditional NAS or whatever I’ve got?
Ogasawara: Well, let me interpret your question in one way that we see the storage and applications being moved. So as we talked about at the top, to make interesting applications and do interesting decisions, you need stateful apps, and that’s both the apps and the data. And as we’re getting more and more data around, we need better ways to store it, and object storage is a very good fit for large classes of largely unstructured data that we could globally access anywhere.
Now, if the applications are written in Kubernetes, then one option is to have your applications still remain in Kubernetes and your storage be still outside Kubernetes. And that’s a valid solution that many companies and enterprises are using, but there is another advantage to bringing the storage itself inside Kubernetes because then you could manage storage in the same way that you manage your apps. You have a single pane of glass to manage both your applications and your storage. You get all of the advantages of Kubernetes that you had for your apps for your storage as well, things like the scheduler and fault tolerance, where we could rely on Kubernetes to manage as nodes go down; you might automatically bring up another node.
And there are the advantages of the single control plane, where you take that same software or same applications that you’ve paid a lot to develop and support and run it in different environments. You can run it at the data center. You can run it at the edge or the cloud. You know, that whole – that vision that we are all pursuing where we want everything to run at the edge, the data center, and the cloud and integrate it.
Vizard: Is this in some ways just the latest flavor of hyperconvergence, right? I want to put compute and storage together. I want to have one single administrator. Is that where we’re going?
Ogasawara: Yeah. Yeah, I wouldn’t call – I mean with hyperconverged, we tend to think about putting it all in the same resources. With Kubernetes, the beauty is that that distribution layer is at a higher level, so we should be able to take advantage of being able to distribute our resources and our apps across a very flexible layer. And one other interesting point is, often when we say compute and storage, we tend to equate them and think of them as equals, but they inherently are much different.
So there is a data gravity issue where it’s very easy to move compute wherever you want. You can move from the cloud, to the edge, to the data center because compute is, you know, a few megabytes of code or a few execution architectures. But storage is very hard to move, so storage as we know, especially at Cloudian, we deal with petabytes, tens of petabytes, hundreds of petabytes of storage, and that has mass, that has gravity, and what we really want to keep in view when we hyperconverge or converge compute and storage is that they’re very different. We always want to and prefer to move the compute to the data rather than the other way around. So that’s important for everyone to keep in mind.
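The data gravity point can be illustrated with some back-of-the-envelope arithmetic. The sketch below compares the ideal time to move a few megabytes of code with the time to move a petabyte of data over the same link. The 5 MB code size and the 10 Gbit/s link are assumptions chosen for illustration, and the calculation ignores protocol overhead, contention, and retries, so real transfers would be slower.

```python
def transfer_seconds(size_bytes, link_bits_per_second):
    """Ideal transfer time: total bits divided by link speed."""
    return size_bytes * 8 / link_bits_per_second


MB = 10**6            # decimal megabyte
PB = 10**15           # decimal petabyte
LINK = 10 * 10**9     # assumed 10 Gbit/s link

if __name__ == "__main__":
    code_time = transfer_seconds(5 * MB, LINK)   # assumed 5 MB of code
    data_time = transfer_seconds(1 * PB, LINK)   # one petabyte of data
    print(f"code: {code_time:.3f} seconds")
    print(f"data: {data_time / 86400:.1f} days")
```

Under these assumptions the code moves in a few milliseconds while the petabyte takes a little over nine days, which is why moving the compute to the data is usually the cheaper direction.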
Vizard: We have seen object systems up in the cloud for some time. Are we noticing them more in an on-premises environment or out at the edge, or what is the benefit of having a highly distributed object storage available to us? What will change?
Ogasawara: Yeah. So what we’ve been seeing in our customers and what we are starting to see is that the cloud computing era is ending or it’s really reaching a dead end because where the data is being generated is not generally in the cloud. Data is being generated at the edge. Right? So think of autonomous vehicles. Right? So, those – all that data is being generated by your LIDAR sensor rig on the vehicle. Right? You can’t afford to have that data make a round trip all the way from your vehicle into the cloud and back down. That’s just one example. It’s across all the verticals: manufacturing, IoT sensors out in the field. We have, you know, this proliferation of surveillance video cameras out there. That data is not in the cloud. That data is out at the edge.
So the hope is that we – with using a platform like Kubernetes, that we could make applications and manage that data and applications across those three layers, right, so the edge, the data center, and the cloud, and we’re able to move data and move compute flexibly across those layers. And yeah, so that’s a new framework. I think you see it by all the major cloud providers also pushing out to the edge, right, with their, for example, AWS Outposts, Azure Stack. They all recognize that to solve this problem, you really need to be out there on premises and at the edge as well.
Vizard: Is this kind of back to the future? I seem to remember back in that _____, you would bring compute to the data and then the cloud era came along and suddenly we were moving data everywhere and that created all kinds of interesting cost and security issues, so are we just returning to some level of sanity?
Ogasawara: It’s exactly back to the future. So, this cloud centric view where everything should be in the cloud is the same as what was seen in the 1960s and 70s with mainframe computers where let’s put all our data, all our compute in this central place and you have all the inherent, you know, advantages and disadvantages of that. Right? So we really need to go back to thinking about architectures and platforms that work everywhere, that we could use and have that – so there’s a lot of work technically and a lot of interesting technical challenges, but everyone wants the same thing. Right? You have something that’s easy to manage, but you can manage it and you don’t care actually if it’s run in the cloud or if it’s run on my autonomous vehicle. I just want that to be done in the right place.
Vizard: Are there performance challenges that people should be thinking about because it seems like containers kind of behave differently than traditional applications, so what are the implications from a scale-out, scale-up perspective? How do I kind of maintain some level of consistency around, I don’t know, IOPS or whatever other metrics we’re using these days?
Ogasawara: Yeah, it’s a matter of right sizing, I think. It’s – there’s a lot of work that needs to be done here. It’s really about the observability topic and first, you need to be able to measure exactly what you’re getting out of different places in terms of performance and then once you have that information, then you can start making resource decisions and other decisions about how to allocate workloads. But yeah, I mean, I think one of the good points that you raised is that there is some sort of additional layer of management that degrades performance at some level, right, so if we’re needing to coalesce all this information from all these different sources, then that provides a natural additional layer that is affecting performance. So that’s a downside and that’s what needs to be traded off.
Vizard: Do developers need to be aware of this stuff or can we automate much of this going forward? ’Cause it seems like a lot more responsibility for infrastructure is shifting left, but I don’t know that many developers that are storage experts, so what’s your sense of how well that’s coming together?
Ogasawara: So, it’s a good question. I don’t know how it’s exactly going to play out. I think the right approach is APIs, common APIs that everyone can agree upon. And once we have these standard APIs, the world is much better. Then app developers can all be assured that, you know, if they understand and implement to this API, then they’re good. So, in the file world, we know POSIX is a very, very strong and good API that allowed that good proliferation. For object, we have the S3 API, which originated of course with Amazon, but it’s become the de facto standard, and that’s a good thing not only for app developers, but also for Amazon as it improves their ecosystem.
So, what’s needed – those, like S3, are really what we call a data plane API, and what we really are looking for now is a corresponding control plane API. That’s where there’s a lot of hope that Kubernetes can provide that, where it controls, say, the scheduler, how I allocate resources to different areas, how I provision and deprovision, how I do authentication and security.
All of those, if we had a standard API that everyone can agree upon, you know, the world is a better place, right, then we could get this proliferation of software developers and software companies all building to the same thing and contributing to the same ecosystem. So, the hope is that Kubernetes is really, really getting there on the control plane side, which is the harder side.
Vizard: All right. So what’s your best advice to folks as they kind of start this journey? A lot of people are just starting to build these applications now that are stateful. What should they look out for?
Ogasawara: Sure. So, I guess I would emphasize the point of thinking about data gravity, so think about where your data is being generated and then think – just consider that it’s more difficult than you think to move that data around. Data is not as flexible to move around as you think. And the second is API first. Think about the APIs that you want to use first and then focus on that. Then you avoid things like being locked into a single vendor or a single approach. So think about those concepts and I think, you know, you have a solid platform to do app development.
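The "API first" advice can be sketched in a few lines. In the example below, application code is written against a small interface rather than a particular vendor's client, so the backing store can be swapped without touching the application. The `ObjectStore` interface, the `InMemoryStore` class, and the `archive_reading` function are all hypothetical names invented for this illustration; this is not the S3 API, only a stand-in with the same put-and-get shape.

```python
from typing import Protocol


class ObjectStore(Protocol):
    """Hypothetical minimal data plane interface: put and get by key."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in backend for illustration; keeps objects in a dict."""

    def __init__(self):
        self._objects = {}  # maps key (str) to object bytes

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]


def archive_reading(store: ObjectStore, sensor_id: str, payload: bytes) -> str:
    """Application logic that depends only on the interface."""
    key = f"readings/{sensor_id}"
    store.put(key, payload)
    return key


if __name__ == "__main__":
    store = InMemoryStore()
    key = archive_reading(store, "lidar-1", b"point-cloud-bytes")
    print(key, len(store.get(key)))
```

Any backend that offers the same two methods, whether it runs at the edge, in the data center, or in the cloud, could be passed to `archive_reading` unchanged, which is the lock-in protection described above.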
Vizard: All right. Great. You guys heard it here. One of the root sins of IT where all things bad start to happen is when you start moving data around and so be careful what you wish for and think twice before you leap as they say. Gary, thanks for being on the show.
Ogasawara: Yep, thanks Mike.
Vizard: All right guys, back to you in the studio.
[End of Audio]