7 Ways to Fail at Building a Platform
Given at cfgmgmtcamp on
A talk on the seven ways most platform builds fail: unexpected scope creep, underestimating ongoing investment, treating it as a project instead of a product, homegrown lock-in, retaining the skilled people who built it, keeping up with security and compliance, and resume-driven development. Plus a Wardley-map-flavored bonus on build vs. buy.
Slides
Recording
Further Resources
Transcript
Why platforms, why now
What I want to go over today is platforms - some observations I've had, things people have told me. I don't actually do real work, I just make slides, so this is all through other people's experience. Seven things. Some of them are kind of the same thing, but they're things people stumble with when they decide to build their own platform. By "platform" I mean platform as a service, an internal developer platform, an application platform - whatever you want to call that kind of thing. It has several names that it goes by.
I've been an analyst, I like charts, so a quick warning: if you don't like industry analysts and surveys, prepare yourself. Surveys could be right, they could be wrong, they're somewhere in the middle. As the chart people say, they're directional. Better than just making a claim without a visual.
So why are platforms interesting now? Here's a chart from the most recent CNCF Annual Survey. If you look at the dark bars, you see a steady rise in containers running production applications. I mostly pay attention to production, because that's what's actually running, not just dev and test and staging. Containers - which is to say Kubernetes for the most part - are starting to be a significant part of what's running the world.
A little tempering: if you ask worldwide what percentage of all apps are running in containers, it's actually a hard number to find. Last time I tried to figure it out in the fall, my estimate was around 20% tops, depending on how you count hyperscalers versus banks versus everyone else. Here's another cut from IDC showing the same momentum - the green is VMs, the red is containers, and you can see that same steady rise. Funny enough, as the report writers note, most of those containers run inside VMs anyway.
Now we have all these containers, presumably running applications, and what becomes important is how we manage them. Per Gartner, by 2027, 80% of large organizations will embrace platform engineering - up from less than 30% in 2023. I never quite trust an 80%. Even if 80% is accurate, you should nudge it to 78 or 82, because a round 80 looks like the 80/20 spread that people just kind of make up. But the direction is real: most organizations are interested in building platforms.
This is part of a cycle that's been going on for a while. New infrastructure shows up, everyone gets obsessed with the infrastructure, builds it out, often redoes it instead of improving what they already have. Around 2007 you got Heroku and the first wave of platform-as-a-service. Around 2018-2019 the orchestrators arrived, and Kubernetes consumed everyone's attention - container as a service. Now we don't call it platform-as-a-service anymore (erase that from your mind, even though I'll let myself say it), we call it platform. The interest is coming back. Now that all these containers are running, people are finally asking: how do we make it easier for application developers - or our robots, if that's how you're doing it - to deploy and manage things at scale?
The annoying pattern I see: everyone kind of knows they need a platform, and they might appreciate platforms out there, but they're like, "we have special needs, we have unique needs." We might be one of 300 of our type of organization, or one of 3,000 in the world that does pretty much exactly the same thing, but the way we do our platform - something special. Sarah Wells said in 2024, "Don't build something if you can buy it." Kelsey Hightower said in 2025, "Do not blindly start with Kubernetes. Seriously. If your application can get by with a simple PaaS or serverless offering I'd consider that first. Even VMs make sense for most situations." Abby Bangser at Syntasso put it as: "It's not about rebuilding what we can purchase that is available on the market. It's about making sure we spend our time building the things that are bespoke and important for our organization." The experts hint that building your own thing is a bad idea, but we keep doing it.
What is a platform? A couple of years ago the CNCF came out with a platform reference architecture, which is great because when I'm told not to give vendor pitches, I can show you a neutral industry-consortium one. There's a lot of stuff on it, not all of it strictly in the platform - some of it is what your platform integrates with - but the components it covers should feel familiar if your frame of reference is platform-as-a-service. The whole thing is centered around: you write your own application, you want to run it, you want to manage it. Of note, the infrastructure is kind of unimportant in this picture. You run on whatever you want.
And if you're really into 3D diagrams, here's a fun view from Forrester of everything in the IT stack. Maybe they can use this for Tron 3. It's hard to tell if the back part is in front or in back. Good optical illusion. Take the slides home and ponder it.
#1 Unexpected scope creep
I guess all scope creep is unexpected, but this is the kind you can predict. What people often think a platform is - and what they end up building the first time and then abandoning - is base container images, templates, maybe a namespace in a cluster handed over to developers. As a former application developer, I think of this as the blinking cursor problem: the platform is a blinking cursor waiting for me to do everything else. Sure, the templates and base images are best practices. But what you're leaving out is huge.
Here's just stuff off the top of my head: app delivery, backup and restore, patch management, observability, service management, RBAC, vulnerability scanning, dev framework integration, high availability and the other -ilities, multi-region, sovereign cloud, auditing, multi-tenancy, upgrading the platform itself, gateways, brokers, load balancers, CI/CD or its integration. When I look at platform teams in large organizations, they're heavily involved in all of these. They're not just delivering a pipeline and base container images. They have a whole finished thing that takes the toil and worry out of the developers' hands and out of the security and audit people's hands. It automates a ton of work. Adib Saikali made a related point in his Cur8s talk - "Stop Renting Your Knowledge."
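To make that gap concrete, here is a minimal checklist sketch in Python. The capability names are taken from the list above and from the "blinking cursor" description; the function and variable names are invented for illustration, and treating every capability as equally weighted is a simplifying assumption.

```python
# Capabilities the talk lists as work a platform team ends up owning.
FULL_SCOPE = [
    "app delivery", "backup and restore", "patch management", "observability",
    "service management", "RBAC", "vulnerability scanning",
    "dev framework integration", "high availability", "multi-region",
    "sovereign cloud", "auditing", "multi-tenancy", "platform upgrades",
    "gateways", "brokers", "load balancers", "CI/CD",
]

# What the "blinking cursor" first build usually ships, per the talk.
FIRST_BUILD = {"base container images", "templates", "namespace"}


def missing_capabilities(delivered: set[str]) -> list[str]:
    """Return the FULL_SCOPE items that are not in the delivered set."""
    return [item for item in FULL_SCOPE if item not in delivered]


gap = missing_capabilities(FIRST_BUILD)
print(f"{len(gap)} of {len(FULL_SCOPE)} capabilities still to build")
```

Swapping in the set of things your own platform actually delivers gives a rough view of how much of the list is still landing on developers.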
To drive the point home, here's the new CNCF Cloud Native Maturity Model, summarized by ChatGPT. I warned you I'd use phrases like "maturity cycle." Across five phases - Build, Operate, Scale, Improve, Adapt - you're constantly expanding from that original scope, adding new features, integrations, and changing the org around. That's a lot of stuff. The point of unexpected scope creep is: you initially think building a platform is going to be easy. I think it's only application developers who think everything can be done in a weekend. Other people, a little wiser, give it a couple of weeks. And then it ends up being more features than you thought, you're partway through, and you don't actually have all of the features.
#2 Underestimating the ongoing investment
Building only part of the scope goes hand in hand with underestimating the ongoing investment - which is to say, the money you have to pay for it. Model out a platform team. Use whatever currency, it's just simple math. One team of three to eight people, paid 125,000 currency units per person per year - which I realize may not be the all-in cost, depending on region, but bear with me. Chart that cumulative cost out over five years and you start talking about real money for one team that builds the platform, runs it, maintains it, and does the ongoing work a platform team does.
But you don't have one team. Depending on which components you actually build, you have more like three to eight teams of three to eight people. Do that math and the cumulative spend gets big.
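That math can be sketched in a few lines of Python. The 125,000 figure, the team sizes, and the five-year horizon come from the talk; the function name, the scenario labels, and the assumption of flat pay with no raises, tooling, or infrastructure costs are mine.

```python
# Figures from the talk: 125,000 currency units per person per year, five years.
COST_PER_PERSON = 125_000
YEARS = 5


def cumulative_cost(teams: int, people_per_team: int, years: int = YEARS) -> list[int]:
    """Return the running total spend at the end of each year (flat pay assumed)."""
    yearly = teams * people_per_team * COST_PER_PERSON
    return [yearly * year for year in range(1, years + 1)]


# Low and high ends of "one team" and of "three to eight teams".
scenarios = {
    "one team of 3": (1, 3),
    "one team of 8": (1, 8),
    "three teams of 3": (3, 3),
    "eight teams of 8": (8, 8),
}

for label, (teams, people) in scenarios.items():
    totals = cumulative_cost(teams, people)
    print(f"{label}: year 1 = {totals[0]:,}, year 5 = {totals[-1]:,}")
```

Under these assumptions a single team lands between 1,875,000 and 5,000,000 over five years, and the multi-team version between 5,625,000 and 40,000,000.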
And there's a related thing. I was talking with someone three weeks ago at a large bank who pointed out that with the blinking cursor model - just giving developers access to namespaces and clusters - what ended up happening is each application development group needed shadow ops people on those teams who finished out the platform and managed the platform that was given to them. This is a delightful big-bureaucracy enterprise conversation: the managers of those teams loved it, because now they have more headcount and responsibility and they seem cooler. It's a vicious cycle to get caught in unless you want to seem cooler.
In contrast, when teams don't spend their time building out their platform but instead spend it managing one, helping people use it, doing the integration - over the past eight years or so you see pretty impressive ratios. Number of operations people running the platform versus number of developers and apps supported. 30,000 devs to 50 ops. 6,500 devs to 16 ops. 1,200 devs to 6 ops. 350 apps to 7 ops. All based on true stories. If you bank at Rabobank, they're up here in one of these. Run your own platform against these benchmarks and see if you're hitting them.
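If you want to run your own numbers against those benchmarks, a rough Python sketch follows. The four benchmark rows are the ones quoted above; the `my_platform` figures are made-up placeholders to replace with your own, and the helper name is hypothetical.

```python
# (supported count, ops headcount, unit) as quoted in the talk.
BENCHMARKS = [
    (30_000, 50, "devs"),
    (6_500, 16, "devs"),
    (1_200, 6, "devs"),
    (350, 7, "apps"),
]


def per_ops_person(supported: int, ops: int) -> float:
    """How many developers (or apps) each operations person supports."""
    return supported / ops


for supported, ops, unit in BENCHMARKS:
    ratio = per_ops_person(supported, ops)
    print(f"{supported:,} {unit} / {ops} ops = {ratio:,.1f} {unit} per ops person")

# Placeholder values only: substitute your own headcounts here.
my_platform = {"devs": 400, "ops": 20}
mine = per_ops_person(my_platform["devs"], my_platform["ops"])
print(f"your platform: {mine:,.1f} devs per ops person")
```

The developer-based rows work out to 600, about 406, and 200 developers per operations person, and the app-based row to 50 apps per operations person. The two units are not directly comparable, so compare like with like.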
#3 Platform as a project (instead of as a product)
Number three is the same face from a different angle. People run their platform as a project. What you end up with is platforms as sprawling projects. You have many different platforms, because someone thought each would be a one-off thing. As someone was saying this morning: probably no one in this room, but pretty much everyone else does 80% of a project and then moves on to a new thing. That sprawl is what you get.
Often this happens because people have very important critical needs that only they can build, that can't be achieved through buying a platform or using the corporate standard. Abby Bangser at the last KubeCon had a great 15-minute talk on this. Her analogy - or metaphor, I always get those mixed up despite all the studies I've done - is that those platforms are like Christmas puppies. Or, if you're wealthier, Christmas ponies. I've got three kids and every time they ask for a pet I tell them: I know you want a pet, but to use one of my favorite phrases, that was your gift and now it's my problem.
You can see the consequences in the survey data. People want fewer platforms. They want to consolidate. They want to standardize. There's a bank in this part of the continent whose whole initiative is: we spent five years on silos and specialization, let's stop doing that. They've put programs in place to consolidate. You see that over and over again.
Instead, you want platform as a product. The product part is the important part. If you have a product and you want it to stay around, you have to realize you have customers - or users, if you prefer. For a platform, the customers are primarily application developers, plus stakeholders like compliance and security people. You product-manage it: how do you build it, add features, gather feedback. Onno Ceelen and Roy Triesscheijn from bol.com - the Amazon of the Netherlands, for those who don't use it - had a great DevOpsDays Amsterdam talk on this, with the platform team in the middle of the silos, unifying and integrating things for developers.
The point about product-managing a platform is: it is a fair amount of work to monitor what your customers want, run experiments, and gather feedback on whether they worked. If you're doing that, you're probably not going to want to spend a ton of time also building the goopy innards of the platform. Here's a survey of the kinds of questions you'd send out to developers if you were product-managing it - drawn from "Developer Toil: The Hidden Tech Debt" by Susie Forbath, Tyson McNulty, and me, plus Michael Galloway's interview questions for platform product managers. These are what a platform team is interested in, versus the plumbing underneath - which you could just outsource.
#4 Homegrown lock-in
Number four: homegrown lock-in. Or, to use a word from earlier today, artisanal lock-in. As a vendor, I encounter this all the time, and like getting punched in the stomach - which I imagine is very memorable - it sticks with me every time. People insist they need customized stuff. And, even worse, "if we were to buy a platform we would be locked into it forever." If you ask them what lock-in is, they say, "it's lock-in. We know we don't want lock-in." So the plan becomes: we'll get some open source, integrate it together into our own thing, and we're fine. We've avoided lock-in.
Of course, as the title says, what you've built is your own form of lock-in. You're assuming the design and knowledge persist after the people who built the platform find new jobs. You've built your own cage. You can't really go talk with other people about how they do things - you can only talk to yourself, which is delightful on other topics.
Better ways to think about this. Way back in 2006, Simon Phipps at Sun had this notion of "the freedom to leave." It was essentially another way of talking about portability. Back when proprietary versus open source was very controversial, this was the more sophisticated take: you can use whatever, but it should be easy to leave. There's also switching costs - not just license or subscription, but the time you put in, the risk you take on, the time it takes to move and rearrange. You probably want to maximize freedom and portability and minimize switching cost. Gregor Hohpe's "Don't get locked up into avoiding lock-in" and Keith Townsend's "Thinking About VMware Alternatives?" are useful here. Run those criteria against what you actually want, rather than the reflex "if we spend money on something we must be locked in."
#5 Retaining skilled people
Retaining the skilled people who can build platforms - the ones who know about that exciting plumbing - turns out to be hard, because they're very valuable, highly skilled people. As proof, look at the CNCF surveys from the oldest to the newest. Lack of skills around cloud native and Kubernetes has been at the top since 2017. Obviously there's a typo on the slide - that's supposed to be 2025 on the right. We've gotten 4% better at it since 2017. Skills and knowledge of how to run cloud native are always a problem.
Security is always the number one issue in these surveys, but I'd rank skills above it - of course people say security, you don't even need to ask. Who's going to answer "top three? Security? No way, don't care"? You almost feel obligated. The skills to build a platform - not just run it - are very valuable. So you hear the story over and over again: lots of organizations try to build out their own platform on whatever container or Kubernetes stack. At the end of a year, most of them haven't really delivered much. Scope creep, boredom, a new executive comes in and redoes the strategy. A few of them actually succeed in building something with a bunch of production applications, and they get up on a much bigger stage than this and give a keynote, and then mysteriously, three or six months later, they're working at a new organization. From a worker perspective, that's great. From the organizational perspective, losing the people - sometimes whole teams - who just built something really cool is not great. You're left with this platform the wizards made for you, and a lack of wizards who know how to do anything with it.
The flip side of homegrown lock-in: if you buy a platform, there are many other organizations using it. From a worker standpoint, the value of the workers goes down a little - sorry about that - but the chance of retaining them goes up, and you don't get the interruption of having the skilled people you depend on poached. If you're a manager, an alternative strategy: never send your successful platform builders to a conference. Don't let them present. Even more oppressive to think about, but it might be effective.
#6 Keeping up with security and compliance
I am by no means a security expert. Security people start talking to me and I start thinking about how I have to finish book four of Dungeon Crawler Carl to get to book five. I just found out there's actually eight books. I thought number five was going to be the last and I'd be caught up. So I'm busy. What was that guy talking about? I forget. I just zone out.
So I tried to think how to represent the increasing time required to worry about security if you've built your own platform. Think about that stack again - all the integrations, all the dependencies. Jerry Gamblin's 2025 CVE Data Review shows it just gets worse every year. Things are constantly attacking what you're doing. Christmas puppy: not only was this your gift, it's also your problem. Security is going to consume a tremendous amount of your time. Meanwhile, you've still got to have your daily standup to figure out what new features you're going to give your developer customers - which is to say, product management.
And not only security. There's also the thrilling world of governance and geopolitical climate and macroeconomic trends - sovereign cloud, all that. If you've built your own unique Christmas platform - a gifted platform - it's also your responsibility to figure out how to move things around to keep up. That sounds like something you'd want to outsource.
#7 Resume-driven development
Number seven is highly related to the skills point, but it's more an effect to watch out for. You've probably heard the phrase resume-driven development. There's actually a paper - it's not just funny tweets or skeets or tootses or whatever they're called now. Some Germans got together and wrote it, of course.
The cycle: when you want to hire people, you make the job seem interesting, at least in tech. You list out new technologies, new opportunities. Hirers signal "work on this cool technology." Potential employees figure out: I have to know this new thing to get a cool job, or just a job. Whether you're an operations person or a developer or in IT, you start thinking you should be working on cool new things, not the old stuff. Consciously or unconsciously, that drives use of new things and moving on from what already runs. There's the joke that we want technology to be boring - which means it runs well, we understand it, it's stable, and we can start making money or providing services to citizens. But boring technologies get ignored as people pursue resume-driven development.
I wouldn't have shown you the paper if its conclusion didn't agree with me. The authors find a cluster: less knowledge about the system, an overall worse system, a high degree of - as they delightfully call it - RDD. There's even a technical term for it.
Bonus: build vs. buy with a Wardley map
That's seven. I usually have to skip five slides because I cram too many in, but this time the talk is actually short, so here's some bonus material. If you're into business stuff and you're older, you might remember Wardley maps. I've got to be honest, I never really got into them. But people were thrilled with this stuff when it came out. Here it is for those who like it.
The point is: when you're building a stack, figure out what you should customize - what you should do yourself - versus what you should buy. Find the things that aren't a competitive advantage. Or, the way I think about it: what are the things that do not cause your customers to give you money? When I go to Albert Heijn, the home of our hamster friend, I have no idea what platform they're running. I don't go there because of their platform. I go for the groceries and the mobile app experience - on the self-checkout I don't have to print a receipt, I just scan the barcode on my phone and walk out. They even know to put a trash can right by the gate, because they know that piece of paper is useless. Their applications are good. I have no idea if they run Kubernetes or Mesosphere or whatever. It's not valuable to them. Don't spend time on it - transform money into not spending time on it, and buy the commodity thing.
In contrast, also in the Netherlands, there's a delivery service called Picnic - I don't know how to pronounce it in Dutch, sorry. They custom-build these tiny narrow trucks. I found this out waiting for a lunch a year or two ago. They've customized that because they're a grocery delivery service competing on getting through narrow downtown streets quickly and cheaply. So they built that platform - unlike the Albert Heijn and Jumbo trucks that are just trucks, because what do they care, they pop up on the curb with your groceries. When it comes to building versus buying a platform - or anything - go through one of these exercises. Ask: if we built our own platform, are we going to get more customers? Is anyone going to think, "I really like your discount pants, but the fact that you built your own platform - that's why I'm buying you over the H&M pants"? Probably not.
And one more bonus thing: AI. Maybe I haven't mentioned it until now, which is perhaps a personal best. We're going to be adding this whole new workload - applications written by AI, running models, running plugins. As a platform person, that's a whole other load on your plate. Even more reason not to spend your time building your own platform.
Stop building platforms. Start building apps.
So that's why I generally think you should stop building platforms and start building apps. Don't worry about building your platform - there are plenty out there you can have. Buy everything you can, as Abby Bangser put it. With that, thanks for coming. Email me, grab the slides at cote.io/diy/. There's a paper this is based on if you want to read more.