Production Kubernetes on-premises and GDPR… Does it ring a bell? — Series I
It’s been a very long time since I posted something new, and I can feel the excitement already. I was never the person who publishes stuff you can read elsewhere; it’s always the edge cases... Man, I’m stuffed like hell. I need to get a lot of information off my chest immediately. Ergo, expect a new story from me every Sunday.
Disclaimer: In this story and the ones to come, you’ll be reading lots of swearing and sarcasm aimed at people and organizations who will most likely get offended once they see this. Once and for all, this is my personal space and my personal experience. You are very welcome to fail by doing the exact opposite and still call me for help. I got used to cleaning up people’s shit anyway.
Today’s topic covers several big challenges for many of us, yet it includes the simplest possible solutions to tackle them, ’cause there aren’t many great solutions out there anyway. Everybody goes nuts over the GDPR, and I’m telling you: it’s coming, and it’s going to hit harder than you can ever imagine. Although GDPR doesn’t say anything against Kubernetes, moving from a cloud Kubernetes (or just the cloud itself) to an on-prem Kubernetes is a real thing, even when the only reason is GDPR.
This story and the upcoming series are technical reads from a CTO’s perspective, focused specifically on Kubernetes (K8s) deployment on-premises. In this Series I, we will only focus on our data center/hardware specs and, in the end, “why all this matters”.
TL;DR: If you just have one or two containerized services, fuck it. Don’t do it for show business. Spin up half a dozen virtual machines (VMs) and run this:
sudo su -c "curl -sSL https://get.docker.com/ | sh"
You’ll be a lot happier, and so will I.
We used to provision K8s clusters on the cloud and let cloud providers handle a shitload of work for us: managing master/etcd nodes and doing everything that ingress/egress requires. Provisioning K8s clusters on-premises, on the other hand, has been with us all along, and we still can’t settle on a solution that is reliable from every aspect. Nobody in my position that I spoke with thought the labor of managing master/worker nodes, delivering the right packets between networks, and making things secure and as scalable as possible would be that high. This shit is too much to handle even for a two-pizza team of experienced engineers. Unless you decide to open the wallet wide or tell the responsible team, “This is your new life, get used to it.”
To be able to make the right choices with on-prem Kubernetes, you need to know what you are doing from the ground up. You can’t just pick a provisioning tool (most of which suck, like kubespray), try doing something, and eventually end up shouting across GitHub Issues, “why doesn’t this thing work?!”. I would tell you, my friend, but you wouldn’t get it anyhow.
Yet, understandably, we can’t convince companies to stay on the cloud. Due to GDPR/KVKK, some of them are seriously considering migrating to in-country infrastructure (this shit is a crazy trend in Turkey at the moment). While doing this hard operation, why wouldn’t you modernize the stuff you haven’t already? Most companies I have engaged with over time have lots of on-premises resources in several well-known data centers like Equinix or Sparkle. This story suits these companies best.
Here, my friends, by the power vested in me by the past 3 years of experience in this specific scenario, I’ll be telling you not to be afraid, not even of the “Back-off restarting failed container” errors from a Percona MySQL Galera Cluster or the filthiest major cluster upgrade scenarios, all on-premises.
Breaking down today's pillars:
Hypervisor: Let’s talk a bit, just a bit, it’s so deep.
Obviously, you haven’t been thinking of deploying on bare metal, right..?
In my opinion and experience, VMware ESXi/vSphere is the most reliable hypervisor out there, and thank god most companies use it. It’s enterprise-grade, robust, reliable, fast and fairly priced. In our to-be system, this is actually the only thing you are going to have to pay licensing for. It’s worth it, and it’s also the sweet spot.
- Of course, a lot of players are in this area, like Xen, oVirt/RHV, Virtuozzo, OpenStack, Proxmox, etc. However, from the K8s perspective, the cluster provisioning labor is way higher with these, as the tools are not that great and don’t natively support reliable machine creation/configuration.
- From the non-K8s perspective, one day you can end up hitting disk I/O issues on XenServer (non-licensed installations), and another day you can suffer ridiculous VDSM/SPM issues on oVirt/RHV that you can’t figure out.
I’ve never seen any chronic or catastrophic issues with VMware so far. It’s proven by years of experience, and that’s a huge plus for my organization. We depend on VMware in lots of scenarios without any hesitation, not just K8s and its orchestration.
Storage: Did you forget why we are doing this?
The only thing GDPR actually cares about is the data and the way it’s protected. Not the applications, god knows where and how they run, nor the kind of firewall device you initially planned to use. If you are concerned about GDPR, that means you have been collecting/generating/persisting some people’s data, and you will need at least one storage solution.
Specifically, Fibre Channel flash arrays are a really modern way to start. Never go with 1U/2U server snap-on SSD/HDD disks; they will fail you in the long run. The solution must also work with VMware, either through VMFS or vSAN. As long as the disk attached to a newly launched VM is provisioned in a datastore outside the ESXi hosts themselves, we are good.
If you don’t have a storage solution and are planning to get one, my recommendation would be Pure Storage. Compared to EMC, NetApp or other players, you might never have heard of them. But they are astonishingly good, the pricing policies are really sweet, and the quality of the disks depends on the OEM itself. Unlike the big players, Pure can’t pick your pocket on failed disk swaps. Depending on your storage space needs and other features, you can find a model to your taste pretty quickly.
Network: Have you ever heard of PPS?
As the storage works over the network, switching capability comes to mind. You have to use quality switches, especially for storage; however, you should also have another switch pool for the overall networking of the ESXi nodes or your firewall. These hardware switching pools (two devices per pool) must be completely separated to avoid unnecessary bottlenecks. On the storage side, aim for at least 10 Gbit/s per port (25 Gbit/s is better nowadays) with a forwarding throughput of more than 10 billion packets per second (PPS) for a neat flow.
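To get a feel for what a PPS figure on a datasheet means, here is a back-of-the-envelope sketch in Python. It assumes worst-case 64-byte Ethernet frames plus the standard 20 bytes of preamble and inter-frame gap per frame on the wire. The function names and the 48-port example are mine, purely for illustration, and not taken from any vendor’s spec sheet.

```python
# Per-frame overhead on the wire: 8 bytes preamble/SFD + 12 bytes inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def line_rate_pps(link_gbps: float, frame_bytes: int = 64) -> float:
    """Maximum packets per second one port can carry at a given frame size."""
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_gbps * 1_000_000_000 / bits_per_frame


def switch_pps(ports: int, link_gbps: float, frame_bytes: int = 64) -> float:
    """Aggregate packets per second if every port runs at line rate."""
    return ports * line_rate_pps(link_gbps, frame_bytes)


if __name__ == "__main__":
    for speed in (10, 25):
        print(f"{speed} Gbit/s port: {line_rate_pps(speed):,.0f} pps (64-byte frames)")
    # Hypothetical 48-port 10 Gbit/s switch, all ports saturated.
    print(f"48 x 10 Gbit/s: {switch_pps(48, 10):,.0f} pps")
```

Plug in your own port count and speed, then compare the result against the forwarding rate your switch vendor quotes.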
My bet here is on Brocade, which is part of Broadcom these days. Other quality brands like Juniper or Cisco are welcome too, as long as they meet the specs and you can manage them hassle-free. Last time I checked, our Brocade pool hadn’t been rebooted for over 2160 days. I wish you the same outcome.
What’s it going to look like when we set our subnets?
- 192.168.0.0/24: Only for storage devices, must be a physical network.
- 192.168.1.0/24: Only for bare metal devices, must be a physical network.
- 192.168.100.0/24: Only for Kubernetes VMs, can be an NSX-T (virtual) network or a physical one. A DHCP server must be deployed on this specific network. It doesn’t matter whether it runs on a hardware switch or on the NSX Edge Services Gateway (ESG).
- 192.168.254.0/24: Management network, must be a physical network. We will plug our iDRAC, iLO or other IPMI solutions into this network.
Kubernetes VMs, or in other words 192.168.100.0/24, must be able to talk with 192.168.1.0/24 bidirectionally, so that disk maintenance and other operations like machine creation, configuration changes or even machine removal can be done at some point.
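If you want to sanity-check a plan like this before touching a switch, a few lines of Python with the standard ipaddress module will do. This is a minimal sketch: the zone names, the REQUIRED_FLOWS set and the helper functions are hypothetical labels of mine, and only the four CIDRs and the Kubernetes-to-bare-metal requirement come from the layout above.

```python
import ipaddress
from itertools import combinations

# The four /24 networks from the plan; the zone names are illustrative labels.
SUBNETS = {
    "storage": ipaddress.ip_network("192.168.0.0/24"),
    "bare-metal": ipaddress.ip_network("192.168.1.0/24"),
    "kubernetes-vms": ipaddress.ip_network("192.168.100.0/24"),
    "management": ipaddress.ip_network("192.168.254.0/24"),
}

# Zone pairs that must be routable in both directions.
REQUIRED_FLOWS = {frozenset({"kubernetes-vms", "bare-metal"})}


def overlapping_pairs(subnets):
    """Return the names of any two zones whose address ranges overlap."""
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(subnets.items(), 2)
        if net_a.overlaps(net_b)
    ]


def zone_of(address):
    """Return the zone an IP address belongs to, or None if it is outside the plan."""
    ip = ipaddress.ip_address(address)
    for name, network in SUBNETS.items():
        if ip in network:
            return name
    return None


def must_talk(src, dst):
    """True if the plan requires bidirectional traffic between two addresses."""
    return frozenset({zone_of(src), zone_of(dst)}) in REQUIRED_FLOWS


if __name__ == "__main__":
    print("overlaps:", overlapping_pairs(SUBNETS))
    print(zone_of("192.168.100.42"), "<->", zone_of("192.168.1.10"),
          "required:", must_talk("192.168.100.42", "192.168.1.10"))
```

The same table can later double as the input for your firewall policy review.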
I’ve mentioned this because you will need some sort of router/firewall to egress your way out to the internet from these private networks. While setting your policies, you can now take this into account. If you are keen on these things, you can get two garbage servers with sufficient CPU/memory and deploy pfSense on top to make your own firewall, for a garbage price.
Don’t worry about a hardware load balancer at this point, unless you’ve already got one like F5, Citrix, Barracuda, etc. Never, ever use your firewall as a load balancer, even if it supports load balancing! It’s not meant to do that. Firewalls easily get overheated and bottlenecked during the SSL/TLS encryption or decryption phases. That eats into their average lifetime and locks up your network traffic, so when this device dies, you are dead, and so is the business.
Provisioner: The most fun part, Trichotillomania. (and some personal stories)
You might have never heard of this disorder. In basic terms, Trichotillomania is pulling hair out due to stress.
It’s been 7 years since Kubernetes became globally available. Yet we still have to deal with shitty on-premises cluster provisioners. They are either trying to rip off your wallet or skull-f#ck your entire architecture with meaningless deployment approaches, eventually creating something unmanageable and faulty from the very beginning. If I wanted to pay for deploying an open-source project, I would’ve gone with Tanzu, considering my current environment… But why on EARTH would I do that? It’s still unstable, still missing lots of features, and expensive as hell. Paying up just means “I’ll get enterprise support for something unmanageable, even for them”, which justifies my team and me dealing with things on our own, in peace.
I tried many things, believe me. I can even make a list named “which one is the shittiest, ordered by popularity” and call it a day over the gags.
After 13 weeks of overtime, hard work and sweat, I’ve found my companion.
- Low pod footprint
- Feels right
. . .
The most important of all… IT WORKS!
Almost two years ago, I was a consultant providing DevOps/Cloud/Consultancy services to most of the biggest companies in the country. Back then, I was at the OpenShift 4 (OCP4) launch event and saw that every IT manager in those companies was so excited about OCP4 that they completely lost their minds and got as hyped as they could. So we had to change our priorities, pioneer the platform, and then spread it around like a religion, which also affected my position.
We sold a lot, and succeeded. I did my job well; every company I visited adored my efforts to run their nonsensical architectures with a modern approach. In those days, I didn’t get why they were paying for OCP4, or Pivotal Container Service (PKS, which became Tanzu), or SUSE Container as a Service Platform (shittiest of all, LITERALLY: imagine bootstrapping a cluster with kubeadm and paying for enterprise support where nobody knows how to help you out, LoL!).
I thought they had reasons. I thought maybe they wanted the features. What actually happened was that they were just dazzled by the wrapped version of K8s. Everything means money when you are an enterprise. They were paying for the tool. They were paying for the consultancy for the tool. They were paying for the education/training of the consultants who were supposed to provide support for the tool. They even paid extra hours for their own development efforts on top of vanilla K8s to adapt to OCP4. As an IT guy, I can’t describe how f#cked up things felt.
Let me tell you what: several months have passed and, technically speaking, I still don’t get why they paid and are still paying for this, LoL! I’ve started to believe I’ll never get it…
As with all other Red Hat projects, OpenShift has an open-source edition as well (trailing a couple of releases behind). If you know OCP4, then you must’ve heard of Origin Kubernetes Distribution (OKD). They are almost identical feature-wise, without the obligation to pay for enterprise support or stupid license fees.
Given my prior experience with OpenShift, I thought I would be more comfortable provisioning K8s with extra benefits. So I did, and the provisioning phase was so smooth. Still a better installation experience than Rancher and any other on-premises solution, thanks to Installer Provisioned Infrastructure (IPI). You don’t even have to worry about load balancing; it deploys keepalived and an HAProxy cluster (OpenShift Router) for you. Here it comes…
BUT, Red Hat recommends 3 master nodes with 16 GB of memory and 4 cores each, plus another 3 worker nodes with a minimum of 8 GB of memory and 2 cores each, so that OKD pods can run there. Minimum 120 GB of disk space on all 6 nodes.
Can you look at this mess? This is something I forgot during my OCP4 provisioning days. It’s unacceptable. I’m going to sacrifice 18 cores of 2nd Generation Intel Gold processors, 72 GB of memory and 720 GB of disk space for what? A fancy cluster? 251 pods are already running, and I have just logged in!
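If you want to check my math, here is the bill as a tiny Python sketch. The NodePool class and totals function are just names I made up for the illustration; the per-node figures are the ones quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodePool:
    count: int       # number of nodes in the pool
    cores: int       # CPU cores per node
    memory_gb: int   # memory per node, in GB
    disk_gb: int     # disk per node, in GB


# Figures quoted above: 3 masters and 3 workers, 120 GB disk on every node.
masters = NodePool(count=3, cores=4, memory_gb=16, disk_gb=120)
workers = NodePool(count=3, cores=2, memory_gb=8, disk_gb=120)


def totals(*pools):
    """Sum cores, memory and disk across all node pools."""
    cores = sum(p.count * p.cores for p in pools)
    memory_gb = sum(p.count * p.memory_gb for p in pools)
    disk_gb = sum(p.count * p.disk_gb for p in pools)
    return cores, memory_gb, disk_gb


if __name__ == "__main__":
    cores, memory_gb, disk_gb = totals(masters, workers)
    print(f"{cores} cores, {memory_gb} GB memory, {disk_gb} GB disk")
```

That is the price of admission before a single one of your own workloads is scheduled.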
In the screenshot above, you could say, “C’mooon, don’t exaggerate, it’s just using 20 GB of memory, not 70 GB at all.” Well then, my genius friend, I challenge you to deploy a PXC cluster on a clean OKD/OCP install, with 600m CPU and 1G memory on each PXC pod and 150m CPU and 300M memory on each HAProxy pod.
See, you won’t be able to deploy the second PXC pod, I guarantee you that. This is mainly because 48 GB of memory is already reserved for the master nodes, which are unschedulable, and more than 200 pods are running on the worker nodes with only 24 GB of memory, with requests, limits and QoS policies in play. We are just getting started with OKD. We don’t have a private registry, any build jobs running, or any other cluster-wide tool deployed, for that matter.
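Why does “only 20 GB used” not save you? Because the Kubernetes scheduler places pods based on the sum of resource requests against each node’s allocatable capacity, not on live usage. Here is a deliberately simplified first-fit sketch of that rule in Python. Every number in it except the 1 GiB PXC request is made up for illustration, and the real scheduler scores nodes instead of picking the first one that fits.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    allocatable_mib: int  # memory the node offers to pods
    requested_mib: int    # sum of memory requests of pods already placed

    def headroom_mib(self) -> int:
        return self.allocatable_mib - self.requested_mib


def schedule(nodes, request_mib):
    """Place a pod on the first node whose unrequested memory covers the request.

    Returns the node name, or None if the pod would stay Pending.
    """
    for node in nodes:
        if node.headroom_mib() >= request_mib:
            node.requested_mib += request_mib
            return node.name
    return None


# HYPOTHETICAL worker state: 8 GiB nodes with part of it reserved for the system,
# and platform pods that have already requested most of what is left.
workers = [
    Node("worker-1", allocatable_mib=7168, requested_mib=6900),
    Node("worker-2", allocatable_mib=7168, requested_mib=6000),
    Node("worker-3", allocatable_mib=7168, requested_mib=6800),
]

PXC_REQUEST_MIB = 1024  # 1 GiB per PXC pod, as in the challenge above

if __name__ == "__main__":
    print("first PXC pod  ->", schedule(workers, PXC_REQUEST_MIB))
    print("second PXC pod ->", schedule(workers, PXC_REQUEST_MIB))
```

With these invented numbers the first pod squeezes in and the second one stays Pending, no matter how idle the nodes look in the dashboard.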
That wasn’t the only issue, but I pivoted to solutions like kubespray, conjure-up, and many more… to get a lightweight approach. Well f#ck me, alright? I’m never, ever going to walk down that path again, not even for a single customer. The provisioning stages are just a PITA, and they don’t work as intended. It’s 2021; why would I want to provision my VMs first and then build an inventory for them, so that our wonderful tools can work their way up from where I or Terraform left off? If you offer this as an option alongside automatic provisioning, then yeah, good work mate. There are freaks who’d like to over-engineer something they shouldn’t touch, good for them. But as an obligation? I don’t think so.
Finally, I landed on my feet with Rancher Server.
Would you look at this beauty? Impeccable. Just 13 pods in use and we’re alive with 3 masters and 2 workers, with the Nginx Ingress Controller and Default Backend pods enabled. The vSphere Cloud Provider is also enabled, so now we can provision any volume we want from RKE at any time, or scale our cluster at the click of a button in about 5 minutes. Feels like Kubernetes on the cloud.
Best part is, I never had to touch any server or provision something from scratch, or spend my days and nights coding Terraform/Ansible scripts.
Up next: Series II. We’ll dig into lots of details with Rancher and peripherals.
Stay tuned and thank you for reading.