Google Kubernetes Engine with Prometheus

If you've been following along with my previous posts, I mentioned that I'd release more content on infrastructure as code and cloud-native security. This time there's a git repo you can clone and work through if you'd like to reproduce this in your own environment.

If you are going to follow along and reproduce these actions programmatically, the service account making the API calls will need the following roles (a sketch of granting one of them with Terraform follows the list):

  • roles/compute.viewer
  • roles/compute.securityAdmin (only required if add_cluster_firewallrules is set to true)
  • roles/container.clusterAdmin
  • roles/container.developer
  • roles/iam.serviceAccountAdmin
  • roles/iam.serviceAccountUser
  • roles/resourcemanager.projectIamAdmin (only required if service_account is set to create)
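
If you manage those grants as code too, a minimal sketch of binding one of the roles with the google_project_iam_member resource looks like the following; the service account email and resource name here are placeholders of mine, not values from the repo.

# Hypothetical example – binds one of the roles above to the service account
# making the API calls. Email and resource name are placeholders.
resource "google_project_iam_member" "terraform_cluster_admin" {
  project = var.project_id
  role    = "roles/container.clusterAdmin"
  member  = "serviceAccount:terraform@your-project.iam.gserviceaccount.com"
}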

Additionally, if you're running in a new project you'll need these APIs enabled (a Terraform sketch for enabling them follows the list):

  • Compute Engine API – compute.googleapis.com
  • Kubernetes Engine API – container.googleapis.com
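
If you prefer enabling them as code rather than clicking through the console, a small sketch using the google_project_service resource (the resource names are my own) would be:

# Placeholder resource names – enables the two APIs the build depends on.
resource "google_project_service" "compute" {
  project = var.project_id
  service = "compute.googleapis.com"
}

resource "google_project_service" "container" {
  project = var.project_id
  service = "container.googleapis.com"
}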

You can use the repository I’ve created for this post by running

git clone https://github.com/sn0rlaxlife/gcp-demo.git
export PROJECT_ID='xxx-xxx-xxx'   # keep this private
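
Terraform also needs the Google provider pointed at your project before anything will plan. The repository presumably contains its own provider configuration; a minimal sketch, with the file name, variable wiring, and region being my assumptions, looks roughly like this:

# provider.tf – sketch; variable wiring and region are assumptions on my part.
provider "google" {
  project = var.project_id   # e.g. the PROJECT_ID you exported above
  region  = "us-central1"    # matches the location used in gke.tf
}

variable "project_id" {
  type = string
}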

Run cd gcp-demo and start with the old faithful terraform init to initialize the working directory, then run terraform plan followed by terraform apply; your output should look like this in the shell.

# gke.tf
# module using a separately managed node pool
# ref https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/container_cluster

resource "google_service_account" "default" {
  account_id   = "terraform"
  display_name = "Terraform admin account"
}

resource "google_container_cluster" "gke-cluster" {
  name = "gke-cluster"
  location = "us-central1"
  remove_default_node_pool = true
  initial_node_count = 1
  network = google_compute_network.vpc_network.id
  network_policy {
    enabled = true
    provider = "CALICO"
  }
  subnetwork = google_compute_subnetwork.network.id
  
  ip_allocation_policy {
    cluster_secondary_range_name = "tf-subnet-range-2"
    services_secondary_range_name = google_compute_subnetwork.network.secondary_ip_range.0.range_name
  }
  timeouts {
    create = "30m"
    update = "40m"
  }
}


resource "google_container_node_pool" "primary_preemptible_nodes" {
  name       = google_container_cluster.gke-cluster.name
  location   = "us-central1"
  cluster    = google_container_cluster.gke-cluster.name
  node_count = 2
  labels = {
    developer = "deveng"
  }
  metadata = {
    disable-legacy-endpoints = "true"
  }


  node_config {
    preemptible  = true
    machine_type = "e2-standard-2"
    oauth_scopes = [
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
    ]
  }
  depends_on = [google_container_cluster.gke-cluster]
}

Mind you, we've first declared our network topology in vpc.tf: the VPC, a subnet, and the secondary ranges that serve as our service range and pod range. Additionally, we've created a new folder, /prometheus, that grabs the kubeconfig data to authenticate to the cluster and applies the Helm chart.
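
For reference, here's a rough sketch of what that vpc.tf could contain. The resource names line up with the references in gke.tf, and apart from the two CIDRs shown later in the console (192.168.1.0/24 and 192.168.64.0/22), the subnet name, range names, and services CIDR are placeholders of mine rather than the repo's actual values.

# vpc.tf – sketch only; names align with the gke.tf references, other values are placeholders.
resource "google_compute_network" "vpc_network" {
  name                    = "vpc-network"
  auto_create_subnetworks = false
}

resource "google_compute_subnetwork" "network" {
  name          = "tf-subnet"
  region        = "us-central1"
  network       = google_compute_network.vpc_network.id
  ip_cidr_range = "192.168.1.0/24"

  # Index 0 is what gke.tf reads as the services secondary range.
  secondary_ip_range {
    range_name    = "tf-subnet-range-1"
    ip_cidr_range = "10.10.0.0/20"   # placeholder services CIDR
  }

  # Referenced by name in gke.tf as the pod (cluster) secondary range.
  secondary_ip_range {
    range_name    = "tf-subnet-range-2"
    ip_cidr_range = "192.168.64.0/22"
  }
}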

Once the cluster has been created, grab the cluster credentials and navigate to the Prometheus folder:

gcloud container clusters get-credentials gke-cluster --region <region-id> --project <project-id>
cd Prometheus
terraform init
terraform plan
terraform apply

If we navigate to the Kubernetes Engine page in the cloud console we can see the initialization of the cluster under our naming convention "gke-cluster". Give this some time to provision, as much as five minutes, so if you need a coffee/tea break (I suggest taking one) now is a good time, or just wait around.

Once you grab the credentials you now have the ability to apply manifests; as you can see, this generates our kubeconfig, which is used to authenticate.

Once we do a terraform init/apply in the Prometheus folder, your output should look like this, representing the Helm chart being deployed.
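
Under the hood, the configuration in that folder is essentially a Helm provider plus a helm_release. The sketch below assumes the kube-prometheus-stack chart with the release and namespace both named prometheus, which lines up with the prometheus-grafana service used later, but the exact wiring in the repo may differ:

# Sketch – assumes kube-prometheus-stack; the repo's actual wiring may differ.
provider "helm" {
  kubernetes {
    config_path = "~/.kube/config"   # written by the gcloud get-credentials step above
  }
}

resource "helm_release" "prometheus" {
  name             = "prometheus"
  namespace        = "prometheus"
  create_namespace = true
  repository       = "https://prometheus-community.github.io/helm-charts"
  chart            = "kube-prometheus-stack"
}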

We can see our network has been provisioned in the networks page in GCP.

If we select the network we can expand it to see the ranges we've declared in Terraform: "192.168.1.0/24" and "192.168.64.0/22".

For this demo I was able to work through the issues that came up, but one you'll likely run into is the quota for the SSD capacity requested, depending on your account settings; if that's the case, change the node_count in the "primary_preemptible_nodes" resource to 1 (one way to parameterize this is sketched below).
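
One way to make that tweak without editing the resource each time (my own suggestion, not something the repo does) is to drive the count from a variable and set node_count = var.node_count in the node pool:

# Suggested tweak, not from the repo: expose the node count as a variable.
variable "node_count" {
  type    = number
  default = 2   # drop to 1 with: terraform apply -var node_count=1
}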

To access the Grafana dashboard that ships with the Prometheus stack, we will forward the service locally by running a kubectl port-forward

kubectl port-forward svc/prometheus-grafana -n prometheus 3000:80

The default username is admin and the default password is prom-operator

After logging in we are granted access to the dashboard portal

To grab the password we used, or just to confirm where it's stored in our cluster, run the following commands

kubectl -n prometheus get secret prometheus-grafana -o json | jq -r '.data."admin-password"' | base64 -d
prom-operator
kubectl -n prometheus get secret prometheus-grafana -o json | jq -r '.data."admin-user"' | base64 -d
admin

If you made it this far you've just provisioned a Kubernetes cluster running on Google Kubernetes Engine and deployed a Helm chart programmatically using Terraform. This is intended as a proof of concept; SAST scanning has been done, and I'm working on hardening this configuration further.

To delete our cluster, run terraform destroy in both folders: start in the Prometheus folder so the Helm release is removed first, then run it again from the parent folder to tear down the cluster and network.

Summary

A Terraform skillset is invaluable: it gives you the portability to provision infrastructure as code and apply configurations across cloud service providers. Some items can be confusing for beginners, but as you work through the errors you'll find what works for you. It's important to follow a consistent structure, using version control and separate folders for items that need some initial configuration, as we've done here with the prometheus folder that is applied after authenticating into our cluster.