# DevOps/Kubernetes Cheatsheet

Author: Avasdream (@avasdream_)
DevOps is the combination of cultural philosophies, practices, and tools in software engineering that encourages collaboration between traditionally siloed development and IT operations teams. This is my personal cheat sheet for various DevOps-related tools. If some context is missing here or there, don't hesitate to reach out to me on Twitter.
## Kubernetes
If you're working with `kubectl`, the first thing you want to do is set up autocompletion.
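A minimal setup for bash (zsh works the same way with `kubectl completion zsh`):

```bash
# Load completion for the current shell session
source <(kubectl completion bash)
# Persist it and keep completion working for the usual short alias
echo 'source <(kubectl completion bash)' >> ~/.bashrc
echo 'alias k=kubectl' >> ~/.bashrc
echo 'complete -o default -F __start_kubectl k' >> ~/.bashrc
```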
### Context & Multiple Clusters
```bash
# Get configuration from cloud (Exoscale SKS example)
exo sks kubeconfig $CLUSTER_NAME kube-admin --zone ch-dk-2 --group system:masters >> ~/.kube/$CONFIG_NAME

# Create context
kubectl config --kubeconfig=~/.kube/$CONFIG_NAME set-context $CONTEXT_NAME

# View the new cluster and user entry
kubectl config view

# Edit names to differentiate between clusters
nano ~/.kube/$CONFIG_NAME

# Create a new context
kubectl config set-context $CONTEXT_NAME --cluster=$CLUSTER_NAME --user=$USER_NAME --namespace=default

# Use the context
kubectl config use-context $CONTEXT_NAME
```
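To check which context is active, or to run a one-off command against another cluster without switching:

```bash
kubectl config get-contexts                 # list all contexts
kubectl config current-context              # show the active one
kubectl --context=$CONTEXT_NAME get nodes   # target a specific cluster for a single command
```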
Select pods by label in all namespaces:

```bash
kubectl get pods -l app=longhorn-manager -A
```
### Secrets
Decode a secret:

```bash
kubectl get secrets/argo-postgres-config -n argo --template='{{.data.password}}' | base64 -d  # use -D on macOS
```
### Pods
Delete all pods whose name starts with `$STRING`. Don't do this; use labels instead whenever possible.

```bash
for pod in $(kubectl get po -n $NAMESPACE | grep "$STRING" | awk '{print $1}'); do kubectl delete pod/$pod -n $NAMESPACE; done
```
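The label-based variant is a single, safer command; assuming the pods carry a label like `app=$APP`:

```bash
kubectl delete pods -l app=$APP -n $NAMESPACE
```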
### Deployments
Restart all deployments in `$NAMESPACE`:

```bash
for deployment in $(kubectl get deployment --namespace $NAMESPACE -o jsonpath='{.items[*].metadata.name}'); do
  kubectl rollout restart deployment $deployment --namespace $NAMESPACE
done
```
Restart a single deployment:

```bash
kubectl rollout restart deployment $DEPLOYMENT --namespace $NAMESPACE
```
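To wait for the restart to finish and roll back if the new revision is broken:

```bash
# Block until the new ReplicaSet is fully rolled out
kubectl rollout status deployment $DEPLOYMENT --namespace $NAMESPACE
# Revert to the previous revision if needed
kubectl rollout undo deployment $DEPLOYMENT --namespace $NAMESPACE
```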
### DevSpace ImagePullBackOff
I recently had the problem that my DevSpace Helm deployments could not pull images from a private registry. Since DevSpace uses the default service account, you can patch that account and set a default `imagePullSecret`:

```bash
kubectl patch serviceaccount default -n $NAMESPACE -p "{\"imagePullSecrets\": [{\"name\": \"$SECRET_NAME\"}]}"
```
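The referenced secret can be created up front as a docker-registry secret (registry URL and credentials are placeholders):

```bash
kubectl create secret docker-registry $SECRET_NAME \
  --docker-server=$REGISTRY_URL \
  --docker-username=$REGISTRY_USER \
  --docker-password=$REGISTRY_PASSWORD \
  -n $NAMESPACE
```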
### Longhorn PVC/PV Debugging
Error:

```
Warning  FailedScheduling  26m (x2371 over 40h)  default-scheduler  0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
```
Get the PVC name from the pod description:

```bash
kubectl describe pod $POD -n argo | grep ClaimName
```
More information about the PVC:

```bash
kubectl get pvc $PVC -n $NAMESPACE -o yaml
kubectl get pvc -n postgres
kubectl describe pvc postgres-db-postgres-cluster-1-0 -n postgres
```
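Events on the PVC and Longhorn's own volume objects usually point to the root cause. The Longhorn resource names below are what current releases register, so treat them as an assumption and check `kubectl api-resources` if they differ:

```bash
# Events for a specific PVC
kubectl get events -n $NAMESPACE --field-selector involvedObject.name=$PVC
# Longhorn's view of volumes and nodes
kubectl get volumes.longhorn.io -n longhorn-system
kubectl get nodes.longhorn.io -n longhorn-system
```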
A PV stuck in `Terminating` can usually be released by clearing its finalizers:

```bash
kubectl get pv
kubectl patch pv pvc-f87cd151-8497-4a6e-a991-a9d1a99fd761 -p '{"metadata":{"finalizers":null}}'
kubectl describe pvc postgres-db-postgres-cluster-1-0 -n postgres
```
Another error:

```
invalid json format of recurringJobSelector: invalid character '}' looking for beginning of object key string
```

The trailing comma makes the JSON invalid. Changed

```yaml
recurringJobSelector: '[{"name":"backup-s3","isGroup":true,}]'
```

to

```yaml
recurringJobSelector: '[{"name":"backup-s3","isGroup":true}]'
```

without the trailing `,`.
Validate with:

```bash
kubectl get cm longhorn-storageclass -o yaml -n longhorn-system
kubectl describe kubegres postgres-cluster -n postgres
```
## Linux General
### Encrypt/Decrypt String
```bash
# Encrypt
echo 12345678901 | openssl enc -e -base64 -aes-128-ctr -nopad -nosalt -k secret_password
# Decrypt
echo cSTzU8+UPQQwpRAq | openssl enc -d -base64 -aes-128-ctr -nopad -nosalt -k secret_password
```
## Prometheus / Grafana
### Monitoring Argo Workflows with Prometheus and Grafana
1. Deploy the Helm chart `prometheus-community/kube-prometheus-stack` into the namespace `monitoring`.
2. Add the label `monitoring: prometheus` to the Service `workflow-controller-metrics`:

```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: workflow-controller
    monitoring: prometheus
  name: workflow-controller-metrics
  namespace: argo
spec:
  ports:
    - name: metrics
      port: 9090
      protocol: TCP
      targetPort: 9090
  selector:
    app: workflow-controller
```
3. Create a `ServiceMonitor` for the `workflow-controller-metrics` service. After applying this config, the Prometheus server should auto-discover the service. The label `release: prometheus` is mandatory.
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: workflow-controller-metrics
  labels:
    release: prometheus
spec:
  endpoints:
    - path: /metrics
      port: metrics
      scheme: http
      scrapeTimeout: 30s
  jobLabel: argo-workflows
  namespaceSelector:
    matchNames:
      - argo
  selector:
    matchLabels:
      monitoring: prometheus
```
4. Add annotations to the Service to enable Prometheus data scraping. The annotation `prometheus.io/scrape: "true"` is mandatory!
```yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: "/metrics"
    prometheus.io/port: "9090"
  labels:
    app: workflow-controller
    monitoring: prometheus
  name: workflow-controller-metrics
spec:
  ports:
    - name: metrics
      port: 9090
      protocol: TCP
      targetPort: 9090
  selector:
    app: workflow-controller
```
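Alternatively, the label and annotations can be applied with `kubectl` directly; this sketch assumes the Service lives in the `argo` namespace as above:

```bash
kubectl label service workflow-controller-metrics monitoring=prometheus -n argo --overwrite
kubectl annotate service workflow-controller-metrics \
  prometheus.io/scrape="true" prometheus.io/path="/metrics" prometheus.io/port="9090" \
  -n argo --overwrite
```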
5. Reload the Prometheus configuration.
```bash
kubectl port-forward svc/prometheus-kube-prometheus-prometheus 9091:9090 -n monitoring
curl -X POST http://localhost:9091/-/reload
```
6. Add the Grafana dashboard as a ConfigMap to the namespace `monitoring`. Download the dashboard [here](https://grafana.com/grafana/dashboards/13927).
   Change all occurrences of `${DS_THANOS-MASTER}` to `prometheus`. This value would normally be set while importing the dashboard manually, but since we import it automatically, we need to change it in the JSON. The label `grafana_dashboard: "1"` is crucial; without it the dashboard will not be imported.
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argo-workflows-dashboard
  labels:
    grafana_dashboard: "1"
data:
  argo-workflows-dashboard.json: |
    ...
```
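One way to generate that ConfigMap from the downloaded JSON (the file name `argo-workflows-dashboard.json` is an assumption):

```bash
# Replace the datasource variable that would normally be set during a manual import
sed -i 's/\${DS_THANOS-MASTER}/prometheus/g' argo-workflows-dashboard.json

# Create the ConfigMap and add the label the Grafana sidecar looks for
kubectl create configmap argo-workflows-dashboard \
  --from-file=argo-workflows-dashboard.json -n monitoring
kubectl label configmap argo-workflows-dashboard grafana_dashboard="1" -n monitoring
```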
7. Delete the Grafana Pod. After a new Pod is created, it should automatically import the dashboard.
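For example (the label selector is what the kube-prometheus-stack Grafana deployment typically carries, so verify it first):

```bash
kubectl delete pod -n monitoring -l app.kubernetes.io/name=grafana
```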
### Debug: Prometheus ServiceMonitor not added to the configuration
Example ServiceMonitor:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: workflow-controller-metrics
  labels:
    release: prometheus
spec:
  endpoints:
    - path: /metrics
      port: metrics
      scheme: http
      scrapeTimeout: 30s
  jobLabel: argo-workflows
  namespaceSelector:
    matchNames:
      - argo
  selector:
    matchLabels:
      monitoring: prometheus
```
- Are the label selectors correct for the ServiceMonitor? Try selecting the services and pods by the labels specified in the ServiceMonitor configuration.
`kubectl get servicemonitor -l release=prometheus -A`
- Do the default ServiceMonitors have any labels you haven't added to the custom one? `kubectl get servicemonitor prometheus-kube-prometheus-apiserver -o yaml -n monitoring`
- Check the Prometheus configuration in the UI: `kubectl port-forward svc/prometheus-kube-prometheus-prometheus 9091:9090 -n monitoring`. A scrape configuration with the job name should have been generated; you can also query the targets API as sketched below this list.
- Reload the Prometheus configuration with `curl -X POST http://localhost:9091/-/reload`.
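To see which targets Prometheus actually discovered, the targets API can be queried through the same port-forward (assumes `jq` is installed):

```bash
curl -s http://localhost:9091/api/v1/targets \
  | jq -r '.data.activeTargets[] | "\(.labels.job) -> \(.health)"'
```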
---
## Helm
---
### Using Helm in a CI/CD pipeline
Sometimes you want to upgrade an existing release and other times you want to install a new release. This can be achieved by setting the `--install` flag on the `upgrade` command. The `--install` flag will install the release if it does not exist. The `--atomic` flag will roll back the release if after 3 minutes (`--wait --timeout 3m0s`) the release is not successful. If there’s no revision to revert to, the chart deployment will be deleted.
```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm upgrade --install --wait --timeout 3m0s --atomic -f values.yaml prometheus prometheus-community/kube-prometheus-stack
```
---
## GitLab
---
### GitLab CI/CD
#### Kubectl `apply` Template:
```yaml
.kubectl_deploy_template: &kubectl_template
  image: google/cloud-sdk
  before_script:
    - kubectl config set-cluster k8s --server="$K8S_SERVER_URL"
    - kubectl config set clusters.k8s.certificate-authority-data $CERTIFICATE_AUTHORITY_DATA
    - kubectl config set-credentials gitlab --token="$K8S_USER_TOKEN"
    - kubectl config set-context default --cluster=k8s --user=gitlab
    - kubectl config use-context default
    - kubectl create namespace $NAMESPACE --dry-run=client -o yaml | kubectl apply -f -
    - kubectl config set-context default --namespace=$NAMESPACE
  script:
    - ls -lah $DEPLOYMENT_FOLDER
    - kubectl apply --recursive -f $DEPLOYMENT_FOLDER
```
Template usage:
```yaml
deploy_application:
  <<: *kubectl_template
  stage: deploy_application
  variables:
    NAMESPACE: my-namespace
    DEPLOYMENT_FOLDER: dev/application/
  rules:
    - if: "$CI_COMMIT_BRANCH == 'dev'"
```
#### Helm `upgrade` Template:
```yaml
.helm_install_template: &helm_install_template
  image: dtzar/helm-kubectl
  before_script:
    - echo "Before script"
  script:
    - kubectl config set-cluster k8s --server="$K8S_SERVER_URL"
    - kubectl config set clusters.k8s.certificate-authority-data $CERTIFICATE_AUTHORITY_DATA
    - kubectl config set-credentials gitlab --token="$K8S_USER_TOKEN"
    - kubectl config set-context default --cluster=k8s --user=gitlab
    - kubectl config use-context default
    - kubectl create namespace $NAMESPACE --dry-run=client -o yaml | kubectl apply -f -
    - kubectl config set-context default --namespace=$NAMESPACE
    - helm repo add $CHART_NAME $HELM_REPO_URL
    - helm upgrade --install --wait --timeout 3m0s --atomic $HELM_ARGS $RELEASE_NAME $HELM_REPO
    - |
      for deployment in $(kubectl get deployment --namespace $NAMESPACE -o jsonpath='{.items[*].metadata.name}'); do
        kubectl rollout status deployment/$deployment --namespace $NAMESPACE
      done
```
Template usage:
```yaml
deploy_prometheus_base:
  <<: *helm_install_template
  stage: deploy_prometheus_base
  variables:
    NAMESPACE: monitoring
    RELEASE_NAME: prometheus
    HELM_REPO: prometheus-community/kube-prometheus-stack
    HELM_REPO_URL: https://prometheus-community.github.io/helm-charts
    CHART_NAME: prometheus-community
    HELM_ARGS: "-f values.yaml"
  rules:
    - if: "$CI_COMMIT_BRANCH == 'dev'"
```