Kubernetes @ ICS

Some of the information below may be inaccessible to anyone outside of helpdesk staff. Please send mail to helpdesk@ics.uci.edu for more information.

Kubernetes Management Resources (support group only)

Tutorial

Internal Documentation

Quick Start

To gain access to the ICS Kubernetes Cluster, run the following to obtain credentials:

mkdir -p ~/.kube
curl -u $USER https://cloud.ics.uci.edu/auth/kubeconfig > ~/.kube/config
chmod 600 ~/.kube/config

Enter your ICS Account Password when prompted, and the required configuration file will be saved as ~/.kube/config. To create your first Pod, run

kubectl run --rm -it my-container --image=ubuntu:disco --limits='cpu=200m,memory=512Mi'

After a few seconds, you will get a shell inside a container running Ubuntu 19.04.

Changing Kubernetes clusters. In this case you would change to a cluster called 'local'. Your kubectl config file would need to be preconfigured for your multiple clusters.

kubectl config use-context local
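
To see which contexts your config file defines, and which one is currently active, before switching:

kubectl config get-contexts
kubectl config current-context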

Upgrading Helm

Download the latest version and run

helm init --upgrade
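
After upgrading, it can help to confirm that the client and Tiller (server) versions now match; this is Helm 2 syntax:

helm version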

Common Commands

kubectl

kubectl cluster-info
% kubectl cluster-info
Kubernetes master is running at https://rancher.ics.uci.edu/k8s/clusters/local
CoreDNS is running at https://rancher.ics.uci.edu/k8s/clusters/local/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
kubectl get nodes
% kubectl get nodes
NAME           STATUS   ROLES               AGE   VERSION
kyvernitis-1   Ready    controlplane,etcd   28d   v1.14.1
kyvernitis-2   Ready    controlplane,etcd   28d   v1.14.1
kyvernitis-3   Ready    controlplane,etcd   28d   v1.14.1
kyvernitis-4   Ready    worker              28d   v1.14.1
kyvernitis-5   Ready    worker              28d   v1.14.1
kyvernitis-6   Ready    worker              28d   v1.14.1
kubectl get pods
kubectl -n kube-system get pods
NAME                                    READY   STATUS    RESTARTS   AGE
canal-5n86q                             2/2     Running   0          37d
canal-8sq4l                             2/2     Running   2          37d

Building Containers

Docker containers can be run on the Kubernetes @ ICS cluster.

  1. Pull: Choose and download a container from Docker Hub (a compact sketch of all four steps follows this list)
  2. Modify
  3. Build
  4. Push
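
A compact sketch of the workflow, assuming a hypothetical image named my-tool pushed to the hans project in the ICS registry; the Example: Ubuntu section below walks through a real case:

# 1. Pull a base image from Docker Hub
docker pull ubuntu:18.04
# 2. Modify: write a Dockerfile that adds what you need (see the Example: Ubuntu section)
# 3. Build and tag the image for the ICS registry
docker build -t containers.ics.uci.edu/hans/my-tool .
# 4. Push it to the ICS registry
docker login containers.ics.uci.edu
docker push containers.ics.uci.edu/hans/my-tool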

Harbor

Default SA

For Jupyterlab, it appears necessary to put the imagePullSecrets value into the default SA account:

kubectl -n jhub-ics53-stage  edit sa default
docker build -t containers.ics.uci.edu/jupyter/ics-notebook .          

SA default config:

# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
imagePullSecrets:
- name: harbor-jupyter
kind: ServiceAccount
metadata:
  creationTimestamp: "2019-06-27T20:59:48Z"
  name: default
  namespace: jhub-ics53-stage
  resourceVersion: "53685477"
  selfLink: /api/v1/namespaces/jhub-ics53-stage/serviceaccounts/default
  uid: 7fa3a015-991e-11e9-89c7-78e7d1227a28
secrets:
- name: default-token-dnlw4

Example: Ubuntu

Choose

We have chosen the base Ubuntu 18.04 Docker image.

Dockerfile

We're going to add some utilities to the image, so let's make our own Dockerfile:

FROM ubuntu:18.04
MAINTAINER ICS Computing Support Helpdesk <helpdesk@ics.uci.edu>
RUN apt-get update
RUN apt-get install -y  gcc gdb net-tools  telnet iputils-ping iproute2 ssh tcpdump strace

There is a lot of documentation for Harbor and JupyterLab saying that imagePullSecrets could be put into the Helm release object, but that has been remarkably unsuccessful. We thought we had it working for a day, but then it stopped.

Note this is really bad, because the default SA is created and deleted with the namespace and we haven't set up a YAML manifest for it.
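
A small sketch of reapplying the change non-interactively, so it could at least be scripted whenever the namespace is recreated (this is a suggestion on our part, not something currently in place; the secret name harbor-jupyter is taken from the example above):

kubectl -n jhub-ics53-stage patch serviceaccount default \
  -p '{"imagePullSecrets": [{"name": "harbor-jupyter"}]}'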

Docker Build

docker build -t ubuntu-instruction .

Docker Push

# docker login containers.ics.uci.edu
# docker tag ubuntu-instruction containers.ics.uci.edu/hans/ubuntu-instruction
# docker push containers.ics.uci.edu/hans/ubuntu-instruction

Docker images

# docker images

Backup Strategies

  • Ephemeral components are not backed up
  • Persistent components are backed up
  • Kubernetes is not covered via DR

Ephemeral Components

Definition: Ephemeral: temporary, fleeting, impermanent.

The overwhelming majority of Kubernetes components are designed to be ephemeral. For this reason, the following components would be permanently lost:

  • User Containers or Pods
  • Name Spaces
  • Local storage on a container
  • Rook (Ceph) based persistent volumes and claims.

Persistent Components

  • ETCD Database
  • Harbor, the ICS Container Registry
  • Containers in the ICS Container Registry
  • NetApp-based Persistent Volumes and Persistent Volume Claims
  • Additional kubedb-managed databases

Disaster Recovery

There are currently no disaster recovery plans for the ICS Kubernetes cluster. Running production or time-critical workloads in the ICS Kubernetes cluster is not recommended.

Vulnerable Services/Deployments

kubedb

This deployment has persistent data.

  • namespace: kubedb
  • provided by: kubernetes/kubedb/helmrelease.yml
harbor

This deployment has persistent data.

  • namespace: harbor
  • database: postgres.kubedb.com/postgres 11.1-v1 Running 43d
kube-ics
hackmd

This deployment has persistent data.

  • namespace: hack
  • database: postgres.kubedb.com/postgres
  • provided by:

Backups

kubedb

snap yaml file

Create a snap.yml file:

apiVersion:  kubedb.com/v1alpha1
kind: Snapshot
metadata:
  name: harbor-pgsql-snap
  namespace:  harbor
  labels:
    kubedb.com/kind: Postgres
spec:
  databaseName: postgres
  local:
    mountPath: /repo
    persistentVolumeClaim:
      claimName: snap-harbor-pvc
Running the snap yaml file
kubectl apply -f snap.yml
Backup

Running the snap yaml file will create a directory named archived-<dbase>-kubedb-<snapshotName>-pvc-<alpha-numeric-code> in cybertron's vol11 ics_kyvernitis/k8s_pv directory. Inside this directory will be a dumpfile.sql.

Snaps

Cybertron snapshots are taken of volume 11 on cybertron, so we will have several copies of the snapshot.

Scheduling

Use a cron expression to schedule backups regularly. Make sure your PVC already exists. Use `kubectl get events` to see errors.
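
A minimal sketch of a PVC that could back the snapshot target; the name matches the claimName used in the schedule example below, while the size and the commented-out storage class are assumptions:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: harbor-snap-pvc
  namespace: harbor
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi                          # assumed size; size it for your dump files
  # storageClassName: <netapp-backed-class>  # assumed; pick a NetApp-backed class so the backup itself persists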

Schedule Backup

Run: `kubectl -n harbor edit pg hans-pg `

spec:
  backupSchedule:
    cronExpression: "@every 1d"
    databaseName: postgres
    local:
      mountPath: /repo
      persistentVolumeClaim:
        claimName: harbor-snap-pvc 
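
To confirm that scheduled snapshots are actually being created, query the Snapshot CRD by its full resource name (which works regardless of any short names; namespace taken from the example above):

kubectl -n harbor get snapshots.kubedb.com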

Restores

DR

Tips and Tricks

Cheatsheet
All namespaces
kubectl get namespace
Events

Because the Kubernetes designers decided to jettison proper log reporting as a conduit to rapid product delivery, not all events are reported in a sensible location. Much like systemd, actually. Anyhow, see what you're missing:

kubectl get events
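
A slightly more useful variant, sorted by creation time and covering every namespace:

kubectl get events --all-namespaces --sort-by=.metadata.creationTimestamp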
Logs

You can return logs from a pod by running the 'log' command on a pod name (oddly, for once declaring the object type is unnecessary). This was instrumental in solving DTR 71791.

ex.

kubectl -n jhub-ics53-stage   log hub-9747d8b7-2xsjx
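
The plural 'logs' form also works, and two flags often help when chasing a problem (pod name reused from the example above):

kubectl -n jhub-ics53-stage logs -f hub-9747d8b7-2xsjx          # follow the log as it grows
kubectl -n jhub-ics53-stage logs --previous hub-9747d8b7-2xsjx  # logs from the previous (crashed) container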
Cordoning a Worker Node

If you want to run maintenance on a node, use the cordon and drain commands:

kubectl cordon kyvernitis-5
kubectl drain kyvernitis-5 --ignore-daemonsets
[2:07:30 hans@medusa] kubectl get node
NAME           STATUS                     ROLES               AGE   VERSION
kyvernitis-1   Ready                      controlplane,etcd   81d   v1.14.1
kyvernitis-2   Ready                      controlplane,etcd   81d   v1.14.1
kyvernitis-3   Ready                      controlplane,etcd   81d   v1.14.1
kyvernitis-4   Ready                      worker              81d   v1.14.1
kyvernitis-5   Ready,SchedulingDisabled   worker              81d   v1.14.1
kyvernitis-6   Ready                      worker              81d   v1.14.1
kubectl uncordon kyvernitis-5
Docker Container Secrets

The following directions should work:

* Kubernetes Secrets

…but they don't. Haven't discovered why yet. And you shouldn't put your password into a secret anyhow.

Use a robot account in Harbor:

kubectl create secret docker-registry NAME_HERE \
  --docker-server=containers.ics.uci.edu \
  --docker-username="ics53Jupyter" \
  --docker-password="token" \
  --docker-email="k8s@hub.ics.uci.edu" \
  -n NAMESPACE_HERE \
  --dry-run -o yaml

To save Docker credentials that allow logging in to containers.ics.uci.edu:

kubectl -n jhub-ics53-stage create secret generic regcred --from-file=.dockerconfigjson=/home/luser/.docker/config.json --type=kubernetes.io/dockerconfigjson
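
Once the secret exists, a pod can reference it explicitly. A minimal sketch (the pod name is hypothetical; the image is the one built above):

apiVersion: v1
kind: Pod
metadata:
  name: private-image-test
  namespace: jhub-ics53-stage
spec:
  imagePullSecrets:
  - name: regcred
  containers:
  - name: test
    image: containers.ics.uci.edu/hans/ubuntu-instruction
    command: ["sleep", "3600"]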

Jupyter Hub

  • Image: zhaofengli/ics-k8s-hub:2019042201

Building

This section describes how to build a new ics-k8s-hub image.

Base Image

Choose a base image from the Jupyter Docker Stacks.
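
A minimal sketch of extending one of the Jupyter Docker Stacks images; the tag and the extra package are assumptions for illustration:

FROM jupyter/scipy-notebook:latest
# add course-specific Python packages on top of the stack image
RUN pip install --no-cache-dir requests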

Troubleshooting

How to See Install Options for Installed Charts

A

helm get values metalb
helm get values jhub
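
To see the computed values, including chart defaults, rather than only the user-supplied overrides (Helm 2 syntax):

helm get values --all jhub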

Check your Chrome Console for Errors

Harbor Stops Accepting Logins

Q Found that the https://containers.ics.uci.edu harbor server is not accepting logins.

A

Harbor APIs are all returning 500, checking core logs now

JupyterLab errors will often be visible in your Chrome console.

Try a Different Browser

Crazy world, continuous integration: both browsers and Kubernetes are updating constantly. If you run into a problem you can't explain with your Chrome console, remember to try a different browser.

Pending_Upgrade

Q This happened when bringing up a jupyter staging environment. Repeated attempts to bring up the helm release saw no changes.

A Delete any configmaps that have the same name as your project from kube-system.

helm history jhub-stage
REVISION	UPDATED                 	STATUS         	CHART           	DESCRIPTION      
1       	Tue Aug 27 14:58:21 2019	DEPLOYED       	jupyterhub-0.8.2	Install complete 
2       	Wed Nov 27 11:49:30 2019	PENDING_UPGRADE	jupyterhub-0.8.2	Preparing upgrade
3       	Wed Nov 27 11:58:08 2019	PENDING_UPGRADE	jupyterhub-0.8.2	Preparing upgrade

Get the list of relevant config maps:

kubectl -n kube-system get configmaps | grep jhub-stage
jhub-stage.v1                        1      91d
jhub-stage.v2                        1      41m
jhub-stage.v3                        1      32m

Delete the relevant config maps:

kubectl -n kube-system delete configmap jhub-stage.v3 jhub-stage.v2 jhub-stage.v1
configmap "jhub-stage.v3" deleted
configmap "jhub-stage.v2" deleted
configmap "jhub-stage.v1" deleted

Your helm install is now gone:

helm history jhub-stage
Error: release: "jhub-stage" not found

Release your helm, again:

helm upgrade  --install jhub-stage jupyterhub/jupyterhub  --namespace jhub-stage   --version=0.8.2  --values config.yml

Error:

Error: release jhub-stage failed: poddisruptionbudgets.policy "hub" already exists

Alright, at this point things went a little off the rails. I wound up wiping out the entire jhub-stage namespace and needed to start from scratch. Furthermore, `helm upgrade --install` didn't seem to do what it used to do (see the Upgrading Helm section above). In the meantime:

/usr/local/bin/helm install jupyterhub/jupyterhub  --namespace jhub-stage   --version=0.8.2  --values config.yml

Error: timed out waiting for the condition

This happened when recreating the jhub-stage helm release

/usr/local/bin/helm install jupyterhub/jupyterhub  --namespace jhub-stage   --version=0.8.2  --values config.yml

Ask tiller what happened:

kubectl -n kube-system log tiller-deploy-54979dfb45-qkvjj  
...
[tiller] 2019/11/27 21:29:40 warning: Release aged-anteater pre-install jupyterhub/templates/image-puller/job.yaml could not complete: timed out waiting for the condition
....
kubectl -n jhub-stage -o wide  get pods

It can take a long time for the hook-image-puller to pull down your image. Let those guys finish and then run the helm install command again.
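
If the pulls are merely slow, raising the Helm timeout may also avoid the failure; Helm 2's --timeout is in seconds, and 900 here is an arbitrary choice:

/usr/local/bin/helm install jupyterhub/jupyterhub --namespace jhub-stage --version=0.8.2 --values config.yml --timeout 900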

Error: a release named jhub-stage already exists

A

Run: helm ls --all jhub-stage; to check the status of the release
Or run: helm del --purge jhub-stage; to delete it.

VSCode Can't Open Directory

Q Every time I click on a directory or change a directory in the file explorer, the explorer crashes.

A In my case, there was a link in my home directory that pointed to a non-existent file. The presence of this broken symlink crashed the file explorer. I deleted the symlink and the file explorer worked normally. – hans 06/01/2019

VS Code Git complains about too many changes to track

You should see if there are any directories you can add to .gitignore. My repo directory contained:

.vagrant
vagrant
vendor
runtime
web/assets

None of which I was going to add to git.
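
A quick sketch of writing those entries into .gitignore from the shell (paths copied from the list above):

cat >> .gitignore <<'EOF'
.vagrant
vagrant
vendor
runtime
web/assets
EOF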

unable to validate against any pod security policy

Q You receive the following message:

       Warning   FailedCreate   daemonset/test                    Error creating: pods "test-" is forbidden: unable to validate against any pod security policy: []

Commonly this message was received when running a deployment in a namespace.

A

See the docs on pod security policies
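
As a rough sketch, granting a namespace's service accounts the right to use an existing policy is done with RBAC; the ClusterRole and namespace names below are hypothetical, and the ClusterRole is assumed to grant the 'use' verb on the PSP:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: psp-restricted
  namespace: my-namespace                    # hypothetical namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: psp:restricted                       # hypothetical ClusterRole that allows 'use' of the PSP
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:serviceaccounts:my-namespace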

Pods/PVC stuck "Terminating"

A See if there is a matching PVC which may be preventing the pod from deleting:

# kubectl -n jhub get pvc
NAME         STATUS        VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS       AGE
claim-hans   Terminating   pvc-2dcdf159-c089-11e9-ba6d-525400aa5c0c   10Gi       RWO            cyb-inst-scratch   4d3h
# kubectl -n jhub patch pvc claim-hans -p '{"metadata":{"finalizers":null}}'
persistentvolumeclaim/claim-hans patched

A Brute force: rebooting the node that the pod was running on appears to have cleared it. I believed simply restarting Docker would have done the trick, but alas, it did not.
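
A less drastic brute-force option worth trying before a reboot is a forced delete of the stuck pod (the pod name here is hypothetical):

kubectl -n jhub delete pod jupyter-username --grace-period=0 --force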

404 Default Backend

503: Service Unavailable. Your server appears to be down. Try restarting it from the hub.

A The user likely ran out of space in their home directory. In this example, the user is in ics53. Use the following commands to get to the relevant error message:

kubectl get namespaces
kubectl -n jhub-ics53 get pods
kubectl -n jhub-ics53 describe pod jupyter-username
kubectl -n jhub-ics53 logs jupyter-username

will show:
"Disk quota exceeded"

The above commands may need to be run in conjunction with starting and accessing the user's server from https://ics53.hub.ics.uci.edu/hub/admin.

Another reason for the error is running out of inodes. Run the following command on the user's home dir:

/home/support/bin/inodecount

The ugrad limit is currently 75k inodes.
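
To see what is consuming the space, a quick pass over the user's home directory from a shell on the file server or inside the user's pod can help (the username is a placeholder):

du -xsh /home/username/* 2>/dev/null | sort -rh | head -20    # largest directories by size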

Kubelet has disk pressure

This is caused when the server runs out of disk space.

A. Add disk space (see Growing Logical Volumes).

Helpdesk Ticket 73309

A. Prune docker images

See the Docker page for info.

Also see Kubernetes garbage collection.
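
A minimal sketch of pruning on an affected node, assuming Docker is the container runtime as elsewhere on this page; the one-week cutoff is an arbitrary choice:

docker image prune -a --filter "until=168h"    # remove unused images older than 7 days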

No Kind Job

Error: UPGRADE FAILED: no kind "Job" is registered for version "batch/v1" in scheme "k8s.io/kubernetes/pkg/api/legacyscheme/scheme.go:30"

A https://github.com/helm/helm/issues/6894

At this time (2019/12/19), the only solution is to upgrade to Helm 3 or downgrade to Helm 2.15. This is what is wrong with this confederation of software: everybody is breaking one piece or another on a weekly if not daily basis.
