Cluster Administration - Kubernetes
Cluster Administration
Lower-level detail relevant to creating or administering a Kubernetes cluster.
1: Certificates
2: Managing Resources
3: Cluster Networking
4: Logging Architecture
5: Metrics For Kubernetes System Components
6: System Logs
7: Traces For Kubernetes System Components
8: Proxies in Kubernetes
9: API Priority and Fairness
10: Installing Addons
Planning a cluster
See the guides in Setup for examples of how to plan, set up, and configure Kubernetes
clusters. The solutions listed in this article are called distros.
Note: Not all distros are actively maintained. Choose distros which have been tested with
a recent version of Kubernetes.
Do you want to try out Kubernetes on your computer, or do you want to build a high-
availability, multi-node cluster? Choose distros best suited for your needs.
Will you be using a hosted Kubernetes cluster, such as Google Kubernetes Engine, or
hosting your own cluster?
Will your cluster be on-premises, or in the cloud (IaaS)? Kubernetes does not directly
support hybrid clusters. Instead, you can set up multiple clusters.
If you are configuring Kubernetes on-premises, consider which networking model fits
best.
Will you be running Kubernetes on "bare metal" hardware or on virtual machines
(VMs)?
Do you want to run a cluster, or do you expect to do active development of
Kubernetes project code? If the latter, choose an actively-developed distro. Some
distros only use binary releases, but offer a greater variety of choices.
Familiarize yourself with the components needed to run a cluster.
Managing a cluster
Learn how to manage nodes.
Learn how to set up and manage the resource quota for shared clusters.
Securing a cluster
Generate Certificates describes the steps to generate certificates using different tool
chains.
https://kubernetes.io/docs/concepts/cluster-administration/_print/ 1/48
6/6/23, 4:02 PM Cluster Administration | Kubernetes
Controlling Access to the Kubernetes API describes how Kubernetes implements access
control for its own API.
Authorization is separate from authentication, and controls how HTTP calls are handled.
Logging and Monitoring Cluster Activity explains how logging in Kubernetes works and
how to implement it.
1 - Certificates
To learn how to generate certificates for your cluster, see Certificates.
2 - Managing Resources
You've deployed your application and exposed it via a service. Now what? Kubernetes
provides a number of tools to help you manage your application deployment, including
scaling and updating. Among the features that we will discuss in more depth are configuration
files and labels.
application/nginx-app.yaml
apiVersion: v1
kind: Service
metadata:
  name: my-nginx-svc
  labels:
    app: nginx
spec:
  type: LoadBalancer
  ports:
  - port: 80
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
service/my-nginx-svc created
deployment.apps/my-nginx created
The resources will be created in the order they appear in the file. Therefore, it's best to specify
the service first, since that will ensure the scheduler can spread the pods associated with the
service as they are created by the controller(s), such as Deployment.
A URL can also be specified as a configuration source, which is handy for deploying directly
from configuration files checked into GitHub:
deployment.apps/my-nginx created
In the case of two resources, you can specify both resources on the command line using the
resource/name syntax:
For larger numbers of resources, you'll find it easier to filter resources by their labels
with a selector (label query), specified using -l or --selector:
Because kubectl outputs resource names in the same syntax it accepts, you can chain
operations using $() or xargs:
If you happen to organize your resources across several subdirectories within a particular
directory, you can recursively perform the operations on the subdirectories also, by specifying
--recursive or -R alongside the --filename,-f flag.
For instance, assume there is a directory project/k8s/development that holds all of the
manifests needed for the development environment, organized by resource type:
project/k8s/development
├── configmap
│   └── my-configmap.yaml
├── deployment
│   └── my-deployment.yaml
└── pvc
    └── my-pvc.yaml
error: you must provide one or more resources by argument or filename (.json|.yam
Instead, specify the --recursive or -R flag with the --filename,-f flag as such:
kubectl apply -f project/k8s/development --recursive
configmap/my-config created
deployment.apps/my-deployment created
persistentvolumeclaim/my-pvc created
The --recursive flag works with any operation that accepts the --filename,-f flag such as:
kubectl {create,get,delete,describe,rollout} etc.
The --recursive flag also works when multiple -f arguments are provided:
namespace/development created
namespace/staging created
configmap/my-config created
deployment.apps/my-deployment created
persistentvolumeclaim/my-pvc created
If you're interested in learning more about kubectl, go ahead and read Command line tool
(kubectl).
For instance, different applications would use different values for the app label, but a multi-
tier application, such as the guestbook example, would additionally need to distinguish each
tier. The frontend could carry the following labels:
labels:
  app: guestbook
  tier: frontend
while the Redis master and slave would have different tier labels, and perhaps even an
additional role label:
labels:
  app: guestbook
  tier: backend
  role: master
and
labels:
  app: guestbook
  tier: backend
  role: slave
The labels allow us to slice and dice our resources along any dimension specified by a label.
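To illustrate what selecting along a label dimension means, here is a minimal Python sketch of equality-based matching. The resource names and the in-memory list are hypothetical stand-ins for the guestbook components above; this models only the matching rule and is not how kubectl or the API server is implemented.

```python
# Hypothetical in-memory stand-ins for the guestbook resources described above.
RESOURCES = [
    {"name": "frontend", "labels": {"app": "guestbook", "tier": "frontend"}},
    {"name": "redis-master", "labels": {"app": "guestbook", "tier": "backend", "role": "master"}},
    {"name": "redis-slave", "labels": {"app": "guestbook", "tier": "backend", "role": "slave"}},
]


def select(resources, selector):
    """Return names of resources whose labels contain every key=value pair in selector."""
    return [
        resource["name"]
        for resource in resources
        if all(resource["labels"].get(key) == value for key, value in selector.items())
    ]


if __name__ == "__main__":
    print(select(RESOURCES, {"app": "guestbook"}))
    print(select(RESOURCES, {"tier": "backend"}))
    print(select(RESOURCES, {"tier": "backend", "role": "slave"}))
```

Each extra key in the selector narrows the result, which is why a small set of orthogonal labels (app, tier, role) is enough to address any slice of the application.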
Canary deployments
Another scenario where multiple labels are needed is to distinguish deployments of different
releases or configurations of the same component. It is common practice to deploy a canary
of a new application release (specified via image tag in the pod template) side by side with the
previous release so that the new release can receive live production traffic before fully rolling
it out.
For instance, you can use a track label to differentiate different releases.
The primary, stable release would have a track label with the value stable:
name: frontend
replicas: 3
...
labels:
  app: guestbook
  tier: frontend
  track: stable
...
image: gb-frontend:v3
and then you can create a new release of the guestbook frontend that carries the track label
with a different value (i.e. canary), so that the two sets of pods do not overlap:
name: frontend-canary
replicas: 1
...
labels:
  app: guestbook
  tier: frontend
  track: canary
...
image: gb-frontend:v4
The frontend service would span both sets of replicas by selecting the common subset of their
labels (i.e. omitting the track label), so that the traffic will be redirected to both applications:
selector:
  app: guestbook
  tier: frontend
You can tweak the number of replicas of the stable and canary releases to determine the ratio
of each release that will receive live production traffic (in this case, 3:1). Once you're confident,
you can update the stable track to the new application release and remove the canary one.
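The 3:1 ratio above follows directly from the replica counts. As a rough Python sketch, assuming the Service spreads requests evenly across all selected pods (an idealization; real distribution depends on the proxy mode and connection reuse), the expected share per track can be computed like this. The function name is hypothetical.

```python
from fractions import Fraction


def traffic_share(replicas_by_track):
    """Map each track to its expected share of requests, assuming traffic is
    spread evenly across every pod the Service selects."""
    total = sum(replicas_by_track.values())
    if total == 0:
        raise ValueError("no replicas selected by the Service")
    return {track: Fraction(count, total) for track, count in replicas_by_track.items()}


if __name__ == "__main__":
    shares = traffic_share({"stable": 3, "canary": 1})
    for track, share in shares.items():
        print(f"{track}: {float(share):.0%}")
```

With 3 stable replicas and 1 canary replica, roughly a quarter of requests reach the canary under this assumption.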
Updating labels
Sometimes existing pods and other resources need to be relabeled before creating new
resources. This can be done with kubectl label. For example, if you want to label all your
nginx pods as frontend tier, run:
kubectl label pods -l app=nginx tier=fe
pod/my-nginx-2035384211-j5fhi labeled
pod/my-nginx-2035384211-u2c7e labeled
pod/my-nginx-2035384211-u3t6x labeled
This first filters all pods with the label "app=nginx", and then labels them with "tier=fe". To
see the pods you labeled, run:
kubectl get pods -l app=nginx -L tier
This outputs all "app=nginx" pods, with an additional label column showing each pod's tier
(specified with -L or --label-columns).
Updating annotations
Sometimes you want to attach annotations to resources. Annotations are arbitrary
non-identifying metadata for retrieval by API clients such as tools and libraries. This can be
done with kubectl annotate. For example:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    description: my frontend running nginx
...
deployment.apps/my-nginx scaled
To have the system automatically choose the number of nginx replicas as needed, ranging
from 1 to 3, do:
kubectl autoscale deployment/my-nginx --min=1 --max=3
horizontalpodautoscaler.autoscaling/my-nginx autoscaled
Now your nginx replicas will be scaled up and down as needed, automatically.
For more information, see the kubectl scale, kubectl autoscale, and horizontal pod
autoscaler documents.
kubectl apply
It is suggested to maintain a set of configuration files in source control (see configuration as
code), so that they can be maintained and versioned along with the code for the resources
they configure. Then, you can use kubectl apply to push your configuration changes to the
cluster.
This command will compare the version of the configuration that you're pushing with the
previous version and apply the changes you've made, without overwriting any automated
changes to properties you haven't specified.
deployment.apps/my-nginx configured
Note that kubectl apply attaches an annotation to the resource in order to determine the
changes to the configuration since the previous invocation. When it's invoked, kubectl apply
does a three-way diff between the previous configuration, the provided input and the current
configuration of the resource, in order to determine how to modify the resource.
Currently, resources are created without this annotation, so the first invocation of kubectl
apply will fall back to a two-way diff between the provided input and the current
configuration of the resource. During this first invocation, it cannot detect the deletion of
properties set when the resource was created. For this reason, it will not remove them.
All subsequent calls to kubectl apply, and other commands that modify the configuration,
such as kubectl replace and kubectl edit, will update the annotation, allowing
subsequent calls to kubectl apply to detect and perform deletions using a three-way diff.
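The difference between the three-way and two-way cases can be sketched in Python. This is a deliberately simplified model over flat dictionaries, with hypothetical field values; the real command works on nested objects and merge strategies that are not modeled here.

```python
def three_way_apply(last_applied, new_config, live):
    """Simplified model of kubectl apply over flat field dictionaries.

    last_applied: fields from the previous apply (None if the annotation is missing)
    new_config:   fields in the configuration being pushed now
    live:         fields currently set on the resource in the cluster
    """
    result = dict(live)
    if last_applied is not None:
        # Fields the user set previously but dropped from the new config are deleted.
        for field in last_applied:
            if field not in new_config:
                result.pop(field, None)
    # Fields present in the new config are added or updated.
    result.update(new_config)
    return result


if __name__ == "__main__":
    live = {"image": "nginx:1.14.2", "replicas": 5, "minReadySeconds": 10}
    previous = {"image": "nginx:1.14.2", "minReadySeconds": 10}
    pushed = {"image": "nginx:1.16.1"}

    # Three-way: minReadySeconds is removed, the automated replicas value is kept.
    print(three_way_apply(previous, pushed, live))
    # Two-way fallback (no annotation): the dropped field cannot be detected.
    print(three_way_apply(None, pushed, live))
```

In the model, a field such as replicas that was never part of the pushed configuration survives both calls, which mirrors the statement that automated changes to unspecified properties are not overwritten.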
kubectl edit
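Alternatively, you may also update resources with kubectl edit:
kubectl edit deployment/my-nginx
This is equivalent to first getting the resource, editing it in a text editor, and then applying
the resource with the updated version:
kubectl get deployment my-nginx -o yaml > /tmp/nginx.yaml
vi /tmp/nginx.yaml
# do some edit, and then save the file
kubectl apply -f /tmp/nginx.yaml
rm /tmp/nginx.yaml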
This allows you to make more significant changes more easily. Note that you can specify the
editor with your EDITOR or KUBE_EDITOR environment variables.
kubectl patch
You can use kubectl patch to update API objects in place. This command supports JSON
patch, JSON merge patch, and strategic merge patch. See Update API Objects in Place Using
kubectl patch and kubectl patch.
Disruptive updates
In some cases, you may need to update resource fields that cannot be updated once
initialized, or you may want to make a recursive change immediately, such as to fix broken
pods created by a Deployment. To change such fields, use replace --force, which deletes
and re-creates the resource. In this case, you can modify your original configuration file:
deployment.apps/my-nginx deleted
deployment.apps/my-nginx replaced
We'll guide you through how to create and update applications with Deployments.
deployment.apps/my-nginx created
with 3 replicas (so the old and new revisions can coexist):
deployment.apps/my-nginx scaled
That's it! The Deployment will declaratively update the deployed nginx application
progressively behind the scenes. It ensures that only a certain number of old replicas may be
down while they are being updated, and only a certain number of new replicas may be
created above the desired number of pods. To learn more about it, visit the Deployment
page.
What's next
Learn about how to use kubectl for application introspection and debugging.
See Configuration Best Practices and Tips.
3 - Cluster Networking
Networking is a central part of Kubernetes, but it can be challenging to understand exactly
how it is expected to work. There are 4 distinct networking problems to address:
Kubernetes is all about sharing machines between applications. Typically, sharing machines
requires ensuring that two applications do not try to use the same ports. Coordinating ports
across multiple developers is very difficult to do at scale and exposes users to cluster-level
issues outside of their control.
Dynamic port allocation brings a lot of complications to the system - every application has to
take ports as flags, the API servers have to know how to insert dynamic port numbers into
configuration blocks, services have to know how to find each other, etc. Rather than deal with
this, Kubernetes takes a different approach.
See this page for a non-exhaustive list of networking addons supported by Kubernetes.
What's next
The early design of the networking model and its rationale, and some future plans are
described in more detail in the networking design document.
4 - Logging Architecture
Application logs can help you understand what is happening inside your application. The logs
are particularly useful for debugging problems and monitoring cluster activity. Most modern
applications have some kind of logging mechanism. Likewise, container engines are designed
to support logging. The easiest and most adopted logging method for containerized
applications is writing to standard output and standard error streams.
However, the native functionality provided by a container engine or runtime is usually not
enough for a complete logging solution.
For example, you may want to access your application's logs if a container crashes, a pod gets
evicted, or a node dies.
In a cluster, logs should have a separate storage and lifecycle independent of nodes, pods, or
containers. This concept is called cluster-level logging.
Cluster-level logging architectures require a separate backend to store, analyze, and query
logs. Kubernetes does not provide a native storage solution for log data. Instead, there are
many logging solutions that integrate with Kubernetes. The following sections describe how to
handle and store logs on nodes.
This example uses a manifest for a Pod with a container that writes text to the standard
output stream, once per second.
debug/counter-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: counter
spec:
  containers:
  - name: count
    image: busybox:1.28
    args: [/bin/sh, -c,
            'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep 1; done']
pod/counter created
You can use kubectl logs --previous to retrieve logs from a previous instantiation of a
container. If your pod has multiple containers, specify which container's logs you want to
access by appending a container name to the command, with a -c flag, like so:
kubectl logs counter -c count
By default, if a container restarts, the kubelet keeps one terminated container with its logs. If
a pod is evicted from the node, all corresponding containers are also evicted, along with their
logs.
The kubelet makes logs available to clients via a special feature of the Kubernetes API. The
usual way to access this is by running kubectl logs.
Log rotation
FEATURE STATE: Kubernetes v1.21 [stable]
If you configure rotation, the kubelet is responsible for rotating container logs and managing
the logging directory structure. The kubelet sends this information to the container runtime
(using CRI), and the runtime writes the container logs to the given location.
When you run kubectl logs as in the basic logging example, the kubelet on the node
handles the request and reads directly from the log file. The kubelet returns the content of
the log file.
Note:
Only the contents of the latest log file are available through kubectl logs.
For example, if a Pod writes 40 MiB of logs and the kubelet rotates logs after 10 MiB,
running kubectl logs returns at most 10 MiB of data.
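The arithmetic in this note can be checked with a toy Python model. It assumes writes arrive in fixed-size chunks and that a new file starts as soon as the current one has reached the size limit; this is an illustration of why only the newest file's contents are visible, not a description of how the kubelet actually schedules rotation. The function name and chunk size are hypothetical.

```python
MIB = 1024 * 1024


def latest_file_size(total_written, max_size, chunk=MIB):
    """Toy model: append chunk-sized writes and start a new file whenever the
    current one has reached max_size. Returns the size of the newest file."""
    current = 0
    written = 0
    while written < total_written:
        if current >= max_size:
            current = 0  # rotate: older data now lives in a rotated file
        step = min(chunk, total_written - written)
        current += step
        written += step
    return current


if __name__ == "__main__":
    size = latest_file_size(40 * MIB, 10 * MIB)
    print(f"newest file holds {size // MIB} MiB of the 40 MiB written")
```

Under these assumptions the newest file never exceeds the rotation limit, so a reader of that file sees at most 10 MiB regardless of how much the Pod has written in total.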
The kubelet and container runtime do not run in containers. The kubelet runs your
containers (grouped together in pods).
The Kubernetes scheduler, controller manager, and API server run within pods (usually
static Pods). The etcd component runs in the control plane, and most commonly also as
a static pod. If your cluster uses kube-proxy, you typically run this as a DaemonSet.
Log locations
The way that the kubelet and container runtime write logs depends on the operating system
that the node uses:
On Linux nodes that use systemd, the kubelet and container runtime write to journald by
default. You use journalctl to read the systemd journal; for example: journalctl -u kubelet.
If systemd is not present, the kubelet and container runtime write to .log files in the
/var/log directory. If you want to have logs written elsewhere, you can indirectly run
the kubelet via a helper tool, kube-log-runner, and use that tool to redirect kubelet logs
to a directory that you choose.
You can also set a logging directory using the deprecated kubelet command line
argument --log-dir. However, the kubelet always directs your container runtime to
write logs into directories within /var/log/pods.
For Kubernetes cluster components that run in pods, these write to files inside the /var/log
directory, bypassing the default logging mechanism (the components do not write to the
systemd journal). You can use Kubernetes' storage mechanisms to map persistent storage
into the container that runs the component.
For details about etcd and its logs, view the etcd documentation. Again, you can use
Kubernetes' storage mechanisms to map persistent storage into the container that runs the
component.
Note:
If you deploy Kubernetes cluster components (such as the scheduler) to log to a volume
shared from the parent node, you need to consider and ensure that those logs are
rotated. Kubernetes does not manage that log rotation.
Your operating system may automatically implement some log rotation - for example, if
you share the directory /var/log into a static Pod for a component, node-level log
rotation treats a file in that directory the same as a file written by any component outside
Kubernetes.
Some deploy tools account for that log rotation and automate it; others leave this as your
responsibility.
You can implement cluster-level logging by including a node-level logging agent on each node.
The logging agent is a dedicated tool that exposes logs or pushes logs to a backend.
Commonly, the logging agent is a container that has access to a directory with log files from
all of the application containers on that node.
Because the logging agent must run on every node, it is recommended to run the agent as a
DaemonSet.
Node-level logging creates only one agent per node and doesn't require any changes to the
applications running on the node.
Containers write to stdout and stderr, but with no agreed format. A node-level agent collects
these logs and forwards them for aggregation.
By having your sidecar containers write to their own stdout and stderr streams, you can
take advantage of the kubelet and the logging agent that already run on each node. The
sidecar containers read logs from a file, a socket, or journald. Each sidecar container prints a
log to its own stdout or stderr stream.
This approach allows you to separate several log streams from different parts of your
application, some of which can lack support for writing to stdout or stderr. The logic
behind redirecting logs is minimal, so it's not a significant overhead. Additionally, because
stdout and stderr are handled by the kubelet, you can use built-in tools like kubectl logs.
For example, a pod runs a single container, and the container writes to two different log files
using two different formats. Here's a manifest for the Pod:
admin/logging/two-files-counter-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: counter
spec:
  containers:
  - name: count
    image: busybox:1.28
    args:
    - /bin/sh
    - -c
    - >
      i=0;
      while true;
      do
        echo "$i: $(date)" >> /var/log/1.log;
        echo "$(date) INFO $i" >> /var/log/2.log;
        i=$((i+1));
        sleep 1;
      done
    volumeMounts:
    - name: varlog
      mountPath: /var/log
  volumes:
  - name: varlog
    emptyDir: {}
It is not recommended to write log entries with different formats to the same log stream, even
if you managed to redirect both components to the stdout stream of the container. Instead,
you can create two sidecar containers. Each sidecar container could tail a particular log file
from a shared volume and then redirect the logs to its own stdout stream.
admin/logging/two-files-counter-pod-streaming-sidecar.yaml
apiVersion: v1
kind: Pod
metadata:
  name: counter
spec:
  containers:
  - name: count
    image: busybox:1.28
    args:
    - /bin/sh
    - -c
    - >
      i=0;
      while true;
      do
        echo "$i: $(date)" >> /var/log/1.log;
        echo "$(date) INFO $i" >> /var/log/2.log;
        i=$((i+1));
        sleep 1;
      done
    volumeMounts:
    - name: varlog
      mountPath: /var/log
  - name: count-log-1
    image: busybox:1.28
    args: [/bin/sh, -c, 'tail -n+1 -F /var/log/1.log']
    volumeMounts:
    - name: varlog
      mountPath: /var/log
  - name: count-log-2
    image: busybox:1.28
    args: [/bin/sh, -c, 'tail -n+1 -F /var/log/2.log']
    volumeMounts:
    - name: varlog
      mountPath: /var/log
  volumes:
  - name: varlog
    emptyDir: {}
Now when you run this pod, you can access each log stream separately by running the
following commands:
kubectl logs counter -c count-log-1
kubectl logs counter -c count-log-2
If you installed a node-level agent in your cluster, that agent picks up those log streams
automatically without any further configuration. If you like, you can configure the agent to
parse log lines depending on the source container.
Even for Pods that only have low CPU and memory usage (order of a couple of millicores for
cpu and order of several megabytes for memory), writing logs to a file and then streaming
them to stdout can double how much storage you need on the node. If you have an
application that writes to a single file, it's recommended to set /dev/stdout as the
destination rather than implement the streaming sidecar container approach.
Sidecar containers can also be used to rotate log files that cannot be rotated by the
application itself. An example of this approach is a small container running logrotate
periodically. However, it's more straightforward to use stdout and stderr directly, and
leave rotation and retention policies to the kubelet.
If the node-level logging agent is not flexible enough for your situation, you can create a
sidecar container with a separate logging agent that you have configured specifically to run
with your application.
Note: Using a logging agent in a sidecar container can lead to significant resource
consumption. Moreover, you won't be able to access those logs using kubectl logs
because they are not controlled by the kubelet.
Here are two example manifests that you can use to implement a sidecar container with a
logging agent. The first manifest contains a ConfigMap to configure fluentd.
admin/logging/fluentd-sidecar-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
data:
  fluentd.conf: |
    <source>
      type tail
      format none
      path /var/log/1.log
      pos_file /var/log/1.log.pos
      tag count.format1
    </source>
    <source>
      type tail
      format none
      path /var/log/2.log
      pos_file /var/log/2.log.pos
      tag count.format2
    </source>
    <match **>
      type google_cloud
    </match>
Note: In the sample configurations, you can replace fluentd with any logging agent,
reading from any source inside an application container.
The second manifest describes a pod that has a sidecar container running fluentd. The pod
mounts a volume where fluentd can pick up its configuration data.
admin/logging/two-files-counter-pod-agent-sidecar.yaml
apiVersion: v1
kind: Pod
metadata:
  name: counter
spec:
  containers:
  - name: count
    image: busybox:1.28
    args:
    - /bin/sh
    - -c
    - >
      i=0;
      while true;
      do
        echo "$i: $(date)" >> /var/log/1.log;
        echo "$(date) INFO $i" >> /var/log/2.log;
        i=$((i+1));
        sleep 1;
      done
    volumeMounts:
    - name: varlog
      mountPath: /var/log
  - name: count-agent
    image: registry.k8s.io/fluentd-gcp:1.30
    env:
    - name: FLUENTD_ARGS
      value: -c /etc/fluentd-config/fluentd.conf
    volumeMounts:
    - name: varlog
      mountPath: /var/log
    - name: config-volume
      mountPath: /etc/fluentd-config
  volumes:
  - name: varlog
    emptyDir: {}
  - name: config-volume
    configMap:
      name: fluentd-config
Cluster-logging that exposes or pushes logs directly from every application is outside the
scope of Kubernetes.
What's next
Read about Kubernetes system logs
Learn about Traces For Kubernetes System Components
Learn how to customise the termination message that Kubernetes records when a Pod
fails
5 - Metrics For Kubernetes System Components
Kubernetes components emit metrics in Prometheus format. This format is structured plain
text, designed so that people and machines can both read it.
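To show what "structured plain text" means in practice, here is a small Python sketch that parses a subset of the Prometheus text format: comment lines starting with # are skipped, and each sample line is split into a metric name, optional labels, and a numeric value. The sample text is made up for illustration and is not real output from a Kubernetes component; the parser ignores timestamps and other parts of the full format.

```python
import re

# metric_name, optional {labels}, then the sample value
SAMPLE_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# label="value" pairs inside the braces (escaped characters are tolerated)
LABEL_PAIR = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

EXAMPLE = """\
# HELP http_requests_total The total number of HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="post",code="200"} 1027
http_requests_total{method="post",code="400"} 3
process_open_fds 12
"""


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples for each sample line."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # not a sample line this sketch understands
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_PAIR.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_metrics(EXAMPLE):
        print(name, labels, value)
```

For production use, a metrics scraper or an existing client library is a better fit than a hand-written parser; the sketch only shows that the format is line-oriented and easy to read.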
Metrics in Kubernetes
In most cases, metrics are available on the /metrics endpoint of the HTTP server. For
components that don't expose the endpoint by default, it can be enabled using the
--bind-address flag.
kube-controller-manager
kube-proxy
kube-apiserver
kube-scheduler
kubelet
In a production environment you may want to configure Prometheus Server or some other
metrics scraper to periodically gather these metrics and make them available in some kind of
time series database.
If your cluster uses RBAC, reading metrics requires authorization via a user, group or
ServiceAccount with a ClusterRole that allows accessing /metrics . For example:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - nonResourceURLs:
      - "/metrics"
    verbs:
      - get
Metric lifecycle
Alpha metric → Stable metric → Deprecated metric → Hidden metric → Deleted metric
Alpha metrics have no stability guarantees. These metrics can be modified or deleted at any
time.
Deprecated metrics are slated for deletion, but are still available for use. These metrics
include an annotation about the version in which they became deprecated.
For example:
Before deprecation
After deprecation
Hidden metrics are no longer published for scraping, but are still available for use. To use a
hidden metric, please refer to the Show hidden metrics section.
The flag show-hidden-metrics-for-version takes a version for which you want to show
metrics deprecated in that release. The version is expressed as x.y, where x is the major
version and y is the minor version. The patch version is not needed, even though a metric can
be deprecated in a patch release, because the metrics deprecation policy runs against the
minor release.
The flag can only take the previous minor version as its value. All metrics hidden in the
previous release will be emitted if admins set the previous version to
show-hidden-metrics-for-version. A version that is too old is not allowed because this
violates the metrics deprecation policy.
In release 1.n+1, the metric is hidden by default, and it can be emitted by setting the
command line flag --show-hidden-metrics-for-version=1.n.
In release 1.n+2, the metric should be removed from the codebase. There is no escape hatch
anymore.
If you're upgrading from release 1.12 to 1.13, but still depend on a metric A deprecated in
1.12, you should set hidden metrics via command line: --show-hidden-metrics-for-version=1.12,
and remember to remove this metric dependency before upgrading to 1.14.
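The rules in this section can be summarized as a small decision function. The Python sketch below is only a model of the text above, using bare minor version numbers (12 for 1.12); the function name and return strings are hypothetical and do not correspond to any Kubernetes API.

```python
def metric_visibility(deprecated_in, running, show_hidden_for=None):
    """Classify a metric as 'emitted', 'hidden', or 'removed'.

    All arguments are minor versions of 1.x releases (12 means 1.12).
    show_hidden_for models the value given to show-hidden-metrics-for-version.
    """
    if show_hidden_for is not None and show_hidden_for != running - 1:
        # The flag only accepts the previous minor version.
        raise ValueError("only the previous minor version is accepted")
    if running <= deprecated_in:
        return "emitted"  # not yet deprecated, or deprecated but still published
    if running == deprecated_in + 1:
        return "emitted" if show_hidden_for == deprecated_in else "hidden"
    return "removed"  # 1.n+2 and later: no escape hatch


if __name__ == "__main__":
    print(metric_visibility(12, 12))                      # release of deprecation
    print(metric_visibility(12, 13))                      # hidden by default
    print(metric_visibility(12, 13, show_hidden_for=12))  # re-enabled by the flag
    print(metric_visibility(12, 14))                      # gone from the codebase
```

This matches the upgrade example: a metric deprecated in 1.12 can still be emitted on 1.13 with the flag set to 1.12, but nothing brings it back on 1.14.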
The responsibility for collecting accelerator metrics now belongs to the vendor rather than the
kubelet. Vendors must provide a container that collects metrics and exposes them to the
metrics service (for example, Prometheus).
Component metrics
kube-controller-manager metrics
Controller manager metrics provide important insight into the performance and health of the
controller manager. These metrics include common Go language runtime metrics such as
go_routine count and controller specific metrics such as etcd request latencies or
Cloudprovider (AWS, GCE, OpenStack) API latencies that can be used to gauge the health of a
cluster.
Starting from Kubernetes 1.7, detailed Cloudprovider metrics are available for storage
operations for GCE, AWS, vSphere and OpenStack. These metrics can be used to monitor
the health of persistent volume operations.
kube-scheduler metrics
FEATURE STATE: Kubernetes v1.21 [beta]
The scheduler exposes optional metrics that report the requested resources and the desired
limits of all running pods. These metrics can be used to build capacity planning dashboards,
assess current or historical scheduling limits, quickly identify workloads that cannot schedule
due to lack of resources, and compare actual usage to the pod's request.
The kube-scheduler identifies the resource requests and limits configured for each Pod; when
either a request or limit is non-zero, the kube-scheduler reports a metrics timeseries. The
time series is labelled by:
namespace
pod name
the node where the pod is scheduled or an empty string if not yet scheduled
priority
the assigned scheduler for that pod
the name of the resource (for example, cpu)
the unit of the resource if known (for example, cores)
Once a pod reaches completion (has a restartPolicy of Never or OnFailure and is in the
Succeeded or Failed pod phase, or has been deleted and all containers have a terminated
state) the series is no longer reported since the scheduler is now free to schedule other pods
to run. The two metrics are called kube_pod_resource_request and
kube_pod_resource_limit .
The metrics are exposed at the HTTP endpoint /metrics/resources and require the same
authorization as the /metrics endpoint on the scheduler. You must use the
--show-hidden-metrics-for-version=1.20 flag to expose these alpha stability metrics.
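As a rough illustration of how these series could feed a capacity dashboard, the sketch below sums kube_pod_resource_request samples per namespace from Prometheus text-format output. The scrape output is hypothetical, and the label keys (namespace, pod, node, priority, scheduler, resource, unit) are assumed from the label list above rather than copied from a real scrape.

```python
import re
from collections import defaultdict

# Hypothetical scrape output; label keys and values are illustrative assumptions.
EXAMPLE_METRICS = """\
# TYPE kube_pod_resource_request gauge
kube_pod_resource_request{namespace="default",pod="web-1",node="node-a",priority="0",scheduler="default-scheduler",resource="cpu",unit="cores"} 0.5
kube_pod_resource_request{namespace="default",pod="web-2",node="",priority="0",scheduler="default-scheduler",resource="cpu",unit="cores"} 0.25
kube_pod_resource_request{namespace="batch",pod="job-1",node="node-b",priority="0",scheduler="default-scheduler",resource="memory",unit="bytes"} 1048576
kube_pod_resource_limit{namespace="default",pod="web-1",node="node-a",priority="0",scheduler="default-scheduler",resource="cpu",unit="cores"} 1
"""

# One sample line: metric name, label set in braces, then the value.
SAMPLE_RE = re.compile(r'^(?P<name>\w+)\{(?P<labels>.*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def total_requests_by_namespace(text, resource="cpu"):
    """Sum kube_pod_resource_request values for one resource, per namespace."""
    totals = defaultdict(float)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None or match.group("name") != "kube_pod_resource_request":
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        if labels.get("resource") == resource:
            totals[labels.get("namespace", "")] += float(match.group("value"))
    return dict(totals)


print(total_requests_by_namespace(EXAMPLE_METRICS))
```

The pod with an empty node label is still counted, which matches the description of pods that have not been scheduled yet.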
Disabling metrics
You can explicitly turn off metrics with the command line flag --disabled-metrics . This may
be desired if, for example, a metric is causing a performance problem. The input is a
comma-separated list of disabled metrics (for example, --disabled-metrics=metric1,metric2 ).
In the alpha stage, the --allow-label-value flag can only take a series of mappings as the
metric label allow-list. Each mapping is of the format
<metric_name>,<label_name>=<allowed_labels> where <allowed_labels> is a comma-separated list
of acceptable label values.
Here is an example:
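To make the mapping format concrete, here is a minimal Python sketch that splits one mapping of the form <metric_name>,<label_name>=<allowed_labels> into its parts. The function name and the sample metric and label names are hypothetical; this only models the text format described above, not how any component parses it.

```python
def parse_allow_list_mapping(mapping):
    """Split '<metric_name>,<label_name>=<allowed_labels>' into its three parts."""
    key, equals, allowed = mapping.partition("=")
    metric, comma, label = key.partition(",")
    if not equals or not comma or not metric or not label:
        raise ValueError("expected <metric_name>,<label_name>=<allowed_labels>")
    # Surrounding quotes, if any, are shell quoting and not part of the values.
    values = [value for value in allowed.strip("'\"").split(",") if value]
    return metric, label, values


# Hypothetical metric and label names, for illustration only.
print(parse_allow_list_mapping("number_count_metric,odd_number=1,3,5"))
```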
What's next
Read about the Prometheus text format for metrics
See the list of stable Kubernetes metrics
Read about the Kubernetes deprecation policy
6 - System Logs
System component logs record events happening in the cluster, which can be very useful for
debugging. You can configure log verbosity to see more or less detail. Logs can be as coarse-
grained as showing errors within a component, or as fine-grained as showing step-by-step
traces of events (like HTTP access logs, pod state changes, controller actions, or scheduler
decisions).
Klog
klog is the Kubernetes logging library. klog generates log messages for the Kubernetes system
components.
For more information about klog configuration, see the Command line tool reference.
Kubernetes is in the process of simplifying logging in its components. The following klog
command line flags are deprecated starting with Kubernetes 1.23 and will be removed in a
future release:
--add-dir-header
--alsologtostderr
--log-backtrace-at
--log-dir
--log-file
--log-file-max-size
--logtostderr
--one-output
--skip-headers
--skip-log-headers
--stderrthreshold
Output will always be written to stderr, regardless of the output format. Output redirection is
expected to be handled by the component which invokes a Kubernetes component. This can
be a POSIX shell or a tool like systemd.
In some cases, for example a distroless container or a Windows system service, those options
are not available. In that case, the kube-log-runner binary can be used as a wrapper around a
Kubernetes component to redirect output. A prebuilt binary is included in several Kubernetes
base images under its traditional name as /go-runner and as kube-log-runner in server
and node release archives.
Klog output
An example of the traditional klog native format:
Structured Logging
FEATURE STATE: Kubernetes v1.23 [beta]
Warning:
Migration to structured log messages is an ongoing process. Not all log messages are
structured in this version. When parsing log files, you must also handle unstructured log
messages.
Structured logging introduces a uniform structure in log messages allowing for programmatic
extraction of information. You can store and process structured logs with less effort and cost.
The code which generates a log message determines whether it uses the traditional
unstructured klog output or structured logging.
The default formatting of structured log messages is as text, with a format that is backward
compatible with traditional klog:
Example:
Strings are quoted. Other values are formatted with %+v , which may cause log messages to
continue on the next line depending on the data.
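Because structured and unstructured messages share the klog header, a log consumer has to handle both. The following Python sketch shows one possible way to do that; it assumes the usual klog header layout (severity letter, month and day, time, thread ID, file:line) and treats a quoted message followed by key=value pairs as structured. The sample lines are illustrative, and multi-line values are not handled.

```python
import re

# klog header: severity, mmdd, time with microseconds, thread ID, file:line]
HEADER_RE = re.compile(
    r'^(?P<severity>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) '
    r'(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<thread>\d+) '
    r'(?P<file>[^\s:]+):(?P<line>\d+)\] (?P<rest>.*)$'
)
# Structured messages start with a quoted string, then key=value pairs.
MESSAGE_RE = re.compile(r'^"(?P<msg>(?:[^"\\]|\\.)*)"(?P<pairs>.*)$')
PAIR_RE = re.compile(r'\s(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_klog_line(line):
    """Return a dict for a klog text line, or None if the header does not match."""
    header = HEADER_RE.match(line)
    if header is None:
        return None
    record = {
        "severity": header.group("severity"),
        "file": header.group("file"),
        "line": int(header.group("line")),
        "structured": False,
        "msg": header.group("rest"),
        "fields": {},
    }
    message = MESSAGE_RE.match(header.group("rest"))
    if message is not None:
        record["structured"] = True
        record["msg"] = message.group("msg")
        record["fields"] = {
            key: value.strip('"')
            for key, value in PAIR_RE.findall(message.group("pairs"))
        }
    return record


# Illustrative lines: one structured, one unstructured.
print(parse_klog_line('I0404 18:00:02.916429 451895 logger.go:94] "example/myname: runtime" foo="bar" duration="1m0s"'))
print(parse_klog_line('I0404 18:00:03.000000 451895 server.go:10] plain unstructured text'))
```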
Contextual Logging
FEATURE STATE: Kubernetes v1.24 [alpha]
Contextual logging builds on top of structured logging. It is primarily about how developers
use logging calls: code based on that concept is more flexible and supports additional use
cases as described in the Contextual Logging KEP.
Currently this is gated behind the ContextualLogging feature gate and disabled by default.
The infrastructure for this was added in 1.24 without modifying components. The
component-base/logs/example command demonstrates how to use the new logging calls:
$ cd $GOPATH/src/k8s.io/kubernetes/staging/src/k8s.io/component-base/logs/example
$ go run . --help
...
--feature-gates mapStringBool A set of key=value pairs that describe featu
AllAlpha=true|false (ALPHA - default=false)
AllBeta=true|false (BETA - default=false)
ContextualLogging=true|false (ALPHA - defaul
$ go run . --feature-gates ContextualLogging=true
...
I0404 18:00:02.916429 451895 logger.go:94] "example/myname: runtime" foo="bar" d
I0404 18:00:02.916447 451895 logger.go:95] "example: another runtime" foo="bar"
The example prefix and foo="bar" were added by the caller of the function which logs the
runtime message and duration="1m0s" value, without having to modify that function.
With contextual logging disabled, WithValues and WithName do nothing and log calls go
through the global klog logger. Therefore this additional information is not in the log output
anymore:
Warning:
JSON output does not support many standard klog flags. For a list of unsupported klog
flags, see the Command line tool reference.
Not all logs are guaranteed to be written in JSON format (for example, during process
start). If you intend to parse logs, make sure you can handle log lines that are not JSON as
well.
The --logging-format=json flag changes the format of logs from klog native format to JSON
format. Example of JSON log format (pretty printed):
{
"ts": 1580306777.04728,
"v": 4,
"msg": "Pod status updated",
"pod":{
"name": "nginx-1",
"namespace": "default"
},
"status": "ready"
}
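Since not every line is guaranteed to be JSON, a parser should fall back gracefully instead of failing. The Python sketch below is one way to do that; the function name and the shape of the returned dictionary are illustrative choices, not part of any Kubernetes tooling.

```python
import json


def parse_component_log_line(line):
    """Parse one log line, keeping non-JSON lines as raw text."""
    text = line.rstrip("\n")
    try:
        record = json.loads(text)
    except json.JSONDecodeError:
        record = None
    if not isinstance(record, dict):
        # Not a JSON object (for example, output written during process start).
        return {"json": False, "raw": text}
    return {"json": True, **record}


json_line = '{"ts": 1580306777.04728, "v": 4, "msg": "Pod status updated", "pod": {"name": "nginx-1", "namespace": "default"}, "status": "ready"}'
print(parse_component_log_line(json_line)["msg"])
print(parse_component_log_line("some text printed before logging was set up"))
```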
kube-controller-manager
kube-apiserver
kube-scheduler
kubelet
Log location
There are two types of system components: those that run in a container and those that do
not run in a container. For example:
On machines with systemd, the kubelet and container runtime write to journald. Otherwise,
they write to .log files in the /var/log directory. System components inside containers
always write to .log files in the /var/log directory, bypassing the default logging
mechanism. Similar to the container logs, you should rotate system component logs in the
/var/log directory. In Kubernetes clusters created by the kube-up.sh script, log rotation is
configured by the logrotate tool. The logrotate tool rotates logs daily, or once the log size
is greater than 100MB.
Log query
FEATURE STATE: Kubernetes v1.27 [alpha]
To help with debugging issues on nodes, Kubernetes v1.27 introduced a feature that allows
viewing logs of services running on the node. To use the feature, ensure that the
NodeLogQuery feature gate is enabled for that node, and that the kubelet configuration
options enableSystemLogHandler and enableSystemLogQuery are both set to true. On Linux
we assume that service logs are available via journald. On Windows we assume that service
logs are available in the application log provider. On both operating systems, logs are also
available by reading files within /var/log/ .
Provided you are authorized to interact with node objects, you can try out this alpha feature
on all your nodes or just a subset. Here is an example to retrieve the kubelet service logs from
a node:
You can also fetch files, provided that the files are in a directory that the kubelet allows for log
fetches. For example, you can fetch a log from /var/log on a Linux node:
The kubelet uses heuristics to retrieve logs. This helps if you are not aware whether a given
system service is writing logs to the operating system's native logger like journald or to a log
file in /var/log/ . The heuristics first check the native logger and, if that is not available,
attempt to retrieve the first logs from /var/log/<servicename> or
/var/log/<servicename>.log or /var/log/<servicename>/<servicename>.log .
Option      Description
query       specifies service(s) or files from which to return logs (required)
tailLines   specify how many lines from the end of the log to retrieve; the default is to
            fetch the whole log
# Fetch kubelet logs from a node named node-1.example that have the word "error"
kubectl get --raw "/api/v1/nodes/node-1.example/proxy/logs/?query=kubelet&pattern
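The request path used with kubectl get --raw can also be assembled programmatically. This Python sketch builds the path from the options shown above ( query , tailLines , and pattern ); the function name is hypothetical and the sketch does not contact a cluster.

```python
from urllib.parse import quote, urlencode


def node_log_query_path(node, query, tail_lines=None, pattern=None):
    """Build the node proxy path for a log query; only 'query' is required."""
    params = [("query", query)]
    if tail_lines is not None:
        params.append(("tailLines", str(tail_lines)))
    if pattern is not None:
        params.append(("pattern", pattern))
    return "/api/v1/nodes/{}/proxy/logs/?{}".format(quote(node, safe=""), urlencode(params))


# The result can be passed to: kubectl get --raw "<path>"
print(node_log_query_path("node-1.example", "kubelet", pattern="error"))
```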
What's next
Read about the Kubernetes Logging Architecture
Read about Structured Logging
Read about Contextual Logging
Read about deprecation of klog flags
Read about the Conventions for logging severity
7 - Traces For Kubernetes System Components
System component traces record the latency of and relationships between operations in the
cluster.
Kubernetes components emit traces using the OpenTelemetry Protocol with the gRPC
exporter and can be collected and routed to tracing backends using an OpenTelemetry
Collector.
Trace Collection
For a complete guide to collecting traces and using the collector, see Getting Started with the
OpenTelemetry Collector. However, there are a few things to note that are specific to
Kubernetes components.
By default, Kubernetes components export traces using the grpc exporter for OTLP on the
IANA OpenTelemetry port, 4317. As an example, if the collector is running as a sidecar to a
Kubernetes component, the following receiver configuration will collect spans and log them to
standard output:
receivers:
  otlp:
    protocols:
      grpc:
exporters:
  # Replace this exporter with the exporter for your backend
  logging:
    logLevel: debug
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [logging]
Component traces
kube-apiserver traces
The kube-apiserver generates spans for incoming HTTP requests, and for outgoing requests
to webhooks, etcd, and re-entrant requests. It propagates the W3C Trace Context with
outgoing requests but does not make use of the trace context attached to incoming requests,
as the kube-apiserver is often a public endpoint.
apiVersion: apiserver.config.k8s.io/v1beta1
kind: TracingConfiguration
# default value
#endpoint: localhost:4317
samplingRatePerMillion: 100
For more information about the TracingConfiguration struct, see API server config API
(v1beta1).
kubelet traces
FEATURE STATE: Kubernetes v1.27 [beta]
The kubelet CRI interface and authenticated HTTP servers are instrumented to generate trace
spans. As with the apiserver, the endpoint and sampling rate are configurable. Trace context
propagation is also configured. A parent span's sampling decision is always respected. A
provided tracing configuration sampling rate will apply to spans without a parent. If tracing
is enabled without a configured endpoint, the default OpenTelemetry Collector receiver
address of "localhost:4317" is used.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  KubeletTracing: true
tracing:
  # default value
  #endpoint: localhost:4317
  samplingRatePerMillion: 100
If the samplingRatePerMillion is set to one million ( 1000000 ), then every span will be sent to
the exporter.
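The arithmetic behind samplingRatePerMillion is simple: the value is the number of spans sampled out of every million. A small Python sketch, with a hypothetical helper name, makes the relationship explicit for the value used in the configuration above.

```python
def sampling_probability(sampling_rate_per_million):
    """Convert a samplingRatePerMillion value into a probability between 0 and 1."""
    if not 0 <= sampling_rate_per_million <= 1_000_000:
        raise ValueError("samplingRatePerMillion must be between 0 and 1000000")
    return sampling_rate_per_million / 1_000_000


# 100 per million is 1 in 10000; one million per million samples every span.
print(sampling_probability(100))
print(sampling_probability(1_000_000))
```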
The kubelet in Kubernetes v1.27 collects spans from garbage collection, the pod
synchronization routine, and every gRPC method. Connected container runtimes like
CRI-O and containerd can link the traces to their exported spans to provide additional
context.
Please note that exporting spans always comes with a small performance overhead on the
networking and CPU side, depending on the overall configuration of the system. If this
overhead causes problems in a cluster that is running with tracing enabled, mitigate them by
either reducing the samplingRatePerMillion or disabling tracing completely by removing the
configuration.
Stability
Tracing instrumentation is still under active development, and may change in a variety of
ways. This includes span names, attached attributes, instrumented endpoints, etc. Until this
feature graduates to stable, there are no guarantees of backwards compatibility for tracing
instrumentation.
What's next
Read about Getting Started with the OpenTelemetry Collector
8 - Proxies in Kubernetes
This page explains proxies used with Kubernetes.
Proxies
There are several different proxies you may encounter when using Kubernetes:
Cloud load balancers on external Services:
are provided by some cloud providers (for example, AWS ELB, Google Cloud Load Balancer)
are created automatically when the Kubernetes Service has type LoadBalancer
usually support UDP/TCP only
SCTP support is up to the load balancer implementation of the cloud provider
implementation varies by cloud provider
Kubernetes users will typically not need to worry about anything other than the first two
types. The cluster admin will typically ensure that the latter types are set up correctly.
Requesting redirects
Proxies have replaced redirect capabilities. Redirects have been deprecated.
9 - API Priority and Fairness
Controlling the behavior of the Kubernetes API server in an overload situation is a key task for
cluster administrators. The kube-apiserver has some controls available (i.e. the
--max-requests-inflight and --max-mutating-requests-inflight command-line flags) to limit the
amount of outstanding work that will be accepted, preventing a flood of inbound requests
from overloading and potentially crashing the API server, but these flags are not enough to
ensure that the most important requests get through in a period of high traffic.
The API Priority and Fairness feature (APF) is an alternative that improves upon the
aforementioned max-inflight limitations. APF classifies and isolates requests in a more
fine-grained way. It also introduces a limited amount of queuing, so that no requests are
rejected
in cases of very brief bursts. Requests are dispatched from queues using a fair queuing
technique so that, for example, a poorly-behaved controller need not starve others (even at
the same priority level).
This feature is designed to work well with standard controllers, which use informers and react
to failures of API requests with exponential back-off, and other clients that also work this way.
kube-apiserver \
--feature-gates=APIPriorityAndFairness=false \
--runtime-config=flowcontrol.apiserver.k8s.io/v1beta2=false,flowcontrol.apiserver
# …and other flags as usual
Alternatively, you can enable the v1alpha1 and v1beta1 versions of the API group with
--runtime-config=flowcontrol.apiserver.k8s.io/v1alpha1=true,flowcontrol.apiserver.k8s.io/v1beta1=true .
Concepts
There are several distinct features involved in the API Priority and Fairness feature. Incoming
requests are classified by attributes of the request using FlowSchemas, and assigned to
priority levels. Priority levels add a degree of isolation by maintaining separate concurrency
limits, so that requests assigned to different priority levels cannot starve each other. Within a
priority level, a fair-queuing algorithm prevents requests from different flows from starving
each other, and allows for requests to be queued to prevent bursty traffic from causing failed
requests when the average load is acceptably low.
Priority Levels
Without APF enabled, overall concurrency in the API server is limited by the kube-apiserver
flags --max-requests-inflight and --max-mutating-requests-inflight . With APF enabled,
the concurrency limits defined by these flags are summed and then the sum is divided up
among a configurable set of priority levels. Each incoming request is assigned to a single
priority level, and each priority level will only dispatch as many concurrent requests as its
particular limit allows.
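The division of the summed limit among priority levels can be sketched as follows. This Python example assumes that each level receives a share proportional to a per-level weight (called nominalConcurrencyShares in the v1beta3 API) and that the result is rounded up; the level names and share values are made up for illustration.

```python
def nominal_concurrency_limits(max_requests_inflight, max_mutating_requests_inflight, shares):
    """Split the summed server limit among priority levels in proportion to their shares.

    Assumes limit = ceil(server_limit * share / total_shares) for each level.
    """
    server_limit = max_requests_inflight + max_mutating_requests_inflight
    total_shares = sum(shares.values())
    return {
        # Integer ceiling division avoids floating-point rounding surprises.
        name: (server_limit * share + total_shares - 1) // total_shares
        for name, share in shares.items()
    }


# Hypothetical flag values and shares.
print(nominal_concurrency_limits(400, 200, {"level-a": 1, "level-b": 2, "level-c": 3}))
```

Because each level is rounded up, the per-level limits can add up to slightly more than the summed flag values.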
The default configuration, for example, includes separate priority levels for leader-election
requests, requests from built-in controllers, and requests from Pods. This means that an ill-
behaved Pod that floods the API server with requests cannot prevent leader election or
actions by the built-in controllers from succeeding.
The concurrency limits of the priority levels are periodically adjusted, allowing under-utilized
priority levels to temporarily lend concurrency to heavily-utilized levels. These limits are based
on nominal limits and bounds on how much concurrency a priority level may lend and how
much it may borrow, all derived from the configuration objects mentioned below.
In the baseline case each request occupies one unit of concurrency, called a seat.
But some requests take up more than one seat. Some of these are list requests that the
server estimates will return a large number of objects. These have been found to put an
exceptionally heavy burden on the server. For this reason, the server estimates the number of
objects that will be returned and considers the request to take a number of seats that is
proportional to that estimated number.
The normal notifications are sent in a concurrent burst to all relevant watch response
streams whenever the server is notified of an object create/update/delete. To account for this
work, API Priority and Fairness considers every write request to spend some additional time
occupying seats after the actual writing is done. The server estimates the number of
notifications to be sent and adjusts the write request's number of seats and seat occupancy
time to include this extra work.
Queuing
Even within a priority level there may be a large number of distinct sources of traffic. In an
overload situation, it is valuable to prevent one stream of requests from starving others (in
particular, in the relatively common case of a single buggy client flooding the kube-apiserver
with requests, that buggy client would ideally not have much measurable impact on other
clients at all). This is handled by use of a fair-queuing algorithm to process requests that are
assigned the same priority level. Each request is assigned to a flow, identified by the name of
the matching FlowSchema plus a flow distinguisher — which is either the requesting user, the
target resource's namespace, or nothing — and the system attempts to give approximately
equal weight to requests in different flows of the same priority level. To enable distinct
handling of distinct instances, controllers that have many instances should authenticate with
distinct usernames.
After classifying a request into a flow, the API Priority and Fairness feature then may assign
the request to a queue. This assignment uses a technique known as shuffle sharding, which
makes relatively efficient use of queues to insulate low-intensity flows from high-intensity
flows.
The details of the queuing algorithm are tunable for each priority level, and allow
administrators to trade off memory use, fairness (the property that independent flows will all
make progress when total traffic exceeds capacity), tolerance for bursty traffic, and the added
latency induced by queuing.
Exempt requests
Some requests are considered sufficiently important that they are not subject to any of the
limitations imposed by this feature. These exemptions prevent an improperly-configured flow
control configuration from totally disabling an API server.
Resources
The flow control API involves two kinds of resources. PriorityLevelConfigurations define the
available priority levels, the share of the available concurrency budget that each can handle,
and allow for fine-tuning queuing behavior. FlowSchemas are used to classify individual
inbound requests, matching each to a single PriorityLevelConfiguration. There is also a
v1alpha1 version of the same API group, and it has the same Kinds with the same syntax and
semantics.
PriorityLevelConfiguration
A PriorityLevelConfiguration represents a single priority level. Each PriorityLevelConfiguration
has an independent limit on the number of outstanding requests, and limitations on the
number of queued requests.
The bounds on how much concurrency a priority level may lend and how much it may borrow
are expressed in the PriorityLevelConfiguration as percentages of the level's nominal limit.
These are resolved to absolute numbers of seats by multiplying with the nominal limit / 100.0
and rounding. The dynamically adjusted concurrency limit of a priority level is constrained to
lie between (a) a lower bound of its nominal limit minus its lendable seats and (b) an upper
bound of its nominal limit plus the seats it may borrow. At each adjustment the dynamic limits
are derived by each priority level reclaiming any lent seats for which demand recently
appeared and then jointly fairly responding to the recent seat demand on the priority levels,
within the bounds just described.
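The bounds described above reduce to a short calculation. The Python sketch below resolves the two percentages to seats and derives the lower and upper bound of the dynamic limit. The parameter names mirror the lendablePercent and borrowingLimitPercent fields of the v1beta3 API; the rounding mode (half up) is an assumption, and the case of an unset borrowing limit is not modeled.

```python
def seats_from_percent(nominal_limit, percent):
    """Resolve a percentage of the nominal limit to whole seats (round half up, assumed)."""
    return int(nominal_limit * percent / 100.0 + 0.5)


def dynamic_limit_bounds(nominal_limit, lendable_percent, borrowing_limit_percent):
    """Return (lower bound, upper bound) for a priority level's dynamic concurrency limit."""
    lendable = seats_from_percent(nominal_limit, lendable_percent)
    borrowable = seats_from_percent(nominal_limit, borrowing_limit_percent)
    return nominal_limit - lendable, nominal_limit + borrowable


# A level with 40 nominal seats that may lend 50% and borrow up to 25%.
print(dynamic_limit_bounds(40, 50, 25))
```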
Caution: With the Priority and Fairness feature enabled, the total concurrency limit for
the server is set to the sum of --max-requests-inflight and
--max-mutating-requests-inflight. There is no longer any distinction made between mutating
and non-mutating
requests; if you want to treat them separately for a given resource, make separate
FlowSchemas that match the mutating and non-mutating verbs respectively.
The queuing configuration allows tuning the fair queuing algorithm for a priority level. Details
of the algorithm can be read in the enhancement proposal, but in short:
Increasing queues reduces the rate of collisions between different flows, at the cost of
increased memory usage. A value of 1 here effectively disables the fair-queuing logic,
but still allows requests to be queued.
Changing handSize allows you to adjust the probability of collisions between different
flows and the overall concurrency available to a single flow in an overload situation.
Note: A larger handSize makes it less likely for two individual flows to collide (and
therefore for one to be able to starve the other), but more likely that a small number
of flows can dominate the apiserver. A larger handSize also potentially increases the
amount of latency that a single high-traffic flow can cause. The maximum number of
queued requests possible from a single flow is handSize * queueLengthLimit.
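To illustrate how queues and handSize interact, here is a simplified Python model of shuffle sharding: a flow is dealt a hand of distinct queues derived from a hash of its identifier, and a request joins the shortest queue in that hand. The hashing and dealing scheme shown is an illustrative assumption and is not the dealer used by the kube-apiserver.

```python
import hashlib


def deal_hand(flow_id, queues, hand_size):
    """Deterministically pick hand_size distinct queue indices for a flow."""
    if hand_size > queues:
        raise ValueError("hand_size cannot exceed the number of queues")
    digest = int.from_bytes(hashlib.sha256(flow_id.encode()).digest(), "big")
    remaining = list(range(queues))
    hand = []
    for _ in range(hand_size):
        # Use part of the hash to pick one of the queues not yet dealt.
        digest, pick = divmod(digest, len(remaining))
        hand.append(remaining.pop(pick))
    return hand


def choose_queue(flow_id, queue_lengths, hand_size):
    """Place a request in the shortest queue of the flow's hand."""
    hand = deal_hand(flow_id, len(queue_lengths), hand_size)
    return min(hand, key=lambda index: queue_lengths[index])


def max_queued_per_flow(hand_size, queue_length_limit):
    """Upper bound on queued requests from one flow: handSize * queueLengthLimit."""
    return hand_size * queue_length_limit


print(deal_hand("user-a", queues=64, hand_size=8))
print(max_queued_per_flow(hand_size=8, queue_length_limit=50))
```

With a single queue, every flow is dealt the same hand, which corresponds to the note above that a value of 1 effectively disables the fair-queuing logic.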
FlowSchema
A FlowSchema matches some inbound requests and assigns them to a priority level. Every
inbound request is tested against FlowSchemas, starting with those with the numerically
lowest matchingPrecedence and working upward. The first match wins.
Caution: Only the first matching FlowSchema for a given request matters. If multiple
FlowSchemas match a single inbound request, it will be assigned based on the one with
the highest precedence, that is, the numerically lowest matchingPrecedence. If multiple
FlowSchemas with equal matchingPrecedence match the same request, the one with the
lexicographically smaller name will win, but it's better not to rely on this, and instead to
ensure that no two FlowSchemas have the same matchingPrecedence.
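The selection rule can be expressed as a sort key: lowest matchingPrecedence first, then name as the tie-breaker. The Python sketch below applies it to a list of FlowSchemas that are already known to match a request; the dictionaries are simplified stand-ins for the API objects and the schema names are hypothetical.

```python
def select_flow_schema(matching_schemas):
    """Pick the winning FlowSchema among those that match a request."""
    if not matching_schemas:
        return None
    # Numerically lowest matchingPrecedence wins; ties go to the smaller name.
    return min(matching_schemas, key=lambda fs: (fs["matchingPrecedence"], fs["name"]))


candidates = [
    {"name": "team-b", "matchingPrecedence": 1000},
    {"name": "team-a", "matchingPrecedence": 1000},
    {"name": "catch-all-example", "matchingPrecedence": 10000},
]
print(select_flow_schema(candidates)["name"])
```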
A FlowSchema matches a given request if at least one of its rules matches. A rule matches if
at least one of its subjects and at least one of its resourceRules or nonResourceRules
(depending on whether the incoming request is for a resource or non-resource URL) match
the request.
For the name field in subjects, and the verbs , apiGroups , resources , namespaces , and
nonResourceURLs fields of resource and non-resource rules, the wildcard * may be specified
to match all values for the given field, effectively removing it from consideration.
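The matching rules above can be modeled in a few lines. This Python sketch checks a request against a FlowSchema's rules, treating * as a wildcard for an entire field. It is a simplified model under stated assumptions: requests are plain dictionaries with hypothetical keys, only User and Group subjects are handled, and partial wildcards inside a value are not supported.

```python
def field_matches(allowed, value):
    """A field matches if it lists the value or contains the wildcard '*'."""
    return "*" in allowed or value in allowed


def subject_matches(subject, request):
    if subject["kind"] == "Group":
        name = subject["group"]["name"]
        return name == "*" or name in request["groups"]
    if subject["kind"] == "User":
        name = subject["user"]["name"]
        return name == "*" or name == request["user"]
    return False  # other subject kinds are omitted from this sketch


def rule_matches(rule, request):
    """A rule needs at least one matching subject and one matching (non-)resource rule."""
    if not any(subject_matches(s, request) for s in rule.get("subjects", [])):
        return False
    if "path" in request:  # request for a non-resource URL
        return any(
            field_matches(r["verbs"], request["verb"])
            and field_matches(r["nonResourceURLs"], request["path"])
            for r in rule.get("nonResourceRules", [])
        )
    return any(
        field_matches(r["verbs"], request["verb"])
        and field_matches(r["apiGroups"], request["apiGroup"])
        and field_matches(r["resources"], request["resource"])
        and field_matches(r.get("namespaces", []), request["namespace"])
        for r in rule.get("resourceRules", [])
    )


def flow_schema_matches(schema, request):
    """A FlowSchema matches if at least one of its rules matches."""
    return any(rule_matches(rule, request) for rule in schema["rules"])


health_schema = {"rules": [{
    "subjects": [{"kind": "Group", "group": {"name": "system:unauthenticated"}}],
    "nonResourceRules": [{"verbs": ["*"], "nonResourceURLs": ["/healthz", "/livez", "/readyz"]}],
}]}
anonymous_probe = {"user": "system:anonymous", "groups": ["system:unauthenticated"],
                   "verb": "get", "path": "/healthz"}
print(flow_schema_matches(health_schema, anonymous_probe))
```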
Defaults
Each kube-apiserver maintains two sorts of APF configuration objects: mandatory and
suggested.
The mandatory exempt priority level is used for requests that are not subject to flow
control at all: they will always be dispatched immediately. The mandatory exempt
FlowSchema classifies all requests from the system:masters group into this priority
level. You may define other FlowSchemas that direct other requests to this priority level,
if appropriate.
The mandatory catch-all priority level is used in combination with the mandatory
catch-all FlowSchema to make sure that every request gets some kind of
classification. Typically you should not rely on this catch-all configuration, and should
create your own catch-all FlowSchema and PriorityLevelConfiguration (or use the
suggested global-default priority level that is installed by default) as appropriate.
Because it is not expected to be used normally, the mandatory catch-all priority level
has a very small concurrency share and does not queue requests.
The system priority level is for non-health requests from the system:nodes group, i.e.
Kubelets, which must be able to contact the API server in order for workloads to be able
to schedule on them.
The leader-election priority level is for leader election requests from built-in
controllers (in particular, requests for endpoints , configmaps , or leases coming from
the system:kube-controller-manager or system:kube-scheduler users and service
accounts in the kube-system namespace). These are important to isolate from other
traffic because failures in leader election cause their controllers to fail and restart, which
in turn causes more expensive traffic as the new controllers sync their informers.
The workload-high priority level is for other requests from built-in controllers.
The workload-low priority level is for requests from any other service account, which
will typically include all requests from controllers running in Pods.
The global-default priority level handles all other traffic, e.g. interactive kubectl
commands run by nonprivileged users.
The suggested FlowSchemas serve to steer requests into the above priority levels, and are not
enumerated here.
Each kube-apiserver makes an initial maintenance pass over the mandatory and suggested
configuration objects, and after that does periodic maintenance (once per minute) of those
objects.
For the mandatory configuration objects, maintenance consists of ensuring that the object
exists and, if it does, has the proper spec. The server refuses to allow a creation or update
with a spec that is inconsistent with the server's guardrail behavior.
Maintenance of a suggested configuration object consists of creating it, with the server's
suggested spec, if the object does not exist. On the other hand, if the object already exists,
maintenance behavior depends on whether the kube-apiservers or the users control the object.
In the former case, the server ensures that the object's spec is what the server suggests; in
the latter case, the spec is left alone.
The question of who controls the object is answered by first looking for an annotation with
key apf.kubernetes.io/autoupdate-spec . If there is such an annotation and its value is true
then the kube-apiservers control the object. If there is such an annotation and its value is
false then the users control the object. If neither of those conditions holds then the
metadata.generation of the object is consulted. If that is 1 then the kube-apiservers control
the object. Otherwise the users control the object. These rules were introduced in release
1.22 and their consideration of metadata.generation is for the sake of migration from the
simpler earlier behavior. Users who wish to control a suggested configuration object should
set its apf.kubernetes.io/autoupdate-spec annotation to false .
Maintenance also includes deleting objects that are neither mandatory nor suggested but are
annotated apf.kubernetes.io/autoupdate-spec=true .
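The ownership decision described above can be summarized as a small function. The Python sketch below takes an object's annotations and metadata.generation and reports whether the kube-apiservers control it; the function name is hypothetical and the objects are represented as plain values rather than API types.

```python
AUTOUPDATE_ANNOTATION = "apf.kubernetes.io/autoupdate-spec"


def controlled_by_apiserver(annotations, generation):
    """Return True if the kube-apiservers control the object, False if users do."""
    value = annotations.get(AUTOUPDATE_ANNOTATION)
    if value == "true":
        return True
    if value == "false":
        return False
    # No usable annotation: fall back to metadata.generation.
    return generation == 1


print(controlled_by_apiserver({AUTOUPDATE_ANNOTATION: "false"}, generation=1))
print(controlled_by_apiserver({}, generation=3))
```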
If you add the following additional FlowSchema, this exempts health check requests from rate
limiting.
Caution: Making this change also allows any hostile party to then send health-check
requests that match this FlowSchema, at any volume they like. If you have a web traffic
filter or similar external security mechanism to protect your cluster's API server from
general internet traffic, you can configure rules to block any health check requests that
originate from outside your cluster.
priority-and-fairness/health-for-strangers.yaml
apiVersion: flowcontrol.apiserver.k8s.io/v1beta3
kind: FlowSchema
metadata:
  name: health-for-strangers
spec:
  matchingPrecedence: 1000
  priorityLevelConfiguration:
    name: exempt
  rules:
    - nonResourceRules:
      - nonResourceURLs:
          - "/healthz"
          - "/livez"
          - "/readyz"
        verbs:
          - "*"
      subjects:
        - kind: Group
          group:
            name: "system:unauthenticated"
Diagnostics
Every HTTP response from an API server with the priority and fairness feature enabled has
two extra headers: X-Kubernetes-PF-FlowSchema-UID and X-Kubernetes-PF-PriorityLevel-UID ,
noting the flow schema that matched the request and the priority level to which it was
assigned, respectively. The API objects' names are not included in these headers in case the
requesting user does not have permission to view them, so when debugging you can use a
command like
Observability
Metrics
Note: In versions of Kubernetes before v1.20, the labels flow_schema and priority_level
were inconsistently named flowSchema and priorityLevel, respectively. If you're running
Kubernetes versions v1.19 and earlier, you should refer to the documentation for your
version.
When you enable the API Priority and Fairness feature, the kube-apiserver exports additional
metrics. Monitoring these can help you determine whether your configuration is
inappropriately throttling important traffic, or find poorly-behaved workloads that may be
harming system health.
Note: An outlier value in a histogram here means it is likely that a single flow (i.e.,
requests by one user or for one namespace, depending on configuration) is flooding
the API server, and being throttled. By contrast, if one priority level's histogram
shows that all queues for that priority level are longer than those for other priority
Debug endpoints
When you enable the API Priority and Fairness feature, the kube-apiserver serves the
following additional paths at its HTTP(S) ports.
In addition to the queued requests, the output includes one phantom line for each
priority level that is exempt from limitation.
You can get a more detailed listing with a command like this:
Debug logging
At -v=3 or higher verbosity, the server outputs an httplog line for every request, and it
includes the following attributes.
apf_fs : the name of the flow schema to which the request was classified.
apf_pl : the name of the priority level for that flow schema.
apf_iseats : the number of seats determined for the initial (normal) stage of execution
of the request.
apf_fseats : the number of seats determined for the final stage of execution
(accounting for the associated WATCH notifications) of the request.
apf_additionalLatency : the duration of the final stage of execution of the request.
At higher levels of verbosity there will be log lines exposing details of how APF handled the
request, primarily for debugging purposes.
Response headers
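APF adds the following two headers to each HTTP response message.
X-Kubernetes-PF-FlowSchema-UID holds the UID of the FlowSchema that matched the request.
X-Kubernetes-PF-PriorityLevel-UID holds the UID of the priority level to which the request
was assigned.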
What's next
For background information on design details for API priority and fairness, see the
enhancement proposal. You can make suggestions and feature requests via SIG API
Machinery or the feature's Slack channel.
10 - Installing Addons
Note: This section links to third party projects that provide functionality required by
Kubernetes. The Kubernetes project authors aren't responsible for these projects, which
are listed alphabetically. To add a project to this list, read the content guide before
submitting a change. More information.
This page lists some of the available add-ons and links to their respective installation
instructions. The list does not try to be exhaustive.
Service Discovery
CoreDNS is a flexible, extensible DNS server which can be installed as the in-cluster DNS
for pods.
Infrastructure
KubeVirt is an add-on to run virtual machines on Kubernetes. Usually run on bare-metal
clusters.
The node problem detector runs on Linux nodes and reports system issues as either
Events or Node conditions.
Legacy Add-ons
There are several other add-ons documented in the deprecated cluster/addons directory.