Knavigator

Overview

Knavigator is a project designed to analyze, optimize, and compare scheduling systems, with a focus on AI/ML workloads. It addresses various needs, including testing, troubleshooting, benchmarking, chaos engineering, performance analysis, and optimization.

The name "knavigator" derives from "navigator," with a silent "k" prefix standing for "Kubernetes." Like a navigator, Knavigator helps chart a safe course and steer clear of obstacles within the cluster.

Knavigator interfaces with Kubernetes clusters to perform tasks such as manipulating Kubernetes objects, evaluating PromQL queries, and executing specific operations.

Knavigator can operate both outside and inside a Kubernetes cluster, leveraging the Kubernetes API for task management.
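Running inside versus outside a cluster mostly changes how the client locates the Kubernetes API. The Go sketch below mirrors the usual decision: in-cluster credentials when the pod's service environment variables and service account token are present (the same inputs client-go's `rest.InClusterConfig` uses), otherwise a kubeconfig from `$KUBECONFIG` or `~/.kube/config`. The `connectionMode` helper is hypothetical and only reports the choice; a real client would build its configuration with client-go rather than this stdlib-only stand-in.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// tokenPath is where Kubernetes mounts a pod's service account token by default.
const tokenPath = "/var/run/secrets/kubernetes.io/serviceaccount/token"

// connectionMode reports how a client could reach the Kubernetes API.
// Hypothetical sketch, not Knavigator's actual code. getenv and fileExists are
// injected so the logic can be exercised without a cluster.
func connectionMode(getenv func(string) string, fileExists func(string) bool) string {
	// Inside a pod, Kubernetes sets these variables for the API server's Service
	// and mounts a service account token.
	if getenv("KUBERNETES_SERVICE_HOST") != "" && getenv("KUBERNETES_SERVICE_PORT") != "" &&
		fileExists(tokenPath) {
		return "in-cluster"
	}
	// Outside the cluster, use a kubeconfig: $KUBECONFIG if set, else ~/.kube/config.
	if p := getenv("KUBECONFIG"); p != "" {
		return "kubeconfig:" + p
	}
	return "kubeconfig:" + filepath.Join(getenv("HOME"), ".kube", "config")
}

func main() {
	exists := func(p string) bool {
		_, err := os.Stat(p)
		return err == nil
	}
	fmt.Println(connectionMode(os.Getenv, exists))
}
```

Injecting `getenv` and `fileExists` keeps the decision logic pure, so both modes can be checked on a laptop without a running cluster.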

To support large-scale experiments without the overhead of running real user workloads, Knavigator uses KWOK (Kubernetes WithOut Kubelet) to create virtual nodes in large clusters.

Architecture

Components

K8S control plane: a set of components that manage the state and configuration of a vanilla Kubernetes cluster.
Scheduling Framework: a cloud-native job scheduling system for batch, HPC, AI/ML, and similar applications in a Kubernetes cluster.
KWOK: Allows for the rapid setup of simulated Kubernetes clusters with minimal resource usage.
Knavigator: Facilitates communication with the Kubernetes cluster via the Kubernetes API, enabling task management and data retrieval.
Metrics & Dashboard: Gathers and processes metrics from the cluster, focusing on scheduling performance and resource utilization.
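Evaluating PromQL against the metrics stack typically goes through the Prometheus HTTP API. The sketch below builds an instant-query URL for `GET /api/v1/query` and extracts the first sample from a vector result. The Prometheus address in `main` and the example query (a kube-state-metrics metric) are assumptions about the deployment, not Knavigator's configuration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strconv"
)

// queryURL builds an instant-query URL for the Prometheus HTTP API.
func queryURL(base, promql string) string {
	return base + "/api/v1/query?" + url.Values{"query": {promql}}.Encode()
}

// queryResponse models only the fields needed for an instant vector result.
type queryResponse struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string `json:"resultType"`
		Result     []struct {
			Metric map[string]string `json:"metric"`
			Value  [2]interface{}    `json:"value"` // [unix timestamp, "sample value as a string"]
		} `json:"result"`
	} `json:"data"`
}

// firstSample returns the value of the first series in a vector result.
func firstSample(body []byte) (float64, error) {
	var r queryResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return 0, err
	}
	if r.Status != "success" || r.Data.ResultType != "vector" || len(r.Data.Result) == 0 {
		return 0, fmt.Errorf("unexpected result: status=%q type=%q", r.Status, r.Data.ResultType)
	}
	s, ok := r.Data.Result[0].Value[1].(string)
	if !ok {
		return 0, fmt.Errorf("sample value is not a string")
	}
	return strconv.ParseFloat(s, 64)
}

func main() {
	// Hypothetical Prometheus address and query.
	fmt.Println(queryURL("http://prometheus:9090", `sum(kube_pod_status_phase{phase="Running"})`))
	body := []byte(`{"status":"success","data":{"resultType":"vector","result":[{"metric":{},"value":[1700000000,"12"]}]}}`)
	v, err := firstSample(body)
	fmt.Println(v, err)
}
```

Prometheus returns sample values as strings to preserve precision, which is why the value is parsed with `strconv.ParseFloat` rather than decoded as a number.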

Workflow

Knavigator supports several deployment modes: it can run as a standalone tool, serve as an HTTP/gRPC server, or be integrated as a package or library within other systems.

In standalone mode, Knavigator is configured with a declarative YAML file in which users specify the sequence of tasks to execute. This mode suits isolated testing scenarios where Knavigator operates independently.
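A test file of this kind boils down to a named, ordered list of tasks. The sketch below decodes such a spec and validates it. Since JSON is valid YAML 1.2, it uses a JSON-form spec to stay within the Go standard library; a real loader would use a YAML library. The `TestSpec`/`TaskSpec` shapes, the field names, and task types such as `SubmitJob` are hypothetical and may not match Knavigator's actual schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TaskSpec is a hypothetical task entry: an ID, a task type, and free-form parameters.
type TaskSpec struct {
	ID     string                 `json:"id"`
	Type   string                 `json:"type"`
	Params map[string]interface{} `json:"params,omitempty"`
}

// TestSpec is a hypothetical test file: a name plus tasks in execution order.
type TestSpec struct {
	Name  string     `json:"name"`
	Tasks []TaskSpec `json:"tasks"`
}

// parseSpec decodes a spec and rejects empty task lists, missing fields, and duplicate IDs.
func parseSpec(data []byte) (TestSpec, error) {
	var spec TestSpec
	if err := json.Unmarshal(data, &spec); err != nil {
		return spec, err
	}
	if len(spec.Tasks) == 0 {
		return spec, fmt.Errorf("test %q has no tasks", spec.Name)
	}
	seen := map[string]bool{}
	for i, t := range spec.Tasks {
		if t.ID == "" || t.Type == "" {
			return spec, fmt.Errorf("task %d: id and type are required", i)
		}
		if seen[t.ID] {
			return spec, fmt.Errorf("duplicate task id %q", t.ID)
		}
		seen[t.ID] = true
	}
	return spec, nil
}

func main() {
	spec, err := parseSpec([]byte(`{"name":"k8s-job","tasks":[
		{"id":"submit","type":"SubmitJob","params":{"count":1}},
		{"id":"check","type":"CheckPods","params":{"status":"Running"}}]}`))
	if err != nil {
		panic(err)
	}
	for i, t := range spec.Tasks {
		fmt.Println(i, t.ID, t.Type)
	}
}
```

Keeping tasks in a list (rather than a map) is what preserves the execution order the file author wrote.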

Alternatively, in server or package configurations, Knavigator can receive a series of API calls to define the tasks to be performed. This mode facilitates integration with existing systems or frameworks, providing flexibility in how tasks are defined and managed.

Regardless of the configuration mode, Knavigator executes tasks sequentially, and each task depends on the successful completion of the preceding one. If any task fails, execution stops and the entire test is marked as failed. As a result, a passing test means that every step in the sequence succeeded.

Demo

Here's a demo showing how to install and configure Knavigator and run an example test that deploys a Kubernetes Job in a minikube cluster.
