Setting Up a Kubernetes Cluster on Alibaba Cloud: A Detailed Guide with Best Practices

Kubernetes, a powerful open-source container orchestration platform, has changed how applications are deployed and managed. Alibaba Cloud, a leading cloud computing provider, offers a managed Kubernetes service called Container Service for Kubernetes (ACK) that simplifies creating and managing Kubernetes clusters.

In this blog post, we'll guide you through the steps of creating your first Kubernetes cluster on Alibaba Cloud, providing detailed explanations and best practices.


Cluster Configuration

Basic Cluster Information

  • Cluster Name: Naming your cluster is the first step. Choose a descriptive name that identifies the purpose of the cluster, such as production-cluster or dev-cluster.
  • Kubernetes Version: Selecting the appropriate Kubernetes version is critical for compatibility with your applications. It's recommended to choose the latest stable version unless you have specific dependencies on an older version.

Best Practices:

  • Prefer the latest stable Kubernetes version to take advantage of new features and security updates, unless a dependency ties you to an older version.
  • Use consistent naming conventions for your clusters to avoid confusion, especially in large-scale environments.

Networking

  • VPC and Subnets: The VPC (Virtual Private Cloud) is the backbone of your cluster's network architecture. Alibaba Cloud provides you with the option to create a new VPC or use an existing one. Ensure the VPC CIDR block does not overlap with other networks in your infrastructure.
  • vSwitch: You can select from 1 to 5 vSwitches within your VPC. Each vSwitch defines a subnet in one availability zone, so choosing vSwitches in different zones improves availability and fault tolerance for your applications.
  • Security Group: Security groups control the inbound and outbound traffic to your nodes. It is important to configure rules that allow necessary communication but block unauthorized access.
  • Access to API Server: Choose whether to expose the Kubernetes API server through an Elastic IP (EIP). This determines whether you can reach the Kubernetes API from outside the VPC.
  • Network Plug-in
    • Flannel: Flannel is a simple and easy-to-use network fabric for containers. It’s one of the oldest and most widely adopted solutions in the Kubernetes ecosystem, originally developed by CoreOS. Flannel lets containers on different hosts communicate with each other without complex configuration, making it a good choice for simple Kubernetes deployments.
    • Terway: Terway is a CNI (Container Network Interface) plugin developed by Alibaba Cloud specifically for its container services, including Kubernetes. It integrates closely with Alibaba Cloud’s networking capabilities and services, providing a native cloud experience.

Best Practices:

  • Choose non-overlapping CIDR blocks for VPCs, especially if you have multi-cloud or hybrid cloud setups.
  • Use fine-grained security group rules to limit unnecessary traffic and reduce the attack surface of your cluster.
  • Enable SNAT (Source NAT) to allow nodes and applications inside the cluster to access the Internet securely.
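
The non-overlap rule can be checked before you create anything in the console. Here is a minimal Python sketch using the standard ipaddress module. The names and CIDR values in the plan are made-up examples, not ACK defaults.

```python
import ipaddress
from itertools import combinations


def find_overlaps(cidrs):
    """Return every pair of named CIDR blocks that overlap."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]


# Hypothetical address plan; replace with your own ranges.
plan = {
    "vpc": "192.168.0.0/16",
    "pods": "172.20.0.0/16",
    "services": "172.21.0.0/20",
    "on_prem": "192.168.128.0/20",
}

print(find_overlaps(plan))  # [('vpc', 'on_prem')]
```

An empty list means the plan has no conflicts. In this example the on-premises range sits inside the VPC range, so one of the two would need to change.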

Advanced Options

  • Deletion Protection: Deletion protection ensures that the cluster cannot be deleted accidentally. It is highly recommended to enable this option, especially in production environments.
  • Resource Groups: Organizing your resources under a resource group helps in managing multiple resources logically, providing easier tracking and access control.
  • Time Zone: This setting defines the time zone for your cluster, which affects things like logging and scheduling within the cluster.
  • Cluster Domain: The default domain for internal communication in your cluster is cluster.local. If you require custom DNS settings, this can be adjusted here.

Best Practices:

  • Always enable deletion protection for production environments.
  • Use resource groups to group related resources together, enabling better management and access control across teams.
  • If you’re integrating with external DNS providers, plan your cluster domain structure to avoid conflicts.
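
To see why the cluster domain matters, recall that Kubernetes gives every Service a DNS name of the form service.namespace.svc.cluster-domain. The sketch below builds that name and runs a deliberately simple conflict check against external DNS zones. The check and the example zone names are illustrative assumptions, not an ACK feature.

```python
def service_fqdn(service, namespace="default", cluster_domain="cluster.local"):
    """Build the in-cluster DNS name Kubernetes assigns to a Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def overlaps_zone(cluster_domain, external_zones):
    """Return True if the cluster domain equals, contains, or is nested
    under any external DNS zone (a simple heuristic, not a full DNS check)."""
    domain = cluster_domain.rstrip(".").lower()
    for zone in external_zones:
        zone = zone.rstrip(".").lower()
        if domain == zone or domain.endswith("." + zone) or zone.endswith("." + domain):
            return True
    return False


print(service_fqdn("web", "shop"))                      # web.shop.svc.cluster.local
print(overlaps_zone("k8s.example.com", ["example.com"]))  # True
print(overlaps_zone("cluster.local", ["example.com"]))    # False
```

A custom cluster domain that overlaps a zone you already resolve elsewhere can lead to ambiguous lookups, which is why the default cluster.local is usually the safer choice.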

Node Pool Configuration

Node Pool Setup

  • Node Pool Name: Each node pool should have a clear name that reflects its role in the cluster, e.g., frontend-nodepool, backend-nodepool.
  • Container Runtime: containerd is the recommended container runtime for Kubernetes due to its lightweight nature and active support by the Kubernetes community.
  • Managed Node Pool: Enabling this option allows Alibaba Cloud to manage the node pool for you, including tasks such as auto-repair, auto-scaling, and patch management. This can greatly reduce the operational burden on your team.
    • Auto Recovery Rule: Automatically replaces unhealthy nodes to maintain the desired state and performance.
    • Auto Update Rule: Keeps the nodes updated with the latest patches and updates automatically.
    • Auto CVE Patching (OS): Automatically applies security patches to address Common Vulnerabilities and Exposures (CVE) in the operating system.
    • Maintenance Window: Allows you to schedule a maintenance window during which updates and patches can be applied without affecting the cluster's availability.

Best Practices:

  • Use multiple node pools for different workloads (e.g., separate pools for frontend and backend services) to optimize resource allocation.
  • Choose containerd as the container runtime for better performance and security.
  • Enable managed node pools to leverage Alibaba Cloud’s automated patching and auto-repair functionalities.
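
When picking a maintenance window, it helps to reason about when automated updates may run relative to your traffic. The following sketch tests whether a given moment falls inside a daily window. It only illustrates the concept: the function and its parameters are hypothetical and do not reflect how ACK evaluates windows internally.

```python
from datetime import datetime, time, timedelta


def in_maintenance_window(now, start, hours):
    """Return True if `now` falls inside a daily window that opens at
    `start` and lasts `hours` hours. The window may cross midnight."""
    opened = datetime.combine(now.date(), start)
    if opened > now:
        # Today's window has not opened yet; check the one from yesterday.
        opened -= timedelta(days=1)
    return now < opened + timedelta(hours=hours)


# Hypothetical window: opens at 23:00 and lasts three hours.
window_start = time(23, 0)
print(in_maintenance_window(datetime(2024, 1, 2, 1, 30), window_start, 3))  # True
print(in_maintenance_window(datetime(2024, 1, 2, 12, 0), window_start, 3))  # False
```

Running a check like this against your peak-traffic hours is a quick way to confirm the window does not collide with them.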

Instance Types and Disk Configuration

  • Instance Types: Selecting the right instance type (general-purpose, memory-optimized, compute-optimized) ensures your nodes are properly sized to handle your workloads.
  • Disk Options: Alibaba Cloud offers several storage types, including Enterprise SSD (ESSD) and standard SSD. ESSD is the better fit for high-performance workloads. Make sure the disk you choose delivers enough IOPS for your application’s needs.
  • Security Hardening: You can choose to enable OS-level security hardening to enhance the security posture of your node instances. This option is crucial for protecting your infrastructure against vulnerabilities and ensuring compliance with security standards.
  • Logon Type: This setting lets you choose between using a key pair or a password for instance access, which affects the security and management of access to the nodes.

Best Practices:

  • Select instance types based on workload characteristics (e.g., memory-intensive applications should use memory-optimized instances).
  • Use Enterprise SSD for high-throughput, low-latency workloads.
  • Ensure you provision enough disk space for both the system and data disks to avoid running out of storage.
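
One rough way to match a workload to an instance family is to look at how much memory it needs per vCPU. The sketch below encodes that idea. The thresholds (2 GiB and 4 GiB per vCPU) are illustrative assumptions, so check them against the instance families actually offered in your region.

```python
def suggest_family(vcpus, memory_gib):
    """Suggest an instance family from the memory-per-vCPU ratio.

    The thresholds are assumptions for illustration only.
    """
    ratio = memory_gib / vcpus
    if ratio <= 2:
        return "compute-optimized"
    if ratio <= 4:
        return "general-purpose"
    return "memory-optimized"


print(suggest_family(8, 16))  # compute-optimized
print(suggest_family(4, 16))  # general-purpose
print(suggest_family(4, 32))  # memory-optimized
```

Feed it the aggregate CPU and memory requests of the pods you plan to run in a node pool to get a first guess, then validate with real load tests.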

Scaling and Performance

  • System Disk:
    • Category: You can choose the type of disk, such as SSD or HDD, depending on performance needs. In this walkthrough, Enterprise SSD is selected.
    • Size: The size of the system disk can be chosen based on the expected storage needs of your applications running on the nodes.
    • IOPS: Indicates the input/output operations per second that the disk can handle, important for performance-sensitive applications.
  • Autoscaling: This allows the cluster to scale in or out based on load, ensuring optimal resource utilization without manual intervention.
  • Expected Nodes: Set the number of nodes you expect your workload to need. The node pool works to keep this many nodes available as load conditions vary.
  • First-Time Authorization: If you are a first-time user, click the provided link to authorize the roles that ACK needs before you continue.

Best Practices:

  • Always enable autoscaling for production environments to handle traffic spikes without manual intervention.
  • Set up alerts and monitoring to keep track of when autoscaling events occur to optimize cost and performance.
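
To build intuition for what autoscaling does, the core decision can be reduced to "how many nodes do the current resource requests need, within the limits I allow?". The sketch below shows that calculation for CPU only. It is a simplified illustration, not the algorithm ACK or the Kubernetes cluster autoscaler actually uses, and all parameter names are hypothetical.

```python
import math


def desired_nodes(requested_cpu, cpu_per_node, min_nodes, max_nodes):
    """Return the node count needed to fit the requested CPU,
    clamped to the configured minimum and maximum."""
    needed = math.ceil(requested_cpu / cpu_per_node)
    return max(min_nodes, min(needed, max_nodes))


print(desired_nodes(10.5, 4, min_nodes=2, max_nodes=10))  # 3
print(desired_nodes(1, 4, min_nodes=2, max_nodes=10))     # 2 (never below the minimum)
print(desired_nodes(100, 4, min_nodes=2, max_nodes=10))   # 10 (capped at the maximum)
```

The maximum is what protects your budget during a traffic spike, so set it deliberately rather than leaving it at a large default.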

Component Configuration

Ingress and Logging

  • Ingress Controller: Alibaba Cloud supports multiple ingress controllers such as ALB and Nginx. Choose based on your requirements; for example, ALB integrates directly with Alibaba Cloud services, while Nginx provides flexibility and customization.
  • Volume Plug-in: Select the Container Storage Interface (CSI) plug-in, which handles storage integration and management in Kubernetes environments.
  • Logging and Monitoring: Setting up logging (via Alibaba Cloud Log Service) and monitoring (via Prometheus) helps track cluster performance and troubleshoot issues efficiently.

Best Practices:

  • Use ALB Ingress for seamless integration with Alibaba Cloud’s services.
  • Enable logging and monitoring from the start to have a proactive approach to troubleshooting and performance monitoring.
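
Whichever controller you pick, applications select it through the ingressClassName field of a standard Kubernetes Ingress resource. The sketch below builds such a manifest as a Python dictionary and prints it as JSON, which kubectl accepts alongside YAML. The host, Service name, and the class names "nginx" and "alb" are assumptions, so confirm the real class names in your cluster with kubectl get ingressclass.

```python
import json


def build_ingress(name, host, service, port, ingress_class):
    """Return a networking.k8s.io/v1 Ingress manifest as a plain dict."""
    backend = {"service": {"name": service, "port": {"number": port}}}
    path = {"path": "/", "pathType": "Prefix", "backend": backend}
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name},
        "spec": {
            # Assumed class name; switch to "alb" if you chose ALB Ingress.
            "ingressClassName": ingress_class,
            "rules": [{"host": host, "http": {"paths": [path]}}],
        },
    }


# Hypothetical application values.
manifest = build_ingress("web", "shop.example.com", "web-svc", 80, "nginx")
print(json.dumps(manifest, indent=2))
```

Saving the output to a file and running kubectl apply -f on it would route traffic for the host to the named Service, provided the Service exists in the same namespace.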

Security and Monitoring

  • Cost Suite: Activates cost management insights, allowing you to monitor and analyze resource usage and expenditures across the Kubernetes cluster, namespaces, node pools, and workloads.
  • Log Service:
    • Enables centralized log management for the cluster.
    • Provides options to create or select a log service project for cluster logging.
    • Includes features like the Ingress Dashboard for access log analysis and monitoring, and the installation of a node-problem-detector for enhanced operational visibility and alert management.
  • Alerts: Use the default alert rules or define custom ones to manage and respond to events within the cluster.
  • Log Collection for Control Plane Components: When enabled, logs from critical Kubernetes control plane components (apiserver, controller-manager, scheduler) are collected, helping with deeper operational insights and troubleshooting.
  • Cluster Inspection: When enabled, this feature automatically scans the cluster for security vulnerabilities and operational risks and suggests mitigations to maintain the cluster’s health and security. It is especially important for production clusters.
  • Prometheus Monitoring: Prometheus provides in-depth monitoring of your Kubernetes environment, tracking key metrics for performance and health.

Best Practices:

  • Regularly run cluster inspections to detect security vulnerabilities and mitigate risks.
  • Use Prometheus in conjunction with Grafana for advanced dashboards and visualizations to monitor cluster health.
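
Once Prometheus is collecting metrics, dashboards and scripts read them through its HTTP API, where instant queries go to the /api/v1/query endpoint. The sketch below only builds the request URL so it runs without a live server. The base URL is a placeholder, and the example query assumes kube-state-metrics is being scraped in your cluster.

```python
from urllib.parse import urlencode


def instant_query_url(base_url, promql):
    """Build a URL for the Prometheus instant-query endpoint."""
    params = urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + params


# Placeholder endpoint; use the address of your own Prometheus instance.
base = "http://prometheus.example:9090"

# Assumes kube-state-metrics exposes node conditions.
ready_nodes = 'sum(kube_node_status_condition{condition="Ready",status="true"})'

print(instant_query_url(base, ready_nodes))
print(instant_query_url(base, "up == 0"))
```

Fetching the resulting URL with any HTTP client returns a JSON document, and the same query strings can be pasted directly into a Grafana panel.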

Confirm Order and Pre-Checks

Dependency and Pre-Launch Checks

  • Final Authorization and Service Checks: Before launching the cluster, Alibaba Cloud performs a series of checks to ensure all required roles, services, and permissions are in place. Pay attention to failed checks and resolve them before proceeding to avoid potential issues in production.
  • Services and Role Authorization: For features like managed node pools, logging, and Prometheus, you need to assign the relevant roles and ensure services like Log Service and Apsara NAS are activated.

Best Practices:

  • Carefully review all failed checks before proceeding. Activate services like Log Service and Prometheus to monitor the cluster.
  • Ensure that all role authorizations (e.g., AliyunCISDefaultRole, AliyunCSManagedNlcRole) are correctly assigned to avoid interruptions in cluster management.

Conclusion

By following these steps and incorporating best practices, you can successfully deploy your first Kubernetes cluster on Alibaba Cloud. Kubernetes provides a powerful platform for running containerized applications, and Alibaba Cloud's managed Kubernetes service simplifies the process. With proper planning and configuration, you can use Kubernetes to build scalable and reliable applications.