GKE gRPC Ingress Load Balancing

A sample showing gRPC load balancing via Ingress on Google Kubernetes Engine (GKE). Each RPC is load balanced across the backend pods!

In this mode, one gRPC connection sends 10 RPC messages. The GKE Ingress L7 load balancer terminates that SSL connection and then sends each RPC on to one of the pods behind the Ingress Service. Because the individual RPCs go to different endpoints, the load is distributed more evenly across all pods.

On the other hand, if you set up a GKE cluster and expose it externally through an L4 load balancer (a Service with type: LoadBalancer), each client's gRPC connection lands on ONE pod. If that client sends the same 10 RPCs over that connection, one unlucky pod handles all of them and the load is imbalanced.
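The difference between the two modes can be modeled in a few lines of Python. This is a toy simulation, not GKE code: the pod names are hypothetical, and the L7 choice is assumed to be plain round robin, whereas the real Google Cloud load balancer applies its own balancing logic.

```python
from collections import Counter
from itertools import cycle
from typing import Sequence

# Hypothetical pod names, for illustration only.
PODS = ["fe-pod-a", "fe-pod-b", "fe-pod-c"]


def l4_balance(num_rpcs: int, pods: Sequence[str], conn_pick: int = 0) -> Counter:
    """Per-connection (L4): a pod is chosen once, when the TCP connection opens,
    and every RPC multiplexed on that connection goes to that same pod."""
    pinned = pods[conn_pick % len(pods)]
    return Counter(pinned for _ in range(num_rpcs))


def l7_balance(num_rpcs: int, pods: Sequence[str]) -> Counter:
    """Per-RPC (L7): the proxy terminates the client connection and picks a pod
    for each RPC separately. Modeled here as simple round robin (an assumption)."""
    chooser = cycle(pods)
    return Counter(next(chooser) for _ in range(num_rpcs))


if __name__ == "__main__":
    print("L4:", dict(l4_balance(10, PODS)))
    print("L7:", dict(l7_balance(10, PODS)))
```

Running it prints L4: {'fe-pod-a': 10} and L7: {'fe-pod-a': 4, 'fe-pod-b': 3, 'fe-pod-c': 3}: the same 10 RPCs over one connection either pile onto a single pod or are spread across all three.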

The setting that enables this is an annotation (cloud.google.com/app-protocols) on the Kubernetes Service behind the GKE Ingress, which declares the protocol each backend port uses. In this case, the gRPC backend port is named ‘fe’ and uses the HTTP2 protocol. With that annotation, the load balancer speaks HTTP/2 to the backends, reads the inbound HTTP/2 connection as individual requests (each gRPC call is its own HTTP/2 stream), and sends each RPC to any member pod of that service.
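The annotation GKE reads for this, cloud.google.com/app-protocols, sits on the Service and maps a Service port name to a protocol. The sketch below builds such a Service as a Python dict and prints it as JSON (kubectl accepts JSON manifests). Only the annotation key and its {"<port name>": "HTTP2"} value format come from GKE; the Service name, selector label, and port numbers are hypothetical and may not match the sample's fe-srv-ingress.yaml.

```python
import json


def grpc_service(name: str, port_name: str, port: int, target_port: int,
                 app_label: str) -> dict:
    """Build a Service whose named port is declared to GKE as HTTP2."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            "annotations": {
                # Value is a JSON object keyed by the Service port *name*.
                "cloud.google.com/app-protocols": json.dumps({port_name: "HTTP2"}),
            },
        },
        "spec": {
            "type": "NodePort",  # assumption: a NodePort Service as the Ingress backend
            "selector": {"app": app_label},  # hypothetical pod label
            "ports": [{"name": port_name, "port": port,
                       "targetPort": target_port, "protocol": "TCP"}],
        },
    }


if __name__ == "__main__":
    # Hypothetical values; pipe the output to: kubectl apply -f -
    print(json.dumps(grpc_service("fe-srv-ingress", "fe", 50051, 50051, "fe"), indent=2))
```

The key point is that the annotation refers to the port by its name, so the port's "name" field and the key inside the annotation must match.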

Source

You can find the full source here:

Setup

If you want to try it out, first set up a global static IP:

gcloud compute addresses create gke-ingress --global
gcloud compute addresses list
NAME         REGION  ADDRESS        STATUS
gke-ingress          35.241.41.138  RESERVED

The static address gke-ingress is referenced by name later in the GKE Ingress file.
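A GKE Ingress picks up a reserved global address through the kubernetes.io/ingress.global-static-ip-name annotation, which takes the address name (gke-ingress), not the IP itself. The sketch below builds a networking.k8s.io/v1 Ingress as a Python dict; the Ingress name, backend Service name, port, and TLS secret name are hypothetical and may differ from the sample's fe-ingress.yaml.

```python
import json


def grpc_ingress(name: str, static_ip_name: str, service: str, port: int,
                 tls_secret: str) -> dict:
    """Build an Ingress bound to a reserved global static IP by name."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {
            "name": name,
            "annotations": {
                # Name of the address created with `gcloud compute addresses create`.
                "kubernetes.io/ingress.global-static-ip-name": static_ip_name,
            },
        },
        "spec": {
            # Hypothetical TLS secret holding the certificate the LB presents to clients.
            "tls": [{"secretName": tls_secret}],
            # Send all traffic to the gRPC Service; no path rules needed here.
            "defaultBackend": {
                "service": {"name": service, "port": {"number": port}},
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical values; pipe the output to: kubectl apply -f -
    print(json.dumps(grpc_ingress("fe-ingress", "gke-ingress", "fe-srv-ingress",
                                  50051, "fe-secret"), indent=2))
```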

Now set up a GKE cluster:

gcloud container clusters create cluster-grpc \
--zone us-central1-a --num-nodes 3

Set up a firewall rule that allows direct access to the gRPC server through the Network LB (this is only to demonstrate the difference between the two types of load balancers):

gcloud compute firewall-rules create grpc-nlb-firewall \
--allow tcp:50051

Deploy the app:

kubectl apply -f fe-deployment.yaml \
-f fe-ingress.yaml \
-f fe-secret.yaml \
-f fe-srv-ingress.yaml \
-f fe-srv-lb.yaml

Wait for the Ingress object to be assigned an IP and for the LB to be provisioned (yes, this can take up to 10 minutes).
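Rather than checking by hand, the wait can be scripted. This is a hypothetical helper, not part of the sample: it assumes kubectl is on the PATH and pointed at the cluster, and that you pass the Ingress name from fe-ingress.yaml. It polls kubectl get ingress -o json and reads status.loadBalancer.ingress[].ip.

```python
import json
import subprocess
import time
from typing import Optional


def ingress_ip(ingress: dict) -> Optional[str]:
    """Return the first IP in status.loadBalancer.ingress, or None if not assigned yet."""
    entries = ingress.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    for entry in entries:
        if entry.get("ip"):
            return entry["ip"]
    return None


def wait_for_ingress_ip(name: str, timeout_s: int = 900, poll_s: int = 30) -> str:
    """Poll `kubectl get ingress NAME -o json` until an IP shows up or time runs out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        out = subprocess.run(
            ["kubectl", "get", "ingress", name, "-o", "json"],
            check=True, capture_output=True, text=True,
        ).stdout
        ip = ingress_ip(json.loads(out))
        if ip:
            return ip
        time.sleep(poll_s)
    raise TimeoutError(f"Ingress {name} has no IP after {timeout_s}s")
```

For example, wait_for_ingress_ip("fe-ingress") with a hypothetical Ingress name. An IP appearing may not mean the backends are healthy yet, so the first few calls through the Ingress may still fail until the load balancer's health checks pass.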

In this example, the L4 LoadBalancer and Ingress IPs are:

  • LB: 104.155.151.124
  • Ingress: 35.241.41.138

Make gRPC calls via the LB:

Now make gRPC API calls directly to the L4 LB. Each response shows the name of the pod that handled the request. Note that all 10 requests from the client are handled by one pod: this is imbalanced load.

Now connect through the Ingress L7 LB.

Each response is from a different pod!!! :)

So now any client connecting via the Ingress has its RPCs balanced across all pods. There are several other modes for gRPC on GKE and GCP; below are some additional blog posts in this series.

Other References