Exposing Kafka through different AWS VPCs

Apache Kafka is a distributed streaming platform that gives companies and organizations a central place where data from their different services can be streamed. It is a resilient distributed platform that protects against data loss by replicating data across the Kafka cluster.

Kafka implements the Publish-Subscribe architecture model. This architecture solves a common problem in organizations: services interconnected point-to-point all over the organization, commonly known as a Spaghetti Architecture.

Without going into too much detail about how Kafka works and how data is partitioned across the Kafka brokers, this is a simple diagram that shows what Kafka does and what the Publish-Subscribe model is:
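To make the model concrete in code, here is a toy Publish-Subscribe broker in Python. It is only an illustration of the pattern, not of Kafka itself: real Kafka adds topics split into partitions, consumer groups, offsets, and replication.

```python
from collections import defaultdict

class Broker:
    """Toy publish-subscribe broker: producers publish records to a topic,
    and every subscriber of that topic receives them. Purely illustrative;
    no partitioning, persistence, or replication as in real Kafka."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, consumer):
        # Register a callback that will receive records for this topic.
        self.subscribers[topic].append(consumer)

    def publish(self, topic, record):
        # Deliver the record to every subscriber of the topic.
        for consumer in self.subscribers[topic]:
            consumer(record)
```

The key property this sketch captures is decoupling: producers only know the topic name, never which consumers (if any) are listening.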

The main objective of this post is to show how a Kafka cluster deployed in AWS can be migrated from a single-VPC to a multi-VPC architecture.

The main reason to choose this kind of architecture over a single-VPC one is flexibility: the producers can be deployed in different VPCs from different AWS accounts, while the Kafka cluster is deployed in its own VPC.

The communication between AWS accounts is achieved with VPC endpoints. A VPC Endpoint Service is created in the Kafka pipeline VPC, routing the traffic to an NLB that fronts the Kafka brokers. The producer side communicates with the VPC Endpoint Service through a VPC Endpoint.

Structuring it a bit, the work to be done can be summarized as:

  1. Kafka pipeline VPC
    • Deploy the NLB and the VPC Endpoint Service.
    • Configure the Kafka instances to add the VPC Endpoint Service endpoints as advertised listeners.
  2. Producer VPC
    • Deploy the VPC Endpoint and add a Route 53 record.

Implementation

Network Load Balancer and Target Groups

It is important to notice that there is a one-to-one relationship between each Kafka broker (EC2 instance) and its target group.

The first component required is an NLB, and creating one is quite simple:

NetworkLoadBalancer:
  Type: AWS::ElasticLoadBalancingV2::LoadBalancer
  Properties:
    Name: !Sub ${Function}-${Environment}
    Type: network
    Scheme: internal
    Subnets: !Ref NLBSubnets
    LoadBalancerAttributes:
      - Key: load_balancing.cross_zone.enabled
        Value: "true"

It is important to set the load_balancing.cross_zone.enabled attribute to true. It allows the NLB to route traffic to targets in all Availability Zones. Our Kafka cluster CF template deploys, with some logic, the different Kafka brokers in different AZs to comply with high-availability requirements in case of an AZ failure. By default this attribute is false, so the NLB only routes data to targets in the same AZ. For more information about this attribute you can visit NLB cross-zone load balancing.

The NLB contains listeners and each listener has an associated target group.

NLBListener:
  Type: AWS::ElasticLoadBalancingV2::Listener
  DependsOn:
    - NLBTargetGroup
  Properties:
    LoadBalancerArn: !Ref LoadBalancerArn
    Port: !Ref NLBPort
    Protocol: TCP
    DefaultActions:
      - Type: forward
        TargetGroupArn: !Ref NLBTargetGroup
NLBTargetGroup:
  Type: AWS::ElasticLoadBalancingV2::TargetGroup
  Properties:
    Name: !Sub ${Function}-${Environment}-T${BrokerId}${VPCEndpointCount}
    VpcId: !Ref VpcId
    Port: !Ref NLBPort
    Protocol: TCP
    HealthCheckIntervalSeconds: 10
    HealthCheckPort: !Ref TargetPort
    HealthCheckProtocol: TCP
    HealthyThresholdCount: 3
    UnhealthyThresholdCount: 3
    Targets:
      - Id:
          Ref: InstanceId
        Port: !Ref TargetPort

There are two ports to take into consideration here:

  • NLB port
    • This is the port used by the clients; the “public” port.
    • These ports start at 2000: for example, 2000, 2001, 2002…
  • Target port
    • The port that the Kafka broker instances listen on.

If we have a 3-node Kafka cluster, we need 3 TargetGroup-Listener pairs on the NLB.
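As a quick sanity check, the port-numbering scheme can be expressed in a few lines of Python (the helper names are made up, not part of the stack): the NLB port is a leading digit followed by the broker id zero-padded to three digits, and the target port appends the same digit to 909.

```python
def nlb_port(first_digit: int, broker_id: int) -> int:
    # "Public" port exposed on the NLB: a leading digit followed by the
    # broker id zero-padded to three digits (e.g. digit 2, broker 1 -> 2001).
    if not 0 <= broker_id < 1000:
        raise ValueError("broker id out of range for this scheme")
    return int(f"{first_digit}{broker_id:03d}")

def target_port(first_digit: int) -> int:
    # Port the Kafka broker instance actually listens on (e.g. 9092).
    return int(f"909{first_digit}")
```

This mirrors the padding expression used in the Jinja2-processed template shown later in this section.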

This logic, although it seems simple, cannot be implemented in plain CloudFormation. CloudFormation is a templating language, but it does not support this kind of loop. To keep using CloudFormation and implement this requirement, the only viable way we found was to preprocess the templates with Jinja2 code.


{% for brokerId in range(env['NUMBER_OF_BROKERS'] | int) %}
   # Here the EC2 instance component would be defined.

  NLBTarget{{ iter }}Broker{{brokerId}}:
    DependsOn:
      - Kafka{{ brokerId }}
      - NetworkLoadBalancer
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: !Sub https://s3-${AWS::Region}.amaz...
      Parameters:
        Environment: !Ref Environment
        Function: !Ref Function
        VpcId:
          "Fn::ImportValue": !Ref VpcId
        BrokerId: {{ brokerId }}
        NLBPort: {{ first_digit_port }}{{ '00' if ((brokerId | int)  <10 ) else '0' if ((brokerId | int)  <100 ) else '' if ((brokerId | int)  < 1000 )  else '-1'}}{{ brokerId }}
        TargetPort: 909{{ first_digit_port }}
        LoadBalancerArn: !GetAtt NetworkLoadBalancer.Outputs.NLB
        InstanceId: !GetAtt Kafka{{ brokerId }}.Outputs.EC2Instance
        VPCEndpointCount: {{ iter }}
  {% endfor %}
  

With this code we can dynamically create the number of target groups required by the NLB. You do not need to study the code in detail; just notice that there is a loop iterating over the number of brokers, which creates the TargetGroup-Listener pairs per Kafka broker per client, as shown above.

VPC Endpoint Service

The VPC Endpoint communication is based on two components: the VPC Endpoint Service, defined in the target VPC, and the VPC Endpoint, defined on the client side of the communication. In our case the VPC Endpoint Service is deployed in the Kafka pipeline VPC and the VPC Endpoint is deployed in the Producers VPC.

This is the CF template associated to the VPC Endpoint Service:

  VPCEndpoint:
    Type: AWS::EC2::VPCEndpointService
    DependsOn:
      - NetworkLoadBalancer
    Properties:
      NetworkLoadBalancerArns:
        - !GetAtt NetworkLoadBalancer.Outputs.NLB

This is the CF template associated to the VPC Endpoint:

  VPCEndpoint:
    Type: AWS::EC2::VPCEndpoint
    DependsOn:
      - SecurityGroup
    Properties:
      ServiceName: !Ref ServiceName
      VpcEndpointType: Interface
      VpcId: !Ref VpcId
      SubnetIds: !Ref SubnetIds
      SecurityGroupIds:
        - !Ref SecurityGroup

Tuning Kafka to listen for VPC Endpoint clients

The server.properties file is the main configuration file of a Kafka broker.

The properties we had to change to adapt our existing brokers to the new architecture are:

advertised.listeners=INTERNAL_PLAINTEXT://:9092,CLIENT://aonxdwkafka.prod.aws.ean:20
listeners=INTERNAL_PLAINTEXT://:9092,CLIENT://:9093
inter.broker.listener.name=INTERNAL_PLAINTEXT
listener.security.protocol.map=CLIENT:SSL,INTERNAL_PLAINTEXT:PLAINTEXT

Let's explain these properties.

The advertised listeners are the endpoints that clients are given to connect to Kafka. One listener accepts internal communication within the Kafka VPC, and another advertised listener is for the external clients.

The listeners property specifies the different listeners used by the broker and the internal ports they bind to. Inter-broker communication uses port 9092; for the clients, the target port is 9093.

The inter.broker.listener.name property specifies which listener is used for inter-broker communication. Obviously it would make no sense to use the client listener for that.

The listener.security.protocol.map specifies the security protocol used by each listener; here the values are SSL and PLAINTEXT. If this property is not specified, Kafka's default maps each listener name to the security protocol of the same name, so custom listener names like CLIENT require an explicit mapping.
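To make the relationship between these properties concrete, here is a small, hypothetical Python helper (not part of Kafka) that parses the listener properties above and pairs each listener name with its port and security protocol:

```python
def parse_listeners(props: dict) -> dict:
    """Map each listener name in `listeners` to its bound port and the
    security protocol declared for it in listener.security.protocol.map."""
    # "CLIENT:SSL,INTERNAL_PLAINTEXT:PLAINTEXT" -> {"CLIENT": "SSL", ...}
    proto_map = dict(
        entry.split(":", 1)
        for entry in props["listener.security.protocol.map"].split(",")
    )
    result = {}
    # "INTERNAL_PLAINTEXT://:9092,CLIENT://:9093"
    for listener in props["listeners"].split(","):
        name, _, hostport = listener.partition("://")
        result[name] = {
            "port": int(hostport.rsplit(":", 1)[1]),
            "protocol": proto_map[name],  # KeyError here means a listener
        }                                 # is missing from the protocol map
    return result
```

Running it against the values from this post's server.properties shows the client listener on 9093 with SSL and the inter-broker listener on 9092 with PLAINTEXT.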

Conclusion

As you can see, creating a dynamic and scalable Kafka cluster that can be reached from outside its VPC is not simple and can feel a bit over-engineered. But at the same time it is fun, and it is doable, just like everything in life.

With this implementation it is now possible to send data from one account to a Kafka cluster placed in another account. In the next chapter we will see how to make this communication secure using SSL.