(Part 2) Kafka Geo-Replication: Configuring Geo-Replication

I. Preface

II. Configuring Geo-Replication

2.4. Creating and Enabling Replication Flows

2.5. Configuring Replication Flows

2.6. Securing Replication Flows

2.7. Custom Naming of Replicated Topics in Target Clusters

2.8. Preventing Configuration Conflicts

2.9. Best Practice: Consume from Remote, Produce to Local

2.10. Example: Active/Passive High Availability Deployment

2.11. Example: Active/Active High Availability Deployment

2.12. Example: Multi-Cluster Geo-Replication

I. Preface

This post continues from the previous one, "(Part 1) Kafka Geo-Replication: Configuring Geo-Replication", so Section II picks up at subsection 2.4.

II. Configuring Geo-Replication

2.4. Creating and Enabling Replication Flows

To define a replication flow, you must first define the respective source and target Kafka clusters in the MirrorMaker configuration file.

clusters (required): comma-separated list of Kafka cluster "aliases"
{clusterAlias}.bootstrap.servers (required): connection information for the specific cluster; comma-separated list of "bootstrap" Kafka brokers

Example: Define two cluster aliases, primary and secondary, including their connection information.

clusters = primary, secondary
primary.bootstrap.servers = broker10-primary:9092,broker11-primary:9092
secondary.bootstrap.servers = broker5-secondary:9092,broker6-secondary:9092

Secondly, you must explicitly enable each replication flow you need with {source}->{target}.enabled = true. Remember that flows are directional: if you need two-way (bidirectional) replication, you must enable flows in both directions.

# Enable replication from primary to secondary
primary->secondary.enabled = true
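To make the two steps concrete, here is a minimal Python sketch that reads a configuration like the one above and lists the declared clusters and the enabled flows. It is only an illustration of the key format: the helper names parse_properties and enabled_flows are hypothetical, the parser handles just simple key = value lines, and this is not how MirrorMaker itself loads its configuration.

```python
def parse_properties(text):
    """Parse simple `key = value` lines, skipping blank lines and `#` comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def enabled_flows(props):
    """Return (source, target) pairs whose `{source}->{target}.enabled` is true."""
    flows = []
    suffix = ".enabled"
    for key, value in props.items():
        if "->" in key and key.endswith(suffix) and value.lower() == "true":
            source, target = key[: -len(suffix)].split("->", 1)
            flows.append((source, target))
    return flows


# Hypothetical configuration text, modeled on the example in this section.
CONFIG = """
clusters = primary, secondary
primary.bootstrap.servers = broker10-primary:9092
secondary.bootstrap.servers = broker5-secondary:9092

# Enable replication from primary to secondary
primary->secondary.enabled = true
"""

props = parse_properties(CONFIG)
clusters = [alias.strip() for alias in props["clusters"].split(",")]
flows = enabled_flows(props)

# Every flow endpoint should be one of the declared cluster aliases.
undeclared = [alias for flow in flows for alias in flow if alias not in clusters]

print(clusters)
print(flows)
```

Because flows are directional, a bidirectional setup would show up here as two separate pairs, one per direction.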

By default, a replication flow will replicate all but a few special topics and consumer groups from the source cluster to the target cluster, and automatically detect any newly created topics and groups. The names of replicated topics in the target cluster will be prefixed with the name of the source cluster (see section 2.7 below). For example, the topic foo in the source cluster us-west would be replicated to a topic named us-west.foo in the target cluster us-east.

The subsequent sections explain how to customize this basic setup according to your needs.

2.5. Configuring Replication Flows

The configuration of a replication flow is a combination of top-level default settings (e.g., topics), on top of which flow-specific settings, if any, are applied (e.g., us-west->us-east.topics). To change the top-level defaults, add the respective top-level setting to the MirrorMaker configuration file. To override the defaults for a specific replication flow only, use the syntax format {source}->{target}.{config.name}.

The most important settings are:

topics: list of topics or a regular expression that defines which topics in the source cluster to replicate (default: topics = .*)
topics.exclude: list of topics or a regular expression to subsequently exclude topics that were matched by the topics setting (default: topics.exclude = .*[\-\.]internal, .*\.replica, __.*)
groups: list of consumer groups or a regular expression that defines which consumer groups in the source cluster to replicate (default: groups = .*)
groups.exclude: list of consumer groups or a regular expression to subsequently exclude consumer groups that were matched by the groups setting (default: groups.exclude = console-consumer-.*, connect-.*, __.*)
{source}->{target}.enabled: set to true to enable the replication flow (default: false)

Example:

# Custom top-level defaults that apply to all replication flows
topics = .*
groups = consumer-group1, consumer-group2

# Don't forget to enable a flow!
us-west->us-east.enabled = true

# Custom settings for specific replication flows
us-west->us-east.topics = foo.*
us-west->us-east.groups = bar.*
us-west->us-east.emit.heartbeats = false
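The layering of top-level defaults and flow-specific overrides can be sketched in a few lines of Python. This is an illustrative model of the lookup order described in this section, not MirrorMaker's actual code; the function name effective_setting is hypothetical, and the dictionary simply mirrors the example configuration above.

```python
def effective_setting(props, source, target, name, default=None):
    """Resolve a setting for one flow: flow-specific key first, then top-level."""
    flow_key = f"{source}->{target}.{name}"
    if flow_key in props:
        return props[flow_key]
    return props.get(name, default)


# The example configuration from this section, as a plain dictionary.
props = {
    "topics": ".*",
    "groups": "consumer-group1, consumer-group2",
    "us-west->us-east.enabled": "true",
    "us-west->us-east.topics": "foo.*",
    "us-west->us-east.groups": "bar.*",
    "us-west->us-east.emit.heartbeats": "false",
}

# The us-west->us-east flow has its own topics setting, so the override wins.
west_to_east_topics = effective_setting(props, "us-west", "us-east", "topics")

# No override exists for the reverse direction, so the top-level default applies.
east_to_west_topics = effective_setting(props, "us-east", "us-west", "topics")

# A flow that was never enabled falls back to the documented default of false.
east_to_west_enabled = effective_setting(
    props, "us-east", "us-west", "enabled", default="false"
)

print(west_to_east_topics, east_to_west_topics, east_to_west_enabled)
```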

Additional configuration settings are supported; in most cases they can be left at their default values. See MirrorMaker Configs.
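As a rough illustration of how the topics and topics.exclude settings from this section interact, the following Python sketch applies the documented default patterns to a few made-up topic names. It rests on two assumptions that the text above does not spell out: that each comma-separated entry is treated as a regular expression that must match the whole topic name, and that Python's re module behaves like Java's regex engine for these particular patterns. The names compile_pattern_list and should_replicate are hypothetical.

```python
import re


def compile_pattern_list(value):
    """Turn a comma-separated list of names or regexes into one alternation."""
    parts = [part.strip() for part in value.split(",") if part.strip()]
    return re.compile("|".join(f"(?:{part})" for part in parts))


# Defaults as documented in this section.
TOPICS = compile_pattern_list(".*")
TOPICS_EXCLUDE = compile_pattern_list(r".*[\-\.]internal, .*\.replica, __.*")


def should_replicate(topic):
    """A topic is replicated if it matches `topics` and not `topics.exclude`."""
    included = TOPICS.fullmatch(topic) is not None
    excluded = TOPICS_EXCLUDE.fullmatch(topic) is not None
    return included and not excluded


# Hypothetical topic names used only for demonstration.
for name in ["foo", "__consumer_offsets", "app-internal", "orders.replica"]:
    print(name, should_replicate(name))
```

The same two-stage idea (include first, then exclude) applies to groups and groups.exclude.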

2.6. Securing Replication Flows

MirrorMaker supports the same security settings as Kafka Connect, so refer to the Kafka Connect security documentation for further information.

Example: Encrypt communication between MirrorMaker and the us-east cluster.

us-east.security.protocol=SSL
us-east.ssl.truststore.location=/path/to/truststore.jks
us-east.ssl.truststore.password=my-secret-password
us-east.ssl.keystore.location=/path/to/keystore.jks
us-east.ssl.keystore.password=my-secret-password
us-east.ssl.key.password=my-secret-password

2.7. Custom Naming of Replicated Topics in Target Clusters

Replicated topics in a target cluster (sometimes called remote topics) are renamed according to a replication policy. MirrorMaker uses this policy to ensure that events (also known as records or messages) from different clusters are not written to the same topic-partition. By default, as per DefaultReplicationPolicy, the names of replicated topics in the target clusters have the format {source}.{source_topic_name}:

us-west         us-east
=========       =================
                bar-topic
foo-topic  -->  us-west.foo-topic

You can customize the separator (default: .) with the replication.policy.separator setting:

# Defining a custom separator
us-west->us-east.replication.policy.separator = _
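The default naming rule is simple enough to express directly. The sketch below mirrors the {source}{separator}{source_topic_name} format described above; it is a Python illustration, not the Java DefaultReplicationPolicy class, and the function name remote_topic_name is hypothetical.

```python
def remote_topic_name(source_alias, topic, separator="."):
    """Name a replicated topic as {source}{separator}{source_topic_name}."""
    return f"{source_alias}{separator}{topic}"


# With the default separator, foo-topic from us-west becomes us-west.foo-topic.
default_name = remote_topic_name("us-west", "foo-topic")

# With replication.policy.separator = _ the same topic gets an underscore prefix.
custom_name = remote_topic_name("us-west", "foo-topic", separator="_")

print(default_name)
print(custom_name)
```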

If you need further control over how replicated topics are named, you can implement a custom ReplicationPolicy and override replication.policy.class (default is DefaultReplicationPolicy) in the MirrorMaker configuration.

2.8. Preventing Configuration Conflicts

MirrorMaker processes share configuration via their target Kafka clusters. This behavior may cause conflicts when configurations differ among MirrorMaker processes that operate against the same target cluster.

For example, the following two MirrorMaker processes would race with each other:

# Configuration of process 1
A->B.enabled = true
A->B.topics = foo

# Configuration of process 2
A->B.enabled = true
A->B.topics = bar

In this case, the two processes will share configuration via cluster B, which causes a conflict. Depending on which of the two processes is the elected "leader", the result will be that either the topic foo or the topic bar is replicated, but not both.

It is therefore important to keep the MirrorMaker configuration consistent across replication flows to the same target cluster. This can be achieved, for example, through automation tooling or by using a single, shared MirrorMaker configuration file for your entire organization.
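One way such automation tooling could look is a small check that compares the flow-scoped settings of two configurations before they are deployed. The Python sketch below is a hypothetical example of that idea, using the two racy process configurations from this section as input; find_conflicts and flow_settings are made-up helper names, not part of any Kafka tool.

```python
def flow_settings(props):
    """Keep only flow-scoped keys of the form {source}->{target}.{name}."""
    return {key: value for key, value in props.items() if "->" in key}


def find_conflicts(props_a, props_b):
    """Return flow-scoped keys that both configs define with different values."""
    a, b = flow_settings(props_a), flow_settings(props_b)
    shared_keys = sorted(a.keys() & b.keys())
    return {key: (a[key], b[key]) for key in shared_keys if a[key] != b[key]}


# The two conflicting process configurations from this section.
process_1 = {"A->B.enabled": "true", "A->B.topics": "foo"}
process_2 = {"A->B.enabled": "true", "A->B.topics": "bar"}

conflicts = find_conflicts(process_1, process_2)
print(conflicts)
```

An empty result would mean the two processes agree on every setting for the flows they have in common.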

2.9. Best Practice: Consume from Remote, Produce to Local

To minimize latency ("producer lag"), it is recommended to locate MirrorMaker processes as close as possible to their target clusters, i.e., the clusters that they produce data to. That's because Kafka producers typically struggle more with unreliable or high-latency network connections than Kafka consumers.

First DC          Second DC
==========        =========================
primary --------- MirrorMaker --> secondary
(remote)                           (local)

To run such a "consume from remote, produce to local" setup, run the MirrorMaker processes close to and preferably in the same location as the target clusters, and explicitly set these "local" clusters in the --clusters command line parameter (space-separated list of cluster aliases):

# Run in secondary's data center, reading from the remote `primary` cluster
$ ./bin/connect-mirror-maker.sh connect-mirror-maker.properties --clusters secondary

The --clusters secondary option tells the MirrorMaker process that the given cluster(s) are nearby, and prevents it from replicating data or sending configuration to clusters at other, remote locations.
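The effect of --clusters can be pictured as a filter on the enabled flows: a process only handles flows whose target is one of the listed local clusters. The Python sketch below is a simplified model of that behavior under this assumption, not MirrorMaker's implementation; flows_for_process is a hypothetical name.

```python
def flows_for_process(enabled_flows, local_clusters=None):
    """Select the flows a process handles: those targeting a local cluster.

    Without a --clusters list, the process handles every enabled flow.
    """
    if not local_clusters:
        return list(enabled_flows)
    return [(src, dst) for (src, dst) in enabled_flows if dst in local_clusters]


# Hypothetical bidirectional setup between two data centers.
flows = [("primary", "secondary"), ("secondary", "primary")]

# Equivalent of running with `--clusters secondary` in the second data center:
# the process consumes from the remote primary and produces to the local secondary.
handled = flows_for_process(flows, {"secondary"})
print(handled)
```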

2.10. Example: Active/Passive High Availability Deployment

The following example shows the basic settings to replicate topics from a primary to a secondary Kafka environment, but not from the secondary back to the primary. Please be aware that most production setups will need further configuration, such as security settings.

# Unidirectional flow (one-way) from primary to secondary cluster
clusters = primary, secondary
primary.bootstrap.servers = broker1-primary:9092
secondary.bootstrap.servers = broker2-secondary:9092

primary->secondary.enabled = true
secondary->primary.enabled = false

# Only replicate some topics. Keep this comment on its own line: in a
# properties file, a trailing "# ..." would become part of the value.
primary->secondary.topics = foo.*

2.11. Example: Active/Active High Availability Deployment

The following example shows the basic settings to replicate topics between two clusters in both directions. Please be aware that most production setups will need further configuration, such as security settings.

# Bidirectional flow (two-way) between us-west and us-east clusters
clusters = us-west, us-east
us-west.bootstrap.servers = broker1-west:9092,broker2-west:9092
us-east.bootstrap.servers = broker3-east:9092,broker4-east:9092

us-west->us-east.enabled = true
us-east->us-west.enabled = true

Note on preventing replication "loops" (where topics are first replicated from A to B, then the replicated topics are replicated yet again from B to A, and so forth): As long as you define the above flows in the same MirrorMaker configuration file, you do not need to explicitly add topics.exclude settings to prevent replication loops between the two clusters.
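The reason loops can be avoided follows from the naming scheme in section 2.7: a remote topic carries its origin cluster as a prefix, so a flow can recognize topics that originally came from its own target. The Python sketch below illustrates that idea in simplified form. It assumes the default "." separator and the default naming policy, and it is not a copy of MirrorMaker's internal check; is_cycle is a hypothetical name.

```python
def is_cycle(topic, target_alias, separator="."):
    """Report whether replicating `topic` to `target_alias` would loop back.

    Walks the chain of source prefixes ({source}.{source}.{topic}) and checks
    whether any of them is the target cluster itself.
    """
    while separator in topic:
        source, topic = topic.split(separator, 1)
        if source == target_alias:
            return True
    return False


# In us-east, the topic us-west.foo originated in us-west, so the
# us-east->us-west flow must skip it.
loops_back = is_cycle("us-west.foo", target_alias="us-west")

# A topic created locally in us-east has no prefix and is safe to replicate.
local_topic = is_cycle("foo", target_alias="us-west")

print(loops_back, local_topic)
```

Note that with this simplified prefix check, an ordinary topic whose name happens to begin with a cluster alias followed by the separator would also be treated as a remote topic.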

2.12. Example: Multi-Cluster Geo-Replication

Let's put all the information from the previous sections together in a larger example. Imagine there are three data centers (west, east, north), with two Kafka clusters in each data center (e.g., west-1, west-2). The example in this section shows how to configure MirrorMaker (1) for Active/Active replication within each data center, as well as (2) for Cross Data Center Replication (XDCR).

First, define the source and target clusters along with their replication flows in the configuration:

# Basic settings
clusters = west-1, west-2, east-1, east-2, north-1, north-2
west-1.bootstrap.servers = ...
west-2.bootstrap.servers = ...
east-1.bootstrap.servers = ...
east-2.bootstrap.servers = ...
north-1.bootstrap.servers = ...
north-2.bootstrap.servers = ...

# Replication flows for Active/Active in West DC
west-1->west-2.enabled = true
west-2->west-1.enabled = true

# Replication flows for Active/Active in East DC
east-1->east-2.enabled = true
east-2->east-1.enabled = true

# Replication flows for Active/Active in North DC
north-1->north-2.enabled = true
north-2->north-1.enabled = true

# Replication flows for XDCR via west-1, east-1, north-1
west-1->east-1.enabled  = true
west-1->north-1.enabled = true
east-1->west-1.enabled  = true
east-1->north-1.enabled = true
north-1->west-1.enabled = true
north-1->east-1.enabled = true
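Writing twelve flow lines by hand is error-prone, so a configuration like this is a natural candidate for generation. The following Python sketch reproduces the flow list above from the three data center names. It assumes the naming convention used in this example (two clusters per data center, with the "-1" cluster acting as the cross-data-center gateway); build_flows is a hypothetical helper, not part of Kafka.

```python
from itertools import permutations

DATA_CENTERS = ["west", "east", "north"]


def build_flows(data_centers):
    """Build Active/Active flows per data center plus XDCR flows via the -1 clusters."""
    flows = []
    for dc in data_centers:
        # Active/Active replication between the two clusters of one data center.
        flows.append((f"{dc}-1", f"{dc}-2"))
        flows.append((f"{dc}-2", f"{dc}-1"))
    # Cross data center replication: every ordered pair of gateway clusters.
    gateways = [f"{dc}-1" for dc in data_centers]
    flows.extend(permutations(gateways, 2))
    return flows


flows = build_flows(DATA_CENTERS)
lines = [f"{source}->{target}.enabled = true" for source, target in flows]
print("\n".join(lines))
```

Adding a fourth data center would then only require extending DATA_CENTERS, under the same naming assumption.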

Then, in each data center, launch one or more MirrorMaker processes as follows:

# In West DC:
$ ./bin/connect-mirror-maker.sh connect-mirror-maker.properties --clusters west-1 west-2

# In East DC:
$ ./bin/connect-mirror-maker.sh connect-mirror-maker.properties --clusters east-1 east-2

# In North DC:
$ ./bin/connect-mirror-maker.sh connect-mirror-maker.properties --clusters north-1 north-2

With this configuration, records produced to any cluster will be replicated within the data center, as well as across to other data centers. By providing the --clusters parameter, we ensure that each MirrorMaker process produces data to nearby clusters only.

Note: The --clusters parameter is, technically, not required here. MirrorMaker will work fine without it. However, throughput may suffer from "producer lag" between data centers, and you may incur unnecessary data transfer costs.