Flume Performance Tuning in Practice

Posted on March 26, 2018 by yuer

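My company spent quite a while going back and forth on rolling out Flume for log collection, so here is a short record of the core ideas behind the performance tuning.

In the initial configuration, every batch size and transaction size was 1000, and the channel capacity was 10000.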

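Version 1

I started by load testing with a Memory Channel: a Taildir source collects incremental log data, a Memory Channel buffers it, and a Kafka Sink sends it out.

The bottleneck here is the Kafka Sink. Because the Kafka Sink sends synchronously on a single thread, network latency caps the throughput, which topped out at a little over 10 MB/s.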
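A rough way to see why latency sets the ceiling: if each batch has to be acknowledged before the next one goes out, throughput is simply batch size divided by round-trip time. The sketch below is a back-of-the-envelope model only. The event size and round-trip time are assumed values chosen for illustration, not measurements from this test.

```java
// Back-of-the-envelope model of a single-threaded, synchronous sender.
// All numbers used in main() are illustrative assumptions.
final class SyncSinkCeiling {

    /** Upper bound in MB/s when every batch must be acknowledged before the next is sent. */
    static double ceilingMBps(int batchEvents, int avgEventBytes, double roundTripMillis) {
        double batchMB = (double) batchEvents * avgEventBytes / (1024.0 * 1024.0);
        double batchesPerSecond = 1000.0 / roundTripMillis;
        return batchMB * batchesPerSecond;
    }

    public static void main(String[] args) {
        // Assumed: 1000 events per batch, 500 bytes per event, 40 ms per send-and-ack cycle.
        System.out.printf("ceiling = %.1f MB/s%n", ceilingMBps(1000, 500, 40.0));
    }
}
```

Under this model, a faster disk or more CPU changes nothing. Only a larger batch, a shorter round trip, or more senders working in parallel raises the ceiling.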

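Version 2

After going through the official documentation, I tried a sink group so that several Kafka sinks could send at the same time. Throughput stayed at a little over 10 MB/s.

Looking into how it works, a sink group is still driven by a single thread. It is merely a proxy in front of several Kafka sinks and only provides round-robin load balancing.

When each individual Kafka sink has high send latency, rotating among them on one thread achieves nothing.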
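The difference between one thread rotating over several sinks and one thread per sink can be sketched with a simple timing model. This is not Flume code. The class and method names are made up for illustration, and it assumes every send blocks for the same fixed time.

```java
// Timing model only: compares one runner thread rotating over N sinks
// with one runner thread per sink. Names are hypothetical.
final class SinkGroupModel {

    /** One runner thread rotating over several sinks: sends are still serialized. */
    static long singleRunnerMillis(int batches, int sinks, long sendMillis) {
        long elapsed = 0;
        int[] sentPerSink = new int[sinks];
        for (int i = 0; i < batches; i++) {
            sentPerSink[i % sinks]++; // round-robin pick; it spreads load but not time
            elapsed += sendMillis;    // the single thread blocks on every send
        }
        return elapsed;
    }

    /** One runner thread per sink: sends overlap, so wall time divides by the sink count. */
    static long runnerPerSinkMillis(int batches, int sinks, long sendMillis) {
        long batchesPerSink = (batches + sinks - 1) / sinks; // ceiling division
        return batchesPerSink * sendMillis;
    }

    public static void main(String[] args) {
        System.out.println(singleRunnerMillis(300, 3, 40));
        System.out.println(runnerPerSinkMillis(300, 3, 40));
    }
}
```

In the single-runner case the sink count drops out of the result entirely, which matches what the sink group test showed.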

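Version 3

So I looked for a way to run several Kafka Sinks on multiple threads. I kept a single Memory Channel and attached 3 Kafka Sinks to it directly. Throughput rose to around 30 MB/s, but it was very unstable and jumped up and down.

At this point the Memory Channel's fill rate was above 90%, so the pipeline was probably stalling because the channel kept filling up. After raising the Memory Channel capacity to 100,000 and the batch size and transaction size to 10,000, throughput rose to 60-80 MB/s or more with a fill rate below 10%, which met our requirements.

With transaction size = 1000 the Memory Channel filled up, while with transaction size = 10,000 it did not. Larger batches per channel transaction mean fewer channel accesses, so what this really relieved was lock contention on the Memory Channel.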
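To make the "fewer channel accesses" point concrete, here is a small stand-in for a lock-guarded queue that counts how often the lock is taken while a fixed number of events passes through. It is not the Memory Channel implementation. The class is hypothetical and only illustrates how the batch size scales the number of lock acquisitions.

```java
import java.util.ArrayDeque;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a lock-guarded channel; counts lock acquisitions.
final class BatchedQueue {
    private final ArrayDeque<Integer> queue = new ArrayDeque<>();
    private final ReentrantLock lock = new ReentrantLock();
    private long lockAcquisitions = 0;

    /** Adds a whole batch under a single lock acquisition. */
    void putBatch(int firstEvent, int count) {
        lock.lock();
        try {
            lockAcquisitions++;
            for (int i = 0; i < count; i++) {
                queue.add(firstEvent + i);
            }
        } finally {
            lock.unlock();
        }
    }

    /** Removes up to max events under a single lock acquisition. */
    int takeBatch(int max) {
        lock.lock();
        try {
            lockAcquisitions++;
            int taken = 0;
            while (taken < max && queue.poll() != null) {
                taken++;
            }
            return taken;
        } finally {
            lock.unlock();
        }
    }

    /** Moves totalEvents through the queue and reports how often the lock was taken. */
    static long acquisitionsFor(int totalEvents, int batchSize) {
        BatchedQueue q = new BatchedQueue();
        for (int sent = 0; sent < totalEvents; sent += batchSize) {
            int n = Math.min(batchSize, totalEvents - sent);
            q.putBatch(sent, n);
            q.takeBatch(n);
        }
        return q.lockAcquisitions;
    }

    public static void main(String[] args) {
        System.out.println(acquisitionsFor(100_000, 1_000));
        System.out.println(acquisitionsFor(100_000, 10_000));
    }
}
```

Going from batches of 1000 to batches of 10,000 cuts the lock acquisitions by a factor of ten for the same number of events, which is the effect described above.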

This tuning also has a cost: a larger Memory Channel capacity means a larger risk of data loss, because everything buffered in the Memory Channel is lost if the process crashes.

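Version 4

Rotate across multiple Memory Channels, each consumed by its own Kafka Sink.

This serves 2 purposes:

1. Instead of several sinks competing for 1 channel, each sink consumes its own channel, which removes the lock bottleneck.
2. With less lock contention, the channel capacity can stay small to protect data reliability. For example, with a capacity of 10,000 per channel, 3 channels lose at most 30,000 events, which is still better than "Version 3".

This requires writing a custom channel selector plugin that distributes the source's traffic round-robin across the channels; see my earlier blog post on that.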
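The plugin itself is not reproduced in this post, so the following is only a minimal sketch of the rotation logic, written without any Flume dependency so that it runs on its own. The class name is hypothetical. In a real plugin this logic would presumably sit behind Flume's `ChannelSelector` interface and return the chosen channel from `getRequiredChannels(Event)`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin selector; C stands in for the channel type.
final class RoundRobinSelector<C> {
    private final List<C> channels;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinSelector(List<C> channels) {
        if (channels.isEmpty()) {
            throw new IllegalArgumentException("at least one channel is required");
        }
        this.channels = new ArrayList<>(channels); // defensive copy
    }

    /** Returns the single channel that should receive the next event or batch. */
    List<C> select() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), channels.size());
        return Collections.singletonList(channels.get(index));
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        names.add("c1");
        names.add("c2");
        names.add("c3");
        RoundRobinSelector<String> selector = new RoundRobinSelector<>(names);
        for (int i = 0; i < 4; i++) {
            System.out.println(selector.select());
        }
    }
}
```

Returning exactly one channel per call is what separates this from a replicating selector, which would hand every event to all channels.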

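Version 5

A colleague asked for the File Channel, to protect the data sitting in the queue. In testing, however, throughput only reached about 10 MB/s, and none of the optimizations above helped.

Switching to SSDs brought no improvement either, and the File Channel's own fill rate was very low.

My suspicion is that the bottleneck is the File Channel itself: its transaction commits are too slow and block the source's puts. Adding more channels does not help, because the source is single-threaded, so rotating across several File Channels is no faster than writing to one. The sinks downstream are then starved of data and throughput cannot rise.

In the FileChannel code, all of the disk read/write paths are synchronized: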

    synchronized FlumeEventPointer put(ByteBuffer buffer) throws IOException {
      if (encryptor != null) {
        buffer = ByteBuffer.wrap(encryptor.encrypt(buffer.array()));
      }
      Pair<Integer, Integer> pair = write(buffer);
      return new FlumeEventPointer(pair.getLeft(), pair.getRight());
    }

    synchronized void take(ByteBuffer buffer) throws IOException {
      if (encryptor != null) {
        buffer = ByteBuffer.wrap(encryptor.encrypt(buffer.array()));
      }
      write(buffer);
    }

    synchronized void rollback(ByteBuffer buffer) throws IOException {
      if (encryptor != null) {
        buffer = ByteBuffer.wrap(encryptor.encrypt(buffer.array()));
      }
      write(buffer);
    }

    synchronized void commit(ByteBuffer buffer) throws IOException {
      if (encryptor != null) {
        buffer = ByteBuffer.wrap(encryptor.encrypt(buffer.array()));
      }
      write(buffer);
      dirty = true;
      lastCommitPosition = position();
    }

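In addition, there are two strategies for syncing the log file to disk: sync on every transaction commit, or sync from a scheduled thread (the scheduled thread is shown below):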

        syncExecutor.scheduleWithFixedDelay(new Runnable() {
          @Override
          public void run() {
            try {
              sync();
            } catch (Throwable ex) {
              LOG.error("Data file, " + getFile().toString() + " could not " +
                  "be synced to disk due to an error.", ex);
            }
          }
        }, fsyncInterval, fsyncInterval, TimeUnit.SECONDS);

This sync() call is also protected by the lock and can hold it for a long time:

    /**
     * Sync the underlying log file to disk. Expensive call,
     * should be used only on commits. If a sync has already happened after
     * the last commit, this method is a no-op
     *
     * @throws IOException
     * @throws LogFileRetryableIOException - if this log file is closed.
     */
    synchronized void sync() throws IOException {
      if (!fsyncPerTransaction && !dirty) {
        if (LOG.isDebugEnabled()) {
          LOG.debug(
              "No events written to file, " + getFile().toString() +
                  " in last " + fsyncInterval + " or since last commit.");
        }
        return;
      }
      if (!isOpen()) {
        throw new LogFileRetryableIOException("File closed " + file);
      }
      if (lastSyncPosition < lastCommitPosition) {
        getFileChannel().force(false);
        lastSyncPosition = position();
        syncCount++;
        dirty = false;
      }
    }

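Calling sync() less often should, in theory, reduce the time the lock is held and leave more of it for put and take.

Flume exposes these as configuration parameters, although the official documentation does not describe them: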

  public static final String FSYNC_PER_TXN = "fsyncPerTransaction";
  public static final boolean DEFAULT_FSYNC_PRE_TXN = true;

  public static final String FSYNC_INTERVAL = "fsyncInterval";
  public static final int DEFAULT_FSYNC_INTERVAL = 5; // seconds.

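The default is to sync on every transaction. That is of course for data reliability; otherwise there would be little point in using the File Channel.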
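For reference, this is roughly what switching to interval-based sync might look like in an agent configuration. It is a sketch under assumptions: the agent name `a1` and channel name `c1` are placeholders, and it assumes the two keys are read from the channel's own configuration context like the other File Channel settings. The Java wrapper only parses the fragment so that it can be checked. The three lines in `CONFIG` are the part that would go into the properties file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Parses a hypothetical Flume properties fragment; "a1" and "c1" are placeholder names.
final class FileChannelFsyncConfig {

    static final String CONFIG =
            "a1.channels.c1.type = file\n"
          + "a1.channels.c1.fsyncPerTransaction = false\n" // assumed key, taken from FSYNC_PER_TXN
          + "a1.channels.c1.fsyncInterval = 5\n";          // assumed key, taken from FSYNC_INTERVAL (seconds)

    static Properties load() {
        Properties props = new Properties();
        try {
            props.load(new StringReader(CONFIG));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        load().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

With `fsyncPerTransaction` set to false, data written since the last scheduled sync can be lost on a crash, so this trades some of the File Channel's durability for fewer sync calls.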

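I switched to timed sync(), but throughput still did not improve. My next guess is that the problem lies in the transaction commit, which is part of what the Sink does: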

 /**
   * Synchronization not required as this method is atomic
   *
   * @param transactionID
   * @param type
   * @throws IOException
   */
  private void commit(long transactionID, short type) throws IOException {
    Preconditions.checkState(open, "Log is closed");
    Commit commit = new Commit(transactionID, WriteOrderOracle.next(), type);
    ByteBuffer buffer = TransactionEventRecord.toByteBuffer(commit);
    int logFileIndex = nextLogWriter(transactionID);
    long usableSpace = logFiles.get(logFileIndex).getUsableSpace();
    long requiredSpace = minimumRequiredSpace + buffer.limit();
    if (usableSpace <= requiredSpace) {
      throw new IOException("Usable space exhausted, only " + usableSpace +
          " bytes remaining, required " + requiredSpace + " bytes");
    }
    boolean error = true;
    try {
      try {
        LogFile.Writer logFileWriter = logFiles.get(logFileIndex);
        // If multiple transactions are committing at the same time,
        // this ensures that the number of actual fsyncs is small and a
        // number of them are grouped together into one.
        logFileWriter.commit(buffer);
        logFileWriter.sync();
        error = false;
      } catch (LogFileRetryableIOException e) {
        if (!open) {
          throw e;
        }
        roll(logFileIndex, buffer);
        LogFile.Writer logFileWriter = logFiles.get(logFileIndex);
        logFileWriter.commit(buffer);
        logFileWriter.sync();
        error = false;
      }
      // ... (the rest of the method is omitted from this excerpt)

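Committing a transaction just writes a log record marking that transaction as complete, so that it is skipped when the log is replayed after a crash.

Note that this path always calls sync(). The commit method itself does not appear to need lock protection, but the call still consumes the sink thread's time, and it is probably part of the reason throughput cannot rise.

As for the File Channel bottleneck, anyone with Java tuning experience could add some debug logging to FileChannel to see exactly which stage is slow.

Personally, I would rather use a Memory Channel with a fairly small capacity (10,000 to 100,000) together with multiple sinks to get high throughput. The worry about crashes is largely unnecessary: most of the time the Memory Channel's fill rate is under 1%, so a crash would lose at most about 100,000 × 0.01 = 1,000 events.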
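The loss estimate is just capacity times fill rate times the number of channels. A small helper makes it easy to compare setups. The class and method names are made up for illustration, and the fill rates passed in are the ones quoted in this post, not new measurements.

```java
// Hypothetical helper: events sitting in memory channels at the moment of a crash.
final class LossEstimate {

    /** capacity per channel x fill ratio (0.0 to 1.0) x number of channels, rounded. */
    static long eventsAtRisk(long capacityPerChannel, double fillRatio, int channels) {
        return Math.round(capacityPerChannel * fillRatio * channels);
    }

    public static void main(String[] args) {
        // One channel of capacity 100,000 at a 1% fill rate.
        System.out.println(eventsAtRisk(100_000, 0.01, 1));
        // Worst case for three full channels of capacity 10,000 each.
        System.out.println(eventsAtRisk(10_000, 1.0, 3));
    }
}
```

The first call covers the typical case for a single large channel. The second is the worst case for the three-channel layout from Version 4.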
