[우아한테크코스] 9월 28일 TIL

9 minute read

[infra] elastic search

로그 파일이 계속 쌓이고 있다. 클라우드 워치로 로그를 보고 있지만 디스크를 차지한다.

뿐만 아니라, 로드밸런싱한 서버 중에서 어떠한 서버에서 문제가 발생했는지 알기 어렵다.

로그 관제 시스템을 만들고 싶어 엘라스틱 서치를 도입했다.

엘라스틱 서치는 검색엔진으로 데이터 검색과 분석을 위해 사용된다. 서버로부터 데이터를 수집하고 실시간으로 검색, 분석을 통하여 시각화할 수 있도록 도와준다.

Apache Lucene을 기반으로 하는 Java 오픈소스 분산 검색 엔진이다. 방대한 데이터를 빠르게 (거의 실시간으로) 저장, 검색, 분석해 원하는 결과를 도출해낼 수 있다.

Logstash

데이터 수집과 저장을 위한 것으로 데이터 저장소에서 데이터를 입력받고 데이터 확장, 변경, 필터링, 삭제 등의 필터를 거쳐 데이터 저장소로 전송하는 출력으로 데이터를 처리한다.

Kibana

시각화 도구로 검색과 집계 기능을 통해 엘라스틱 서치로부터 가져와 웹 도구로 시각화한다.

어떤 단어가 검색되었는지, 어디서 검색을 많이 했는지 등의 대시보드 구축
Curator로 오래된 로그 자동 관리
aws s3를 이용한 로그 백업

참고

슬랙 연동

elasticsearch

가이드북

엘라스틱서치는 index를 database로 보며 table은 type, column은 field, row는 doc으로 저장한다.

cluster

가장 큰 시스템 단위로 최소 하나 이상의 노드로 이루어진 노드의 집합

master 노드를 필수적으로 가지고 있는 하나의 시스템을 의미한다.

서로 다른 클러스터 간 데이터는 공유될 수 없다.

여러 서버에 하나의 클러스터

http 포트(9200~9299): 클라이언트와의 통신

Tcp 포트(9300~9399): 노드 간 데이터 교환
하나의 서버에 여러 클러스터
디스커버리: 네트워크 상의 다른 노드들을 찾아 하나의 클러스터로 바인딩

참고

node

elasticsearch를 구성하는 하나의 단위 프로세스, 하나의 서버에 하나의 elasticsearch 서비스 올리면 그게 elasticsearch node이다.

지금은 단위가 작아 하나의 노드를 가진 하나의 클러스터를 만들었지만, 노드의 자원이 많이 소모될 수록 노드를 쪼개서 관리해야한다. 나누어서 crud 전용, data 전용 등으로 관리해야 한다.
- master-eligible node
  
  마스터 노드가 될 수 있는 노드로 인덱스 생성과 삭제, 클러스터 노드들의 추적과 관리, 샤딩 결정
- Data node
  
  crud 작업을 하는 노드로 cpu, 메모리 등의 자원을 많이 소모하므로 모니터링 필요
  
  master 노드와의 분리 추천
- ingest node
  
  데이터 변환과 같은 파이프라인 실행
- Coordination only node
  
  data, master-eligible node 대행 노드(로드밸런서)
index

database와 같은 개념으로 예를 들어 logstash에서 index이름을 주고 output 하면 그 Index를 만들고 알아서 추가, 수정, 삭제 등의 일을 수행한다.
sharding

elasticsearch에서는 알아서 scale out을 하며 인덱스를 분산해서 저장한다.

기본적으로 하나의 샤딩이 있고 검색 성능 향상을 위해 샤드 갯수를 조정하기도 한다.
replica

또 다른 형태의 샤딩으로 노드가 문제가 생겼을 경우 데이터의 신뢰성을 위해 샤드들을 복제하는 것이다.

따라서 서로 다른 노드에 존재하는 것이 좋다.

따라서 엘라스틱서치는 샤딩을 통해 규모가 수평적으로 늘어날 수 있고(scale out), replica를 통해 데이터가 손실되지 않음이 보장된다.

HTTP Restful API를 통해 CRUD가 수행되며 json 문서를 통해 검색을 수행해 스키마가 따로 존재하지 않는다.

inverted index를 통해 빠른 성능을 보장한다.(책 맨 뒤 찾아보기) 문서 기반으로 JSON 형식으로 데이터 전달

filebeat

경량 로그 수집기로 로그와 파일을 경량화된 방식으로 전달하고 중앙 집중화해준다.

filebeat는 로그 수집기이다. logstash의 부하를 막아준다.

로그 파일을 지정하고 데이터가 생길 때마다 데이터를 수확한다. 로그를 수확하면 로그 데이터를 읽어 libbeat에 보낸다. libbeat은 이벤트를 집계하고 filebeat에 걸맞는 출력으로 데이터를 보낸다.

elasticsearch api

참고

인덱스 생성

   PUT library
   {
     "settings": {
       "number_of_shards": 1,
       "number_of_replicas": 0
     }
   }

Bulk index(다량의 도큐먼트 인덱스할 때)

POST library/books/_bulk
{"index":{"_id":1}}
{"title":"The quick brow fox","price":5,"colors":["red","green","blue"]}
{"index":{"_id":2}}
{"title":"The quick brow fox jumps over the lazy dog","price":15,"colors":["blue","yellow"]}
{"index":{"_id":3}}
{"title":"The quick brow fox jumps over the quick dog","price":8,"colors":["red","blue"]}
{"index":{"_id":4}}
{"title":"brow fox brown dog","price":2,"colors":["black","yellow","red","blue"]}
{"index":{"_id":5}}
{"title":"Lazy dog","price":9,"colors":["red","blue","green"]}

검색

전체 검색

GET library/_search

일부 검색(fox 포함된 것 검색)

GET library/_search
{
  "query": {
    "match": {	// "match_phrase": 하면 해당 구절 포함하는 것 검색
      "title": "fox" //띄어쓰기 하면 두개 둘 중 하나 포함하는 것 검색
    }
  }
}

복합 쿼리 - bool 쿼리를 통한 서브쿼리

must: match, match phrase 둘 다 포함된 모든 문서 검색
must_not: 둘 중 하나 포함하지 않는
Boost: 가중치 조절
should: 반드시는 아니지만 매칭되는 경우 더 높은 스코어 부여

GET /library/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "quick"
          }
        },
        {
          "match_phrase": {
            "title": {
              "query": "lazy dog",
              "boost": 3
            }
          }
        }
      ]
    }
  }
}

highlight

검색어와 매칭된 부분을 하이라이트로 표시

GET /library/_search
{
  "query" : {
  	//...
  },
  "highlight" : {
    "fields" : {
      "title": { }
    }
  }
}
   

filter

검색 결과의 부분집합 도출, 캐싱되어 쿼리보다 빠름

must + filter, filter만

GET /library/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "dog"
          }
        }
      ],
      "filter": {
        "range": {
          "price": {
            "gte": 5,
            "lte": 10
          }
        }
      }
    }
  }
}

Analyze

Tokenizer를 통해 문장을 검색어로 쪼갬
filter를 통해 쪼개진 것들을 가공
unique로 중복된 것 제거

GET library/_analyze
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    "unique"
  ],
  "text": "Brown fox brown dog"
}

analyzer 사용

GET library/_analyze
{
  "analyzer": "standard",
  "text": "Brown fox brown dog"
}

집계

terms + aggs 사용
여러 aggs 사용

GET library/_search
{
  "size": 0, 
  "aggs": {
    "price-statistics": {
      "stats": {
        "field": "price"
      }
    },
    "popular-colors": {
      "terms": {
        "field": "colors.keyword"
      },
      "aggs": {
        "avg-price-per-color": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}

설치

import elasticsearch PGP Key

wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

install apt repository

sudo apt-get install apt-transport-https

save repository definition

echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-7.x.list

install elasticsearch

sudo apt-get update && sudo apt-get install elasticsearch

install kibana

sudo apt-get update && sudo apt-get install kibana

install logstash

sudo apt-get update && sudo apt-get install logstash

환경설정

chmod 755 ~~로 귀찮은 sudo 제거

cd /etc/elasticsearch/elasticsearch.yaml -> host, node name, network 설정

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
#cluster.name: my-application
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#--- 노드 이름 설정 ---
node.name: node-1
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#### --- 기본 설정 ---
path.data: /var/lib/elasticsearch
#
# Path to log files:
#### --- 기본 설정 ---
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# By default Elasticsearch is only accessible on localhost. Set a different
# address here to expose this node on the network:
#### --- elasticsearch에 접근 허용할 네트워크 등록 ---
network.host: 0.0.0.0
#
# By default Elasticsearch listens for HTTP traffic on the first free port it
# finds starting at 9200. Set a specific HTTP port here:
#### --- elasticsearch가 동작할 포트 번호 등록 ---
http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#### --- 동작할 호스트 설정 --- 
discovery.seed_hosts: ["127.0.0.1"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#### --- 마스터 노드 설정(노드 개수 확인) ---
cluster.initial_master_nodes: ["node-1"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true

cd /etc/kibana/kibana.yaml -> server 설정

# Kibana is served by a back end server. This setting specifies the port to use.
server.port: 8000

# The default is 'localhost', which usually means remote machines will not be able to connect.
# To allow connections from remote users, set this parameter to a non-loopback address.
server.host: "0.0.0.0"

# Enables you to specify a path to mount Kibana at if you are running behind a proxy.
# Use the `server.rewriteBasePath` setting to tell Kibana if it should remove the basePath
# from requests it receives, and to prevent a deprecation warning at startup.
# This setting cannot end in a slash.
#server.basePath: ""

# Specifies whether Kibana should rewrite requests that are prefixed with
# `server.basePath` or require that they are rewritten by your reverse proxy.
# This setting was effectively always `false` before Kibana 6.3 and will
# default to `true` starting in Kibana 7.0.
#server.rewriteBasePath: false

# Specifies the public URL at which Kibana is available for end users. If
# `server.basePath` is configured this URL should end with the same basePath.
#server.publicBaseUrl: ""

# The maximum payload size in bytes for incoming server requests.
#server.maxPayload: 1048576

# The Kibana server's name.  This is used for display purposes.
#server.name: "your-hostname"

# The URLs of the Elasticsearch instances to use for all your queries.
elasticsearch.hosts: ["http://localhost:9200"]

cd /etc/logstash/conf.d 에 logstash.conf 파일 추가

input {
  tcp {
    port => 5000
    codec => json_lines
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "logstash_%{+YYYY.MM.dd}"
    #user => "elastic"
    #password => "changeme"
  }
}

vi /etc/logstash/pipelines.yml 에 파이프라인 추가

- pipeline.id: main
  path.config: "/etc/logstash/conf.d/logstash.conf"

vi /etc/logstash/logstash.yml 에 파이프라인 설정 추가

#
# Set the pipeline event ordering. Options are "auto" (the default), "true" or "false".
# "auto" will  automatically enable ordering if the 'pipeline.workers' setting
# is also set to '1'.
# "true" will enforce ordering on the pipeline and prevent logstash from starting
# if there are multiple workers.
# "false" will disable any extra processing necessary for preserving ordering.
#
# pipeline.ordered: auto
#
# ------------ Pipeline Configuration Settings --------------
#
# Where to fetch the pipeline configuration for the main pipeline
#
# path.config:
#
# Pipeline configuration string for the main pipeline
#
# config.string:
#
# At startup, test if the configuration is valid and exit (dry run)
#
# config.test_and_exit: false
#
# Periodically check if the configuration has changed and reload the pipeline
# This can also be triggered manually through the SIGHUP signal
### --- 설정 자동 재로드 추가 ---
config.reload.automatic: true
#
# How often to check if the pipeline configuration has changed (in seconds)
# Note that the unit value (s) is required. Values without a qualifier (e.g. 60)
# are treated as nanoseconds.
# Setting the interval this way is not recommended and might change in later versions.
### --- 설정 재로드 인터벌 지정 ---
config.reload.interval: 3s
#
# Show fully compiled configuration as debug log message
# NOTE: --log.level must be 'debug'
#
# config.debug: false
#
# When enabled, process escaped characters such as \n and \" in strings in the
# pipeline configuration files.
#
# config.support_escapes: false

vi /etc/filebeat/filebeat.yml에서 가져올 데이터 설정

filebeat.inputs:
- type: log
  paths:
    - /경로1/*.log
  fields:
    index_name: "필드인덱스이름1"
- type: log
  paths:
    - /경로2/*.log
  fields:
    index_name: "필드인덱스이름2"

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false

setup.template.settings:
  index.number_of_shards: 3

output.logstash:
  hosts: ["localhost:port"](기본은 5044)

Elasticsearch, kibana, logstash, filebeat 실행

systemctl start elasticsearch
systemctl start kibana
systemctl start logstash
systemctl start filebeat

Elasticsearch, kibana, logstash 상태 확인

systemctl status elasticsearch
systemctl status kibana
systemctl status logstash

Elasticsearch, kibana, logstash 정지

systemctl stop elasticsearch
systemctl stop kibana
systemctl stop logstash

코드에서의 설정

Build.gradle 라이브러리 추가

// Logstash
implementation 'net.logstash.logback:logstash-logback-encoder:6.6'

logstash-appender 추가를 통한 elk 서버에 로그 전송하는 코드 구현

<included>
    <appender name="STASH" class="net.logstash.logback.appender.LogstashTcpSocketAppender">
        <destination>192.168.1.240:5000</destination>

        <encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
    </appender>
</included>

local로 동작 시 logstash-appender 사용하도록 추가

<springProfile name="local">
  <include resource="appenders/console-appender.xml"/>
  <include resource="appenders/logstash-appender.xml"/>

  <root level="INFO">
  <appender-ref ref="CONSOLE_APPENDER"/>
  <appender-ref ref="STASH"/>
  </root>
</springProfile>

보안 설정

참고 elasticsearch.yml

xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true

kibana.yml

elasticsearch.username: "elastic"
elasticsearch.password: "zzimkkong1!"
xpack.security.encryptionKey: "something_at_least_32_characters"    //필수아님
xpack.security.sessionTimeout: 600000   //필수아님

Share on

Twitter Facebook LinkedIn

CHO YEONWOO

[우아한테크코스] 9월 28일 TIL

[infra] elastic search

Logstash

Kibana

elasticsearch

filebeat

elasticsearch api

설치

환경설정

코드에서의 설정

보안 설정

Share on

You may also enjoy

Effective Java - item15

Effective Java - item14

Effective Java - item13

Effective Java - item10-12