4장 실습 프로메테우스와 그라파나 구성하기

목차
1. 프로메테우스 스택 설치
1.1. 프로메테우스 스택 설치
1.2. 프로메테우스 설치 확인
2. 프로메테우스 기본 사용
2.1. 모니터링 대상 확인
2.2. 프로메테우스 ingress 도메인 접속 및 확인
3. 그라파나 대시보드 사용
3.1. 그라파나 정보 확인
3.2. 그라파나 기본 대시보드 확인
3.3. 그라파나 대시보드 사용
4. 애플리케이션 모니터링 설정 및 대시보드 추가
4.1. NGINX 웹 서버 배포
4.2. 프로메테우스 서비스 모니터 확인
4.3. NGINX 모니터링 대시보드 추가
5. 실습 환경 삭제

1. 프로메테우스 스택 설치

이번 실습은 4장 Amazon EKS 원클릭 배포 환경에서 진행합니다.
인프라 배포를 진행하지 않은 경우 링크를 통해 배포 후 복귀 바랍니다.
그리고 새롭게 인프라를 배포하면 아래 기본 설정 명령을 입력 후 진행 바랍니다.

기본 설정 명령어

Default 네임 스페이스 변경

1
kubectl ns default

워커 노드의 IP 변수 선언

1
2
3
4
5
6
7
8
9
10
11
N1=$(kubectl get node --label-columns=topology.kubernetes.io/zone --selector=topology.kubernetes.io/zone=ap-northeast-2a -o jsonpath={.items[0].status.addresses[0].address})

N2=$(kubectl get node --label-columns=topology.kubernetes.io/zone --selector=topology.kubernetes.io/zone=ap-northeast-2b -o jsonpath={.items[0].status.addresses[0].address})

N3=$(kubectl get node --label-columns=topology.kubernetes.io/zone --selector=topology.kubernetes.io/zone=ap-northeast-2c -o jsonpath={.items[0].status.addresses[0].address})

echo "export N1=$N1" >> /etc/profile

echo "export N2=$N2" >> /etc/profile

echo "export N3=$N3" >> /etc/profile

노드에 SSH 접근

1
for node in $N1 $N2 $N3; do ssh ec2-user@$node hostname; done

AWS Load Balancer Controller 설치

1
2
3
4
5
6
7
helm repo add eks https://aws.github.io/eks-charts

helm repo update

helm install aws-load-balancer-controller eks/aws-load-balancer-controller -n kube-system --set clusterName=$CLUSTER_NAME \
  --set serviceAccount.create=false \
  --set serviceAccount.name=aws-load-balancer-controller

ExternalDNS 설치

1
2
3
4
5
6
7
8
9
10
// 자신의 도메인 주소로 설정
MyDomain=<자신의 도메인>

MyDnsHostedZoneId=$(aws route53 list-hosted-zones-by-name --dns-name "${MyDomain}." --query "HostedZones[0].Id" --output text)

echo $MyDomain, $MyDnsHostedZoneId

curl -s -O https://raw.githubusercontent.com/cloudneta/cnaeblab/master/_data/externaldns.yaml

MyDomain=$MyDomain MyDnsHostedZoneId=$MyDnsHostedZoneId envsubst < externaldns.yaml | kubectl apply -f -

kube-ops-view 설치

1
2
3
4
5
6
7
8
9
helm repo add geek-cookbook https://geek-cookbook.github.io/charts/

helm install kube-ops-view geek-cookbook/kube-ops-view --version 1.2.2 --set env.TZ="Asia/Seoul" --namespace kube-system

kubectl patch svc -n kube-system kube-ops-view -p '{"spec":{"type":"LoadBalancer"}}'

kubectl annotate service kube-ops-view -n kube-system "external-dns.alpha.kubernetes.io/hostname=kubeopsview.$MyDomain"

echo -e "Kube Ops View URL = http://kubeopsview.$MyDomain:8080/#scale=1.5"

StorageClass 생성

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
cat <<EOT > gp3-sc.yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: gp3
allowVolumeExpansion: true
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3
  allowAutoIOPSPerGBIncrease: 'true'
  encrypted: 'true'
EOT

kubectl apply -f gp3-sc.yaml

ACM 인증서 변수 선언

1
CERT_ARN=`aws acm list-certificates --query 'CertificateSummaryList[].CertificateArn[]' --output text`; echo $CERT_ARN

1.1. 프로메테우스 스택 설치

신규터미널 - 네임 스페이스 추가 및 모니터링

1
2
3
4
5
6
7
// monitoring 네임 스페이스 추가
kubectl create ns monitoring

kubectl get ns

// 신규 터미널 - pod, svc, ingress, pv, pvc 모니터링
watch kubectl get pod,svc,ingress,pv,pvc -n monitoring

helm repo 추가

1
2
// helm chart repository 추가
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

프로메테우스 스택 설치

1
2
3
4
5
6
7
8
9
10
11
12
13
// 파라미터 파일 다운로드 및 확인
curl -s -O https://raw.githubusercontent.com/cloudneta/cnaeblab/master/_data/monitor-values.yaml

cat monitor-values.yaml | yh

// 환경 변수 선언
export MyDomain=$MyDomain CERT_ARN=$CERT_ARN

// 프로메테우스 스택 배포
envsubst < monitor-values.yaml | helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack --version 45.27.2 \
  --set prometheus.prometheusSpec.scrapeInterval='15s' \
  --set prometheus.prometheusSpec.evaluationInterval='15s' \
  -f - --namespace monitoring

1.2. 프로메테우스 설치 확인

helm list와 설치된 모든 자원 확인

1
2
3
4
5
// monitoring 네임 스페이스에 helm list 확인
helm list -n monitoring

// monitoring 네임 스페이스에 모든 자원 확인
kubectl get-all -n monitoring

sts, ds, deloy 확인

1
2
3
4
5
6
7
8
// monitoring 네임 스페이스에 sts, ds, deploy 확인
kubectl get sts,ds,deploy -n monitoring

// statefulset 상세 정보
kubectl describe sts -n monitoring

// daemonset 상세 정보
kubectl describe ds -n monitoring

crd, servicemonitors, targetgroupbindings 확인

1
2
3
4
5
6
// monitoring 네임 스페이스에 sts, ds, deploy 확인
kubectl get crd | grep monitoring

kubectl get servicemonitors -n monitoring

kubectl get targetgroupbindings -n monitoring

2. 프로메테우스 기본 사용

2.1. 모니터링 대상 확인

node-exporter 확인

1
2
3
4
5
// node-exporter의 포트 정보
kubectl describe ds -n monitoring | grep Port

// node-exporter의 서비스와 엔드포인트 확인
kubectl get svc,ep -n monitoring kube-prometheus-stack-prometheus-node-exporter

모니터링 대상별 /metrics 확인

1
2
3
4
5
6
7
8
9
10
11
12
13
// node-export - 노드에서 localhost로 9100 포트에 /metrics 접속
ssh ec2-user@$N1 curl -s localhost:9100/metrics | tail

// servicemonitors 확인
kubectl get servicemonitors -n monitoring

// 네임 스페이스별 엔드포인트 확인
echo ">>>monitoring NS<<<"; kubectl get ep -n monitoring; \
echo ">>>kube-system NS<<<"; kubectl get ep -n kube-system; \
echo ">>>default NS<<<"; kubectl get ep

// kube-proxy - 노드에서 localhost로 10249 포트에 /metrics 접속
ssh ec2-user@$N1 curl -s localhost:10249/metrics | tail

2.2. 프로메테우스 ingress 도메인 접속 및 확인

프로메테우스 ingress 정보 확인

1
2
3
4
5
6
7
// 프로메테우스 ingress 정보 확인
kubectl get ingress -n monitoring kube-prometheus-stack-prometheus

kubectl describe ingress -n monitoring kube-prometheus-stack-prometheus

// 프로메테우스 ingress 도메인으로 웹 접속
echo -e "Prometheus Web URL = https://prometheus.$MyDomain"

프로메테우스 상단 메뉴

경고(Alert) : 사전에 정의한 시스템 경고 정책(Prometheus Rules)에 대한 상황
그래프(Graph) : 프로메테우스 자체 검색 언어 PromQL을 이용하여 메트릭 정보를 조회 -> 단순한 그래프 형태 조회
상태(Status) : 경고 메시지 정책(Rules), 모니터링 대상(Targets) 등 다양한 프로메테우스 설정 내역을 확인
도움말(Help)

프로메테우스 설정 확인

1
2
3
4
5
6
7
8
9
10
11
12
13
14
// 프로메테우스 설정 파일 확인
kubectl exec -it -n monitoring sts/prometheus-kube-prometheus-stack-prometheus \
  -- cat /etc/prometheus/config_out/prometheus.env.yaml

// 프로메테우스 설정에서 job_name 확인
kubectl exec -it -n monitoring sts/prometheus-kube-prometheus-stack-prometheus \
  -- cat /etc/prometheus/config_out/prometheus.env.yaml | grep job_name:
  
// 프로메테우스 설정에서 node-export 확인
kubectl exec -it -n monitoring sts/prometheus-kube-prometheus-stack-prometheus \
  -- cat /etc/prometheus/config_out/prometheus.env.yaml | grep node-exporter/ -A 9

// 엔드포인트 확인
kubectl get ep -n monitoring;  kubectl get ep -n kube-system

노드 그룹의 보안 그룹 수정

1
2
3
4
5
// 보안 그룹 ID 지정
NGSGID=$(aws ec2 describe-security-groups --filters Name=group-name,Values='*ng1*' --query "SecurityGroups[*].[GroupId]" --output text)

// 작업용 인스턴스의 모든 트래픽 허용
aws ec2 authorize-security-group-ingress --group-id $NGSGID --protocol '-1' --cidr 192.168.1.100/32

PromQL 예시

1
2
3
4
5
6
7
8
// 전체 클러스터 노드의 CPU 사용량 평균
1- avg(rate(node_cpu_seconds_total{mode="idle"}[1m]))

// api-server 상주 메모리 평균
avg(process_resident_memory_bytes{job="apiserver"})

// node-export 스크랩 기간
scrape_duration_seconds{job="node-exporter"}

3. 그라파나 대시보드 사용

3.1. 그라파나 정보 확인

그라파나 정보 확인

1
2
3
4
5
6
7
8
// 그라파나 버전 확인
kubectl exec -it -n monitoring deploy/kube-prometheus-stack-grafana -- grafana-cli --version

// 그라파나 ingress 확인
kubectl get ingress -n monitoring kube-prometheus-stack-grafana

// ingress 도메인으로 웹 접속 : 기본 계정 - admin / prom-operator
echo -e "Grafana Web URL = https://grafana.$MyDomain"

그라파나 메뉴 정보

Starred : 즐겨 찾기 대시보드
Dashboards : 대시보드 목록 확인
Explore : PromQL로 메트릭 정보를 그래프 형태로 표현
Alerting : 경고 정책을 구성해 사용자에게 경고를 전달
Connections : 연결할 데이터 소스 설정
Administration : 사용자, 조직, 플러그인 등을 설정

3.2. 그라파나 기본 대시보드 확인

그라파나 메뉴에서 Connections 확인

Connections 메뉴에서 Your connections에 진입하면 데이터 소스에 프로메테우스 자동 생성
대상 데이터 소스를 클릭하면 상세 정보 출력

프로메테우스 접근 정보 확인

1
2
// 프로메테우스의 서비스와 엔드포인트 주소 확인
kubectl get svc,ep -n monitoring kube-prometheus-stack-prometheus

테스트용 파드를 배포하고 확인

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
// 테스트용 파드 배포
cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: netshoot-pod
spec:
  containers:
  - name: netshoot-pod
    image: nicolaka/netshoot
    command: ["tail"]
    args: ["-f", "/dev/null"]
  terminationGracePeriodSeconds: 0
EOF

// 생성된 파드 확인
kubectl get pod netshoot-pod

테스트용 파드에서 프로메테우스 주소 확인

1
2
3
4
5
// 프로메테우스 접속 주소로 nslookup 확인
kubectl exec -it netshoot-pod -- nslookup kube-prometheus-stack-prometheus.monitoring

// 프로메테우스 접속 주소로 그래프 경로 접근
kubectl exec -it netshoot-pod -- curl -s kube-prometheus-stack-prometheus.monitoring:9090/graph -v ; echo

테스트용 파드 삭제

1
kubectl delete pod netshoot-pod

기본 대시보드 사용

Dashboards 메뉴에 진입해서 general 폴더 접근
Kubernetes / Networking / Cluster 선택
Kubernetes / Persistent Volumes 선택
Node Exporter / Nodes 선택

3.3. 그라파나 대시보드 사용

대시보드 유형

기본 대시보드 : 프로메테우스 스택을 통해 기본적으로 설치된 대시보드
공식 대시보드 : 링크

대시보드 추가 방법

Dashboard » New » Import » ID 입력 » Load » Prometheus » Import

추천 공식 대시보드

[Kubernetes / Views / Global] : 15757
[1 Kubernetes All-in-one Cluster Monitoring KR] : 13770 or 17900
[Node Exporter Full] : 1860
[Node Exporter for Prometheus Dashboard based on 11074] : 15172
[kube-state-metrics-v2] : 13332

4. 애플리케이션 모니터링 설정 및 대시보드 추가

4.1. NGINX 웹 서버 배포

helm repository 추가

1
helm repo add bitnami https://charts.bitnami.com/bitnami

NGINX 파라미터 파일 생성

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
// nginx 파라미터 파일 생성
cat <<EOT > nginx-values.yaml
service:
    type: NodePort

ingress:
  enabled: true
  ingressClassName: alb
  hostname: nginx.$MyDomain
  path: /*
  annotations: 
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}, {"HTTP":80}]'
    alb.ingress.kubernetes.io/certificate-arn: $CERT_ARN
    alb.ingress.kubernetes.io/success-codes: 200-399
    alb.ingress.kubernetes.io/load-balancer-name: $CLUSTER_NAME-ingress-alb
    alb.ingress.kubernetes.io/group.name: study
    alb.ingress.kubernetes.io/ssl-redirect: '443'

metrics:
  enabled: true

  service:
    port: 9113

  serviceMonitor:
    enabled: true
    namespace: monitoring
    interval: 10s
EOT

cat nginx-values.yaml | yh

NGINX 배포 및 확인

1
2
3
4
5
// nginx 배포
helm install nginx bitnami/nginx --version 14.1.0 -f nginx-values.yaml

// 파드, 서비스, 엔드포인트 확인
kubectl get pod,svc,ep

4.2. 프로메테우스 서비스 모니터 확인

서비스 모니터 확인

1
2
3
4
5
6
7
8
// 서비스 모니터 확인
kubectl get servicemonitor -n monitoring

// nginx 서비스 모니터 확인
kubectl get servicemonitor nginx -n monitoring -o json | jq

// 파드, 서비스, 엔드포인트 확인
kubectl get pod,svc,ep

NGINX 메트릭 확인

1
2
3
4
5
6
7
// NGINX IP 변수 선언
NGINXIP=$(kubectl get pod -l app.kubernetes.io/instance=nginx -o jsonpath={.items[0].status.podIP}); echo $NGINXIP

// NGINX IP로 메트릭 확인
curl -s http://$NGINXIP:9113/metrics

curl -s http://$NGINXIP:9113/metrics | grep ^nginx_connections_active

NGINX 접속

1
2
3
4
5
6
7
8
// NGINX 1회 접속
curl -s https://nginx.$MyDomain

// NGINX 1초 간격 접속
while true; do curl -s https://nginx.$MyDomain -I | head -n 1; date; sleep 1; done

// NGINX 메트릭 확인 - nginx_connections_active
curl -s http://$NGINXIP:9113/metrics | grep ^nginx_connections_active

4.3. NGINX 모니터링 대시보드 추가

NGINX 모니터링 대시보드 추가

NGINX 모니터링 대시보드 : 12708

NGINX 부하 발생

1
2
3
4
yum install -y httpd

// 100회씩 총 1000회 접속 수행
ab -c 100 -n 1000 https://nginx.$MyDomain/

5. 실습 환경 삭제

4장 전체 실습이 종료되어 Amazon EKS 원클릭 배포를 삭제해 모든 실습 환경을 삭제합니다.

실습 종료 후 자원 삭제

1
2
3
4
5
6
7
8
9
10
11
12
// helm chart 삭제
helm uninstall -n kube-system kube-ops-view

helm uninstall nginx

helm uninstall -n monitoring kube-prometheus-stack

// 프로메테우스 PVC 삭제 (프로메테우스 스택에 의해 나머지 삭제된 후)
kubectl delete pvc prometheus-kube-prometheus-stack-prometheus-db-prometheus-kube-prometheus-stack-prometheus-0 -n monitoring

// Amazon EKS 원클릭 배포 삭제
eksctl delete cluster --name $CLUSTER_NAME && aws cloudformation delete-stack --stack-name $CLUSTER_NAME

Warning: Amazon EKS 원클릭 배포의 삭제는 약 15분 정도 소요됩니다. 삭제가 완료될 때 까지 SSH 연결 세션을 유지합니다.

Warning: 만약에 CloudFormation 스택이 삭제되지 않는다면 수동으로 VPC(myeks-VPC)를 삭제 후 CloudFormation 스택을 다시 삭제해 주세요.

여기까지 4장의 모든 실습을 마칩니다.
수고하셨습니다 :)