YJWANG

kubeflow installation (On-premise / kfctl_istio_dex)

왕영주 2021. 1. 19. 16:42


Installing Kubeflow


There are two ways to install Kubeflow:

  • kfctl_k8s_istio.yaml : installs Istio + Kubeflow with no authentication; the UI is usable immediately on access
  • kfctl_istio_dex.yaml : installs Dex + Istio + Kubeflow; user authentication makes a multi-user environment possible

This post continues from the previous one and uses kfctl_istio_dex.yaml.

Installing the Client

Download the client

[root@master01 kubeflow]# wget https://github.com/kubeflow/kfctl/releases/download/v1.2.0/kfctl_v1.2.0-0-gbc038f9_linux.tar.gz
--2021-01-18 02:43:23--  https://github.com/kubeflow/kfctl/releases/download/v1.2.0/kfctl_v1.2.0-0-gbc038f9_linux.tar.gz
Resolving github.com (github.com)... 52.78.231.108
...

[root@master01 kubeflow]# tar -xvf kfctl_v1.2.0-0-gbc038f9_linux.tar.gz 
./kfctl

Set and apply the environment variables

[root@master01 kubeflow]# cat kubeflow.env 
# Add kfctl to PATH, to make the kfctl binary easier to use.
# Use only alphanumeric characters or - in the directory name.
export PATH=$PATH:"/root/kubeflow"

# Set the following kfctl configuration file:
export CONFIG_URI="https://raw.githubusercontent.com/kubeflow/manifests/v1.2-branch/kfdef/kfctl_istio_dex.v1.2.0.yaml"

# Set KF_NAME to the name of your Kubeflow deployment. You also use this
# value as directory name when creating your configuration directory.
# For example, your deployment name can be 'my-kubeflow' or 'kf-test'.
export KF_NAME=kf-test

# Set the path to the base directory where you want to store one or more 
# Kubeflow deployments. For example, /opt.
# Then set the Kubeflow application directory for this deployment.
export BASE_DIR=/root/kubeflow
export KF_DIR=${BASE_DIR}/${KF_NAME}

[root@master01 kubeflow]# source kubeflow.env 

[root@master01 kubeflow]# env |grep -i kf
KF_DIR=/root/kubeflow/kf-test
KF_NAME=kf-test
CONFIG_URI=https://raw.githubusercontent.com/kubeflow/manifests/v1.2-branch/kfdef/kfctl_istio_dex.v1.2.0.yaml

Deploying Kubeflow

Create KF_DIR

[root@master01 kubeflow]# mkdir -p ${KF_DIR}
[root@master01 kubeflow]# cd ${KF_DIR}

Download the configuration file and set the environment variable

[root@master01 kf-test]# wget -O kfctl_istio_dex.yaml $CONFIG_URI
--2021-01-19 16:13:49--  https://raw.githubusercontent.com/kubeflow/manifests/v1.2-branch/kfdef/kfctl_istio_dex.v1.2.0.yaml
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2369 (2.3K) [text/plain]
Saving to: ‘kfctl_istio_dex.yaml’

kfctl_istio_dex.yaml               100%[================================================================>]   2.31K  --.-KB/s    in 0s      

2021-01-19 16:13:49 (40.3 MB/s) - ‘kfctl_istio_dex.yaml’ saved [2369/2369]

[root@master01 kf-test]# export CONFIG_FILE=${KF_DIR}/kfctl_istio_dex.yaml

Run the deployment

[root@master01 kf-test]# kfctl apply -V -f ${CONFIG_FILE}
(omitted ..)

certificate.cert-manager.io/serving-cert created
issuer.cert-manager.io/selfsigned-issuer created
INFO[0125] Successfully applied application kfserving    filename="kustomize/kustomize.go:291"
INFO[0125] Applied the configuration Successfully!       filename="cmd/apply.go:75"

Deployment complete

After kfctl apply finishes, the pods are created one by one, so this takes a while, roughly 15 to 20 minutes.
For the first 15 minutes or so, pods will repeatedly error out and recover; if errors still persist after that, start troubleshooting.

[root@master01 kf-test]# kubectl get pod -A
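
While waiting, it can help to count how many pods are not yet healthy instead of eyeballing the whole list. Below is a minimal sketch (the `not_ready` helper is made up for illustration), demonstrated against captured sample output; on the cluster you would pipe `kubectl get pod -A` into it:

```shell
# Illustrative helper (not from the post): count pods whose STATUS column
# is neither Running nor Completed. Live usage: kubectl get pod -A | not_ready
not_ready() {
  # column 4 of `kubectl get pod -A` is STATUS; NR>1 skips the header line
  awk 'NR>1 && $4 != "Running" && $4 != "Completed" {n++} END {print n+0}'
}

# Demonstration on captured sample output:
printf '%s\n' \
  'NAMESPACE  NAME      READY  STATUS             RESTARTS  AGE' \
  'kubeflow   mysql-0   1/1    Running            0         3m' \
  'kubeflow   minio-0   0/1    CrashLoopBackOff   4         3m' | not_ready
# prints 1
```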

Verifying the deployment

Check that the PVCs are bound correctly

[root@master01 kf-test]# kubectl get pvc -A
NAMESPACE      NAME              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
istio-system   authservice-pvc   Bound    pvc-4b66e699-74d2-4fba-ad73-307f9035b5a6   10Gi       RWO            nfsprov        43s
kubeflow       katib-mysql       Bound    pvc-3eba3e02-3192-47fa-b533-137cd6b9d334   10Gi       RWO            nfsprov        39s
kubeflow       metadata-mysql    Bound    pvc-93d65b55-bb65-4e2e-9773-1c07913ec02e   10Gi       RWO            nfsprov        39s
kubeflow       minio-pvc         Bound    pvc-adea71c9-8aae-46e9-8de1-ef5cd2ca1b44   20Gi       RWO            nfsprov        39s
kubeflow       mysql-pv-claim    Bound    pvc-91f2bf88-ed2e-4880-b74e-b361d5fafaf6   20Gi       RWO            nfsprov        3
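
All five claims are Bound here. If any were not, a quick filter over the same output would name the stragglers; a sketch on sample output (live, pipe `kubectl get pvc -A` into the awk filter):

```shell
# Sketch: print the NAME (column 2) of any PVC whose STATUS (column 3)
# is not Bound. Live usage: kubectl get pvc -A | awk 'NR>1 && $3 != "Bound" {print $2}'
printf '%s\n' \
  'NAMESPACE     NAME             STATUS   VOLUME    CAPACITY' \
  'istio-system  authservice-pvc  Bound    pvc-4b66  10Gi' \
  'kubeflow      katib-mysql      Pending            10Gi' |
  awk 'NR>1 && $3 != "Bound" {print $2}'
# prints katib-mysql
```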

Check the pods

[root@master01 kf-test]# kubectl get pod -n kubeflow
NAME                                                     READY   STATUS    RESTARTS   AGE
admission-webhook-bootstrap-stateful-set-0               1/1     Running   2          18m
admission-webhook-deployment-5cd7dc96f5-lvflf            1/1     Running   0          28s
application-controller-stateful-set-0                    1/1     Running   0          20m
argo-ui-65df8c7c84-9xmn5                                 1/1     Running   0          18m
cache-deployer-deployment-5f4979f45-4c75b                2/2     Running   3          18m
cache-server-7859fd67f5-58j79                            2/2     Running   1          18m
centraldashboard-67767584dc-v9v7z                        1/1     Running   0          18m
jupyter-web-app-deployment-8486d5ffff-d7zgv              1/1     Running   0          18m
katib-controller-7fcc95676b-klcnf                        1/1     Running   1          18m
katib-db-manager-85db457c64-5wsf7                        1/1     Running   1          18m
katib-mysql-6c7f7fb869-9qtrf                             1/1     Running   0          18m
katib-ui-65dc4cf6f5-8px4r                                1/1     Running   0          18m
kfserving-controller-manager-0                           2/2     Running   0          18m
kubeflow-pipelines-profile-controller-797fb44db9-skn92   1/1     Running   0          18m
metacontroller-0                                         1/1     Running   0          18m
metadata-db-6dd978c5b-kktjb                              1/1     Running   0          18m
metadata-envoy-deployment-67bd5954c-c7zxx                1/1     Running   0          18m
metadata-grpc-deployment-577c67c96f-8w45l                1/1     Running   4          18m
metadata-writer-756dbdd478-2s5rm                         2/2     Running   1          18m
minio-54d995c97b-q8nfl                                   1/1     Running   0          18m
ml-pipeline-7c56db5db9-rl4jf                             2/2     Running   0          18m
ml-pipeline-persistenceagent-d984c9585-6br5l             2/2     Running   0          18m
ml-pipeline-scheduledworkflow-5ccf4c9fcc-vf7qq           2/2     Running   0          18m
ml-pipeline-ui-7ddcd74489-72qmt                          2/2     Running   0          18m
ml-pipeline-viewer-crd-56c68f6c85-7fwdj                  2/2     Running   1          18m
ml-pipeline-visualizationserver-5b9bd8f6bf-bj9h6         2/2     Running   0          18m
mpi-operator-d5bfb8489-7tmrr                             1/1     Running   5          18m
mxnet-operator-7576d697d6-82p84                          1/1     Running   2          18m
mysql-74f8f99bc8-8wm4w                                   2/2     Running   0          18m
notebook-controller-deployment-5bb6bdbd6d-vnqcc          1/1     Running   0          18m
profiles-deployment-56bc5d7dcb-hpw2s                     2/2     Running   0          18m
pytorch-operator-847c8d55d8-xc6db                        1/1     Running   5          18m
seldon-controller-manager-6bf8b45656-8g8l6               1/1     Running   2          18m
spark-operatorsparkoperator-fdfbfd99-wmnnz               1/1     Running   0          18m
tf-job-operator-58477797f8-mn49w                         1/1     Running   4          18m
workflow-controller-64fd7cffc5-c5ssn                     1/1     Running   0          18m

Troubleshooting

  • PVCs stuck in Pending: first check that the provisioner is configured correctly.
  • istio-token not found, or mysql and several other pods fail to start: check whether /etc/kubernetes/manifests/kube-apiserver.yaml has been modified.
  • metadata-writer pod fails to start: TensorFlow-related pods fail if the CPU does not support AVX.
    https://www.kubeflow.org/docs/other-guides/troubleshooting/
  • failed for volume "webhook-tls-certs" : secret "webhook-server-tls" not found
    Add the following flag to the kube-apiserver command:
[root@master01 kf-test]# cat /etc/kubernetes/manifests/kube-apiserver.yaml 
(omitted)
spec:
  containers:
  - command:
(omitted)
    - --feature-gates=TokenRequest=true
(omitted)

Accessing the UI

Default User / Password

user : admin@kubeflow.org
password : 12341234

Check the service port

[root@master01 kf-test]# kubectl get svc -n istio-system  istio-ingressgateway 
NAME                   TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)                                                                                                                                      AGE
istio-ingressgateway   NodePort   10.233.19.51   <none>        15020:32701/TCP,80:31380/TCP,443:31390/TCP,31400:31400/TCP,15029:30498/TCP,15030:30087/TCP,15031:32566/TCP,15032:31453/TCP,15443:30333/TCP   24m
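
The PORT(S) column shows that service port 80 maps to NodePort 31380, which is the port used in the browser below. As a sketch, the mapping can also be extracted programmatically, either with a jsonpath filter or by parsing the PORT(S) column:

```shell
# On a live cluster, jsonpath is the robust option:
#   kubectl get svc -n istio-system istio-ingressgateway \
#     -o jsonpath='{.spec.ports[?(@.port==80)].nodePort}'
# Sketch of parsing the PORT(S) column instead:
ports='15020:32701/TCP,80:31380/TCP,443:31390/TCP'
echo "$ports" | tr ',' '\n' | awk -F'[:/]' '$1 == 80 {print $2}'
# prints 31380
```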

Connect

http://<server-ip or domain>:31380

Changing the admin password (Dex)

https://artifacthub.io/packages/helm/gabibbo97/dex/1.0.3
https://www.kubeflow.org/docs/started/k8s/kfctl-istio-dex/#add-static-users-for-basic-auth

Extract the config file and generate a password hash

[root@master01 kubeflow]# kubectl get configmap dex -n auth -o jsonpath='{.data.config\.yaml}' > dex-config.yaml
[root@master01 kubeflow]# yum -y install httpd-tools
[root@master01 kubeflow]# htpasswd -bnBC 10 "usr" testtest | cut -d ':' -f 2 | sed 's/2y/2a/'
$2a$10$ZlsN1af0YJR2EdKRFLBo7OvEUEnNlyU2abqjZzL/Q8qgZApNtRh8a
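
The `sed 's/2y/2a/'` step rewrites the bcrypt variant prefix: htpasswd emits `$2y$` hashes, while Dex expects the `$2a$` form. The two prefixes denote the same algorithm, so only the label changes:

```shell
# htpasswd produced a $2y$ bcrypt hash; rewrite the variant prefix to $2a$
# for Dex. sed without /g replaces only the first match, so the hash body
# is unaffected even if "2y" happened to appear later in it.
hash='$2y$10$ZlsN1af0YJR2EdKRFLBo7OvEUEnNlyU2abqjZzL/Q8qgZApNtRh8a'
echo "$hash" | sed 's/2y/2a/'
# prints $2a$10$ZlsN1af0YJR2EdKRFLBo7OvEUEnNlyU2abqjZzL/Q8qgZApNtRh8a
```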

Replace the hash value

[root@master01 kubeflow]# cat dex-config.yaml 
issuer: http://dex.auth.svc.cluster.local:5556/dex
storage:
  type: kubernetes
  config:
    inCluster: true
web:
  http: 0.0.0.0:5556
logger:
  level: "debug"
  format: text
oauth2:
  skipApprovalScreen: true
enablePasswordDB: true
staticPasswords:
- email: admin@kubeflow.org
  #hash: $2y$12$ruoM7FqXrpVgaol44eRZW.4HWS8SAvg6KYVVSCIwKQPBmTpCm.EeO
  hash: $2a$10$ZlsN1af0YJR2EdKRFLBo7OvEUEnNlyU2abqjZzL/Q8qgZApNtRh8a
  username: admin
  userID: 08a8684b-db88-4b73-90a9-3cd1661f5466
staticClients:
- id: kubeflow-oidc-authservice
  redirectURIs: ["/login/oidc"]
  name: 'Dex Login Application'
  secret: pUBnBOY80SnXgjibTYM9ZWNzY2xreNGQok

Apply and verify the changes

[root@master01 kubeflow]# kubectl create configmap dex --from-file=config.yaml=dex-config.yaml -n auth --dry-run -oyaml | kubectl apply -f -
W0119 17:21:02.902797   29658 helpers.go:553] --dry-run is deprecated and can be replaced with --dry-run=client.
configmap/dex configured

[root@master01 kubeflow]# kubectl rollout restart deployment dex -n auth
deployment.apps/dex restarted


[root@master01 kubeflow]# kubectl get deployments.apps -n auth
NAME   READY   UP-TO-DATE   AVAILABLE   AGE
dex    1/1     1            1           63m

Log in with the changed password
