YJWANG
kubeflow installation (On-premise / kfctl_istio_dex)
refer to
Prerequisites
- Configure a dynamic volume provisioner
- Add kube-apiserver options
https://yjwang.tistory.com/entry/kubeflow-minimal-installation
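Installing Kubeflow
There are two ways to install Kubeflow:
- kfctl_k8s_istio.yaml : installs istio + kubeflow. No separate authentication is provided, so the UI can be used as soon as you connect.
- kfctl_istio_dex.yaml : installs dex + istio + kubeflow. User authentication makes a multi-user environment possible.
Continuing from the previous post, this post uses kfctl_istio_dex.yaml.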
Installing the client
Download the client
[root@master01 kubeflow]# wget https://github.com/kubeflow/kfctl/releases/download/v1.2.0/kfctl_v1.2.0-0-gbc038f9_linux.tar.gz
--2021-01-18 02:43:23-- https://github.com/kubeflow/kfctl/releases/download/v1.2.0/kfctl_v1.2.0-0-gbc038f9_linux.tar.gz
Resolving github.com (github.com)... 52.78.231.108
...
[root@master01 kubeflow]# tar -xvf kfctl_v1.2.0-0-gbc038f9_linux.tar.gz
./kfctl
Setting and applying environment variables
[root@master01 kubeflow]# cat kubeflow.env
# Add kfctl to PATH, to make the kfctl binary easier to use.
# Use only alphanumeric characters or - in the directory name.
export PATH=$PATH:"/root/kubeflow"
# Set the following kfctl configuration file:
export CONFIG_URI="https://raw.githubusercontent.com/kubeflow/manifests/v1.2-branch/kfdef/kfctl_istio_dex.v1.2.0.yaml"
# Set KF_NAME to the name of your Kubeflow deployment. You also use this
# value as directory name when creating your configuration directory.
# For example, your deployment name can be 'my-kubeflow' or 'kf-test'.
export KF_NAME=kf-test
# Set the path to the base directory where you want to store one or more
# Kubeflow deployments. For example, /opt.
# Then set the Kubeflow application directory for this deployment.
export BASE_DIR=/root/kubeflow
export KF_DIR=${BASE_DIR}/${KF_NAME}
[root@master01 kubeflow]# source kubeflow.env
[root@master01 kubeflow]# env |grep -i kf
KF_DIR=/root/kubeflow/kf-test
KF_NAME=kf-test
CONFIG_URI=https://raw.githubusercontent.com/kubeflow/manifests/v1.2-branch/kfdef/kfctl_istio_dex.v1.2.0.yaml
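The comments in kubeflow.env state that KF_NAME should use only alphanumeric characters or `-`, and that KF_DIR is simply BASE_DIR joined with KF_NAME. If you script the setup instead of sourcing the file, that rule can be sketched in Python as below; `is_valid_kf_name` and `kf_dir` are hypothetical helpers, not part of kfctl.

```python
import re

# Rule taken from the comment in kubeflow.env: only alphanumerics or '-'.
_KF_NAME = re.compile(r"[A-Za-z0-9-]+")


def is_valid_kf_name(name: str) -> bool:
    """True if `name` is safe to use as KF_NAME (and therefore as a directory name)."""
    return _KF_NAME.fullmatch(name) is not None


def kf_dir(base_dir: str, kf_name: str) -> str:
    """Python equivalent of `export KF_DIR=${BASE_DIR}/${KF_NAME}`, with validation."""
    if not is_valid_kf_name(kf_name):
        raise ValueError(f"invalid KF_NAME: {kf_name!r}")
    return f"{base_dir.rstrip('/')}/{kf_name}"
```

With the values used in this post, `kf_dir("/root/kubeflow", "kf-test")` gives the same `/root/kubeflow/kf-test` that `env` prints above.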
Deploying Kubeflow
Create KF_DIR
[root@master01 kubeflow]# mkdir -p ${KF_DIR}
[root@master01 kubeflow]# cd ${KF_DIR}
Download the config file and set an environment variable
[root@master01 kf-test]# wget -O kfctl_istio_dex.yaml $CONFIG_URI
--2021-01-19 16:13:49-- https://raw.githubusercontent.com/kubeflow/manifests/v1.2-branch/kfdef/kfctl_istio_dex.v1.2.0.yaml
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2369 (2.3K) [text/plain]
Saving to: ‘kfctl_istio_dex.yaml’
kfctl_istio_dex.yaml 100%[================================================================>] 2.31K --.-KB/s in 0s
2021-01-19 16:13:49 (40.3 MB/s) - ‘kfctl_istio_dex.yaml’ saved [2369/2369]
[root@master01 kf-test]# export CONFIG_FILE=${KF_DIR}/kfctl_istio_dex.yaml
Run the deployment
[root@master01 kf-test]# kfctl apply -V -f ${CONFIG_FILE}
(omitted ..)
certificate.cert-manager.io/serving-cert created
issuer.cert-manager.io/selfsigned-issuer created
INFO[0125] Successfully applied application kfserving filename="kustomize/kustomize.go:291"
INFO[0125] Applied the configuration Successfully! filename="cmd/apply.go:75"
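Deployment complete
After kfctl apply finishes, the pods are created one by one, so the rollout takes a while (about 15 to 20 minutes).
For roughly the first 15 minutes, pods repeatedly fail with Error and then recover, so be patient; if the errors persist after 15 minutes, work through the Troubleshooting section below.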
[root@master01 kf-test]# kubectl get pod -A
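Instead of re-running `kubectl get pod` by hand during those 15 to 20 minutes, the wait can be scripted. A minimal Python sketch, assuming `kubectl` is on PATH and pointed at this cluster; `wait_until` and `all_pods_running` are hypothetical helpers, and the 20-minute timeout and 30-second interval are arbitrary values based on the timing above. Note that phase `Running` does not guarantee every container in the pod is ready.

```python
import subprocess
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout_s: float = 20 * 60,
               interval_s: float = 30,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `check` every `interval_s` seconds until it returns True.

    Returns False if `timeout_s` seconds pass without a successful check.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


def all_pods_running(namespace: str = "kubeflow") -> bool:
    """True if every pod in `namespace` is in phase Running (requires kubectl)."""
    phases = subprocess.run(
        ["kubectl", "get", "pod", "-n", namespace,
         "-o", "jsonpath={.items[*].status.phase}"],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    return bool(phases) and all(p == "Running" for p in phases)


# Usage on the master node:
#   ok = wait_until(lambda: all_pods_running("kubeflow"))
```

The `sleep` parameter is injectable only so the loop can be exercised without actually waiting.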
Verifying the deployment
Check that the PVCs are bound
[root@master01 kf-test]# kubectl get pvc -A
NAMESPACE NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
istio-system authservice-pvc Bound pvc-4b66e699-74d2-4fba-ad73-307f9035b5a6 10Gi RWO nfsprov 43s
kubeflow katib-mysql Bound pvc-3eba3e02-3192-47fa-b533-137cd6b9d334 10Gi RWO nfsprov 39s
kubeflow metadata-mysql Bound pvc-93d65b55-bb65-4e2e-9773-1c07913ec02e 10Gi RWO nfsprov 39s
kubeflow minio-pvc Bound pvc-adea71c9-8aae-46e9-8de1-ef5cd2ca1b44 20Gi RWO nfsprov 39s
kubeflow mysql-pv-claim Bound pvc-91f2bf88-ed2e-4880-b74e-b361d5fafaf6 20Gi RWO nfsprov 3
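With many claims, a quick filter over the `kubectl get pvc -A` table shows which ones are not `Bound`, the usual symptom of a provisioner problem. A minimal sketch; `unbound_pvcs` is a hypothetical helper that assumes the default column order shown above (NAMESPACE, NAME, STATUS, ...).

```python
from typing import List


def unbound_pvcs(kubectl_output: str) -> List[str]:
    """Return 'namespace/name' for every PVC whose STATUS is not Bound.

    Expects the plain-text table printed by `kubectl get pvc -A`.
    """
    pending: List[str] = []
    for line in kubectl_output.strip().splitlines()[1:]:  # skip the header row
        cols = line.split()
        if len(cols) < 3:
            continue  # ignore blank or truncated lines
        namespace, name, status = cols[0], cols[1], cols[2]
        if status != "Bound":
            pending.append(f"{namespace}/{name}")
    return pending
```

An empty list means every claim got a volume from the provisioner.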
Check the pods
[root@master01 kf-test]# kubectl get pod -n kubeflow
NAME READY STATUS RESTARTS AGE
admission-webhook-bootstrap-stateful-set-0 1/1 Running 2 18m
admission-webhook-deployment-5cd7dc96f5-lvflf 1/1 Running 0 28s
application-controller-stateful-set-0 1/1 Running 0 20m
argo-ui-65df8c7c84-9xmn5 1/1 Running 0 18m
cache-deployer-deployment-5f4979f45-4c75b 2/2 Running 3 18m
cache-server-7859fd67f5-58j79 2/2 Running 1 18m
centraldashboard-67767584dc-v9v7z 1/1 Running 0 18m
jupyter-web-app-deployment-8486d5ffff-d7zgv 1/1 Running 0 18m
katib-controller-7fcc95676b-klcnf 1/1 Running 1 18m
katib-db-manager-85db457c64-5wsf7 1/1 Running 1 18m
katib-mysql-6c7f7fb869-9qtrf 1/1 Running 0 18m
katib-ui-65dc4cf6f5-8px4r 1/1 Running 0 18m
kfserving-controller-manager-0 2/2 Running 0 18m
kubeflow-pipelines-profile-controller-797fb44db9-skn92 1/1 Running 0 18m
metacontroller-0 1/1 Running 0 18m
metadata-db-6dd978c5b-kktjb 1/1 Running 0 18m
metadata-envoy-deployment-67bd5954c-c7zxx 1/1 Running 0 18m
metadata-grpc-deployment-577c67c96f-8w45l 1/1 Running 4 18m
metadata-writer-756dbdd478-2s5rm 2/2 Running 1 18m
minio-54d995c97b-q8nfl 1/1 Running 0 18m
ml-pipeline-7c56db5db9-rl4jf 2/2 Running 0 18m
ml-pipeline-persistenceagent-d984c9585-6br5l 2/2 Running 0 18m
ml-pipeline-scheduledworkflow-5ccf4c9fcc-vf7qq 2/2 Running 0 18m
ml-pipeline-ui-7ddcd74489-72qmt 2/2 Running 0 18m
ml-pipeline-viewer-crd-56c68f6c85-7fwdj 2/2 Running 1 18m
ml-pipeline-visualizationserver-5b9bd8f6bf-bj9h6 2/2 Running 0 18m
mpi-operator-d5bfb8489-7tmrr 1/1 Running 5 18m
mxnet-operator-7576d697d6-82p84 1/1 Running 2 18m
mysql-74f8f99bc8-8wm4w 2/2 Running 0 18m
notebook-controller-deployment-5bb6bdbd6d-vnqcc 1/1 Running 0 18m
profiles-deployment-56bc5d7dcb-hpw2s 2/2 Running 0 18m
pytorch-operator-847c8d55d8-xc6db 1/1 Running 5 18m
seldon-controller-manager-6bf8b45656-8g8l6 1/1 Running 2 18m
spark-operatorsparkoperator-fdfbfd99-wmnnz 1/1 Running 0 18m
tf-job-operator-58477797f8-mn49w 1/1 Running 4 18m
workflow-controller-64fd7cffc5-c5ssn 1/1 Running 0 18m
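The same idea applies to the pod list: a pod counts as healthy here when its STATUS is `Running` and the READY column shows all containers ready (for example `2/2`). A sketch that parses `kubectl get pod -n kubeflow` output (no NAMESPACE column); `unhealthy_pods` is a hypothetical helper. It deliberately ignores RESTARTS because, as the listing above shows, healthy pods can carry a few restarts from the startup period.

```python
from typing import List


def unhealthy_pods(kubectl_output: str) -> List[str]:
    """Return names of pods that are not Running or not fully ready.

    Parses the table from `kubectl get pod -n <namespace>`
    (columns: NAME READY STATUS RESTARTS AGE).
    """
    bad: List[str] = []
    for line in kubectl_output.strip().splitlines()[1:]:  # skip the header row
        cols = line.split()
        if len(cols) < 3:
            continue
        name, ready, status = cols[0], cols[1], cols[2]
        ready_now, ready_total = map(int, ready.split("/"))  # e.g. "1/2"
        if status != "Running" or ready_now < ready_total:
            bad.append(name)
    return bad
```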
Troubleshooting
- PVC stuck in Pending: first check that the provisioner is configured correctly.
- istio-token not found, and mysql and many other pods fail to start: check that /etc/kubernetes/manifests/kube-apiserver.yaml was modified as shown below.
- metadata-writer pod does not run properly: if the CPU does not support AVX, TensorFlow-related pods fail (https://www.kubeflow.org/docs/other-guides/troubleshooting/).
- failed for volume "webhook-tls-certs" : secret "webhook-server-tls" not found
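To check the AVX condition on a node, look for the `avx` flag in `/proc/cpuinfo`. A minimal sketch assuming an x86 Linux node; `cpu_has_avx` is a hypothetical helper that takes the file's text, so it can be tested without the real file. Inside a VM, run it in the guest, since the hypervisor's CPU model decides which flags the guest sees.

```python
def cpu_has_avx(cpuinfo_text: str) -> bool:
    """True if any 'flags' line of /proc/cpuinfo lists the exact 'avx' flag."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Compare whole tokens so that 'avx2' or 'avx512f' alone do not match.
        if key.strip() == "flags" and "avx" in value.split():
            return True
    return False


# Usage on a node:
#   with open("/proc/cpuinfo") as f:
#       print(cpu_has_avx(f.read()))
```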
Add a flag to the kube-apiserver command
[root@master01 kf-test]# cat /etc/kubernetes/manifests/kube-apiserver.yaml
(omitted)
spec:
  containers:
  - command:
    (omitted)
    - --feature-gates=TokenRequest=true
    (omitted)
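kube-apiserver runs as a static pod, so the kubelet recreates it after the manifest is saved. To confirm the flag was written as intended, here is a sketch that collects `--feature-gates` entries from the manifest text; `feature_gates` is a hypothetical helper and only understands the `Name=true,Other=false` form.

```python
from typing import Dict

_FLAG = "--feature-gates="


def feature_gates(manifest_text: str) -> Dict[str, bool]:
    """Collect every Name=bool pair passed via --feature-gates in a manifest."""
    gates: Dict[str, bool] = {}
    for line in manifest_text.splitlines():
        if line.strip().startswith("#"):
            continue  # skip commented-out lines
        idx = line.find(_FLAG)
        if idx == -1:
            continue
        value = line[idx + len(_FLAG):].strip().strip("\"'")
        for pair in value.split(","):
            name, _, flag = pair.partition("=")
            if name.strip():
                gates[name.strip()] = flag.strip().lower() == "true"
    return gates
```

For the manifest above, `feature_gates(text).get("TokenRequest")` should be `True`.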
Accessing the UI
Default User / Password
user : admin@kubeflow.org
password : 12341234
Check the service port
[root@master01 kf-test]# kubectl get svc -n istio-system istio-ingressgateway
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
istio-ingressgateway NodePort 10.233.19.51 <none> 15020:32701/TCP,80:31380/TCP,443:31390/TCP,31400:31400/TCP,15029:30498/TCP,15030:30087/TCP,15031:32566/TCP,15032:31453/TCP,15443:30333/TCP 24m
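The NodePort to use is the one mapped to service port 80 in the PORT(S) column (here `80:31380/TCP`). A sketch that extracts it from that column; `node_port_for` is a hypothetical helper. kubectl can also return the value directly with a JSONPath filter, for example `kubectl get svc -n istio-system istio-ingressgateway -o jsonpath='{.spec.ports[?(@.port==80)].nodePort}'`.

```python
from typing import Optional


def node_port_for(ports_column: str, port: int) -> Optional[int]:
    """Return the NodePort mapped to `port` in a PORT(S) value like '80:31380/TCP,...'."""
    for entry in ports_column.split(","):
        mapping, _, _proto = entry.partition("/")         # "80:31380", "TCP"
        service_port, sep, node_port = mapping.partition(":")
        if sep and int(service_port) == port:              # entries without ':' have no NodePort
            return int(node_port)
    return None
```

With the PORT(S) value above, `node_port_for(ports, 80)` gives `31380`, which is the port used in the URL below.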
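Access
Open http://<server-ip or domain>:31380 in a browser (31380 is the NodePort mapped to port 80 of istio-ingressgateway in the output above).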
Changing the admin password (Dex)
https://artifacthub.io/packages/helm/gabibbo97/dex/1.0.3
https://www.kubeflow.org/docs/started/k8s/kfctl-istio-dex/#add-static-users-for-basic-auth
Export the config file and generate a hash
[root@master01 kubeflow]# kubectl get configmap dex -n auth -o jsonpath='{.data.config\.yaml}' > dex-config.yaml
[root@master01 kubeflow]# yum -y install httpd-tools
[root@master01 kubeflow]# htpasswd -bnBC 10 "usr" testtest | cut -d ':' -f 2 | sed 's/2y/2a/'
$2a$10$ZlsN1af0YJR2EdKRFLBo7OvEUEnNlyU2abqjZzL/Q8qgZApNtRh8a
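The `sed 's/2y/2a/'` step turns the `$2y$` prefix that htpasswd emits into `$2a$`. A stricter variant touches only the prefix and first checks the overall bcrypt shape (`$2a$`, `$2b$` or `$2y$`, a two-digit cost, then 53 characters of salt and hash). A sketch; `to_2a` is a hypothetical helper and does not generate hashes itself.

```python
import re

# bcrypt modular-crypt format: $2<a|b|y>$<2-digit cost>$<22-char salt + 31-char hash>
_BCRYPT = re.compile(r"\$2[aby]\$\d{2}\$[./A-Za-z0-9]{53}")


def to_2a(hash_value: str) -> str:
    """Rewrite a '$2y$' bcrypt prefix to '$2a$', leaving the rest unchanged."""
    if _BCRYPT.fullmatch(hash_value) is None:
        raise ValueError("not a bcrypt hash")
    if hash_value.startswith("$2y$"):
        return "$2a$" + hash_value[4:]
    return hash_value
```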
Replace the hash value
[root@master01 kubeflow]# cat dex-config.yaml
issuer: http://dex.auth.svc.cluster.local:5556/dex
storage:
  type: kubernetes
  config:
    inCluster: true
web:
  http: 0.0.0.0:5556
logger:
  level: "debug"
  format: text
oauth2:
  skipApprovalScreen: true
enablePasswordDB: true
staticPasswords:
- email: admin@kubeflow.org
  #hash: $2y$12$ruoM7FqXrpVgaol44eRZW.4HWS8SAvg6KYVVSCIwKQPBmTpCm.EeO
  hash: $2a$10$ZlsN1af0YJR2EdKRFLBo7OvEUEnNlyU2abqjZzL/Q8qgZApNtRh8a
  username: admin
  userID: 08a8684b-db88-4b73-90a9-3cd1661f5466
staticClients:
- id: kubeflow-oidc-authservice
  redirectURIs: ["/login/oidc"]
  name: 'Dex Login Application'
  secret: pUBnBOY80SnXgjibTYM9ZWNzY2xreNGQok
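Instead of editing the file by hand, the active `hash:` line of one `staticPasswords` entry can be swapped with a short script. A line-based sketch that assumes exactly the layout above (each `- email:` item followed by its indented keys; commented lines are left untouched); `set_static_password_hash` is a hypothetical helper, and a YAML library would be more robust for other layouts.

```python
from typing import List


def set_static_password_hash(config_text: str, email: str, new_hash: str) -> str:
    """Replace the active `hash:` line of the staticPasswords entry for `email`."""
    lines: List[str] = config_text.splitlines()
    in_entry = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("- "):
            # A new list item starts; track whether it is the target entry.
            in_entry = stripped == f"- email: {email}"
        elif line[:1] not in ("", " ", "\t"):
            in_entry = False  # a top-level key ends the current list
        elif in_entry and stripped.startswith("hash:"):
            indent = line[: len(line) - len(line.lstrip())]
            lines[i] = f"{indent}hash: {new_hash}"
            return "\n".join(lines) + "\n"
    raise ValueError(f"no hash line found for {email}")
```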
Apply and verify the changes
[root@master01 kubeflow]# kubectl create configmap dex --from-file=config.yaml=dex-config.yaml -n auth --dry-run -oyaml | kubectl apply -f -
W0119 17:21:02.902797 29658 helpers.go:553] --dry-run is deprecated and can be replaced with --dry-run=client.
configmap/dex configured
[root@master01 kubeflow]# kubectl rollout restart deployment dex -n auth
deployment.apps/dex restarted
[root@master01 kubeflow]# kubectl get deployments.apps -n auth
NAME READY UP-TO-DATE AVAILABLE AGE
dex 1/1 1 1 63m
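The warning in the output above recommends `--dry-run=client` in place of the deprecated bare `--dry-run`. A sketch that scripts the same apply-and-restart sequence with the updated flag; the helper names are hypothetical, it assumes `kubectl` is on PATH, and it adds `kubectl rollout status` (not part of the original steps) to wait until the restarted Dex deployment is available.

```python
import subprocess
from typing import List


def dex_update_commands(config_path: str = "dex-config.yaml",
                        namespace: str = "auth") -> List[List[str]]:
    """Build the kubectl commands that push dex-config.yaml and restart Dex."""
    render = ["kubectl", "create", "configmap", "dex",
              f"--from-file=config.yaml={config_path}", "-n", namespace,
              "--dry-run=client", "-o", "yaml"]
    apply_cmd = ["kubectl", "apply", "-f", "-"]
    restart = ["kubectl", "rollout", "restart", "deployment", "dex", "-n", namespace]
    status = ["kubectl", "rollout", "status", "deployment", "dex", "-n", namespace]
    return [render, apply_cmd, restart, status]


def run(commands: List[List[str]]) -> None:
    """Pipe the rendered ConfigMap into kubectl apply, then restart and wait."""
    render, apply_cmd, restart, status = commands
    manifest = subprocess.run(render, capture_output=True, text=True, check=True).stdout
    subprocess.run(apply_cmd, input=manifest, text=True, check=True)
    subprocess.run(restart, check=True)
    subprocess.run(status, check=True)  # blocks until the new Dex pod is available
```

Building the commands separately from running them keeps the argument lists easy to review before anything touches the cluster.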
Log in with the new password