---
sidebar_position: 4
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Set Up with Kubernetes

This section provides a quick guide to using SeaTunnel with Kubernetes.

## Prerequisites

We assume that you have local installations of the following:

- [docker](https://docs.docker.com/)
- [kubernetes](https://kubernetes.io/)
- [helm](https://helm.sh/docs/intro/quickstart/)

With these installed, the `kubectl` and `helm` commands are available on your local system.

We use [minikube](https://minikube.sigs.k8s.io/docs/start/) to run Kubernetes locally. At the time of writing, we use Kubernetes version v1.23.3. You can start a cluster with the following command:

```bash
minikube start --kubernetes-version=v1.23.3
```
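
Before starting the cluster, it can help to confirm that the required command-line tools are on your `PATH`. The following is a minimal sketch; `missing_tools` is a hypothetical helper name introduced here for illustration and is not part of SeaTunnel, minikube, or Helm.

```bash
# Print the name of every tool that cannot be found on PATH.
# missing_tools is an illustrative helper, not part of any tool listed above.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage: an empty result means every prerequisite was found.
# missing_tools docker kubectl helm minikube
```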

## Installation

### SeaTunnel docker image

To run the image with SeaTunnel, first create a `Dockerfile`:

<Tabs
  groupId="engine-type"
  defaultValue="flink"
  values={[
    {label: 'Flink', value: 'flink'},
  ]}>
<TabItem value="flink">

```Dockerfile
FROM flink:1.13

ENV SEATUNNEL_VERSION="2.3.0"
# Install location inside the image (no spaces around "=")
ENV SEATUNNEL_HOME="/opt/seatunnel"

RUN mkdir -p $SEATUNNEL_HOME

# Download and unpack the SeaTunnel binary release
RUN wget https://archive.apache.org/dist/seatunnel/${SEATUNNEL_VERSION}/apache-seatunnel-incubating-${SEATUNNEL_VERSION}-bin.tar.gz
RUN tar -xzvf apache-seatunnel-incubating-${SEATUNNEL_VERSION}-bin.tar.gz

# Copy the release into SEATUNNEL_HOME, then remove the archive and unpacked directory
RUN cp -r apache-seatunnel-incubating-${SEATUNNEL_VERSION}/* $SEATUNNEL_HOME/
RUN rm -rf apache-seatunnel-incubating-${SEATUNNEL_VERSION}*
```

Then run the following command to build the image:

```bash
docker build -t seatunnel:2.3.0-flink-1.13 -f Dockerfile .
```

The image `seatunnel:2.3.0-flink-1.13` needs to be present on the host (minikube) so that the deployment can take place.

Load the image into minikube via:

```bash
minikube image load seatunnel:2.3.0-flink-1.13
```
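
The image tag appears in the build command, the load command, and later in the FlinkDeployment manifest, so the copies can drift apart when a version changes. One way to keep them consistent, sketched below, is to derive the tag from the two version numbers. The `image_tag` function and the `FLINK_VERSION` and `IMAGE` variables are names assumed for this example.

```bash
# Versions used throughout this guide.
SEATUNNEL_VERSION="2.3.0"
FLINK_VERSION="1.13"   # assumed variable name; matches the flink:1.13 base image

# Build the tag in the form used by this guide: seatunnel:<seatunnel>-flink-<flink>.
image_tag() {
  printf 'seatunnel:%s-flink-%s\n' "$1" "$2"
}

IMAGE="$(image_tag "$SEATUNNEL_VERSION" "$FLINK_VERSION")"

# Usage (requires docker and a running minikube cluster):
# docker build -t "$IMAGE" -f Dockerfile .
# minikube image load "$IMAGE"
# minikube image ls | grep -F "$IMAGE"   # confirm the image is visible to minikube
```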

</TabItem>
</Tabs>

### Deploying the operator

<Tabs
  groupId="engine-type"
  defaultValue="flink"
  values={[
    {label: 'Flink', value: 'flink'},
  ]}>
<TabItem value="flink">

The steps below provide a quick walk-through on setting up the Flink Kubernetes Operator.
You can refer to [Flink Kubernetes Operator - Quick Start](https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/try-flink-kubernetes-operator/quick-start/) for more details.

> Notice: All the Kubernetes resources below are created in the default namespace.

Install the certificate manager on your Kubernetes cluster to enable adding the webhook component (only needed once per Kubernetes cluster):

```bash
kubectl create -f https://github.com/jetstack/cert-manager/releases/download/v1.8.2/cert-manager.yaml
```

Now you can deploy the Flink Kubernetes Operator (version 1.3.1 at the time of writing) using the included Helm chart:

```bash
helm repo add flink-operator-repo https://downloads.apache.org/flink/flink-kubernetes-operator-1.3.1/

helm install flink-kubernetes-operator flink-operator-repo/flink-kubernetes-operator \
  --set image.repository=apache/flink-kubernetes-operator
```

You may verify your installation via `kubectl`:

```bash
kubectl get pods
NAME                                         READY   STATUS    RESTARTS      AGE
flink-kubernetes-operator-5f466b8549-mgchb   1/1     Running   3 (23h ago)   16d
```
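
To script this check rather than read the table by eye, the pod listing can be filtered with `awk`. This is a rough sketch that assumes the default column order shown above (`NAME`, `READY`, `STATUS`, ...); `operator_ready` is a hypothetical helper name.

```bash
# Succeed if some flink-kubernetes-operator pod is Running with all containers ready.
# Reads `kubectl get pods` output on stdin; assumes the default column order.
operator_ready() {
  awk '
    $1 ~ /^flink-kubernetes-operator-/ && $3 == "Running" {
      split($2, ready, "/")            # READY looks like "1/1"
      if (ready[1] == ready[2]) ok = 1
    }
    END { exit (ok ? 0 : 1) }
  '
}

# Usage:
# kubectl get pods --no-headers | operator_ready && echo "operator is ready"
```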

</TabItem>
</Tabs>

## Run SeaTunnel Application

**Run Application:** SeaTunnel already provides out-of-the-box [configurations](https://github.com/apache/seatunnel/tree/dev/config).

<Tabs
  groupId="engine-type"
  defaultValue="flink"
  values={[
    {label: 'Flink', value: 'flink'},
  ]}>
<TabItem value="flink">

In this guide we are going to use [seatunnel.streaming.conf](https://github.com/apache/seatunnel/blob/2.3.0-release/config/v2.streaming.conf.template):

```conf
env {
  execution.parallelism = 1
  job.mode = "STREAMING"
  checkpoint.interval = 2000
}

source {
  FakeSource {
    result_table_name = "fake"
    row.num = 160000
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}

transform {
  FieldMapper {
    source_table_name = "fake"
    result_table_name = "fake1"
    field_mapper = {
      age = age
      name = new_name
    }
  }
}

sink {
  Console {
    source_table_name = "fake1"
  }
}
```
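
The config above is organized into top-level `env`, `source`, `transform`, and `sink` blocks. Before the file is packaged for the cluster, a rough sanity check can catch a block that was dropped while copying. The sketch below only looks for the opening line of each named block and is not a real config parser; `has_blocks` is a hypothetical helper name.

```bash
# Succeed only if FILE contains an opening line "<name> {" for every block name given.
# This is a plain text match, not a config parser.
has_blocks() {
  file="$1"
  shift
  for block in "$@"; do
    grep -Eq "^[[:space:]]*${block}[[:space:]]*\{" "$file" || {
      echo "missing block: ${block}" >&2
      return 1
    }
  done
}

# Usage:
# has_blocks seatunnel.streaming.conf env source transform sink
```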

Create a ConfigMap named `seatunnel-config` in Kubernetes from `seatunnel.streaming.conf` so that the config content can be mounted into the pod:

```bash
kubectl create cm seatunnel-config \
  --from-file=seatunnel.streaming.conf=seatunnel.streaming.conf
```

Once the Flink Kubernetes Operator is running as seen in the previous steps, you are ready to submit a Flink (SeaTunnel) job:

- Create the `seatunnel-flink.yaml` FlinkDeployment manifest:

```yaml
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: seatunnel-flink-streaming-example
spec:
  image: seatunnel:2.3.0-flink-1.13
  flinkVersion: v1_13
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
  serviceAccount: flink
  jobManager:
    replicas: 1
    resource:
      memory: "1024m"
      cpu: 1
  taskManager:
    resource:
      memory: "1024m"
      cpu: 1
  podTemplate:
    spec:
      containers:
        - name: flink-main-container
          volumeMounts:
            - name: seatunnel-config
              mountPath: /data/seatunnel.streaming.conf
              subPath: seatunnel.streaming.conf
      volumes:
        - name: seatunnel-config
          configMap:
            name: seatunnel-config
            items:
              - key: seatunnel.streaming.conf
                path: seatunnel.streaming.conf
  job:
    jarURI: local:///opt/seatunnel/starter/seatunnel-flink-starter.jar
    entryClass: org.apache.seatunnel.core.starter.flink.SeaTunnelFlink
    args: ["--config", "/data/seatunnel.streaming.conf"]
    parallelism: 2
    upgradeMode: stateless
```

- Run the example application:

```bash
kubectl apply -f seatunnel-flink.yaml
```

</TabItem>
</Tabs>

**See The Output**

<Tabs
  groupId="engine-type"
  defaultValue="flink"
  values={[
    {label: 'Flink', value: 'flink'},
  ]}>
<TabItem value="flink">

After a successful startup (which can take on the order of a minute in a fresh environment, and seconds afterwards), you can follow the logs of your job:

```bash
kubectl logs -f deploy/seatunnel-flink-streaming-example
```

The output looks like the following:

```shell
...
2023-01-31 12:13:54,349 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: SeaTunnel FakeSource -> Sink Writer: Console (1/1) (1665d2d011b2f6cf6525c0e5e75ec251) switched from SCHEDULED to DEPLOYING.
2023-01-31 12:13:56,684 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Deploying Source: SeaTunnel FakeSource -> Sink Writer: Console (1/1) (attempt #0) with attempt id 1665d2d011b2f6cf6525c0e5e75ec251 to seatunnel-flink-streaming-example-taskmanager-1-1 @ 100.103.244.106 (dataPort=39137) with allocation id fbe162650c4126649afcdaff00e46875
2023-01-31 12:13:57,794 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: SeaTunnel FakeSource -> Sink Writer: Console (1/1) (1665d2d011b2f6cf6525c0e5e75ec251) switched from DEPLOYING to INITIALIZING.
2023-01-31 12:13:58,203 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: SeaTunnel FakeSource -> Sink Writer: Console (1/1) (1665d2d011b2f6cf6525c0e5e75ec251) switched from INITIALIZING to RUNNING.
```

If an OOM error occurs in the log, you can decrease the `row.num` value in `seatunnel.streaming.conf`.
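
Lowering `row.num` means editing the local file, refreshing the ConfigMap built from it, and resubmitting the job, because the job reads the copy mounted from the ConfigMap. A sketch of that loop follows; `set_row_num` is a hypothetical helper, and `16000` is only an example value, not a recommended setting.

```bash
# Rewrite the "row.num = N" line in a config file, keeping its indentation.
# Goes through a temporary file so it works with both GNU and BSD sed.
set_row_num() {
  file="$1"
  rows="$2"
  tmp="$(mktemp)" || return 1
  sed -E "s/^([[:space:]]*row\.num[[:space:]]*=[[:space:]]*)[0-9]+/\1${rows}/" "$file" > "$tmp" \
    && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Usage (16000 is an arbitrary example value):
# set_row_num seatunnel.streaming.conf 16000
# kubectl create cm seatunnel-config \
#   --from-file=seatunnel.streaming.conf=seatunnel.streaming.conf \
#   --dry-run=client -o yaml | kubectl apply -f -
# kubectl delete -f seatunnel-flink.yaml && kubectl apply -f seatunnel-flink.yaml
```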

To expose the Flink Dashboard you may add a port-forward rule:

```bash
kubectl port-forward svc/seatunnel-flink-streaming-example-rest 8081
```

Now the Flink Dashboard is accessible at [localhost:8081](http://localhost:8081).

Alternatively, launch `minikube dashboard` for a web-based Kubernetes user interface.

To view the content printed in the TaskManager stdout log, run:

```bash
kubectl logs \
-l 'app in (seatunnel-flink-streaming-example), component in (taskmanager)' \
--tail=-1 \
-f
```

The output looks like the following (your content may differ, since we use `FakeSource` to automatically generate random stream data):

```shell
...
subtaskIndex=0: row=159991 : VVgpp, 978840000
subtaskIndex=0: row=159992 : JxrOC, 1493825495
subtaskIndex=0: row=159993 : YmCZR, 654146216
subtaskIndex=0: row=159994 : LdmUn, 643140261
subtaskIndex=0: row=159995 : tURkE, 837012821
subtaskIndex=0: row=159996 : uPDfd, 2021489045
subtaskIndex=0: row=159997 : mjrdG, 2074957853
subtaskIndex=0: row=159998 : xbeUi, 864518418
subtaskIndex=0: row=159999 : sSWLb, 1924451911
subtaskIndex=0: row=160000 : AuPlM, 1255017876
```
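
Each line in the sample output carries a `row=` counter, so a quick way to see how far the job has progressed is to pull out the most recent value. The sketch below assumes the line format shown above; `last_row` is a hypothetical helper name.

```bash
# Print the row number from the last "row=<n>" line read on stdin.
last_row() {
  sed -n 's/.*row=\([0-9][0-9]*\).*/\1/p' | tail -n 1
}

# Usage:
# kubectl logs \
#   -l 'app in (seatunnel-flink-streaming-example), component in (taskmanager)' \
#   --tail=-1 | last_row
```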

To stop your job and delete your FlinkDeployment, run:

```bash
kubectl delete -f seatunnel-flink.yaml
```

</TabItem>
</Tabs>

Happy SeaTunneling!

## What's More

You have now had a quick look at SeaTunnel. See [connector](/category/connector) to find all the sources and sinks that SeaTunnel supports.
Or see [deployment](../deployment.mdx) if you want to submit your application to another kind of engine cluster.