
[CONNECTOR] Add script to start hdfs fuse #1480

Open · wants to merge 4 commits into main
Conversation

chaohengstudent (Collaborator) commented Feb 5, 2023

related to #1438

How to start:

./docker/start_hdfs_fuse.sh hdfs://192.168.103.44:12345

Then enter the container to operate on HDFS.
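
For illustration, entering the container and touching the mount might look like this (the container name hdfs-fuse and the mount point /mnt/hdfs are placeholders I picked, not names taken from the script):

# Hypothetical container name and mount point; adjust to whatever start_hdfs_fuse.sh creates.
docker exec -it hdfs-fuse bash
ls /mnt/hdfs                          # browse HDFS through the FUSE mount
echo hello > /mnt/hdfs/tmp/hello.txt  # write through the mount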


03/12 update
Hadoop Jira: https://issues.apache.org/jira/browse/HDFS-16930
Current status:
During testing, the script needs the following jar files:

hadoop-common
commons-collections
hadoop-shaded-guava
hadoop-hdfs-client
woodstox-core
commons-compress
slf4j-api
commons-logging
commons-lang3
stax2-api

I am still working out how these jar dependencies need to be wired in.
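
As a rough sketch of one way those jars could be wired in (assuming they end up under $HADOOP_HOME/share/hadoop, which is my assumption rather than something stated above), they can be appended to CLASSPATH before fuse_dfs starts:

# Sketch: put every jar under $HADOOP_HOME/share/hadoop on the classpath.
# The directory is an assumption; point it at wherever the jars actually live.
while IFS= read -r -d '' jar; do
  export CLASSPATH="$CLASSPATH:$jar"
done < <(find "$HADOOP_HOME/share/hadoop" -name '*.jar' -print0)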

RUN echo \"user_allow_other\" >> /etc/fuse.conf

WORKDIR /opt/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs
RUN sed -i -e '18aexport CLASSPATH=\\\${HADOOP_HOME}/etc/hadoop:\`find \\\${HADOOP_HOME}/share/hadoop/ | awk '\"'\"'{path=path\":\"\\\$0}END{print path}'\"'\"'\`' \\
Contributor commented:

This part is quite complex; could you explain it?

&& ./b2 --without-python \\
&& ./b2 --without-python install
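
Once the layers of shell escaping in the sed line above are unwound, the line it appends after line 18 of fuse_dfs_wrapper.sh comes out roughly as follows (my reconstruction, not text from the PR):

# Reconstructed, de-escaped form of the injected line: CLASSPATH becomes the
# Hadoop config dir plus every path under share/hadoop, joined with colons.
export CLASSPATH=${HADOOP_HOME}/etc/hadoop:`find ${HADOOP_HOME}/share/hadoop/ | awk '{path=path":"$0}END{print path}'`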

ENV JAVA_HOME /usr/lib/jvm/java-11-openjdk-amd64
Contributor commented:

When the JDK is installed on Ubuntu, JAVA_HOME should already be exposed; do we really need to set it again here?
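
A quick way to check this inside the build container (paths shown are illustrative) is:

# See whether JAVA_HOME is already set and where the JDK actually lives.
echo "JAVA_HOME=${JAVA_HOME:-<unset>}"
readlink -f "$(command -v java)"   # e.g. /usr/lib/jvm/java-11-openjdk-amd64/bin/java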

chia7712 (Contributor) commented Feb 8, 2023

@chaohengstudent Also, to build the Hadoop native library we can reuse the build script Hadoop already provides (https://github.com/apache/hadoop/blob/trunk/start-build-env.sh).

That would greatly simplify our own script, which would only need to do a few things:

  1. Download the Hadoop source code and check out a specific version
  2. Build the native library with the Hadoop script
  3. Build a docker image that packages the output of step 2

What do you think?
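
A minimal sketch of those three steps might look like the following (the release version, maven flags, and image tag are placeholders, and it assumes start-build-env.sh forwards its arguments into the build container):

#!/usr/bin/env bash
set -e
VERSION=3.3.4                 # placeholder release
SRC=/tmp/hadoop               # placeholder checkout location

# 1. Download the Hadoop source code and check out a specific version.
[ -d "$SRC" ] || git clone https://github.com/apache/hadoop "$SRC"
cd "$SRC" && git checkout "rel/release-$VERSION"

# 2. Build the native library with Hadoop's own build-env script
#    (the -Pnative profile builds libhdfs and fuse_dfs).
./start-build-env.sh mvn clean package -Pnative -DskipTests \
  -pl hadoop-hdfs-project/hadoop-hdfs-native-client -am

# 3. Package the output of step 2 into our own docker image
#    (dockerfile path and tag are placeholders).
docker build -t astraea/hdfs-fuse:"$VERSION" -f docker/hdfs_fuse.dockerfile "$SRC"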

chaohengstudent (Collaborator, Author) replied:

Build the native library with the Hadoop script

Do you mean building the native library on the user's local machine via that script?

chia7712 (Contributor) commented Feb 9, 2023

Do you mean building the native library on the user's local machine via that script?

Hadoop's official script also builds the native library in a container. Its container comes with all the required packages pre-installed; the hadoop checkout on the host is mounted into the container and built there, so the resulting native library ends up on the host.

We can reuse that script: it makes the native library easier to obtain, and the maintenance effort stays with the Hadoop community rather than with us. Our own script, though, has to make sure the host has git, the Hadoop source code, a maven repo, and so on, and after the native library is built it has to package those artifacts into our own docker image.
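
A sketch of the host-side preconditions that implies (names and paths are illustrative):

# Verify host prerequisites before delegating to Hadoop's start-build-env.sh.
command -v git >/dev/null    || { echo "git is required" >&2; exit 1; }
command -v docker >/dev/null || { echo "docker is required" >&2; exit 1; }
# start-build-env.sh mounts the local maven repo into the build container,
# so make sure it exists on the host first.
mkdir -p "$HOME/.m2"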

Comment on lines +105 to +120
function generateFuseDfsWrapper() {
cat > "$FUSE_DFS_WRAPPER_SH" << 'EOF'
#!/usr/bin/env bash

export FUSEDFS_PATH="$HADOOP_HOME/hadoop-hdfs-project/hadoop-hdfs-native-client/target/main/native/fuse-dfs"
export LIBHDFS_PATH="$HADOOP_HOME/hadoop-hdfs-project/hadoop-hdfs-native-client/target/native/target/usr/local/lib"
export PATH=$FUSEDFS_PATH:$PATH
export LD_LIBRARY_PATH=$LIBHDFS_PATH:$JAVA_HOME/lib/server
while IFS= read -r -d '' file
do
export CLASSPATH=$CLASSPATH:$file
done < <(find "$HADOOP_HOME/hadoop-tools" -name "*.jar" -print0)

fuse_dfs "$@"
EOF
}
Collaborator (Author) commented:

fuse_dfs_wrapper.sh may not work out of box. 
To use it, look at all the paths in fuse_dfs_wrapper.sh and either correct them 
or set them in your environment before running.

Original version: https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_dfs_wrapper.sh

This is how it is modified for now.
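
For context, the wrapper is invoked like fuse_dfs itself; a typical mount (namenode address and mount point are illustrative) would be:

# Mount HDFS over FUSE via the wrapper; -d keeps it in the foreground with debug output.
./fuse_dfs_wrapper.sh dfs://192.168.103.44:12345 /mnt/hdfs -d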

Contributor commented:

What error did you observe?

Collaborator (Author) commented:

Paths when building version 3.3.4:

1. LIBHDFS_PATH

#origin
export LIBHDFS_PATH="$HADOOP_HOME/hadoop-hdfs-project/hadoop-hdfs-native-client/target/usr/local/lib"
#edit
export LIBHDFS_PATH="$HADOOP_HOME/hadoop-hdfs-project/hadoop-hdfs-native-client/target/native/target/usr/local/lib"

2. LD_LIBRARY_PATH

#origin
export LD_LIBRARY_PATH=$LIBHDFS_PATH:$JAVA_HOME/jre/lib/$OS_ARCH/server
#edit (JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64)
export LD_LIBRARY_PATH=$LIBHDFS_PATH:$JAVA_HOME/lib/server

3. CLASSPATH

The original script added the jars under (a) $HADOOP_HOME/hadoop-client (in this version the path is $HADOOP_HOME/hadoop-client-modules) and (b) $HADOOP_HOME/hadoop-hdfs-project; in practice, the jars under $HADOOP_HOME/hadoop-tools have to be added instead.

Contributor commented:

These fixes are getting fairly extensive; they are probably better contributed back to the Hadoop community.

Could you try opening this issue on the Hadoop Jira and providing a patch?

Once the community has fixed it, this PR can switch to pulling the trunk branch and building from that.

cloneSrcIfNeed
cd $HADOOP_SRC_PATH
git checkout rel/release-${VERSION}
replaceLine 17 USER=\$\(whoami\) start-build-env.sh
Collaborator (Author) commented:

This is meant to keep the USER variable in ./start-build-env.sh from being overridden by the USER set in docker_build_common.sh, which would make the build fail.
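
Assuming replaceLine is a sed-style helper from this repo's scripts (its definition is not shown here), the call above should be roughly equivalent to:

# Overwrite line 17 of start-build-env.sh so USER is derived from whoami
# instead of being inherited from the caller's environment (my reconstruction).
sed -i '17c\USER=$(whoami)' start-build-env.sh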

chia7712 (Contributor) commented Mar 9, 2023

@chaohengstudent Please add the link to the Hadoop Jira issue to the PR description, along with a note on its current status.
