  • Installing Hadoop on Synology Docker
    etc. 2020. 12. 20. 16:12

    To study Hadoop, the distributed system for processing large data sets, I went through an installation.

    Environment: Synology Docker - Ubuntu container
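
    The steps below run inside an Ubuntu container. A minimal sketch of creating one from the shell (the container name "hadoop" matches the docker commit step later; the ubuntu:20.04 tag is an assumption, and the Synology Docker UI offers the same):

    sudo docker pull ubuntu:20.04
    # start an interactive container named "hadoop" (assumed name)
    sudo docker run -it --name hadoop ubuntu:20.04 /bin/bash

    Inside the container: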
    # refresh package lists and update the base image
    apt-get update
    apt-get upgrade
    
    # provides add-apt-repository
    apt-get install software-properties-common
    
    # OpenJDK PPA (Hadoop 3.3 runs on Java 8)
    add-apt-repository ppa:openjdk-r/ppa 
    
    apt-get update
    
    apt-get install openjdk-8-jdk
    java -version

     

    With the base environment and Java installed, create the hadoop user, make a hadoop home folder, then download and unpack the release.

    sudo adduser hadoop
    passwd -d hadoop    # the -d (delete password) option goes before the username
    
    su hadoop
    
    cd ~ 
    mkdir hadoop 
    cd hadoop
    
    wget https://downloads.apache.org/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz
    tar xvzf hadoop-3.3.0.tar.gz 
    
    vi ~/.bashrc 
    export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
    export HADOOP_HOME=~/hadoop/hadoop-3.3.0
    export HADOOP_CONFIG_HOME=$HADOOP_HOME/etc/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin
    export PATH=$PATH:$HADOOP_HOME/sbin 
    source ~/.bashrc 
    
    cd $HADOOP_HOME/ 
    
    mkdir tmp 
    mkdir namenode 
    mkdir datanode 
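
    As a quick sanity check, the PATH changes from .bashrc can be verified at this point:

    # should report Hadoop 3.3.0
    hadoop version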

     

    With the namenode and datanode folders in place, start on the Hadoop configuration.

    cd $HADOOP_CONFIG_HOME/ 
    
    vi hadoop-env.sh
    # add
    export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
    export HDFS_NAMENODE_USER="hadoop"
    export HDFS_DATANODE_USER="hadoop"
    export HDFS_SECONDARYNAMENODE_USER="hadoop"
    export YARN_RESOURCEMANAGER_USER="hadoop"
    export YARN_NODEMANAGER_USER="hadoop"
    export HADOOP_SSH_OPTS="-p 22"
    
    vi core-site.xml
    # add
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
            <property>
                    <name>hadoop.tmp.dir</name>
                    <value>/home/hadoop/hadoop/hadoop-3.3.0/tmp</value>
            </property>
    
            <!-- fs.defaultFS is the current name of the deprecated fs.default.name -->
            <property>
                    <name>fs.defaultFS</name>
                    <value>hdfs://{host}:9000</value>
                    <final>true</final>
            </property>
    </configuration>
    vi mapred-site.xml
    # add
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
    
        <property>
            <name>mapred.job.tracker</name>
            <value>{host}:9001</value>
        </property>
        
        <property>
            <name>dfs.http.address</name>
            <value>{host}:9870</value>
        </property>
    </configuration>
    vi hdfs-site.xml
    # add
    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>3</value>
            <final>true</final>
        </property>
    
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>/home/hadoop/hadoop/hadoop-3.3.0/namenode</value>
            <final>true</final>
        </property>
    
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>/home/hadoop/hadoop/hadoop-3.3.0/datanode</value>
            <final>true</final>
        </property>
    </configuration>
    vi yarn-site.xml
    # add
    <configuration>
    
    <!-- Site specific YARN configuration properties -->
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.resourcemanager.address</name>
        <value>{host}:8032</value>
      </property>
      <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>{host}:8030</value>
      </property>
      <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>{host}:8031</value>
      </property>
      <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>{host}:8033</value>
      </property>
      <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>{host}:8088</value>
      </property>
    </configuration>

     

     

    Stop the container and save it as a Docker image.

    sudo docker commit hadoop hadoop-img

     

    Create sub containers from the committed image and link them to the main container, as sketched below.
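
    A rough shell equivalent of that step (the names slave1 and slave2 are assumptions chosen to match the hosts entries below; Synology's UI exposes the same create-from-image and link options):

    sudo docker run -dit --name slave1 --link hadoop:hadoop hadoop-img
    sudo docker run -dit --name slave2 --link hadoop:hadoop hadoop-img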

     

    vi /etc/hosts
    
    # add
    {container IP}      slave1
    {container IP}      slave2
    
    
    vi $HADOOP_CONFIG_HOME/slaves 
    # datanode containers (Hadoop 3 reads the workers file below; slaves is the Hadoop 2 name)
    slave1
    slave2
    hadoop
    
    
    vi $HADOOP_CONFIG_HOME/workers
    localhost
    slave1
    slave2
    hadoop

     

    If running Hadoop with start-all.sh fails with a pdsh@hadoop: hadoop: connect: Connection refused error,

    make the following change:

    vi $HADOOP_HOME/libexec/hadoop-functions.sh
    
    # change
    if [[ -e '/usr/bin/pdsh' ]]; then
    
    ->
    if [[ ! -e '/usr/bin/pdsh' ]]; then
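
    An alternative workaround that avoids patching the script (a common suggestion, not from the original post) is to make pdsh use ssh:

    # add to ~/.bashrc before running start-all.sh
    export PDSH_RCMD_TYPE=ssh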

     

    hdfs namenode -format    # one-time format; re-running it wipes the namenode metadata

     

    Next, set up SSH access.

    <root account>
    apt-get install -y openssh-server openssh-client net-tools
    
    <hadoop account>
    ssh-keygen -t rsa -b 4096 -C "{email Address}" -f ~/.ssh/id_rsa
    
    cd ~/.ssh
    cat id_rsa.pub >> authorized_keys
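
    If key-based login still asks for a password later on, loose permissions on ~/.ssh are the usual cause; tightening them is a common fix (not in the original post):

    chmod 700 ~/.ssh
    chmod 600 ~/.ssh/authorized_keys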
    
    <root account>
    mkdir -p /var/run/sshd
    sed -i 's/#Port 22/Port 22/g' /etc/ssh/sshd_config
    
    vi /etc/ssh/sshd_config
    # modify
    PermitRootLogin yes #prohibit-password
    PasswordAuthentication yes
    UseLogin yes
    
    sudo passwd root
    
    <hadoop account>
    #test
    ssh localhost
    
    <root account>
    service ssh start
    (run this on the sub nodes as well)
    vi ~/.bashrc
    
    #autorun  
    /usr/sbin/sshd
    
    source ~/.bashrc

     

    start-all.sh 
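
    Once start-all.sh finishes, jps (bundled with the JDK) shows which daemons came up; on the combined main node you would expect roughly NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager:

    jps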

     

     

    Browsing to {host address}:8088 brings up the YARN Resource Manager page.

     

    Note that from version 3.x on, the default port of the Hadoop admin page changed from 50070 to 9870.

     

    To test the cluster, I ran the WordCount example.

    cd $HADOOP_HOME
    
    hadoop fs -mkdir -p /test
    hadoop fs -put LICENSE.txt /test
    hadoop fs -ls /test
    
    hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar wordcount /test /test-out
    hadoop fs -cat /test-out/*
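
    One caveat not covered in the original walkthrough: on Hadoop 3.x this example job can fail with "Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster". The commonly cited fix is to point the MapReduce runtime at the install directory in mapred-site.xml:

    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=/home/hadoop/hadoop/hadoop-3.3.0</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=/home/hadoop/hadoop/hadoop-3.3.0</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=/home/hadoop/hadoop/hadoop-3.3.0</value>
    </property>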