
1、Download kevinweil-hadoop-lzo
2、rpm -ivh liblzo2_2-2.03-6.el4.x86_64.rpm
3、rpm -ivh libminilzo2-2.03-6.el4.x86_64.rpm
4、rpm -ivh lzo-2.03-6.el4.x86_64.rpm
5、rpm -ivh lzo-devel-2.03-6.el4.i386.rpm
6、cd kevinweil-hadoop-lzo
     ant compile-native tar
7、Go into the build directory and copy hadoop-lzo-0.4.0.jar into Hadoop's lib directory; copy the native directory into Hadoop's lib directory as well.
Edit core-site.xml and add:
        <property>
                <name>io.compression.codecs</name>
                <value>com.hadoop.compression.lzo.LzopCodec</value>
        </property>
        <property>
                <name>io.compression.codec.lzo.class</name>
                <value>com.hadoop.compression.lzo.LzoCodec</value>
        </property>
To compress the intermediate map output with LZO as well, edit mapred-site.xml and add:
        <property>
                <name>mapred.compress.map.output</name>
                <value>true</value>
        </property>
        <property>
                <name>mapred.map.output.compression.codec</name>
                <value>com.hadoop.compression.lzo.LzoCodec</value>
        </property>
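
Copying the jar and the native directory into Hadoop's lib directory can be scripted. The following is only a rough sketch: `install_hadoop_lzo` is a hypothetical helper name, not something shipped with hadoop-lzo, and it assumes the ant build left `hadoop-lzo-0.4.0.jar` and a `native` directory directly under the build directory, as described above.

```shell
#!/bin/sh
# Sketch only: install_hadoop_lzo is a hypothetical helper, not part of hadoop-lzo.
# Usage: install_hadoop_lzo <hadoop-lzo build dir> <hadoop home>
install_hadoop_lzo() {
    build_dir=$1
    hadoop_home=$2

    # Fail early if the ant build did not produce the expected artifacts.
    [ -f "$build_dir/hadoop-lzo-0.4.0.jar" ] || {
        echo "jar not found in $build_dir" >&2
        return 1
    }
    [ -d "$build_dir/native" ] || {
        echo "native directory not found in $build_dir" >&2
        return 1
    }

    mkdir -p "$hadoop_home/lib" || return 1
    # Copy the jar and the whole native directory into Hadoop's lib directory.
    cp "$build_dir/hadoop-lzo-0.4.0.jar" "$hadoop_home/lib/" || return 1
    cp -R "$build_dir/native" "$hadoop_home/lib/" || return 1
}
```

It would be called as, for example, `install_hadoop_lzo build /usr/local/hadoop`, where `/usr/local/hadoop` is an assumed install location. The copy has to be repeated on every node of the cluster, followed by a restart of the Hadoop daemons so the new codec settings are picked up.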

8、Install lzop
     rpm -ivh --force lzo-1.08-4.2.el4.rf.x86_64.rpm
     rpm -ivh lzop-1.01-2.el4.rf.x86_64.rpm
9、Indexing LZO Files
Compress the log files with lzop, then upload them to HDFS.
Index a file in-process via:
hadoop jar /path/to/your/hadoop-lzo.jar com.hadoop.compression.lzo.LzoIndexer big_file.lzo

Or index it in a MapReduce job via:
hadoop jar /path/to/your/hadoop-lzo.jar com.hadoop.compression.lzo.DistributedLzoIndexer big_file.lzo
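
The compress, upload, and index sequence can be wrapped in a small script. This is a sketch under stated assumptions: `run` and `lzo_upload_and_index` are hypothetical helper names, `DRY_RUN` is a made-up switch that prints the commands instead of executing them, and the jar path is the same placeholder used in the commands above.

```shell
#!/bin/sh
# Sketch only: run and lzo_upload_and_index are hypothetical helpers.
# HADOOP_LZO_JAR defaults to the placeholder path used in the text.
# Set DRY_RUN=1 to print the commands instead of running them.
HADOOP_LZO_JAR=${HADOOP_LZO_JAR:-/path/to/your/hadoop-lzo.jar}

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: lzo_upload_and_index <local log file> <HDFS directory>
lzo_upload_and_index() {
    local_file=$1
    hdfs_dir=$2
    lzo_name=$(basename "$local_file").lzo

    # lzop writes <file>.lzo next to the input and keeps the original.
    run lzop "$local_file" || return 1
    run hadoop fs -put "$local_file.lzo" "$hdfs_dir/" || return 1
    # In-process indexer; DistributedLzoIndexer could be used instead for large inputs.
    run hadoop jar "$HADOOP_LZO_JAR" com.hadoop.compression.lzo.LzoIndexer "$hdfs_dir/$lzo_name"
}
```

Setting `DRY_RUN=1` before calling `lzo_upload_and_index /var/log/app/access.log /logs` (both paths are made up for illustration) prints the three commands, which is a cheap way to check the paths before touching HDFS.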

Finally, in the job, change TextInputFormat to LzoTextInputFormat so that the indexed .lzo files can be split across map tasks.


Author: Jock
