Hadoop Notes

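Installing Hadoop

I mainly followed this Chinese-language tutorial: [Hadoop安装教程_单机/伪分布式配置_Hadoop2.6.0/Ubuntu14.04](http://www.powerxing.com/install-hadoop/) (Hadoop installation tutorial: standalone and pseudo-distributed configuration, Hadoop 2.6.0 on Ubuntu 14.04).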

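Using hadoop fs

It works much like file operations in a Linux shell: ls, mkdir, rm, and cat all exist as subcommands. The difference is two extra commands, get and put, which move files between the local disk and the Hadoop file system.

put: copy a local file into the Hadoop file system.

get: copy a file out of the Hadoop file system to the local disk.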


hadoop fs -ls
hadoop fs -mkdir hello
hadoop fs -put /etc/passwd hello
hadoop fs -ls
hadoop fs -ls -h
hadoop fs -ls -R
hadoop fs -cat hello/passwd
hadoop fs -get hello/passwd
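
The same commands can also be driven from a script. Below is a minimal sketch, assuming the `hadoop` executable is on the PATH; the helper names `fs_command` and `run_fs` are hypothetical, made up for this illustration and not part of any Hadoop API. The `-p` flag for `-mkdir` and the `-f` flag for `-put` are not used in the commands above; they are added here only so the script can be run more than once.

```python
import shutil
import subprocess


def fs_command(*args):
    """Build the argv list for a `hadoop fs` call, e.g. fs_command("-ls")."""
    return ["hadoop", "fs", *args]


def run_fs(*args):
    """Run `hadoop fs <args>` and return its standard output as text."""
    result = subprocess.run(
        fs_command(*args),
        check=True,                # raise CalledProcessError on a non-zero exit
        stdout=subprocess.PIPE,
        universal_newlines=True,   # decode the output to str
    )
    return result.stdout


if __name__ == "__main__":
    # Only talk to the file system when the hadoop executable is available.
    if shutil.which("hadoop"):
        run_fs("-mkdir", "-p", "hello")               # -p: no error if it exists
        run_fs("-put", "-f", "/etc/passwd", "hello")  # -f: overwrite if present
        print(run_fs("-cat", "hello/passwd"))
    else:
        print("hadoop not found on PATH; nothing to do")
```

Building the argument list separately from running it keeps the wrapper easy to check without a running cluster.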

Experimenting with MapReduce

The experiment uses Hadoop Streaming.

Writing an Hadoop MapReduce Program in Python

The “trick” behind the Python mapper and reducer is that they rely on the Hadoop Streaming API to pass data between the Map and Reduce code via STDIN (standard input) and STDOUT (standard output).
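
These notes do not include the contents of mapper.py and reducer.py, so here is a minimal word-count sketch of what such a pair might look like. It is an assumption, not the original scripts. The function names `map_lines`, `reduce_lines`, and `simulate` are hypothetical. `simulate` stands in for Hadoop locally: it sorts the mapper output by key, which is roughly what Streaming does between the two steps.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one "word<TAB>1" record per whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield "{}\t1".format(word)


def reduce_lines(sorted_lines):
    """Reducer: sum the counts of consecutive records that share a key.

    Hadoop Streaming sorts the mapper output by key before the reducer
    reads it, so all records for one word arrive next to each other.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield "{}\t{}".format(word, total)


def simulate(lines):
    """Local stand-in for the job: map, sort by key (the shuffle), reduce."""
    return list(reduce_lines(sorted(map_lines(lines))))


# In a real Streaming job each step lives in its own script:
#   mapper.py  would print every record of map_lines(sys.stdin)
#   reducer.py would print every record of reduce_lines(sys.stdin)
```

Calling `simulate` on a few lines of text is a quick way to check the logic before submitting the job to the cluster.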


hadoop jar ./share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar \
    -file /home/hadoop/mapper.py -mapper /home/hadoop/mapper.py \
    -file /home/hadoop/reducer.py -reducer /home/hadoop/reducer.py \
    -input hello/* -output hello/gutenberg-output

References

A Guide to Python Frameworks for Hadoop
