# Installation Environment
- Hadoop version: 1.1.2
- Virtual machine: VirtualBox 4.3.8.0
- Server OS: Ubuntu Server 13.10 x64
- Java: OpenJDK 7
# Installation Steps
## Environment Setup
- Install the system
  - Install the virtual machine
  - Install the operating system
- Switch the system software sources
Switch the software source to the oschina mirror (the default source is too slow).
```
sudo cp /etc/apt/sources.list /etc/apt/sources.list.backup
sudo vi /etc/apt/sources.list
```
Paste in the following lines:
```
deb http://mirrors.oschina.net/ubuntu/ saucy main restricted universe multiverse
deb http://mirrors.oschina.net/ubuntu/ saucy-backports main restricted universe multiverse
deb http://mirrors.oschina.net/ubuntu/ saucy-proposed main restricted universe multiverse
deb http://mirrors.oschina.net/ubuntu/ saucy-security main restricted universe multiverse
deb http://mirrors.oschina.net/ubuntu/ saucy-updates main restricted universe multiverse
deb-src http://mirrors.oschina.net/ubuntu/ saucy main restricted universe multiverse
deb-src http://mirrors.oschina.net/ubuntu/ saucy-backports main restricted universe multiverse
deb-src http://mirrors.oschina.net/ubuntu/ saucy-proposed main restricted universe multiverse
deb-src http://mirrors.oschina.net/ubuntu/ saucy-security main restricted universe multiverse
deb-src http://mirrors.oschina.net/ubuntu/ saucy-updates main restricted universe multiverse
```
Save and exit.
Finally, update the package index:
sudo apt-get update
- Install the JDK
sudo apt-get install openjdk-7-jdk
After installation you can run `java -version` to check that it succeeded. The installation directory is /usr/lib/jvm/java-7-openjdk-amd64.
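A quick way to confirm the JDK is in place (a sketch; the exact version and build printed will differ):
```
java -version
# prints something along the lines of:
#   java version "1.7.0_xx"
#   OpenJDK Runtime Environment (IcedTea ...)
ls /usr/lib/jvm/java-7-openjdk-amd64   # confirm the installation directory exists
```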
- Install the SSH service
Install the SSH server:
sudo apt-get install ssh openssh-server
Configure passwordless login to localhost:
```
ssh-keygen -t rsa -P ""
cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
```
Test it:
ssh localhost
## Create a User
Create a new group and user dedicated to running Hadoop.
- Create the hadoop group
sudo addgroup hadoop4group
- Create the Hadoop user
sudo adduser --ingroup hadoop4group hadoop4user
- Grant privileges to the hadoop user by editing the /etc/sudoers file
sudo vi /etc/sudoers
Below the line `root ALL=(ALL:ALL) ALL`, add:
hadoop4user ALL=(ALL:ALL) ALL
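To confirm the new account can actually use sudo, a quick check along these lines should work (a sketch, run from any login shell):
```
su - hadoop4user     # switch to the new user
sudo -v              # asks for hadoop4user's password and succeeds if the sudoers entry is correct
```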
## Hadoop Installation
- Extract Hadoop into /usr/local
```
sudo cp hadoop-1.1.2.tar.gz /usr/local/
cd /usr/local
sudo tar -zxf hadoop-1.1.2.tar.gz
sudo mv hadoop-1.1.2 hadoop
```
- Set ownership of the hadoop directory
sudo chown -R hadoop4user:hadoop4group hadoop
- Configure conf/hadoop-env.sh by adding the JDK path
sudo vi hadoop/conf/hadoop-env.sh
Add the following line:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
- Open conf/core-site.xml
sudo vi hadoop/conf/core-site.xml
Change it to the following:
```
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.1.121:9000</value>
  </property>
</configuration>
```
- Open conf/mapred-site.xml
sudo vi hadoop/conf/mapred-site.xml
Change it to the following:
```
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.1.121:9001</value>
  </property>
</configuration>
```
- Open conf/hdfs-site.xml
sudo vi hadoop/conf/hdfs-site.xml
Change it to the following:
```
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop4user/hadoop</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop4user/hadoop/data</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop4user/hadoop/name</value>
  </property>
</configuration>
```
Note: create the directories referenced in this configuration and set their permissions to 755; a sketch is shown below.
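For example, assuming the paths above and the hadoop4user/hadoop4group account created earlier (adjust to your own setup), the directories could be prepared like this:
```
sudo mkdir -p /home/hadoop4user/hadoop/data /home/hadoop4user/hadoop/name
sudo chown -R hadoop4user:hadoop4group /home/hadoop4user/hadoop   # ownership for the hadoop user (assumption)
sudo chmod -R 755 /home/hadoop4user/hadoop                        # permissions required by the note above
```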
# Running Hadoop
- Enter the Hadoop directory and format the HDFS filesystem
```
cd /usr/local/hadoop/
bin/hadoop namenode -format
```
- Start Hadoop
bin/start-all.sh
- Check that Hadoop is running with the jps tool
```
$ jps
4590 TaskTracker
4368 JobTracker
4270 SecondaryNameNode
4642 Jps
4028 DataNode
3801 NameNode
```
- You can also use the netstat command to check whether Hadoop is running normally
```
$ sudo netstat -plten | grep java
tcp   0   0 0.0.0.0:50070     0.0.0.0:*   LISTEN   1001   9236   2471/java
tcp   0   0 0.0.0.0:50010     0.0.0.0:*   LISTEN   1001   9998   2628/java
tcp   0   0 0.0.0.0:48159     0.0.0.0:*   LISTEN   1001   8496   2628/java
tcp   0   0 0.0.0.0:53121     0.0.0.0:*   LISTEN   1001   9228   2857/java
tcp   0   0 127.0.0.1:54310   0.0.0.0:*   LISTEN   1001   8143   2471/java
tcp   0   0 127.0.0.1:54311   0.0.0.0:*   LISTEN   1001   9230   2857/java
tcp   0   0 0.0.0.0:59305     0.0.0.0:*   LISTEN   1001   8141   2471/java
tcp   0   0 0.0.0.0:50060     0.0.0.0:*   LISTEN   1001   9857   3005/java
tcp   0   0 0.0.0.0:49900     0.0.0.0:*   LISTEN   1001   9037   2785/java
tcp   0   0 0.0.0.0:50030     0.0.0.0:*   LISTEN   1001   9773   2857/java
```
- You can also inspect the startup logs under the logs directory
tail -f logs/..log
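In Hadoop 1.x the daemon logs are typically named hadoop-&lt;user&gt;-&lt;daemon&gt;-&lt;hostname&gt;.log. For example, to follow the NameNode log (the file name here is illustrative):
```
tail -f /usr/local/hadoop/logs/hadoop-hadoop4user-namenode-$(hostname).log
```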
- Check in a browser whether startup succeeded (the NameNode web UI listens on port 50070 and the JobTracker web UI on port 50030, as seen in the netstat output above)
# Remote Hadoop Development from Eclipse on Windows
The Eclipse plugin was found through a Google search.
- Install the plugin: copy the downloaded jar into Eclipse's plugins directory
- Configure the Hadoop installation directory
This installation directory is simply the directory where your downloaded Hadoop distribution is stored.
- Switch to the "Map/Reduce" perspective
From the "Window" menu choose "Open Perspective -> Other"; in the dialog that pops up, select the "Map/Reduce" option to switch to the "Map/Reduce" perspective.
- Configure Map/Reduce Locations in Eclipse
Click to create a new Hadoop location.
When configuring the Hadoop location you can refer to the plugin's XML configuration file under the Eclipse workspace at .metadata\.plugins\org.apache.hadoop.eclipse\locations:
The file lists every Hadoop property in effect for the location; most entries are the stock Hadoop 1.1.2 defaults. The values specific to this setup are the plugin's own settings and the cluster addresses and paths:
```
eclipse.plug-in.location.name       hadoop4Ubuntu
eclipse.plug-in.namenode.host       192.168.1.127
eclipse.plug-in.namenode.port       9000
eclipse.plug-in.jobtracker.host     192.168.1.127
eclipse.plug-in.jobtracker.port     9001
eclipse.plug-in.user.name           user4hadoop
eclipse.plug-in.masters.colocate    yes
eclipse.plug-in.socks.proxy.enable  no
eclipse.plug-in.socks.proxy.host    host
eclipse.plug-in.socks.proxy.port    1080
fs.default.name                     hdfs://192.168.1.127:9000/
mapred.job.tracker                  192.168.1.127:9001
hadoop.tmp.dir                      /home/user4hadoop/hadoop
dfs.name.dir                        /home/user4hadoop/hadoop/name
dfs.data.dir                        /home/user4hadoop/hadoop/data
mapred.local.dir                    /home/user4hadoop/hadoop/tmp/mapred/local
mapred.system.dir                   /home/user4hadoop/hadoop/tmp/mapred/system
```
At this point Eclipse should be able to connect to Hadoop remotely. You can check whether the connection succeeded:
Try uploading a file. If you get an error such as: org.apache.hadoop.security.AccessControlException: Permission denied: user=administrator, access=EXECUTE, inode="job_201111031322_0003":heipark:supergroup:rwx-.
check whether conf/hdfs-site.xml sets the dfs.permissions property to false (it defaults to true); see the Hadoop installation and configuration steps above.
Note: when filling in the configuration files under conf, do not use localhost; use the machine's LAN IP address instead.
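Independently of Eclipse, a quick sanity check that HDFS accepts uploads can be run from the server shell (a sketch; the file and directory names are arbitrary):
```
cd /usr/local/hadoop
bin/hadoop fs -mkdir /test          # create a test directory in HDFS
bin/hadoop fs -put /etc/hosts /test/  # upload any small local file
bin/hadoop fs -ls /test             # the uploaded file should be listed
```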