Hadoop Pseudo-Distributed Installation
Published: 2019-06-25



# Installation Environment

  • Hadoop version: 1.1.2
  • Virtual machine: VirtualBox 4.3.8.0
  • Server OS: Ubuntu Server 13.10 x64
  • Java: OpenJDK 7

# Installation Steps

## Environment Setup

  1. Install the system
  • Install the virtual machine

  • Install the operating system

  • Switch the system's software sources

    Switch the APT sources to the oschina mirror (the default sources are too slow):

    sudo cp /etc/apt/sources.list /etc/apt/sources.list.backup
    sudo vi /etc/apt/sources.list

    Paste in the following entries:

    deb http://mirrors.oschina.net/ubuntu/ saucy main restricted universe multiverse
    deb http://mirrors.oschina.net/ubuntu/ saucy-backports main restricted universe multiverse
    deb http://mirrors.oschina.net/ubuntu/ saucy-proposed main restricted universe multiverse
    deb http://mirrors.oschina.net/ubuntu/ saucy-security main restricted universe multiverse
    deb http://mirrors.oschina.net/ubuntu/ saucy-updates main restricted universe multiverse
    deb-src http://mirrors.oschina.net/ubuntu/ saucy main restricted universe multiverse
    deb-src http://mirrors.oschina.net/ubuntu/ saucy-backports main restricted universe multiverse
    deb-src http://mirrors.oschina.net/ubuntu/ saucy-proposed main restricted universe multiverse
    deb-src http://mirrors.oschina.net/ubuntu/ saucy-security main restricted universe multiverse
    deb-src http://mirrors.oschina.net/ubuntu/ saucy-updates main restricted universe multiverse

    Save and exit.

    Finally, refresh the package index:

    sudo apt-get update
  2. Install the JDK

    sudo apt-get install openjdk-7-jdk

    Once installation finishes, run java -version to confirm it succeeded. The installation directory is /usr/lib/jvm/java-7-openjdk-amd64 (see the troubleshooting sketch at the end of this section).

  3. Install the SSH service

    Install openssh-server:

    sudo apt-get install ssh openssh-server

    Configure passwordless login to localhost:

    ssh-keygen -t rsa -P ""
    cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys

    Test it (if it still prompts for a password, see the sketch below):

    ssh localhost
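
If java -version or ssh localhost misbehaves, the following sketch covers two common fixes: exporting JAVA_HOME (assuming the OpenJDK path above) and tightening the ~/.ssh permissions that sshd insists on. Treat it as an optional checklist, not a required install step.

    # Make JAVA_HOME available in every shell (path from the OpenJDK install above)
    echo 'export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64' >> ~/.bashrc
    source ~/.bashrc

    # sshd ignores keys whose files are too permissive; these are the expected modes
    chmod 700 ~/.ssh
    chmod 600 ~/.ssh/authorized_keys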

## Creating a User

Create a dedicated group and user for operating Hadoop.

  1. Create the hadoop group

    sudo addgroup hadoop4group
  2. Create the Hadoop user

    sudo adduser --ingroup hadoop4group hadoop4user
  3. Grant the hadoop user sudo rights by editing /etc/sudoers

    sudo vi /etc/sudoers

    Below the line root ALL=(ALL:ALL) ALL, add:

    hadoop4user ALL=(ALL:ALL) ALL
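
As an optional aside, a safer way to make this edit is through visudo, which syntax-checks /etc/sudoers before saving; a malformed sudoers file can lock you out of sudo entirely.

    # visudo opens /etc/sudoers in an editor and validates it on save
    sudo visudo
    # then add, below the root entry:
    #   hadoop4user ALL=(ALL:ALL) ALL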

## Installing Hadoop

  1. Unpack Hadoop into /usr/local

    sudo cp hadoop-1.1.2.tar.gz /usr/local/
    cd /usr/local
    sudo tar -zxf hadoop-1.1.2.tar.gz
    sudo mv hadoop-1.1.2 hadoop
  2. Set ownership of the hadoop directory

    sudo chown -R hadoop4user:hadoop4group hadoop
  3. Configure conf/hadoop-env.sh by adding the JDK path

    sudo vi hadoop/conf/hadoop-env.sh

    Add the following line:

    export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
  4. Open conf/core-site.xml

    sudo vi hadoop/conf/core-site.xml

    and change it to:

    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://192.168.1.121:9000</value>
      </property>
    </configuration>

  5. Open conf/mapred-site.xml

    sudo vi hadoop/conf/mapred-site.xml

    and change it to:

    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>192.168.1.121:9001</value>
      </property>
    </configuration>

  6. Open conf/hdfs-site.xml

    sudo vi hadoop/conf/hdfs-site.xml

    and change it to:

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
      <property>
        <name>dfs.permissions</name>
        <value>false</value>
      </property>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop4user/hadoop</value>
      </property>
      <property>
        <name>dfs.data.dir</name>
        <value>/home/hadoop4user/hadoop/data</value>
      </property>
      <property>
        <name>dfs.name.dir</name>
        <value>/home/hadoop4user/hadoop/name</value>
      </property>
    </configuration>

    Note: create the directories referenced above and set their permissions to 755 (see the sketch below).
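
A minimal sketch of creating those directories, using the paths and the hadoop4user account from the configuration above:

    # Create the HDFS data and name directories referenced in hdfs-site.xml
    sudo mkdir -p /home/hadoop4user/hadoop/data /home/hadoop4user/hadoop/name
    # Hand the tree to the hadoop user and apply the 755 permissions the note calls for
    sudo chown -R hadoop4user:hadoop4group /home/hadoop4user/hadoop
    sudo chmod -R 755 /home/hadoop4user/hadoop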

# Running Hadoop

  1. Enter the Hadoop directory and format the HDFS filesystem

    cd /usr/local/hadoop/
    bin/hadoop namenode -format
  2. Start Hadoop

    bin/start-all.sh
  3. Check that Hadoop is running with the jps tool

    $ jps
    4590 TaskTracker
    4368 JobTracker
    4270 SecondaryNameNode
    4642 Jps
    4028 DataNode
    3801 NameNode
  4. You can also use netstat to verify that Hadoop's daemons are listening

    $ sudo netstat -plten | grep java
    tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN 1001 9236 2471/java
    tcp 0 0 0.0.0.0:50010 0.0.0.0:* LISTEN 1001 9998 2628/java
    tcp 0 0 0.0.0.0:48159 0.0.0.0:* LISTEN 1001 8496 2628/java
    tcp 0 0 0.0.0.0:53121 0.0.0.0:* LISTEN 1001 9228 2857/java
    tcp 0 0 127.0.0.1:54310 0.0.0.0:* LISTEN 1001 8143 2471/java
    tcp 0 0 127.0.0.1:54311 0.0.0.0:* LISTEN 1001 9230 2857/java
    tcp 0 0 0.0.0.0:59305 0.0.0.0:* LISTEN 1001 8141 2471/java
    tcp 0 0 0.0.0.0:50060 0.0.0.0:* LISTEN 1001 9857 3005/java
    tcp 0 0 0.0.0.0:49900 0.0.0.0:* LISTEN 1001 9037 2785/java
    tcp 0 0 0.0.0.0:50030 0.0.0.0:* LISTEN 1001 9773 2857/java
  5. You can inspect the startup logs under the logs directory

    tail -f conf/..log
  6. Check in the browser whether startup succeeded: the NameNode web UI listens on port 50070 and the JobTracker UI on port 50030 (both visible in the netstat output above). A command-line smoke test follows this list.
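
As a quick smoke test beyond the browser check, the sketch below writes a directory into HDFS and lists it back; a failure here points at the daemons or the configuration above. The /test path is just an arbitrary example.

    # From /usr/local/hadoop: create a directory in HDFS and confirm it shows up
    bin/hadoop fs -mkdir /test
    bin/hadoop fs -ls /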

# Remote Hadoop Development from Eclipse on Windows

The Eclipse plugin was found through a Google search.

  1. Install the plugin: copy the downloaded jar into Eclipse's plugins directory.

  2. Configure the Hadoop installation directory

[Screenshot: Hadoop plugin configuration]

This installation directory is simply the folder where your downloaded Hadoop distribution lives.

  3. Switch to the Map/Reduce perspective

From the Window menu choose Open Perspective -> Other; in the dialog that pops up, select Map/Reduce to switch to the Map/Reduce perspective.

[Screenshot: switching to the Map/Reduce perspective]

  4. Configure the Map/Reduce Locations in Eclipse

[Screenshot: the new-location dialog]

Click "New Hadoop location".

[Screenshot: the General tab]

[Screenshot: the Advanced parameters tab]

When configuring the Hadoop location, you can refer to the plugin's XML configuration file under .metadata\.plugins\org.apache.hadoop.eclipse\locations in your Eclipse workspace. Its properties are reproduced below:

| Property | Value |
| --- | --- |
| fs.s3n.impl | org.apache.hadoop.fs.s3native.NativeS3FileSystem |
| mapreduce.job.counters.max | 120 |
| mapred.task.cache.levels | 2 |
| dfs.client.use.datanode.hostname | false |
| hadoop.tmp.dir | /home/user4hadoop/hadoop |
| hadoop.native.lib | true |
| map.sort.class | org.apache.hadoop.util.QuickSort |
| dfs.namenode.decommission.nodes.per.interval | 5 |
| dfs.https.need.client.auth | false |
| ipc.client.idlethreshold | 4000 |
| dfs.datanode.data.dir.perm | 755 |
| mapred.system.dir | /home/user4hadoop/hadoop/tmp/mapred/system |
| mapred.job.tracker.persist.jobstatus.hours | 0 |
| dfs.datanode.address | 0.0.0.0:50010 |
| dfs.namenode.logging.level | info |
| dfs.block.access.token.enable | false |
| io.skip.checksum.errors | false |
| fs.default.name | hdfs://192.168.1.127:9000/ |
| mapred.cluster.reduce.memory.mb | -1 |
| mapred.child.tmp | ./tmp |
| fs.har.impl.disable.cache | true |
| dfs.safemode.threshold.pct | 0.999f |
| mapred.skip.reduce.max.skip.groups | 0 |
| dfs.namenode.handler.count | 10 |
| dfs.blockreport.initialDelay | 0 |
| mapred.heartbeats.in.second | 100 |
| mapred.tasktracker.dns.nameserver | default |
| io.sort.factor | 10 |
| mapred.task.timeout | 600000 |
| mapred.max.tracker.failures | 4 |
| hadoop.rpc.socket.factory.class.default | org.apache.hadoop.net.StandardSocketFactory |
| mapred.job.tracker.jobhistory.lru.cache.size | 5 |
| fs.hdfs.impl | org.apache.hadoop.hdfs.DistributedFileSystem |
| eclipse.plug-in.jobtracker.port | 9001 |
| dfs.namenode.stale.datanode.interval | 30000 |
| dfs.block.access.key.update.interval | 600 |
| mapred.skip.map.auto.incr.proc.count | true |
| mapreduce.job.complete.cancel.delegation.tokens | true |
| io.mapfile.bloom.size | 1048576 |
| mapreduce.reduce.shuffle.connect.timeout | 180000 |
| dfs.safemode.extension | 30000 |
| mapred.jobtracker.blacklist.fault-timeout-window | 180 |
| tasktracker.http.threads | 40 |
| mapred.job.shuffle.merge.percent | 0.66 |
| fs.ftp.impl | org.apache.hadoop.fs.ftp.FTPFileSystem |
| dfs.namenode.kerberos.internal.spnego.principal | ${dfs.web.authentication.kerberos.principal} |
| mapred.output.compress | false |
| io.bytes.per.checksum | 512 |
| mapred.combine.recordsBeforeProgress | 10000 |
| mapred.healthChecker.script.timeout | 600000 |
| topology.node.switch.mapping.impl | org.apache.hadoop.net.ScriptBasedMapping |
| dfs.https.server.keystore.resource | ssl-server.xml |
| mapred.reduce.slowstart.completed.maps | 0.05 |
| dfs.namenode.safemode.min.datanodes | 0 |
| mapred.reduce.max.attempts | 4 |
| mapreduce.ifile.readahead.bytes | 4194304 |
| fs.ramfs.impl | org.apache.hadoop.fs.InMemoryFileSystem |
| dfs.block.access.token.lifetime | 600 |
| dfs.name.edits.dir | /home/user4hadoop/hadoop/name |
| mapred.skip.map.max.skip.records | 0 |
| mapred.cluster.map.memory.mb | -1 |
| hadoop.security.group.mapping | org.apache.hadoop.security.ShellBasedUnixGroupsMapping |
| mapred.job.tracker.persist.jobstatus.dir | /jobtracker/jobsInfo |
| dfs.block.size | 67108864 |
| fs.s3.buffer.dir | ${hadoop.tmp.dir}/s3 |
| job.end.retry.attempts | 0 |
| fs.file.impl | org.apache.hadoop.fs.LocalFileSystem |
| dfs.datanode.max.xcievers | 4096 |
| mapred.local.dir.minspacestart | 0 |
| mapred.output.compression.type | RECORD |
| dfs.datanode.ipc.address | 0.0.0.0:50020 |
| dfs.permissions | true |
| topology.script.number.args | 100 |
| mapreduce.job.counters.groups.max | 50 |
| io.mapfile.bloom.error.rate | 0.005 |
| mapred.cluster.max.reduce.memory.mb | -1 |
| mapred.max.tracker.blacklists | 4 |
| mapred.task.profile.maps | 0-2 |
| dfs.datanode.https.address | 0.0.0.0:50475 |
| mapred.userlog.retain.hours | 24 |
| dfs.secondary.http.address | 0.0.0.0:50090 |
| dfs.namenode.replication.work.multiplier.per.iteration | 2 |
| dfs.replication.max | 512 |
| mapred.job.tracker.persist.jobstatus.active | false |
| hadoop.security.authorization | false |
| local.cache.size | 10737418240 |
| eclipse.plug-in.jobtracker.host | 192.168.1.127 |
| dfs.namenode.delegation.token.renew-interval | 86400000 |
| mapred.min.split.size | 0 |
| mapred.map.tasks | 2 |
| mapred.child.java.opts | -Xmx200m |
| eclipse.plug-in.user.name | user4hadoop |
| dfs.https.client.keystore.resource | ssl-client.xml |
| mapred.job.queue.name | default |
| dfs.https.address | 0.0.0.0:50470 |
| mapred.job.tracker.retiredjobs.cache.size | 1000 |
| dfs.balance.bandwidthPerSec | 1048576 |
| ipc.server.listen.queue.size | 128 |
| dfs.namenode.invalidate.work.pct.per.iteration | 0.32f |
| job.end.retry.interval | 30000 |
| mapred.inmem.merge.threshold | 1000 |
| mapred.skip.attempts.to.start.skipping | 2 |
| mapreduce.tasktracker.outofband.heartbeat.damper | 1000000 |
| hadoop.security.use-weak-http-crypto | false |
| fs.checkpoint.dir | /home/user4hadoop/hadoop/dfs/namesecondary |
| mapred.reduce.tasks | 1 |
| mapred.merge.recordsBeforeProgress | 10000 |
| mapred.userlog.limit.kb | 0 |
| mapred.job.reduce.memory.mb | -1 |
| dfs.max.objects | 0 |
| webinterface.private.actions | false |
| hadoop.security.token.service.use_ip | true |
| io.sort.spill.percent | 0.80 |
| mapred.job.shuffle.input.buffer.percent | 0.70 |
| eclipse.plug-in.socks.proxy.port | 1080 |
| dfs.datanode.dns.nameserver | default |
| mapred.map.tasks.speculative.execution | true |
| hadoop.http.authentication.type | simple |
| hadoop.util.hash.type | murmur |
| dfs.blockreport.intervalMsec | 3600000 |
| mapred.map.max.attempts | 4 |
| mapreduce.job.acl-view-job |  |
| mapreduce.ifile.readahead | true |
| dfs.client.block.write.retries | 3 |
| mapred.job.tracker.handler.count | 10 |
| mapreduce.reduce.shuffle.read.timeout | 180000 |
| mapred.tasktracker.expiry.interval | 600000 |
| dfs.secondary.namenode.kerberos.internal.spnego.principal | ${dfs.web.authentication.kerberos.principal} |
| dfs.https.enable | false |
| mapred.jobtracker.maxtasks.per.job | -1 |
| mapred.jobtracker.job.history.block.size | 3145728 |
| keep.failed.task.files | false |
| dfs.datanode.use.datanode.hostname | false |
| dfs.datanode.failed.volumes.tolerated | 0 |
| mapred.task.profile.reduces | 0-2 |
| ipc.client.tcpnodelay | false |
| mapred.output.compression.codec | org.apache.hadoop.io.compress.DefaultCodec |
| io.map.index.skip | 0 |
| hadoop.http.authentication.token.validity | 36000 |
| ipc.server.tcpnodelay | false |
| mapred.jobtracker.blacklist.fault-bucket-width | 15 |
| dfs.namenode.delegation.key.update-interval | 86400000 |
| mapred.job.map.memory.mb | -1 |
| dfs.default.chunk.view.size | 32768 |
| hadoop.logfile.size | 10000000 |
| mapred.reduce.tasks.speculative.execution | true |
| mapreduce.tasktracker.outofband.heartbeat | false |
| mapreduce.reduce.input.limit | -1 |
| dfs.datanode.du.reserved | 0 |
| hadoop.security.authentication | simple |
| eclipse.plug-in.socks.proxy.host | host |
| fs.checkpoint.period | 3600 |
| dfs.web.ugi | webuser,webgroup |
| mapred.job.reuse.jvm.num.tasks | 1 |
| mapred.jobtracker.completeuserjobs.maximum | 100 |
| dfs.df.interval | 60000 |
| dfs.data.dir | /home/user4hadoop/hadoop/data |
| mapred.task.tracker.task-controller | org.apache.hadoop.mapred.DefaultTaskController |
| fs.s3.maxRetries | 4 |
| dfs.datanode.dns.interface | default |
| mapred.cluster.max.map.memory.mb | -1 |
| mapreduce.reduce.shuffle.maxfetchfailures | 10 |
| mapreduce.job.acl-modify-job |  |
| dfs.permissions.supergroup | supergroup |
| mapred.local.dir | /home/user4hadoop/hadoop/tmp/mapred/local |
| fs.hftp.impl | org.apache.hadoop.hdfs.HftpFileSystem |
| fs.trash.interval | 0 |
| fs.s3.sleepTimeSeconds | 10 |
| dfs.replication.min | 1 |
| mapred.submit.replication | 10 |
| fs.har.impl | org.apache.hadoop.fs.HarFileSystem |
| mapred.map.output.compression.codec | org.apache.hadoop.io.compress.DefaultCodec |
| hadoop.relaxed.worker.version.check | false |
| mapred.tasktracker.dns.interface | default |
| dfs.namenode.decommission.interval | 30 |
| dfs.http.address | 0.0.0.0:50070 |
| eclipse.plug-in.namenode.port | 9000 |
| dfs.heartbeat.interval | 3 |
| mapred.job.tracker | 192.168.1.127:9001 |
| hadoop.http.authentication.signature.secret.file | ${user.home}/hadoop-http-auth-signature-secret |
| io.seqfile.sorter.recordlimit | 1000000 |
| dfs.name.dir | /home/user4hadoop/hadoop/name |
| mapred.line.input.format.linespermap | 1 |
| mapred.jobtracker.taskScheduler | org.apache.hadoop.mapred.JobQueueTaskScheduler |
| dfs.datanode.http.address | 0.0.0.0:50075 |
| eclipse.plug-in.masters.colocate | yes |
| fs.webhdfs.impl | org.apache.hadoop.hdfs.web.WebHdfsFileSystem |
| mapred.local.dir.minspacekill | 0 |
| dfs.replication.interval | 3 |
| io.sort.record.percent | 0.05 |
| hadoop.http.authentication.kerberos.principal | HTTP/localhost@LOCALHOST |
| fs.kfs.impl | org.apache.hadoop.fs.kfs.KosmosFileSystem |
| mapred.temp.dir | ${hadoop.tmp.dir}/mapred/temp |
| mapred.tasktracker.reduce.tasks.maximum | 2 |
| dfs.replication | 3 |
| eclipse.plug-in.socks.proxy.enable | no |
| fs.checkpoint.edits.dir | /home/user4hadoop/hadoop/dfs/namesecondary |
| mapred.tasktracker.tasks.sleeptime-before-sigkill | 5000 |
| eclipse.plug-in.location.name | hadoop4Ubuntu |
| mapred.job.reduce.input.buffer.percent | 0.0 |
| mapred.tasktracker.indexcache.mb | 10 |
| mapreduce.job.split.metainfo.maxsize | 10000000 |
| mapred.skip.reduce.auto.incr.proc.count | true |
| hadoop.logfile.count | 10 |
| io.seqfile.compress.blocksize | 1000000 |
| fs.s3.block.size | 67108864 |
| mapred.tasktracker.taskmemorymanager.monitoring-interval | 5000 |
| hadoop.http.authentication.simple.anonymous.allowed | true |
| mapred.queue.default.state | RUNNING |
| mapred.acls.enabled | false |
| mapreduce.jobtracker.staging.root.dir | /home/user4hadoop/hadoop/tmp/mapred/staging |
| dfs.namenode.check.stale.datanode | false |
| mapred.queue.names | default |
| dfs.access.time.precision | 3600000 |
| fs.hsftp.impl | org.apache.hadoop.hdfs.HsftpFileSystem |
| mapred.task.tracker.http.address | 0.0.0.0:50060 |
| mapred.disk.healthChecker.interval | 60000 |
| mapred.reduce.parallel.copies | 5 |
| io.seqfile.lazydecompress | true |
| eclipse.plug-in.namenode.host | 192.168.1.127 |
| io.sort.mb | 100 |
| ipc.client.connection.maxidletime | 10000 |
| mapred.compress.map.output | false |
| hadoop.security.uid.cache.secs | 14400 |
| mapred.task.tracker.report.address | 127.0.0.1:0 |
| mapred.healthChecker.interval | 60000 |
| ipc.client.kill.max | 10 |
| ipc.client.connect.max.retries | 10 |
| fs.s3.impl | org.apache.hadoop.fs.s3.S3FileSystem |
| hadoop.socks.server | host:1080 |
| mapred.user.jobconf.limit | 5242880 |
| mapreduce.job.counters.group.name.max | 128 |
| mapred.job.tracker.http.address | 0.0.0.0:50030 |
| io.file.buffer.size | 4096 |
| mapred.jobtracker.restart.recover | false |
| io.serializations | org.apache.hadoop.io.serializer.WritableSerialization |
| dfs.datanode.handler.count | 3 |
| mapred.task.profile | false |
| dfs.replication.considerLoad | true |
| jobclient.output.filter | FAILED |
| dfs.namenode.delegation.token.max-lifetime | 604800000 |
| hadoop.http.authentication.kerberos.keytab | ${user.home}/hadoop.keytab |
| mapred.tasktracker.map.tasks.maximum | 2 |
| mapreduce.job.counters.counter.name.max | 64 |
| io.compression.codecs | org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec |
| fs.checkpoint.size | 67108864 |

At this point Eclipse should be able to connect to Hadoop remotely. Note that the eclipse.plug-in.namenode.* and eclipse.plug-in.jobtracker.* entries above mirror fs.default.name and mapred.job.tracker. You can now check whether the connection succeeded:

[Screenshot: browsing the HDFS file tree]

Try uploading a file. If you hit an error such as: org.apache.hadoop.security.AccessControlException: Permission denied: user=administrator, access=EXECUTE, inode="job_201111031322_0003":heipark:supergroup:rwx-.

check whether conf/hdfs-site.xml sets the dfs.permissions property to false (it defaults to true); see the Hadoop installation steps above.

Also note: when filling in the configuration files under conf, do not use localhost; use the machine's LAN IP address instead. A command-line cross-check follows.
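
To separate plugin problems from cluster problems, it can help to try the same upload from the server itself; if this works but the Eclipse upload fails, the issue lies with the client or with permissions. The file and the /test directory are just examples carried over from the smoke test earlier.

    # On the server: upload a file into HDFS and read it back
    bin/hadoop fs -put /etc/hosts /test/hosts
    bin/hadoop fs -cat /test/hosts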

Reposted from: https://my.oschina.net/mercury5/blog/205000
