请稍侯

Hbase初体验

21 July 2015
更多

Hbase是hadoop的一个子项目,基于hdfs,用于处理大数据存储分析。在这个信息爆炸的时代分析运用Hbase是大势所趋。

Apache官网的QuickStart是很好的教材,于是按照quickstart一步步来。首先运行一个Standalone模式的Hbase实例,Ubuntu之类的系统需要配置hosts,官网有介绍。此处略过。

1.在stable目录下载 hbase-0.98.13-hadoop2-bin.tar.gz 注意不是hbase-1.0.1.1-bin.tar之类的。hbase-1.0.1.1-bin.tar估计是没有带hadoop和zookeeper的。

2.解压到合适的位置

3.在conf/hbase-env.sh配置JAVA_HOME路径,如果你的操作系统中已经配置了JAVA_HOME,其实也可以不用设置。Hbase会自动使用系统默认JVM

4.编辑conf/hbase-site.xml文件,设置hbase.rootdir以及zookeeper数据文件存储目录到合适位置,注意不需要自己去创建那个目录。

hbase.rootdir file:///home/testuser/hbase hbase.zookeeper.property.dataDir /home/testuser/zookeeper

5.启动hbas bin/start-hbase.sh 通过jps命令看看,如果能看到Hmaster说明启动成功。

6.启动成功后可以通过 bin/hbase shell 终端连接HBASE可以通过终端创建一些表,增加一些数据具体见Apache官网 QuickStart: http://hbase.apache.org/book.html#quickstart

伪分布式部署: 1.首先需要设置ssh localhost免密码登陆,具体操作步骤: cd 到当前用户目录下.ssh目录,如没有该目录,创建 ssh-keygen -t rsa

Generating public/private rsa key pair.

//输入保存的文件名 Enter file in which to save the key (/Users/myname/.ssh/id_rsa): id_rsa id_rsa already exists.

//如果该文件已存在,会提示是否覆盖,选择是 Overwrite (y/n)? y //这里直接按回车 切记 Enter passphrase (empty for no passphrase): //直接按回车 Enter same passphrase again: //出现以下信息说明已经生成key成功 Your identification has been saved in id_rsa. Your public key has been saved in id_rsa.pub. The key fingerprint is: d7:29:77:09:34:27:e3:1e:c8:f3:a0:03:d8:53:f6:64

以上步骤顺利,执行: cat id_rsa.pub >authorized_keys 然后试试ssh localhost

注意如果是Mac,需要先在偏好设置 > 共享 中勾选允许远程登陆,否则会出现refuse

2.需要下载一个hadoop,我下载的是hadoop-2.7.1.tar 并且伪分布式部署一个hadoop实例, 配置core-site.xml:


<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

hdfs-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
        <name>dfs.replication</name>
        <value>1</value>
  </property>
 </configuration>

启动hdfs, sbin/start-dfs.sh 如果启动正常,配置hbase-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
/**
 *
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
-->
<configuration>
<!--<property>
    <name>hbase.rootdir</name>
    <value>file:///Users/wangchaobo/Desktop/itools/haop/h_data</value>
  </property>
-->
   
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/Users/wangchaobo/Desktop/itools/haop/z_data</value>
  </property>

  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>

  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
</configuration>

注意设置Hbase.rootdir目录为hadoop中hdfs的位置。

然后启动hbase ./start-hbase.sh 启动成功后可以在hadoop hdfs中查看hbase产生的数据 进入hadoop根目录 然后./bin/hadoop fs -ls /hbase可以看到hbase产生的数据

在浏览器访问http://localhost:60010 可以查看当前Hbase状态,如果不能访问,说明启动失败

启动成功后可以试试用java 客户端api去创建表和数据操作这里有个很好的例子,很全面:

http://www.cnblogs.com/zhenjing/p/hbase_example.html

这一篇文章介绍对各个接口的很清晰:

http://blog.csdn.net/lifuxiangcaohui/article/details/39997205

需要注意hbase-site.xml文件需要添加到项目的classpath中(直接复制到项目src目录即可) 在代码中HBaseConfiguration会读取该配置信息,连接Hbase的各个节点。如读取不到则使用默认设置

实际环境分布式部署Apache官网介绍很详细,需要注意用户ssh 免密码登陆