How to Operate on HBase Data from Python

2020-05-19 05:05:34
OfStack

Configuring Thrift

Python uses the thrift package.

The Python IDE I use is PyCharm Community Edition. In the project settings, open Project Interpreter for the relevant project, find the package list, click "+" to add a package, search for hbase-thrift (Python client for HBase Thrift interface), and install it.

Install Thrift on the server side.

Refer to the official website; Thrift can also be installed on your own machine for use from the terminal.

thrift Getting Started

You can also follow the installation method from the HBase example on calling HBase from Python:

First, install thrift

Download thrift. In this case, I'm using thrift-0.7.0-dev.tar.gz

tar xzf thrift-0.7.0-dev.tar.gz
cd thrift-0.7.0-dev
sudo ./configure --with-cpp=no --with-ruby=no
sudo make
sudo make install

Then go to the HBase source package and find the directory

src/main/resources/org/apache/hadoop/hbase/thrift/

In that directory, run:

thrift --gen py Hbase.thrift
mv gen-py/hbase/ /usr/lib/python2.4/site-packages/

(The site-packages path may differ depending on your Python version.)
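
Before moving on, it can help to confirm that both the thrift runtime and the generated hbase package are visible to the interpreter. The following is a minimal sketch; the helper name module_available is hypothetical and is not part of either package.

```python
import importlib


def module_available(name):
    """Return True if `name` can be imported by the current interpreter."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


if __name__ == "__main__":
    # "thrift" is the runtime library; "hbase" is the package generated from Hbase.thrift.
    for name in ("thrift", "hbase"):
        status = "found" if module_available(name) else "missing"
        print("%s: %s" % (name, status))
```

If either module is reported as missing, the package was most likely installed into a different interpreter's site-packages directory than the one running the script.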

Example 1: Getting data


# coding:utf-8

from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
from hbase import Hbase
# from hbase.ttypes import ColumnDescriptor, Mutation, BatchMutation
from hbase.ttypes import *

import csv


def client_conn():
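 # Make socket. Replace the host and port with those of your HBase Thrift server
 # (9090 is the usual default port for the HBase Thrift server).
 transport = TSocket.TSocket('localhost', 9090)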
 # Buffering is critical. Raw sockets are very slow
 transport = TTransport.TBufferedTransport(transport)
 # Wrap in a protocol
 protocol = TBinaryProtocol.TBinaryProtocol(transport)
 # Create a client to use the protocol encoder
 client = Hbase.Client(protocol)
 # Connect!
 transport.open()
 return client

if __name__ == "__main__":

 client = client_conn()

 # r = client.getRowWithColumns('table name', 'row name', ['column name'])
 # print(r[0].columns.get('column name')), type((r[0].columns.get('column name')))

 result = client.getRow("table name","row name")
 data_simple =[]

 # print result[0].columns.items()

 for k, v in result[0].columns.items(): #.keys()
  #data.append((k,v))
  # print type(k),type(v),v.value,,v.timestamp
  data_simple.append((v.timestamp, v.value))

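 # The full (column, cell) dump is disabled: data, writer and csvfile
 # are never defined in this script, so these calls would raise NameError.
 # writer.writerows(data)
 # csvfile.close()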

 csvfile_simple = open("data_xy_simple.csv", "wb")
 writer_simple = csv.writer(csvfile_simple)
 writer_simple.writerow(["timestamp", "value"])
 writer_simple.writerows(data_simple)
 csvfile_simple.close()

 print "finished"

Anyone with basic Python knowledge will recognize that result is a list and that result[0].columns.items() returns the key-value pairs of a dict. You can look up the relevant documentation, or print the variables to inspect their values and types.
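
To make the shape of the result concrete, here is a small sketch that flattens it into rows. The namedtuples TCell and TRowResult below are stand-ins that only mimic the fields used in the example above (columns, value, timestamp), so the snippet runs without an HBase server; the real classes come from hbase.ttypes. The helper name cells_to_rows is hypothetical.

```python
from collections import namedtuple

# Stand-ins mimicking only the fields used above; not the real hbase.ttypes classes.
TCell = namedtuple("TCell", ["value", "timestamp"])
TRowResult = namedtuple("TRowResult", ["columns"])


def cells_to_rows(result):
    """Flatten a getRow-style result into (column, timestamp, value) tuples."""
    rows = []
    for row_result in result:  # result is a list of row results
        # columns is a dict mapping column name -> cell; sort for stable output
        for column, cell in sorted(row_result.columns.items()):
            rows.append((column, cell.timestamp, cell.value))
    return rows


if __name__ == "__main__":
    fake_result = [TRowResult(columns={
        "cf:x": TCell(value="1.5", timestamp=1589864734000),
        "cf:y": TCell(value="2.5", timestamp=1589864735000),
    })]
    for row in cells_to_rows(fake_result):
        print(row)
```

Keeping the column name next to each timestamp and value makes it easier to tell cells apart once they are written to a CSV file.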

Note: transport.open() opens the connection in the program above. Once you are done, call transport.close() to disconnect.
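
One way to make sure the connection is always closed is to wrap open() and close() in a context manager. This is a sketch, not part of the thrift package: open_transport is a hypothetical helper, and it assumes only that the transport object has open() and close() methods, as the buffered transport in the example does.

```python
from contextlib import contextmanager


@contextmanager
def open_transport(transport):
    """Open a transport and close it again, even if the body raises."""
    transport.open()
    try:
        yield transport
    finally:
        transport.close()


# Usage with the objects from the example above (requires a running Thrift server):
#   transport = TTransport.TBufferedTransport(TSocket.TSocket(host, port))
#   client = Hbase.Client(TBinaryProtocol.TBinaryProtocol(transport))
#   with open_transport(transport):
#       result = client.getRow("table name", "row name")
```

With this pattern the caller keeps a reference to the transport, which the client_conn() function above does not return.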

For now, only reading data is covered; other HBase operations will be added in later updates.