How to operate on HBase data from Python
- 2020-05-19 05:05:34
- OfStack
Configuring Thrift
Python uses the thrift package
The Python IDE I use is PyCharm Community Edition. In the project settings, open Project Interpreter for the relevant project, click "+" in the package list, search for hbase-thrift (Python client for HBase Thrift interface), and install the package.
Install Thrift on the server side.
Follow the instructions on the official website; Thrift can also be installed on your local machine for use from the terminal.
Thrift Getting Started
You can also follow the installation method from the HBase example "Calling HBase from Python":
First, install Thrift.
Download Thrift. In this case, I'm using thrift-0.7.0-dev.tar.gz
tar xzf thrift-0.7.0-dev.tar.gz
cd thrift-0.7.0-dev
sudo ./configure --with-cpp=no --with-ruby=no
sudo make
sudo make install
Then go to the HBase source package and find the directory
src/main/resources/org/apache/hadoop/hbase/thrift/
In that directory, run
thrift --gen py Hbase.thrift
mv gen-py/hbase/ /usr/lib/python2.4/site-packages/
(the site-packages path may differ depending on your Python version)
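After these steps, it can help to confirm that both the Thrift runtime and the generated hbase package are importable before writing any client code. The following is a minimal sketch; the helper name missing_modules is hypothetical and not part of thrift or hbase.

```python
import importlib


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# "thrift" is the runtime library; "hbase" is the package generated by
# `thrift --gen py` and moved into site-packages above.
print(missing_modules(["thrift", "hbase"]))
```

An empty list means both packages were found; any name that is printed still needs to be installed or moved onto the Python path.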
Example 1: reading data
# coding:utf-8
import csv

from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
from hbase import Hbase


def client_conn():
    # Make socket. Replace the host and port with those of your HBase Thrift
    # server; 9090 is the default HBase Thrift port.
    transport = TSocket.TSocket('localhost', 9090)
    # Buffering is critical. Raw sockets are very slow
    transport = TTransport.TBufferedTransport(transport)
    # Wrap in a protocol
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    # Create a client to use the protocol encoder
    client = Hbase.Client(protocol)
    # Connect!
    transport.open()
    # Return the transport as well so the caller can close it later
    return client, transport


if __name__ == "__main__":
    client, transport = client_conn()
    # r = client.getRowWithColumns('table name', 'row name', ['column name'])
    # print(r[0].columns.get('column name'))
    result = client.getRow("table name", "row name")
    # All data has been fetched, so disconnect
    transport.close()
    data_simple = []
    # result[0].columns maps each column name (k) to a cell (v)
    for k, v in result[0].columns.items():
        data_simple.append((v.timestamp, v.value))
    # "wb" is the Python 2 mode for csv files;
    # on Python 3 use open("data_xy_simple.csv", "w", newline="")
    csvfile_simple = open("data_xy_simple.csv", "wb")
    writer_simple = csv.writer(csvfile_simple)
    writer_simple.writerow(["timestamp", "value"])
    writer_simple.writerows(data_simple)
    csvfile_simple.close()
    print("finished")
Anyone with basic Python knowledge will recognize that result is a list and that result[0].columns.items() yields the key-value pairs of a dict. You can look up the relevant documentation, or print the variables to inspect their values and types.
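To make that shape concrete without a running HBase server, here is a small sketch that uses stand-in objects. FakeCell and FakeRowResult are hypothetical substitutes for the Thrift-generated classes and mirror only the attributes used above (columns, value, timestamp); the column names and values are made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-ins for the Thrift-generated result classes.
FakeCell = namedtuple("FakeCell", ["value", "timestamp"])
FakeRowResult = namedtuple("FakeRowResult", ["columns"])


def cells_to_rows(result):
    """Flatten a getRow-style result into sorted (column, timestamp, value) tuples."""
    rows = []
    for row_result in result:
        # columns is a dict: column name -> cell
        for column, cell in row_result.columns.items():
            rows.append((column, cell.timestamp, cell.value))
    return sorted(rows)


# Illustrative data only: one row with two columns.
sample = [FakeRowResult(columns={
    "cf:x": FakeCell(value="1.5", timestamp=1589864734000),
    "cf:y": FakeCell(value="2.5", timestamp=1589864734001),
})]
rows = cells_to_rows(sample)
print(rows)
```

The same loop should work on a real getRow result, provided the returned objects expose columns, value and timestamp as they do in the example above.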
Note: the program above opens the connection with transport.open(). Once you are done, call transport.close() to disconnect.
For now only reading data is covered; other HBase operations will be added in later updates.
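One way to make sure the connection is always closed, even when a call fails partway through, is to wrap open() and close() in a context manager. This is a sketch, not part of the thrift library: the helper name opened is hypothetical, and FakeTransport is a stand-in used only to demonstrate the behavior without a server.

```python
from contextlib import contextmanager


@contextmanager
def opened(transport):
    """Open the transport, and close it again even if the body raises."""
    transport.open()
    try:
        yield transport
    finally:
        transport.close()


class FakeTransport(object):
    """Hypothetical stand-in that records calls instead of using a socket."""

    def __init__(self):
        self.events = []

    def open(self):
        self.events.append("open")

    def close(self):
        self.events.append("close")


fake = FakeTransport()
try:
    with opened(fake):
        raise RuntimeError("simulated failure while reading")
except RuntimeError:
    pass
print(fake.events)
```

With a real connection, the buffered transport object created before transport.open() would take the place of FakeTransport, and the getRow calls would go inside the with block.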