Usage problems and points to note when using the HBase Thrift interface

  • 2020-06-03 06:03:26
  • OfStack

HBase provides a Thrift interface to support non-Java languages. Based on experience with the HBase Thrift interface (HBase version 0.92.1), this article summarizes some of the problems encountered and the points that need attention.
1. Byte storage order
In HBase, rows are sorted in lexicographic (byte) order by row key, column family, column qualifier and timestamp. Java clients convert short, int, long and similar values with Bytes.toBytes(...), which produces a byte array in big-endian order (high-order byte at the lowest address, low-order byte at the highest address). The same is true for values. Therefore, when using the Thrift API (C++, PHP, Python, etc.), row keys and values should also be encoded in big-endian order.
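For example, in C++, an int variable is converted to a lexicographically ordered key as follows:

// Needs <cstdint> and <string>.
string key;
int32_t timestamp = 1352563200;
uint32_t u = static_cast<uint32_t>(timestamp);
// Append the most significant byte first (big-endian). Copying the raw memory
// of the int would yield little-endian bytes on x86 hosts.
for (int shift = 24; shift >= 0; shift -= 8)
  key.push_back(static_cast<char>((u >> shift) & 0xFF));

Convert the key back to an int as follows:

// Reassemble the value from big-endian bytes, independent of host byte order and alignment.
const unsigned char* ts = reinterpret_cast<const unsigned char*>(key.data());
uint32_t v = (uint32_t(ts[0]) << 24) | (uint32_t(ts[1]) << 16) | (uint32_t(ts[2]) << 8) | uint32_t(ts[3]);
int32_t decoded = static_cast<int32_t>(v);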

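PHP provides the pack and unpack functions for the conversion:

  $key = pack("N", $num);    // "N": unsigned 32-bit integer, big-endian
  $arr = unpack("N", $key);  // unpack returns an array indexed from 1
  $num = $arr[1];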
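
The reason big-endian encoding matters can be checked with a small, self-contained C++ sketch. The helper names encodeInt32BE, decodeInt32BE and keyLess are hypothetical and are not part of HBase or Thrift; keyLess only imitates the unsigned byte-by-byte comparison described above. For non-negative values, keys encoded this way sort in the same order as the numbers themselves.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: encode a 32-bit value as 4 bytes, most significant byte first.
std::string encodeInt32BE(int32_t value) {
    const uint32_t u = static_cast<uint32_t>(value);
    std::string out;
    for (int shift = 24; shift >= 0; shift -= 8) {
        out.push_back(static_cast<char>((u >> shift) & 0xFF));
    }
    return out;
}

// Hypothetical helper: decode the first 4 bytes of a big-endian buffer.
int32_t decodeInt32BE(const std::string& bytes) {
    assert(bytes.size() >= 4);
    uint32_t u = 0;
    for (int i = 0; i < 4; ++i) {
        u = (u << 8) | static_cast<unsigned char>(bytes[i]);
    }
    return static_cast<int32_t>(u);
}

// Hypothetical helper: compare two keys byte by byte as unsigned values,
// imitating lexicographic row-key ordering.
bool keyLess(const std::string& a, const std::string& b) {
    return std::lexicographical_compare(
        a.begin(), a.end(), b.begin(), b.end(),
        [](char x, char y) {
            return static_cast<unsigned char>(x) < static_cast<unsigned char>(y);
        });
}
```

With these helpers, keyLess(encodeInt32BE(255), encodeInt32BE(256)) holds, whereas the same comparison on little-endian bytes would put 256 before 255. Negative values are a separate matter: their sign bit makes them sort after all non-negative values under unsigned byte comparison.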

2. Pitfalls when using TScan
In the PHP Thrift interface of HBase, the startRow, stopRow, columns, filter and other attributes of TScan default to null and can be set directly, either through the TScan constructor or by assigning the member variables. When write() is called during an RPC to the Thrift Server, it simply checks whether each attribute is non-null, and every non-null attribute is sent to the Thrift Server via the Thrift protocol.
However, in the C++ Thrift interface, TScan has a member variable __isset of type _TScan__isset, whose structure is as follows:

typedef struct _TScan__isset {
  _TScan__isset() : startRow(false), stopRow(false), timestamp(false), columns(false), caching(false), filterString(false) {}
  bool startRow;
  bool stopRow;
  bool timestamp;
  bool columns;
  bool caching;
  bool filterString;
} _TScan__isset;

The write() method of TScan checks the bool flags in __isset to decide whether attributes such as startRow, stopRow, columns and filterString have been set, and therefore whether they are sent to the Thrift Server via the Thrift protocol. These flags are set only by the corresponding __set_xxx() methods. The default constructor of TScan leaves all of them false.
Therefore, if you initialize startRow, stopRow, columns or filterString directly on a TScan object instead of going through the setters, the flags stay false, the attributes are never sent, and the table is traversed from start to finish. Only a call to the corresponding __set_xxx() method sets the bool flag to true, so that the Thrift Server receives startRow, stopRow, columns, filterString and so on and scans accordingly.
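
The behavior described above can be imitated with a simplified stand-in. MockTScan and fieldsToSend below are hypothetical and are not the Thrift-generated code; they only copy the pattern of an __isset flag struct, __set_xxx() setters, and a write step that skips unflagged fields. The double-underscore names mirror the generated identifiers for recognizability.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for the generated _TScan__isset (two fields only).
struct MockTScanIsset {
    bool startRow = false;
    bool stopRow = false;
};

// Simplified stand-in for the generated TScan; NOT the real class.
struct MockTScan {
    std::string startRow;
    std::string stopRow;
    MockTScanIsset __isset;

    // Setters store the value and raise the matching flag.
    void __set_startRow(const std::string& val) { startRow = val; __isset.startRow = true; }
    void __set_stopRow(const std::string& val) { stopRow = val; __isset.stopRow = true; }

    // Imitates write(): only fields whose flag is true are "serialized".
    std::vector<std::string> fieldsToSend() const {
        std::vector<std::string> sent;
        if (__isset.startRow) sent.push_back("startRow=" + startRow);
        if (__isset.stopRow) sent.push_back("stopRow=" + stopRow);
        return sent;
    }
};
```

Assigning scan.startRow = "row-100" directly leaves fieldsToSend() empty, which corresponds to a full table scan on the server side. Calling scan.__set_startRow("row-100") makes the field appear in the output.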
3. Number of concurrent access threads
First, to minimize the time overhead of network transfer, Thrift Server is best deployed on the same machine as the application client. When starting Thrift Server, configure the number of concurrent worker threads through command-line parameters; otherwise the Thrift Server thread pool can easily fill up, after which it stops responding to read and write requests from clients. The specific command is: bin/hbase-daemon.sh start thrift --threadpool -m 200 -w 500 (for more parameters, see bin/hbase-daemon.sh).
4. Maximum heap memory configuration
If a client scans sequentially through Thrift Server and sets a fixed number of cached records (through TScan's int32_t caching variable), the cached records may take up a significant portion of the Thrift Server heap, especially when multiple clients access it concurrently.
Therefore, increase the maximum heap size before starting Thrift Server (the default is 1000 MB; set it with export HBASE_HEAPSIZE=1000 in conf/hbase-env.sh). Otherwise the process may be killed by a java.lang.OutOfMemoryError exception, especially when a large caching value is set on the scan.
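
As a rough way to reason about the heap size, the sketch below multiplies the number of concurrently open scanners by the caching value and an average row size. The function estimateScanCacheBytes and all numbers used with it are assumptions for illustration only. The result is a lower bound that ignores JVM object overhead and Thrift serialization buffers.

```cpp
#include <cstdint>

// Hypothetical estimate, in bytes, of the memory held by scanner caches:
// each open scanner may buffer up to `cachingRows` rows of `avgRowBytes` each.
std::uint64_t estimateScanCacheBytes(std::uint64_t concurrentScanners,
                                     std::uint64_t cachingRows,
                                     std::uint64_t avgRowBytes) {
    return concurrentScanners * cachingRows * avgRowBytes;
}
```

For example, 50 concurrent scanners with caching set to 10000 and rows of about 1024 bytes give 512,000,000 bytes, roughly half of a 1000 MB heap before any other allocation is counted.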
