Reading Python File-Like Objects from C

  • 2020-10-31 21:54:07
  • OfStack

The problem

You want to write a C extension that reads data from any Python file-like object (such as a regular file, a StringIO object, and so on).

The solution

To read data from a file-like object, you need to call its read() method repeatedly and decode the returned data correctly.

Here is an example of a C extension function that simply reads all the data from a file-like object and writes it to standard output:


#include <Python.h>
#include <unistd.h>   /* write() */

#define CHUNK_SIZE 8192

/* Consume a "file-like" object and write bytes to stdout */
static PyObject *py_consume_file(PyObject *self, PyObject *args) {
 PyObject *obj;
 PyObject *read_meth;
 PyObject *result = NULL;
 PyObject *read_args;

 if (!PyArg_ParseTuple(args,"O", &obj)) {
  return NULL;
 }

 /* Get the read method of the passed object */
 if ((read_meth = PyObject_GetAttrString(obj, "read")) == NULL) {
  return NULL;
 }

 /* Build the argument list to read() */
 if ((read_args = Py_BuildValue("(i)", CHUNK_SIZE)) == NULL) {
  Py_DECREF(read_meth);
  return NULL;
 }
 while (1) {
  PyObject *data;
  PyObject *enc_data;
  char *buf;
  Py_ssize_t len;

  /* Call read() */
  if ((data = PyObject_Call(read_meth, read_args, NULL)) == NULL) {
   goto final;
  }

  /* Check for EOF */
  if (PySequence_Length(data) == 0) {
   Py_DECREF(data);
   break;
  }

  /* Encode Unicode as Bytes for C */
  if ((enc_data=PyUnicode_AsEncodedString(data,"utf-8","strict"))==NULL) {
   Py_DECREF(data);
   goto final;
  }

  /* Extract underlying buffer data */
  PyBytes_AsStringAndSize(enc_data, &buf, &len);

  /* Write to stdout (replace with something more useful) */
  write(1, buf, len);

  /* Cleanup */
  Py_DECREF(enc_data);
  Py_DECREF(data);
 }
 result = Py_BuildValue("");

 final:
 /* Cleanup */
 Py_DECREF(read_meth);
 Py_DECREF(read_args);
 return result;
}

To test this code, create a file-like object such as a StringIO instance and pass it in:


>>> import io
>>> f = io.StringIO('Hello\nWorld\n')
>>> import sample
>>> sample.consume_file(f)
Hello
World
>>>

Discussion

Unlike a normal system file, a file-like object is not necessarily built on a low-level file descriptor, so you cannot access it with normal C library functions. Instead, you use Python's C API to manipulate the file-like object in much the same way you would from Python.

In our solution, the read() method is extracted from the object that was passed in. An argument list is built once and then passed repeatedly to PyObject_Call() to invoke the method. To detect end-of-file (EOF), PySequence_Length() is used to check whether the returned object has length 0.

For all I/O operations, you need to pay attention to the underlying encoding and to the difference between bytes and Unicode. This recipe shows how to read a file in text mode and encode the resulting text as bytes so that it can be used in C. If you want to read the file in binary mode instead, only a small change is needed, for example:
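As a rough Python-level sketch of the same steps, the loop below mirrors what the C function does. The function name consume_file, the out parameter, and the small chunk size used in the demo are illustrative assumptions, not part of the extension itself:

```python
import io

CHUNK_SIZE = 8192

def consume_file(obj, out, chunk_size=CHUNK_SIZE):
    """Look up read() once, call it repeatedly, stop on an empty result."""
    read_meth = obj.read                  # PyObject_GetAttrString(obj, "read")
    while True:
        data = read_meth(chunk_size)      # PyObject_Call(read_meth, read_args, NULL)
        if len(data) == 0:                # PySequence_Length(data) == 0 means EOF
            break
        out.write(data.encode("utf-8"))   # PyUnicode_AsEncodedString(data, ...)

sink = io.BytesIO()
consume_file(io.StringIO("Hello\nWorld\n"), sink, chunk_size=4)
print(sink.getvalue())
```

The C version has to do by hand what Python does implicitly here: check every call for failure and release each temporary object.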


...
/* Call read() */
if ((data = PyObject_Call(read_meth, read_args, NULL)) == NULL) {
 goto final;
}

/* Check for EOF */
if (PySequence_Length(data) == 0) {
 Py_DECREF(data);
 break;
}
if (!PyBytes_Check(data)) {
 Py_DECREF(data);
 PyErr_SetString(PyExc_IOError, "File must be in binary mode");
 goto final;
}

/* Extract underlying buffer data */
PyBytes_AsStringAndSize(data, &buf, &len);
...
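The difference between the two modes is easy to see from Python. In this small sketch (the variable names are illustrative), read() on a text-mode object returns str, which has to be encoded before C can treat it as bytes, while read() on a binary-mode object already returns bytes, which is what the PyBytes_Check() test accepts:

```python
import io

text_chunk = io.StringIO("caf\u00e9\n").read(8192)
binary_chunk = io.BytesIO(b"caf\xc3\xa9\n").read(8192)

print(type(text_chunk).__name__)     # str
print(type(binary_chunk).__name__)   # bytes

# Encoding the text chunk as UTF-8 yields the same bytes as the binary read.
encoded = text_chunk.encode("utf-8", "strict")
print(encoded == binary_chunk)

# Character count and byte count differ once non-ASCII text is involved.
print(len(text_chunk), len(encoded))
```

This is also why the C code works with the length reported by PyBytes_AsStringAndSize() rather than the length of the original string object.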

The trickiest part of this recipe is getting memory management right. When working with PyObject * variables, you need to manage reference counts carefully and release values once they are no longer needed. That is what the various Py_DECREF() calls do.

The code in this recipe is written in a general way, so it can be adapted to other file operations, such as writing. For example, to write data, you only need to get the write() method of the file-like object, convert the data to the appropriate Python object (bytes or Unicode), and then call the method to write the data to the file.
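The write-side pattern can be sketched at the Python level as follows. The helper name write_chunks is a hypothetical one chosen for illustration; in C the same steps would use PyObject_GetAttrString() to fetch write() and a call such as PyObject_Call() to invoke it:

```python
import io

def write_chunks(obj, chunks):
    """Look up write() once, then call it with each piece of data."""
    write_meth = obj.write    # analogous to PyObject_GetAttrString(obj, "write")
    for chunk in chunks:
        write_meth(chunk)     # data must already be the type the object expects

# A text-mode object expects str.
text_out = io.StringIO()
write_chunks(text_out, ["Hello\n", "World\n"])

# A binary-mode object expects bytes.
binary_out = io.BytesIO()
write_chunks(binary_out, [b"Hello\n", b"World\n"])

print(repr(text_out.getvalue()))
print(repr(binary_out.getvalue()))
```

Passing the wrong type (for example bytes to a text-mode object) raises TypeError, which a C caller would see as a NULL return from the call.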

Finally, although file-like objects often provide other methods (such as readline() and readinto()), it is best to stick to the basic read() and write() methods, since not every file-like object implements the others. When writing C extensions, keep things as simple as possible.
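To illustrate why relying only on read() is the safer choice, here is a minimal sketch of a file-like object that implements nothing else. The class ChunkReader is hypothetical and exists only for this example; a consumer that depends solely on read() and the empty-result EOF check still works with it:

```python
class ChunkReader:
    """Hypothetical file-like object that implements only read()."""

    def __init__(self, text):
        self._text = text
        self._pos = 0

    def read(self, size=-1):
        # A negative or missing size means "read everything that is left".
        if size is None or size < 0:
            size = len(self._text) - self._pos
        chunk = self._text[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk

reader = ChunkReader("Hello\nWorld\n")
pieces = []
while True:
    data = reader.read(5)
    if len(data) == 0:        # same EOF test as the C code
        break
    pieces.append(data)

print(pieces)
print(hasattr(reader, "readline"))
```

Code that called readline() or readinto() on this object would fail with AttributeError, while the read()-based loop does not.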
