Create tutorials for declarative mini languages with Python

2020-05-09 18:49:43
OfStack

When most programmers think about programming, they envision imperative styles and techniques for writing applications. The most popular general-purpose programming languages (including Python and other object-oriented languages) are overwhelmingly imperative in style. On the other hand, many programming languages are declarative, including functional and logical languages, as well as general-purpose and special-purpose languages.

Let's list a few languages that fall into each category. Many readers have used many of these tools without necessarily considering the type differences between them. Python, C, C++, Java, Perl, Ruby, Smalltalk, Fortran, Basic, and xBase are all simple imperative programming languages. Some of them are object-oriented, but that's just a matter of organizing the code and data, not the basic programming style. Using these languages, you command a program to execute a sequence of instructions: put some data into the (put) variable; Get (fetch) data from variables; Loop (loop) 1 instruction block until (until) satisfies certain conditions; If (if) a statement is true, then some operations are performed. One of the beauties of all these languages is that they are easy to consider in terms familiar from everyday life. Everyday life is all about doing something, choosing something, doing something else, maybe using some tools. It is easy to think of a computer running a program as a cook, a bricklayer, or a car driver.

Languages such as Prolog, Mercury, SQL, XSLT, EBNF syntax, and real configuration files in various formats declare that something is the case, or that certain constraints are applied. Functional languages (such as Haskell, ML, Dylan, Ocaml, and Scheme) are similar, but they place more emphasis on stating the internal (functional) relationships between programming objects (recursion, lists, and so on). Our daily lives (at least in terms of narrative quality) provide no direct simulation of the programming constructs of these languages. However, for problems that can be described in these languages, declarative descriptions are far simpler and less error-prone than imperative solutions. For example, consider the following system of linear equations:
Listing 1. Sample linear equation system


10x + 5y - 7z + 1 = 0
17x + 5y - 10z + 3 = 0
5x - 4y + 3z - 6 = 0

This is a pretty simple expression that describes several relationships between objects (x, y, and z). In real life you might find these answers in different ways, but actually using pen and paper to "solve x" is annoying and error-prone. From a debugging point of view, writing the solution steps in Python is probably even worse.

Prolog is a language closely related to logic or mathematics. With this language, you simply write statements that you know are correct, and let the application produce the results for you. Statements are not formed in a particular order (no order, as in linear equation 1), and you (the programmer or the user) have no idea what steps are taken to produce the result. Such as:
Listing 2. family.pro Prolog sample


/* Adapted from sample at:
<http://www.engin.umd.umich.edu/CIS/course.des/cis479/prolog/>
This app can answer questions about sisterhood & love, e.g.:
# Is alice a sister of harry?
?-sisterof( alice, harry )
# Which of alice' sisters love wine?
?-sisterof( X, alice ), love( X, wine)
*/
sisterof( X, Y ) :- parents( X, M, F ),
          female( X ),
          parents( Y, M, F ).
parents( edward, victoria, albert ).
parents( harry, victoria, albert ).
parents( alice, victoria, albert ).
female( alice ).
loves( harry, wine ).
loves( alice, wine ).

It is not exactly the same as the EBNF (extended bacos normal form, Extended Backus-Naur Form) syntax statement, but it is substantially similar. You can write statements like this:
Listing 3. EBNF sample


word    := alphanums, (wordpunct, alphanums)*, contraction?
alphanums  := [a-zA-Z0-9]+
wordpunct  := [-_]
contraction := "'", ("clock"/"d"/"ll"/"m"/"re"/"s"/"t"/"ve")

If you come across a word and want to say what it might look like, but don't really want to give a sequence of instructions on how to recognize it, this is a neat way to do it. Regular expressions are similar (and in fact can meet the needs of this particular syntax product).

For another declarative example, examine the document type declaration that describes the valid XML document dialect:
Listing 4. XML document type declaration


<!ELEMENT dissertation (chapter+)>
<!ELEMENT chapter (title, paragraph+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT paragraph (#PCDATA | figure)+>
<!ELEMENT figure EMPTY>

As with the other example 1, the DTD language does not contain any instructions on how to identify or create a valid XML document. It just describes what the document would look like if it existed. Declarative language is subjunctive.
Python as the interpreter vs Python as the environment

The Python library can take advantage of declarative languages in one of two distinct ways. Perhaps a more common technique is to parse and process non-Python declarative languages as data. An application or library can read in an external source (or an internally defined string that is only used as "blob") and then indicate a set of imperative steps to perform that are in some form 1 to those external declarations. In essence, these types of libraries are "data-driven" systems; There is a conceptual and conceptual difference between a declarative language and the operations that an Python application performs or exploits its declarations. In fact, a fairly common point is that libraries that handle those same declarations are also used to implement other programming languages.

All the examples given above belong to the first technique. Library PyLog is an Python implementation of the Prolog system. It reads the Prolog data file like the sample, and then creates the Python object to model the Prolog declaration. The EBNF sample USES the specialized variant SimpleParse, which is an Python library that converts these declarations into a state table that can be used by mx.TextTools. mx.TextTools itself is an extended library for Python, which USES the underlying C engine to run the code stored in the Python data structure, but has little to do with Python per se. Python is an excellent adhesive for these tasks, but the language that sticks to it is very different from Python. Also, most Prolog implementations are not written in Python, as most EBNF parsers are.

DTD is similar to other examples. If you use a validating parser like xmlproc, you can use DTD to validate the dialect of the XML document. But the language of DTD is not Python; xmlproc only USES it for data that needs to be parsed. Moreover, the XML validating parser has been written in many programming languages. The XSLT transformation is similar and is not Python specific, and modules such as ft.4xslt only use Python as the "glue".

While there's nothing wrong with the above approach and the tools mentioned above (which I've been using for a long time), Python itself would be more subtle and in some ways clearer if it were a declarative language. If nothing else, a library that helps with this will not allow a programmer to write an application in two (or more) languages. Sometimes, relying on Python's introspection capabilities to implement "native" declarations is both simple and useful.

The magic of introspection

The parsers Spark and PLY let users declare Python values with Python, and then use some magic to configure the Python runtime environment for parsing. For example, let's look at the PLY syntax equivalent of the previous SimpleParse syntax. Spark is similar to this example:
Listing 5. PLY sample


tokens = ('ALPHANUMS','WORDPUNCT','CONTRACTION','WHITSPACE')
t_ALPHANUMS = r"[a-zA-Z0-0]+"
t_WORDPUNCT = r"[-_]"
t_CONTRACTION = r"'(clock|d|ll|m|re|s|t|ve)"
def t_WHITESPACE(t):
  r"\s+"
  t.value = " "
  return t
import lex
lex.lex()
lex.input(sometext)
while 1:
  t = lex.token()
  if not t: break

I have written about PLY in my forthcoming book Text Processing in Python, and about Spark in this column (see resources for a link). Without going into the details of the library, you should note here that it is the Python binding itself that is configured for parsing (in this case, lexical analysis/tokenization). The PLY module runs in the Python environment to act on these schema declarations, so it happens to know the environment well.

How PLY knows what it does on its own involves some very bizarre Python programming. At first, intermediate programmers will find it possible to pinpoint the contents of the globals() and locals() dictionaries. It would be nice if the declaration styles were slightly different. For example, the hypothetical code would look more like this:
Listing 6. Using the imported module namespace


import basic_lex as _
_.tokens = ('ALPHANUMS','WORDPUNCT','CONTRACTION')
_.ALPHANUMS = r"[a-zA-Z0-0]+"
_.WORDPUNCT = r"[-_]"
_.CONTRACTION = r"'(clock|d|ll|m|re|s|t|ve)"
_.lex()

This style is not badly declarative, and it can be assumed that the basic_lex module contains something as simple as the following:
Listing 7. basic_lex. py


def lex():
  for t in tokens:
    print t, '=', globals()[t]

This will result in:


% python basic_app.py
ALPHANUMS = [a-zA-Z0-0]+
WORDPUNCT = [-_]
CONTRACTION = '(clock|d|ll|m|re|s|t|ve)

PLY managed to insert the namespace of the import module using the stack frame information. Such as:
Listing 8. magic_lex py


import sys
try: raise RuntimeError
except RuntimeError:
  e,b,t = sys.exc_info()
  caller_dict = t.tb_frame.f_back.f_globals
def lex():
  for t in caller_dict['tokens']:
    print t, '=', caller_dict['t_'+t]

This produces an output similar to that given by the basic_app.py sample, but with a declaration using the previous t_TOKEN style.

The actual PLY module is even more magical than this. We've seen that tags named after the pattern t_TOKEN can actually be strings that contain regular expressions, or functions that contain regular expression docstrings and manipulation code. Some type checks allow the following polymorphic behavior:
Listing 9. polymorphic_lex


# ...determine caller_dict using RuntimeError...
from types import *
def lex():
  for t in caller_dict['tokens']:
    t_obj = caller_dict['t_'+t]
    if type(t_obj) is FunctionType:
      print t, '=', t_obj.__doc__
    else:
      print t, '=', t_obj

Obviously, the real PLY module can do more interesting things with these declared patterns than with the playable examples, but these examples demonstrate some of the techniques involved.

The magic of inheritance

Having the support library insert and manipulate the application's namespace everywhere enables a subtle declarative style. But in general, the use of the inheritance structure and introspection makes for better flexibility.

The module gnosis.xml.validity is a framework for creating classes that map directly to DTD products. Any gnosis.xml.validity class can only be instantiated with parameters that conform to the validity constraints of the XML dialect. In fact, this is not 10 points true; When there is only one clear way to "promote" a parameter to the correct type, the module can infer the correct type from a simpler parameter.

Since I've written the gnosis.xml.validity module, I'm inclined to think about whether its use is interesting in its own right. But for this article, I'll just look at the declarative style of creating validity classes. The 1 set of rules/classes matching the previous DTD sample include:
Listing 10. gnosis.xml.validity rule declaration


/* Adapted from sample at:
<http://www.engin.umd.umich.edu/CIS/course.des/cis479/prolog/>
This app can answer questions about sisterhood & love, e.g.:
# Is alice a sister of harry?
?-sisterof( alice, harry )
# Which of alice' sisters love wine?
?-sisterof( X, alice ), love( X, wine)
*/
sisterof( X, Y ) :- parents( X, M, F ),
          female( X ),
          parents( Y, M, F ).
parents( edward, victoria, albert ).
parents( harry, victoria, albert ).
parents( alice, victoria, albert ).
female( alice ).
loves( harry, wine ).
loves( alice, wine ).

You can create instances from these declarations using the following command:


/* Adapted from sample at:
<http://www.engin.umd.umich.edu/CIS/course.des/cis479/prolog/>
This app can answer questions about sisterhood & love, e.g.:
# Is alice a sister of harry?
?-sisterof( alice, harry )
# Which of alice' sisters love wine?
?-sisterof( X, alice ), love( X, wine)
*/
sisterof( X, Y ) :- parents( X, M, F ),
          female( X ),
          parents( Y, M, F ).
parents( edward, victoria, albert ).
parents( harry, victoria, albert ).
parents( alice, victoria, albert ).
female( alice ).
loves( harry, wine ).
loves( alice, wine ).

Note that these classes match the previous DTD very well. The mapping is basically 11; In addition to the need to quantify and interchangeably use intermediaries for nested tags (intermediary names are marked with leading underscores).

Also note that although these classes are created using the standard Python syntax, they also have an unusual (and more concise) feature: they have no methods or instance data. Define classes separately to inherit classes from a framework that is limited by the class attributes of single 1. For example, < chapter > Is the other tag sequence, namely < title > It is followed by one or more < paragraph >The tag. But all we need to do to ensure that the constraints are observed in the instance is declare the chapter class in this simple way.

. Write like gnosis xml. validity. Seq parent program mainly involved in the "skills", is the research instance during initialization. __class__ properties. Class chapter does not initialize itself, so call its parent class's s s 249en__ () method. But the self passed to the parent class s 250en__ () is an instance of chapter, and self knows chapter. In order to illustrate this 1 point, the following is a list of some gnosis. xml. validity. Seq implementation:
Listing 11. Class gnosis. xml. validity. Seq


/* Adapted from sample at:
<http://www.engin.umd.umich.edu/CIS/course.des/cis479/prolog/>
This app can answer questions about sisterhood & love, e.g.:
# Is alice a sister of harry?
?-sisterof( alice, harry )
# Which of alice' sisters love wine?
?-sisterof( X, alice ), love( X, wine)
*/
sisterof( X, Y ) :- parents( X, M, F ),
          female( X ),
          parents( Y, M, F ).
parents( edward, victoria, albert ).
parents( harry, victoria, albert ).
parents( alice, victoria, albert ).
female( alice ).
loves( harry, wine ).
loves( alice, wine ).

Once the application programmer attempts to create an instance of chapter, the instantiation code checks to see if chapter has been declared with the required._order class property, and to see if the property is the desired tuple object. The.validate () method does a step forward check to make sure that the object used to initialize the instance belongs to the corresponding class specified in._order.

When the statement

Declarative programming styles are almost always more straightforward in terms of declaring constraints than imperative or procedural styles. Of course, not all programming problems are about constraints - or at least that's not always a natural law. But if rules-based systems, such as syntax and inference systems, can be described declaratively, then their problems are easier to deal with. Imperative validation of grammatical correctness can quickly become a very convoluted "spaghetti code" (spaghetti code) and difficult to debug. The declaration of patterns and rules can still be simpler.

Of course, at least in Python, validation and enhancement of declarative rules always comes down to procedural checks. But this kind of procedural checking is appropriate in well-tested library code. Individual applications should rely on simpler declarative interfaces provided by libraries such as Spark or PLY or gnosis.xml.validity. Other libraries, such as xmlproc, SimpleParse, or ft.4xslt, can use declarative styles even though they are not declared in Python (Python certainly applies to their domain).