./admin-tools/make-op-tables.sh builds the JSON tables.
Also, the environment variable MATHICS_CHARACTER_ENCODING can be used to set $SystemCharacterEncoding and the initial value of $CharacterEncoding.
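A minimal sketch of how the two settings relate, assuming a plain environment lookup. The function names and the "UTF-8" fallback are hypothetical and not the actual mathics-core code:

```python
import os

# Hypothetical fallback; the real default in mathics-core may differ.
DEFAULT_ENCODING = "UTF-8"


def system_character_encoding(environ=os.environ) -> str:
    """Value for $SystemCharacterEncoding: MATHICS_CHARACTER_ENCODING if it is
    set and non-empty, otherwise the fallback."""
    return environ.get("MATHICS_CHARACTER_ENCODING") or DEFAULT_ENCODING


def initial_character_encoding(environ=os.environ) -> str:
    """$CharacterEncoding starts out equal to $SystemCharacterEncoding;
    after startup, a session may change it at will."""
    return system_character_encoding(environ)
```

The distinction matters later in this description: $SystemCharacterEncoding is fixed at startup, while $CharacterEncoding only takes its *initial* value from it.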
Some adjustments were made to tests involving DifferentialD so that we use standard Unicode symbols rather than WMA-specific Unicode.
There was a bug in gstest from a prior commit in mathics-scanner where the "pre-scanner()" was replacing strings starting with "|".
I had a hard time following the flow of what's going on; the process was painful and left me a bit annoyed.
I was pleased, though, that @mmatera has started to move the MakeBoxes code into its own module, so we now have a little bit of segregation of the boxing routines; I suppose that helps a little.
It is not that the code isn't sophisticated or complicated. It is just that it is not organized and changes haven't been coordinated.
Let me describe a little of the history around this PR and the code I see, and reflect. The aim of this reflection is to show how the code has declined in quality, is in constant need of serious refactoring, and can get worse with uncoordinated bug fixes or feature improvements. Functionally the code does quite a bit, but at the cost of a lot of code with undue complexity. I assume that much of the complicated code is there to ensure correct behavior.
However, where we find that the code is not correct or lacking, we have a hard time figuring out where to go and what to change that isn't going to impact a lot of other things.
mmatera notes a problem with infix-operator behavior in the ascii-op-to-Unicode branch. I had sort of seen this when I wrote that branch initially. One thing I noticed in my PR is that the place where the change occurs felt wrong: it is code inside a "MakeBox" rule. But this rule is defined at class creation time, when each of the infix operators is registered, and these rules never change. Something like $CharacterEncoding can change at will, so this location is too static.
Therefore I had to give up (temporarily) on making this work for $CharacterEncoding, and used the less dynamic $SystemCharacterEncoding instead.
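To make the "too static" point concrete, here is a toy contrast between choosing the operator text once, when the rule is built, and looking it up each time the operator is formatted. Session, StaticRule, DynamicRule and the table are hypothetical stand-ins, not mathics-core classes:

```python
# Hypothetical operator text table, keyed by (operator name, encoding).
OPERATOR_TEXT = {("Rule", "ASCII"): "->", ("Rule", "UTF-8"): "\u2192"}


class Session:
    """Stand-in for evaluation state; character_encoding plays $CharacterEncoding."""

    def __init__(self, character_encoding: str = "UTF-8"):
        self.character_encoding = character_encoding


class StaticRule:
    """Operator text chosen once, when the rule is built at class creation."""

    def __init__(self, name: str, encoding_at_creation: str):
        self.text = OPERATOR_TEXT[(name, encoding_at_creation)]

    def operator_text(self, session: Session) -> str:
        # Ignores any later change to session.character_encoding.
        return self.text


class DynamicRule:
    """Operator text looked up every time the operator is formatted."""

    def __init__(self, name: str):
        self.name = name

    def operator_text(self, session: Session) -> str:
        return OPERATOR_TEXT[(self.name, session.character_encoding)]


session = Session("UTF-8")
static_rule = StaticRule("Rule", session.character_encoding)
dynamic_rule = DynamicRule("Rule")
session.character_encoding = "ASCII"  # analogous to $CharacterEncoding = "ASCII"
static_text = static_rule.operator_text(session)    # still the Unicode arrow
dynamic_text = dynamic_rule.operator_text(session)  # "->"
```

A value like $SystemCharacterEncoding, fixed at startup, is compatible with the static version; $CharacterEncoding is not.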
Looking over this in a second pass, I see now that a rule like this is boneheaded:
```python
formatted_output = 'MakeBoxes[Infix[{%s}," %s ",%d,%s], form]' % (
    replace_items,
    operator,
    self.precedence,
    self.grouping,
)
default_rules = {
    # ...
    "MakeBoxes[{0}, form:InputForm|OutputForm]".format(op_pattern): formatted_output,
}
```
Do you see why? ...
Hint: it is in the " %s " part.
This is supposed to be a rule that is called when boxing infix operators. And already it is deciding that operators should be surrounded by spaces for "InputForm" and "OutputForm".
In other words, right here, in the part that dictates how Infix expressions are boxed, we are already making decisions about low-level formatting: that there should be spaces around the operator, and which characters represent the operator.
And by the time we actually get to the low-level formatting into a final string, the structure has been lost. Basically, we have violated the principle that you evaluate to get an M-expression, then you box the result, and only after boxing is low-level formatting done.
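A minimal sketch of that ordering, assuming a hypothetical InfixBox that records only structure, so that spacing and operator characters are chosen in the final formatting step. All names here are illustrative, not mathics-core's API, and precedence/parenthesization is ignored to keep it short:

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class InfixBox:
    """Hypothetical box: records the operator name and operand boxes only.
    No spacing or character choices are made at this stage."""

    operator: str
    operands: Tuple["Box", ...]


Box = Union[str, InfixBox]

# Low-level formatting data, consulted only by the final formatting step.
OPERATOR_TEXT = {
    ("Rule", "ASCII"): "->",
    ("Rule", "UTF-8"): "\u2192",
    ("Plus", "ASCII"): "+",
    ("Plus", "UTF-8"): "+",
}


def make_infix_box(operator: str, *operands: Box) -> InfixBox:
    """Boxing step: keep the structure intact."""
    return InfixBox(operator, tuple(operands))


def format_box(box: Box, encoding: str, spaced: bool = True) -> str:
    """Low-level formatting step: operator characters and spacing are chosen here."""
    if isinstance(box, str):
        return box
    op = OPERATOR_TEXT[(box.operator, encoding)]
    sep = f" {op} " if spaced else op
    return sep.join(format_box(b, encoding, spaced) for b in box.operands)


# Boxes for a -> b + c, formatted two different ways from the same structure.
box = make_infix_box("Rule", "a", make_infix_box("Plus", "b", "c"))
ascii_spaced = format_box(box, "ASCII")                 # "a -> b + c"
unicode_tight = format_box(box, "UTF-8", spaced=False)  # "a\u2192b+c"
```

Because the box still knows it holds a Rule, a new Form or encoding only has to change the last step, not every boxing rule.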
If we want to add dozens more Forms, the above approach doesn't scale, and it makes high-quality formatting harder.
At some level this was probably noticed, at least implicitly; but when there are so few comments, it is hard to know what was noticed and what was just random programming in reaction to problems as they came up.
Since not everything can be done by MakeBox rules, we have do_format_xxx routines in mathics-core/mathics/core/formatter.py, as well as special routines in builtin/arithfns/basic.py and builtin/pympler/asizeof.py, and some formatting is done inside internal functions of MakeBox().
In short, if principles were ever decided or defined, they were not communicated anywhere that I can tell; so, naturally, formatting code is scattered across many places in the code and at several conceptual levels. Possibly some of the code is redundant or, worse, works at cross purposes.
If discussion of high-level principles is lacking, so is basic description of the code: things like docstrings on functions, or making an effort to name what a function does. For example, take get_op(), an internal function inside the evaluation method apply_infix() (leaving aside the missing apply_ prefix, which I have mentioned many times before).
What is get_op() "getting"? You already pass it some sort of operator. If you look at the function, you realize it is not getting or accessing something, but rather converting or formatting.
Once you realize that, you are in a position to ask: why is this routine formatting inside a boxing routine? Isn't formatting a separate step in the overall architecture? And shouldn't the low-level formatting routines be put together?
Vagueness in the code, and the lack of description and discussion around it, have led to haphazard code that, in the end, does not seem fully thought out.
With all the effort spent just figuring out what the code does, by the time I understand it I am exhausted and often not in the mood to think about how it should be designed, whether it follows a design, or how best to write it.
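To make the naming point concrete, here is a hypothetical replacement for get_op() whose name and docstring say that it converts an operator into text. It takes the encoding as a parameter, so it could live with the other low-level formatting routines rather than inside a boxing rule. The name format_operator, the small table, and the "ASCII"/"UTF-8" strings are assumptions; a real version might instead read the JSON tables built by ./admin-tools/make-op-tables.sh:

```python
# Hypothetical table of ASCII operator spellings and their Unicode forms.
UNICODE_OPERATORS = {"->": "\u2192", "<=": "\u2264", ">=": "\u2265"}


def format_operator(operator: str, encoding: str) -> str:
    """Convert an ASCII operator spelling into the text used under *encoding*.

    The name says what the function does (it converts/formats); it does not
    fetch anything from evaluation state, which is what "get_op" suggests.
    """
    if encoding == "ASCII":
        return operator
    # Operators without a Unicode form are printed unchanged.
    return UNICODE_OPERATORS.get(operator, operator)
```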
Let me come back one more time to adding spaces around the operator and losing the operator structure (the " %s " part). Because of this, a "remedy" was put in place that makes things worse: a routine was added that scans strings and unconditionally changes some of them (e.g. ASCII-formatted operators) into Unicode characters. It doesn't matter whether this was done out of ignorance or out of frustration: in the end, it makes the code a bigger mess.
It is, as I say, like trying to solve a Rubik's cube by getting adjacent faces in line one at a time. In the beginning there is a certain satisfaction because you can "make progress". However, getting to the end this way is much harder than understanding what is going on and performing operations that work in conjunction with the principles and group structure of the Rubik's cube.
Given all this, my inclination right now is to hold off on adding new Forms, or on making the existing ones more correct. Instead, if we first make things work as they currently do, but in a logical, sensible, and extensible way, I think that would buy us the most; from there we can correct the existing behavior to match the specifications more closely and add more Forms.
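Why a pass over the final string is fragile can be shown with a toy version of such a routine. unicodify and its table are hypothetical, not the actual mathics-core function; the point is only that, once structure is gone, an operator and the same characters inside a string literal look identical:

```python
# Hypothetical pass of the kind described above: it sees only the final
# string, so it cannot tell an operator from the same characters in data.
ASCII_TO_UNICODE = {"->": "\u2192", "<=": "\u2264", ">=": "\u2265"}


def unicodify(text: str) -> str:
    """Unconditionally replace every ASCII operator spelling with Unicode."""
    for ascii_op, unicode_op in ASCII_TO_UNICODE.items():
        text = text.replace(ascii_op, unicode_op)
    return text


# The Rule operator is converted, but so is "->" inside the string literal,
# which silently changes the string's contents.
converted = unicodify('a -> "x->y"')
```

Working on boxes instead, where a Rule is still known to be a Rule, avoids this class of bug.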