Saturday 1 June 2019

Where to start? About code complexity

While creating an app that would calculate the expected value of several options, I wrote several functions that cleaned and prepared the input so the relevant numbers could be calculated. It turns out that this simple task offered a small lesson in one possible cause of code complexity, or at least confusion.

Background: The original input was a text file. The text file was cleaned (comments deleted and so on) before another function broke it down into rows in a list, and there was a function that translated the whole text file into a dictionary structure. The values in the dictionary were further processed into expressions that could be evaluated by the computer, and these were evaluated to numbers. Then there were functions dealing with how to analyse the numbers, quantify the uncertainty, draw graphs, compare different alternatives and much more.

One way of approaching this was to view the functions as a long line where many of the functions would take the output of the previous function as input.

process_text -> create a dictionary -> fix dictionary -> get_numbers -> analyse

The create_dict function might require a preprocessed string to work, or it might take the text file itself and process it internally.
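The chain above can be sketched roughly as follows. This is only an illustration, not the original code: the function bodies, the 'name = expression' input format and the '#' comment marker are all assumptions made for the example.

```python
def process_text(text):
    """Strip '#' comments and blank lines; return the cleaned rows as a list."""
    rows = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            rows.append(line)
    return rows


def create_dict(rows):
    """Turn rows of the assumed form 'name = expression' into a dictionary."""
    result = {}
    for row in rows:
        name, expression = row.split("=", 1)
        result[name.strip()] = expression.strip()
    return result


def get_numbers(expressions):
    """Evaluate each expression string to a number."""
    # eval keeps the sketch short; it is not safe for untrusted input.
    return {name: eval(expr, {"__builtins__": {}})
            for name, expr in expressions.items()}


# Hypothetical model text, invented for this example.
model = """
# expected value of two options
option_a = 0.5 * 100
option_b = 0.25 * 400  # riskier
"""

# Each step consumes the output of the previous one.
numbers = get_numbers(create_dict(process_text(model)))
print(numbers)
```

In this version every function only works on the output of the step before it, so the caller has to know the order of the chain.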

Unfortunately I switched back and forth on this. Sometimes the functions I created would work with the original text file as input - and just process the file if it was not processed already. Sometimes I created functions that required a specific input that had already been transformed using other functions.
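A small sketch of what this mix can look like. The function bodies and the 'name = value' format are assumptions for illustration; the point is only that one function accepts raw text or cleaned rows, while its neighbour insists on an already transformed input.

```python
def process_text(text):
    """Strip '#' comments and blank lines; return the cleaned rows."""
    rows = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            rows.append(line)
    return rows


def create_dict(model):
    """Accepts either the raw text or rows that were already cleaned."""
    rows = process_text(model) if isinstance(model, str) else model
    pairs = (row.split("=", 1) for row in rows)
    return {name.strip(): value.strip() for name, value in pairs}


def get_numbers(values):
    """Requires the dictionary from create_dict; raw text does not work here."""
    return {name: float(value) for name, value in values.items()}


raw = "a = 1.5  # comment\nb = 2"

# Both call styles give the same dictionary ...
print(create_dict(raw) == create_dict(process_text(raw)))

# ... but the next function only accepts one of the possible inputs.
try:
    get_numbers(raw)
except AttributeError as error:
    print("get_numbers needs a dictionary:", error)
```

Nothing in the names tells the caller which function is forgiving and which is strict, and that is the part that has to be remembered.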

This quickly became a little complex and difficult to remember.

Why did I do it both ways? I think there were at least two reasons. First: to increase the speed of the code. Sometimes the functions were used repeatedly and it seemed wasteful to require multiple transformations. Second: it just seemed intuitive that internal helper functions could take various inputs while the user-facing API would be more consistent (always taking the model as a string as input).

But I was wrong. Or at least I think I was wrong. I should have used the same input in most places (the model as a string). It would be a lot easier to remember. The alternative required me (and, God forbid, others!) to keep a mental ledger of how processed the input of different functions had to be before the output was OK. It did not scale well in terms of the cognitive load it placed on the programmer.

Of course, it might have been better to rewrite the whole thing and avoid some of these problems. But it might be a general problem that cannot be avoided:

The functions could also be viewed as more stand-alone units that each take the original text as input and give the result back.