dataHans: 2019

Saturday, 1 June 2019

Where to start? About code complexity

While creating an app that would calculate the expected value of several options, I wrote several functions that cleaned and prepared the input so the relevant numbers could be calculated. This simple task turned into a small lesson about one possible cause of code complexity, or at least confusion.

Background: The original input was a text file. The text file was cleaned (comments deleted and so on) before another function broke it all down into rows in a list, and there was a function that translated the whole text file into a dictionary structure. The values in the dictionary were further processed into expressions that could be evaluated by the computer, and each expression was evaluated to a number. Then there were functions dealing with how to analyse the numbers, quantify the uncertainty, draw graphs, compare different alternatives and much more.

One way of approaching this was to view the functions as a long line where many of the functions would take the output of the previous function as input.

process_text -> create a dictionary -> fix dictionary -> get_numbers -> analyse

The create_dict function might require a preprocessed string to work, or it might take the text file itself and process it internally.

Unfortunately I switched back and forth on this. Sometimes the functions I created would work with the original text file as input, and just process the file if it was not processed already. Sometimes I created functions that required a specific input that had already been transformed by other functions.

This quickly became a little complex and difficult to remember.

Why did I do it both ways? I think there were at least two reasons. First: To increase the speed of the code. Sometimes the functions were used repeatedly and it seemed wasteful to require multiple transformations. Second, it just seemed intuitive that internal helper functions could take various inputs while the user-facing API would be more consistent (always taking the model as a string as input).

But I was wrong. Or at least I think I was wrong. I should have used the same input in most places (the model as a string). It would be a lot easier to remember. The alternative required me (and, God forbid, others!) to keep a mental ledger of how processed the input to each function had to be before the output was OK. It did not scale well, in terms of the cognitive load it placed on the programmer.
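A minimal sketch of what "the same input everywhere" could look like, and not the actual code from the app. The function names, the "name = value" line format and the "#" comment marker are assumptions made for illustration. Every function takes the model as a string, and functools.lru_cache remembers the intermediate results, so the speed worry from above need not force a different input type:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def clean_text(model: str) -> str:
    """Strip '#' comments and blank lines from the raw model text."""
    rows = (line.split("#", 1)[0].strip() for line in model.splitlines())
    return "\n".join(row for row in rows if row)


@lru_cache(maxsize=None)
def _parse(model: str) -> tuple:
    """Parse the raw model text into immutable (name, expression) pairs."""
    pairs = []
    for row in clean_text(model).splitlines():
        name, expression = row.split("=", 1)
        pairs.append((name.strip(), expression.strip()))
    return tuple(pairs)


def create_dict(model: str) -> dict:
    """Return a fresh {name: expression} dictionary for the raw model text."""
    return dict(_parse(model))


def get_numbers(model: str) -> dict:
    """Return {name: number}; this sketch assumes every value is a plain number."""
    return {name: float(expression) for name, expression in _parse(model)}


model = "# costs\nprice = 10  # dollars\n\nquantity = 3\n"
numbers = get_numbers(model)
```

The caller never has to remember which stage of processing a function expects: create_dict(model) and get_numbers(model) both start from the same string, and the cleaning and parsing only run once per distinct model text.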

Of course, it might have been better to rewrite the whole thing and avoid some of these problems. But it might be a general problem that cannot be avoided:

The functions could also be viewed as more stand-alone units that should take the original text as input, and give the result back.

Sunday, 12 May 2019

Python extend and replace: A common mistake?

If you want to replace some characters in a string in Python, you can write

new_string = old_string.replace('OLD', 'NEW')

Note that the string itself is not modified in-place, i.e. the following would not modify the content of old_string:

old_string.replace('OLD', 'NEW')

With lists, however, it is different. Extending a list with another list works in place, i.e. old_list will be modified if you write:

old_list.extend(another_list)

In fact, you will create a bug if you try:

new_list = old_list.extend(another_list)

In this case the content of new_list, somewhat surprisingly for many, becomes None.
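A small runnable illustration of the difference (the variable values are made up for the example). It also shows one way to get a combined list back without touching the original, using the + operator:

```python
# Strings: replace() returns a new string and leaves the original alone.
old_string = "OLD value"
new_string = old_string.replace("OLD", "NEW")
# new_string is "NEW value"; old_string is still "OLD value"

# Lists: extend() changes the list in place and returns None.
old_list = [1, 2]
another_list = [3, 4]
new_list = old_list.extend(another_list)
# new_list is None; old_list is now [1, 2, 3, 4]

# If a new list is what you want, concatenate instead.
first_list = [1, 2]
combined = first_list + another_list
# combined is [1, 2, 3, 4]; first_list is still [1, 2]
```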

There are good reasons for all of this: Objects that are mutable can be changed in place, while immutable objects cannot. Strings are immutable, lists are mutable. But for beginners who do not know this distinction, the difference may be very confusing and lead to bugs.

Not being an expert, one might think that methods either always or never work in place on the object by default. Instead you have to remember this on a case-by-case basis. Once you know more of the logic of "when and why", it becomes easier.
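One piece of that logic, as far as I understand it, is a convention in the built-in types: list methods that change the list in place (extend, sort, and so on) return None, while functions and methods that return a result leave the original untouched. A short sketch with made-up values:

```python
numbers = [3, 1, 2]

# sorted() builds a new list and does not modify its argument.
ordered = sorted(numbers)
unchanged = list(numbers)  # copy taken before sorting in place

# list.sort() sorts in place and returns None, just like extend().
returned = numbers.sort()

# String methods cannot work in place, so they always return a new string.
word = "abc"
shouted = word.upper()
```

So if a list method hands back None, that is usually a hint that the work was done on the list itself.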