Friday, 10 June 2016

More about naming things: Consistency is king or allow intuitive exceptions to general rules?

To separate or not to separate?
Consider the following example where I use the pandas library to create a list of the unique ids for people with a specific disease.

ibd_patients = df.ids.unique().to_list()

This raises an error because the array returned by unique() has no "to_list()" method. Instead it is called "tolist()".
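Here is a minimal sketch of what is going on. The sample data is made up, since the real df is not shown here, and I assume an integer "ids" column. The point is that unique() hands back a NumPy array rather than a pandas object, and the NumPy array only knows the spelling without the underscore.

```python
import pandas as pd

# Hypothetical sample data; the real dataframe is not shown in this post.
df = pd.DataFrame({"ids": [101, 102, 101, 103]})

# unique() returns a numpy.ndarray, not a pandas Series.
unique_ids = df["ids"].unique()
print(type(unique_ids).__name__)

# The NumPy array offers tolist() but not to_list().
print(hasattr(unique_ids, "tolist"))
print(hasattr(unique_ids, "to_list"))

# The working version of the original line.
ibd_patients = unique_ids.tolist()
print(ibd_patients)
```

As far as I know, newer pandas versions (0.24 and later) also accept to_list() on a Series, but that does not help here because the object at the end of the chain is a NumPy array.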

My bad, but I kept making the same mistake several times. Why? I admit that I may be a bit slow, but there might be an additional cause: the pandas library often uses an underscore to separate the terms in its methods and functions (to_csv(), to_datetime(), to_numeric() and so on). Because "tolist()" is an exception to this general rule, it becomes an easy mistake to make.

The lesson? Consistency is king! If you start splitting words using underscore, do so everywhere!

But, I hear myself cry, sometimes it is quite intuitive and easy to join the words. Can't we just have some exceptions?

Perhaps, but when I grant myself the right to exceptions in my own programming, I often regret it later. The key problem is that it is not obvious when an exception is intuitive. So the next day when I continue to write my code, I make silly mistakes, referring to a variable or a method with terms separated with an underscore that my old self believed was an intuitive exception.

So, no exceptions for me. They create a mess, since my future self has a different intuition than my current self, and the two tend to disagree on what an intuitive exception is.
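When I do trip over an exception like this, a small helper can at least tell me which spelling actually exists. The following is only a sketch: the function name underscore_variants and the Patients class are made up for illustration, and the idea is simply to compare names with the underscores stripped out.

```python
def underscore_variants(obj, name):
    """Return attribute names of obj that equal name once underscores are ignored."""
    wanted = name.replace("_", "").lower()
    return [
        attr
        for attr in dir(obj)
        if not attr.startswith("__")
        and attr.replace("_", "").lower() == wanted
    ]


class Patients:
    """Hypothetical class that mixes both naming styles."""

    def tolist(self):
        return []

    def to_csv(self):
        return ""


# Ask which real attribute corresponds to the spelling I guessed.
print(underscore_variants(Patients(), "to_list"))  # ['tolist']
print(underscore_variants(Patients(), "tocsv"))    # ['to_csv']
```

This does not fix the inconsistency, of course. It only makes the guessing game shorter.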

Systematic exceptions?
Or? Maybe, maybe it is possible to create a system with some systematic exceptions? Here are some that I have tried:


  • no underscore when there are only two terms (so getlist() is OK), but use underscore when there are three terms, like "get_from_list()"
  • no underscore when using short terms like "get", "to" and so on
  • no underscore between text and numbers. In this case the following name for a variable or a dataframe would be OK:


patients2015and2016 = ... (a list or a dataframe)
Instead of the more keystroke-heavy version:
patients_2015_and_2016

But in the end I keep failing and it seems like the only safe rule, at least for me, is to be consistent, however painful and non-intuitive it may seem in some individual cases.
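If I ever want to clean up names written under the text-and-numbers exception, a regular expression can put the underscores back. This is a rough sketch that assumes lowercase names; the function name separate_text_and_numbers is my own invention.

```python
import re

# Matches the empty position between a lowercase letter and a digit,
# in either order. Nothing is consumed; an underscore is inserted there.
_BOUNDARY = re.compile(r"(?<=[a-z])(?=[0-9])|(?<=[0-9])(?=[a-z])")


def separate_text_and_numbers(name):
    """Insert an underscore wherever text meets a number in a name."""
    return _BOUNDARY.sub("_", name)


print(separate_text_and_numbers("patients2015and2016"))  # patients_2015_and_2016
```

Names that already follow the strict rule pass through unchanged, so it is safe to run on a mixed list of names.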

Acceptable exceptions to the rule that consistency is king?
But wait, there are some exceptions that are common. For instance: lstrip, nunique, groupby and so on. Names like lstrip tend to be accepted, perhaps because the first term is an abbreviation. Or just because they are so common.

The problems never end!
And as if this was not enough, I keep messing up with several other consistency problems:

- Should the name of an object with many elements always be in plural?

bad_year = [2001, 2002]

or

bad_years = [2001, 2002]

My feeling: Yes, but sometimes it feels unintuitive. In that case I remind myself:  Consistency is king!

The same problem occurs in naming methods. For instance, in pandas it is easy to forget whether it is:

df.value_counts()
or
df.value_count()

since the plural naming scheme is not consistently implemented in all methods.

I have some mixed feelings about whether the following variable naming in a loop is good or bad:

for var in vars:
for year in years:

While I use it, it is very easy to confuse objects with such similar names when reading the code (they are only distinguished by a plural ending). On the other hand, it is sometimes very logical. My solution is to add a term to the plural:

for var in surgery_vars:
for year in bad_years:


- Should the type of object be indicated in the name or left implicit?
bad_year_list = [2001, 2002]
or just
bad_years = [2001, 2002]

My answer: The cost in terms of verbosity is not worth the benefit I occasionally get from knowing fast (by reading the variable name) how I should slice, index or get items out of the object. How I get information out depends on whether the object is a list or a dict, but it just becomes too much if I have to do this for all objects. And since consistency is king, I try to avoid it even if it would be useful sometimes.


