Monday, 18 March 2013

Web development tutorials, from beginner to advanced | Nettuts+

This was quite good:

Web development tutorials, from beginner to advanced | Nettuts+:

For instance:

Python

And more specifically (a good beginner's intro):

Object Oriented Programming


Polar predictions and hierarchical models

I was recently asked to predict how many states will have implemented the Medicaid expansion by 2016 and by 2020. The question led to a reflection about the nature of predictions in which the extremes are more likely than the middle. The probability that many states will have signed up by 2016 is high, given the level of federal subsidies offered to the states (100% for the first three years, then 90%). On the other hand, there may be political costs (seemingly accepting the ACA may be politically expensive) and worries about the credibility of the promised federal funding. In addition, the presidential election in 2016 may drastically alter the system. So, I find myself believing that by 2020 there is a high probability that either very many or very few (none!) of the states will be enrolled in the expansion. Exactly how would I derive the probabilities?

The answer is that a hierarchy of distributions and beliefs must be aggregated. We have beliefs about a Democratic vs. Republican victory in 2016. We also have beliefs about the extent to which the Republicans would change different aspects of the ACA. Then there is the risk of a financial crisis and a renegotiation of the terms. Taken together, this may lead to a polar prediction distribution - with fat end points and little in between.
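A minimal simulation sketch of that aggregation (all the probabilities below are invented for illustration, not estimates): a single binary scenario - whether the expansion terms survive after 2016 - flips the per-state probability of joining, and the resulting predictive distribution for the number of enrolled states ends up with two humps and little mass in between.

# Sketch: simulate a "polar" predictive distribution for the number of
# states enrolled in the Medicaid expansion by 2020.
# All probabilities below are illustrative assumptions.
set.seed(42)
n_sim <- 10000
n_states <- 50

p_repeal    <- 0.35  # assumed chance the expansion terms collapse after 2016
p_join_high <- 0.90  # assumed chance a state joins if the subsidies survive
p_join_low  <- 0.05  # assumed chance a state stays in if the terms collapse

repealed <- rbinom(n_sim, 1, p_repeal)
p_join   <- ifelse(repealed == 1, p_join_low, p_join_high)
enrolled <- rbinom(n_sim, n_states, p_join)

hist(enrolled, breaks = 0:50,
     main = "Polar predictive distribution", xlab = "States enrolled")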

So what? Well, it may be obvious, but one often thinks about probabilities as if they decline steadily away from the most likely value: if 36 is most likely, then 35 is quite likely and 0 is very unlikely. The example reminds me that this intuition is wrong. It is perfectly possible for the extremes to have high probabilities.

Weight measurement, uncertainty and statistical significance

During a demonstration of the electronic health records in a VA hospital, we were shown a graph of the weight of the patient at different visits. A participant then asked whether it would be possible to make the program also indicate whether a change in weight was statistically significant. Initially I thought this was a bad idea for several reasons. First of all, the weight of a patient is not associated with the same type of uncertainty we have when we draw a random sample from the population in an opinion survey or similar samples. The measured weight of the person is the weight of the person. OK, there might be measurement errors, but that is a different kind of uncertainty, and the error associated with the scale should not be large in a hospital.

Discussing this with a friend, however, nuanced my view a little. The weight of a person might differ depending on whether the person has just eaten and so on. This means that if the interesting parameter is "the average weight within a short time period" and not the "weight right now", then the current measurement will be a draw from a distribution. Both could be relevant: weight right now may be more relevant for dosages to be used right away, while weight "in general" would be more relevant to assess weight loss or the proper dosage of drugs over the short term.

However, the uncertainty from weight variation will still not be the same as the sampling uncertainty we get when we draw individuals from a large population. Instead, it seems that we should use the knowledge we have about how much the weight might plausibly vary during a day to model the uncertainty. A measurement right after a meal might, perhaps, be 1 kg above the average; after exercise and no drinking, the weight might be 1 kg below average. As a first approximation one might assume that the measured weight is drawn from a normal distribution with a standard deviation of 0.5 kg, so that 95% of measurements would fall within +/- 1 kg of the average.
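As a rough illustration of what that assumption implies (a sketch, assuming independent normal errors with SD 0.5 kg on each of two measurements), the observed change between two visits would have to be on the order of 1.4 kg before it is unlikely to be measurement variation alone:

# Assumed: each measurement = true average weight + normal(0, 0.5 kg) noise
sd_single <- 0.5                     # assumed within-day SD per measurement (kg)
sd_diff   <- sqrt(2) * sd_single     # SD of the difference between two measurements
threshold <- qnorm(0.975) * sd_diff  # two-sided 95% cut-off for an observed change
round(threshold, 2)                  # about 1.4 kg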

Still, this solution seems far from perfect. The weight of a person will have sudden spikes (meals, exercise, bathroom, drinks), and measurements are not equally likely to be taken at every point during the day. Now, there is a difference between the individual and the aggregate, but I still worry a little about the spikes and the timing. I am not sure how much it matters, so it may just be a theoretical worry, but before I am willing to ignore it, it would be good to try it out.
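One way to try it out is a crude forward simulation in R. The sketch below uses invented spike sizes and probabilities (not data) and simply compares the resulting spread with the smooth normal(0, 0.5 kg) assumption above:

# Sketch: spiky within-day deviations from the "average weight".
# Spike sizes and probabilities are illustrative assumptions, not data.
set.seed(1)
n <- 10000

just_ate     <- rbinom(n, 1, 0.3)                # measured shortly after a meal?
dehydrated   <- rbinom(n, 1, 0.2)                # measured after exercise, no drinking?
deviation_kg <- just_ate * runif(n, 0.5, 1.0) -  # a meal adds up to ~1 kg
                dehydrated * runif(n, 0.5, 1.0)  # fluid loss subtracts up to ~1 kg

sd(deviation_kg)                                 # compare with the assumed 0.5 kg
quantile(deviation_kg, c(0.025, 0.975))          # compare with +/- 1 kg
hist(deviation_kg, breaks = 40, main = "Spiky deviations from average weight")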

How? A Bayesian model using R and JAGS might be used to model the distribution of measured weights relative to the actual "average weight" under different assumptions about the distribution. I do not know how to model spikes like the ones we would observe with weight, but it could be an interesting exercise.
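As a starting point, here is the simplest version of that idea in R with rjags (a sketch, not a finished model): repeated measurements scatter around a latent "average weight", and the measurement spread is estimated from the data. The weights are invented for illustration, and the spike problem is ignored.

# Sketch of the JAGS idea: measurements as draws around a latent average weight.
# The data below are made up for illustration.
library(rjags)

weights <- c(81.2, 80.4, 81.0, 80.1, 80.8)   # hypothetical measurements (kg)

model_string <- "
model {
  for (i in 1:N) {
    w[i] ~ dnorm(mu, tau)    # each measurement scatters around mu
  }
  mu  ~ dnorm(80, 0.0001)    # vague prior on the average weight
  tau ~ dgamma(0.01, 0.01)   # vague prior on the measurement precision
  sigma <- 1 / sqrt(tau)     # implied measurement SD in kg
}"

jm <- jags.model(textConnection(model_string),
                 data = list(w = weights, N = length(weights)),
                 n.chains = 2, quiet = TRUE)
post <- coda.samples(jm, c("mu", "sigma"), n.iter = 5000)
summary(post)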

Of course, one might just avoid the whole problem by arguing that there is little point in turning a graph that is easily visualized and continuous into something - statistical significance - that is discrete and numerical. I agree. In this sense the example simply reveals how standard classical intuitions about statistics lead to the wrong demands.
