Strange Data: 2015

Sunday, 11 October 2015

Do you have a higher probability of seeing a niqab at a citizenship ceremony or contracting a rare and terrible disease?

The niqab. It's become controversial lately.

Forget the argument that the Conservative party has been using this as a wedge issue to cynically pick off votes from the NDP and Liberals. Forget the argument that the debate about the niqab has been veiled in quasi-anti-Muslim rhetoric (if not full-blown racism). Forget that several panels of judges including the Federal Court of Appeal (one step below the Supreme Court of Canada) have ruled that banning the niqab at citizenship ceremonies is unlawful. Forget the argument that it is deeply ironic to accuse people of forcing a type of dress on a woman while at the same time deciding that the government can decide what are acceptable clothing standards for any woman in Canada.

Let's uncover some statistics here. How likely are you to see someone wearing a niqab at a citizenship ceremony? This is not an unimportant fact. A federal government has only so much political capital to spend on an agenda. When it makes decisions to address issues it means that it cannot address other, perhaps more important, issues. I mean it's taken the Conservative party almost 10 years to fulfill its overarching legislative agenda of destroying the research capacity of the public sector. I have to assume that's the main reason why it hasn't accomplished anything else.

I joke but the point is that it doesn't really matter what you put forward as a part of a political agenda. It all takes time to work itself through the legislative intestines of government. So let's put this debate in statistical context to see how important it really is. And the best context I know is medicine. What is the chance of seeing a niqab at a citizenship ceremony as compared to your chance of contracting a rare and terrible medical condition?

First, what is the risk of seeing a niqab at a citizenship ceremony? Well according to this CBC article a total of two people have refused to remove the niqab in order to become a citizen of Canada in the last four years. Now that doesn't seem like a lot but if we only had, like, ten new citizens over that four-year period that's a big deal. According to Citizenship and Immigration Canada though, the number of new Canadians over that period was more like 686,195. That number is a lot bigger than ten.

Biostatisticians like to use measures to document incidence of disease or how many new cases of a disease occur in a time frame. To keep things comparable we'll do the same thing here with the niqab. The way I'll measure incidence of the niqab and disease is in person-years, which is just a fancy way of giving the rate of new disease/niqab occurrences in a year's stretch. Biostatisticians also don't like small numbers and so they usually multiply these rates by 100,000, which gives you the incidence of a disease/niqab occurring per 100,000 people.

In the case of the niqab, the risk of observing one at a citizenship ceremony is (2/686,195). This is an incidence rate of 0.3 niqabs per 100,000 person-years.
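For anyone who wants to check the arithmetic, here is a minimal sketch of that calculation. The two inputs are the figures quoted above; the function name `per_100k` is just something I made up for illustration, and the sketch treats each new citizen as one unit at risk, which is the simplification the paragraph above makes.

```python
# Figures quoted in the post.
niqab_refusals = 2        # people who refused to remove the niqab
new_citizens = 686_195    # new Canadians over the same period


def per_100k(events, population):
    """Events per 100,000 members of the population at risk."""
    return events / population * 100_000


rate = per_100k(niqab_refusals, new_citizens)
print(f"{rate:.2f} niqabs per 100,000")  # rounds to roughly 0.3
```

The unrounded figure comes out slightly under 0.3, so rounding to one decimal place is where the number in the text comes from.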

Now, what's your risk of contracting a serious infectious disease? Well first, let's talk about three pretty serious infectious diseases. Human immunodeficiency virus or HIV has a long and deadly track record of killing people. It was first described in the 1980s after a mysterious decade where gay men in particular were coming into hospitals with severe unexplained immune deficiencies. In 2013 alone, over 1.5 million people died of HIV with most deaths occurring in the developing world. Thanks to advances in antiretroviral therapies and prevention, far fewer people are being infected with the disease than ever before (at least in the developed world). Nevertheless, the incidence of infection in North America is about 15 per 100,000 person-years, or 50 times the probability of seeing a niqab at a citizenship ceremony.

How about tuberculosis? TB is also, thankfully, becoming a rare disease after having a long and storied history. Before doctors understood that bacteria and other microorganisms made people sick (which wasn't that long ago), they described a disease called "pulmonary consumption". It was caused by an infection of the lungs that makes people cough like crazy and lose a lot of weight until they become emaciated. Albert Camus, Paul Gauguin, and Robbie Burns all had or died of TB and at one point "consumption" was considered bohemian, hence why Nicole Kidman had it in Moulin Rouge. Today about 25 people per 100,000 will get TB each year in North America. You are over 83 times more likely to get TB than see a niqab at a citizenship ceremony.

How about an even weirder disease? Late syphilis is a sexually transmitted infection and it's known as the "great imitator" for its ability to look like many other diseases. It can strike up to 15 years after initial infection and can make you go crazy (neurosyphilis), sprout giant growths all over your body and face (gummatous syphilis), or burn out your heart (cardiovascular syphilis). It was all but eradicated after the introduction of antibiotics like penicillin, which is why you so rarely see anyone walking around with giant gummatous growths all over their face. About 5.5 people per 100,000 will get late symptoms of syphilis. You are still 18 times as likely to get late syphilis as you are to see a niqab at a citizenship ceremony.
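The "times more likely" figures for the three infections are just each disease rate divided by the niqab rate. A quick sketch, using the rounded 0.3 per 100,000 from earlier as the baseline (the variable and function names are mine, not from any dataset):

```python
niqab_rate = 0.3  # per 100,000, rounded as in the text

# Incidence per 100,000 as quoted in the paragraphs above.
disease_rates = {
    "HIV": 15,
    "tuberculosis": 25,
    "late syphilis": 5.5,
}


def times_more_likely(rate, baseline=niqab_rate):
    """How many times larger a rate is than the baseline rate."""
    return rate / baseline


ratios = {name: times_more_likely(r) for name, r in disease_rates.items()}
for name, ratio in ratios.items():
    print(f"{name}: about {ratio:.0f} times the niqab rate")
```

Using the unrounded niqab rate instead of 0.3 would nudge each ratio up a little, so treat these as ballpark numbers.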

Ok, so that was infectious diseases. What about cancers? Well the rate of a true malignant brain tumor is about 9.3 people per 100,000 population. The rate of getting pancreatic cancer (essentially a death sentence if you get it) is 8.8 per 100,000. Mesothelioma, a lung cancer that you get from working in an asbestos mine for most of your life, has an incidence rate of 0.98 people per 100,000 people.

I'm just going to start listing them now. Multiple sclerosis - a neurologic condition where your immune system attacks your own nerves - 3.6 cases per 100,000 people. Psoriatic arthritis - a disease that obliterates your joints and causes your hands to look like this - 6 cases per 100,000 people. Cholangiocarcinoma - a cancer that blocks up your liver and causes you to turn yellow - 1.5 cases per 100,000 people. Aortic dissection - where the main blood vessel in your chest splits and you bleed to death in a matter of minutes - 3 cases per 100,000 people. Multiple myeloma - a blood cancer that can eat away your bones - 4.5 cases per 100,000 people. Endocarditis - an infection of the heart valves associated with IV drug use - 15 per 100,000 people. Pheochromocytoma - a tumor that pumps out catecholamines and causes your heart rate and blood pressure to go through the roof - 0.8 cases per 100,000 people. Stevens-Johnson syndrome - look it up, it's gross! - 0.7 cases per 100,000 people. I could go on but you get the idea.

Let me put this in another perspective for you. I'm becoming a family doctor. A family doctor should probably see about 25 patients per day. If I see 25 patients a day for five days a week over a 50-year period I should see about 300,000 patients. Keep in mind that it's usually the case that the same patients keep coming back so that 300,000 patient-years is a very high estimate. Using this as a benchmark, if I'm lucky I might see one or two pheochromocytomas in my lifetime as a physician. If I'm lucky I might see four or five cholangiocarcinomas. If I'm lucky I might see 20 or 30 cases of a brain tumor.
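Here is a rough sketch of that back-of-the-envelope calculation. The 48 working weeks per year is my assumption (the paragraph above doesn't give one), chosen because it reproduces the 300,000 figure; the incidence rates are the ones quoted earlier in this post.

```python
patients_per_day = 25
days_per_week = 5
weeks_per_year = 48   # assumption: not stated above, but it yields 300,000
career_years = 50

career_patients = patients_per_day * days_per_week * weeks_per_year * career_years

# Incidence per 100,000 as quoted earlier in the post.
rates_per_100k = {
    "pheochromocytoma": 0.8,
    "cholangiocarcinoma": 1.5,
    "malignant brain tumor": 9.3,
}


def expected_cases(rate_per_100k, people):
    """Expected number of cases among `people` at the given rate."""
    return rate_per_100k * people / 100_000


expected = {name: expected_cases(r, career_patients)
            for name, r in rates_per_100k.items()}
for name, cases in expected.items():
    print(f"{name}: about {cases:.1f} expected cases")
```

Because the same patients keep coming back, the real number of distinct people is smaller, so these expected counts are upper-end estimates.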

Just so I'm clear, this is not a defense of the niqab. I find the niqab anti-feminist. But I also do think that in an open society we treat people like adults and let them wear whatever they want. And if someone is forcing a woman to wear a certain type of clothing then we have domestic abuse laws to deal with those cases whatever the type of clothing may be. I'm flabbergasted that the same man who described the long-form census as a massive invasion of government into the private lives of Canadians is so willing to decide what clothes they wear.

But if you're not convinced by that classically liberal argument then maybe you'll be convinced by the statistics. On the list of priorities of Canadians, maybe it should go HIV, then brain cancer, and then the niqab. Then, after all of that, we can prioritize re-electing someone who stooped low enough to use intolerance and xenophobia as a way to get votes.

Wednesday, 30 September 2015

What happened the last time we let in 10,000 refugees?

Lyndon Johnson was not a very likable president. He came into office after a hugely popular president (who reportedly hated him) was shot. He was known to be rude and overbearing especially when he was the leader of the Senate and wanted to push legislation. He physically assaulted Lester B. Pearson. He effectively turned the Vietnam war from a minor regional police action into a full-fledged quagmire.

He is also known as the American president that passed the Civil Rights Act of 1964, one of the most fundamental pieces of legislation in American history. It dissolved voter registration requirements and integrated schools at a time when "separate-but-equal" was something African-Americans experienced on a daily basis. As a result, all Americans could use water fountains, attend decent schools, and freely vote for the first time in the history of the United States. He did this by using his considerable legislative pull and knowledge to manipulate Congress in a process that would make Frank Underwood blush.

He also did this at a time when the Democratic party in the United States could still get elected in the south. Apocryphally, after signing the bill he turned to a staffer and said that he had single-handedly lost the south for the Democrats for a generation. Johnson was re-elected by a landslide in the 1964 election but his prediction was prophetic. As a result of signing the Civil Rights Act, Democrats had become endangered legislators in the southern United States by the 1990s.

Lyndon Johnson was many things, but amoral he was not. He knew his strong support for civil rights legislation came with a cost to his party and his own election chances. He did it anyway because it was the right thing to do.

Contrast that episode with how well this meatball has handled the issue of Syrian refugees this last month or so. If history judges Lyndon Johnson to be a bad president despite what he did for civil rights, how will history judge Stephen Harper?

Now, partisanship is unseemly and I try my best throughout my blog to avoid it. So in the spirit of tri-partisanship let me say that neither of these other two meatballs really seemed to care about the Syrian refugee issue until it became politically expedient to do so. That should tell you something about their character. But let's face it, it's pretty easy to crap all over politicians. On the ease-of-crapping-on meter, it goes Nazis, then murderers, and then it's a toss-up between politicians and fat-cat Wall Street bankers.

So while we're being all self-righteous, let's remind ourselves that really, the reason why politicians didn't care is because Canadians didn't really care. It says something about our character that it took this horrible picture to ignite this issue. Before there was Aylan Kurdi, 2,500 other refugees died trying to cross the Mediterranean to get to Europe in 2014 alone. And since 2011, over 200,000 people, many of them children, have died in a brutal civil war in Syria.

And while I'm pointing the finger at all of you quite frankly, it's not like I've had the initiative to write about Syria or refugees or anything like that. Crowing about how I'm going to Moneyball a playoff hockey pool (I'm going to finish that series, I swear) and discussing my coffee consumption habits are not going to change the world.

Nobody comes off smelling good in this whole affair and it's a reminder that just because people are reduced to a number doesn't mean they're irrelevant. When refugees are described by a statistic they still matter. There's a lesson about empathy here when we don't pay attention to 200,000 deaths but we do to a single photograph.

This post has gone pretty Disney so let's get back to some hard-hearted statistics. And to compensate for that Kodak moment let's make this post extra boring by introducing some history. There have been a lot of arguments against letting a large number of refugees into the country in very quick order. How will they integrate? Do we have to pay for them? Terrorists?

Luckily we have a reasonable historical test case of what would happen if we decided to accept a large number of refugees into Canada. Coincidentally, we also have Lyndon Johnson to thank for this.

We also have Richard Nixon to thank, which is a sentence I never thought I'd write. By early 1969 Nixon began withdrawing American forces from a war that Johnson had started because even Richard fucking Nixon could see that Vietnam was a loser. As a result of that decision and the comically inept performance of the American-trained South Vietnamese army, North Vietnam swept through the south and turned the whole place into just Vietnam.

In turning Vietnam into their version of Vietnam, the communist party decided that many people needed to think the way of North Vietnam. So they tried that old communist stand-by known as "re-education" which is a euphemism for hard-labor camps and beatings. In doing so they created a giant group of people trying to flee from Vietnam. This group of refugees became known as the Vietnamese boat people. On top of that, conflicts in Laos and Cambodia meant that millions of people (literally 3 million) were looking for a new home.

The Indochina refugee crisis lasted from 1975 when Saigon fell to about 1995. It peaked in 1979/1980 and during this period Canada took in about 15,000-25,000 refugees from Southeast Asia. By the end of 1985, over 100,000 refugees came to Canada from the region. The following time series reflects that with the huge rise in Southeast Asian immigrants in the 1975-1979 and 1980 periods. Just to add some context I've added the immigrant data for entrants from the UK and the USA. This allows some comparisons to a group of immigrants one might consider more like "old-stock Canadians".

So what happened to this cohort of Southeast Asian refugees who came to Canada? The first available long-range data we have on them is probably contained in the 1981 Census which was a very good vintage (unlike the current 2011 National Household Survey version of the census). The 1981 census asked about immigration to Canada and when these immigrants landed. They also asked about the amount and sources of their income. Now in this particular census the whizzes at Statistics Canada didn't provide any information on where in Asia any of these people came from, which means we can't see any in-depth information on Southeast Asian immigrants in particular. Nevertheless, below is a boxplot of the incomes of each cohort of immigrants from Asia, the USA, and the UK based upon the year they arrived in Canada. It's census data, so this is essentially a complete picture of the immigrants that arrived in Canada from Asia, the USA and the UK between 1971 and 1981 (because of weighting). It accounts for a sample of about 12,000 people. This is in complete contrast to the current 2011 NHS survey which couldn't count immigrants with Sesame Street's help (one immigrant, two immigrant, THREE IMMIGRANT, AH HA HA HA HA!)

The middle bar of each boxplot is the median or 50th percentile. The tops and bottoms of the boxes are the 75th and 25th percentiles respectively. The whiskers on each side are basically a measure to show outliers, which are any of the dots outside of those whiskers (it's technically a measure of 1.5 times the interquartile range).
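If the boxplot recipe is unfamiliar, this small sketch computes the same pieces by hand. The incomes are invented purely for illustration (they are not census values), and the quartile method is one common convention; plotting libraries may interpolate slightly differently.

```python
import statistics

# Made-up incomes in thousands of dollars, purely for illustration.
incomes = [8, 10, 12, 13, 15, 16, 18, 20, 22, 60]

# Three cut points: 25th percentile, median, 75th percentile.
q1, median, q3 = statistics.quantiles(incomes, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range: the height of the box

# Points more than 1.5 * IQR beyond the box are drawn as outlier dots.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
inside = [x for x in incomes if low_fence <= x <= high_fence]
outliers = [x for x in incomes if x < low_fence or x > high_fence]

# Whiskers stop at the most extreme points still inside the fences.
whisker_low, whisker_high = min(inside), max(inside)

print(f"box: {q1} to {q3}, median {median}")
print(f"whiskers: {whisker_low} to {whisker_high}, outliers: {outliers}")
```

In this toy sample the single very high income sits beyond the upper fence, so it shows up as a dot rather than stretching the whisker.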

What the above plot shows though is that at the height of the massive influx of immigrants from Southeast Asia there was a discrepancy between the incomes of immigrants from Asia and immigrants from the US and UK. The Asian cohort that arrived in 1980 did have measurably lower incomes than their American and UK counterparts. But that difference does not extend to cohorts that arrived much before 1979. Asian immigrants who arrived earlier than 1979 seemed to adjust to Canada and earned just as much as American and UK immigrants who arrived in the same year. Notably when you take a look at this same boxplot for just government transfers, Asian immigrants don't seem to receive any more government assistance than immigrants from the USA or UK at any point between 1971 and 1980.

But "So what?" you say. This is probably being driven by all of those rich immigrants from Japan and Hong Kong and Taiwan. "Touché", I respond: I can't exclude that possibility in this data. What I can do though is fast forward to the 1986 census (another good vintage in contrast to the vinegar-tasting 2011 NHS) and repeat the same exercise. In 1986, someone at StatsCanada clued in when they saw a bunch of Southeast Asian names on the surveys they were sending out and started asking questions about immigrants from Southeast Asia. So this particular sample is about 30,000 people from the 1986 census who immigrated at some point between 1971 and 1986. The benefit of this is that we can actually see how the incomes of the Southeast Asian cohort who landed in 1979/1980 adjusted as compared to their American and UK "old stock" counterparts.

And what this shows is that the 1979/1980 cohort probably did a little poorer than their UK counterparts but about as well as immigrants from the US. Even Southeast Asian immigrants who came after 1980 are not all that worse off than the other groups of immigrants. Immigrants from Southeast Asia integrated pretty well and in very short order after coming to Canada.

There are a number of issues with comparing the experience we had with the Southeast Asian boat people and the current Syrian refugee crisis. The refugee samples are most certainly different. Say what you will about the North Vietnamese, they were able to keep a semblance of order in Vietnam after they took over. This is in complete contrast to Syria, which is currently doing its best impression of 1992 Somalia. In Vietnam, Cambodia, and Laos, it was mostly the intelligentsia trying to get out. Everyone wants to get the hell out of Syria.

People also worry a lot about cultural differences in the Syrian refugees but I don't really think this argument holds a lot of water. Syria, prior to the civil war, wasn't some backwards regime that didn't allow women to drive cars or show their faces or go to university (ring any bells?). The government is ostensibly secular (for those people trying to play the radical Islam card). I'm not defending any of the obviously horrible things that the government of Syria has done or did but I am saying that Syrians have grown up in an essentially modern middle eastern society. The culture argument is an overblown one.

Finally, we also should remember that when we took in 100,000 Southeast Asian immigrants it was the 1980s. Walkmans were just becoming cool. Ronald Reagan, a (pretty mediocre) ACTOR, became the most powerful man in the world. Canada basically couldn't change its own constitution until 1982. We were a lot poorer and a lot more uncertain of our place in the world. If we could accept 100,000 immigrants in that weird decade from a war-torn area of the world how can we not do the same today?

Sunday, 16 August 2015

Can your apartment get you pregnant?

This past month or so I've been pretty busy, hence the radio silence with respect to the blog. I moved to a new city, I got a new apartment, and I'm finally semi-employed.

On this latter point, my new job has a serendipitously coincidental title that relates to this post. I am now a family medicine resident at the University of Toronto. The etymology of medical resident is very old and harks back to medical training programs where they would work their medical learners for such long hours that they literally had to live at the hospital - ergo the term "resident" or "house staff". The first recorded use of these terms is in Edinburgh in 1744 at the Royal Infirmary where a clerk of the house or resident of the house carried out duties that resemble what medical residents do today. The first doctor who held this title got into a fight with the head of nursing at the hospital and was forced to resign as a result. The eternal war between nurses and residents has deep roots.

I am now a resident in Toronto in two separate senses but this was not so about three months ago. Over the May long weekend I had the good fortune of hanging out with some old friends from grad school for an annual cabin weekend (I've been told that as a new Ontarian I now have to call it a cottage). Naturally, as economists, most of the conversations revolved around topics that would put most cocaine-enthusiast insomniacs to sleep. It was the most fun I've had in a long time.

Several people, including myself, were moving or planning on moving, and so for one night the conversation subconsciously gravitated to housing. Now, at the time, I was going to be a homeless person living in Toronto because I had yet to be organized enough to find an apartment. This procrastination went on until the day I arrived in Toronto when I managed to sign a lease. This was about four days before I started my medical residency. I then bounced the first cheque I gave my new landlord. It has not been an auspicious start to my life here. But at the time the conversation struck a chord because of my future lack of housing.

The topic that we were arguing about at the cabin (cottage) was housing for people who really are in need of housing. Not because they can't organize themselves out of a paper bag but because they truly can't afford it. There had been a recent article in the popular press about subsidized housing in Moose Jaw. The city provided social housing to the homeless and saw the costs of taking care of these people plummet. To add to this, somebody had been finishing up a research project on the homeless in southern Quebec. Much discussion ensued about whether housing really was the number one priority for the homeless.

There is some evidence on the impact of giving housing to the homeless and less wealthy. Recent preliminary data from a Canadian randomized controlled trial corroborates how improvements in health can come with housing the homeless who have mental illnesses. The Moving to Opportunity trial, which recently published results, showed that there were significant reductions in obesity, diabetes and crime in persons who moved to better neighbourhoods in the United States. Interestingly, there was no improvement in the earnings of these movers, which suggests these effects were not mediated by income.

I can't run a randomized control trial except maybe on myself and that wouldn't get past my own ethics board because I don't really want to sleep on the street. But there is a decent amount of data at the neighbourhood level on the OpenData Toronto website on housing and in particular on housing insecurity.

In Toronto there are several programs that are administered by the city that are meant to prevent homelessness. Two of these are rent bank loans and subsidized housing. The former is a program that provides short-term bridge loans that are meant to be limited and eventually paid back. These are given when all other sources of rent help are exhausted and the nature of the program is meant to help get people back to a point where they can pay for their apartments themselves. When the number of applicants for this program goes up in a neighbourhood, that neighbourhood has a higher level of short-term housing insecurity.

The latter program, subsidized housing, is meant as a pure transfer program. Rent is geared towards income, which is to say that the city does not expect you to pay a market price to use the apartment. It is designed to be a pure transfer, but because of this, there are long wait lists to get into subsidized housing. People applying for these programs know that in the long run they will need this housing because their earning potential for whatever reason will be low. In neighbourhoods where the wait list is high relative to the number of subsidized units there should be a higher level of long-term housing insecurity.

I'm going to focus on how housing insecurity affects one major health outcome: the teenage pregnancy rate. Of all of the health factors in the OpenData Toronto repository, this is by far the most interesting. I only have neighbourhood-level data for 2008 so this is a cross-sectional relationship, which should give some cause for skepticism. Nevertheless, it really is an interesting relationship.

First, what is the geography of teen pregnancy in Toronto? Another great tip I got on my cabin (cottage) weekend from a guy who works in economic consulting was that, outside of economics, nobody cares about anything other than the map. You can show them as many regression results or tables as you want but none of it is going to get through until you get to the slide with the map on it. So here is a map.

A little orientation: this is a map of Toronto broken into its neighbourhoods. The bottom of this map is Lake Ontario - Toronto Island is just south of downtown at the very centre of the map - it looks like a small Japan. To the left of Toronto are the suburbs that extend along the lake (sort of southwest-ish) through Mississauga and eventually down to Hamilton. To the right is the same with Ajax and Oshawa. Above Toronto is what is adorably referred to as "rural Ontario" by Torontonians. Also up there are Markham and Vaughan.

This map shows the rate of pregnancy per 1,000 female teens (aged 15-19) for each neighbourhood. The top three neighbourhoods are Broadview North (77 per 1,000), Beechborough-Greenbrook (70 per 1,000) and Moss Park (59 per 1,000). Broadview North is that dark area right at the centre of the map. Beechborough-Greenbrook is among those darker neighbourhoods at the northwest corner of Toronto. Moss Park is closer to the downtown area, just north of Toronto Island.

Now a map of the first measure: the number of rent bank applications in a neighbourhood. Like I said earlier, this should be a reflection of the short-term housing instability. This is also population standardized so that it's the number of people who applied for the rent bank divided by the population in a neighbourhood.

On this map, darker means a higher number of rent bank applicants. There is a cluster of high rent bank applicant neighbourhoods in the northwest as well as closer to Scarborough on the east side of Toronto. If you control for neighbourhood-level income and the number of health providers (a rough reflection of access to health care advice), a 10% increase in the number of rent bank applicants leads to an increase in the teen pregnancy rate by about 14%. Not huge, but statistically significant. Interestingly, the same effect shows up in the general female fertility rate. Higher prevalence of short-term housing instability is associated with higher levels of pregnancy.

Similar results have been seen with income in other more real studies. That is to say when teenage girls live in families on the lower end of the income spectrum, they tend to have more pregnancies. There is significant debate over why this is because it's not a causal relationship in the true sense of the word. Having less money does not hypnotize teens into having sex with one another. One explanation is that teens from disadvantaged families look out into the world and see little opportunity for any other kind of social advancement or life improvement. They usually have bad jobs, little access to education, etc. Having a kid is relatively easy and kids can be very rewarding. Just ask my parents. So teens in these situations have kids. But low income is a red herring variable that really reflects the lack of opportunity teens have. Housing instability could be the same way, so finding this result in short-term housing instability makes some sense.

Next is the ratio of the social housing wait list to number of social housing units in a neighbourhood. This should be a reflection of a neighbourhood's long-term housing instability.

Again, darker means a higher ratio of social housing applicants to social housing units. And again, maybe there is a cluster in the northwest of the city, where some neighbourhoods have high numbers of people applying to social housing with few units. There are also some scattered neighbourhoods that have this high ratio. In the regression analysis though, the opposite relationship exists between this measure and teenage pregnancy. When the prevalence of long-term housing insecurity rises in a neighbourhood, the teenage pregnancy rate goes down after controlling for income and health care providers. A 10% rise in the prevalence of long-term housing insecurity leads to a decrease in the teen pregnancy rate by about 11%.
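One hedged way to read these "10% in, X% out" statements is as elasticities, which is what you would get from a regression with both variables in logs. I haven't said which specification was used, so treat the log-log form as an assumption of this sketch; the function name is also just illustrative. Under that assumption, the two reported effects imply:

```python
import math


def implied_elasticity(pct_change_outcome, pct_change_input):
    """Elasticity b solving (1 + input)^b = 1 + outcome (log-log assumption)."""
    return (math.log(1 + pct_change_outcome / 100)
            / math.log(1 + pct_change_input / 100))


# Reported effects on the teen pregnancy rate for a 10% rise in each measure.
short_term = implied_elasticity(14, 10)    # rent bank applicants
long_term = implied_elasticity(-11, 10)    # wait list to units ratio

print(f"short-term insecurity elasticity: {short_term:.2f}")
print(f"long-term insecurity elasticity:  {long_term:.2f}")
```

Both come out larger than one in absolute value, with opposite signs, which is the puzzle the rest of the post tries to explain.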

So what explains this relationship - why is it that short-term housing insecurity leads to higher teen pregnancy rates but long-term housing insecurity leads to lower teen pregnancy rates? I have no idea. Some speculation follows.

Let's consider that the relationship between long-term housing insecurity and teen pregnancy is wrong. In this scenario, the real relationship should echo the relationship between short-term housing instability and teen pregnancy. Hypothetically then, higher levels of instability should lead to higher levels of teen pregnancy.

First: is the fact that we see the reverse relationship the result of an omitted variable issue? Basically, did I forget to control for some confounder that should be in the regression equation? All regression equations are almost always misspecified in some way and there are certainly some reasons why, at the neighbourhood level, housing insecurity might be correlated with some omitted variable. If this is the case though, housing insecurity should be negatively correlated with whatever this variable is so that the relationship is underestimated.

One of the reasons I can think of (although I really hate this explanation) is a cultural one. In walking around Toronto, you get the feeling that expat communities tend to cluster in neighbourhoods. It may be that certain cultural communities tend to have differential rates of using social services and at the same time, have differential rates of teen pregnancy. As an example (just speculative), imagine that Portuguese or Italian Catholic neighbourhoods have high rates of teen pregnancy because of religious bans on contraception. At the same time, the religious community is very active in helping people find social housing when they need it. We would observe in the regression model that these neighbourhoods would have both high pregnancy rates and low wait times. As housing wait lists go down, teen pregnancy goes up, which in the regression model would be falsely ascribed to the housing insecurity rather than Portuguese culture. For what it's worth, I tried throwing in a bunch of variables controlling for culture or proxies of culture and nothing really changed the result.
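To make the omitted-variable story concrete, here is a toy simulation. Every number in it is invented, and it is not the Toronto data: a hypothetical community trait both shortens social housing wait lists and raises teen pregnancy, while wait lists have no direct effect on pregnancy at all. A regression that ignores the trait still finds a strong negative slope.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable


def ols_slope(xs, ys):
    """Slope from a one-variable least-squares fit of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Simulated neighbourhoods: (has_trait, wait_list_ratio, teen_pregnancy_rate).
rows = []
for _ in range(2000):
    trait = random.random() < 0.5                    # hypothetical confounder
    wait_ratio = (2.0 if trait else 6.0) + random.gauss(0, 1)
    teen_preg = (50.0 if trait else 30.0) + random.gauss(0, 5)
    rows.append((trait, wait_ratio, teen_preg))

# Ignoring the trait: wait lists look strongly "protective".
naive_slope = ols_slope([w for _, w, _ in rows], [p for _, _, p in rows])

# Holding the trait fixed: the apparent effect mostly vanishes.
within_slopes = {}
for flag in (True, False):
    group = [(w, p) for t, w, p in rows if t == flag]
    within_slopes[flag] = ols_slope([w for w, _ in group], [p for _, p in group])

print(f"naive slope: {naive_slope:.2f}")
print(f"within-group slopes: {within_slopes}")
```

In the real data, adding culture proxies did not change the result, which is some evidence against this particular story, though it can't rule out a confounder I didn't think to measure.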

Second, what if there's an element of reverse causality in this regression equation? This could come for a couple of reasons. It may be that teens who have kids or who are pregnant are less likely to apply for social housing. They might just move in with their parents. But alternatively it may be a reason related to city planning. Despite Rob Ford's best efforts, the city administration did not conduct city planning by throwing darts at a map of Toronto. They likely had a plan of how to deal with teen pregnancy as well as other markers of social upheaval. If they used the teen pregnancy rate as a way to allocate social housing to neighbourhoods, then places with high teen pregnancy rates would get first dibs at new social housing. This would lead to lower social housing wait lists and we would falsely observe that this caused higher teen pregnancy rates rather than the other way around.

Finally, what if this is a true relationship, in that there is just something about long-term housing insecurity that makes teenagers get pregnant less often? There are two major reasons I can think of why this would be the case - one is more depressing and the other is more uplifting.

First, the depressing option. What we might be observing in this data is that the social situation of these teens is so bad that they can't even contemplate having a kid. Teens who have short-term housing insecurity might look out into the world and see that their situation is bad, but still OK enough to have a kid. Things might get better down the road and so having a kid is still an option. Teens who have long-term housing insecurity, on the other hand, are so badly off that having a kid is not even an option given the high investment that doing so requires. The data that I have can't really show that, because my measure is the prevalence of housing insecurity in a neighbourhood rather than a continuous variable that shows how long the housing insecurity lasts (i.e., a family might have housing insecurity for several months because of losing a job vs. a family that has housing insecurity for several years because a family member has a chronic health issue). You could think of this relationship like this though.

For an arbitrary amount of insecurity, deemed to be "short-term", teens make the decision to have a kid because things are not so bad. But once you go beyond a certain point, the relationship flips and things get progressively worse to the point that this dissuades teens from having pregnancies.
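One hedged way to write down the hump-shaped relationship described above is a quadratic in the duration of housing insecurity. The symbols here are illustrative assumptions and are not estimated from the Toronto data: $d$ is how long the insecurity lasts, $P(d)$ is the teen pregnancy rate, and $\beta_1, \beta_2 > 0$ are hypothetical coefficients.

```latex
P(d) = \beta_0 + \beta_1 d - \beta_2 d^2,
\qquad
d^{*} = \frac{\beta_1}{2\beta_2}
```

Below the turning point $d^{*}$ ("short-term" insecurity) the pregnancy rate rises with insecurity; beyond it the relationship flips and the rate falls. Testing this would need the continuous duration measure that the neighbourhood-level data does not have.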

Alternatively though, perhaps long-term social housing help has exactly the opposite effect. Teens who are in families who qualify for housing support have significantly more long-term options because they are being supported, and so decide to delay pregnancy to take advantage of now being able to go to college, get better jobs, etc. Social housing is a promise that support is coming and will be around for a while whereas the point of the loan is, in the words of George Clooney, "to ferry wounded souls across the river of dread until the point where hope is dimly visible. And then stop the boat, shove them in the water and make them swim." Loans are not a promise of long-term help, but social housing is and perhaps might be taken as a signal of better prospects.

I'll finish this post using the conclusion of almost every medical paper I've ever read. More research is required.

Monday, 1 June 2015

A brief and incomplete history of Winnipeg as told by its cemeteries.

When I was very young I had a fascination with cemeteries. We would drive out to see my grandma in rural Manitoba and I would insist that we stop at a cemetery that I knew was along the way to look at graves. I was a creepy, weird kid.

I grew out of that obsession with the dead and any lingering creepiness was quickly quashed in medical school. The only autopsy I saw was one of the more unpleasant things I have ever experienced. It was a biker who was hit by a truck and had been pancaked across the side of a road. I got pretty woozy, pretty quickly. Being able to see the pulped insides of a former person is not a great sales pitch for going into pathology, but on the other hand a pathologist's patients can't talk back. Even though the deceased may be silent, the dead do tell tales (contrary to what a pirate might say), and they can be pretty interesting ones.

The city of Winnipeg curates an open data set with all burials that have occurred in city of Winnipeg cemeteries going back to the 1880s. This data set includes about 116,000 burials in four separate cemeteries. The ostensible reason is so that Winnipeggers can search for family members when tracing their genealogies, and I can spot some people who look like they might be in my family tree. But in aggregate this burial record tells a little bit of the history of the city of Winnipeg since 1900. If you look closely you can also pick out events and trends that were affecting not just the city but the world as well.

Here's the aggregate burial time series. It's not standardized for population or anything like that. It's just a straight number of burials on a monthly basis.

The first three recorded burials occurred in October 1878 and they were all infants. Winnipeg continued to record a small number of burials in the period between 1880 and 1905 but the real explosion in burials begins around 1906. This rapid climb in burials coincides with a time when Winnipeg was considered a hip, happening place. Due to its strategic location as a gateway to the rest of western North America the population exploded from 4,000 people in 1879 to 160,000 people in 1916. The tail end of this boom is apparent in the cemetery data with the rapid climb in deaths from 1906 to 1914. Burials were increasing because so many more people were immigrating to Winnipeg.

At this time Winnipeg was a booming metropolis. Anything that needed to be transported to western Canada went through the railroad in Winnipeg, and the rail yards in Winnipeg were the largest in the Commonwealth, reflecting the city's status as a massive transportation hub. The city was known as the "breadbasket" of the British Empire due to its humongous farming exports. Property in downtown Winnipeg was worth as much as property in downtown Chicago, and the largest bank vault in the world at the time was built in Winnipeg. Charlie Chaplin came to perform, and Union Station in downtown Winnipeg was designed by the same architects who designed Grand Central Station in New York.

It was also during this era that Winnipeg was essentially known as the Las Vegas of Canada. With this massive wave of immigration came prostitution, gambling, and lots of alcohol. This activity mainly centred around the neighbourhood of Point Douglas, which continues to be an area of ill repute today. More than 50 brothels were sandwiched into this area, the unofficial red light district of Winnipeg. Public intoxication and prostitution topped the list of arrests in the city around this time.

With this massive influx of people came slums which were crammed full of recent immigrant families. In one neighbourhood survey there were 120 families in 41 houses. Because of overcrowding and otherwise unsanitary conditions, infant mortality in the North End of Winnipeg, where most of these newcomers settled, was double the rate in the more affluent portions of the city. The current rate of infant mortality in Canada is about 5 deaths per 1000 births. In the North End, at the turn of the century, it was about 250 deaths per 1000 births.

Then in 1914 everything went sideways for Winnipeg when the Panama Canal was completed. This allowed anyone who wanted to ship goods to the west coast to shave weeks off the trip and led to the city's decline. Of note in the time series around this period is the giant spike of over 300 burials in December of 1918. This is the second wave of the Spanish influenza, brought back by soldiers returning from World War I, clobbering Winnipeg.

The stagnation in economic activity and burials continued through the 1920s. Then the Great Depression hit the Western world in the 1930s, exacerbating Winnipeg's slide. Winnipeg's unemployment rate was the second highest in Canada and, rather than pay out unemployment relief benefits, the city council elected to deport immigrants who were unemployed. Reflecting this in the data is a slow decrease in burials as people left and the city's growth stalled during the period of 1914 to 1935.

This was the trend until World War II when, as young men returned from military service and started to have families, the burial rate flipped and began to climb again. The burial number in Winnipeg then stabilized, likely as a result of several factors including younger demographics, improvements in medicine, and slowish population growth.

This is by no means a complete history of Winnipeg but, let's face it, all of the cool stuff in Winnipeg happened before the 1950s. This history lesson of Winnipeg is brought to you by several sources but the one that is by far the most interesting and readable (and the one I steal the most liberally from) is a Free Press article series on the architecture and history of Winnipeg - found here, you should read it.

But the burial records also show a couple of interesting things for certain subgroups of people who died and were subsequently buried in Winnipeg. In particular, we can track infant deaths and military deaths in Winnipeg for the last century and these patterns are reflective of world-wide events.

Take infants for example. We can identify a lot of them in the burial data because many were unnamed when they were buried. It's likely that these are neonates who died very soon after birth as their first names are just recorded as infant or baby or some derivation of those two terms. Infant deaths in Winnipeg peaked in 1918 with the Spanish Flu and then declined to almost nothing by the 1970s. Over the last century there were huge investments made in public health and medicine in order to prevent infant deaths and this trend is evident in Winnipeg cemeteries.

As an aside, I also think this measured decline in infant burials might have come as a result of declining fertility over the century. There are only so many good names that you can give to your kids and if one of them dies that's one less name you can use for the next one. This doesn't matter when you only have two kids but it does when you have ten and three of them die and your last kid is stuck with the name Cletus. With smaller families, even if your kid dies, you can still name them and not be too worried about running out of good names (if I'm thinking about this correctly the marginal value of a name increases with each kid). This could have only happened with families having one or two kids, which is the way things were going by the 1970s. Just a thought.

Military vets are another group we can pick out of the data because they're buried in special sections of Winnipeg cemeteries.

These military burials peak in about the mid 1950s, which is about the time when the main cohort of World War I vets hit their 60s and early 70s. This is about the life expectancy of a male born at the time. The number of veteran deaths tails off until about the 1990s, when a second, smaller peak occurs. This coincides with the time when the cohort of World War II vets hit their 60s and 70s and likely started to die off.

Finally, a group of burials that I have a small direct connection with (I didn't kill them if that's what you're thinking). Below is a time series of the number of people who have been buried in a special plot reserved for persons who have donated their bodies to medical science.

One of the more essential parts of an undergraduate medical education is having human cadavers to learn anatomy from. Dissections have a long history that goes back to Greek and Roman physicians like Galen and Herophilos but unlike today these cadavers were usually the "donated" bodies of executed criminals. At the end of the Roman empire, the dissection of human bodies was prohibited which made it very difficult for any anatomical research or education to occur. Often physicians would steal bodies and dissect them illegally and in secret. This prohibition continued until the Renaissance when, once again, dissection was allowed on executed criminals in England. This supply of cadavers was often not enough to meet demand which led to further grave-raiding by physicians.

Today the supply of cadavers for anatomy education is provided by generous donors who volunteer their bodies after death. The donation program for the University of Manitoba began in 1932 and in 1952 it was thought that the donors should be recognized for their significant contribution to medical education and science. The University of Manitoba was the first university in Canada to do so. A monument exists for these people and every year a burial ceremony occurs where donor families and medical students are invited to celebrate their contribution. I went in 2012 and it was a very beautiful ceremony for a group of very giving patients.

Wednesday, 6 May 2015

Can I out-predict a bunch of yahoos with statistics? NHL Playoff Edition (Update 1)

Forecasting is hard. If I could do it well I wouldn't be picking an NHL fantasy team. I would be picking stocks. And I would be swanning around on my yacht in the Mediterranean. With that excuse in mind, how are things going in the play-off pool?

After the first round of play-offs I've come out ahead of the pack but nowhere near first place. That honour goes to team Rye followed by team Disaronno. I follow in a close third, ten points back from the front.

--------------------------------------
Rank | Team         | Points
-----+--------------+-----------------
   1 | Rye          | 51
   2 | Disaronno    | 43
   3 | Strobes      | 41
   4 | Rum          | 38
   5 | Everglo      | 37
   6 | Moonshine    | 35
   7 | Absinthe     | 34
   8 | Jack Daniels | 33
   9 | Jagermeister | 31
  10 | Beer         | 29
  11 | Brandy       | 28
  12 | Gin          | 27
--------------------------------------

But as Henry Jones Sr. would say, "thersh no shilver medal for finishing shecond". I either place first or I lose. So, based upon how things have been going so far, how am I projected to finish? This would be the time for any Jets HR people to send me that contract and not finish reading this blog.

Turns out not well. I project forward how I expect points will evolve based upon a couple of key pieces of information. First, half of all teams are eliminated from the playoffs each round. This reduces the number of players in the pool by roughly half each round that we go through. It also means that on average each person in the pool should see their points per round halve each successive round. This is a basic geometric process (decay rather than growth) that you might have seen in grade 11 math and it should be a fairly reasonable way to project forward how many points each fantasy team should get based on information from the first round.

In addition to this we also know which NHL players continue to be in the pool and which ones have exited. Even if a fantasy team continues to have ten of their players, if they're not particularly productive then it really doesn't matter that a majority of the fantasy team has not yet been eliminated. Conversely, if someone has only a few very productive players they might be able to milk a whole lot of points out of them even though most of their fantasy team has been eliminated. Team Rye finds itself in this latter situation: it only has four players left after the first round but, unfortunately, they're very productive players.

So this next assumption that I make is that, for the second round, each team will get the same amount of points that they got in the first round, minus the points of the players that were eliminated in the first round. This amount then halves for each additional round. In math terms it looks like this:

Total projected points are equal to the already earned points in the first round, Xp, plus the projected points from the remaining rounds. Xp-n is a term for all of the points earned by players in the first round who were eliminated and therefore cannot earn any more points. Taking into account that half of all NHL players will be eliminated, the difference is then divided by a geometric factor as the rounds of play-offs progress.
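The projection described above can be sketched in a few lines of Python. This is a minimal illustration and not the original calculation: the function name, the four-round default, and the example numbers are assumptions, and the exact geometric factor (1 for round two, then 1/2, then 1/4) is inferred from the description that the second-round amount halves with each additional round.

```python
def project_total(first_round_points, eliminated_points, total_rounds=4):
    """Project a fantasy team's final points from its first-round results.

    first_round_points: points already earned in round one (Xp in the text).
    eliminated_points:  the part of those points scored by players who are
                        now eliminated (Xp-n in the text).
    total_rounds:       number of play-off rounds (assumed to be 4).
    """
    # Points per round that the surviving players produced in round one.
    remaining = first_round_points - eliminated_points
    total = float(first_round_points)
    # Round 2 repeats the surviving players' output; each later round halves it.
    for rnd in range(2, total_rounds + 1):
        total += remaining / 2 ** (rnd - 2)
    return total


# Hypothetical team: 40 first-round points, 8 of them from eliminated players.
print(project_total(40, 8))  # 40 + 32 * (1 + 1/2 + 1/4)
```

A team whose eliminated players scored most of its points gains little from the later rounds, which is why the projected ranking can differ so much from the current standings.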

This is how everyone's points are projected to evolve.

And this is the final rank estimation based upon this projection.

------------------------
Rank | Team
-----+------------------
   1 | Disaronno
   2 | Rye
   3 | Jack Daniels
   4 | Gin
   5 | Everglo
   6 | Strobes
   7 | Jagermeister
   8 | Rum
   9 | Beer
  10 | Absinthe
  11 | Moonshine
  12 | Brandy
------------------------

This bodes poorly for my fantasy NHL team. So what went wrong here? In looking back at the regression results from my original model a couple of things leap out. First, the model put a big emphasis on goaltending. The model estimated a large coefficient for the save percentage term but it didn't do it very precisely. This is the root cause of the model predicting Montreal to do well. It overemphasized the great season that Carey Price had. On the back of this I picked two Montreal players. The remaining variables are also not all that precisely estimated which should induce a lot of uncertainty in the predictions.

The second thing that went wrong was idiot human error in the form of my own stupid picks. Despite saying that I would only pick players from teams that had a good chance of advancing in a series I picked two players from Nashville. I probably should have avoided the Nashville - Chicago series where the betting markets basically predicted a toss-up.

But all is not lost. As of writing this I am in decent position with a ten point gap between me and the next fantasy team. I've kept pace with teams Rye and Disaronno. There are a couple of reassuring reasons why these projections might be wrong. First, I bet the heaviest on Anaheim by picking up three of their players. They then proceeded to destroy Winnipeg in very short order (don't deny it, it happened) but that meant that the series only went four games. Fantasy teams including team Disaronno bet heavily on Chicago, where the series went to seven games and his players had three additional games to get points. The Minnesota - Chicago series does not look like it is going seven games. Besides this, team Rye has three of his remaining players on Minnesota and team Disaronno has five players on Chicago. One of them is very screwed after this round.

Sunday, 3 May 2015

How many doctors become disabled?

Irrelevant statistic to this post: I watched the Pacquiao-Mayweather fight last night and my favourite statistic was that the total take from the fight was greater than the gross national incomes of the 29 poorest nations.

As medical school winds down several things occur. The snow starts to melt. The rotations on various medical services become devoid of fourth years. Medical students become lazy and relaxed before the long slog of residency starts. And companies looking to vampire as much money as possible out of a group of future one percenters scuttle out of the woodwork looking for a piece of the financial action.

Over the last couple of weeks my classmates and I have been subjected to a parade of slick, bespoke suit wearing, banker types who all gave presentations on how much money we were all eventually going to make. The bankers gave pitches that were all subtle variations on "you're all going to be stinking rich. Now let me take some of that money and make you even more stinking rich and myself stinking rich too!" And people wonder why medical students get such big egos. On top of that, every one of them brought us a "free lunch" to go with the presentation. I technically have two degrees on the topic of free lunches so I put that phrase in quotes for a reason.

A lot of the pitches were on the importance of navigating tax loopholes. Essentially you pay these guys so that you don't have to pay the government even more. But in addition to this, almost all of them pitched disability insurance to us. The line was that the greatest earning potential was "in our brains" and that we should "protect it" by insuring it. Almost invariably, at some point during the pitch there was a story about some doctor who had got their hands chopped off in a random golfing/boating/gambling accident and who had filed for disability insurance the week before but the paperwork hadn't gone through yet and if only he had purchased disability insurance sooner! Now his kids hated him and his wife divorced him and he was an alcoholic all because he didn't have disability insurance.

Disability insurance has been pitched to me more than a couple of times during medical school, which automatically makes me sceptical of it. Every time some finance guy tells me about disability insurance, I get the feeling that he's looking at me as though I was transforming into a giant sack of money with a dollar sign on the front of it. Given that so many people want to sell medical students disability insurance, it must be a fairly lucrative field. Most medical students are young and are at little risk of becoming disabled in the next couple of years. I would also have thought that doctors in general have a pretty low risk of becoming disabled given the income they make and the lifestyles they lead. None of these bankers have ever given me any actuarial odds on how likely it was for me to become disabled at any point. They just told me the anecdote about the pathetic doctor without disability insurance. If doctors were getting their hands chopped off left and right, the bankers would have pitched disability insurance that way. Let's see if this actually stands up to statistical evidence.

Now a couple of things before I go into methodology here. First, I'm flattered that some people have mistaken my economics degrees as evidence that I have any real knowledge about how insurance works. The type of insurance that I learned about was more like this (scroll down to the appendix). Try applying any of this to real life. But there are two broad principles that leap out of insurance theory in economics. First, purchasing insurance is related to risk. What is the true risk of becoming disabled? This is really the focus of this post.

Second, purchasing insurance is related to how unhappy you are at accepting a certain level of risk. How risk averse are you and what do you have to lose if you don't purchase insurance? Most people are in some way risk averse but some are more so than others. I don't have any kids (that the legal system acknowledges), but if I did I would be a lot more worried about becoming disabled even if my true risk of becoming disabled was the same. Similarly, I'm a pretty relaxed guy but if I was an uptight worrier I might consider getting disability insurance to give myself "peace of mind".

I can give you something of an answer on the first reason to purchase insurance, but the second reason is all up to you. So you are to take none of what follows as sound financial advice. This is not "actionable" as my lawyers would say. It's up to you to decide whether you should get disability insurance.

So at what risk are you, fellow medical student, of becoming disabled during your career? This is actually a difficult question to answer because the public-use labour force surveys don't identify doctors in the workforce. There are so few doctors that if you actually flagged them in the national data you could start to pick out individuals with a couple of other key variables. To avoid this they don't identify doctors, but they do identify people working in certain fields and professions, and people with certain degrees. So for what follows, I haven't actually been able to identify any doctors but I have identified people who according to the data look very much like doctors. These are people with graduate degrees, who work in the health sector, and who are professionals in the health sector. This group of people does not include nurses, but it probably includes people who are pharmacists, dentists, chiropractors, physiotherapists, and other professional health care workers. I will be denoting this group with the abbreviation "PWLLD" (people who look like doctors).

This data is taken from the 2013 cross-sectional Labour Force Survey (LFS). The LFS is a monthly survey that evaluates trends in Canadian labour force participation and employment. It also identifies whether someone is unemployed or underemployed because of a personal disability or illness. This data is re-weighted using a frequency weight to provide nationally representative results.
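To make the re-weighting step concrete, here is a minimal Python sketch of how a frequency weight turns survey responses into a population share. The records, category labels, and weights below are made up for illustration; they are not LFS values, and the function name is hypothetical.

```python
def weighted_share(records, category):
    """Share of the weighted population whose response equals `category`.

    records: iterable of (response, frequency_weight) pairs, where the weight
             is the number of people in the population that the surveyed
             respondent stands in for.
    """
    records = list(records)
    total_weight = sum(weight for _, weight in records)
    if total_weight == 0:
        raise ValueError("total weight must be positive")
    matching = sum(weight for response, weight in records if response == category)
    return matching / total_weight


# Hypothetical respondents: (reason for working part time, frequency weight).
sample = [
    ("illness or disability", 50),
    ("personal preference", 450),
    ("not applicable", 9500),
]

share = weighted_share(sample, "illness or disability")
print(f"{share:.3%}")
```

Each respondent counts in proportion to the weight attached to them rather than counting once, which is what makes the resulting percentages nationally representative.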

The one major problem with my strategy here is that because I cannot identify doctors directly, this will necessarily include chiropractors or dentists or other such people who may be different in their respective probabilities of becoming disabled. I think the true probability of disability will be close to this, but there will most certainly be some error in this estimate.

I look at two major outcomes. First, what is the probability of being disabled or ill to the point where one has to work part time? Second, what is the probability of being so badly disabled or ill that one cannot work?

So what is the probability in this group of people who look like doctors of becoming disabled so they can only work part time? The following pie charts are estimated percentages of people who are in the labour force and their reasons for working part time. The chance that someone in the PWLLD group has to take part-time work because of a disability or illness is about half a percent. PWLLD do actually become disabled at a higher rate than the general population but it's not really that much higher. A full 83 percent of PWLLD are either not underemployed or consider themselves outside of the labour force (the "not applicable" category). Interestingly, PWLLD tend to work part time because of personal preference more so than the general labour force. An additional 5% of the PWLLD take time off as a personal preference and this fully accounts for the difference between the two groups in the "not applicable" category. They also tend to work part time more than the general population in order to care for their children. High salaries among the PWLLD may be the reason that these people can take part time work for personal preference or to take care of children.

Reason for working part time - People who look like doctors

Reason for working part time - General labour force

PWLLD do much better than the general population when it comes to job-ending illnesses or disabilities. Less than 0.02% of PWLLD get a job-ending disability or illness. The general population gets a job-ending disability or illness at about 20 times this rate.

Reason for being unemployed - People who look like doctors

Reason for being unemployed - General labour force

My suspicions about the age of onset of a disability or illness for PWLLD are also confirmed. Of the PWLLD who get a work-limiting disability or illness and who have to work part time, about 90% occur after the age of 50. About 98% occur after the age of 40 (the red line represents PWLLD, the blue line represents all people with graduate degrees, and the green line represents the general population). I looked at job-ending illnesses and disabilities as well, but because there are so few PWLLD who get job-ending illnesses or disabilities, it's not really informative.

What age do people get disabilities/illnesses causing part-time work? Red line is PWLLD, Blue line is people with graduate degrees, green line is the general population.

So the reason why I think all of these financial companies are trying to sell medical students on disability insurance is two-fold. First, it's free money for the first ten to fifteen years of a doctor's working life. The likelihood that an insurance company is going to pay out to any doctor is already low, and the chance it'll have to pay out to a 30-year-old doctor is exceedingly low. Second, it's a way for a company to worm its way into future doctors' financial affairs and reel them in for more lucrative financial services down the road. So I probably won't buy disability insurance just yet.

But if I ever get my hands chopped off in a random golfing/boating/gambling accident, I'm really going to be eating my words.

Friday, 17 April 2015

Can I out-predict a bunch of yahoos with statistics? NHL Playoff Edition

The answer is probably not. More on that later.

There comes a time in every male data analyst's life when their testosterone builds to frightening levels and they have to apply their skills to over-analyze sports. For me, today is that day. The NHL playoffs began last night and for a Canadian econometrician who is male (most of them are) the playoffs are like Christmas to a kid or new-trailer-release day to a Star Wars fan. There's data and probability and forecasting aplenty. If you've heard of a statistics term it can probably be applied in some way to sports.

More importantly, econometric forecasting in the playoffs can be used to make money and, even more importantly, to take other people's money away from them. And really, what else is there to life other than getting obscenely wealthy at someone else's expense? So at the last minute I managed to finagle my way into a group of guys who were in a pool for the playoffs. The buy-in was $20. Each guy drafted twelve NHL players and whoever gets the most aggregate points (i.e., goals and assists for all of their players) wins the pot. Second place gets their money back.

I signed up for this draft on Monday afternoon and the draft itself was at 8PM that night. I dropped everything and rushed home to run regressions with great abandon. What follows is a sketch of my process and playoff drafting rules of thumb, but I did this all in about three hours and so there are probably going to be some major problems with them. I didn't look at the current hockey forecasting literature, I didn't use any fancy stats, I didn't backtest the statistical model. I cut corners all over the place.

Now that I've effectively exonerated myself in case my team sucks, let's talk about what I did. The basis for my draft is a statistical model that predicts how many points a player will get dependent upon several factors. It's a multivariate ordinary least squares model that uses player and team data from Hockey-Reference.com for the years of 2012 to 2015. In addition, I pulled one other variable from a Las Vegas betting website on historical team odds for winning the Stanley Cup.

The general way I predicted how many points a player would get was to use variables from the regular season in my regression to see how well they predicted play-off performance in that year. The model would spit out an estimate for how much each variable predicted play-off point performance for the years of 2012-2014. I then used these coefficients to predict how players would do in the 2015 post-season using regular season data from 2015. This was the easiest way to make any predictions for this year as (obviously) we only have regular season statistics for 2015. I realize that previous play-off performance for each player would be useful but again, I was under the gun to get this done and I didn't have time.

On to model selection. When I was thinking about what would accurately predict how a player would do, I basically classified the variables into three major categories. First is the individual player's skill. Sidney Crosby is likely going to get a fair amount of points because he's not the average NHL hockey player. He's a superstar and, as with many other superstar players, skill likely plays a large role in the play-offs even if your team is bad and you aren't expected to go far. A guy who gets 10 points in the first round but who gets eliminated from the play-offs is probably not a bad pick depending on when you get him. So for individual skill I used regular season points and time on ice. I've also included dummy variables for what position the player plays (i.e., LW vs. RW vs. C vs. D).

The second group of variables that I included were team-related variables. Sidney Crosby is also on a team with a number of very good players who can score and pass him the puck so he can score. A player's team is also important for how deep your drafted players will go in a play-off run. A team that can go deep into the play-offs will likely have more chances to get more points for their players. I assumed that these factors were in some way predicted by team offense and defense statistics. For these variables I've included number of regular season wins, goals for, goals against and average save percentage of the team's goalie for the team each player is on.

The final group of variables that I wanted to include were related to the opponents that a team would be facing. Sidney Crosby and his band of merry hockey players are very good but they're also playing the Rangers, who went to the finals last year and who have had a very good regular season. This means that even though the Penguins are good they might not go very far. To control for this I added the Las Vegas betting odds that a team would win the Stanley Cup. Not only should the betting markets build into their price that a team is skilled and has a shot at winning the cup (which I've already attempted to control for with the second group of variables), but they should also incorporate information on the opposition that the team faces in reaching the Stanley Cup.

In mathematical terms the model is basically this:

Post-season points for player i in year y are a function of all of the variables that I outlined above as well as an error term that is assumed to be i.i.d. The betas are estimated using 2012 to 2014 data and then they are plugged into an equation that uses 2015 data to predict a player's 2015 post-season point total.
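
Written out with the variable names from the regression output, the model is roughly the following. The notation is an assumed reconstruction: D, LW and RW are position dummies with centre as the omitted category, and the team-level variables (w, gf, ga, svper, odds) take the value for player i's team in year y.

```latex
\begin{aligned}
\text{playpts}_{i,y} ={}& \beta_0 + \beta_1\,\text{pts}_{i,y}
  + \beta_2\,\text{D}_{i} + \beta_3\,\text{LW}_{i} + \beta_4\,\text{RW}_{i}
  + \beta_5\,\text{toi}_{i,y} \\
&+ \beta_6\,\text{w}_{i,y} + \beta_7\,\text{gf}_{i,y} + \beta_8\,\text{ga}_{i,y}
  + \beta_9\,\text{svper}_{i,y} + \beta_{10}\,\text{odds}_{i,y}
  + \varepsilon_{i,y}, \qquad y \in \{2012, 2013, 2014\}
\end{aligned}
```

The 2015 prediction then replaces each beta with its estimate, drops the error term, and uses the 2015 regular season values on the right-hand side.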

Regression results here:

      Source |       SS       df       MS              Number of obs =     648
-------------+------------------------------           F( 10,   637) =   30.18
       Model |   4028.8757    10   402.88757           Prob > F      =  0.0000
    Residual |  8504.40208   637  13.3507097           R-squared     =  0.3215
-------------+------------------------------           Adj R-squared =  0.3108
       Total |  12533.2778   647  19.3713721           Root MSE      =  3.6539

------------------------------------------------------------------------------
     playpts |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         pts |   .1181479   .0140399     8.42   0.000     .0905777     .145718
             |
     poscode |
          D  |  -.6354497   .4383468    -1.45   0.148    -1.496229    .2253298
         LW  |   .0316336   .4398774     0.07   0.943    -.8321514    .8954186
         RW  |  -.4466971   .4306716    -1.04   0.300    -1.292405    .3990106
             |
         toi |   .0006188   .0005999     1.03   0.303    -.0005592    .0017969
           w |   .0261791   .0601923     0.43   0.664    -.0920203    .1443784
          gf |  -.0043203   .0164696    -0.26   0.793    -.0366616     .028021
          ga |  -.0147689   .0120157    -1.23   0.219    -.0383641    .0088262
       svper |    73.9586   23.63515     3.13   0.002     27.54636    120.3708
        odds |  -.0322589   .0166924    -1.93   0.054    -.0650378    .0005199
       _cons |  -64.19289   21.57956    -2.97   0.003    -106.5686   -21.81722
------------------------------------------------------------------------------
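
As a sanity check on the output, the summary statistics can be recomputed from the sums of squares and degrees of freedom. This small Python sketch uses only numbers printed in the regression results:

```python
import math

# Sums of squares and degrees of freedom from the regression output.
ss_model, df_model = 4028.8757, 10
ss_resid, df_resid = 8504.40208, 637
ss_total, df_total = 12533.2778, 647

ms_model = ss_model / df_model   # mean square, model
ms_resid = ss_resid / df_resid   # mean square, residual

r_squared = ss_model / ss_total
adj_r_squared = 1 - (1 - r_squared) * df_total / df_resid
f_stat = ms_model / ms_resid
root_mse = math.sqrt(ms_resid)

print(f"R-squared     = {r_squared:.4f}")
print(f"Adj R-squared = {adj_r_squared:.4f}")
print(f"F(10, 637)    = {f_stat:.2f}")
print(f"Root MSE      = {root_mse:.4f}")
```

The recomputed values agree with the reported ones: the model explains roughly a third of the variation in play-off points, and a typical prediction misses by about 3.7 points.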

The predictive model kicks out this list of players in ranked order from most expected points to least expected points. This was the backbone for how I chose my twelve players. A couple of things about this list. First, the top twenty or thirty players on this list kind of make sense, which is reassuring: the model is making semi-decent picks. But anyone can pick a lot of these players off of a list of top performers in the playoffs. The real utility in the model will be in the middle picks (a Lars Eller or Brent Seabrook type pick) where a more discerning choice might make or break a fantasy hockey team. There's a total of 144 players that were picked by the pool so not all of them can be a Sidney Crosby or a Steven Stamkos where point production is assured.
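
To show how the coefficients turn into a ranked list, here is a sketch that plugs regular season inputs into the fitted equation and sorts the results. The coefficients are copied from the regression output. Everything else is an assumption: the two players and all of their input values are invented, and the units (toi as total minutes, svper as a fraction such as 0.915, odds on whatever scale the bookmaker numbers were entered) are guesses.

```python
# Coefficients copied from the regression output.
COEF = {
    "_cons": -64.19289,
    "pts": 0.1181479,
    "toi": 0.0006188,
    "w": 0.0261791,
    "gf": -0.0043203,
    "ga": -0.0147689,
    "svper": 73.9586,
    "odds": -0.0322589,
}
# Position effects relative to centre (C), the omitted category.
POSITION = {"C": 0.0, "D": -0.6354497, "LW": 0.0316336, "RW": -0.4466971}


def predict_playoff_points(player):
    """Linear prediction: constant + position effect + sum(coef * value)."""
    total = COEF["_cons"] + POSITION[player["pos"]]
    for name in ("pts", "toi", "w", "gf", "ga", "svper", "odds"):
        total += COEF[name] * player[name]
    return total


# Invented players with invented inputs (units are assumptions, see above).
players = [
    {"name": "Hypothetical Depth D", "pos": "D", "pts": 30, "toi": 1700,
     "w": 45, "gf": 230, "ga": 215, "svper": 0.912, "odds": 20},
    {"name": "Hypothetical Star", "pos": "C", "pts": 85, "toi": 1600,
     "w": 50, "gf": 250, "ga": 200, "svper": 0.920, "odds": 8},
]

# Rank from most to least predicted play-off points.
ranked = sorted(players, key=predict_playoff_points, reverse=True)
for rank, p in enumerate(ranked, start=1):
    print(rank, p["name"], round(predict_playoff_points(p), 2))
```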

Second, this model was not designed to pick a Stanley Cup winner but it does show some interesting things at the team level. If you graph the average rank of the players on the list by team (which in turn should rank teams by the average number of points they should expect per player) this bar chart pops out. This shows that Montreal players have the highest average predicted points among all teams in the playoffs this year. The next tier of teams is St. Louis, NYR, Anaheim and Nashville. I would expect these five teams to be the favourites for the cup. At the other end are Calgary, Ottawa, and Winnipeg. Now this does not mean that these teams won't advance. I can think of scenarios where all the games they play are defensive battles without a lot of scoring. But especially since the model predicts high point totals from Montreal and Anaheim I would expect both Winnipeg and Ottawa to be out fairly soon. That being said, it's the playoffs and anything can happen. Also my model might be awful. Don't send me hate mail, Winnipeg.
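
The team-level view is just a group-by on the ranked list. Here is a sketch of that calculation with a made-up ranking; the player and team labels are placeholders, not the model's output.

```python
from collections import defaultdict


def average_rank_by_team(ranked_players):
    """ranked_players: list of (player, team), best predicted player first.

    Returns (team, average rank) pairs, best average rank first.
    A lower average rank means more predicted points per player.
    """
    ranks = defaultdict(list)
    for rank, (_player, team) in enumerate(ranked_players, start=1):
        ranks[team].append(rank)
    averages = {team: sum(r) / len(r) for team, r in ranks.items()}
    return sorted(averages.items(), key=lambda item: item[1])


# Placeholder ranking, already sorted from most to least predicted points.
ranked_players = [
    ("Player 1", "Team A"),
    ("Player 2", "Team B"),
    ("Player 3", "Team A"),
    ("Player 4", "Team C"),
    ("Player 5", "Team B"),
    ("Player 6", "Team C"),
]

team_table = average_rank_by_team(ranked_players)
for team, avg in team_table:
    print(f"{team}: average rank {avg:.1f}")
```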

So this ranking list was the basic tool I used to make draft picks but in addition to this I also used two heuristics that I stole from finance. Full disclosure, I have never taken a real finance course in my life but I have taken a couple of macroeconomics courses and a financial risk analysis course (which was basically a primer on how not to tank the macroeconomy). I did, however, learn three things about finance from these courses. First, finance is so boring that you have to pay people a truckload of money to get them to do it. This is the least applicable lesson to this blog post (but maybe the most applicable lesson to life in general). Second, groups of people usually make more accurate decisions than individuals (a weak version of the efficient market hypothesis). Third, diversify, diversify, diversify. I use the analogy of a stock portfolio here because it's relevant. If you only invest in one stock, as opposed to several, you are likely to have returns that are highly volatile. They may be very high but they also may be very low. Similarly, if you pick players from one team, the team may go far but it may also flame out in the first round. Diversifying picks, just like stocks, is the least risky strategy.

These latter two lessons are the basis for my heuristics. First, where the list identified two players who had similar predicted points, I picked the player on the team that had better odds of progressing past the first round. This was according to the betting odds that I got from the Las Vegas bookie website.

Second, where I had two similar players, I would pick a player who was not on a team of a player that I had already drafted. In essence I made sure that I didn't have more than three players from the same team. This was to avoid tanking my team in one fell swoop if an NHL team with a whole bunch of my players exited the playoffs.
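
Put together, the two heuristics amount to a greedy pick rule with a tie-break and a per-team cap. The sketch below rests on several assumptions: the `tolerance` for what counts as "similar" predicted points, the field names, the order in which the two tie-breaks are applied, and the sample draft board are all invented. Only the cap of three players per team comes from the description above.

```python
from collections import Counter

MAX_PER_TEAM = 3  # cap described above


def choose_pick(available, my_roster, tolerance=0.5):
    """Pick one player from `available` given the players already drafted.

    Each player is a dict with "name", "team", "pred" (predicted points)
    and "advance_prob" (assumed chance the team survives round one).
    """
    counts = Counter(p["team"] for p in my_roster)
    # Enforce the cap: skip teams that already have three of my players.
    eligible = [p for p in available if counts[p["team"]] < MAX_PER_TEAM]
    if not eligible:
        return None
    best = max(p["pred"] for p in eligible)
    # Players within `tolerance` of the best prediction count as similar.
    similar = [p for p in eligible if best - p["pred"] <= tolerance]
    # Tie-breaks: new team first, then better odds, then prediction.
    return min(
        similar,
        key=lambda p: (counts[p["team"]] > 0, -p["advance_prob"], -p["pred"]),
    )


roster = [{"name": n, "team": "X"} for n in ("A", "B", "C")]
board = [
    {"name": "P1", "team": "X", "pred": 9.0, "advance_prob": 0.70},
    {"name": "P2", "team": "Y", "pred": 8.0, "advance_prob": 0.55},
    {"name": "P3", "team": "Z", "pred": 7.8, "advance_prob": 0.65},
    {"name": "P4", "team": "Y", "pred": 6.0, "advance_prob": 0.55},
]

pick = choose_pick(board, roster)
print(pick["name"])
```

In this made-up board P1 is blocked by the cap, P2 and P3 are close enough to count as similar, and P3 wins the tie-break because its team has the better odds of advancing.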

This resulted in the list of drafted players here. It's a solid list and includes three out of the top five ranked players and five out of the top 15. It did, however, suggest some weird players like Mike Ribeiro.

It was interesting to watch the other members of the pool pick players because it was clear that most of them were also following their own little rules of thumb. Most of them followed a similar heuristic to my first, i.e. pick the players on the consensus favourite teams. This has the obvious benefit of allowing your players to go deep and will likely get you more points.

The heuristic that they didn't follow though was my second one. Most of them picked a team they liked and thought would go far and loaded up on players from that team. Eight of the twelve members have at least 50% of their players on two NHL teams. This is a high risk, high reward strategy, and it's probably the reason why I won't win the overall pool. Out of the group of yahoos, on average one of them is going to pick the Stanley Cup-winning team. That member's players will go far in the playoffs and earn a lot of points. The rest will flame out in spectacular fashion once their team of choice gets knocked out of the playoffs.
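
The "50% on two teams" figure is a simple concentration measure: the share of a roster that comes from its two most-represented NHL teams. A sketch with two invented twelve-player rosters (team labels are placeholders):

```python
from collections import Counter


def top_two_team_share(roster_teams):
    """Fraction of a roster drawn from its two most common teams."""
    counts = Counter(roster_teams)
    top_two = sum(n for _team, n in counts.most_common(2))
    return top_two / len(roster_teams)


# Invented rosters: one loaded up on two teams, one spread across six.
concentrated = ["A"] * 4 + ["B"] * 3 + ["C", "D", "E", "F", "G"]
diversified = ["A", "A", "B", "B", "C", "C", "D", "D", "E", "E", "F", "F"]

print(f"concentrated: {top_two_team_share(concentrated):.0%}")
print(f"diversified:  {top_two_team_share(diversified):.0%}")
```

A roster above 50% on this measure lives or dies with one or two first-round series.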

For what it's worth though, here is the average rank of the players on each team by predicted points. To protect identities (and in honour of Alcohol Awareness Month) I have coded the names of the pool members except for me.

If you believe my little model then I'm ahead both on the average rank of player chosen and the predicted number of points. But not by that much in the case of team Rum and team Rye. Interestingly (although I'm not sure if this was their strategy) team Rye and team Absinthe also diversified their picks as much as I did.

So I'll try and give updates on the pool rankings over the four rounds of play-offs not only to see how the model is doing but to publicly brag or mourn depending on results.

Also if the Jets want to give me a sweet seven figure salary, shoot me a message and I can tell you guys how badly you'll lose in a private venue.