October 27, 2013

Erik’s final night as a single man.

Tonight’s secret identity for the wedding.

October 20, 2013
October 7, 2013

I spent most of high school writing, practicing, and performing music. I played guitar in two separate bands, was the lead vocalist in one of them, and played trumpet in various wind ensembles and the jazz band at school. When I wasn’t a part of the creation process myself, there was a pretty good chance I was listening to music. Back then, it seemed trivial to find a new artist or album to obsess over.

Despite being steeped in music, I have always found it hard to write about. The truth is, I have limited ability to use words to explain just what makes a particular piece of music so wonderful. Oh sure, I could discuss structure, point out a particular hook in a particular section and how it sits in the mix. I could talk about the tone of the instrument or about the quality of the performance or any number of other things. The problem with this language is that it reduces what is great about this piece of music to a description that could easily fit some other piece of music. Verbalizing the experience of music projects a woefully flattened artifact of something breathtaking.

Now it might seem that recorded music has greatly diminished this challenge. After all, the experience of recorded music can scale: anyone can listen. Unfortunately, I found this to be completely untrue. When I play music for other people, it actually sounds different than when I experience it for myself. Little complexities that seem crucial to the mix seem to cower and hide rather than loom large in the presence of others. It is not really feasible to point out what makes the song so great while listening, because it disrupts the experience. Worst of all, no one else seems to experience what I experience when I listen.

Of course, all of this may seem obvious to someone who has read about aesthetics. I have not.

September 22, 2013
September 16, 2013

How do we calculate student mobility? I am currently soliciting responses from other data professionals across the country. But when I needed to produce mobility numbers for some of my work a couple of months ago, I decided to develop a set of business rules without any exposure to how the federal government, states, or other existing systems define mobility. 1

I am fairly proud of my work on mobility. This post will review how I defined student mobility. I am hopeful that it matches or bests current techniques for calculating the number of schools a student has attended. In my next post, I will share the first two major versions of my implementation of these mobility business rules in R. 2 Together, these posts will represent the work I referred to in my previous post on the importance of documenting business rules and sharing code.

The Rules

Working with district data presents a woefully incomplete picture of the education mobile students receive. Particularly in a state like Rhode Island, where our districts are only a few miles wide, there is substantial interdistrict mobility. When a student moves across district lines, their enrollment is not recorded in local district data. However, even with state level data, highly mobile students cross state lines and present incomplete data. A key consideration for calculating how many schools a student has attended in a particular year is capturing “missing” data sensibly.

The typical structure of enrollment records looks something like this:

| Unique Student ID | School Code | Enrollment Date | Exit Date  |
|-------------------|-------------|-----------------|------------|
| 1000000           | 10101       | 2012-09-01      | 2012-11-15 |
| 1000000           | 10103       | 2012-11-16      | 2013-06-15 |

A compound key for this data consists of the Unique Student ID, School Code, and Enrollment Date, meaning that each row must be a unique combination of these three factors. The data above shows a simple case of a student enrolling at the start of the school year, switching schools once with no gap in enrollment, and continuing at the new school until the end of the school year. For the purposes of mobility, I would define the above as having moved one time.
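As a minimal sketch of what that compound key implies, the check below flags any key that appears on more than one row. The field names (`student_id`, `school_code`, `enrollment_date`) and the `duplicate_keys` helper are hypothetical names chosen for illustration, not part of any real district schema.

```python
from collections import Counter
from datetime import date

# Hypothetical field names; the rows mirror the simple example above.
rows = [
    {"student_id": 1000000, "school_code": 10101,
     "enrollment_date": date(2012, 9, 1), "exit_date": date(2012, 11, 15)},
    {"student_id": 1000000, "school_code": 10103,
     "enrollment_date": date(2012, 11, 16), "exit_date": date(2013, 6, 15)},
]


def duplicate_keys(rows):
    """Return every compound key that appears on more than one row."""
    counts = Counter(
        (r["student_id"], r["school_code"], r["enrollment_date"]) for r in rows
    )
    return [key for key, n in counts.items() if n > 1]


# An empty list means the compound key is unique across the rows.
print(duplicate_keys(rows))
```

Running a check like this before counting moves would catch duplicated records that could otherwise inflate a student's mobility.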

But it is easy to see how some very complex scenarios could quickly arise. What if student 1000000’s record looked like this?

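| Unique Student ID | School Code | Enrollment Date | Exit Date  |
|-------------------|-------------|-----------------|------------|
| 1000000           | 10101       | 2012-10-15      | 2012-11-15 |
| 1000000           | 10103       | 2013-01-03      | 2013-03-13 |
| 1000000           | 10103       | 2013-03-20      | 2013-05-13 |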

There are several features that make it challenging to assign a number of “moves” to this student. First, the student does not enroll in school until October 15, 2012. This is nearly six weeks into the typical school year in the Northeastern United States. Should we assume that this student has enrolled in no school at all prior to October 15th or should we assume that the student was enrolled in a school that was outside of this district and therefore missing in the data? Next, we notice the enrollment gap between November 15, 2012 and January 3, 2013. Is it right to assume that the student has moved only once in this period of time with a gap of enrollment of over a month and a half? Then we notice that the student exited school 10103 on March 13, 2013 but was re-enrolled in the same school a week later on March 20, 2013. Has the student truly “moved” in this period? Lastly, the student exits the district on May 13, 2013 for the final time. This is nearly a month before the end of school. Has this student moved to a different school?

There is an element missing that most enrollment data has which can enrich our understanding of this student’s record. All districts collect an exit type, which explains if a student is leaving to enroll in another school within the district, another school in a different district in the same state, another school in a different state, a private school, etc. It also defines whether a student is dropping out, graduating, or has entered the juvenile justice system, for example. However, it has been my experience that this data is reported inconsistently and unreliably. Frequently a student will be reported as changing schools within the district without a subsequent enrollment record, or reported as leaving the district but then enroll within the same district a few days later. Therefore, I think that we should try to infer the number of schools that a student has attended using solely the enrollment date, exit date, and school code for each student record. This data is far more reliable for a host of reasons, and, ultimately, provides us with all the information we need to make intelligent decisions.

My proposed set of business rules examines school code, enrollment date, and exit date against three parameters: enrollment by, exit by, and gap. Each student’s minimum enrollment date is compared to enrollment by. If that student entered the data set for the first time before the enrollment by, the assumption is that this record represents the first time the student enrolls in any school for that year, and therefore the student has 0 moves. If the student enrolls for the first time after enrollment by, then the record is considered the second school a student has attended and their moves attribute is incremented by 1. Similarly, if a student’s maximum exit date is after exit by, then this is considered to be the student’s last school enrolled in for the year and they are credited with 0 moves, but if the exit date is prior to exit by, then that student’s moves is incremented by 1.
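The rules for the two "ends" can be sketched in a few lines of Python. This is only an illustration of the logic described above, not the R implementation promised for the next post. The `end_moves` function, the record field names, and the two cutoff dates are all hypothetical, and treating a date that falls exactly on a cutoff as "not a move" is my assumption, since the rules do not say.

```python
from datetime import date


def end_moves(records, enrollment_by, exit_by):
    """Count the moves implied by a late first enrollment or an early last exit."""
    first_enrollment = min(r["enrollment_date"] for r in records)
    last_exit = max(r["exit_date"] for r in records)

    moves = 0
    # Enrolling after the cutoff implies an unobserved earlier school.
    if first_enrollment > enrollment_by:
        moves += 1
    # Exiting before the cutoff implies an unobserved later school.
    if last_exit < exit_by:
        moves += 1
    return moves


# The more complicated record for student 1000000 shown earlier.
records = [
    {"school_code": 10101, "enrollment_date": date(2012, 10, 15), "exit_date": date(2012, 11, 15)},
    {"school_code": 10103, "enrollment_date": date(2013, 1, 3), "exit_date": date(2013, 3, 13)},
    {"school_code": 10103, "enrollment_date": date(2013, 3, 20), "exit_date": date(2013, 5, 13)},
]

# Hypothetical cutoffs chosen only for this example.
print(end_moves(records, enrollment_by=date(2012, 9, 15), exit_by=date(2013, 6, 1)))  # 2
```

With these illustrative cutoffs, the late October enrollment and the mid-May exit each add one move.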

That takes care of the “ends”, but what happens as students switch schools in the “middle”? I propose that each exit date is compared to the subsequent enrollment date. If enrollment date occurs within gap days of the previous exit date, and the school code of enrollment is not the same as the school code of exit, then a student’s moves are incremented by 1. If the school codes are identical and the difference between dates is less than gap, then the student is said to have not moved at all. If the difference between the enrollment date and the previous exit date is greater than gap, then the student’s moves is incremented by 2, the assumption being that the student likely attended a different school between the two observations in the data.
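The "middle" rules can be sketched the same way. Again, this is an illustrative Python sketch rather than the R code from the follow-up post. The `middle_moves` function, the field names, and the 14-day `gap` are hypothetical, and counting a difference exactly equal to `gap` as "within gap" is my assumption.

```python
from datetime import date


def middle_moves(records, gap):
    """Count moves between consecutive enrollment records for one student."""
    ordered = sorted(records, key=lambda r: r["enrollment_date"])
    moves = 0
    for previous, current in zip(ordered, ordered[1:]):
        days_out = (current["enrollment_date"] - previous["exit_date"]).days
        if days_out > gap:
            # Long gap: assume an unobserved school in between.
            moves += 2
        elif current["school_code"] != previous["school_code"]:
            # Short gap, different school: a single move.
            moves += 1
        # Short gap, same school: no move.
    return moves


simple = [
    {"school_code": 10101, "enrollment_date": date(2012, 9, 1), "exit_date": date(2012, 11, 15)},
    {"school_code": 10103, "enrollment_date": date(2012, 11, 16), "exit_date": date(2013, 6, 15)},
]
complicated = [
    {"school_code": 10101, "enrollment_date": date(2012, 10, 15), "exit_date": date(2012, 11, 15)},
    {"school_code": 10103, "enrollment_date": date(2013, 1, 3), "exit_date": date(2013, 3, 13)},
    {"school_code": 10103, "enrollment_date": date(2013, 3, 20), "exit_date": date(2013, 5, 13)},
]

print(middle_moves(simple, gap=14))       # 1
print(middle_moves(complicated, gap=14))  # 2
```

The simple record yields the single move described earlier. In the complicated record, the 49-day gap between November 15 and January 3 counts as two moves, while the one-week re-enrollment at school 10103 counts as none.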

Whereas calculating student mobility may have seemed a simple matter of counting the number of records in the enrollment file, clearly there is a level of complexity this would fail to capture.

Check back in a few days to see my next post, where I will share my initial implementation of these business rules and how I achieved a 10x speedup with a massive code refactor.


  1. My ignorance was intentional. It is good to stretch those brain muscles that think through sticky problems like developing business rules for a key statistic. I can’t be sure that I have developed the most considered and complete set of rules for mobility, which is why I’m now soliciting others’ views, but I am hopeful my solution is at least as good. ↩︎

  2. I think showing my first two implementations of these business rules is an excellent opportunity to review several key design considerations when programming in R. From version 1 to version 2 I achieved a 10x speedup due to a complete refactor that avoided for loops, used data.table, and included some clever use of recursion. ↩︎

August 14, 2013

This post originally appeared on my old blog on January 2, 2013 but did not make the transition to this site due to error. I decided to repost it with a new date after recovering it from a cached version on the web.

Rhode Island passed sweeping pension reform last fall, angering the major labor unions and progressives throughout the state. These reforms have significantly decreased both the short and long-run costs to the state, while decreasing the benefits of both current and future retirees.

One of the most controversial measures in the pension reform package was suspending annual raises 1 for current retirees. I have noticed two main critiques of this element. The first criticism was that ending this practice constitutes a decrease in benefits to existing retirees who did not consent to these changes, constituting a breach of contract and assault on property rights. This critique is outside of the scope of this post. What I would like to address is the second criticism, that annual raises are critical to retirement security due to inflation, especially for the most vulnerable pensioners who earn near-poverty level wages from their pensions.

While I am broadly supportive of the changes made to the pension system in Rhode Island, I also believe that it is important to recognize the differential impact suspending annual raises has on a retired statehouse janitor who currently earns $22,000 a year from their pension and a former state department director earning $70,000 a year from their pension. Protecting the income of those most vulnerable to inflation is a worthy goal 2.

I have a simple recommendation that I think can have a substantial, meaningful impact on the most vulnerable retirees at substantially less cost than annual raises. This recommendation will be attractive to liberals and conservatives, as well as the “business elite” that have long called for increasing Rhode Island’s competitiveness with neighboring states. It is time that Rhode Island leaves the company of just three other states (Minnesota, Nebraska, and Vermont) that have no tax exemptions for retirement income 3. Rhode Island should exempt all income from pensions and Social Security up to 200% of the federal poverty level from state income taxes. This would go a long way to ensuring retirement security for those who are the most in need. It would also bring greater parity between our tax code and popular retirement destination states, potentially decreasing the impulse to move to New Hampshire, North Carolina, and Florida.

It’s a progressive win. It’s a decrease in taxes that conservatives should like. It shouldn’t have a serious impact on revenues, especially if it goes a long way toward quelling the union and progressive rancor about the recent reforms. And it’s far from unprecedented: some form of retirement income tax exemption exists in virtually every other state.

We should not be proud of taking away our most vulnerable pensioners' annual raises, even if it was necessary. Instead of ignoring the clear impact of this provision, my hope for 2013 is that we address it, while keeping an overall pretty good change to Rhode Island’s state retirement system.


  1. Not a cost-of-living adjustment, or COLA, as some call them. ↩︎

  2. Interestingly, increases in food prices have largely slowed and the main driver of inflation is healthcare costs. I wonder to what extent Medicare/Medicaid and Obamacare shield retirees from rising healthcare costs. ↩︎

  3. http://www.ncsl.org/documents/fiscal/TaxonPensions2011.pdf ↩︎

July 28, 2013

One of the most interesting discussions I had in class during graduate school was about how to interpret the body of evidence that existed about Teach for America. At the time, Kane, Rockoff and Staiger (KRS) had just published “What does certification tell us about teacher effectiveness? Evidence from New York City” in Economics of Education Review. KRS produced value-added estimates for teachers and analyzed whether their initial certification described any variance in teacher effectiveness at raising student achievement scores. The results were, at least to me, astonishing. All else being equal, there was little difference if teachers were uncertified, traditionally certified, a NYC teaching fellow, or a TFA corps member.

Most people viewed these results as a positive finding for TFA. With minimal training, TFA teachers were able to compete with teachers hired by other means. Is this not a vindication that the selection process minimally ensures an equal quality workforce?

I will not be discussing the finer points of

[points out: scholasticadministrator.typepad.com/thisweeki…

June 19, 2013

The Economic Policy Institute has released a short issue brief on the Rhode Island Retirement Security Act (RIRSA) by Robert Hiltonsmith that manages to get all of the details right but the big picture entirely wrong.

The EPI Issue Brief details the differences between the retirement system for state workers before and after the passage of RIRSA as accurately and clearly as I have ever seen. Mr. Hiltonsmith has done a notable job explaining the differences between the new system and the old system.

The brief, unfortunately, fails by engaging in two common fallacies to support its broader conclusions. The first is the straw man fallacy. Mr. Hiltonsmith takes a limited set of the objectives of the entire RIRSA legislation and says defined contribution plans do not meet those objectives. That is true, but ignores the other objectives it does accomplish, which were also part of the motivation behind RIRSA. The second is circular reasoning. In this case, Mr. Hiltonsmith states that the reason for a low funding ratio is that the state did not put 100% of its paper liability into the pension fund. This is a tautology, is not in dispute, and should not be trumpeted as a conclusion of analysis.

Here are the three main points that he believes make RIRSA a bad policy:

  1. The defined contribution plan does not save the state money from its annual pension contributions.
  2. The defined contribution plan is likely to earn lower returns and therefore result in lower benefits for retirees.
  3. The defined contribution plan does not solve the low funding ratio of the pension plan, which exists because lawmakers did not make required contributions.

Of course, the defined contribution portion of RIRSA was not in place to do any of these three things. The purpose of including a defined contribution plan in the new state pension system is to create stability in annual budget allocations and avoid locking the government into promises it has demonstrated it fails to keep. Defined benefit plans require the state to change pension contributions when there are market fluctuations and lead to anti-cyclical costs, where the state is forced to put substantially more resources into pensions when revenues are lowest and spending on social welfare is most important. The defined contribution plan keeps the payments required by the state consistent and highly predictable. This is far preferable from a budget perspective.

It is unfortunate that there are lower returns to defined contribution plans which may lead to a decrease in overall benefits. It is my opinion that the unions in Rhode Island should be pushing for a substantially better match on the defined contribution portion of their plan that more closely resembles private sector match rates. This could more than alleviate the difference in benefits while maintaining the predictability, for budgeting purposes, of the defined contribution plan. I doubt this policy would have much hope of passing while Rhode Island slowly crawls out of a deep recession, but it is certainly a reasonable matter for future legislatures.

There are only two ways to decrease the current pension fund shortfalls: increase payments to the fund or decrease benefits. There is no structural magic sauce to get around this. Structural changes in the pension system are aimed at reducing the likelihood that the state will reproduce its current situation, with liabilities well outstripping funds. It is true that the “savings” largely came from cutting benefits. I have not heard anyone claim otherwise. The only alternative was to put a big lump sum into the pension fund. That clearly was not a part of RIRSA.

It is absurd to judge RIRSA on the ability of defined contribution plans to achieve policy objectives that are unrelated to the purpose of this structural change.

Perhaps the most troubling conclusion of this brief was that,

The shortfall in Rhode Island’s pension plan for public employees is largely due not to overly generous benefits, but to the failure of state and local government employers to pay their required share of pensions' cost.

I read that and expected to see evidence of skipped payments or a discussion of overly ambitious expectations for investment returns, etc. Instead, it seems that this conclusion is based simply on the fact that the benefits in Rhode Island were not deemed outrageously large, and therefore Rhode Island should just pay the liability hole. The “failure” here is predicated entirely on the idea that the pensions as offered should be met, period, whatever the cost to the government. This is the “required share”. Which, of course, is technically true without a change in the law, but feels disingenuous. It is essentially a wholesale agreement with the union interpretation of the state pension system as an immutable contract. The courts will likely resolve whether or not this is true. My objection is that Mr. Hiltonsmith makes a definitive statement on this rationale without describing it. In such a lucid description of how the retirement system has changed, it seems this could only be an intentional omission intended to support a predetermined conclusion rather than illuminate the unconvinced.

Mr. Hiltonsmith also claims that, “Over the long term, RIRSA may cost the state upwards of $15 million a year in additional contributions while providing a smaller benefit for the average full-career worker.” I am not 100% certain, but based on his use of the normal cost 1 to do these calculations, it appears this conclusion is drawn only based on the marginal contributions to current employees. In other words, if we completely ignore the existing liability, the new plan costs the state more money marginally while potentially decreasing benefits for employees. It is my opinion that Mr. Hiltonsmith is intentionally creating the perception that RIRSA costs more than the current plan while providing fewer benefits. Again, this is true for future liabilities, but ignores that RIRSA also dramatically decreased the unfunded liabilities through cutting existing retiree benefits. So the overall cost for the act is far less, while the marginal cost was increased with the objective of decreasing the instability in government appropriations.

We can have a serious debate about whether there is value in the stated goals of a defined contribution plan. In my view, the purpose of switching to this structure is about:

  1. Portability of plans for more mobile workers, potentially serving to attract younger and more highly skilled employees.
  2. Stability in government expenditures on retiree benefits from year to year that are less susceptible to market forces. This includes avoiding the temptation to reduce payments when there are strong market returns as well as the crushing difficulty of increasing payments when the market (and almost certainly government receipts) are down.
  3. Insulating workers from a government that perpetually writes checks it cannot cash, as was the case with the current system.

This paper does not address any of these objectives or others I might have forgotten. In essence, the brief looks at only one subset of the perceived costs of this structural change, but it is far from a comprehensive analysis of the potential universe of both costs and benefits. In fact, it fails to even address the most commonly cited benefits. That is why I view it as heavily biased and flawed, even if I might draw similar conclusions from a more thorough analysis.


  1. Definition: Active participants earn new benefits each year. Actuaries call that the normal cost. The normal cost is always reflected in the cash and accounting cost of the plan. Source In other words, the normal cost only looks at the new benefits added to the liability, not the existing liability. ↩︎

June 9, 2013
May 29, 2013

One thing I really dislike about Google Reader is it replaces the links to posts in my RSS feed. My Pinboard account is littered with links that start with http://feedproxy.google.com. I am quite concerned that with the demise of Google Reader on July 1, 2013, these redirects will no longer work.

It’s not just Google that obscures the actual address of links on the internet. The popularity of link shortening services, used both to save characters on Twitter and to collect analytics, has proliferated the Internet of Redirects.

Worse still, after I am done cutting through redirects, I often find that the ultimate link includes all kinds of extraneous attributes, most especially a barrage of utm_* campaign tracking.

Now, I understand why all of this is happening and the importance of the services and analytics this link cruft provides. I am quite happy to click on shortened links, move through all the redirects, and let sites know just how I found them. But quite often, like when using a bookmarking service or writing a blog post, I just want the simple, plain text URL that gets me directly to the permanent home of the content.

One part of my workflow to deal with link cruft is a TextExpander snippet I call cleanURL. It triggers a simple Python script that grabs the URL in my clipboard, traces through the redirects to the final destination, then strips links of campaign tracking attributes, and ultimately pastes a new URL that is much “cleaner”.

Below I have provided the script. I hope it is useful to some other folks, and I would love some recommendations for additional “cleaning” that could be performed.

My next task is expanding this script to work with Pinboard so that I can clean up all my links before the end of the month when Google Reader goes belly up.

:::python
#!/usr/bin/env python3
import re
from subprocess import check_output

import requests

# Grab the URL from the macOS clipboard; pbpaste returns bytes
url = check_output(['pbpaste']).decode('utf-8').strip()

# Go through the redirects to get the destination URL
r = requests.get(url)

# Look for the first utm attribute
match = re.search(r'[?&#]utm_', r.url)

# Because I'm not smart and trigger this with
# already clean URLs
if match:
    # Keep everything before the first utm attribute
    cleanURL = r.url[:match.start()]
else:
    cleanURL = r.url

print(cleanURL)

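One possible piece of additional "cleaning", sketched here as a suggestion rather than as part of the script above: the script drops everything from the first utm_ attribute onward, which can also discard legitimate query parameters that come after it. The hypothetical `strip_utm` helper below uses only the standard library's `urllib.parse` to remove just the utm_* parameters and keep the rest. The example.com URLs are placeholders.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_utm(url):
    """Remove utm_* query parameters (and a utm_* fragment), keeping the rest."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_")
    ]
    # Some feeds put the tracking in the fragment instead of the query string.
    fragment = "" if parts.fragment.lower().startswith("utm_") else parts.fragment
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), fragment)
    )


print(strip_utm("http://example.com/post?id=7&utm_source=feed&utm_medium=rss&page=2"))
# http://example.com/post?id=7&page=2
```

One caveat: `parse_qsl` followed by `urlencode` re-encodes the remaining parameters, so unusual escaping in the original query string may come back in a slightly different but equivalent form.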
January 27, 2013

Murder mystery weekends are fun. I was not the killer, but I did have quite the violent past.

1940s. All’s Fair in Art and Money.

The name is Victor Lamarr. Art Critic and sommelier.

December 8, 2012
December 7, 2012

This is how you streetcar #sanjosé

November 22, 2012
November 7, 2012

Snow time. Also means I’m outta here. #officeview

November 1, 2012

I like this piece in Slate on Paul Cuffee Middle School, a charter school right here in Providence. Most of what I know about child development seems to suggest that middle schools are sort of ridiculous. At the moment children are looking for role models and close relationships with adults (and not just the kids around them), we decide that kids should have many teachers, teachers should have higher loads, and the kids stay consistent while the adults change constantly.

In many ways, the elementary school model works better for middle school students and vice versa.

Anyway, some research showing K-8 schools have a built-in advantage against the traditional middle school:

The Middle School Plunge

Stuck in the Middle 1


  1. A more “popular” version on Education Next here ↩︎

October 7, 2012

I drive for 10+ hrs over 48, go to a wedding, and get a 630 fire alarm wake up and GRACIE is sleepy?