Some important work I need to do involves uninterrupted thinking. Not desk work like coding, but creative and strategic work. I’m having a hard time revving down on evenings & weekends bc that’s when my brain has space. Is an “unplugged” day or 1/2 day each week a crazy idea?
A great blog post about the twin awful experiences of dating and job hunting.
“You think a newborn knows what it all means? It just happens and then you go about the mean business of being alive. Awareness comes later, if it comes at all.”
Book 8 of the year was remarkable.
My frequent flaunting of my (sometimes) crippling imposter syndrome is not a cry for reassurance. It’s a hope/acknowledgment that maybe someone out there looks up to me and will take solace in the fact that I too suffer from feelings of inadequacy and overwhelm.
The only thing that ever really calms my imposter syndrome is recognizing just how much I’ve grown. I hate reading past me, prose or code. I don’t have regrets, but I sure know I do things better today than I did before. I am so much happier with who I am today than who I was.
It’s as though toxic masculinity was collected and coalesced into one living man.
I think the internet stopped being fun for me when I was 18 in 2005.
Our family signed up for America Online (and WOW by CompuServe, and MSN, and various other ISPs that gave away hours) starting from about 1996 when I was 9. Putting aside chatrooms and the emergence of messaging services, what I remember most about the internet from my time in middle school through high school were pseudonyms, personal websites that we would now call “blogs” (and their further development with things like LiveJournal), and fan sites.
What was so attractive about the internet as a pre-teen and then teenager was that it was somewhere you could connect with other people in a deeply personal and vulnerable way. You could meet someone with the same interest you thought was obscure. You could share ideas that seemed bizarre, or even radical, and find out that someone else felt the same way, or didn’t, and you learned from that conversation. You could try on personalities and traits that were unlike your own. And because the internet could be anonymous or pseudonymous, and because sites and services and data disappeared, you could do these things without repercussion.
As the world caught on to the internet, there were more and more incentives and requirements to move toward using your “real ID” online. First, and often, as virtue signaling about the seriousness with which you held beliefs on forums and in chatrooms and on blogs. Second, as a means to ensure that you and only you defined what would be found when increasingly easy and common searches for your name were conducted. And finally, as a strong requirement of the internet services and applications we used, which wanted your real identity because without it you and your data hold little value to them.
I greeted a lot of this with open arms. I remember, when I was 18, changing all of my online pseudonyms over to my real name. Because I grew up, and the internet grew up. Rather than being liberating, anonymity and pseudonymity and acting without repercussion had morphed from enabling profound vulnerability to enabling profound harm. It was time for the internet and the real world to come together.
But I miss those early days. It was important to my development as a person to experiment with identity and ideas and to be vulnerable “in public” with other ideas and identities on the web. It was healthy. But it would take a monster amount of work to access the web like that today, and even then, with the internet operating as the largest surveillance apparatus ever constructed, I don’t think I could ever have that naive trust required to be so deeply vulnerable again.
I have to admit, I’m pretty torn on whether I should move json.blog and my other domains to https. Scripting News: HTTP still under attack
I remember when Sunlit came out for ADN; I remember liking it, but very little else. Watching this video showing a peek at Sunlit 2.0 has me really excited.
Ideation
At the start of every project, there’s a blinking cursor.
Actually, that’s almost never true for me. If I start staring at a blinking cursor, I’m almost guaranteed to keep staring at it, often for hours. The real work almost always starts weeks or months before I actually type anything. I think it’s easy for folks whose ultimate product is a bunch of code or an analysis report to undervalue how creative our work is. Writing a package or doing data analysis is still fundamentally creative work. We’re in the business of using computers to generate evidence to support insights into how things work. If all there was to it was a procedural search through models, then this would all have been automated already.
When I think, “How do I wish I could write my code to solve this problem?” I know that I am getting a great idea for a package. Often, I’m staring at a function I just wrote to make my work easier and start to think, “This is still too specific to my work.” I can start to see the steps of generalizing my solution a little bit further. Then I start to see how further generalization of this function will require supporting scaffolding and intermediate steps that are valuable in their own right. I start to think through what other problems exist in data sets unlike my own, or in future data I expect to work with. And I ask myself again and again, “How do I wish I could write my code to solve this problem?”
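To make that concrete, here’s a minimal sketch of what that generalization step can look like (the file, columns, and function names are invented for illustration, not from a real project): a helper that hard-codes everything becomes a function whose inputs are arguments.

```r
library(dplyr)
library(readr)

# Too specific: the file, the grouping column, and the measure are all hard-coded.
summarise_attendance_2017 <- function() {
  read_csv("data/attendance_2017.csv") %>%
    group_by(school_id) %>%
    summarise(mean_rate = mean(attendance_rate, na.rm = TRUE))
}

# One step more general: the same logic, but the source and columns are arguments.
summarise_measure <- function(path, group_col, measure_col) {
  read_csv(path) %>%
    group_by(.data[[group_col]]) %>%
    summarise(mean_value = mean(.data[[measure_col]], na.rm = TRUE))
}

# summarise_measure("data/attendance_2018.csv", "school_id", "attendance_rate")
```

From there, the question is just how far that generalization is worth pushing.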
Data analysis almost always starts with an existing hypothesis of interest. My guiding thoughts are “What do I need to know to understand this data? What kind of evidence would convince me?” Sometimes the first thoughts are how I would model the data, but most of the time I begin to picture 2-3 data visualizations that would present the main results of my work. Nothing I produce is meant to convince an academic audience or even other data professionals of my results. Everything I make is about delivering value back to the folks who generate the data I use in the first place. I am trying to use data on their current work to inform their future work. So my hypotheses are “What decisions are they making with this data? What decisions are they making without this data that should be informed by it? How can I analyze and present results to influence and improve both of these processes?” The answer to that is rarely a table of model specifications. But even if your audience is one of peer technical experts, I think it’s valuable to start with what someone should learn from your analysis and how you can present that most clearly and convincingly to that audience.
Don’t rush this process. If you don’t know where you’re heading, it’s hard to do a good job getting there. That doesn’t mean that once I do start writing code, I always know exactly what I am going to do. But I find it far easier to design the right data product if I have a few guiding light ideas of what I want to accomplish from the start.
Design
The next step is not writing code, but it may still happen in your code editor of choice. Once I have some concept of where I am headed, I start to write out my ideas for the project in a README.md in a new RStudio project. Now is the time to describe who your work is for and how you expect them to interact with that work. Similar to a “project charter”, your README should talk about what the goals are for the project, what form the project will take (a package? an Rmd -> pdf report? a website? a model to be deployed into production for use in this part of the application?), and who the audience is for the end product. If you’re working with collaborators, this is a great way to level-set and build common understanding. If you’re not working with collaborators, this is a great way to articulate the scope of the project and hold yourself accountable to that scope. It’s also helpful for communicating to managers, mentors, and others who may eventually interact with your work even if they will be less involved at the inception.
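As a rough illustration (the project and section headings here are invented, not a template I’m prescribing), that first README can be nothing more than a few short sections:

```markdown
# District Attendance Analysis

## Purpose
Help district leaders spot schools with declining attendance early enough to intervene.

## Audience and deliverable
A short HTML report (R Markdown) refreshed each marking period, written for
non-technical district staff.

## Scope
- In scope: attendance by school and grade, 2015-2018.
- Out of scope: student-level modeling, other districts.

## Data
Daily attendance extracts from the student information system, organized as
described in the data model below.
```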
For a package, I would write out the primary functions you expect someone to interact with and how those functions interact with each other. Use your first README to specify that this package will have functions to get data from a source, process that data into an easier-to-use format, validate that data prior to analysis, and produce common descriptive statistics and visuals that you’d want to review before using that data set for something more complex. That’s just an example, but now you have the skeletons for your first functions: fetch, transform, validate, and describe. Maybe each of those functions will need multiple variants. Maybe validate will get folded into a step within fetch. You’re not guaranteed to get this stuff right from the start, but you’re far more likely to design a clear, clean API made of composable functions that each help with one part of the process if you think this through before writing your first function. Like I said earlier, I often think of writing a package when I look at one of my existing functions and realize I can generalize it further. Who among us hasn’t written a monster function that does all of the work of fetch, transform, validate, and describe all at once?
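As a sketch of what those skeletons might look like (the bodies and checks are placeholders I’ve made up, not a real package), each function does one job and returns something the next one can take, which is what makes the API composable:

```r
library(dplyr)

# Pull raw data from a source; here a CSV path, but it could be an API or database.
fetch <- function(path) {
  readr::read_csv(path)
}

# Reshape the raw pull into a tidy, analysis-ready format.
transform <- function(raw) {
  raw %>%
    rename_with(tolower) %>%
    mutate(across(where(is.character), trimws))
}

# Check assumptions before analysis; fail loudly if they don't hold.
validate <- function(tidy) {
  stopifnot(nrow(tidy) > 0, !any(duplicated(tidy)))
  tidy
}

# Descriptive summaries worth looking at before doing anything more complex.
describe <- function(tidy) {
  tidy %>%
    summarise(across(where(is.numeric),
                     list(mean = ~mean(.x, na.rm = TRUE),
                          sd   = ~sd(.x, na.rm = TRUE))))
}

# Because each step returns a data frame, the pieces compose:
# "data/raw.csv" %>% fetch() %>% transform() %>% validate() %>% describe()
```

(One small note: a function named transform() will mask base::transform(), which is worth knowing when sketching interactively.)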
Design Your Data
I always write up a data model at the start of a new project. What are the main data entities I’ll be working with? What properties do I expect they will have? How do I expect them to relate to one another? Even when writing a package, I want to think about “What are the ideal inputs and outputs for this function?”
Importantly, when what I have in mind is a visualization, I actually fake data and get things working in ggplot or highcharter, depending on what the final product will be. Why? I want to make sure the visual is compelling with a fairly realistic set of data. I also want to know how to organize my data to make that visualization easy to achieve. It helps me to define the output of my other work far more clearly.
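A minimal sketch of that step, with made-up data and variable names: simulate something plausible, build the plot you’re imagining, and let the plot tell you what shape the real data needs to take.

```r
library(ggplot2)
library(dplyr)

set.seed(2018)

# Fake but plausible data: one row per school per year.
fake_scores <- tibble(
  school = rep(paste("School", LETTERS[1:5]), each = 4),
  year   = rep(2015:2018, times = 5),
  score  = rnorm(20, mean = 75, sd = 8)
)

# The visual I have in mind. If this works, I know the real data needs to
# arrive as one row per school-year with a numeric score column.
ggplot(fake_scores, aes(x = year, y = score, color = school)) +
  geom_line() +
  geom_point() +
  labs(x = "Year", y = "Mean score", color = NULL)
```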
In many cases, I want to store my data in a database, so I want to start with a simple design of the tables I expect to have, along with validity and referential constraints I want to apply. If I understand what data I will have, how it is related, what are valid values, and how and where I expect the data set to expand, I find it far easier to write useful functions and reproducible work. I think this is perhaps the most unique thing I do and it comes from spending a lot of time thinking about data architectures in general. If I’m analyzing school district data, I want to understand what district level properties and measures I’ll have, what school properties and measures I’ll have, what student property and measures I’ll have, what teacher properties and measures I’ll have, etc. Even if the analysis is coming from or will ultimately produce a single, flattened out, big rectangle of data, I crave normality.
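Here’s a small sketch of what writing that design down can look like (the schema is invented, and I’m using an in-memory SQLite connection through DBI so it runs anywhere; the same DDL carries over to PostgreSQL with minor changes). The point is that keys, valid values, and relationships get stated before any analysis code exists:

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

dbExecute(con, "
  CREATE TABLE school (
    school_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    district  TEXT NOT NULL
  )
")

dbExecute(con, "
  CREATE TABLE enrollment (
    school_id INTEGER NOT NULL REFERENCES school (school_id),
    year      INTEGER NOT NULL CHECK (year BETWEEN 2000 AND 2030),
    students  INTEGER NOT NULL CHECK (students >= 0),
    PRIMARY KEY (school_id, year)
  )
")

# Note: SQLite only enforces REFERENCES with PRAGMA foreign_keys = ON;
# PostgreSQL enforces foreign keys by default.

dbDisconnect(con)
```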
Make Files
So now my README defines a purpose, talks about how I expect someone to interact with my code or what outputs they should expect from the analysis, and has a description of the data to be used and how it’s organized. Only then do I start to write .R files in my R/ directory. Even then I’m probably not writing code but instead pseudocode outlines of how I want things to work, or fake example data to be used later. I’m not much of a test-driven development person, but the first code I write looks a lot like test data and basic functions that are meeting some test assertions. Here’s some small bit of data: can I pass it into this function and get what I want out? What if I create this failure state? What if I can’t assume the columns are right?
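That first code often looks something like this minimal sketch (testthat is what I’d reach for here; the toy data and expectations are invented): a tiny hand-written input where I already know the right answer, and assertions about what should come out.

```r
library(testthat)
library(dplyr)

# A tiny, hand-written data set where the correct answers are obvious.
toy <- tibble(
  school = c("A", "A", "B"),
  score  = c(80, 90, NA)
)

test_that("group summaries handle missing values", {
  out <- toy %>%
    group_by(school) %>%
    summarise(mean_score = mean(score, na.rm = TRUE))

  expect_equal(nrow(out), 2)
  expect_equal(out$mean_score[out$school == "A"], 85)
  # School B has only a missing score, so the summary is NaN, not an error.
  expect_true(is.nan(out$mean_score[out$school == "B"]))
})
```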
Writing code is far more fun when I know where I am heading. So that’s how I start my work.
sqlite is totally the right tool for this job, but I ❤️ PostgreSQL too much to stop myself.
I have only read two of the Nebula nominees for best novel, which means I can add 5 more books to my “want to read” shelf on Goodreads! 📚
Altered Carbon was good, but probably about 100-150 pages too long. 📚
⭐️⭐️⭐️⭐️, book 7 of the year 💪🏻
Reunited.
My typical Saturday position. Just missing the Kindle.
Tell me about the temporal_tables PostgreSQL extension. Anyone use it? Any gotchas (other than no RDS support)?
It’s been 9 long years that I haven’t had my amp and pedal board in my home. Here’s hoping I play more this next 9 years. (And get my other guitars here!)
Get you a weather app that understands your feels.
What annoys the hell out of me about Brian Chen’s HomePod review is I remember how annoyed I was that it took 2 weeks to get Spotify’s Discover Weekly playlist. Apple Music was a way better day one experience (but Spotify gets so good over time).
Just booked a trip to Taipei and then Hong Kong. Never been anywhere near that part of the world. Should be an exciting adventure.
Finished my 6th book of the year and it feels so damn good.
Interesting to consider different protections for commercial versus residential real estate. Outside of CA, it should spur needed housing growth. If I lived in CA, I’d want this tied to state reform of local restrictions on building housing, but that would sink the effort.
Solid work view for this Baltimore resident.
Seeing that dummy sitting in a Tesla Roadster in outer space is somehow way cooler than I could have possibly imagined. 🚀