No more syllabi: how do questions define work tasks?

Once we recycle our school notebooks and leave the classroom behind, there are no more rubrics. There is rarely a syllabus or a well-written assignment. Instead, we are met with vague requests and requirements that grow over time. This is often called scope creep. But sometimes it isn’t scope creep. Sometimes there are questions that no one thought to ask until they saw the next set of data. Or perhaps there are opportunities to grow an analysis in a different direction because new information or a new strategy is available. As a younger data scientist or analyst, you can develop many skills to help you navigate this amorphous quagmire of ongoing project requirements that arrive as questions.

Let’s put ourselves into a Demo meeting. And let’s say you carefully built out an algorithm for section A of your company’s portfolio. It’s tuned and measured against a hold-out set in section A. You gathered the data carefully and managed several exceptions and ‘gotchas’. You put together a nice notebook and successfully walked your leader and peers through the content. Now people are starting to ask questions. These mysterious questions are offering you insight into what you should add to your metaphorical homework assignment page. Somehow you need to translate all these questions into tasks that you can tackle and report back on. Let’s walk through a few of the most common questions and what you should take from those questions.

First, if the questions are about the specific details of your project, then you need to address them and know the answers. These often sound like:

  • “Did you filter out for exception Y?”
  • “Which table did you pull this data from?”
  • “Does this include data A or data B as the training set?”
  • “This number looks high, I’m worried there is a bug somewhere.”

For this kind of question, know that you are probably going to learn something OR your team is going to learn something about you. Maybe you made a mistake because no one told you that data A is missing element Y during that time period. This is nothing personal. There’s a lot to remember, and not everyone will remember all the gotchas in the data. Just say something like, “Great, I will make a note of that and fix it. Thanks for sharing that information, I hadn’t learned that yet.” On the other hand, maybe your work is great and you can answer every question with the desired answer. In that case, your team learned something about you and your coding. Chances are good that next time you’ll get fewer questions, because your team is beginning to trust your work.

One way or another, you need to know the answers to all of these technical questions and be able to speak intelligently about the work that you did.

Second, there are questions which can (eventually) be expected. In industrial data science, the most common question is “Does this new solution solve a valuable problem?” This question can come in many formats:

  1. What’s the value proposition of this model?
  2. How much of the problem is this algorithm going to identify?
  3. Is the model output precisely aligned with our goals?

Anticipating these questions is an important step in the first few years of industrial work. Which version of this question does your leader like to ask? It’s very likely that your technical leader will ask a different version of the question than your non-technical leader. I encourage you to take your metaphorical pencil and add “value estimation” as a task within the project. Most leaders will not explicitly ask you to do this analysis (because, of course, they already believe that the thing you are doing is worthwhile or they would not have asked you to do it!). But the analysis of the size of the prize is critical to understanding how good your solution is. If you don’t know how big the problem is, then you’ll have no idea when you are done. And you won’t know what to write down on your annual self-review!

Eventually, you’ll want to prepare in advance to be able to include the answer to the value question within your presentation.
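If it helps to see what a “value estimation” task can look like, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function name estimate_value, its inputs, and the numbers in the example are hypothetical and do not come from a real project. It supposes the model flags problem cases, that each caught case avoids a known cost, and that every flag costs something to review. Your own size-of-prize math will likely need different terms.

```python
def estimate_value(total_cases, cost_per_case, recall, precision, review_cost_per_flag):
    """Rough value estimate for a model that flags problem cases.

    All inputs are illustrative assumptions:
      total_cases          -- how many problem cases exist in the period
      cost_per_case        -- cost avoided each time a case is caught
      recall               -- share of true cases the model identifies (0..1)
      precision            -- share of flags that are true cases (0..1, nonzero)
      review_cost_per_flag -- cost of a person reviewing one flag
    """
    if not 0 < precision <= 1:
        raise ValueError("precision must be in (0, 1]")
    if not 0 <= recall <= 1:
        raise ValueError("recall must be in [0, 1]")

    # Size of the prize: what the whole problem costs if nothing is done.
    size_of_prize = total_cases * cost_per_case

    # How much of the problem the model identifies.
    cases_caught = total_cases * recall

    # Flags raised, including false alarms, implied by the precision.
    flags_raised = cases_caught / precision

    gross_value = cases_caught * cost_per_case
    review_cost = flags_raised * review_cost_per_flag

    return {
        "size_of_prize": size_of_prize,
        "gross_value": gross_value,
        "review_cost": review_cost,
        "net_value": gross_value - review_cost,
        "share_of_prize": gross_value / size_of_prize if size_of_prize else 0.0,
    }


# Hypothetical numbers for "section A": 1,000 cases at $500 each.
estimate = estimate_value(
    total_cases=1000,
    cost_per_case=500.0,
    recall=0.6,
    precision=0.75,
    review_cost_per_flag=20.0,
)
for name, value in estimate.items():
    print(f"{name}: {value:,.2f}")
```

The share_of_prize figure is one way to answer “How much of the problem is this algorithm going to identify?”, and net_value is one way to phrase the value proposition. Which of the two belongs in your presentation depends on which version of the question your leader tends to ask.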

Lastly, there are questions which cannot be expected. These are often from your leader who, presumably, has more context than you. Now, let’s go back to that Demo meeting where you talked through your work on training a model for section A. Let’s pretend that the first question your leader asks is, “How well would this algorithm perform on section B?” This question feels off topic and totally random! How DARE my leader not comment on anything that I actually did and only ask me about something that specifically is not in scope! It feels so rude!

A leader will often ask a sideways question when your work is already solid. The “tangential” question actually says several wonderful things about your work. Chiefly, it says that your leader isn’t concerned about the details of your solution. It means you proved yourself competent and your methods sound. I guarantee that if your work were suspect, then your leader would be quizzing you about your methodology (see above). But, unlike school, where your leader (I mean teacher!) needs to prove that you know what you are doing in order to grade you, your leader will eventually trust that your work is sound. They don’t need to see all the details to believe that the project was done appropriately. This frees them up to think about additional applications, or about that conversation with the section B leader who was desperately concerned about the same thing the section A leader is, but who didn’t bother talking to your leader until yesterday.

So, take heart: getting no questions about the work you actually did is a good thing in this situation. If your leader skips the first two types of questions, I promise your leader isn’t trying to snub you. In fact, it’s the opposite! They trust you so much that they just forgot to compliment you on your solution before diving into the next idea.


Transitioning from Academia: A Recommendation

I just read a fabulous article put together by Kshiti Mishra and Neha Bora based on a webinar by Dr. Karoline Pershell. I agree with 100% of the things Karoline recommends in this article.

In particular, I feel incredibly strongly about the need to create alignment between your preferred reward system and the reward system of your job. If your job rewards you for doing something that you don’t enjoy doing, then you should find a different position. In my experience, I have moved teams and changed job titles in order to create better alignment between my values and the values of my leaders. Go read it!

Transitioning from Academia to Industry: Tips and tricks

Additionally, while you are here, here is a great LinkedIn post with a video clip from Theo Priestley with similarly tangible advice for career decisions.


Police Transparency

There is surprisingly little published information on police activities, and little data analysis of police actions by outside sources. At the time of this writing, FiveThirtyEight has published only 2 articles in all of 2020 with analysis of police actions. Almost nothing comes up in my Google searches for mathematical analysis of police data. There is no way to ground ourselves in facts if members of the data science community cannot get their hands on data to analyze. In a world that is becoming more and more divided by the political narratives presented in the media, I think we have a duty to validate the stories we tell ourselves.

As a community of data scientists, we learn and create change with our data. Our analysis changes the strategies of our employers. Our commitment to data and facts helps our companies’ leaders make better choices for the financial future of their companies. Should we not use the same skill set to make the communities we live in safer and more equitable?

I live in the Twin Cities, Minnesota. We were the epicenter of police protests in May 2020 after the death of George Floyd. As I was nursing my newborn through this crisis, I was stunned at how little outside analysis there was of police efforts. We had videos and narratives, but very few facts that both sides could agree on.

The Twin Cities area is shockingly segregated. Back in 2015, I published about the Parable of the Polygons, a wonderful analysis that Vi Hart and Nicky Case put together about how small choices can dramatically reduce segregation and increase diversity. It doesn’t surprise me much that our cities, whose housing and community inequalities are so great, are also a home base for other inequalities.

Many cities in the Twin Cities area have police activity transparency commitments. As I researched this, I saw that the transparency is often focused on “Crime Statistics”. Police departments provide summarized views of the types of crime that happen in their cities. I used these summaries to help me decide where to live when I first moved to the Twin Cities. But they do not provide insight through the lens of police activity. We can’t use data to learn about police activities if the data presented to the public is focused only on the crimes investigated.

Minneapolis Dashboard
Saint Paul Crime Statistics
Roseville Transparency Page
Saint Louis Park Crime Maps and Statistics
Brooklyn Park Crime Statistics

These summaries do little to give a data scientist the level of detail they need to complete causality and/or correlation analysis. In order to get true transparency, data scientists need to have access to the data of police activities in the Twin Cities. And then we need to let the data speak to us and share our findings with the public.

