Welcome to the first issue of The Cluster, a newsletter for all things data science!
The Cluster is written by journalists and published by the producers of DATAx, a cross-industry event for business leaders, strategists, and practitioners looking for best practices and strategic insights around data science.
The Evolution of the Chief Data Officer Role Continues
The Chief Data Officer role started as a joke at Yahoo, according to Dr. Usama Fayyad of the Initiative for Analytics and Data Science Standards (IADSS). But shouldn’t the job be taken more seriously now? After all, the first chief data officer was appointed at Capital One way back when WorldCom went bankrupt.
The laughing has stopped but the questions continue. The results of the 8th annual NewVantage Partners Big Data and AI survey landed last week, and although CDOs are in place at 68% of the companies surveyed, the job remains ill-defined. Almost 18% of executives viewed the CDO role as interim or unnecessary. Worse: about 25% of the companies said there was no single point of accountability for their firm’s data, even if they had a CDO. (Data ownership is one of the key tenets of the CDO role, of course.)
The survey had another interesting finding: 23% of organizations (from numerous industries) said they struggle with high turnover in the chief data officer position. That could be a good signal or a bad one. It may reflect burgeoning opportunities for CDOs, but it may also mean the job is so poorly defined, and so lacking in influence at many companies, that appointees don’t stay long.
A related problem is where the data science team belongs in the org chart. As Dan Gremmell, vice president of data science at Plated (which Albertsons has apparently shut down), said at our November DATAx conference:
“If data science is just off in a silo doing data science and nobody wants it, [it’s] not going to be effective.”
So where does data science fit in? Gremmell has some great advice about how your data science team should and should not be structured.
Incidentally, Gremmell is also the head of financial planning and analysis at Plated, which makes him qualified to give advice on how data scientists can boost a company’s financial performance. Three questions CDOs (and their CFO colleagues) should be asking, says Gremmell: How do I use data science to drive revenue? How do I use data science to save money that directly contributes to a product? How do I use data science to cut expenses?
It’s the Decade of …
… ambitious predictions for the pace of advancement of data science and its associated technologies, apparently. Reading the plethora of commentary on what’s ahead for data science this year and the next 10 reminds us that trying to forecast the future based on history, without accounting for the unexpected, the random, and the unpredictable, is not very smart.
One of the most outlandish prognostications we have seen: by 2022, 50% of global tier-one banks will be using quantum computing to review portfolio allocations, algorithmic trading, and pricing strategies. Given how slowly banks adopt new systems and how often society severely overestimates a new technology’s adoption rate, three years seems awfully optimistic.
Many data science prognostications are less bad, like Medium’s “The 4 Hottest Trends for Data Science in 2020” and Forbes’ “6 Predictions About Data in 2020 and The Coming Decade.” Still, when Forbes says “enterprises will accelerate their shift from focusing on managing (collecting, storing, analyzing) internal data to investing the greater part of their IT resources in managing (collecting, storing, analyzing) external data, most of it ‘unstructured,’” I cringe. It will happen, but not yet.
When Data Science Isn’t …
Unfortunately for job seekers, many jobs labeled “data science” are anything but, according to a recent discussion on r/datascience. Such roles can be merely administrative, says Reddit user data_science_is_cool: managing spreadsheets, organizing files, and taking meeting minutes (yikes!) are all tasks some “data scientists” are asked to perform. One commenter with a PhD has a simple tip for sniffing out whether a data science job is authentic: ask the hiring manager (1) what function the role is part of, (2) who leads that function, and (3) who else works in that role or function.
Overreaction to Metrics
Thank goodness someone finally wrote about C-suites overreacting to data points! “Routine variation” is not a complex concept, but it’s often forgotten. If a company’s working capital ratio falls half a percentage point in a quarter, it doesn’t mean it’s time to go looking for root causes or interrogate the data scientist about the meaning of the “trend.” Sometimes numbers just fluctuate within a range.
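The newsletter doesn’t prescribe a method, but one common way to tell routine variation from a real signal is an XmR (individuals and moving range) chart: compute “natural process limits” as the mean plus or minus 2.66 times the average moving range, and only investigate points that land outside them. Below is a minimal sketch under that assumption; the function names and the quarterly ratios are hypothetical, illustrative values, not real company data.

```python
from statistics import mean

def natural_process_limits(values):
    """Return (lower, center, upper) limits for an XmR (individuals) chart.

    Limits are the mean plus or minus 2.66 times the average moving range,
    the conventional scaling constant for individuals charts.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    # Moving range: absolute change between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    spread = 2.66 * mean(moving_ranges)
    return center - spread, center, center + spread

def signals(history, new_points):
    """Return the new points that fall outside limits computed from history."""
    lower, _, upper = natural_process_limits(history)
    return [x for x in new_points if x < lower or x > upper]

# Hypothetical quarterly working capital ratios (illustrative, not real data).
history = [1.52, 1.48, 1.55, 1.50, 1.47, 1.53, 1.49, 1.51]
latest = [1.50, 1.30]

print(natural_process_limits(history))
print(signals(history, latest))  # → [1.3]
```

With these made-up numbers the limits run from roughly 1.39 to 1.62, so a small dip to 1.50 is routine variation, while a drop to 1.30 lands outside the limits and is worth a closer look.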
Model Explanation
It’s a long read, but FiveThirtyEight’s description of how it built its Democratic primary forecasting model is fascinating, even if you (1) don’t follow politics much or (2) are still angry at political forecasters for blowing the 2016 presidential race. One thing Nate Silver and his team make clear this time: the forecasts are probabilistic, and the degree of uncertainty is high.
Reverse Peephole
I laugh when a CEO claims his or her decision making has been all from the gut. In his new book, “Entrepreneurial Leadership,” the chairman of JetBlue, Joel Peterson, writes “data science offers sophisticated tools for decision-making … but apart from Wall Street, [it’s] rarely used in the typical C-suite.” Maybe, but the fruits of data science are definitely starting to change the conversations in boardrooms. Peterson remarks that he hasn’t done a Monte Carlo simulation since business school. Still, he’s not basing his decisions on tea leaves: “I never make an important decision without setting up the problem as a simulation with an array of potential outcomes — and thinking hard on whether I’ve included all potential costs and thoughtfully assessed the probabilities,” he also writes.
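Peterson’s habit of setting up a decision “as a simulation with an array of potential outcomes” is essentially a Monte Carlo simulation. The sketch below is a hypothetical illustration, not his method: it assumes a single go/no-go project with a triangular cost distribution, a 60% chance of success, and normally distributed revenue on success. Every number is invented for the example.

```python
import random
from statistics import mean

def simulate_decision(trials=100_000, seed=42):
    """Monte Carlo sketch of one go/no-go decision (all inputs hypothetical).

    Each trial draws an uncertain cost and an uncertain outcome, then
    records the net payoff. The spread of payoffs, not just the average,
    is what informs the decision.
    """
    rng = random.Random(seed)
    payoffs = []
    for _ in range(trials):
        cost = rng.triangular(8.0, 15.0, 10.0)  # $M: low, high, most likely
        succeeded = rng.random() < 0.6          # assumed 60% chance of success
        # Assumed revenue: ~N(25, 5) $M on success, a flat $2M salvage otherwise.
        revenue = rng.normalvariate(25.0, 5.0) if succeeded else 2.0
        payoffs.append(revenue - cost)
    payoffs.sort()
    return {
        "expected_payoff": mean(payoffs),
        "prob_loss": sum(p < 0 for p in payoffs) / trials,
        "p5": payoffs[int(0.05 * trials)],   # 5th percentile payoff
        "p95": payoffs[int(0.95 * trials)],  # 95th percentile payoff
    }

print(simulate_decision())
```

Under these made-up inputs the average payoff is positive (about $4.8M), yet roughly 40% of trials lose money, the kind of spread a single point estimate hides. Peterson’s caveat still applies: the output is only as good as the costs and probabilities fed into it.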
Setting the Terms
We’re going with Hillary Mason’s definition of machine learning. She says it’s cribbed from Tom Mitchell at Carnegie Mellon: “the study of systems that improve with the introduction of more data.”
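To make the definition concrete, here is a minimal sketch of one system that “improves with the introduction of more data”: a least-squares line fit whose learned slope gets closer to the truth as the training set grows. The true slope of 2.0, the noise level, and the function names are assumptions chosen for illustration; the definition covers far more than this one learner.

```python
import random
from statistics import mean

def fit_slope(xs, ys):
    """Ordinary least-squares slope for the model y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

def average_slope_error(n, trials=200, true_slope=2.0, seed=0):
    """Average absolute error of the learned slope when trained on n points."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        # Synthetic data: y = true_slope * x plus Gaussian noise (sd = 3).
        xs = [rng.uniform(0, 10) for _ in range(n)]
        ys = [true_slope * x + rng.gauss(0, 3) for x in xs]
        errors.append(abs(fit_slope(xs, ys) - true_slope))
    return mean(errors)

for n in (10, 100, 1000):
    print(n, round(average_slope_error(n), 4))
```

The printed error shrinks as n grows, which is the “improves with more data” part of the definition in its simplest form.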
Reference Shelf
A nod to Data Elixir for pointing us to this new data project checklist. If there’s one thing we constantly hear, it’s that data scientists approach projects in a suboptimal way. Or (more commonly) that they’re buried inside a business unit and having a hard time getting started. The checklist from fast.ai may help. However, note that one of the first items on the list is this question: What are the 5 most important strategic issues at your organization today? If your senior management can’t name those (many can’t), the company has bigger problems than choosing which data science projects to tackle.
Money Flow$
Business intelligence startup Sisense is now a unicorn thanks to a new $100 million-plus funding round led by Insight Partners. Sisense’s total outside funding now stands at more than $300 million (counting the capital raised by Periscope Data, which it bought last year). Periscope Data, BTW, is now being pitched as “Sisense for Cloud Data Teams.”
Big Jobs
Mastercard named Raj Seshadri president of its data and services organization. Seshadri joined Mastercard in 2016 from BlackRock, where she was a managing director leading the company’s iShares U.S. retail ETF business. Prior to that, she was global chief marketing officer of iShares. She has a doctorate in physics from Harvard.
Overfit
SAS’s list of the top 10 data visualizations of 2019 was uneven in quality, but we really liked the graphic “How Much Renewable Energy Does Your State Generate?” … Want to feel like an underachiever? A data scientist was named Indiana State Fair Queen. … “Can artificial intelligence be fully responsible for an act of invention?” asks MIT Technology Review. Some patent applicants overseas think so; the European Patent Office, at least for the foreseeable future, sees AI as just a tool. … High-velocity data provides the basis for real-time interaction and often serves as an early-warning system for potential problems and systemic malfunctions, according to Randy Bean of NewVantage Partners. The problem? Only 3% of big data investment goes to data velocity. … Many experts talk about data democratization, but few know how to get there (or will share how). … A company that can guarantee the privacy and security of its customers’ data will have a far easier time convincing them to share more of it, or so says Medium. … BTW, the r/datascience vote on what to call a group of data scientists is over. The winner? A cluster.
Visual Sweets
Google’s contribution to the FaceForensics benchmark, an effort to combat deep-fakery, is kinda spooky. (Look closely.)
Shameless plug
DATAx San Francisco is not far away: June 10 and 11. We already have some cool speakers lined up: Chris Benson, principal artificial intelligence strategist at Lockheed Martin; Jack Hanlon, global head of data at Reddit; and Mario Vinasco, director of BI and Analytics at Credit Sesame. Early reg ends in late February. If you’re like us and enjoy a more intimate conference (about 400 attendees) that offers great content, comfortable networking opportunities, and hands-on product demos, this event is for you.
Follow us: @DATAxEvents
@DATAxMedia Facebook page