Bilingual: What Is Machine Learning? It Depends on Who You Ask

Increasingly, the term “machine learning” is acquiring catch-all status. Or, at the very least, machine learning has become a convenient handle that today’s data scientists use to refer to the wide range of leading-edge techniques for automating knowledge and pattern discovery from fresh data, much of it unstructured. People’s working definitions of machine learning seem to be creeping into broader, vaguer territory.

That’s my impression from reading the recent article “Learning and Teaching Machine Learning: A Personal Journey.” In it, author Joseph R. Barr of San Diego State University and True Bearing Analytics discusses both the history of machine learning and his own education in the topic. He states that “it’s safe to regard machine learning, data mining, predictive analysis, and advanced analytics as more or less synonymous.”

I’m not sure that lumping machine learning with all of these other techniques makes sense. As noted above, machine learning primarily applies to unstructured data, whereas data mining is specific to structured data sets. Also, machine learning, like data mining, is principally concerned with finding diverse patterns in historical data, whereas predictive analysis focuses specifically on finding those predictive patterns that can be tested empirically through the gathering of fresh data in the future. And whereas machine learning, data mining, and predictive analysis are all narrowly scoped, advanced analytics is a broader category that includes them all.

It seems to me that machine learning has one foot in data science and the other in computer science. That’s how I interpret what Barr has to say here: “Machine learning grew out of several not-necessarily disjoint mathematical subjects, notable among these are mathematical statistics, computing and algorithm, information theory, and mathematical optimization…. In those ancient times, machine learning was bundled with AI…. [M]ost topics in machine learning lie in the convex hull of (the theories of) probability, combinatorics, convexity and optimization, statistics, information, and computing. To this list I would add the three extra dimensions: heuristics, empirics, and applications.”

That’s a lot to bite off and chew on! As this discussion makes clear, machine learning has a formidable learning curve, for which years of classroom and laboratory work at the university level may prove essential. And that in fact is the crux of Barr’s article: his own machine learning schooling as a professional data scientist, plus the challenges he now faces in defining the right machine learning curriculum for tomorrow’s data scientists.

The definitional scope creep afflicting the machine learning arena mirrors these challenges. The disparate disciplines under this umbrella will continue to cross-fertilize in innovative ways that will stretch every data scientist’s thinking as well as the terminology they use to define machine learning.

By James Kobielus. Translated by 苏道强 (北理大数据教育). Exclusive to 36大数据; reposting is not permitted.