Learning Machines: Polo Chau Explains Data Visualizations


Welcome to Learning Machines, where we’ll talk with faculty members from the Machine Learning Center at Georgia Tech (ML@GT) about their main research area and the future of their work.

Today we talked with Polo Chau, the associate director for corporate relations at ML@GT, an associate professor in the School of Computational Science and Engineering, and director of industry relations for the Institute for Data Engineering and Science (IDEaS).

Chau’s research group, the Polo Club of Data Science, works at the intersection of data mining and human-computer interaction. The group creates scalable, interactive, and interpretable tools that amplify people’s ability to understand and interact with billion-scale data and machine learning models.

We spoke to Chau about a technique called large graph visualization.

Hi Polo! Thanks for joining us. For those who are not familiar with machine learning and visualization, can you give a brief explanation of them?

Machine learning (ML) is a very general term. When we work on ML, usually that means we’re developing methods (e.g., computer algorithms) that can automatically learn “rules” from data that would help accomplish some tasks.

A common example is handwriting recognition: coming up with an ML method that recognizes words by looking at the pixels (e.g., their placements, how they form strokes). The learned rules could be simple and easily understandable, or complex and hard to decipher (but possibly very powerful).

Technically, visualization refers to methods that create imagery (e.g., charts, animations) to help communicate information or to support data exploration. For ML, visualization is a powerful way to help people more easily understand complex phenomena. It is one of our group’s specialties and research foci, which we call “Interpretable Artificial Intelligence (AI)”; our other focus area is “Secure AI.”

You do a lot of work with visualization and visual analytics. What drew you to this area of research?

It all started when I was a little kid. I still recall drawing human figures on the wall using colored pens (surprisingly, my parents did not stop me!). I found it was a lot of fun to color things, draw whatever I wanted, and sometimes use my drawings to help explain things.

For a long time, I had no idea what visualization and visual analytics were. I just kept “design” as my hobby. I didn’t have formal design training, but I enjoyed every design opportunity I had: I would volunteer to help design book covers, posters for concerts, and websites for research groups. I spent extra time improving the design of the user interfaces for software projects (my undergrad was in engineering), even though the design itself didn’t count for any points. My experience as an engineering student who was interested in design got me to question why interface design wasn’t more valued or studied in software engineering.

What kinds of problems do visualizations and especially interactive visualizations help solve?

Many kinds of problems, indeed. Humans have really powerful visual perception: we can easily detect patterns in imagery that computers have a hard time figuring out (or “seeing”). Patterns and outliers often jump right out at us. Visualizations aim to leverage this powerful perception capability of humans.

Some familiar visualizations include charts for data (e.g., tables, bar charts, line plots) to help summarize and communicate results. It’s much quicker and easier to see a line that’s going up than to read many numbers and assess if they really describe an upward trend. Other examples include scientific visualizations, such as physics simulations or medical imaging.

What are the main challenges or problems that you encounter when working on a visualization project?

Since we work with large datasets and complex models, “visual scalability” is often a problem to solve. The crux is how to effectively prioritize what to show from the data or models so we don’t overwhelm the users, and what to hide by default while still providing the means to reveal it when the users need it.

Tell us about a project of yours that you are particularly proud of.

There are many, actually! My students would complain if I only mention one 🙂 Some recent examples include CNN Explainer and GAN Lab, which aim to explain and help students learn how popular deep learning models work, and the Summit system, which summarizes what ML models have learned. The FairVis system helps users discover bias in ML methods. You can check out more projects on our group website.

What kinds of skills do you think someone needs to be successful in this particular area of research?

Be proactive and eager to learn. These are actually not skills, but rather how we approach research (and possibly life). If a student adopts this mindset, they will grow quickly: if they don’t know something yet, they’ll learn it. I think this mindset is particularly relevant today as technologies advance so rapidly. Things we learn today could become outdated in just a few years. That’s also the reason I say they’re more like “life skills” than technical skills.

Before we go, tell us what you’re currently working on and how our readers can connect with you.

Quite a number of projects. Here are a few: (1) DARPA GARD project on developing defenses for AI; (2) PeopleMap for mapping researchers; (3) RECAST to audit toxicity in language models.

The best way to connect is through email. Chau is also active on Twitter at @PoloChau.

For more information on Chau and his work, visit his website.

 
