Human-Computer Interaction Researcher
Jun Kato is a senior researcher at the National Institute of Advanced Industrial Science and Technology (AIST) in Japan. He also serves as a technical advisor at Arch, a Japanese animation production company. Since April 2024, he has been a Visiting Scientist at Université Paris-Saclay. In his work, he conducts human-computer interaction research, which involves studying how technology can be used to support individuals’ creative pursuits such as programming, video authoring, storyboarding, and animation creation.
STEM to the Sky
Aug 5, 2024
Human-Computer Interaction (HCI) is the primary focus of my research, and it’s essentially about mediating the relationship between humans and computers. It’s important to consider how we can use computers to benefit human beings in a broader sense. HCI involves identifying issues in how people use computing, conducting empirical studies such as interviews, and proposing technical or socio-technical solutions. Researchers build prototype systems, test them with real people, and evaluate how well the “solutions” work. It is a cumulative cycle of defining research problems and solving them with novel computing technologies that we develop ourselves.
What really attracts me to HCI is its relevance in the post-AI era. Most people today are familiar with artificial intelligence (AI) technologies such as ChatGPT and Stable Diffusion. These technologies can enhance both our daily lives and our working lives. It is important to build them as human-centered AI, which means designing AI in a way that benefits people. When we say “augmented human,” it’s really about amplifying people’s intelligence or abilities. There are many computer-supported hardware technologies, such as those designed for people with disabilities. We can develop systems that help them, and what’s interesting is that these advancements aren’t just about bridging gaps but also about expanding capabilities beyond what we previously thought possible. For example, when we watch the Olympics, we see athletes running, but Paralympians run too. With specially designed prosthetic limbs, some Paralympians can rival, and in certain events even surpass, the performance of able-bodied athletes.
Today, computer-based technology makes it genuinely possible to develop things that enhance our cognitive and physical abilities. What captivates me is the question of how humans should behave in the post-AI era. AI can automate many tasks, which should free us to focus on more interesting things that AI cannot do. This also means grappling with the challenge of defining meaningful roles for humans, especially when AI can be more cost-effective in certain scenarios.
Navigating this complex landscape is a central concern in the field of HCI. In HCI, we think about how we can enhance people’s happiness and well-being with the help of novel technologies, not by forcing them to do something, but rather by leveraging their abilities and creativity in their everyday lives. This is called “mini-creativity.”
I like to work closely with the end-users, creators, and programmers. I have developed many programming environments and tested them with real programmers or primary school students in workshops. I have also created systems to enhance the creativity of musicians and video authors, as well as digital tools that empower anime directors to draw storyboards. These concrete projects reflect my role as a toolsmith, making creativity support tools (CSTs), toolkits and programming environments for creators and programmers.
Jun Kato is a toolsmith researcher. He is standing in front of a display showing one result of his research, TextAlive (https://textalive.jp), a web-based tool for creating text animations.
This is a difficult question to answer because when I was a young kid, there was no term like “STEM.” People would ask, “Do you like the humanities or do you like science?” The vague distinction was there, but STEM was not an established concept at the time. I was a kid who loved reading books, not necessarily about science, but rather stories and novels. When I was in primary school, my parents bought me a computer: an Apple Macintosh, a standalone machine. When I entered junior high school, I bought my own Windows-based personal computer. This gave me access to the Internet, which was eye-opening because it was the first time I could connect with people of various ages, not just my peers. I started doing a lot of creative things, such as writing source code to design websites and taking photos. That describes my initial interest in STEM at a young age.
I gradually taught myself programming, and I created music player software that I published on a website. It was really gratifying to see that a community of people was actually using my software. Making that music player is also somehow connected to my trajectory toward HCI. It looks like the music player is still online; its last update was in 2007, making it 16 years old! The music player’s website offers many different “skins” you can play around with to change its appearance. For instance, there are skins featuring animated characters. I made the skin file format public, which allowed people to create numerous fan-made skins that are available online. Even nearly 20 years ago, I was already drawn to the idea of making a platform that other people can use to do cool things. I feel proud to make tools that let people do that.
When I was thinking about entering university, I considered attending an art school, as I was drawn to design departments. I even considered law school, since I was also interested in the humanities. Ultimately, I chose the University of Tokyo, where I didn’t have to declare a major until the end of my second year, and it was only then that I settled on computer science.
When I entered university, I was surprised by the diverse range of courses offered. I initially enrolled in the science department, where many students were passionate about biology. However, even within that department, I discovered a broad array of courses spanning not only the sciences but also the humanities. One particularly interesting course was led by the famous Japanese journalist Takashi Tachibana. His publications are very diverse, covering political, economic, scientific, and even philosophical topics, from the universe to brain-computer interfaces. He was interested in everything, and as a journalist he could interview whomever he wanted. Joining his seminar was a transformative experience for me. The seminar wasn’t just about listening to him; rather, he let us pursue the topics that inspired us most. During this time, we attended sessions organized by the University of Tokyo and other organizations, where esteemed researchers discussed topics ranging from RNA to fusion science. I was responsible for leading a project that connected brain-machine interface science to the world of animation. We attended discussion sessions where Tachibana shared intriguing insights, accompanied him on interviews with anime directors, and helped organize university festivals. All of this touched not only on brain-machine interfaces but also on the humanities, exploring the philosophical implications and contemplating how life should proceed with these kinds of technologies. This experience helped me build a broad idea of what the sciences and the humanities are, and how we can connect the two with engineering research.
Dr. Kato has been creating tools since his youth. Shown here is the website for his music player software (https://digitalmuseum.jp/software/arxmp/), which he developed more than two decades ago.
Research on programming experience is about equipping programmers with tools for creating programs, and it ties into the broader scope of creativity support research.
In the system I built (shown in the video), you can see a LEGO Mindstorms-based robot raising a white flag. On the right side, there is a pose library containing many photos of the robot. Usually, when programmers write code for detecting human postures or controlling robots, they need to write text-based source code containing statements like “if (human.getPose().eq(…))” or “robot.setPose(...).” All of these commands are expressed in text or symbols. However, I wanted to consider how ubiquitous programming has become in today’s world. Where there are robots, there can also be cameras, and cameras are capable of detecting human postures. How can we design a programming experience that allows us to delegate physical tasks to computers? This question connects to the broader idea of designing programming experiences that incorporate real-world elements.
Imagine a program made of multiple nodes that are all connected to each other, where the relationships between the nodes define how the robot moves or behaves. These are still text-based or symbolic representations. If we want to refer to a specific posture, we cannot easily name it, yet a traditional text-based program requires us to name it somehow. In many cases, people resort to putting dates or times in variable names. In other cases, we might name a posture “human with right hand up” or “human running.” However, the real world offers countless possible states, making it difficult to represent them adequately in text-based or symbolic source code.
The simple idea is to integrate photos directly into the code editor, so that people read source code as a mix of photos and text. This approach is far more accessible than relying on arbitrary text labels or arrays of numerical values. Photos are inherently comprehensible to humans and provide visual context that aids understanding. When the program runs, the photos are translated into numerical data representing human or robot postures, data that would be meaningless to a human reader. The environment manages the correspondence between the photos and these numerical values, so humans can interpret the graphical representations while computers work with their numerical counterparts. I actually held a workshop for primary school students who did not necessarily know much about programming, but they could take photos and enjoyed editing the program themselves. This is a concrete example of what a programming experience can look like.
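To make the photo-to-numbers idea concrete, here is a minimal sketch, assuming each photo in the editor is stored alongside a vector of joint angles captured when it was taken, and that two postures count as “equal” when they fall within a distance tolerance. The names PoseEntry, matches, and closest_pose, the file names, the angle values, and the tolerance are all hypothetical; this is an illustration of the general technique (nearest-neighbor matching), not the API of Dr. Kato’s system.

```python
import math
from dataclasses import dataclass


@dataclass
class PoseEntry:
    """One entry in a hypothetical pose library."""
    photo: str           # photo the programmer sees in the editor
    joints: list[float]  # numerical posture data hidden behind the photo


def distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two posture vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def matches(entry: PoseEntry, observed: list[float], tolerance: float = 15.0) -> bool:
    """Treat two postures as 'equal' if they are within a distance tolerance."""
    return distance(entry.joints, observed) <= tolerance


def closest_pose(library: list[PoseEntry], observed: list[float]) -> PoseEntry:
    """Return the library entry whose posture is nearest to the observed one."""
    return min(library, key=lambda entry: distance(entry.joints, observed))


# Hypothetical library: two photos of the robot, each with made-up joint angles.
library = [
    PoseEntry("flag_up.jpg", [90.0, 10.0, 0.0]),
    PoseEntry("flag_down.jpg", [0.0, 10.0, 0.0]),
]

# Posture reported by a (hypothetical) camera or robot sensor.
observed = [85.0, 12.0, 3.0]
best = closest_pose(library, observed)
print(best.photo, matches(best, observed))  # flag_up.jpg True
```

The programmer only ever sees flag_up.jpg in the editor; the joint-angle list and the distance threshold stay behind the scenes, which mirrors the division of labor between humans and computers described above.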
Other examples of programming experiences include using pressure-sensitive keyboards, incorporating timely interfaces, enabling real-time program editing, or even allowing designers to create robotic elements that add mobility to everyday objects such as alarm clocks. These novel programming tools enable a spectrum of programming experiences that support creative and unconventional ideas.
Dr. Kato makes tools for creative activities and gets a lot of help from domain experts, in this case an anime director using the storyboarding tool Griffith (https://research.archinc.jp/en/griffith).
AIST is actually not an industry job because it’s a national research institute. It’s really in between academia and industry, and unlike universities, we don’t take regular students. We do welcome student interns, though; we collaborate with them on research, but it’s not “teaching.” From my perspective, AIST combines the benefits of academic and industry jobs. I have a permanent position, as in an academic job, because it’s a nationally funded research organization. I can do research for as long as I want, but I can also make a direct impact on society through technology transfer of our research or by developing software ourselves for creators and programmers to use.
Arch Inc., on the other hand, is on the industry side. It is a Japanese animation production company, and Arch Research is a small interdisciplinary team within Arch that includes me, a computer scientist, and Dr. Ryotaro Mihara, a cultural anthropologist. We do research on creativity support for animation and study how anime is made in the industry. In many cases, corporate research and development teams need to focus on projects that will benefit the company in the near future, but currently we focus more on what’s truly interesting about animation production.
At AIST, I of course do research, but I also work on actual development, such as helping companies run events using our research and development outcomes, and developing software that directly benefits creators.
For example, I’ve made an interface for musicians where they can choose from the available songs, search for songs, and upload new songs. If you choose a song from the predefined list and hit the playback button, you can see the lyrics being animated. You can switch between many different kinds of animations or change parameters such as coloring. This interface is for novices in the sense that you do not have to be a professional to make animated lyric videos; it is a video authoring environment that I made for creators. It also has a user interface for programmers, where they can define animation algorithms and see their modifications live. I originally intended it to be used for live music performances, but nowadays people are using it for virtual reality performances. Everything is built into a single website called TextAlive (https://textalive.jp).
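At its core, synchronizing animated lyrics with playback means looking up which word is being sung at the current playback position and how far through that word the song has progressed. The sketch below assumes word-level timing data (start and end times in milliseconds) is already available; the Word class, the active_word function, and the timings are hypothetical and are not TextAlive’s actual API.

```python
import bisect
from dataclasses import dataclass
from typing import Optional


@dataclass
class Word:
    text: str
    start_ms: int  # when the word starts being sung
    end_ms: int    # when the word stops being sung


def active_word(words: list[Word], position_ms: int) -> Optional[tuple[Word, float]]:
    """Return the word sung at position_ms and how far through it playback is (0.0 to 1.0).

    Assumes words are sorted by start_ms and do not overlap.
    Returns None before the first word or in a gap between words.
    """
    starts = [w.start_ms for w in words]
    # Index of the last word that starts at or before the playback position.
    i = bisect.bisect_right(starts, position_ms) - 1
    if i < 0:
        return None
    word = words[i]
    if position_ms >= word.end_ms:
        return None
    progress = (position_ms - word.start_ms) / (word.end_ms - word.start_ms)
    return word, progress


# Made-up timing data for illustration.
lyrics = [Word("hello", 1000, 1500), Word("world", 1600, 2400)]

result = active_word(lyrics, 2000)
if result is not None:
    word, progress = result
    print(word.text, progress)  # world 0.5
```

An animation template could then map progress to a visual parameter such as a character’s color or opacity, so that switching templates changes the look without touching the timing logic. The sketch rebuilds the list of start times on every call to stay short; a real player would likely precompute it.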
And because I’m a researcher, I like to publish papers. You can read the paper on TextAlive if you’d like. You’ll notice that the screenshot in it looks very different from the current system. Around the time the paper was published, I spent time turning the desktop application into a working website and finally made it public. It has been operating for over nine years, since 2015.
More recently, we’ve shifted our focus from lyric videos to lyric apps. One limitation of lyric videos is that they are static: the video is always the same. So our question was, how can we turn this into an interactive medium? As part of this journey, we participated in a programming contest associated with Magical Mirai (https://magicalmirai.com/index_en.html), an exhibition of creative culture dedicated to the renowned character Hatsune Miku. The contest has been held for several consecutive years as an integral part of the exhibition. Now, if you open a lyric app (https://magicalmirai.com/2023/procon/entry_2023_app/entry_app04/), you can click on the lyrics and interact with the scenery in virtual reality. Our technologies allow programmers to create these novel artistic expressions.
TextAlive allows novice users to create compelling music videos with synchronized animated lyrics.
Recently, in the spring of 2024, he was transformed into a cartoon character as part of AIST's PR efforts (https://www.aist.go.jp/aist_j/magazine/20240415.html).
XRDS (Crossroads), the ACM Magazine for Students, is published quarterly by the Association for Computing Machinery (ACM), the world's largest computing-related academic society. The Summer 2023 issue featured Computer-Aided Media, a topic related to what I demonstrated earlier. I wrote an article for it titled "On the Relationship Between HCI Researchers and Creators" (https://dl.acm.org/doi/pdf/10.1145/3596927). If you have the time, I highly recommend reading it; it's an open-access article available on the web.
Computer science is a fast-paced research domain, and a lot of cool work happens every day. Some people publish papers every six months or at an even faster pace. But what I think is really important, as an HCI researcher, is to think about what people are doing in the real world, and in my case, what artists are actually doing. Rather than treating it as research material, get involved in the artistic practice: go out into the world and build personal relationships with artists, creators, and programmers.
I really encourage students interested in HCI to take an interest in something in the world beyond computer science. HCI is all about building better relationships between computers and human beings. You will learn the computing side, such as how to code and run user studies, along the way. As for the human side, I see that many successful researchers have a strong interest in the broad scope of human activities, not just computers. So it’s really important for students to explore a broad spectrum of activities. Personally, I recommend cultivating an interest in cultural activities, because it’s exciting. To give just one example, Japan has many Buddhist statues and temples, and how we can connect them with computer-aided technology is already an interesting question. This interdisciplinary approach can enrich your understanding and contribute to meaningful advancements in the field. You can do virtually anything, which is the most exciting part of the field.
More broadly, computer science-related skills such as programming help. Within HCI specifically, design skills, such as designing a beautiful user interface, are highly valued. Being a nice person is also important. For me, it’s always about making tools for people, so it’s better to be nice to people and maintain positive relationships with creators and others who can benefit from our systems.
One limitation I see in HCI today is that it can be heavily biased in some sense. There is a term for this, “WEIRD”: Western, Educated, Industrialized, Rich, and Democratic. In computer science generally, when we think of the big companies in the field, we think of Microsoft, Adobe, Amazon, and Meta (formerly Facebook). Virtually all of the major players are based in the US or Europe, and few are based in Japan. The research community is similar; it is frequently shaped by Western culture.
I think it’s important to think of the diversity of the academic field. There are many excellent researchers based in Japan, Asia, and other regions. As a field that involves humanity, it’s important to think of the diversity of perspectives.
One example concerns helping anime creators. Let’s say I want to do research that may help Pixar animators. I might be able to publish papers at SIGGRAPH, a prestigious computer graphics conference, without explaining what Pixar is. But if I want to write a paper that helps Japanese animation production, I need to spend the introduction explaining what the process looks like and how these animations are made, because the process is unfamiliar to the community. The introduction then takes up a lot of space in the paper, which can significantly affect whether the paper is accepted or rejected.
So it’s important to think about these cultural differences and incorporate practices from around the world into the academic field. That way, we can build a better relationship between computers and humans on a global scale. Taking part in the international community and building community are really important, and they are something I want to work on and see in the near future of our field.
Dr. Kato has collaborated with international researchers to make computers more useful to people. This picture is from a workshop he held in 2023, to which he invited a researcher from the US.