SMU Office of Research – Like invisible scaffolding, millions of lines of computer code underpin the software we use each day, from the simplest smartphone apps to complex behemoths such as Google’s Internet services. As software becomes more sophisticated, developers and programmers must still be able to produce high-quality, error-free code, because the modern world’s dependence on software means that inefficiencies and mistakes can cost businesses billions of dollars every year.
On a mission to improve the coding process is Assistant Professor Jiang Lingxiao at the Singapore Management University (SMU) School of Information Systems. Professor Jiang works on tools that help developers navigate the sea of existing code, as well as take full advantage of this vast repository of valuable information.
“More and more software, especially open-source software code, is being written, and there is much accumulated knowledge in the code waiting to be extracted and reused by developers,” he explains. “The main purpose of the tools is to help developers better comprehend the software code created by themselves and others. This way, they are able to learn from each other, which in turn improves coding productivity and reduces code bugs.”
Enhancing the programmer’s toolbox
For instance, Professor Jiang is developing code search techniques that allow programmers to sift through large databases for code which they can then repurpose for their own needs. This is helpful and time-saving when programmers need to implement functions they might not be familiar with.
One such technique incorporates feedback from the user to refine and reorder search results, placing the most relevant hits at the top. Another technique, called fault localisation, helps coders identify faults that cause failures during software execution, so that they can quickly debug and repair them.
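To make the idea of fault localisation concrete, here is a minimal sketch of one common family of approaches, spectrum-based fault localisation, which ranks lines of code by how strongly their execution is associated with failing tests. The article does not say which method Professor Jiang's tools use, so the Ochiai scoring formula, the function name ochiai_scores and the tiny test data are all assumptions chosen for illustration.

```python
import math


def ochiai_scores(coverage, outcomes):
    """Rank lines by how suspicious they look (illustrative only).

    coverage: dict mapping test name -> set of line numbers the test executed
    outcomes: dict mapping test name -> True if the test passed, False if it failed
    """
    total_failed = sum(1 for passed in outcomes.values() if not passed)
    all_lines = set().union(*coverage.values())
    scores = {}
    for line in all_lines:
        # How many failing / passing tests executed this line?
        failed_cov = sum(
            1 for test, lines in coverage.items()
            if line in lines and not outcomes[test]
        )
        passed_cov = sum(
            1 for test, lines in coverage.items()
            if line in lines and outcomes[test]
        )
        denom = math.sqrt(total_failed * (failed_cov + passed_cov))
        scores[line] = failed_cov / denom if denom else 0.0
    # Most suspicious first; ties broken by line number.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical data: three tests, two of which fail.
coverage = {
    "t1": {1, 2, 3},
    "t2": {1, 2, 4},
    "t3": {1, 4},
}
outcomes = {"t1": True, "t2": False, "t3": False}

ranking = ochiai_scores(coverage, outcomes)
print(ranking[0])  # line 4 is executed only by failing tests
```

In this made-up example, line 4 is run by both failing tests and by no passing test, so it rises to the top of the list a developer would inspect first.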
Professor Jiang’s most widely used tool, known as DECKARD, detects code “clones”. These duplicated fragments of code are common in codebases, and may arise when programmers, in a hurry to implement certain functions, simply copy and paste lines of code into their work without fully integrating them. This poor programming practice potentially introduces errors into the code, and in the long run also makes the code longer and more complex, and hence more difficult to maintain. Clone detection thus helps developers identify bugs and prune unwieldy code.
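As a rough illustration of why clone detection is more than text matching, the sketch below compares code fragments by their syntactic structure, so that a copied fragment with renamed variables still registers as a clone. It is a toy example built on Python's standard ast module and is not DECKARD's published algorithm; the helper names node_type_vector and cosine_similarity and the sample fragments are assumptions made for this example.

```python
import ast
import math
from collections import Counter


def node_type_vector(source):
    """Count how often each syntax-tree node type appears in a fragment."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def cosine_similarity(a, b):
    """Cosine similarity between two node-type count vectors."""
    keys = set(a) | set(b)
    dot = sum(a[k] * b[k] for k in keys)  # Counter returns 0 for missing keys
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


original = (
    "def total(xs):\n"
    "    s = 0\n"
    "    for x in xs:\n"
    "        s += x\n"
    "    return s\n"
)
# Copy-pasted version with every identifier renamed.
pasted = (
    "def sum_prices(prices):\n"
    "    acc = 0\n"
    "    for p in prices:\n"
    "        acc += p\n"
    "    return acc\n"
)
unrelated = (
    "def greet(name):\n"
    "    print('Hello, ' + name)\n"
)

sim_clone = cosine_similarity(node_type_vector(original), node_type_vector(pasted))
sim_unrelated = cosine_similarity(node_type_vector(original), node_type_vector(unrelated))
print(sim_clone > sim_unrelated)
```

Because renaming identifiers does not change the shape of the syntax tree, the copied fragment scores as structurally identical to the original, while the unrelated function scores lower. A real tool would also need to scale this comparison to millions of lines of code.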
The algorithm behind DECKARD was published in 2007 in the Proceedings of the 29th International Conference on Software Engineering. “The techniques used in the tool have influenced many other tools, and many people have built further improvements on top of it,” Professor Jiang says.
Using contextual data to improve function
Like most programming tools, DECKARD currently focuses on analysing the program code itself. But computer code is more than just standalone strings of letters and numbers, Professor Jiang notes. “To understand code better, write code faster, and identify bugs in code more accurately, we need to look beyond the code itself into its contexts, just like we often need to look at the contexts of a sentence in English to comprehend it properly,” he advises.
Contextual information may include data dependencies between the piece of code in question and the surrounding code—that is, one depends on the output data of the other. Users who run the code add further context in the form of input data, usage patterns, feedback and complaints; developers themselves may also contribute context in the form of coding and learning behaviour, project management approaches, and choice of development tools.
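A small sketch can show what a data dependency looks like in practice: one statement depends on another when it reads a variable that the other wrote. The example below assumes simple straight-line Python code, ignores branches, loops and function bodies, and uses a hypothetical helper named data_dependencies; it illustrates the concept only and is not drawn from Professor Jiang's tools.

```python
import ast


def data_dependencies(source):
    """Return (writer_line, reader_line, variable) triples.

    Only top-level, straight-line statements are handled; this is a
    deliberate simplification for illustration.
    """
    tree = ast.parse(source)
    last_writer = {}  # variable name -> line of its most recent assignment
    deps = []
    for stmt in tree.body:
        names = [n for n in ast.walk(stmt) if isinstance(n, ast.Name)]
        reads = {n.id for n in names if isinstance(n.ctx, ast.Load)}
        writes = {n.id for n in names if isinstance(n.ctx, ast.Store)}
        # Record a dependency for every variable read that was written earlier.
        for name in sorted(reads):
            if name in last_writer:
                deps.append((last_writer[name], stmt.lineno, name))
        for name in writes:
            last_writer[name] = stmt.lineno
    return deps


snippet = (
    "price = 40\n"
    "qty = 3\n"
    "total = price * qty\n"
    "print(total)\n"
)

deps = data_dependencies(snippet)
for writer, reader, name in deps:
    print(f"line {reader} depends on line {writer} through '{name}'")
```

Here the third line depends on the first two, and the fourth depends on the third. Links of this kind are part of the context that a purely textual view of the code would miss.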
The absence of contextual data may be fine for well-defined coding-related tasks, says Professor Jiang, but the data is essential for larger, more ambiguous tasks, such as the code search and fault localisation problems he works on. And higher-level contexts, including the code’s potential applications in society and its commercial and cultural impact, are much more difficult to measure, he points out.
Professor Jiang and his colleagues recently developed AutoQuery, a code search engine that uses contextual dependencies to improve search results. The research, available online in the journal Automated Software Engineering, describes how, in addition to keywords, users enter code fragments into AutoQuery to indicate the context they are working in. The combination of keywords and data dependencies allows the search engine to pick out more relevant code samples from the database.
Intelligent coding tools
Given the continuously increasing complexity and sophistication of computer software, Professor Jiang believes that intelligent, adaptable tools that can understand the needs of the programmer will soon become indispensable. In the long run, he wants to develop advanced, context-aware tools that would enable scenarios such as the following:
A developer is asked to write a new piece of software. He fires up the coding tool and gives it the task requirements, written in English.
The tool parses and breaks down these instructions, and then queries existing code databases for relevant sample code that could be useful to the developer. It then suggests possible solutions to the developer, who takes advantage of this information to design the software.
This process can be repeated, at finer grain, for individual components of the software, until the entire structure is complete. The tool can also identify errors in the software and evaluate its performance, again making suggestions for corrections and improvements.
By Sim Shuzhen