Effectively using GitHub Copilot in VSCode
Collected (non-exhaustive) tips and advice. May be periodically updated as I expand my usage of Copilot into different projects.
About Copilot
Part of understanding how to use Copilot effectively is understanding its basic features.
GitHub Copilot is a coding assistant that generates suggestions for your code. It is powered by the large language model (LLM) OpenAI Codex, described as a "descendant of GPT-3" in OpenAI's blog. Codex is trained on natural language data and publicly available GitHub source code. OpenAI claims that Codex performs best with Python but is also capable of assisting with other popular languages like JavaScript, Swift, or Ruby.
The model is closed-source and its architecture has not been publicly disclosed, but we can make some assumptions based on characteristics of the GPT family (which includes the popular ChatGPT). GPT models are decoder-only and generate text based on the preceding (or left) context. In a GitHub blog post from May 2023, however, the team announced that Copilot now uses a fill-in-the-middle paradigm that takes in information after (or to the right of) the working line. (Aside: These capabilities are still compatible with decoder-only assumptions. A bit of wrangling and prompt engineering can reposition right context to be input as normal left context).
For Copilot, the context used includes not only the document currently open, but other open files in your codebase. Although OpenAI and GitHub have not publicly disclosed the size of Copilot's context, one could assume that it is fairly long and at least comparable to ChatGPT's 4096 tokens (around 3000 words). Whatever the length, LLMs tend to be less effective at integrating information found in the middle of the input context, as discussed in the 2023 preprint Lost in the Middle: How Language Models Use Long Contexts by Liu et al.
Copilot advice
The following assumes use of Copilot with Python.
Use norms
The more your code resembles training data, the more Copilot can help. E.g., using common packages like `scikit-learn`, `numpy`, `pandas`, etc., improves Copilot's suggestions because there are likely enough training examples that the model has learned to generalize. Even something minor, like remembering to `import numpy as np` or `import pandas as pd` rather than using the full package names, will likely bring the code more in line with predictable training data.
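As a small illustration, the conventional aliases look like this (nothing here is project-specific):

```python
# Standard aliases that dominate Copilot's training data
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
```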
Additionally, knowing how workflows are typically structured will help. Given examples of operations done on your `trn` dataset, Copilot will readily assume there is a corresponding `tst` or `val` dataset that requires the same operations, as in the sketch below.
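A minimal sketch of that pattern, with hypothetical `trn`/`tst` DataFrames and a made-up `text` column:

```python
import pandas as pd

# Hypothetical train/test splits
trn = pd.DataFrame({"text": ["  Hello World ", "FOO bar"]})
tst = pd.DataFrame({"text": ["  baz  "]})

# Clean the training split
trn["text"] = trn["text"].str.lower().str.strip()

# After a comment like "# Clean the test split", Copilot will
# usually suggest the mirrored operation on tst
tst["text"] = tst["text"].str.lower().str.strip()
```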
Prompt with example code
Given preceding text, LLMs will often predict that the next lines will be similar. Providing an appropriate example of what you want code to accomplish, such as writing a small test case or only one iteration of a loop, will direct Copilot to copy that example and generalize it to the next relevant step.
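As a hedged illustration (the raw records and the parsing step are invented), one hand-written case gives Copilot a pattern to extend:

```python
import json

# Hypothetical raw records to parse
raw = ['{"id": 1}', '{"id": 2}', '{"id": 3}']

# One case written by hand as the example
first = json.loads(raw[0])

# Copilot will typically offer the generalized version, e.g.:
parsed = [json.loads(r) for r in raw]
```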
Prompts can be very short, such as the first portion of a much longer function call. Copilot can often be faster than rifling through documentation, provided you have some intuition that a certain package will accomplish the task. For example, in array manipulation, briefly commenting the task to accomplish and then starting the line with `x = numpy.` or `x = np.` will likely surface the `numpy` function you need.
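For instance (the array and the task here are made up), a one-line comment plus the start of the assignment is often enough:

```python
import numpy as np

a = np.array([[3, 1, 2], [2, 3, 1]])

# Flatten a and keep only the unique values, sorted
# Typing "x = np." after the comment above usually surfaces:
x = np.unique(a.ravel())
```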
Recopying relevant code to bring it closer to the end of the context window can also improve suggestions. For example, if you have a function that requires a very specific set of formatted inputs, copying the function definition will nudge Copilot into appropriately organizing the data.
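A sketch of this recopying trick, with an invented function and arguments: pasting the signature (and a note on the expected format) right above the call keeps the requirements at the end of the context window.

```python
def score_pairs(pairs, weights=None, normalize=True):
    """Hypothetical function with a fussy input format."""
    ...

# Recopied for Copilot's benefit:
# def score_pairs(pairs, weights=None, normalize=True)
# pairs: list of (id, value) tuples; weights: dict keyed by id
result = score_pairs([("a", 0.3), ("b", 0.9)], weights={"a": 1.0, "b": 2.0})
```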
Prompt engineer comments
Although Copilot could probably infer a task based on its context in, for example, markdown paragraphs of a notebook, comments are often the best way to elicit the ideal code output.
Describe the task effectively
Keeping comments short and structurally similar to how someone would naturally annotate their code can help Copilot identify the most relevant information. Copilot is more likely to produce the correct operation for a task when the task is clearly named and identified.
For example, when tuning hyperparameters, the first two examples are more likely to get the "most correct" response than the wordier third option.
```python
# Cartesian product of lists a and b
# Grid search over hyperparameters a and b
# The following generates a list of sets of the elements of lists a and b such that every combination is represented.
```
Elicit rewrites
Copilot can also be prompted to rewrite inefficient or messy code, even code that it initially suggested. (Why Copilot won't always inherently provide the "best possible suggestion" is a question for another day). Using the comment "rewrite the above" can often help with fixing mediocre suggestions, as does commenting out the subpar code and letting Copilot autocomplete the next suggestion.
Writing the comment with specific directions, e.g., "rewrite the above as an iterable" or "rewrite the above using numpy", also provides a little bit more leverage for the suggestion engine.
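A hypothetical before-and-after: comment out the loop-based version, add a directed rewrite comment, and let Copilot complete the vectorized form.

```python
import numpy as np

values = [0.5, 1.5, -2.0, 3.0]

# Original, loop-based suggestion (commented out):
# squared = []
# for v in values:
#     squared.append(v ** 2)

# rewrite the above using numpy
squared = np.square(values)
```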
Lower your expectations
Currently, Copilot still does not replace having a team member or collaborator with a decent background in coding. It is unlikely to be an effective tool for someone with no coding experience.
Copilot is vulnerable to the same weaknesses as most LLMs, such as impaired capability on tasks with very little training data. Occasionally, Copilot can fail to understand how data like nested lists or dictionaries are structured and suggest code that writes to non-existent keys or indices. Hallucination of terms should be expected, and Copilot may call a nonexistent function or import a package under the wrong name.
Where Copilot works well
I have found Copilot fairly useful at streamlining repetitive, predictable tasks. It has not been especially good at generating code for unusual data-science asks or tasks very specific to the project at hand.
Copilot will probably be most effective for someone who has some fluency in Python or transferable skills in another coding language. Quick evaluation, debugging, and discarding of suggestions seems essential to smoothly integrating Copilot into one's work.
Background on my Copilot use
Mostly, I've been using Copilot suggestions for assistance in data wrangling and natural language processing. I mainly work with Python in Visual Studio Code. I have some basic experience in Python (mostly through introductory courses in college).
Boilerplate, loops, and functions
By generalizing from example code, Copilot can eliminate the need to recopy code with minor changes to variable names and is often faster than a search and replace. After I develop one iteration of a loop, Copilot is fairly good at either a) generalizing the code into a function or b) rewriting the code to run over a loop.
Additionally, Copilot often helpfully suggests appropriate lambda functions, a feature of Python that I consistently forget to use.
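For example (the DataFrame and column are made up), Copilot frequently completes an `apply` call with a sensible lambda:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ada Lovelace", "Alan Turing"]})

# Add a column with each person's initials
df["initials"] = df["name"].apply(lambda s: "".join(w[0] for w in s.split()))
```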
Data wrangling
Copilot is much better than I am at memorizing the quirks of how, for instance, `pandas` DataFrames are indexed. I still do much of my data wrangling in R (and mostly think along the lines of `tidyverse` pipelines), but Copilot helpfully provides valid equivalents of tasks I know how to do in R.
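As a rough sketch (the columns are invented), a tidyverse-style group_by/summarise maps onto the kind of pandas chain Copilot will suggest:

```python
import pandas as pd

df = pd.DataFrame({"group": ["a", "a", "b"], "value": [1, 2, 3]})

# R: df %>% group_by(group) %>% summarise(mean_value = mean(value))
summary = df.groupby("group", as_index=False)["value"].mean()
```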
Generating tutorial-like code
If something has many tutorials or examples online, such as training a feed-forward neural network or running a K-fold cross-validation loop, then Copilot needs very little assistance to generalize these basic training examples. Additionally, Copilot tends to pull from the most updated version of examples, which saves some hassle in debugging deprecated packages.
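A sketch of the kind of boilerplate Copilot can produce nearly unprompted (the toy data and the choice of model are assumptions on my part):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Toy data standing in for a real dataset
X = np.random.rand(100, 5)
y = np.random.randint(0, 2, size=100)

# Standard 5-fold cross-validation loop
kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in kf.split(X):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    scores.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))

print(f"Mean accuracy: {np.mean(scores):.3f}")
```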