Other Useful Techniques

Outline

Hey there! Congratulations on making it this far! Up to this point, we have covered everything from early preparation, such as Git installation and Linux basics, to core Git usage like forking, cloning, committing, and pushing. You have learned how to use Git both alone and collaboratively, which is great progress! If you’re considering an IT role, such as software engineer or web developer, you have learned one of the most important skills. Well done!

This page focuses on other techniques that are useful but less commonly needed in Git projects, such as telling Git not to track specific files and locating the commit that introduced a bug. You can come back and learn these techniques when you need them. Don’t panic! They are easy :-)

.gitignore

As we know, Git keeps track of all file changes in the repository. A change can be as small as adding a space or as big as creating or deleting a file. But is there a way to make Git ignore specific files, no matter how they change?

You may run into this question while working on real projects. For example, Visual Studio Code may create a hidden folder called .vscode where it stores project-specific settings and configurations. These files often don’t belong to the project itself, and you might not want to share your personal settings with the team.

Fortunately, Git lets you list all the files you don’t want to share, and it will ignore them automatically. This keeps the remote repository clean, avoids unnecessary file changes, and helps protect sensitive information. This ignore list is called .gitignore.

A .gitignore file usually sits in the root of your local repository. It contains a list of patterns, one per line, and Git ignores every untracked file or directory that matches one of them. Lines starting with # are comments. Here’s a simple example of .gitignore:

# Ignore temporary files
*.tmp
*.swp

# Ignore environment files
project/.env

# Ignore editor settings
.vscode/

Let’s take a closer look to understand what it means:

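The first part tells Git to ignore any file with the extension “.tmp” or “.swp”, where * matches any sequence of characters (except /). Because these patterns contain no slash, they match such files in any directory of the repository.
The second part tells Git to ignore one specific file. In other words, Git will ignore the file named “.env” inside the “project/” folder that sits in the repository root next to .gitignore; a .env file anywhere else is not matched by this pattern.
The last part tells Git to ignore the .vscode/ folder and everything in it. The trailing / means the pattern only matches directories, and since it has no other slash, it matches a .vscode directory at any level, including the one in the repository root next to .gitignore.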
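To check which files these patterns actually catch, you can use git check-ignore, which reports whether a path is ignored and, with -v, which .gitignore line matched it. The sketch below is a throwaway experiment, assuming a Unix-like shell with mktemp; the file names notes.tmp and settings.json are made up for illustration.

```shell
# Work in a throwaway repository inside a temporary directory
cd "$(mktemp -d)" && git init -q

# Write the example .gitignore from above
cat > .gitignore <<'EOF'
# Ignore temporary files
*.tmp
*.swp

# Ignore environment files
project/.env

# Ignore editor settings
.vscode/
EOF

# Create some files; notes.tmp and settings.json are hypothetical names
mkdir -p project .vscode
touch notes.tmp project/.env .vscode/settings.json README.md

# Print the matching rule (source:line:pattern) for each ignored path
git check-ignore -v notes.tmp project/.env .vscode/settings.json

# README.md matches no pattern, so it still shows up as untracked
git status --short
```

Running git check-ignore -v on a single path is a quick way to debug a pattern that does not behave the way you expect.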

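This example should help you understand what a .gitignore file is and how it works in our projects. Keep in mind that .gitignore only affects untracked files: if a file is already tracked, you need to stop tracking it with git rm --cached <file> before Git will ignore it. Git understands many more patterns; if you are interested in learning more, have a look at the official gitignore documentation. Now let’s move on to other techniques.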

Restore Commits

Let’s recall the Git Data Model. There are three layers in total:

Workspace is like the present moment where you make changes to your local copy.
Staging Area is like a snapshot before time travel. You can select and prepare specific changes or files to be included before travelling.
Local Repository stores a collection of snapshots or time-travel checkpoints.

Now we know that git add stages changes (e.g. new files, modifications, and deletions) that you want to include in the next commit: it takes a snapshot of them from the workspace and records it in the staging area.

Conversely, git restore --staged <file> unstages changes, that is, removes them from the staging area without modifying the workspace. Without --staged, git restore <file> discards the unstaged modifications in the workspace, restoring the file to the version in the staging area (which is the last committed version if nothing was staged).

It’s recommended to keep your changes in the workspace when unstaging them, which is exactly what git restore --staged <file> does. If you are certain you no longer need the changes, run git restore <file> to discard them from the workspace as well. Be careful: discarding workspace changes usually cannot be undone. That’s how simple git restore is!
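The two-step workflow above can be sketched in a throwaway repository. This assumes Git 2.23 or later (when git restore was introduced); the commit identity "Example User" is hypothetical and only needed so the demo can commit.

```shell
# Throwaway repository with a hypothetical identity for committing
cd "$(mktemp -d)" && git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "1. Peter" > README.md
git add README.md && git commit -q -m "Add README"

echo "2. John" >> README.md    # modify the file in the workspace
git add README.md              # stage the modification

git restore --staged README.md # unstage it; the edit stays in the workspace
git status --short             # " M README.md": modified but not staged

git restore README.md          # discard the edit in the workspace as well
git status --short             # prints nothing: README.md matches the last commit
```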

Difference Between Two Snapshots

This section includes material from “Monash SCI1022 Git Introduction” by Alberto F. Martin and Santiago Badia, licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

So far we have seen many examples of committing in previous pages. A commit records the changes in the staging area. However, we often keep editing files after they were staged or committed, and it is useful to review exactly what we changed before adding those changes. Let’s explore how to do this in the following example.

Suppose we added a README.md document in the last commit, which lists Peter as one of the authors. Now we add a new name, “John”, to the author list. Note that this new change has not been added to the staging area yet. We can confirm this by checking git status:

$ git status
On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   README.md

no changes added to commit (use "git add" and/or "git commit -a")

Great! Let’s see what happens after running git diff now:

$ git diff
diff --git a/README.md b/README.md
index 2aaa441..1138769 100644
--- a/README.md
+++ b/README.md
@@ -3,3 +3,4 @@ This project is for showcasing the examples in Git Manual.
 ## Authors

 1. [Peter](u7xxxxxx@anu.edu.au)
+2. [John](u7xxxxxx@anu.edu.au)

By default, git diff shows the difference between the staging area (the snapshot of your files) and your workspace, that is, the changes you have not staged yet. In our case, we only modified one file, README.md, by adding a new line at the end. Let’s go through this output from top to bottom.

If the output opens in a pager, press q to quit and get back to the terminal.

The first line, diff --git a/README.md b/README.md, tells you there’s a difference between two versions of the file README.md. The second line, index 2aaa441..1138769 100644, contains two things:

the abbreviated hashes (2aaa441..1138769) of the file’s content before and after the changes;
the file mode 100644, which indicates a regular file rather than an executable or a symlink.

The third and fourth lines (--- a/README.md and +++ b/README.md) tell us that, in the rest of the output, lines prefixed with - were removed from the staged README.md, while lines prefixed with + were added in the new version of README.md.

Next, we see a hunk header (@@ -3,3 +3,4 @@) followed by the file content. A hunk header shows where a changed region (a hunk) starts and how many lines it spans before and after the change. Its format is @@ -start,count +start,count @@, and here it tells us where the changes are in README.md:

-3,3 means that in the staged file, the hunk starts at line 3 and spans 3 lines.
+3,4 means that in the new file, the hunk starts at line 3 and now spans 4 lines (since a line was added).
+2. [John](u7xxxxxx@anu.edu.au) is prefixed with +, which tells us that this is the newly added line. Lines starting with a space are unchanged context lines.

Likewise, if we replaced “Peter” with “John” instead, the last part of the output would become:

-1. [Peter](u7xxxxxx@anu.edu.au)
+1. [John](u7xxxxxx@anu.edu.au)

Because git diff compares the staged version with the workspace, it outputs nothing once we stage our new changes with git add. To compare the staged changes with the last commit instead, add --staged to the command (--cached is a synonym). Doing so shows the same output as before:

$ git diff --staged
diff --git a/README.md b/README.md
index 2aaa441..1138769 100644
--- a/README.md
+++ b/README.md
@@ -3,3 +3,4 @@ This project is for showcasing the examples in Git Manual.
 ## Authors

 1. [Peter](u7xxxxxx@anu.edu.au)
+2. [John](u7xxxxxx@anu.edu.au)
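The whole sequence above (unstaged diff, staging, then an empty git diff and a non-empty git diff --staged) can be reproduced in a throwaway repository. This is a minimal sketch; the README content is simplified and the commit identity is hypothetical.

```shell
# Throwaway repository with a hypothetical identity for committing
cd "$(mktemp -d)" && git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf '## Authors\n\n1. Peter\n' > README.md
git add README.md && git commit -q -m "Add README"

echo "2. John" >> README.md
git diff              # staging area vs workspace: shows "+2. John"

git add README.md
git diff              # now prints nothing: the workspace matches the staging area
git diff --staged     # last commit vs staging area: shows "+2. John" again
```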

Above is the basic use of git diff. It’s a handy tool for tracking and reviewing changes before making commits! Apart from this, git diff can also:

compare changes in a specific file: git diff <file>;
compare between two branches: git diff <branch-name_1>..<branch-name_2>;
compare between two commits: git diff <commit-hash_1> <commit-hash_2>;
compare the workspace with a commit: git diff <commit-hash>.
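Here is a small sketch of these variants. It assumes Git 2.23 or later (for git switch), uses a hypothetical branch name feature, and writes HEAD~1 (the parent of the current commit) instead of pasting a commit hash; any commit hash works in its place.

```shell
# Throwaway repository with a hypothetical identity for committing
cd "$(mktemp -d)" && git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git symbolic-ref HEAD refs/heads/main   # call the first branch "main" whatever Git's default is

echo "1. Peter" > README.md
git add README.md && git commit -q -m "Add Peter"

git switch -q -c feature                # hypothetical branch name
echo "2. John" >> README.md
git commit -q -am "Add John"

git diff main..feature                  # between two branches
git diff HEAD~1 HEAD                    # between two commits
echo "3. Mary" >> README.md             # an uncommitted edit
git diff HEAD~1 -- README.md            # workspace vs a commit, limited to one file
```

The -- separates revisions from file paths, so Git never confuses a file name with a branch name.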

Remove a Remote Commit

In the previous chapter, Reverting and Resetting, you learned how to “remove” local commits. What about a commit that you’ve accidentally pushed to GitLab? Maybe it contains a bug, sensitive information, or was just plain wrong. This section shows you how to get rid of it, but be warned: rewriting history, especially remote history, is potentially disruptive. Only do this if you’re absolutely sure you know what you’re doing and you’ve communicated with your team.

Why is this risky? If other people have already pulled your bad commit, they’ll have a different version of history than you after you rewrite it. This can lead to confusion, merge conflicts, and generally a bad time. It’s always better to fix a bad commit with a new commit that reverts or corrects the issue if possible.

When is it okay to do this?

You’re working on a branch that only you are using. This is the safest scenario.
You’ve just pushed the commit and you’re absolutely certain nobody else has pulled it yet.
You’ve coordinated with your entire team and everyone agrees this is the right course of action.

Here’s how to do it:

Find the commit before the bad one. Use git log to search for it and note down its commit hash (the full hash, just to be safe).
Reset your local branch. Run git reset --hard <commit-hash> and replace <commit-hash> with the actual commit hash you found in step 1.
Force push to the remote (With Lease!): git push origin main --force-with-lease.

Why use --force-with-lease instead of --force? --force-with-lease is a safer option. It checks if the remote branch is in the state you expect it to be before pushing. This helps prevent accidentally overwriting someone else’s changes if they’ve pushed to the branch since you last fetched.
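The three steps can be rehearsed safely before trying them on GitLab. In this sketch a local bare repository stands in for the remote, so nothing real is touched; the directory names remote.git and work, the file app.txt, and the commit identity are all hypothetical.

```shell
# A local bare repository plays the role of GitLab
base="$(mktemp -d)"
git init -q --bare "$base/remote.git"
git clone -q "$base/remote.git" "$base/work" && cd "$base/work"
git config user.name "Example User"
git config user.email "user@example.com"
git symbolic-ref HEAD refs/heads/main

echo "good" > app.txt && git add app.txt && git commit -q -m "Good commit"
echo "bad" >> app.txt && git commit -q -am "Bad commit"
git push -q origin main                 # oops: the bad commit is now on the remote

# Step 1: find the full hash of the commit before the bad one
git log --format='%H %s'
good="$(git rev-parse HEAD~1)"

# Step 2: move the local branch back to it (the bad commit is dropped locally)
git reset -q --hard "$good"

# Step 3: overwrite the remote branch, but only if it still matches origin/main
git push -q --force-with-lease origin main
```

If someone had pushed to main after your last fetch, the final push would be rejected instead of silently discarding their work.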

That’s it! Make sure you double check and communicate with your teammates before you do this.

Remove a Remote Branch

On page 6, we learned that git branch -d <branch> removes a local branch. But how do we remove a remote branch from the command line?

The solution is pretty simple: git push origin --delete <branch>. Compared to the command itself, perhaps the hardest part is to communicate clearly with your teammates and avoid mistakenly removing important branches on GitLab 😄
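The sketch below publishes a branch, deletes it on the remote, and then removes the local copy too. As before, a local bare repository stands in for GitLab; the branch name old-feature, the directory names, and the commit identity are hypothetical, and git switch needs Git 2.23 or later.

```shell
# A local bare repository plays the role of GitLab
base="$(mktemp -d)"
git init -q --bare "$base/remote.git"
git clone -q "$base/remote.git" "$base/work" && cd "$base/work"
git config user.name "Example User"
git config user.email "user@example.com"
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "Initial commit"
git push -q origin main

git switch -q -c old-feature            # hypothetical branch name
git push -q -u origin old-feature       # publish it to the remote
git switch -q main

git push -q origin --delete old-feature # delete the branch on the remote
git branch -d old-feature               # delete the local branch as well
git ls-remote --heads origin            # only refs/heads/main is left on the remote
```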

Locate Bugs

As your team develops the project, many new features are introduced, and so are bugs. Remember, no program is perfect. One day you may be surprised to find that a feature stops working or behaves strangely. Take it easy, because that’s normal life for software developers. What we need is an efficient way to pinpoint where things went wrong.

However, when we check git log, there are too many commits in the history because the project has been under development for a while. Commit messages can also be obscure, making it hard to tell what a commit actually does. At this point, the last thing we want to do is travel back to each suspicious commit one by one and check whether the bug exists. So the question is: is there an easy way to narrow down the range of commits that introduced the bug? Yes! Git has a powerful command for this: git bisect.

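How `git bisect` works

git bisect begins by asking you to specify a good commit, where the bug doesn’t exist, and a bad commit, where it does (usually the latest commit, where you discovered the bug). Git then checks out a commit halfway between these two commits and asks you to test whether the bug exists. Based on your answer, Git keeps only the half of the range that must contain the first bad commit, checks out another commit halfway through that half, and asks for your feedback again. This is called binary search. Because every answer halves the range, even 1,000 commits can be narrowed down in about 10 steps (2^10 = 1024). Now that you know how it works, you should understand why we recommend keeping a clean commit history: bisect points you to a single commit, which is only helpful if that commit is small, focused, and testable on its own.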

How to use `git bisect`

The steps to use git bisect are easy to follow.

First, we need to copy the hash of a good commit where the bug doesn’t exist (git log helps here).
Second, we tell Git to start a bisect session: git bisect start.
Third, we mark the current commit as bad: git bisect bad.
Then, we tell Git which commit is good, using the hash we copied in step 1: git bisect good <commit-hash>.
Next, Git will check out a commit halfway between the good and bad commit and prompt you to test whether the bug exists in that commit.
After testing, we should mark this commit as either good or bad: git bisect good or git bisect bad.
Finally, repeat the previous two steps (testing and marking) until Git reports the first bad commit. To end the bisect session and return to where you started, use git bisect reset.
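The steps above can be tried on a toy history. In this sketch the “bug” is simply status.txt containing buggy instead of ok, introduced in commit 5 of 8; the file names and commit identity are hypothetical. Instead of testing and marking each commit by hand, it uses git bisect run, which runs a command at every step and treats exit code 0 as good and most non-zero codes as bad. In a real project you would test by hand or pass your own test script.

```shell
# Throwaway repository with a hypothetical identity for committing
cd "$(mktemp -d)" && git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Build 8 commits; the "bug" (status.txt saying "buggy") appears in commit 5
for i in 1 2 3 4 5 6 7 8; do
  if [ "$i" -ge 5 ]; then echo "buggy" > status.txt; else echo "ok" > status.txt; fi
  echo "$i" > version.txt
  git add status.txt version.txt
  git commit -q -m "Commit $i"
done

good="$(git rev-list --max-parents=0 HEAD)" # hash of a known good commit (the first one)
git bisect start                            # start the session
git bisect bad                              # the current commit (Commit 8) is bad
git bisect good "$good"                     # mark the good commit

# Testing and marking, automated: grep -q exits 0 (good) if it finds "ok", 1 (bad) otherwise
git bisect run grep -q ok status.txt

first_bad="$(git rev-parse refs/bisect/bad)" # the first bad commit Git found
git log -1 --format=%s "$first_bad"          # prints "Commit 5"
git bisect reset                             # return to where we started
```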

With a clean commit history, this command makes debugging efficient, especially in a large codebase with an extensive commit history.

Moving On

This chapter is supplementary. With git bisect to track down bugs, git restore to undo changes, .gitignore to keep your project clean, and more, you’ve gained valuable tools for maintaining and debugging your code efficiently. These commands streamline your workflow by helping you manage changes and focus on what truly matters in your repository.

As we get closer to the end of the Git Manual, we’ve covered most of the Git knowledge needed for both individual and team projects. But team projects often come with their own challenges, especially when multiple people work on the same code. So far, none of the examples in previous pages has run into one of the most common hurdles: conflicts. That’s why the next page tackles conflict handling. You’ll learn how to identify, resolve, and prevent conflicts to keep your workflow smooth and hassle-free. Ready? Let’s dive in!