---
slug: garbage_collect
title: Practice exam B
description: |
  Garbage everywhere…
---

# Garbage Collection

:::caution Exam environment

* During the exam you will be provided with a bare-bones _exam session_ on the
  _faculty computers_.
* In the browser you are only allowed to have the following tabs open:
  * [C documentation](https://en.cppreference.com)
  * page containing the assignment
* You **are not** allowed to use your own source code, e.g. prepared beforehand
  or from the seminars.
* You have **5 minutes** to read through the assignment and ask any follow-up
  questions should anything be unclear.
* You have **60 minutes** to work on the assignment; afterwards your work will
  be discussed with your seminar tutor.

:::

You have gotten into trouble during a regular upgrade of your Arch Linux[^1]
installation… You've been carelessly running the upgrades for months and forgot
about clearing up the caches.

Your task is to write a program `garbage_collect` that will evaluate the shell
history provided as a file and will try to find files or directories that are
suspiciously big and decide which of them should be deleted to free some space.

## Format of the shell history

You are provided with one file consisting of the captured buffer of the terminal. You
can see only two commands being used:

1. `cd ‹somewhere›` that changes the current working directory.

   At the beginning you start in the root of the filesystem (i.e. `/`).

   You are **guaranteed** that `‹somewhere›` is:
   * `.` that is the current working directory (i.e. does nothing),
   * `..` that moves you up one level (in case you are in `/`, does nothing), or
   * is a valid directory in the current working directory.

   :::caution

   There are no guarantees or restrictions on the names of the files or
   directories!

   :::

1. `ls` that will list files in the current working directory and their
   respective sizes. If there is a directory in the current working directory,
   it has `dir` instead of the size.

```
$ ls
dir a
14848514 b.txt
8504156 c.dat
dir d
$ cd a
$ cd .
$ cd .
$ cd .
$ ls
dir e
29116 f
2557 g
62596 h.lst
$ cd e
$ ls
584 i
$ cd ..
$ cd ..
$ cd d
$ ls
4060174 j
8033020 d.log
5626152 d.ext
7214296 k
```

For this input, you will get the following file system:
```
- / (dir, size=48381165)
  - a (dir, size=94853)
    - e (dir, size=584)
      - i (file, size=584)
    - f (file, size=29116)
    - g (file, size=2557)
    - h.lst (file, size=62596)
  - b.txt (file, size=14848514)
  - c.dat (file, size=8504156)
  - d (dir, size=24933642)
    - j (file, size=4060174)
    - d.log (file, size=8033020)
    - d.ext (file, size=5626152)
    - k (file, size=7214296)
```
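
The line formats described above can be told apart by their prefixes. Below is a
minimal sketch of one way to classify a single line, assuming the trailing
newline has already been stripped. The names `line_kind`, `parsed_line` and
`parse_line` are hypothetical; the assignment does not prescribe them.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names, chosen only for this illustration. */
enum line_kind {
    LINE_CD,         /* "$ cd <somewhere>" */
    LINE_LS,         /* "$ ls" */
    LINE_DIR_ENTRY,  /* "dir <name>" */
    LINE_FILE_ENTRY, /* "<size> <name>" */
    LINE_INVALID
};

struct parsed_line {
    enum line_kind kind;
    unsigned long long size; /* meaningful only for LINE_FILE_ENTRY */
    const char *name;        /* points into the input line; NULL if unused */
};

/* Assumes `line` has no trailing newline. */
struct parsed_line parse_line(const char *line)
{
    assert(line != NULL);
    struct parsed_line result = { LINE_INVALID, 0, NULL };

    if (strcmp(line, "$ ls") == 0) {
        result.kind = LINE_LS;
    } else if (strncmp(line, "$ cd ", 5) == 0 && line[5] != '\0') {
        result.kind = LINE_CD;
        result.name = line + 5;
    } else if (strncmp(line, "dir ", 4) == 0 && line[4] != '\0') {
        result.kind = LINE_DIR_ENTRY;
        result.name = line + 4;
    } else if (isdigit((unsigned char) line[0])) {
        /* The size comes first, so names may contain spaces. */
        errno = 0;
        char *end = NULL;
        unsigned long long size = strtoull(line, &end, 10);
        if (errno != ERANGE && *end == ' ' && end[1] != '\0') {
            result.kind = LINE_FILE_ENTRY;
            result.size = size;
            result.name = end + 1;
        }
    }
    return result;
}
```

Since `name` only points into the original buffer, it has to be copied before
the buffer is reused for the next line.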

## Format of the output

Your program should support 2 switches:

* `-gt ‹min_size›` that will print out suspiciously big files.
* `-f ‹total_size› ‹min_unused›` that will print out a file or directory to be deleted.

### `-gt ‹min_size›`

With this switch you are provided with one additional argument:

* `min_size` that is the lower bound (inclusive) for size of any file or
  directory that is supposed to be listed.

When your program is run with this switch, it is supposed to print out all
files **and** directories whose size is at least the provided `min_size`.
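
As a minimal sketch of this switch, the argument can be converted with
`strtoull` and each size compared against the inclusive lower bound. The helper
names `parse_size` and `is_suspicious` are made up for this illustration, and
the assignment does not state how big the sizes can get, so
`unsigned long long` is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: converts a command-line argument to a size.
 * Returns false for anything that is not a plain non-negative decimal
 * number, or that does not fit into unsigned long long. */
bool parse_size(const char *text, unsigned long long *out)
{
    assert(text != NULL && out != NULL);

    /* strtoull would silently accept leading whitespace and a sign. */
    if (!isdigit((unsigned char) text[0])) {
        return false;
    }

    errno = 0;
    char *end = NULL;
    unsigned long long value = strtoull(text, &end, 10);
    if (errno == ERANGE || *end != '\0') {
        return false;
    }

    *out = value;
    return true;
}

/* Hypothetical helper: the lower bound is inclusive. */
bool is_suspicious(unsigned long long size, unsigned long long min_size)
{
    return size >= min_size;
}
```

The same check applies to files and directories alike, once the directory
sizes are known.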

### `-f ‹total_size› ‹min_unused›`

With this switch you are provided with two additional arguments:

* `total_size` that is the total size of the filesystem[^2].
* `min_unused` that is the minimum amount of free space required for an upgrade.

Your program should find **exactly one** file or directory that is as small as
possible, but still big enough to free enough space for the upgrade to proceed.

In other words, if that file or directory is deleted, the following should hold
(where `used` is the space still occupied after the deletion):
$$
\mathtt{total\_size} - \mathtt{used} \geq \mathtt{min\_unused}
$$
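
The condition above can be turned around into "how much must be freed". For
the example input, `70000000 - 48381165 = 21618835` is free before deleting
anything, so at least `30000000 - 21618835 = 8381165` more has to be freed.
Below is a minimal sketch of that computation and of picking the smallest
sufficient candidate. The function names are hypothetical, and the sizes of all
files and directories are assumed to be already collected in a plain array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: how much space must be freed so that
 * total_size - used >= min_unused holds afterwards.
 * Assumes used <= total_size. */
unsigned long long required_to_free(unsigned long long total_size,
                                    unsigned long long used,
                                    unsigned long long min_unused)
{
    assert(used <= total_size);
    unsigned long long free_now = total_size - used;
    return free_now >= min_unused ? 0 : min_unused - free_now;
}

/* Hypothetical helper: stores the index of the smallest size that is
 * still >= required into *picked. Returns false if nothing is big enough. */
bool pick_candidate(const unsigned long long *sizes, size_t count,
                    unsigned long long required, size_t *picked)
{
    assert(picked != NULL);
    assert(sizes != NULL || count == 0);

    bool found = false;
    for (size_t i = 0; i < count; i++) {
        if (sizes[i] < required) {
            continue; /* deleting this one would not free enough */
        }
        if (!found || sizes[i] < sizes[*picked]) {
            *picked = i;
            found = true;
        }
    }
    return found;
}
```

Note that when enough space is already free, `required_to_free` returns `0`
and this sketch simply picks the smallest entry.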

## Example usage

You can have a look at the example usage of your program. We can run your
program from the shell like this:

    $ ./garbage_collect shell_history.txt -gt 10000000
    24933642 /d
    14848514 /b.txt
    48381165 /

    $ ./garbage_collect shell_history.txt -f 70000000 30000000
    8504156 /c.dat

## Requirements and notes

* Define **structures** (and **enumerations**, if applicable) for the parsed
  information from the files.
* For keeping the “records”, use some **dynamic** data structure.
  * Don't forget to consider the pros and cons of using _specific_ data
    structures before you start implementing.
* You **are not required** to produce 1:1 output to the provided examples; they
  are just a hint so that you don't waste time tinkering with the user experience.
* If any of the operations on the input files should fail,
  **you are expected to** handle the situation _accordingly_.
* Failures of any other common functions (e.g. functions used for memory
  management) should be handled in **the same way** as they were in the
  homeworks and seminars.
* Your program **must free** all the resources before exiting.
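
To illustrate the requirements on structures and dynamic data structures, here
is one possible sketch: a tree of nodes where each directory keeps a growable
array of children. All names (`node_kind`, `node`, `node_create`, ...) are
hypothetical and this is only one of several reasonable designs; it also
assumes that no directory is listed twice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

enum node_kind { NODE_FILE, NODE_DIR };

struct node {
    enum node_kind kind;
    char *name;              /* owned copy of the name */
    unsigned long long size; /* for directories set by node_compute_size */
    struct node *parent;     /* NULL for the root; handy for `cd ..` */
    struct node **children;  /* growable array, used only by directories */
    size_t child_count;
    size_t child_capacity;
};

/* Returns NULL when the allocation fails. */
struct node *node_create(enum node_kind kind, const char *name,
                         unsigned long long size)
{
    assert(name != NULL);
    struct node *node = calloc(1, sizeof(*node));
    if (node == NULL) {
        return NULL;
    }
    node->name = malloc(strlen(name) + 1);
    if (node->name == NULL) {
        free(node);
        return NULL;
    }
    strcpy(node->name, name);
    node->kind = kind;
    node->size = size;
    return node;
}

/* Appends `child` to `dir`, doubling the array when it is full. */
bool node_add_child(struct node *dir, struct node *child)
{
    assert(dir != NULL && dir->kind == NODE_DIR && child != NULL);
    if (dir->child_count == dir->child_capacity) {
        size_t capacity = dir->child_capacity == 0 ? 2 : dir->child_capacity * 2;
        struct node **grown = realloc(dir->children, capacity * sizeof(*grown));
        if (grown == NULL) {
            return false; /* the old array is still valid */
        }
        dir->children = grown;
        dir->child_capacity = capacity;
    }
    child->parent = dir;
    dir->children[dir->child_count++] = child;
    return true;
}

/* The size of a directory is the sum of the sizes of its children. */
unsigned long long node_compute_size(struct node *node)
{
    assert(node != NULL);
    if (node->kind == NODE_DIR) {
        node->size = 0;
        for (size_t i = 0; i < node->child_count; i++) {
            node->size += node_compute_size(node->children[i]);
        }
    }
    return node->size;
}

/* Frees the whole subtree, including the node itself. */
void node_destroy(struct node *node)
{
    if (node == NULL) {
        return;
    }
    for (size_t i = 0; i < node->child_count; i++) {
        node_destroy(node->children[i]);
    }
    free(node->children);
    free(node->name);
    free(node);
}
```

Calling `node_destroy` on the root releases everything at once, which helps
with the requirement to free all resources before exiting.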

[^1]: Also applies to Fedora, but… we use arch btw :wink:
[^2]: duh!