Indexing scripts rework #348

Open: tleb wants to merge 21 commits into base: master

@tleb tleb commented Nov 8, 2024

Hi!

⟩ git diff --stat=80 master..indexing-scripts-rework
 README.adoc                  |  35 ++--------
 docker/Dockerfile            |   4 +-
 utils/common.sh              |  24 -------
 utils/index                  | 157 +++++++++++++++++++++++++++++++++++++++++++
 utils/index-all-repositories |  96 --------------------------
 utils/index-repository       |  29 --------
 utils/pack-repositories      |  48 -------------
 utils/update-elixir-data     |  38 -----------
 8 files changed, 165 insertions(+), 266 deletions(-)

This is a big rework of the wrapper scripts around indexing. Note that all the existing scripts are removed and replaced by a single one, utils/index.

  • It takes the data path as its first argument, to avoid hardcoded paths. (It also avoids a hardcoded update.py path.)
  • It started from utils/index-repository, so it can be called in the same manner (e.g. ./utils/index /srv/elixir-data musl https://git.musl-libc.org/git/musl). This will init the project (if it does not already exist), add the remote (if it does not already exist), then fetch and index.
  • It gained the ability to be called like this: ./utils/index /srv/elixir-data musl. It will notice that we want musl and automatically add the right remote URL from its list of known projects. The match is done on the project name.
  • It can do ./utils/index /srv/elixir-data --all. This will add the remotes of all known projects (if they do not already exist), then fetch and index them. This replaces index-all-repositories and update-elixir-data.
  • It does something similar to pack-repositories, but automatically. The previous setup required manual intervention. We remove the section about pack-repositories from the README.
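
To make the three call shapes above concrete, here is a rough dry-run sketch of how the arguments could be dispatched. It is not the code from this PR: the function name `dispatch` and the printed messages are made up for illustration, and it only prints what would happen instead of calling Git.

```shell
# Hypothetical sketch of the three call shapes; not the actual utils/index code.
# Prints which action would be taken instead of running Git.
dispatch() {
    if [ "$#" -lt 2 ]; then
        echo "usage: index <data-path> (--all | <project> [<remote-url>...])" >&2
        return 1
    fi
    data=$1
    shift
    if [ "$1" = "--all" ]; then
        echo "all: index every known project under $data"
        return 0
    fi
    project=$1
    shift
    if [ "$#" -eq 0 ]; then
        # No URL given: fall back to the script's own list of remotes.
        echo "auto: index $project under $data using its known remotes"
    else
        echo "explicit: index $project under $data from $*"
    fi
}
```

For example, `dispatch /srv/elixir-data musl` takes the "auto" branch, while passing the musl URL as a third argument takes the "explicit" one.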

Opinions @fstachura? Commit messages contain much more information.

Closes #342

COPY sources in two steps: (1) copy requirements.txt then do `pip
install` stuff then (2) copy all remaining sources.

This makes rebuilding the Docker image after editing sources much
faster: from 22.3s to 7.3s on my machine.

Signed-off-by: Théo Lebrun <[email protected]>
Previous sequence:
 - git clone ...               # first fetch
 - git remote add remote0 ...
 - git fetch remote0           # second fetch
 - git remote add remote1 ...
 - git fetch remote1           # third fetch

Now:
 - git init
 - git remote add remote0 ...
 - git remote add remote1 ...
 - git remote add remote2 ...
 - git fetch --all -j4         # all fetches at the same time

Signed-off-by: Théo Lebrun <[email protected]>
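
As an illustration of the "Now" sequence in the commit message above, here is a dry-run sketch that prints the Git commands for a repository with any number of remotes. The function name `fetch_plan` is hypothetical, the URLs are whatever the caller passes in, and nothing is executed.

```shell
# Dry-run sketch: prints the Git commands instead of running them.
# The remote names (remote0, remote1, ...) follow the commit message;
# the function name and the URLs passed in are placeholders.
fetch_plan() {
    repo=$1
    shift
    echo "git init $repo"
    i=0
    for url in "$@"; do
        echo "git -C $repo remote add remote$i $url"
        i=$((i + 1))
    done
    # A single invocation fetches every remote, up to four in parallel.
    echo "git -C $repo fetch --all -j4"
}
```

However many remotes are listed, the plan ends with exactly one fetch line, which is the point of the change.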
This is pretty useful as update-elixir-data gets called often to check
for new updates. Most often there are none, so checking all remotes at
the same time saves time. This only applies to the kernel, which is the
only project using multiple (three) remotes.

Signed-off-by: Théo Lebrun <[email protected]>
Simplify the script. We never `cd` into the directory, we instead use
`git -C`. Avoid repeating it by creating a $git variable.

Signed-off-by: Théo Lebrun <[email protected]>
Make utils/index-repository idempotent, meaning we can call it multiple
times on the same repo and same remotes without issues.

Also allow adding new remotes to an existing repo.

Signed-off-by: Théo Lebrun <[email protected]>
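
One possible way to get the idempotent behavior described in the commit message above is to check the configured remote URLs before adding a new one. This is only a sketch: the function name `add_remote_once` and the way the next `remoteN` name is picked are assumptions, not what the script necessarily does.

```shell
# Sketch only: add a remote unless one with that URL is already configured.
# The function name and the remoteN naming scheme are assumptions.
add_remote_once() {
    repo=$1
    url=$2
    # `git remote -v` prints "<name> <url> (fetch|push)"; compare the URL field.
    if git -C "$repo" remote -v | awk '{print $2}' | grep -qxF "$url"; then
        return 0
    fi
    # Name the new remote after the number of remotes already present.
    n=$(git -C "$repo" remote | wc -l | tr -d ' ')
    git -C "$repo" remote add "remote$n" "$url"
}
```

Calling it twice with the same URL leaves a single remote, and calling it with a new URL on an existing repo adds one more.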
The $ELIXIR_THREADS fallback to nproc is straightforward code, much more
so than the incantation used to find the Elixir install path.
Remove the incantation and replace it with simple code:

    if test -z "$ELIXIR_THREADS"; then
        ELIXIR_THREADS="$(nproc)"
    fi

Signed-off-by: Théo Lebrun <[email protected]>
Allow calling like:

    ./utils/index musl

That will do the same thing as before (fetch+index).
It works only if a previous call was made to add remotes.

Signed-off-by: Théo Lebrun <[email protected]>
Previously:

    LXR_PROJ_DIR=/srv/elixir-data ./utils/update-elixir-data

Now:

    ./utils/index /srv/elixir-data --all

The impact is slightly different: it also has the side-effect of
creating all known projects (Linux, U-Boot, etc.) if they didn't exist.
We have asked around and we are not aware of any other Elixir instance.
To keep the previous behavior, if people don't want to index all
supported projects:

    x=/srv/elixir-data
    find "$x" -mindepth 1 -maxdepth 1 -printf "%f\n" | \
        xargs -L1 -r ./utils/index "$x"

Signed-off-by: Théo Lebrun <[email protected]>
Avoid the following Git warning:

hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint: of your new repositories, which will suppress this warning, call:
hint:
hint:   git config --global init.defaultBranch <name>
hint:
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint:
hint:   git branch -m <name>

Signed-off-by: Théo Lebrun <[email protected]>
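
The commit message above does not show how the warning is avoided. One way that avoids touching any global configuration would be to pass `init.defaultBranch` for that single call. The sketch below assumes a bare repository and the branch name `master`; both are assumptions, as is the function name.

```shell
# Sketch: set init.defaultBranch for this one call only, so the hint is
# not printed and no global configuration is written.
# The function name, --bare, and the branch name "master" are assumptions.
init_repo_quietly() {
    git -c init.defaultBranch=master init --bare "$1"
}
```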
utils/pack-repositories did the following on repos that have a gc.log
file (created when GC fails):

    git prune
    git gc --aggressive
    git prune
    git gc --aggressive

Here we:

 - Delete utils/pack-repositories; we don't want that detection to be
   done manually. Instead, we integrate the gc.log detection into
   utils/index that should be called often.

 - Create a hidden flag ($ELIXIR_GC) to allow manual trigger.

 - Replace the above sequence with a simpler `git gc --aggressive`.
   Let's trust Git.

 - Do a `git gc --auto` in the default case. This call is automatically
   done by porcelain commands, but we don't run any, so let's give Git an
   opportunity to clean up from time to time (heuristic based).

 - Replace the gc.log detection from:
      find . -name gc.log
   To:
      test -e $data/$project/repo/gc.log
   It should be more reliable. With the first approach, a project that
   contains a file named gc.log risks triggering the detection on each run.

Signed-off-by: Théo Lebrun <[email protected]>
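
The GC policy from the commit message above boils down to a small decision, sketched here as a function that prints the flag to pass to `git gc`. The function name `gc_flag` is hypothetical; the `$ELIXIR_GC` variable and the gc.log path come from the commit message.

```shell
# Sketch of the decision described above; the function name is hypothetical.
# Prints the `git gc` flag to use for one project.
gc_flag() {
    data=$1
    project=$2
    # Aggressive GC when forced via $ELIXIR_GC or when a previous GC failed.
    if [ -n "${ELIXIR_GC:-}" ] || [ -e "$data/$project/repo/gc.log" ]; then
        echo "--aggressive"
    else
        echo "--auto"
    fi
}
# Possible usage: git -C "$data/$project/repo" gc "$(gc_flag "$data" "$project")"
```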
New script utils/index does an automatic call to `git gc --auto` and if
it detects a gc.log file, it runs `git gc --aggressive`.

There shouldn't be any reason for people to have to think about that
aspect. Remove that info from the README and make it lighter weight.

Signed-off-by: Théo Lebrun <[email protected]>
Previously, to start an indexing from scratch:

    ./utils/index /srv/elixir-data musl https://git.musl-libc.org/git/musl

This is annoying as the script already has the remote URLs for all known
projects. Now, a call without remote will automatically add the remote
URLs matching the project name:

    ./utils/index /srv/elixir-data musl

This copies the behavior that was previously only implemented for --all.

Signed-off-by: Théo Lebrun <[email protected]>
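
A lookup by project name, as described in the commit message above, could be as simple as a `case` statement. In this sketch only the musl URL comes from the text; the `demo` project and its example.org URLs are placeholders standing in for a multi-remote project, and the function name is made up.

```shell
# Sketch: map a project name to its remote URL(s), one per line.
# Only the musl URL comes from the text above; "demo" and its
# example.org URLs are placeholders for a multi-remote project.
known_remotes() {
    case "$1" in
        musl)
            echo "https://git.musl-libc.org/git/musl" ;;
        demo)
            echo "https://example.org/demo.git"
            echo "https://example.org/demo-stable.git" ;;
        *)
            echo "unknown project: $1" >&2
            return 1 ;;
    esac
}
```

A call without an explicit remote would then iterate over the output of `known_remotes "$project"` and add each URL.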
Stop writing a global file when initializing projects. This can cause
permission issues. We instead pass the option manually for each Git
process call using:

    git -c safe.directory=...

Signed-off-by: Théo Lebrun <[email protected]>
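
For the per-call option in the commit message above, a small wrapper keeps the invocation in one place. This is a sketch under assumptions: the wrapper name `repo_git` is made up, and using the repository path as the `safe.directory` value is a guess, since the commit message elides it.

```shell
# Sketch: wrap every Git call so safe.directory is set per process
# instead of being written to a global config file.
# The wrapper name and the value given to safe.directory are assumptions.
repo_git() {
    repo=$1
    shift
    git -c safe.directory="$repo" -C "$repo" "$@"
}
```

Git versions differ in whether they honor `safe.directory` passed via `-c`, so this is worth checking against the installed version.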
Instead, start from $0 and move back up two times. So, something like:

    ./elixir/utils/index
    ./elixir/utils
    ./elixir
    ./elixir/update.py

Signed-off-by: Théo Lebrun <[email protected]>
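
The path walk in the commit message above maps directly onto two `dirname` calls. A minimal sketch, with a hypothetical function name; the real script may also resolve symlinks or relative paths, which this does not:

```shell
# Sketch: derive the Elixir checkout from the script path by going up
# two directory levels. The function name is hypothetical.
elixir_root() {
    dirname "$(dirname "$1")"
}
# Inside the script this could be: update="$(elixir_root "$0")/update.py"
```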
Development

Successfully merging this pull request may close these issues.

utils/index-* should support starting from existing Git repositories