Package Management - Formats and Registries

Since my mind has been on it, I wanted to work out some of the ideas I had for formats in my packaging system. In this case, I'm going to focus on a single one, NuGet, because I have a fair amount of experience with it and it has some of the complexities that are throwing me.

Series

This is going to be a series of posts, but I have no idea of how fast I'll be writing them out. I want to work out my ideas, maybe have a few conversations, and then start to move to more technical concepts.

Configuration Files

All of the configuration for the system will be in a series of JSON5 files (or JSON, or whatever formats are officially supported) that are merged together, with any conflict producing an error message and stopping the system.

Assuming $GITDIR is the top-level directory for a Git repository, then the configuration would be in $GITDIR/.config/bakfu. All the files will be gathered together, but a .gitignore could ignore *.user.*, which means authentication information could be stored locally and layered on top of a common, committed configuration.

The files all share the same schema, and the "$schema" property is a required component because it also identifies the version of the file.

{
    "$schema": "...",
}

Any more details on how we look for configuration files will have to wait for another post.

What is a Format?

Right now, a “format” is a specific kind of package, such as a NuGet package or an NPM one. Inspired by SGML catalogs and how I like to see things in Git repositories, a format looks roughly like this JSON5.

{
    // In this file, the various components don't have to be URL encoded.
    "formats": {
        "nuget": {
            "defaults": {
                "authority": "nuget",
            },
            "authorities": {
                "nuget": {},
            },
        },
    },
}

Since we merge files, that means I could create a project-specific authority for packages that aren't (and probably never will be) on the official NuGet server. For example, my personal Forgejo instance at https://src.mfgames.com/mfgames-cil could be one.

{
    // In this file, the various components don't have to be URL encoded.
    "formats": {
        "nuget": {
            "authorities": {
                "src.mfgames.com/mfgames-cil": {},
            },
        },
    },
}
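A minimal sketch of how the merge might behave, written in Python. The names `merge` and `MergeConflict` are hypothetical, and the rules are assumptions on my part: objects merge key by key, identical values are allowed to repeat (so every file can carry the same "$schema"), and any other disagreement is a conflict.

```python
from typing import Any


class MergeConflict(Exception):
    """Raised when two files set the same key to different values."""


def merge(base: dict[str, Any], extra: dict[str, Any], path: str = "") -> dict[str, Any]:
    """Return a deep merge of two configuration objects (assumed rules)."""
    merged = dict(base)
    for key, value in extra.items():
        here = f"{path}.{key}" if path else key
        if key not in merged:
            merged[key] = value
        elif isinstance(merged[key], dict) and isinstance(value, dict):
            # Both sides are objects, so merge them key by key.
            merged[key] = merge(merged[key], value, here)
        elif merged[key] != value:
            # Anything else that disagrees stops the system.
            raise MergeConflict(f"conflicting values for {here}")
    return merged


official = {
    "formats": {
        "nuget": {
            "defaults": {"authority": "nuget"},
            "authorities": {"nuget": {}},
        }
    }
}
project = {
    "formats": {"nuget": {"authorities": {"src.mfgames.com/mfgames-cil": {}}}}
}
config = merge(official, project)
print(sorted(config["formats"]["nuget"]["authorities"]))
```

With the two files above, the merged "authorities" object ends up with both the "nuget" and the "src.mfgames.com/mfgames-cil" entries.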

Authorities

The authority itself is the complex part of the file because it needs to handle how to search for packages, how to download them, what authorization is needed, and how verification is done.

// merged formats.nuget.authorities:
"nuget": {
    // "enabled": true, // Implied so it can be disabled
    "registries": {
        "nuget.org": {
            "protocol": "nuget-v3",
            "url": "https://api.nuget.org/v3/index.json",
        },
    },
}

Another file could merge additional registries in, much like you can have a proxy feed in DevOps.

// merged formats.nuget.authorities:
"nuget": {
    "registries": {
        "nuget.org": {
            "protocol": "nuget-v3",
            "url": "https://api.nuget.org/v3/index.json",
        },
        "example": {
            "protocol": "nuget-v3",
            "url": "https://example.org/proxied-feed/v3/index.json",
        },
    },
}

Controlling Order

In the NuGet.config file, there is also the ability to clear out the list of registries and use only a single set of identified registries. In this case, it would be a combination of disabling the known ones, changing the search for other files (the later post), and using a set of ordering controls.

// merged formats.nuget.authorities:
"nuget": {
    "registries": {
        "nuget.org": {
            "protocol": "nuget-v3",
            "url": "https://api.nuget.org/v3/index.json",
            "enabled": false,
            // The "before" on "example" below implies "search": { "after": ["example"] } here.
        },
        "example": {
            "protocol": "nuget-v3",
            "url": "https://example.org/proxied-feed/v3/index.json",
            "search": {
                "before": ["nuget.org"],
            }
        },
    },
}
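One way the ordering controls could be resolved is a topological sort. The sketch below is only an illustration: `search_order` is a hypothetical name, and I am assuming that disabled registries are dropped before ordering and that contradictory constraints should fail loudly.

```python
from graphlib import TopologicalSorter


def search_order(registries: dict[str, dict]) -> list[str]:
    """Order enabled registries so that "before"/"after" constraints hold."""
    enabled = {
        name: reg for name, reg in registries.items() if reg.get("enabled", True)
    }
    # Map each registry to the set of registries that must be searched first.
    predecessors: dict[str, set[str]] = {name: set() for name in enabled}
    for name, reg in enabled.items():
        search = reg.get("search", {})
        for later in search.get("before", []):
            if later in enabled:
                predecessors[later].add(name)
        for earlier in search.get("after", []):
            if earlier in enabled:
                predecessors[name].add(earlier)
    # static_order() raises graphlib.CycleError on contradictory constraints.
    return list(TopologicalSorter(predecessors).static_order())
```

How ties between unconstrained registries are broken is left open here; that would be the job of a default ordering rule.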

Additional Packages

In NuGet.config, it is possible to map a set of packages to a given source, such as requiring that all MfGames* packages come from a specific server. In those cases, that should be treated as a separate authority with its own set of registries (which may duplicate another registry if it is also a proxy feed).

In the example below, contoso.com would be treated as a separate authority from the nuget default.

<?xml version="1.0" encoding="utf-8"?>
<!-- NuGet.config -->
<configuration>
  <packageSourceMapping>
    <packageSource key="nuget.org">
      <package pattern="*" />
    </packageSource>
    <packageSource key="contoso.com">
      <package pattern="Contoso.*" />
      <package pattern="NuGet.Common" />
    </packageSource>
  </packageSourceMapping>
</configuration>
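To make the mapping concrete, here is a rough sketch of picking an authority for a package ID. The function name `authority_for` and the dictionary shape are hypothetical. It assumes patterns are either an exact ID or a prefix ending in `*`, that IDs compare case-insensitively, and that the most specific pattern wins, which is my understanding of how NuGet's package source mapping resolves overlaps.

```python
from typing import Optional


def authority_for(package_id: str, mappings: dict[str, list[str]]) -> Optional[str]:
    """Return the authority whose pattern most specifically matches the ID."""
    wanted = package_id.lower()
    best_name: Optional[str] = None
    best_score = -1
    for name, patterns in mappings.items():
        for pattern in patterns:
            candidate = pattern.lower()
            if candidate.endswith("*"):
                prefix = candidate[:-1]
                matched = wanted.startswith(prefix)
                score = len(prefix)
            else:
                matched = wanted == candidate
                # An exact match outranks every possible prefix match.
                score = len(candidate) + 1
            if matched and score > best_score:
                best_name, best_score = name, score
    return best_name


mappings = {
    "nuget": ["*"],
    "contoso.com": ["Contoso.*", "NuGet.Common"],
}
print(authority_for("Contoso.Utilities", mappings))
print(authority_for("Newtonsoft.Json", mappings))
```

With the mapping above, Contoso.* packages and NuGet.Common resolve to the contoso.com authority, and everything else falls through to the default.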

Protocol

The protocol determines how the registry is accessed. I could see a number of possibilities:

  • NuGet V3 protocol, obviously NuGet-centric
  • NPM access
  • Directory location
  • gRPC proxy server

Since this library needs to be implemented across a number of platforms and libraries, I would expect that unknown protocols would be filtered out (maybe with a warning) and then the ones that can be accessed are used. If there are no valid ones, then the system should blow up.
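As a sketch of that filtering step, assuming a hypothetical `usable_registries` helper and a made-up set of protocols that one particular implementation happens to support:

```python
import warnings

# Hypothetical: the protocols this particular implementation understands.
SUPPORTED_PROTOCOLS = frozenset({"nuget-v3", "file-v1"})


class NoUsableRegistry(Exception):
    """Raised when every configured registry uses an unknown protocol."""


def usable_registries(
    registries: dict[str, dict],
    supported: frozenset[str] = SUPPORTED_PROTOCOLS,
) -> dict[str, dict]:
    """Keep registries with a known protocol and warn about the rest."""
    usable = {}
    for name, registry in registries.items():
        protocol = registry.get("protocol")
        if protocol in supported:
            usable[name] = registry
        else:
            warnings.warn(f"skipping registry {name!r}: unknown protocol {protocol!r}")
    if not usable:
        raise NoUsableRegistry("no registry uses a supported protocol")
    return usable
```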

The protocol also determines what additional settings might be required, such as authentication, file system layout, or the like. Those would obviously be specific to that protocol, so they are thrown into a generic “settings” object.

"local": {
    "protocol": "file-v1",
    "url": "file:///${env:HOME}/src/other/project",
    "settings": {
        // The layout for "bob" would start with "b/bob", followed by the version.
        "package": "${PACKAGE_NAME:0-0}/${PACKAGE_NAME}/${PACKAGE_VERSION}",
    },
},
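The placeholder syntax is not pinned down yet, so the following is only a guess at how the "package" template might expand. It assumes `${NAME:first-last}` means an inclusive, zero-based character range, and the `expand` function is hypothetical. Placeholders it does not recognize, such as `${env:HOME}`, are left untouched.

```python
import re

# Matches ${NAME} and ${NAME:first-last}; anything else is left alone.
_PLACEHOLDER = re.compile(r"\$\{(\w+)(?::(\d+)-(\d+))?\}")


def expand(template: str, values: dict[str, str]) -> str:
    """Expand placeholders, treating "first-last" as an inclusive range."""

    def replace(match: re.Match) -> str:
        name, first, last = match.groups()
        value = values[name]
        if first is not None:
            value = value[int(first) : int(last) + 1]
        return value

    return _PLACEHOLDER.sub(replace, template)


layout = "${PACKAGE_NAME:0-0}/${PACKAGE_NAME}/${PACKAGE_VERSION}"
path = expand(layout, {"PACKAGE_NAME": "bob", "PACKAGE_VERSION": "1.2.3"})
print(path)  # b/bob/1.2.3
```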

Search Controls

Another aspect is how to handle searching. NuGet will search all the sources at the same time and the first one that responds gets it. However, in some cases, one might want only certain ones to be searched and then stop if the package isn't there (such as a full proxy feed versus a subset proxy feed).

I would see this as controlled at two levels: at the authority and for an individual registry.

// merged formats
"nuget": {
    "authorities": {
        "nuget": {
            "search": {
                "concurrent": true,
                "defaultOrder": "Alphabetical",
            },
            "registries": {
                "nuget.org": {
                    "protocol": "nuget-v3",
                    "url": "https://api.nuget.org/v3/index.json",
                    "search": {
                        "notFound": "Stop",
                        "timeout": {
                            "time": "00:01:00",
                            "action": "Retry",
                            "maximumRetries": 3,
                        },
                    },
                },
                "example": {
                    "protocol": "nuget-v3",
                    "url": "https://example.org/proxied-feed/v3/index.json",
                    "search": {
                        "notFound": "Continue",
                    },
                },
            },
        },
    },
}
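To show how "notFound" might drive a search, here is a deliberately simplified sketch. It walks the registries one at a time and ignores the "concurrent" and "timeout" settings entirely. The `find_package` function and the `lookup` callback are hypothetical, and defaulting to "Continue" when "notFound" is missing is my assumption.

```python
from typing import Callable, Optional

# Hypothetical callback: (registry name, package ID) -> location or None.
Lookup = Callable[[str, str], Optional[str]]


def find_package(
    package_id: str,
    ordered_registries: list[tuple[str, dict]],
    lookup: Lookup,
) -> Optional[str]:
    """Search registries in order, honoring each registry's "notFound"."""
    for name, registry in ordered_registries:
        found = lookup(name, package_id)
        if found is not None:
            return found
        not_found = registry.get("search", {}).get("notFound", "Continue")
        if not_found == "Stop":
            # This registry is authoritative: a miss here ends the search.
            return None
    return None
```

With "example" set to "Continue" and "nuget.org" set to "Stop", a miss on "example" falls through to "nuget.org", while a miss on "nuget.org" ends the search without consulting anything after it.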

Conclusion

Well, those are my thoughts on authorities and how to search them for packages. One thing you might notice is that I don't have offline packages in the above examples. I want to treat offline (cached) files as a first-class concept in the packaging system, and that requires its own discussion. Not to mention, the cache is for all formats, not just one.

I also want to eventually introduce the ability to have services provide opinions on packages. This way, I could set up a service that translates CVE alerts into controls on the packages found, or allow project-specific settings that would hide packages that are incompatible with the current project. I just don't know what to call them.

Computer science cannot solve two things: cache invalidation, how to name things, and off-by-one errors.
