Capabilities

Lately, I've been thinking about the capabilities of server APIs and how clients interact with them. This is driven by a couple of things I'm considering doing in the future, but also by the frustration that certain features in the fediverse are difficult for clients to implement.

Specifically, I'm very fond of Markdown posts and emoji responses. I love Markdown for all things, and I like the low-attention responses of emojis when you don't need a big post to say “I'm sorry” or “that blows”. My family and work buddies use emoji responses fairly heavily, and they are well supported in Slack, Discord, and even Microsoft Teams as of this week.

Mastodon decided not to implement them, which is one reason I like Pleroma and Misskey, though Misskey has historically been too heavy to put on my servers, so one of my accounts uses Pleroma.

However, my favorite Android client, Tusky, handles Markdown fairly well on the Glitch instance I'm on, but doesn't for Pleroma, and it doesn't do emoji responses at all. Husky, a fork of Tusky specifically for Pleroma, supports these. But Husky doesn't really play well with Glitch, which means I struggle to find One Client to Rule Them All™.

Likewise, I want to switch my Pleroma for GoToSocial for various reasons, but it supports Markdown and not emoji responses, mainly because there is no standard for emoji responses. If I want to get emoji responses, I need broader acceptance or patterns to work with.

So, this comes down to capabilities. I wish there was a way for Tusky to query the server to determine whether it can handle emoji responses, so it can turn on that logic and I don't have to have both Tusky and Husky. Simple, but I also want it to be able to know if the server can handle Markdown (base Mastodon can't, Glitch can) without doing various server identification scans.

This means a fedi server should have some ability to say “I can handle this feature” that can be queried. If there was a standard for that, then Tusky or a CLI or any other client could selectively enable or disable the feature as needed.

Fine-Grained

This is talking about a fine-grained approach to APIs. We have many different fedi servers out there, being developed and also in production. Likewise, we have a number of clients that are also growing as the fedi becomes more popular. I think we're going to see features moving from server to server as they become interesting and there is no way every client can make that choice for every server.

Identifying those abilities will make it easier to say “this client supports Markdown for those servers that have it”. It loosens the tie between the servers and the clients by adding a bit of abstraction, not unlike how the Language Server Protocol has made developing language tooling such as refactoring a lot easier across multiple clients by working to an interface.

Approach 1 - Instance Information

The first approach, which is somewhat specific to fedi servers, is to update the instance information (Mastodon, Pleroma doesn't have it implemented) to provide that information to clients. This would require updating Glitch and submitting a patch to Pleroma (and maybe Misskey) once some sort of understanding or agreement could be found on how to identify these features.
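
A minimal sketch of what the client side of this could look like, assuming the instance information gained a list of feature names. The `capabilities` field and the labels `markdown` and `emoji-responses` are hypothetical; no agreed-upon names exist yet.

```python
import json

# Hypothetical field name: nothing in the instance information is
# standardized for this purpose today.
CAPABILITIES_KEY = "capabilities"


def enabled_features(instance_info: dict, client_features: set) -> set:
    """Return the client features that the server says it supports."""
    advertised = instance_info.get(CAPABILITIES_KEY, [])
    if not isinstance(advertised, list):
        # Unknown shape: behave as if nothing was advertised.
        return set()
    return client_features & {c for c in advertised if isinstance(c, str)}


# Illustrative instance information, not copied from a real server.
sample = json.loads('{"title": "example", "capabilities": ["markdown", "polls"]}')
print(sorted(enabled_features(sample, {"markdown", "emoji-responses"})))
```

A server that doesn't publish the field simply yields an empty set, so the client would fall back to the base feature set instead of guessing from the server name.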

This is probably the easiest and I think would really benefit the client ecosystem as a whole, not only to give me the features I want.

Approach 2 - OpenAPI Support

A more generic approach, which is beyond just fedi servers, is to encourage most API servers to also expose their OpenAPI v3.1.0 file in a well-known location that describes all the endpoints they support for that specific instance. This has benefits beyond just describing what they can handle.

However, knowing there is an endpoint that can handle something doesn't mean it works the same way. Pleroma may have the same abilities as Glitch for Markdown, but it may not communicate them in the same way, or it may require different inputs. There needs to be a signal that indicates a specification is in use.

References

OpenAPI has a $ref which can be used to describe an endpoint; this reference is a JSON Reference that allows an endpoint to point to another specification either in the same document or to an external URL.

It is the external URL that interests me. If that points to a fragment of an OpenAPI specification that is documented for a specific feature, that would be a signal that the endpoint implements that specific feature. In other words, if there was an /api/emoji endpoint that is described as:

{
    "$ref": "https://well.known.location/emoji.xml"
}

Then a client could use that information to assume that the endpoint has those abilities and can handle those requests.
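
As a rough sketch, a client could walk the paths of an already-downloaded OpenAPI document and look for a $ref it recognizes. The `FEATURE_SPECS` table and the `emoji-responses` label are assumptions for illustration; the URL is the placeholder from the example above.

```python
# Hypothetical table mapping a feature specification URL to the label
# the client uses internally for that feature.
FEATURE_SPECS = {
    "https://well.known.location/emoji.xml": "emoji-responses",
}


def features_by_path(openapi_doc: dict) -> dict:
    """Map each path whose $ref names a known feature spec to that feature."""
    found = {}
    for path, item in openapi_doc.get("paths", {}).items():
        ref = item.get("$ref") if isinstance(item, dict) else None
        if ref in FEATURE_SPECS:
            found[path] = FEATURE_SPECS[ref]
    return found


doc = {
    "paths": {
        "/api/emoji": {"$ref": "https://well.known.location/emoji.xml"},
        "/api/v1/statuses": {"post": {"operationId": "postStatus"}},
    }
}
print(features_by_path(doc))
```

The client never needs to resolve the reference. Matching the URL is the whole signal.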

Capabilities

As mentioned before, Mastodon doesn't have that ability since a posting is a posting, be it a response or a full text. OpenAPI also allows extension fields prefixed with x- to be added to an endpoint, such as an x-additional-$refs that holds an array of references, each pointing to a specific feature specification.

That way, the Mastodon post status endpoint could add an x-additional-$refs that would allow it to describe the ability to have polls (added in 2.8.0) and schedules (2.7.0) in a clear manner, so clients would know it could handle those correctly without guessing or inferring from server information. Or… emoji.

{
    "paths": {
        "/api/v1/statuses": {
            "post": {
                "operationId": "postStatus",
                "parameters": [],
                "x-additional-$refs": [
                    "https://well.known.location/emoji.xml"
                ]
            }
        }
    }
}
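
On the client side, reading that extension could look something like the sketch below. It assumes the OpenAPI document has already been fetched and parsed into a dict, and that x-additional-$refs is the array of URL strings shown above; the helper names `additional_refs` and `supports` are made up for illustration.

```python
EXTENSION_KEY = "x-additional-$refs"  # the proposed extension, not a standard


def additional_refs(openapi_doc: dict, path: str, method: str) -> list:
    """Return the feature spec URLs an operation lists in the extension."""
    path_item = openapi_doc.get("paths", {}).get(path, {})
    operation = path_item.get(method.lower(), {})
    refs = operation.get(EXTENSION_KEY, [])
    if not isinstance(refs, list):
        return []
    return [ref for ref in refs if isinstance(ref, str)]


def supports(openapi_doc: dict, path: str, method: str, spec_url: str) -> bool:
    """True if the operation claims to implement the given feature spec."""
    return spec_url in additional_refs(openapi_doc, path, method)


doc = {
    "paths": {
        "/api/v1/statuses": {
            "post": {
                "operationId": "postStatus",
                "parameters": [],
                "x-additional-$refs": ["https://well.known.location/emoji.xml"],
            }
        }
    }
}
print(supports(doc, "/api/v1/statuses", "POST", "https://well.known.location/emoji.xml"))
```

A missing path, method, or extension all come back as "not supported", which is the safe default for a client deciding whether to show a feature.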

This approach also appeals to some other projects I have in mind, so I'm curious to see what others think of it.

Thoughts

Obviously, I have a specific goal in mind: Markdown and emoji. But hopefully there is also a benefit for allowing other servers to provide this information and make it easier to create clients and interfaces to their servers.
