Multi-forge web service with unique features

Hello,

@aschrijver you suggested that:

On the contrary FedeProxy has the opportunity to be forever more powerful than native forge-2-forge support, when it sits as an independent service in the middle between forges.

In my mind, all software development activities, without exception, can be supported by a single forge instance, as long as that instance has federation built in; otherwise it is a silo isolated from the other forges. And as long as there is a galaxy of federated forges, the developer can seamlessly use them without even noticing how many there are.

What you are suggesting, if I understand correctly, is that there should exist an additional online service, independent of any forge, and that it would not make sense to implement the features of this service natively in software forges. This is intriguing and I’d be interested to hear if you have a use case in mind.

Cheers

What we currently have is:

  • ForgeFed: A protocol specification that standardizes ActivityPub message formats & interactions for forge2forge git sync.

ForgeFed focuses on a subset of the functionality forges provide (e.g. it leaves out Issue Management). FedeProxy aims to expand forge interoperability further from there.
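For illustration, this is roughly what one of the ActivityPub messages ForgeFed standardizes for forge2forge git sync could look like, written as a Python dict. It is a simplified sketch: the property names, URLs, and the ForgeFed context URL are approximations for the example, not quotes from the spec.

```python
# Rough sketch of a ForgeFed-style push notification between forges.
# Vocabulary and URLs are illustrative, not copied verbatim from the spec.
push_activity = {
    "@context": [
        "https://www.w3.org/ns/activitystreams",
        "https://forgefed.org/ns",  # ForgeFed vocabulary extension (assumed URL)
    ],
    "type": "Push",                                   # ForgeFed activity type
    "actor": "https://forge.example/users/alice",
    "context": "https://forge.example/alice/myrepo",  # the repository concerned
    "object": {
        "type": "OrderedCollection",
        "totalItems": 1,
        "items": [{
            "type": "Commit",
            "id": "https://forge.example/alice/myrepo/commit/abc123",
            "summary": "Fix typo in README",
        }],
    },
    "to": ["https://other-forge.example/bob/myrepo/followers"],
}
```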

So possible deliverables of the FedeProxy project are:

  • FedeProxy protocol: Seamlessly builds on and/or extends what ForgeFed provides. Specifies a standard way of federated communication that can itself be further extended over time.

  • FedeProxy platform: A full-blown ActivityPub-enabled server, from which people spin up FedeProxy instances. Importantly this platform has a ports & adapters architecture, and a plugin mechanism to add such adapters (see the sketch after this list).

  • Application-specific extensions: These may be anything, but it starts with e.g. OAuth or Webhook apps that make a specific forge communicate with a FedeProxy instance.
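To make the ports & adapters idea a bit more tangible, a connector plugin interface could look something like the following minimal sketch. The class and method names are invented for illustration and are not taken from the FedeProxy codebase.

```python
from abc import ABC, abstractmethod


class ForgeConnector(ABC):
    """Hypothetical 'port' that every forge-specific adapter plugin implements."""

    @abstractmethod
    def fetch_events(self, repository: str) -> list[dict]:
        """Return recent repository events as ActivityPub-style dicts."""

    @abstractmethod
    def apply_activity(self, repository: str, activity: dict) -> None:
        """Apply an incoming federated activity (e.g. a Push) to the forge."""


class GiteaConnector(ForgeConnector):
    """Adapter plugin talking to a Gitea server via its REST API / webhooks."""

    def __init__(self, base_url: str, token: str):
        self.base_url = base_url
        self.token = token

    def fetch_events(self, repository: str) -> list[dict]:
        ...  # call the Gitea API and translate the results into activities

    def apply_activity(self, repository: str, activity: dict) -> None:
        ...  # translate the activity back into Gitea API calls
```

A GitlabConnector would implement the same port, and the FP instance itself would only ever talk to the abstract interface.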

Now let’s say I have a Gitea server and a GitLab server and want to make them interoperate. Then:

  • I could use an existing FP instance hosted by someone else that has both Gitea and GitLab connector plugins.
  • I could spin up my own FP instance with both plugins.
  • Maybe the GitLab server is not under my control, but is already exposed on fedi by an FP instance to which I don’t have access. I could spin up my own FP instance with only a Gitea connector and ask them to configure my instance on the GitLab endpoint.

Note that I talk about Gitea / GitLab, i.e. at the forge level, but the configuration has granularity down to individual repositories, where different configs (e.g. auth/authz) can be specified.
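Purely to illustrate what that granularity could mean, a per-repository configuration might look like the sketch below; the structure and keys are invented for the example, not actual FedeProxy configuration.

```python
# Hypothetical per-repository federation config; all keys are invented.
federation_config = {
    "forges": {
        "gitea.example.org": {"connector": "gitea", "token": "<api-token>"},
        "gitlab.example.com": {"connector": "gitlab", "token": "<api-token>"},
    },
    "repositories": {
        "gitea.example.org/alice/myproject": {
            "mirror_to": ["gitlab.example.com/alice/myproject"],
            "sync": ["git", "issues"],                # which features to federate
            "authz": {"allow_push_from": ["alice"]},  # per-repo auth/authz
        },
    },
}
```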

Now this is all great: forge2forge interop… been there, done that.

What happens if a forge implements native FedeProxy protocol support? Well, then there may be no need to spin up a separate FP instance. The forge connects directly to remote instances and offers better integration of FedeProxy config etc. in its UI.

Where does the extra power in this setup come from? It opens up many different scenarios and use cases that go well beyond forge synchronisation:

  • A portal website that aggregates activity metrics on a broad range of projects, where people can filter and search based on their interests.
  • A documentation website (e.g. CMS or Wiki) that receives as input filtered project-related information that is relevant to Technical Writers.
  • A CI/CD service that performs complex cross-forge, multi-project builds and reports back to each of them.
  • A Trello board, Mattermost workspace, Basecamp project, Matrix chatroom, etc. that processes project activity in any way it wants.

None of these extended use cases are part of the FedeProxy project itself, but they are enabled by it.
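To give one concrete flavour of how they are enabled: any of these services could follow a repository actor exposed by a FedeProxy instance over plain ActivityPub and then react to the activities it receives. A rough sketch, with invented actor URLs, and HTTP signatures left out:

```python
import requests

# Invented actor URLs for the sketch.
SERVICE_ACTOR = "https://metrics-portal.example/actor"
REPO_ACTOR = "https://fedeproxy.example/repos/alice/myproject"

follow = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Follow",
    "actor": SERVICE_ACTOR,
    "object": REPO_ACTOR,
}

# Discover the repository actor's inbox and send the Follow request.
# In real ActivityPub the POST must carry an HTTP Signature; omitted here.
repo = requests.get(REPO_ACTOR, headers={"Accept": "application/activity+json"}).json()
requests.post(
    repo["inbox"],
    json=follow,
    headers={"Content-Type": "application/activity+json"},
)
```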

About the FedeProxy platform, a few observations:

  • Though it is a full-blown AP server app, it need not be ‘heavyweight’ and might be installed on the same machine as the forge software.

  • The platform has its own UI (or CLI), at a minimum to do the configuration and install plugins. But the UI may go further… it might provide an auditing & reporting service of all that’s happening in connected repositories. It may offer dashboards, e.g. for non-technical stakeholders who are not likely to engage with a forge account but are interested in being informed about specific project activity. Again, these UIs are also extensions and not part of the FedeProxy project itself… anyone can build them (and publish them to an ‘awesome-fedeproxy’ curated list :wink: )

What FedeProxy does, in essence, is provide the solid foundation for an ecosystem to emerge.

I’ll try to figure out the rationale for each idea, as if I were to explain it to someone else and convince them it is both worth pursuing and not in the scope of a forge, which is why it has to be implemented as a separate web service.

You have me convinced on this one. It’s not in the scope of fedeproxy but it is becoming increasingly necessary. One outcome of the User Research is that the number of publicly available forges is growing fast (although the majority of them are not open to the general public). There is a need for something like https://instances.social, a search engine, etc. A meta-forge of some kind. And, obviously, this service needs to be federated.

Hum :thinking: … but if it is federated, why wouldn’t it be in the scope of what a forge provides? I mean, imagine forges are federated natively. They all have the ability to federate each and every feature they provide: issues, pull requests, etc. It would make sense for them to get a list of all forges in existence from the “aggregated forge” web service, get information and send data.

That’s a brain dump @aschrijver, forgive me :sweat_smile: I’m now convinced that you have a point but I’m confused as to where it should be implemented.

If they all offer native support, then they should have native support for the FedeProxy protocol (and not something else). The FedeProxy platform is then still the central concept (though as multiple federated instances) where other applications can plug in, and that is the easiest and most natural extension point. The universal connector. It provides the greatest versatility in possible use cases.

An (admittedly somewhat flawed) analogy might be that of an old-fashioned LAN adapter / switch. You don’t run an incoming LAN cable to your TV and then an outgoing cable to your stereo and another one to your PC, creating a whole web of cables. Plus your adapter might include a firewall as an extra service instead of firewalls in each device. You can connect one adapter to another one upstairs, etc.

What you call “FedeProxy protocol” is really “ActivityPub + forgefed”, right?

If all forges implement federation as well as a service that relies on the aggregation of all forges (for instance a federated search of projects hosted on all forges), why would they need another service? Maybe for caching search results :thinking: Or for speeding up the search by maintaining an index made of the aggregation of the indexes of all forges… that would be helpful indeed. Anything else you have in mind?

There seems to be something recursive in how federated services can build on federated services which build on federated services, etc. It sounds like a good thing.

If these fit all the concepts you want to model, then yes. But e.g. for Issue Management and other areas I think it quickly makes sense to define your own - to-be-standardized - extension for it. Do you want to map a Ticket Comment to an as:Note? Maybe, but it is entirely non-descriptive. Maybe an as:Note with additional properties? Then you already have your extension format right there.
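As a purely hypothetical example of what such an extension could look like, a ticket comment might be published as a plain as:Note carrying extra namespaced properties; the extension context URL and the property names below are invented.

```python
# Hypothetical extension format: an as:Note representing a ticket comment,
# enriched with namespaced properties from a (to-be-standardized) context.
ticket_comment = {
    "@context": [
        "https://www.w3.org/ns/activitystreams",
        {"fp": "https://fedeproxy.example/ns#"},  # invented extension namespace
    ],
    "type": "Note",
    "attributedTo": "https://forge.example/users/alice",
    "content": "I can reproduce this on version 1.14.",
    "inReplyTo": "https://forge.example/alice/myrepo/issues/42",
    "fp:ticket": "https://forge.example/alice/myrepo/issues/42",  # extension property
    "fp:ticketStatus": "open",                                    # extension property
}
```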

That federated search service could build its own AP server from scratch, but since there’s already the FP platform with plugin support, it might as well decide to save that work and just write a plugin for it, which comes with its own UI (either on the FP instance, or on a separate webserver with a custom API).

All in all there are many different use cases that can be brainstormed, but on the whole you want to avoid restricting the solution you offer unnecessarily. Many use cases will come from unexpected corners, from people who see an opportunity. The potential cost/benefit ratio of offering additional extensibility and flexibility may be well worth the effort to create the architecture that facilitates it.

Yeah, I see them as ‘lego blocks’ (where blocks are what we currently think of as ‘apps’). By putting blocks together you get the actual thing you want, and the blocks themselves are not foremost in your mind when looking at the end result.

I’ve described this a bit in From silo-first to task-oriented federated app design. The apps become sorta ‘irrelevant’, existing on a lower architectural level, and instead you focus on the tasks / processes / things you want to achieve with them.

I think I understand what you have in mind now :slight_smile: And I agree that with federation new features will emerge. They do not exist at the moment because there is no federation between forges. But I disagree with the idea that those features should be implemented outside of a forge (FedeProxy or any third-party service, really). Once forges are ActivityPub-enabled and use it to send/receive forgefed-type messages, they will be the most natural environment to implement the plugins/components you suggest. Of course not all forge codebases will be favorable, but I’m hopeful: Gitea is promising.

It is not that they should, but that they could be implemented outside of a forge, and I also think many scenarios are unlikely to ever be adopted in forge software.

For instance, forges are unlikely to store and process global aggregated timelines of project activity, which costs gigabytes of DB space and eats processing power. The history of message exchanges mostly does not need to be recorded; messages can be discarded once they have updated the project. But a dedicated FP instance may sit right there for that purpose.

Also consider all the scenarios where - even though project activity is public - you don’t have control - e.g. admin or other types of privilege - over the forge or repo. You can’t say “Hey, I have this interesting thing I want to do with your project data, can you make some config changes and install this plugin?” and expect it to happen.

That aggregated search service… does it need to maintain a long list of all known forge installations to get its data, or does it simply need to subscribe to a single FP relay instance (or a small whitelist of those)?

We might have a vidcall one of these days to discuss ideas… some of this is hard to convey in typed text :slight_smile:

I agree :slight_smile:

As a developer working on a project I would want to be able to have a local copy (via federation) of all the activity of all forks. But of course I would not be interested in getting that amount of information on a project that is only a dependency of my project and for which I only care about a single issue.

I can see how some forge instances with lots of resources could get very ambitious and keep all the data they can collect regarding each and every project they get in contact with.

But I think the reasonable default would be that such information is not collected. To be more concrete, you may create a project on your own forge that “follows” a project on another forge and get a local copy of all it contains. That would be the default. But you could also tick a box saying that you’re interested in recursively following all followers of this project, effectively getting activity updates from all forges where this project exists.
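A rough sketch of what that “recursively follow all followers” box could translate to at the ActivityPub level is shown below. The actor URLs are invented, HTTP signatures and error handling are left out, and followers are assumed to be copies of the project hosted on other forges / FP instances.

```python
import requests

HEADERS = {"Accept": "application/activity+json"}


def collection_items(url: str) -> list:
    """Flatten an ActivityPub collection, following 'first'/'next' pages."""
    items, seen = [], set()
    while url and url not in seen:
        seen.add(url)
        data = requests.get(url, headers=HEADERS).json()
        items += data.get("orderedItems", data.get("items", []))
        # Collections point at their first page; pages chain via "next".
        url = data.get("next") or data.get("first")
    return items


def follow_recursively(local_project: str, remote_project: str) -> None:
    """Follow a remote project actor and every one of its followers."""
    followers_url = requests.get(remote_project, headers=HEADERS).json()["followers"]
    for target in [remote_project] + collection_items(followers_url):
        inbox = requests.get(target, headers=HEADERS).json()["inbox"]
        requests.post(
            inbox,
            json={
                "@context": "https://www.w3.org/ns/activitystreams",
                "type": "Follow",
                "actor": local_project,
                "object": target,
            },
            headers={"Content-Type": "application/activity+json"},
        )
```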

Even the largest projects, such as the Linux kernel, can be federated in this way with resources that are affordable to an individual. And for the vast majority of software projects the space/CPU requirements are much, much lower :slight_smile:

GitHub is huge because it hosts many software projects, but when thinking about each of them individually, everything is manageable with very few resources.

That would only be a problem if getting a local copy of all the data for processing purposes was not possible, or if it required resources that are expensive.

My other job (fedeproxy is only 50%) is currently to work on the storage of https://www.softwareheritage.org/, which crawls forges to extract the code they contain. It is used by researchers to analyze the Free Software code that was harvested (around 750TB currently). It is currently highly specialized, in the sense that it has no commonality with any forge, although it contains and publishes repositories of code and you can browse the code using an interface that is very much like what you find on GitLab. It would make sense to me if, over the years, it evolved into being just another forge, federated with the others. If the people maintaining the project are not interested in managing users, it could be read-only. It would just be configured to follow every project of every forge and store whatever it contains, everywhere.


Absolutely: let’s add that to the agenda of the next fedeproxy update on June 24th, if you’re available? Or any other time, really.

Another way to approach the same question would be to argue that a software forge should not be the monolithic service it currently is. A forge should be made of independent but interoperable services, each with its own well-defined REST API + ActivityPub:

  • A web interface with user management
  • A DVCS
  • An issue tracker
  • A CI
  • etc.

Interestingly this may be going in the opposite direction, as exemplified by CI: CI services previously were loosely coupled (Travis etc.) and no forge had its own thing. But now that GitLab has its own CI and it is getting popular, it may be an incentive for Gitea and others to follow this path. In any case Jenkins, Zuul etc. did not agree on an API that would make it easier for forges to support all of them. And the same can be said of Redmine, Trac, etc.

But if there were established standards for each component of a forge to be independent and nicely plug into each other, I still don’t see which feature would not fit in any of them.

I think both ways of looking at it are worthwhile, and both are covered by what FedeProxy might provide. On the one hand you have the big commercial players that want to offer a comprehensive and complete ‘developer experience’, like GitHub and GitLab. On the other end of the spectrum you’ll find opinionated, very minimalistic approaches, such as SourceHut.

I expect that in both of these opposite approaches you’ll find resistance to the deeper integration opportunities that FedeProxy provides. In the GitHub/GitLab case, FP openness works against the walled garden / vendor lock-in, hence a threat. And on the minimalistic end there’s little interest in the extra ‘bells and whistles’ (in their opinion).

Codeberg is integrating CI (see #428) and this might just lead to adoption in Gitea, who knows.

The question is the level of integration you can achieve for third-party extensions, and for native features any forge project will be very hesitant to ‘just add the feature’. Features should fit a clear product roadmap and tie in well with the services offered to the users/customers. Many features won’t fit, I think.
