@pilou (and everyone else really): this is a working draft for the software architecture that will be the first implementation of fedeproxy. It will be presented to @arthurlogilab around June 15th, 2021.
Feel free to comment, oppose, etc.: it is nothing more than a brain dump on my part at this point.
Issue A is created on Forge A by user A
Issue B is created on Forge B by user B
User B adds a link to issue A in issue B
All comments / edits on Issue A made by user A are copied over to issue B
All comments / edits on issue B made by user B are copied over to issue A
All comments / edits are published via ActivityPub
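To make the use case above concrete, here is a minimal in-memory sketch. The `Issue` class, the `link` and `comment` helpers and the example URLs are all hypothetical (none of them exist in fedeproxy), real forges would be reached through their APIs, and the ActivityPub publishing step is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    """Hypothetical in-memory stand-in for an issue hosted on a forge."""
    url: str
    comments: list = field(default_factory=list)
    links: set = field(default_factory=set)


def link(source: Issue, target: Issue) -> None:
    # User B adds a link to issue A in issue B; this sketch treats the
    # link as symmetric so that comments flow in both directions.
    source.links.add(target.url)
    target.links.add(source.url)


def comment(issue: Issue, author: str, body: str, registry: dict) -> None:
    """Record a comment on an issue and copy it to every linked issue."""
    entry = (author, body)
    issue.comments.append(entry)
    for url in issue.links:
        registry[url].comments.append(entry)


issue_a = Issue("https://forge-a.example/issues/1")
issue_b = Issue("https://forge-b.example/issues/7")
registry = {i.url: i for i in (issue_a, issue_b)}

link(issue_b, issue_a)
comment(issue_a, "user-a", "first comment", registry)
comment(issue_b, "user-b", "a reply", registry)
```

After these calls both issues hold the same two comments in the same order, which is the property the mirroring is meant to guarantee.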
The service will need to be configured with credentials, for instance an API token with administrative access to a GitLab instance. Instead of it being in a configuration file, I figured it would make more sense that it is set in a database and with a REST API to modify it. The database with the configuration of the web service is the only persistent state that the web service needs to keep.
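A rough sketch of what that persistent state could look like, using the standard library `sqlite3` module rather than Django so that it runs on its own. The table name, column names and helper functions are assumptions made for illustration; `set_token` stands in for what a REST handler would do when the configuration is modified.

```python
import sqlite3


def open_config(path=":memory:"):
    """Open the configuration database and create its only table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS forge_credentials ("
        "forge_url TEXT PRIMARY KEY, api_token TEXT NOT NULL)"
    )
    return db


def set_token(db, forge_url, api_token):
    # What a hypothetical REST endpoint would do on update: insert the
    # token, or overwrite the existing one for that forge.
    db.execute(
        "INSERT OR REPLACE INTO forge_credentials (forge_url, api_token) "
        "VALUES (?, ?)",
        (forge_url, api_token),
    )
    db.commit()


def get_token(db, forge_url):
    """Return the stored token for a forge, or None if it is not configured."""
    row = db.execute(
        "SELECT api_token FROM forge_credentials WHERE forge_url = ?",
        (forge_url,),
    ).fetchone()
    return row[0] if row else None


db = open_config()
set_token(db, "https://gitlab.example", "first-token")
set_token(db, "https://gitlab.example", "rotated-token")
```

Storing a token in clear text is only acceptable in a sketch; a real service would need to protect it.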
The ultimate source of truth is the state saved in the DVCS. If the notification comes from a random source it will be spam and the worst that can happen is that it triggers a refresh of the state of the issue that is not needed.
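A small sketch of that idea: the notification is only a hint, nothing in its payload is trusted beyond the issue URL, and the state is re-read from the source of truth. The `handle_notification` function, the `issue_url` key and the `refresh` callback are hypothetical names, not an existing API.

```python
def handle_notification(payload, known_issues, refresh):
    """Treat a notification as a hint that an issue may have changed.

    Returns True when a refresh was triggered, False when the
    notification was ignored.
    """
    url = payload.get("issue_url")
    if url not in known_issues:
        return False  # nothing we track: ignore it
    # Whatever else the payload claims is discarded. The worst a forged
    # notification can cause is this unnecessary refresh.
    refresh(url)
    return True


refreshed = []
known = {"https://forge-a.example/issues/1"}

handle_notification(
    {"issue_url": "https://forge-a.example/issues/1", "body": "forged text"},
    known,
    refreshed.append,
)
handle_notification(
    {"issue_url": "https://elsewhere.example/issues/9"},
    known,
    refreshed.append,
)
```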
I did not think about the proxy missing notifications and I don’t have an answer to this one just yet. Good point!
Not really. I’m inclined to go with whatever makes more sense in the django environment at this point in time. Ideally it would be OpenAPI compatible. Last time I checked it was not very trendy but it was about a year or two ago.
I thought the fedeproxy web service should probably provide an API for developers to use it. But then I remembered that the idea is that fedeproxy is transparent and therefore does not provide any way to interact with it. All interactions are with the forge the developer prefers and fedeproxy is merely a relay. It may be silly to write down such a misguided thought but … it may help in the future if it comes back to me.
Here’s an example of where domain analysis will improve your use case. Consider this: do you only deal with comments in the context of an Issue? I argue this is not the case, and a use case that assumes it is will be flawed, which makes it unusable even for an MVP.
What is an Issue? It is a statement of something that needs to be addressed, something that needs attention at a future time. It probably relates to a Project and implicitly or explicitly refers to Project artifacts.
Addressing an Issue means that issue-related Activities need to take place. A Comment is just one such Activity.
Consider this scenario:
Issue A and Issue B mirror each other on different code forges via a link
- User A comments to Issue A: "PR #123 is ready for merge"
- The Comment is transferred to Issue B
- Dependabot updates dependencies in Repo A --> an Activity in Issue A informs about this
- The update breaks the build --> CI report in Issue A informs about this
- User A comments: "Oops, all is broken, lotsa trouble now"
- The Comment is transferred to Issue B
User B now has the wrong context and thinks that the PR is problematic. Might do a lengthy code review and comment: “All seems right. I see nothing wrong with it” to the confusion of User A.
So how might you avoid that in Ubiquitous Language:
An Issue is resolved by Activities
A Comment is an Activity
A Note is an Activity
Don’t know if Note is the best domain terminology, but it is just an example.
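The three sentences above could translate into types roughly like this. It is only a sketch: the field names, and the choice to model a Note as a system-generated entry such as a CI report, are my assumptions and not settled domain terminology.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Activity:
    """Anything that happens while an Issue is being resolved."""
    actor: str
    at: datetime


@dataclass(frozen=True)
class Comment(Activity):
    body: str = ""


@dataclass(frozen=True)
class Note(Activity):
    # Assumption: an entry recorded by the system, e.g. a CI report.
    summary: str = ""


@dataclass
class Issue:
    title: str
    activities: list = field(default_factory=list)

    def comments(self):
        """Only the Comment activities, in the order they were recorded."""
        return [a for a in self.activities if isinstance(a, Comment)]


issue_a = Issue("Example issue")
issue_a.activities += [
    Comment("user-a", datetime(2021, 6, 1, 10, 0), "PR #123 is ready for merge"),
    Note("ci", datetime(2021, 6, 1, 11, 0), "build broken after dependency update"),
    Comment("user-a", datetime(2021, 6, 1, 11, 5), "Oops, all is broken, lotsa trouble now"),
]
```

Mirroring only `issue_a.comments()` drops the Note in the middle, which is exactly how User B ends up with the wrong context in the scenario.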
Another thing to consider if you want to do manual linking like this. Why wouldn’t the code forge integrations make an API call and synchronise an issue?
User A creates Issue A in Repository A
User A federates Issue A to Repository B
So UI-wise you may have a dropdown of pre-configured repos. The underlying code takes care of issue synchronisation. This avoids User B being lazy and creating an incomplete derivative of Issue A. (Note that “federates” is not domain terminology, just an example here, but the federation takes place ‘under the covers’ and is no user concern.)
Also note that I use the ubiquitous language term Repository instead of Forge. The repository might well live on the same forge. As a dev I shouldn’t care about that fact, and the underlying functionality you’d build already facilitates the requirement “Repository can be Local or Remote”. So you get that almost for free.
Did you take a peek at the Outreach project? In the README you find the Domain Model with Ubiquitous Language below. Then each of these will be elaborated into Gherkin scenarios that will become automated BDD tests.
The project specifications are still mostly me playing with the best setup, but here is a BDD test for Launch Community. The Gherkin part can be copied as-is to e.g. a test/features folder in the codebase that’s configured for the BDD testsuite. You might even automate that. And this way codebase and documentation are always up-to-date (all documentation follows README-driven development for the same reason of keeping docs + code in sync).
I wouldn’t be so sure about that. Anyway it is not needed I agree. But having FedeProxy in the middle means that you can offer value-added services there, with features that will never become part of any forge. These would be surfaced through a FedeProxy user-facing platform UI.
But it is possible to avoid that using extensions (e.g. widgets) in each forge that are fed AP messages (they have an actor/event-based AP ‘API’).
A FedeProxy platform UI might be ideal for non-technical or external stakeholders (without privilege) that you want to take along in the software development process. That is talking (far) future roadmap, of course, but you should have the option to add this with ease once the need arises… you might also see these as additional related projects that work on the core functionality of FedeProxy and may be developed by entirely different dev teams.
Although I now understand what An Issue is resolved by Activities means (in good part thanks to our discussions), it would have confused me six months ago because I did not know what Activities meant in that context. I was however familiar, as I am now, with the notion of an issue as well as the fact that it is made of two things:
metadata that describe the issue
a conversation, i.e. comments ordered in chronological order
For this reason I would argue that comments / edits on issues is probably adequate because people who are familiar with the sub-domain “Issues in the context of a development environment” immediately understand what it means without the need to explain further. And also because it fully represents (and that’s user research jargon, I think) how the mental model of the developer matches with the conceptual model of issue trackers as they currently exist.
Just to clarify: I don’t want that. User research shows that developers currently do that, for lack of a tool that automates this for them.
The benefit of a dropdown is clear but I’m not sure how to implement that: how would repos be pre-configured? The lack of UI is an important aspect of this first step, so that the focus can be on the UX, i.e. what follows, which is essentially a convention and copy/pasting a URL.
But then it would make more sense to me if that was implemented as a contribution to Gitea (because GitHub & GitLab are not amenable to that kind of contribution, AFAIK). I cannot foresee a UI fedeproxy would provide that would not be relevant to Gitea. Or maybe I don’t see something and you have an example in mind?
Ah yes. I typed my examples off the top of my head without doing the background work, because it’s more the concept of DDD I wanted to explain (and in haste they are probably inaccurate, as dependabot will probably report to the PR only, not the Issue).
But there is a danger in looking at APIs. While the APIs are very good input to analyse the domain that is implicitly/explicitly implemented within these apps, they are NOT the domain. They are convenient endpoints for information exchange that may be denormalized, and may carry user interface and/or implementation details.
Comparing Gitlab and Gitea for instance you already see that they use different (but synonymous?) domain terminology e.g. Notes versus Comments. (A notebook with a log of activities/notes may be most accurate).
Also - having taken only a quick peek - Gitlab uses a system boolean flag in their API to create different Note Types (“A Note has a Type”?). Probably having system=true and body="closed" renders something with a ‘Close’ icon, and maybe closes the issue? I am not familiar enough with the API to answer that. On the other hand the Gitea API has entirely different mechanics. But underlying both may be the same domain model. They’d have variations/extensions and synonymous terminology, though.
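As an illustration of how that flag could be used, this sketch splits a list of notes into user comments and system entries based on a boolean `system` field. The sample data is hand-written for the example and not actual API output, and `split_notes` is a hypothetical helper.

```python
def split_notes(notes):
    """Separate user-written comments from system-generated notes."""
    comments = [n for n in notes if not n.get("system", False)]
    system = [n for n in notes if n.get("system", False)]
    return comments, system


# Hand-written sample shaped like a list of issue notes.
notes = [
    {"id": 1, "body": "I can reproduce this", "system": False},
    {"id": 2, "body": "closed", "system": True},
    {"id": 3, "body": "Fixed in the latest release", "system": False},
]

comments, system = split_notes(notes)
```

A forge without such a flag would need a different adapter to arrive at the same two lists, which is the point about a shared domain model hiding behind different mechanics.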
The domain describes what the stakeholders / domain experts want, from their own perspective, and not what particular software forces them to do because of its limitations or design choices. The domain is software-independent, a universal model.
Have a look at the following issue chosen randomly from the Gitea repository:
Suppose you are interviewing Lunny and they say “An Issue has Comments. That’s it. And you can edit Comments.” then you hand them this image and respond “So here you made 4 comments? And you can edit e.g. the second one to show a different Label?”.
I guess here both of you would come to new domain insights. Lunny will answer something like “Well no, I actually made just 1 Comment - the issue description itself - and the other entries are the System acting on behalf of me to keep track of project activity that is related to this Issue. I cannot edit these as they are a log of things that already happened. A history of issue-related activity”.
Forget my mention of UI specifics. They are irrelevant when it comes to domain modeling. Domain is independent of UI (in technical terms this relates to ‘inversion of control’). UI decisions are made based on a stable domain model, and there can be many UIs surfacing the same domain to stakeholders.
There are many ways this can be done, and it depends on the features you offer. From a UX standpoint it is indeed best to not bother the dev team with additional steps they must take into account for their process. But at some level you need to know which repo is allowed to sync with which other repos and configure privileges and what-have-you. These might as well be tasks for stakeholders acting in an admin or project manager role, idk.
The following is completely brainstorm / imaginary / exemplary… only stakeholder needs can tell.
FedeProxy Platform sits in the middle of your ‘developer project federation’ as a spider in its web (domain concepts needed?). This is where value-added services can be modeled in addition to specific extensions to individual code forges (not drawn). What might these be?
An Admin Service that provides config, auditing, auth/authz, install/upgrade, etc. (might have a CLI, web UI, or even Android app).
A consolidated external API that abstracts away the various code forges that participate in the project federation, focuses just on the domain model of the development process (where the code forge is irrelevant, an ‘implementation detail’).
Dashboard UIs that are targeted at specific stakeholder audiences, filtering information. Think of reporting to keep non-technical stakeholders (e.g. the customer) in the loop, but also allowing them to still interact with the project (goes beyond mere reporting).
(…all kinds of integrations like this)
None of these are MVP material, but anticipating that someday you may want to offer them has implications for the architecture of FedeProxy.
(Note that, imho, a very naive implementation of FedeProxy would be to focus too much on the ‘proxy’ part and create a codebase that is merely a collection of adapters from forge type A to forge type B, and as forge support is added an explosion of adapters occurs. No domain model is needed here, but a very constrained system results that may grow into a Ball of Mud as you extend it with unforeseen features. This is probably not what you have in mind, but I wanted to mention it anyway.)
The subdomain “Issues in the context of a development environment” is defined differently by Gitea, GitLab or any other issue tracker, really. The subdomain is neither well defined nor standardized. The differences between implementations are sometimes significant enough that converting from a given implementation (Phabricator) to another (GitLab) is problematic. For instance an issue in GitLab is bound to a software project: it cannot exist otherwise. Phabricator does not have the same constraint and as a result some Phabricator issues cannot be easily mapped to GitLab issues (that was a takeaway from the user research interviews).
The same is true for the “DVCS” subdomain: Mercurial and Git developers would give different definitions (not just about details).
I’m sure you will agree that, by that definition, the “Issues in the context of a development environment” domain does not exist yet. The variations in the implementations that are found in Gitea, GitLab etc. are rooted in domain definitions that only partly overlap and are therefore not (yet) universal. And not just in the details.
There indeed are parts of the issue or even the comments attached to it that are immutable (depending on your permissions or the implementation). Some labels cannot be removed, sometimes comments cannot be deleted, etc.
Understood. In the context of fedeproxy I have no intention to undertake the enormous task of defining a domain. I however acknowledge that all forges are based on a domain definition. It is nowhere to be found because it was never written down. But even when implicit, the domain definition that the developers have in mind when working on the forge does exist and it is worth investigating. To be less vague, fedeproxy is concerned with identifying commonalities in the “Issues in the context of a development environment” domain as implemented by different forges. All implementations provide comments ordered in chronological order and this is where fedeproxy can help with federation. Conversely, fedeproxy cannot federate issues that are not bound to a software project (although they belong to the “Issues in the context of a development environment” domain) because some issue trackers do not implement them.
Or maybe fedeproxy could just federate whatever it can. Issues, repos, projects have permissions that already impose restrictions on what the user can do. Since fedeproxy uses the developer accounts on each forge to act on their behalf, I’m not sure why it would be useful to further control what they can do.
I understand what you are suggesting and maybe someone will go in this direction. This is a little too ambitious for me at this stage.
My personal inclination, long term, is to work so that fedeproxy disappears because all forges are interoperable and federated. If it does not make itself as invisible as possible right from the start, I feel (not sure) that it would work against this goal.
This is exactly what I have in mind: very well put. In other words: I believe there is room for incrementally improving federation on specific subdomains, with immediate results. Domain modeling would require convincing forge authors to agree on it and follow up with an implementation before anything can be used by the developers, and it would take years. The price to pay for my approach is the explosion of adapters. But my hope is that, as more developers use these adapters daily, forge authors are pressured to achieve interoperability (for instance by agreeing on a domain model) which would make federation possible. Or that forge authors are drawn to federation, which requires some kind of interoperability. I cannot see into the future and I’m unable to guess which is more likely to happen, which is why I pursue both.
There is no enormous task. A domain model often starts with a single sentence. It is broken down into multiple sentences based on some analysis and insight gleaned from your user research. In the case of Campaign Management for Outreach, the domain model turned out to be 18 brief plain-English sentences and a diagram. Afterwards, with an MVP ready, more sentences will be added that reflect new features on the roadmap.
Not explicitly defining your domain model doesn’t mean it doesn’t exist. It does, but now it is implicitly present only in your code.
Okay, but then some considerations:
A Proxy design gives you: Gitlab2Gitea, Gitlab2Github, Gitea2Gitlab, etc.
Ports & Adapters + FedeProxy domain gives: Gitlab2FP, Gitea2FP, Github2FP, etc. (fewer in total than the proxy design)
Now what would the domain model look like? There are 3 options. Either representing:
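To put numbers on the two designs, the sketch below enumerates the adapters for three forges. It assumes one adapter per ordered pair of forges in the proxy design and a single adapter per forge in the Ports & Adapters design, which is how the names above read; the `adapter_counts` helper is only there to show the general formula.

```python
from itertools import permutations

forges = ["Gitlab", "Gitea", "Github"]

# Proxy design: one adapter per ordered pair of distinct forges.
proxy_adapters = [f"{a}2{b}" for a, b in permutations(forges, 2)]

# Ports & Adapters design: one adapter per forge, to and from the FP model.
hub_adapters = [f"{forge}2FP" for forge in forges]


def adapter_counts(n):
    """Return (proxy, hub) adapter counts for n supported forges."""
    return n * (n - 1), n
```

With three forges that is six adapters against three, and with five forges `adapter_counts(5)` gives twenty against five, so the gap widens with every forge added.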
The intersection of features (the common denominator)
The union of viable features
The aggregate of all features exposed by supported forges
Option 1) may be too limited and 3) too much. Option 2), the middle road, says that some features may be supported in most forges, but not necessarily all of them. You only do that if they add significant value, and you may do that later in your roadmap (just start with option 1).
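The three options map naturally onto set operations. In this sketch the forges and their feature lists are made up, and "viable" is approximated as "supported by more than half of the forges", which is my assumption; in practice it would be a judgment about added value.

```python
from collections import Counter

# Made-up feature sets for three hypothetical forges.
features = {
    "forge_a": {"comments", "labels", "milestones", "time_tracking"},
    "forge_b": {"comments", "labels", "milestones"},
    "forge_c": {"comments", "labels", "reactions"},
}

# Option 1: the intersection, i.e. what every forge supports.
common = set.intersection(*features.values())

# Option 3: the aggregate of everything any forge exposes.
aggregate = set.union(*features.values())

# Option 2: features supported by most forges (assumed threshold: a majority).
counts = Counter(f for supported in features.values() for f in supported)
viable = {f for f, n in counts.items() if n > len(features) / 2}
```

By construction `common` is a subset of `viable`, which is a subset of `aggregate`, so starting with option 1 and growing towards option 2 only ever adds to the model.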
In the proxy design it is harder to keep up with new releases of a forge, as it may require updating multiple adapters.
Note too that your federation model (message format, message exchange) is also representative of your domain, especially if you want it to become a kind of de-facto standard. Federation is part of Ports & Adapters.
Anyway, I am not trying to ‘sell you on DDD’ nor on specific features, requirements or architecture decisions, I just wanted to explain the concept in more detail. I am sorry if this derails this thread a bit. Feel free to move it to a different topic.
Why would that be? What would be their gain? The walled gardens / lock-in are very often intentional.
The intention here is definitely to go for the second option, the first one would require a lot more time. The GitLab export format is going to be used as a pivot (what you wrote as FP). This is de facto borrowing the domain defined by GitLab, which is not ideal.
Yes and part of the work of this first development step is to write a converter mapping the GitLab issue format into the ForgeFed issue/ticket format.
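For what it is worth, such a converter could start out looking like the sketch below. It assumes the input is a dictionary with the fields commonly found on a GitLab issue (`title`, `description`, `state`, `created_at`, `web_url`, `author`), which may not match the export format exactly, and the output property names and the ForgeFed namespace URL follow my reading of the ForgeFed vocabulary and should be checked against the specification. `gitlab_issue_to_ticket` is a hypothetical name.

```python
def gitlab_issue_to_ticket(issue):
    """Map a GitLab-style issue dictionary to a ForgeFed-style Ticket."""
    return {
        "@context": [
            "https://www.w3.org/ns/activitystreams",
            "https://forgefed.org/ns",  # assumed namespace URL, to be verified
        ],
        "type": "Ticket",
        "id": issue["web_url"],
        "attributedTo": issue["author"]["web_url"],
        "summary": issue["title"],
        # The raw description is passed through here; ForgeFed may expect
        # rendered HTML in this property.
        "content": issue.get("description") or "",
        "published": issue["created_at"],
        "isResolved": issue["state"] == "closed",
    }


# Hand-written sample input, not actual GitLab output.
sample = {
    "title": "Crash on startup",
    "description": None,
    "state": "opened",
    "created_at": "2021-06-01T10:00:00Z",
    "web_url": "https://gitlab.example/group/project/-/issues/1",
    "author": {"web_url": "https://gitlab.example/user-a"},
}

ticket = gitlab_issue_to_ticket(sample)
```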
On the contrary! I very much need that kind of discussion to make sure I’m not pushing fedeproxy in a direction that would be problematic. Re-formulating / re-thinking the development plan using different concepts / vocabulary helps a lot in that regard.
Yes but there is a fundamental difference between say Facebook users and GitHub users. The latter are Free Software developers, they are the people with the skills to create the change, they do not have to wait for it to happen. I’m hoping this is what will ultimately create the conditions to either:
motivate developers to take an active part in the making of forges that are working towards federation or at least interoperability
motivate the most reluctant forge developers to work towards federation or interoperability on a subset of their features