A post was merged into an existing topic: Do you support the fedeproxy project?
Yes, and I tried to get in touch, but the author is currently not responsive. It is complementary, and it would make sense to provide real-world feedback collected from fedeproxy users (even if they are just a handful) to help develop the vocabulary and data model.
I did not look into the history of the forgefed project enough to figure out if they were inspired by https://en.wikipedia.org/wiki/DOAP and the ontologies doap-* found at https://ontologi.es/ (doap-bugs + doap-changesets + doap-tests + doap-deps) that you mentioned.
If I recall correctly, they were trying to reinvent that wheel.
Great news to see these topics going forward
Small typo here : I think you’re talking about Nicolas Chauvat (@nchauvat here)
We exchanged some links with forgefed creators back in October 2019, but it didn’t go any further : https://talk.feneas.org/t/any-use-for-the-semanticweb-ontologies-for-forgefed-doap-for-example/170
Maybe further contact can be established using the forum : https://talk.feneas.org/c/forgefed/10 before the two projects grow further apart ? Oh and since this is related to ActivityPub too, you might have some luck contacting them on the fediverse : https://floss.social/@forgefed & https://todon.nl/@fr33domlover & https://floss.social/@bill_auger (although there doesn’t seem to be recent activity).
Thanks for the links and the context, I completely missed that! And I concur, the main issue here is that there is no activity in the past three months or so.
I would like to use http://ontologi.es/doap-changeset# to represent a patch / commit and I am looking for:
- a Python package that I could use to wrap this into an ActivityPub payload, similar to RDF::DOAP
- a description explaining what it means
I suppose this is where the patch / commit should be included but…
dcs:Change
    a owl:Class ;
    vs:term_status "stable" ;
    rdfs:isDefinedBy dcs: ;
    rdfs:label "Change"@en ;
    rdfs:comment "A change to something. Use rdfs:label to briefly describe the change. Use rdfs:comment for additional information."@en .
If I read the ontology correctly, there is nothing in there to actually include the diff.
You could try to extend it as in:
<https://heptapod.logilab.fr/changesets/abcd123>
    a dcs:Changeset ;
    fedeproxy_namespace:has_text_diff "<diff in textual form>" ;
    rdfs:label "ci: fixing the typo in .gitlab-ci.yml" ;
    dc:creator <https://heptapod.logilab.fr/users/nico> ;
    dc:date "2021-01-20T18:06:00"^^xsd:dateTime .
I tried to answer quickly, but this probably needs more thought.
To manipulate RDF data in Python, I recommend rdflib.readthedocs.io/
Do you have an example in Python or in pseudo-code of what you would want to write ?
This is actually very helpful. I’m not sure yet what I’d like to write because I’m unsure how to use ActivityPub at the moment. Something like:
If that makes any sense
I have very superficial knowledge of ActivityPub.
From what I understand, the outbox you mention would be the outbox of a Repository or a Project Actor and the Activity would be the changeset added to the Repository. Is that right ?
I also assume that parameter to addActivity() would be a json-ld structure returned by
Hence in buildPayloadFromDiff you could write something along the lines of:
import rdflib
from rdflib import Literal, URIRef
from rdflib.namespace import DC, RDF, RDFS

# With rdflib < 6, make sure the rdflib jsonld plugin is installed:
# https://github.com/RDFLib/rdflib-jsonld (JSON-LD support is built into rdflib >= 6)

DCS = rdflib.Namespace("http://ontologi.es/doap-changeset#")
FEDE = rdflib.Namespace("https://fedeproxy.eu/onto/")


def buildPayloadFromDiff(changeset):
    """Describe a changeset as RDF and return it serialized as JSON-LD."""
    g = rdflib.Graph()
    uid = URIRef(changeset.uid)
    g.add((uid, RDF.type, DCS.Changeset))
    g.add((uid, FEDE.has_text_diff, Literal(changeset.patchdiff)))
    # Short label: first line of the commit message (assumes message is a str)
    g.add((uid, RDFS.label, Literal(changeset.message.splitlines()[0])))
    g.add((uid, RDFS.comment, Literal(changeset.message)))
    g.add((uid, DC.creator, URIRef(changeset.author)))
    g.add((uid, DC.date, Literal(changeset.date)))
    return g.serialize(format="json-ld")
Of course it is only a sketch and there is a lot to do, but hopefully you get the general idea.
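To make the outbox.addActivity() step a bit more concrete, here is a minimal sketch of wrapping such a JSON-LD description in an ActivityStreams activity, using only the standard library. The choice of a Create activity, the function name wrap_in_create_activity and the example URLs are assumptions made for illustration; nothing in this thread settles which activity type fedeproxy should use for changesets.

```python
import json

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"


def wrap_in_create_activity(actor_url, changeset_jsonld):
    """Wrap a JSON-LD changeset description in a Create activity.

    changeset_jsonld may be the str/bytes returned by a JSON-LD
    serializer, or an already parsed structure.
    """
    if isinstance(changeset_jsonld, (str, bytes)):
        obj = json.loads(changeset_jsonld)
    else:
        obj = changeset_jsonld
    return {
        "@context": AS_CONTEXT,
        "type": "Create",      # assumption: other activity types may fit better
        "actor": actor_url,    # hypothetical URL of the Project/Repository actor
        "object": obj,
    }
```

The returned dict can be passed to json.dumps() before being handed to whatever implements the outbox.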
My understanding of ActivityPub is not good either but yes, that’s what I have in mind. A Project as in “software project including repository and all other things, issues, merge/pull requests etc.”.
Although an Activity is JSON-LD, the “payload” probably does not need to be JSON-LD. I’m sorry for not using the right vocabulary. Maybe it’s content? Anyway, you understood what I meant despite me being very vague, I’m impressed.
The example you provide is crystal clear, thank you. Assuming all aspects of a software project are represented in a DVCS (not just the code but also issues etc.), the changeset may be the only kind of content that needs to be federated.
The data model to represent an issue or a merge/pull request needs to be decided but it may not be a requirement to implement ActivityPub in fedeproxy. Here is a tentative example to illustrate what I mean:
- On a given project P on GitLab
- I comment on an issue for P on GitLab
- Fedeproxy for GitLab uses the GitLab API to GET all my issues
- Every issue is exported by fedeproxy in a separate file in the repository of the P project on GitLab in the format used by GitLab import/export
- My comment is represented by the diff between the updated issue files and the previous issue files
- Fedeproxy for GitHub receives the activity “apply patch/changeset” from Fedeproxy for GitLab and applies the patch on the issues that are on the P project on GitHub
- For every issue modified by the patch, Fedeproxy for GitHub reads the issue and uses the GitHub API to apply the differences, e.g. for each of my comments it verifies that the comment is up to date (creates a new one if it does not already exist, updates the message if it exists but does not contain the same thing, removes it if it is absent)
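The last step of this list can be sketched as a small reconciliation function. This is only an illustration under strong assumptions: comments are reduced to a mapping from a stable comment id to its text, the same id is assumed to be known on both forges, and the hypothetical plan_comment_sync only computes what to do; calling the GitHub API is left out.

```python
def plan_comment_sync(exported, remote):
    """Compare exported comments (source of truth) with the remote ones.

    Both arguments map a comment id to its body text. Returns a list of
    (action, comment_id, body) tuples, where action is "create",
    "update" or "delete".
    """
    plan = []
    for comment_id, body in exported.items():
        if comment_id not in remote:
            plan.append(("create", comment_id, body))
        elif remote[comment_id] != body:
            plan.append(("update", comment_id, body))
    for comment_id in remote:
        if comment_id not in exported:
            plan.append(("delete", comment_id, None))
    return plan
```

Keeping the planning separate from the API calls makes it possible to test the logic without network access, and to log the plan before anything is modified on the target forge.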
Here I assume, for the sake of a draft implementation, that the data model used to represent issues etc. is the GitLab import/export format. This is really a way to sidestep the question of a well-defined generic format independent of forges, rather than a claim that the GitLab import/export format is a good format.
What I find appealing in this approach is that the definition of the data model to represent a software project could be 100% decoupled from the definition of the protocol and data model used for federated forges to communicate.
Does that make any sense ? I may have overlooked something that makes this completely impractical, please be blunt. Or maybe this is just gibberish and does not make sense at all. Again, please be blunt.
I think of it differently. In my mind, every object could be an actor syndicated using ActivityPub: projects, tickets, dvcs, etc.
For example, I find it interesting to make a GitLab issue appear as a toot in Mastodon and, when users reply to it, have the comments added to the issue in GitLab.
Or to have a project replicated from GitHub in our heptapod, and when we add an issue to it, it gets added to GitHub on the other side.
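As a rough sketch of the first use case, an issue could be published as an ActivityStreams Note, and an incoming Note whose inReplyTo points at that issue could be turned into an issue comment. The function names, the dict layout of the resulting comment and the decision to use Note are assumptions for illustration; how a given Mastodon version renders such an object is not covered here.

```python
AS_CONTEXT = "https://www.w3.org/ns/activitystreams"


def issue_to_note(issue_url, title, body, actor_url):
    """Represent a forge issue as an ActivityStreams Note."""
    return {
        "@context": AS_CONTEXT,
        "type": "Note",
        "id": issue_url,
        "attributedTo": actor_url,
        "name": title,
        "content": body,
    }


def reply_to_comment(note, known_issue_urls):
    """Map a reply Note to an issue comment, or return None.

    Only Notes replying to an issue we published are accepted.
    """
    target = note.get("inReplyTo")
    if note.get("type") != "Note" or target not in known_issue_urls:
        return None
    return {
        "issue": target,
        "author": note.get("attributedTo"),
        "body": note.get("content", ""),
    }
```

Writing the resulting comment back to the forge would then go through the forge API, which is outside this sketch.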
Now that I think of it, I am not so sure FEDE.has_text_diff is such a good idea, because the underlying DVCS already has a synchronisation mechanism, so why redo it ?
Maybe what is needed is just to exchange with ActivityPub the metadata describing the different events/activities of the actors (project, issue, repository, pipelines, merge requests, etc), but not the code itself ?
As I said, I know little about ActivityPub and one thing I have not understood is why the content is sent when doing outbox.addActivity(), instead of just stating “the object at this url changed” and letting the client GET the new version at this url if it wants it.
We are dealing with data that is exposed on the web and identified by a URL, why should we encapsulate that data into a new protocol instead of letting the client GET the data using HTTP ?
I agree that this is the better option: sending the URL where the object is located makes more sense than sending the object itself.
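In ActivityStreams the object of an activity may be either an embedded object or just its URL, so this "send the URL, let the receiver GET it" idea can be sketched as follows. The use of an Update activity and the names notify_changed, resolve_object and http_get_json are assumptions for illustration; the fetch function is passed in so the logic can be exercised without network access.

```python
import json
import urllib.request

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"


def notify_changed(actor_url, object_url):
    """Build an activity that only references the changed object by URL."""
    return {
        "@context": AS_CONTEXT,
        "type": "Update",  # assumption: the activity type is still to be decided
        "actor": actor_url,
        "object": object_url,
    }


def http_get_json(url):
    """Dereference a URL, asking for an ActivityStreams representation."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/activity+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def resolve_object(activity, fetch=http_get_json):
    """Return the activity's object, fetching it when only a URL was sent."""
    obj = activity["object"]
    if isinstance(obj, str):
        return fetch(obj)
    return obj
```

A receiver that does not care about the object can simply skip calling resolve_object(), which is the point of sending only the URL.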
The rest of your message gives me a lot to think about and it will take me a little longer to reply
This is precisely the use case I’m most interested in. Could not agree more
Very good point and I was going in the wrong direction, I stand corrected. Looking at the PeerTube documentation and trying to transpose it to the context of fedeproxy:
- A PeerTube server is the equivalent of a software project (e.g. https://github.com/ceph/ceph)
- A PeerTube video (represented by this ActivityPub extension) is the equivalent of a commit in the software project
- The protocol to fetch a PeerTube video (i.e. http, webtorrent, etc.) is the equivalent of the repository protocol (i.e. git protocol, mercurial protocol)
- The format of the content of a PeerTube video (i.e. mp4, webm) is the equivalent of the format of the data contained in the repository (i.e. anything really)
To continue with the comparison with PeerTube, there are multiple protocols and formats, including a custom REST API and ActivityPub with extensions. In the case of fedeproxy there are the git/mercurial protocols, and what we’re discussing here is what should be communicated via the ActivityPub protocol. And I also think interactions with software development issues via ActivityPub messages are in scope.
I spent time looking at
- https://www.w3.org/TR/json-ld/ and now better understand @nchauvat’s advice regarding https://ontologi.es/ (doap-bugs + doap-changesets + doap-tests + doap-deps) because it really is designed for reusing RDF with JSON
in addition to the PeerTube documentation and the associated code. It begins to make sense but I’m still unsure about how to put it all together. The above use case can be rewritten as:
- On a given project P on GitLab
- User U comments on an issue for P on GitLab
- Fedeproxy for GitLab:
  - uses the GitLab API to get the issue
  - saves it in a file using the format used by GitLab import/export
  - commits the file in the repository of the project (in a branch dedicated to saving issues)
  - publishes the permalink of the commit including U’s comment as an ActivityPub activity in the outbox of user U on GitLab
- Fedeproxy for GitHub polls the fedeproxy for GitLab and, upon reception of the permalink:
  - fetches the commit from the GitLab repository
  - reads the content of the file describing the issue and uses the GitHub API to apply the differences, e.g. for each of U’s comments it verifies that the comment is up to date (creates a new one if it does not already exist, updates the message if it exists but does not contain the same thing, removes it if it is absent)
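The polling side of this use case could look roughly like the sketch below. It assumes the outbox has already been fetched and parsed into a dict shaped like an ActivityStreams OrderedCollection (or one page of it) whose activities carry the commit permalink as a plain URL in object; the function name new_permalinks and the example URLs in the usage are hypothetical. ActivityPub presents ordered collections newest first, so the items are walked in reverse to apply commits in chronological order.

```python
def new_permalinks(outbox_page, seen_ids):
    """Return commit permalinks from activities not processed yet.

    outbox_page: parsed OrderedCollection (or page) with "orderedItems".
    seen_ids: set of activity ids already handled; updated in place.
    """
    fresh = []
    # Items are newest first, so reverse to get oldest first.
    for activity in reversed(outbox_page.get("orderedItems", [])):
        activity_id = activity.get("id")
        if activity_id in seen_ids:
            continue
        seen_ids.add(activity_id)
        obj = activity.get("object")
        if isinstance(obj, str):  # only URL references are permalinks
            fresh.append(obj)
    return fresh
```

Each returned permalink would then be used to fetch the commit through the regular git or mercurial protocol, with seen_ids persisted between polls.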
I realized that although it would be nice to extend ActivityPub to represent issues and other aspects of a software project in the way forgefed does, it is only one of several ongoing efforts to have a format representing a software project (other than the code itself):
- https://ontologi.es/ (doap-bugs + doap-changesets + doap-tests + doap-deps)
The above use case proposes to use https://docs.gitlab.com/ce/user/project/settings/import_export.html because it is easier in the context of GitLab. But that may be a mistake because it will never evolve into a standard. However I’m conflicted about the other two because I’m not entirely sure they will evolve into a standard either and they would be significantly more complicated to use to represent all aspects of issues and pull/merge requests that are in the scope of fedeproxy.
This is where I am today
I reached a point where I need to write code to better understand how all this could work. And since I’m very new to all this the chances are very high that all of it will be thrown away once I have a better understanding. I’m leaning towards:
- Representing issues using the GitLab import/export format
- Storing them in a dedicated branch of the code repository of the software project
- Announcing issues in the same way PeerTube does with videos using the permalink of a commit as the object instead of the permalink of a video
- Copy-pasting ActivityPub code from activitystreams2
Thank you for this writeup.
I had overlooked the fact that you wanted to set up a proxy that would use the API to get the data, then expose that data using ActivityPub. For some reason I was thinking about modifying GitLab, but of course this cannot be done with GitHub. Ok.
That also explains why you want to export everything, store it onto the disk and diff it the next time you query the project data.
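That export-then-diff cycle can be illustrated with the standard library alone. In this sketch a deterministic JSON dump stands in for the real export format (the actual GitLab import/export layout is not reproduced here), and export_issue, issue_diff and the file path are hypothetical names chosen for the example.

```python
import difflib
import json


def export_issue(issue):
    """Serialize an issue deterministically so that diffs stay minimal."""
    return json.dumps(issue, indent=2, sort_keys=True) + "\n"


def issue_diff(previous, current, path):
    """Return a unified diff between two exports of the same issue file."""
    lines = difflib.unified_diff(
        previous.splitlines(keepends=True),
        current.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(lines)
```

For instance, exporting an issue before and after a comment is added and passing both strings to issue_diff() yields a patch in which the new comment shows up as added lines, while two identical exports yield an empty string.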
Could we separate the two ? There is one communication channel between the forge and the proxy and another one between the proxy and the fediverse using ActivityPub.
My comments were targeted at the second communication channel. This is the place where I think using RDF written as JSON-LD and the doap* ontologies makes the most sense, because the doap data model is already in use.
For example https://www.cubicweb.org/project/cubicweb?vid=doap is a description of the cubicweb project using DOAP. It is serialized as RDF/XML instead of RDF/JSON-LD, but that is a detail. What is important is that any application that can load and transform DOAP data can make sense of it. I am convinced this is something to leverage and build on.
Announcing issues in the same way PeerTube does with videos using the permalink of a commit as the object instead of the permalink of a video
I look forward to this
A minimal viable product could be “fedeproxy is able to publish issues using ActivityPub and other tools of the fediverse can comment on these issues and fedeproxy will write the comments back to the forge”.
What do you think ?
Exactly. Hence the “proxy” part of the project, to be able to do something when modifying the server code is not an option, which is true for GitHub.com but also for GitLab.com. However I’m convinced using the API and a proxy must be designed as a temporary measure that will disappear once the forges have federation natively implemented. Which is really tricky because it’s so easy to forget the “temporary” aspect when working on a project. One way to achieve that is to translate every aspect of the proxy into merge requests for GitLab/Gitea and try really hard to get them merged. The discussion and resistance, the need to split such merge requests into tiny baby steps, etc. should keep fedeproxy from drifting into something that is too far ahead and (most likely) disconnected from what is ultimately desirable.
That sounds sane, is it something like:
- GitHub <- GitHub API -> fedeproxy
- fedeproxy <- ActivityPub -> fediverse
Understood, thanks for clarifying. I’d like to say it’s crystal clear but I’m afraid not. Thanks to your explanations it now makes a lot more sense. 48h ago I was still quite confused about many aspects. But I still think I’m missing a few very important parts of the puzzle.
I should take a look at bots designed to be ActivityPub clients (https://github.com/tootsuite/mastodon-bridge, https://github.com/yogthos/mastodon-bot) and others with similar purposes (https://github.com/zedeus/nitter).
In an earlier comment I kind of dismissed this idea saying it’s in scope and implying that it may not be something that needs to be implemented in a first iteration. I’m still unconvinced but I feel that I’m missing something. You see something that I don’t and that gives me pause.
During breakfast this morning I realized fedeproxy should be able to leverage the UI of Mastodon to (for instance) express: “https://github.com/ceph/ceph” follows “https://mygitlab.com/myuser/ceph”. If, behind the scene, fedeproxy makes it so both are seen as ActivityPub conformant servers, maybe that would work. Or maybe it’s twisted. In any case, once a software project talks ActivityPub, there can be bots interpreting what it says in useful ways and humans using existing UIs to perform all ActivityPub conformant operations such as following without reinventing the wheel.
As you can see, it’s still very very fuzzy and confused but it’s making progress. I think
The UX is really simple : enter an RSS URL, choose an ActivityPub username, which can then be followed from the fediverse (read-only, no interaction).
Maybe there is an even simpler approach, which is to use Mastodon instead of implementing a server. A software project following other software projects is a way to express that they are federated. And people with an account on the corresponding forges have control over the Mastodon account of the project. For instance, if I’m the owner of https://github.com/ceph/ceph I can log in as the Mastodon user matching the project and set it to follow https://mygitlab.com/ceph/ceph. The fedeproxy server is a client of the Mastodon instance and acts by federating https://mygitlab.com/ceph/ceph and https://github.com/ceph/ceph because it reads from Mastodon that the relationship exists.
Which leaves us with much less code to write. Going back to the data model and vocabulary, I’m still inclined to think implementing the “Commit Push” activity is a better first step than “Adding an issue comment”.