The PageRank component
To compute the PageRank component of DevRank, the GitHub ecosystem is modelled as a network whose nodes are repositories and developers.
To build the network, we primarily use data that is freely available in the GitHub Archive. When you sign up to Quira, we complement this with extra data points so that we can provide the most accurate approximation of the ecosystem.
We have three types of edges in the network:
Developer -> Repo, which represents stargazer events (a developer stars a repo),
Repo -> Repo, which represents dependencies (a repository lists another repo as a dependency, e.g. in package.json or requirements.txt),
Repo -> Developer, which represents commit events (a developer commits to a repo).
By drawing these edges between stargazers, repos and contributors, we complete a path where reputation flows from the stargazers and dependencies to the relevant contributors.
Taking all such edges across GitHub yields a large directed network in which reputation travels from one developer to another.
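The three edge types above can be sketched as a small graph-building routine. This is only an illustration, not Quira's actual pipeline: the function name `build_network`, the `(kind, name)` node tuples and the input formats are all assumptions made for this example.

```python
def build_network(stars, dependencies, commits):
    """Build a directed network as a dict: node -> set of successor nodes.

    Hypothetical input formats (assumptions for this sketch):
      stars:        iterable of (developer, repo) pairs
      dependencies: iterable of (repo, dependency_repo) pairs
      commits:      iterable of (developer, repo) pairs
    Nodes are (kind, name) tuples so a developer and a repo
    with the same name never collide.
    """
    graph = {}

    def add_edge(source, target):
        graph.setdefault(source, set()).add(target)
        graph.setdefault(target, set())  # make sure every node is a key

    for dev, repo in stars:
        add_edge(("dev", dev), ("repo", repo))    # Developer -> Repo
    for repo, dep in dependencies:
        add_edge(("repo", repo), ("repo", dep))   # Repo -> Repo
    for dev, repo in commits:
        add_edge(("repo", repo), ("dev", dev))    # Repo -> Developer
    return graph


# Toy data: alice stars "lib", "app" depends on "lib", bob commits to "lib".
network = build_network(
    stars=[("alice", "lib")],
    dependencies=[("app", "lib")],
    commits=[("bob", "lib"), ("alice", "app")],
)
```

In this toy network there is a path alice -> lib -> bob, so reputation can flow from the stargazer alice to the contributor bob through the repository both are connected to.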
DevRank runs the PageRank algorithm on the resulting network to compute the stationary probabilities of a random walker, that is, the long-run share of time the walker spends at each node. These raw probabilities indicate the importance of a developer within the network, in the same way that PageRank calculates the importance of web pages. Crucially, a link from a high-profile source in the network is worth more than a link from a lower-profile source.
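To make the random-walker idea concrete, here is a minimal sketch of textbook PageRank computed by power iteration on a toy network. It is not DevRank's implementation: the damping factor of 0.85, the uniform handling of nodes without outgoing edges, the node labels and the toy data are all assumptions chosen for illustration.

```python
def pagerank(graph, damping=0.85, tol=1e-10, max_iter=200):
    """Approximate the stationary distribution of a random walker.

    graph: dict mapping each node to a list of nodes it links to.
    damping: probability of following an edge (assumed value, not
             taken from DevRank); otherwise the walker jumps to a
             uniformly random node.
    """
    nodes = sorted(set(graph) | {v for targets in graph.values() for v in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(max_iter):
        # Rank held by nodes with no outgoing edges is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not graph.get(u))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        # Each node passes its rank on in equal shares along its out-edges.
        for u, targets in graph.items():
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Toy network: "lib" has two stargazers and one dependent repo,
# "tool" has a single stargazer.
toy_graph = {
    "dev:alice": ["repo:lib"],
    "dev:carol": ["repo:lib"],
    "dev:dave": ["repo:tool"],
    "repo:app": ["repo:lib"],
    "repo:lib": ["dev:bob"],
    "repo:tool": ["dev:erin"],
}
ranks = pagerank(toy_graph)
developer_scores = {k: v for k, v in ranks.items() if k.startswith("dev:")}
```

In this toy example bob, who commits to the more widely endorsed "lib", ends up with a higher score than erin, who commits to "tool". The scores across all nodes sum to 1, as expected for a probability distribution.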
In sum, the PageRank component models reputation as a number proportional to the endorsements that can be attributed to your commits.
A white paper about DevRank and Stargazer Reputation is in the works and will be published soon.