Sunday, January 17, 2016

Graphing Every Debian Package

I decided to try to graph every Debian package and its dependencies. And thus was born the Blogger label "Not Worth The Time Invested", with which this post is tagged.

The first part was fairly easy to do: gather data. I wrote a short PHP script to read the package lists and write a DOT digraph with every package pointing to its dependencies. This file was sorta big, but that was expected. I then tried to put it through Graphviz. This was an utter failure, and the search for better graph rendering software was on.
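For the curious, here's roughly what that step looks like. This is a minimal sketch in Python rather than my original PHP script, and it makes some assumptions: the input looks like an APT `Packages` index (blank-line-separated stanzas with `Package:` and `Depends:` fields), each `Depends:` field sits on a single line, and every alternative in an `a | b` clause gets its own edge. The function names `parse_packages` and `to_dot` are made up for this example.

```python
import re

def parse_packages(text):
    """Return {package: [dependency names]} from Packages-style stanzas."""
    graph = {}
    for stanza in re.split(r"\n\s*\n", text.strip()):
        name, deps = None, []
        for line in stanza.splitlines():
            if line.startswith("Package:"):
                name = line.split(":", 1)[1].strip()
            elif line.startswith("Depends:"):
                # Clauses are comma-separated; alternatives use "|".
                for clause in line.split(":", 1)[1].split(","):
                    for alt in clause.split("|"):
                        # Drop version constraints like "(>= 2.14)" and
                        # architecture qualifiers like ":any".
                        dep = re.sub(r"\(.*?\)|\[.*?\]", "", alt)
                        dep = dep.strip().split(":")[0]
                        if dep:
                            deps.append(dep)
        if name:
            graph[name] = deps
    return graph

def to_dot(graph):
    """Render the dependency map as a DOT digraph, package -> dependency."""
    lines = ["digraph debian {"]
    for pkg, deps in sorted(graph.items()):
        lines.append(f'  "{pkg}";')  # keep packages with no edges visible
        for dep in deps:
            lines.append(f'  "{pkg}" -> "{dep}";')
    lines.append("}")
    return "\n".join(lines)

# Tiny made-up sample, not real repository data.
sample = """Package: foo
Depends: libc6 (>= 2.14), python3:any | python, bar

Package: bar
"""
print(to_dot(parse_packages(sample)))
```

Emitting a bare node line for every package is what keeps packages with no listed dependencies in the graph at all.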

Of course, the only real choice here is Gephi. It's a remarkable tool that can render large graphs relatively fast. I then used the OpenOrd layout engine and added labels. My system couldn't handle making a bigger graph, so there are some superclusters that have overlapping labels. However, you can see some labels on the periphery of these clusters, so they'll give you an idea of what is centered in there. For instance, there's a Python supercluster and a libc6 supercluster.

Then you have some isolated nodes and networks. That doesn't mean those packages don't depend on anything; it just means nothing is listed as a dependency for them in the repository.

I made Gephi render a massive 1-gigapixel image (32768 x 32768) and wrote an ImageMagick script to split it into tiles. It resulted in around 20k image tiles, each 256x256. I then used a Google Map to map the graph tile images so that it's sorta browsable. This was the largest resolution I could muster and the best interactivity I could reach.
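To sanity-check the tile numbers, here's a small Python sketch of the arithmetic. It assumes a standard power-of-two zoom pyramid, where each zoom level doubles the number of tiles per side until the full-resolution image is covered. That layout is my assumption for this example, and `pyramid_levels` and `tile_box` are names invented here, not part of the ImageMagick script.

```python
def pyramid_levels(image_px=32768, tile_px=256):
    """Return [(zoom, tiles_per_side)] from one tile up to full resolution."""
    per_side = image_px // tile_px
    levels, zoom, side = [], 0, 1
    while side <= per_side:
        levels.append((zoom, side))
        zoom += 1
        side *= 2  # each zoom level doubles the tiles per side
    return levels

def tile_box(x, y, tile_px=256):
    """Pixel box (left, top, right, bottom) of tile (x, y) within its level."""
    return (x * tile_px, y * tile_px, (x + 1) * tile_px, (y + 1) * tile_px)

levels = pyramid_levels()
full_res = levels[-1][1] ** 2                # tiles at the deepest zoom
total = sum(side * side for _, side in levels)
print(full_res)  # 16384
print(total)     # 21845
```

The full-resolution level alone is 128 x 128 = 16384 tiles, and a complete pyramid of eight zoom levels adds up to 21845, which is in the same ballpark as the "around 20k" figure.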

So here it is. It's sorta a mess, but it works, and you can see the clusters fairly well and (barely) read the labels at the highest zoom level. If anyone can suggest how to make this better, let me know; I'd be happy to learn more about graph rendering!
