"Who Talks (To Whom) - And How Much" Also known as, "Plotting the directed magnitude graph of an internally related dataset". Author: Gene Boggs Implemented: Sometime in October 2000 Conference talk: June 15, 2001 at YAPC::NA in Montreal, Quebec Documented: June 18, 2001 to Present DISCLAIMER This was presented as a fragmented free-association. If you were at my talk, you know what I'm talking about. ;-) I suppose it's important to say that this all could have been done with some other tool that you use proficiently - possibly even something made to plot things! I use Perl, for its high level of expressiveness, among other things. _______________________________________________________________ THE IDEA As a kid, I vaguely saw behavior as a system of vectors working on different scales. Walking and riding my bicycle made it pretty obvious. Later, I saw that things like playing soccer (A.K.A. futbol), and having conversations all had vector components. Eventually every behavior I saw was all vectors! AAHHH!! This might possibly be some kind-of obsessive-compulsive disorder - I'm not really sure. ;-) When I finally decided to study vector calculus, linear algebra and discrete math, I got a glimmer of a clue about how to formalize these sort of things. Anyway, seeing vectors in conversation really made me itch! But who wants to transcribe reams of babble and gibberish? A court recorder, maybe... When I discovered the IRC, and its discrete means of communication (i.e. typing), I realized that I could quantitatively express a superficial, but sociologically interesting, qualitative aspect. As a geek, I was -fully- excited. My dream of formalizing everything was getting closer! # # ADD MORE DESCRIPTION HERE! # THE GRAPH Enough babbling! Here is a small graph plot. This is a polar map with a (currently) linear scale, which shows two visible dimensions: Direction and Magnitude. 
Direction is represented by the edges of a directed graph made up of the folks talking to each other. Magnitude is represented by a graduation of concentric rings (or "energy shells"). The outermost ring has magnitude zero. The plot center is set at the maximum magnitude considered; in our case, this is the total amount of talking done by the "top talker". The interesting thing here (to me) is that these are two independent variables.

Anyway, you will notice that John and Paul are the "top talkers", whereas George and Ringo are farther from the center. The arrows may be small and hard to resolve, but you will notice that Ringo never talks to anyone and John only talks to Paul. Another sociologically interesting observable is that George tries to talk to everyone, but only Paul cares. This is all made-up and arbitrary, of course.

Direction and magnitude are given by a simple square matrix (a person's directed communications), plus an extra column for "public" (or undirected) expressions, like this:

       +--------+---------------------------------+
       | Public | John  Paul  George  Ringo  Gene |
       |--------+---------------------------------|
John   |   .    |   0     .      .      .      .  |
       |        |                                 |
Paul   |   .    |   .     0      .      .      .  |
       |        |                                 |
George |   .    |   .     .      0      .      .  |
       |        |                                 |
Ringo  |   .    |   .     .      .      0      .  |
       |        |                                 |
Gene   |   .    |   .     .      .      .      0  |
       +--------+---------------------------------+

The dots represent the (i, j)-th matrix values, and can be any positive number or zero. The public component of a person's row can likewise be any positive number or zero; zero would mean that they only talk with others individually or privately. I suppose that John could have talked to himself, in which case the zero in his "John" column would be something bigger. :-)

The magnitude for a person is the sum of every one of their columns, including the public column. A direction (an edge from that person to someone else) is given by each non-zero value in one of a person's named, non-public columns. These make up the two most visible dimensions of the graph.

#
# ADD THE RIGOROUS MATHEMATICAL NOTATION!
#

OK.
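The talk itself used Perl (a hash of hashes fed into Graph::Directed). Purely as an illustration, here is a minimal Python sketch of the same bookkeeping: a person's magnitude is the sum of their row including the public column, every non-zero named column becomes a directed edge, and the linear polar scale puts magnitude zero on the outer ring and the top talker's magnitude at the center. The counts are invented so that they reproduce the made-up Beatles relationships above; the "public" key name, the outer radius of 100, and all function names are hypothetical, and the random angle only mimics the current random placement rather than any optimal layout.

```python
import math
import random

# Hypothetical, invented counts in a talker -> {listener: count} shape,
# mirroring a hash of hashes. "public" is an assumed key for the
# undirected (public) column of the matrix.
TALKS = {
    "John": {"public": 10, "Paul": 15},
    "Paul": {"public": 8, "John": 12, "George": 3},
    "George": {"public": 2, "John": 3, "Paul": 3, "Ringo": 3},
    "Ringo": {"public": 4},
}


def magnitude(talks, who):
    """Sum of every column in a person's row, public column included."""
    return sum(talks.get(who, {}).values())


def edges(talks):
    """Directed edges (talker, listener) for each non-zero named column."""
    return {
        (src, dst)
        for src, row in talks.items()
        for dst, count in row.items()
        if dst != "public" and count > 0
    }


def talks_to(talks, who):
    """Everyone `who` addressed directly (their out-neighbors)."""
    return {dst for src, dst in edges(talks) if src == who}


def answered_by(talks, who):
    """Of the people `who` talked to, those who also talked back."""
    return {dst for dst in talks_to(talks, who) if who in talks_to(talks, dst)}


def radius(mag, max_mag, outer=100.0):
    """Linear polar scale: magnitude 0 -> outer ring, max_mag -> center."""
    if max_mag == 0:
        return outer
    return outer * (1 - mag / max_mag)


def position(mag, max_mag, rng, outer=100.0):
    """(x, y) on the node's magnitude ring, at a random angle."""
    r = radius(mag, max_mag, outer)
    theta = rng.uniform(0, 2 * math.pi)
    return r * math.cos(theta), r * math.sin(theta)


if __name__ == "__main__":
    mags = {who: magnitude(TALKS, who) for who in TALKS}
    top = max(mags.values())
    print(mags)  # {'John': 25, 'Paul': 23, 'George': 11, 'Ringo': 4}
    print(talks_to(TALKS, "Ringo"))  # set()
    print(talks_to(TALKS, "John"))  # {'Paul'}
    print(answered_by(TALKS, "George"))  # {'Paul'}
    rng = random.Random(2001)
    for who, mag in mags.items():
        x, y = position(mag, top, rng)
        print(f"{who}: r={radius(mag, top):.1f} at ({x:.1f}, {y:.1f})")
```

Building the edge set once and querying it is what lets observables like Ringo's empty out-set, John's single listener, and George's lone reply be answered definitely rather than eyeballed from the plot.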
Here is the simple hash in which our data lives. It is a standard Perl hash of hashes, where the keys are the people who are talking, and the values are hashes of "who they talked to and how much".

A third "quasi-dimension" is the computed radial position. Currently, this is set at a random value (ugh!): each node is placed at a random spot on the ring given by its magnitude. To see the actual node positioning code, look at the Node::get_coord() function in the ../perl_code/Node.pm file.

The graph is actually built with the Perl module Graph::Directed and is "slapped" on top of a polar magnitude scale (drawn with the GD Perl module), with a bit of math (via Math::Trig).

The fact that we are dealing with an actual directed graph means that we can ask complicated questions ("interesting observables") about individuals, or about groups of any size as a whole, and get definite answers back.


THOUGHTS

At any given instant, the graph could look completely different, but over longer stretches of time, patterns emerge and disappear. This crazy type of dynamics is otherwise known as... CHAOS!

The "whotalks" analogy is an attempt at a rigorous treatment of the "sociogram".


THE FUTURE

- plot "heavier" nodes first, with the greatest (linear) distance between them.
- use a (e.g. linear) system of equations to find the optimal position of related data points (and thus the edges between them), instead of a random radial position.
- use force-directed and follow-your-nose methods to plot "sub-graphs".
- rewrite IRC log parsing and re-parse/plot the graphs (both the Beatles and #perl plots).
  * DONE with the help of coral and japhy.
- use hyperbolic view techniques.
  * Ken Williams (A.K.A. DrMath) suggested this excellent idea and pointed out this site: http://www.geom.umn.edu/docs/research/webviz/webviz/
- plot points on the appropriate concentric ring based on time, rather than "optimal linear relation".
  * Mark Rogaski (A.K.A. wendigo) suggested this excellent idea at the party after the conference.
- this graph might act as an interface to the threads of conversation, or even to some type of real-time data investigation. Clicking on a node or edge could trigger an appropriate action.
  * coral pointed out this fascinating, interactive graph, which includes the source code: http://www.metastatic.org/wlm/
- this needs the ability to handle larger-scale datasets.
- "Invert" the graph by plotting the number of times a person is -talked to- rather than showing their blabbing. ;-)
- view only selected dataset slices.
- use logarithmic scales in different geometries (see above).
- make eye-candy 3+D plots and shadows.
- animations and key framing, both temporal and according to any other dynamic, I suppose.
- thread highlighting and coloration of nodes, grid and edge lines.
- use curved edges of varying lengths, thickness and coloration to represent the magnitude of direct communication (or some other "encoded" dimension).
- OOP subclass modules, instead of just including their functionality.
- For a really good time, use AI::Fuzzy or real number values for partial matrix membership. How about binary or symbolic matrices?
- use a persistent database with time ranges.
- Uhhhhh... Ummmmmm...

* Please write to me if you have any other cool enhancement ideas or problem fixes/clarifications, so that I can include them here. :-)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

June 2003 IMPORTANT UPDATE:

I finally made a module set to facilitate graph drawing in general. The subclass that generates the "whotalks" type drawings described above is:

http://search.cpan.org/author/GENE/Graph-Drawing/lib/Graph/Drawing/Random.pm

Yay!