"Who Talks (To Whom) - And How Much"

  Also known as, "Plotting the directed magnitude graph of an 
  internally related dataset".

  Author:          Gene Boggs <gene@ology.net>
  Implemented:     Sometime in October 2000
  Conference talk: June 15, 2001 at YAPC::NA in Montreal, Quebec
  Documented:      June 18, 2001 to Present


DISCLAIMER

This was presented as a fragmented free-association.  If you were at my 
talk, you know what I'm talking about.  ;-)

I suppose it's important to say that this all could have been done with 
some other tool that you use proficiently - possibly even something 
made to plot things!  I use Perl, for its high level of expressiveness, 
among other things.

  _______________________________________________________________


THE IDEA

As a kid, I vaguely saw behavior as a system of vectors working on 
different scales.  Walking and riding my bicycle made it pretty obvious.
Later, I saw that things like playing soccer (A.K.A. futbol), and having 
conversations all had vector components.  Eventually every behavior I 
saw was all vectors!  AAHHH!!

This might possibly be some kind of obsessive-compulsive disorder - I'm
not really sure.  ;-)

When I finally decided to study vector calculus, linear algebra and 
discrete math, I got a glimmer of a clue about how to formalize these 
sorts of things.

Anyway, seeing vectors in conversation really made me itch!  But who 
wants to transcribe reams of babble and gibberish?  A court reporter, 
maybe...  When I discovered IRC, and its discrete means of 
communication (i.e. typing), I realized that I could quantitatively 
express a superficial, but sociologically interesting, qualitative 
aspect of conversation: who talks to whom, and how much.  As a geek, 
I was -fully- excited.  My dream of formalizing everything was getting 
closer!

#
# ADD MORE DESCRIPTION HERE!
#


THE GRAPH

Enough babbling!

Here is <A HREF="small_beatles.png">a small graph plot</A>.

This is a polar map with a (currently) linear scale, which shows two 
visible dimensions:  Direction and Magnitude.

Direction is represented by the edges of a directed graph made up of 
the folks talking to each other.

Magnitude is represented by a graduation of concentric rings (or 
"energy shells").  The outermost ring has magnitude zero.  The plot 
center is set at the maximum amount of magnitude considered - in our 
case, this is the total amount of talking of the "top talker".
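
A linear scale like this can be sketched in a few lines of Perl.  This 
is only an illustration, not the code behind the plot: the function 
name ring_radius() and the pixel radius of 200 are made up here.

```perl
use strict;
use warnings;

# Map a magnitude onto a ring radius, linearly:
# magnitude 0 -> outermost ring, maximum magnitude -> plot center.
# ring_radius() and $plot_radius (in pixels) are hypothetical names.
sub ring_radius {
    my ($magnitude, $max_magnitude, $plot_radius) = @_;
    return $plot_radius if $max_magnitude <= 0;   # nobody talked at all
    return $plot_radius * (1 - $magnitude / $max_magnitude);
}

# Made-up magnitudes, with 40 as the "top talker" total.
printf "%-6s %6.1f\n", $_->[0], ring_radius($_->[1], 40, 200)
    for [ John => 40 ], [ George => 10 ], [ Ringo => 0 ];
```

With these numbers the top talker lands on the center (radius 0) and 
someone who never talks lands on the outermost ring (radius 200).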

The interesting thing here (to me) is that these two are independent 
variables.

Anyway, you will notice that John and Paul are the "top talkers", 
whereas George and Ringo are farther from the center.

The arrows may be small and hard to resolve, but you will notice that 
Ringo never talks to anyone and John only talks to Paul.  Another 
sociologically interesting observable is that George tries to 
talk to everyone, but only Paul cares.

This is all made-up and arbitrary, of course.

Direction and magnitude are given by a simple, square matrix (a person's 
directed communications) plus an extra column for "public" (or 
undirected) expressions, like this:


       +--------+---------------------------------+
       | Public | John  Paul  George  Ringo  Gene |
       |--------+---------------------------------|
John   |   .    |  0     .      .       .     .   |
       |        |                                 |
Paul   |   .    |  .     0      .       .     .   |
       |        |                                 |
George |   .    |  .     .      0       .     .   |
       |        |                                 |
Ringo  |   .    |  .     .      .       0     .   |
       |        |                                 |
Gene   |   .    |  .     .      .       .     0   |
       +--------+---------------------------------+


The dots represent the i,j'th matrix values, and can be any 
non-negative number.  The public component of a person's row can also 
be any non-negative number - zero would mean that they only talk with 
others individually or privately.

I suppose that John could have talked to himself, in which case the 
value in his "John" column would be greater than zero.  :-)

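The magnitude for a person is the sum of all the values in their row, 
including the public column.

The direction is given by each non-zero value in a person's row (other 
than the public column): a non-zero entry means that the person talked 
to whoever is named by that column.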

These make up the two most visible dimensions of the graph.
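
One possible way to write these two definitions down, as a sketch: for 
n people, let a_ij be the amount person i directed at person j and p_i 
the amount person i said in public.  The symbols a_ij, p_i, m_i and E 
are labels introduced here for illustration, not notation from the 
talk.

```latex
\[
  m_i = p_i + \sum_{j=1}^{n} a_{ij},
  \qquad
  E = \{\, (i, j) : a_{ij} > 0 \,\}
\]
```

Here m_i is the magnitude of person i and E is the set of directed 
edges of the graph.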


#
# ADD THE RIGOROUS MATHEMATICAL NOTATION!
#


OK.  Here is the simple <A HREF="../perl_code/whotalks.perl#dataset">
hash in which our data lives</A>.  It is a standard Perl hash of 
hashes, where the keys are the people who are talking, and the values 
are hashes of "who they talked to and how much".
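
As a rough sketch of how magnitude and direction fall out of such a 
hash of hashes: the numbers below are made up, and the 'public' key 
for undirected talk is an assumption of this example, not necessarily 
how the linked dataset stores it.

```perl
use strict;
use warnings;
use List::Util qw(sum0 max);

# Made-up numbers in an assumed layout: speaker => { listener => amount }.
# The 'public' key for undirected talk is an assumption of this sketch.
my %talk = (
    John   => { public => 10, Paul => 30 },
    Paul   => { public => 20, John => 15, George => 5 },
    George => { public => 2,  John => 1, Paul => 4, Ringo => 1 },
    Ringo  => { public => 3 },
);

# Magnitude: the sum of everything in a person's row, public included.
my %magnitude = map { $_ => sum0(values %{ $talk{$_} }) } keys %talk;

# Direction: one edge per non-zero, non-public entry.
my @edges;
for my $from (sort keys %talk) {
    for my $to (sort keys %{ $talk{$from} }) {
        next if $to eq 'public' or !$talk{$from}{$to};
        push @edges, [ $from, $to ];
    }
}

# The largest magnitude is what the plot center stands for.
my $top = max(values %magnitude);

printf "%-6s %3d\n", $_, $magnitude{$_} for sort keys %magnitude;
print "top magnitude: $top\n";
print "$_->[0] -> $_->[1]\n" for @edges;
```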

A third "quasi-dimension" is the computed radial position, that is, 
where on its magnitude ring a node sits.  Currently, this is set at a 
random value (ugh!) - a random position of a node at its magnitude.

To see the actual node positioning code, look at the Node::get_coord() 
function in the ../perl_code/Node.pm file.
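
For readers without the source at hand, the general idea can be 
sketched as follows.  This is not the Node::get_coord() code: the 
function random_coord() and its parameters are hypothetical, and it 
simply picks a random angle on a ring of a given radius and converts 
the polar coordinate to x,y.

```perl
use strict;
use warnings;
use Math::Trig;   # exports the constant pi

# Hypothetical stand-in for the real positioning code: given the radius
# of a node's magnitude ring and the plot center, pick a random angle
# on that ring and convert the polar coordinate to Cartesian x,y.
sub random_coord {
    my ($radius, $center_x, $center_y) = @_;
    my $theta = rand(2 * pi);               # angle in radians, 0 .. 2*pi
    return (
        $center_x + $radius * cos($theta),
        $center_y + $radius * sin($theta),
    );
}

# A node on a ring of radius 150 in a plot centered at (250, 250).
my ($x, $y) = random_coord(150, 250, 250);
printf "node at (%.1f, %.1f)\n", $x, $y;
```

Wherever the node lands, its distance from the plot center stays equal 
to the ring radius, so the magnitude reading is unaffected by the 
random angle.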

The graph is actually built with the Perl module Graph::Directed and is 
"slapped" on top of a polar magnitude scale (via the GD Perl module), 
with a bit of math (via Math::Trig).

The fact that we are dealing with an actual directed graph means that we
can ask complicated questions ("interesting observables") about 
individuals, or about a group of any size as a whole, and get definite 
answers back.
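
A couple of such questions, sketched with the Graph::Directed module 
from CPAN.  The edges are the made-up Beatles conversation, typed in 
by hand for this example rather than read from the real dataset.

```perl
use strict;
use warnings;
use Graph::Directed;   # from the CPAN Graph distribution

# Build the made-up conversation graph: an edge means "talks to".
my $g = Graph::Directed->new;
$g->add_vertex($_) for qw(John Paul George Ringo);
$g->add_edge(@$_) for
    [ John   => 'Paul' ],
    [ Paul   => 'John' ], [ Paul   => 'George' ],
    [ George => 'John' ], [ George => 'Paul' ], [ George => 'Ringo' ];

# Who does George talk to, and who talks to him?
my @talks_to  = sort { $a cmp $b } $g->successors('George');
my @talked_by = sort { $a cmp $b } $g->predecessors('George');
print "George talks to: @talks_to\n";
print "George hears from: @talked_by\n";

# Who never addresses anyone directly?
my @silent = grep { $g->out_degree($_) == 0 }
             sort { $a cmp $b } $g->vertices;
print "Never talks to anyone: @silent\n";
```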


THOUGHTS

At any given instant, the graph could look completely different, but
over the larger scale, patterns emerge and disappear.  This crazy type 
of dynamics is otherwise known as...  CHAOS!

The "whotalks" analogy is an attempt at a rigorous treatment of the
"sociogram".


THE FUTURE


- plot "heavier" nodes first, with the greatest (linear) distance between them.

- use a (e.g. linear) system of equations to find the optimal position of 
  related data points (and thus the edges between them), instead of a 
  random radial position.

- use force directed and follow-your-nose methods to plot "sub-graphs".


- rewrite IRC log parsing and re-parse/plot the graphs (both Beatles and
  #perl plots).

  * DONE with the help of coral and japhy.

- use hyperbolic view techniques.

  * Ken Williams (A.K.A. DrMath) suggested this excellent idea and pointed 
    out this site:  http://www.geom.umn.edu/docs/research/webviz/webviz/

- plot points on the appropriate concentric ring based on time, rather than 
  "optimal linear relation".

  * Mark Rogaski (A.K.A. wendigo) suggested this excellent idea at the party 
    after the conference.

- this graph might act as an interface to the threads of conversation 
  or even some type of real-time data investigation.  Clicking on a node 
  or edge could trigger an appropriate action.

  * coral pointed out this fascinating, interactive graph, which includes the
    source code:  http://www.metastatic.org/wlm/


- this needs the ability to handle larger scale datasets.

- "Invert" the graph by plotting the number of times a person is -talked to-
  rather than showing their blabbing.  ;-)

- view only select dataset slices.

- use logarithmic scales in different geometries (see above).

- make eye-candy 3+D plots and shadows.

- animations and key framing - temporal, or according to any other 
  dynamic I suppose.

- thread highlighting and coloration of nodes, grid and edge lines.

- use curved edges of varying lengths, thickness and coloration to represent
  magnitude of direct communication (or some other "encoded" dimension).

- OOP subclass modules, instead of just including their functionality.

- For a really good time, use AI::Fuzzy or real number values for partial 
  matrix membership.  How about binary or symbolic matrices?

- use a persistent database with time ranges.

- Uhhhhh... Ummmmmm...


* Please write to me <gene@ology.net> if you have any other cool enhancement
  ideas or problem fixes/clarifications, so that I can include them here.  :-)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

June 2003
IMPORTANT UPDATE:

I finally made a module set to facilitate graph drawing in general.
The subclass that generates the "whotalks" type of drawings described 
above is:

http://search.cpan.org/author/GENE/Graph-Drawing/lib/Graph/Drawing/Random.pm

Yay!