Saturday, April 30, 2011

People you may know

The 'people you may know' feature on Facebook is fascinating and creepy. I guess you could say the same for many things that Facebook does, like telling you where and when you met someone based on a series of assumptions. Anyway, the point is that the little friend-suggestion sidebar is always staring at me, and I have looked at the expanded page a couple of times before, but I decided to take a more detailed look at what it was up to.

I scrolled down the page and took notes on (1) which row of friend suggestions each person showed up in, where the top row = 1; (2) the number of mutual friends reported; and (3) whether I actually knew the person. Here, I categorised "people I know" as anyone I would recognise and talk to if I ran into them on the street, and who would most likely do the same. "People I know of" are generally people I know of through other people and may have met once. I might recognise them on the street, but talking to them would probably be awkward or creepy. Everyone else counted as "people I don't know". I scrolled and recorded until I got bored of scrolling and writing, which is of course an extremely systematic way to collect data. Still, that came to a decent sample of 242 people. It turns out I don't know most people on that page.

When it suggests people it thinks you might want to 'friend', Facebook tells you how many Facebook-friends you have in common. So I took a look at how good an indicator this is for predicting whether you actually know someone. Here are the distributions for the number of Facebook-friends I had in common with the people who were suggested, sorted by whether I knew them. The arrows indicate median values. The median number of mutual Facebook-friends did increase across the categories, though it is similar for people I know and people I know of.

But the real question is, how well does Facebook's metric of 'number of mutual Facebook-friends' predict whether I might actually want to be Facebook-friends with friend suggestion X? That's the basic purpose behind this annoying little sidebar on Facebook, right? So I collapsed the first two categories (people I don't know, and people I know of...but not well enough to 'friend' without being a creeper) into one where "Consider Facebook-friending = 0", and the third category, people I know, became "Consider Facebook-friending = 1".*

Here is a logistic regression I ran in Stata with the number of mutual friends as a single predictor. It turns out there is a statistically significant relationship (P<0.001), although the model is not a particularly good fit to the data. On average, though, the odds that I will actually be interested in friending someone increase by 13% with each additional mutual Facebook-friend.
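
For anyone who doesn't think in odds ratios, here is a rough Python sketch of what "13% higher odds per mutual friend" implies. It is not the Stata model itself: the only number taken from the regression is the odds ratio of 1.13, and the baseline odds of 0.1 is a made-up value used purely for illustration.

```python
import math

ODDS_RATIO = 1.13  # from the post: 13% higher odds per extra mutual friend


def odds_multiplier(extra_friends, odds_ratio=ODDS_RATIO):
    """Factor by which the odds change for a number of extra mutual friends."""
    return odds_ratio ** extra_friends


def odds_to_probability(odds):
    """Convert odds (p / (1 - p)) back to a probability p."""
    return odds / (1.0 + odds)


# A logistic regression coefficient is the natural log of the odds ratio.
coefficient = math.log(ODDS_RATIO)

# Hypothetical baseline odds, chosen only for illustration.
baseline_odds = 0.1

for extra in (0, 1, 5, 10):
    odds = baseline_odds * odds_multiplier(extra)
    print(extra, round(odds, 3), round(odds_to_probability(odds), 3))
```

Because the effect is multiplicative, ten extra mutual friends multiply the odds by 1.13^10, which is roughly 3.4, rather than adding 130%.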

I guess that is basically saying what everyone kind of knows already: that if you have more friends in common with someone, you are more likely to know them, even in Facebook-world. So if Facebook was aiming to suggest people that you are likely to click on/friend on its "people you may know" page, it would start the list with people who had more mutual Facebook-friends and go down from there, right? I guess not. This is what I got plotting the number of mutual friends for each person against how far down the page they were (row number). It is rather strange. There is a nice negative relationship starting at the 26th row (a point which any person who wasn't looking for useless data would be unlikely to get to) and a big mess in all the top rows. I have no idea what is going on here...





* I did not actually friend any of them. There are already too many people on my Facebook.

Thursday, April 28, 2011

Student activity monitor!

The blog has been quiet for a while, but I've been putting that time to good use by collecting data. Some time ago I realised that this little panel on the MyCourses* page had been staring at me all along. It tells you how many people from each of your classes are logged on to the system (and you can see who they are). So I basically have a way to track student activity over time. Over the past 2+ weeks I have been recording the time and the number of online students by course every time I logged on, with at least 20-minute intervals between consecutively recorded data points.
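
If the readings were logged in a file rather than by hand, the "at least 20 minutes apart" rule could be applied after the fact. This is only a sketch under that assumption: the function name `thin_readings` and the example timestamps and counts are invented for illustration.

```python
from datetime import datetime, timedelta

MIN_GAP = timedelta(minutes=20)  # minimum spacing between kept readings


def thin_readings(readings, min_gap=MIN_GAP):
    """Keep only readings taken at least min_gap after the last kept one.

    readings: list of (timestamp, students_online) tuples, in any order.
    """
    kept = []
    for timestamp, count in sorted(readings):
        if not kept or timestamp - kept[-1][0] >= min_gap:
            kept.append((timestamp, count))
    return kept


# Made-up example readings, not real data.
readings = [
    (datetime(2011, 4, 14, 9, 0), 12),
    (datetime(2011, 4, 14, 9, 10), 15),  # dropped: only 10 minutes after 9:00
    (datetime(2011, 4, 14, 9, 25), 18),
    (datetime(2011, 4, 14, 9, 45), 20),
]

for timestamp, count in thin_readings(readings):
    print(timestamp.strftime("%H:%M"), count)
```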


The UCS, UFB and CCB Elections category is a dummy course that contains candidate statements, endorsements, etc. for the elections, and presumably all undergraduates are 'enrolled' in it on the system. This gives me a large enough sample to look at general patterns of student activity over a typical 24-hour weekday. It is actually quite interesting! It looks like peak hours of academic activity are in the mid-afternoon (2-3ish) and at night. In the morning there is a steep increase from 9 to noon, and there is a noticeable drop around dinnertime.

Of course, this is as much a graph of my own activity as of general student activity (since I can only get the numbers when I log on to the system myself). I am almost never up past midnight and I generally get up between 7 and 7.30am, so there is a big data hole between midnight and 7. I would love to fill in some of it, particularly the midnight to 2? 3? 4? am part, because I want to know what time the number starts falling and people go to bed. But my sleep > data on other people's sleep, so I may never know. (If you are a late night-early morning worker and want to help me collect data on this you are most welcome to!)

Breaking it down by class, these are trends for physics 40 (which I take) and bio 42 - ecology (which I TA). These are standardised by the total class size (physics has slightly more than 3 times as many enrolled students as ecology). They both show similar trends of steep increase in the morning, but are much more variable for the rest of the day (though if you take the average it would be pretty much a straight line from noon to midnight). My favourite part of these graphs is the outliers :)
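
Standardising by class size just means dividing the number of students online by the number enrolled, so that classes of different sizes can share an axis. A minimal sketch follows; the enrolment figures are hypothetical placeholders (picked only so that physics is a bit more than 3 times ecology), not the real class sizes.

```python
# Hypothetical enrolment figures, for illustration only.
CLASS_SIZE = {"physics": 190, "ecology": 60}


def standardise(online, enrolled):
    """Return the fraction of the class that is online."""
    if enrolled <= 0:
        raise ValueError("enrolled must be positive")
    return online / enrolled


# The same raw count means very different things in the two classes.
for course, size in CLASS_SIZE.items():
    print(course, round(standardise(12, size), 3))
```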

I think this is something worth looking at again next semester, over a longer period of time. It might also be fun to compare weekends vs. weekdays and...so many things.


*people from Singapore: MyCourses is pretty much exactly like the NUS IVLE, but with a clunkier interface.

Sunday, April 17, 2011

Algae people!

I attended my second scientific conference in five weeks (also my second conference EVER) this weekend: the 50th anniversary symposium of the Northeast Algal Society at Woods Hole. As might be expected from a group that focuses on algae, this was a much smaller, more intimate conference than the Benthic Ecology Meeting, and they gave everyone a full list of attendees' names, affiliations and contact information. AKA graphable data. So here we go...

I wanted to look at where people were from, i.e. which institutions sent the most algae people out to Woods Hole. This graph is something similar to a rank abundance curve, with institutions ranked by the number of attendees on the abscissa and the number of attendees on the ordinate (the terms 'abscissa' and 'ordinate' make me happy because they are so ridiculously obscure). The institutions with the biggest contingents were: University of New Brunswick, UConn, URI, UNH and Northeastern. I was the only one from Brown :)


Here is a plot of how far people were from their home institution. It is based on point to point distance measurements in Google Earth, so it most certainly underestimates the actual distance traveled by each person (especially because Cape Cod is a funny shape).
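
A point-to-point distance like this can also be approximated without Google Earth using the haversine formula, which gives the great-circle distance between two latitude/longitude points on a spherical Earth. The sketch below is not what Google Earth does internally, and the coordinates for Woods Hole and Providence are rough values supplied here for illustration, not the measurements behind the plot.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, spherical approximation


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in miles between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Approximate coordinates (assumed for this example).
WOODS_HOLE = (41.53, -70.67)
PROVIDENCE = (41.83, -71.40)

print(round(great_circle_miles(*PROVIDENCE, *WOODS_HOLE), 1))
```

As with the Google Earth measurements, this is a straight-line figure, so it will be shorter than any route a person could actually drive.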


Woods Hole is awesome and today was sunny so I was very happy. Here is a picture from near the conference building.

Wednesday, April 13, 2011

Minimum ages

I spent most of today making regression models to analyse nutritional data from the McDonald's menu for my statistics class. It was surprisingly fun.

Unrelatedly, here is a quick graph.


Make of it what you will.

I'm not voting in this year's election because I no longer have an address in Singapore and therefore I no longer have a constituency :(

Sunday, April 10, 2011

My mileage*

I should be studying physics, but I found a great distraction: graphing the total (cumulative) distance I've travelled since the start of freshman year. This includes travel by air, road, rail and water that is from one town/city to another (so it excludes general day to day moving around, going to the grocery store etc. but includes things like Providence to Newport or Nahant).

I was quite impressed. I've done slightly over 120,000 miles, which is almost 5 times around the Earth, and about halfway to the moon. I haven't quite made the entire Canadian coastline, though...
I'm going to keep this dataset and re-graph it when I graduate. How many miles can one travel in 4 years of undergrad education? We shall see.
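
For the curious, the comparisons are simple division, and the running total is just a cumulative sum over the trips. In this sketch the Earth's circumference and the Earth-moon distance are standard approximate figures; the three trip lengths passed to `cumulative` are invented examples, not rows from my dataset.

```python
from itertools import accumulate

EARTH_CIRCUMFERENCE_MILES = 24_901   # equatorial, approximate
EARTH_MOON_DISTANCE_MILES = 238_855  # average distance, approximate

total_miles = 120_000  # "slightly over 120,000 miles" from the post


def cumulative(trip_miles):
    """Running total of distance travelled, one entry per trip."""
    return list(accumulate(trip_miles))


laps_around_earth = total_miles / EARTH_CIRCUMFERENCE_MILES
fraction_to_moon = total_miles / EARTH_MOON_DISTANCE_MILES

print(round(laps_around_earth, 2), round(fraction_to_moon, 2))

# Invented trip lengths in miles, for illustration only.
print(cumulative([40, 40, 9500]))
```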



* Obviously I am using miles because I have lived in the US too long.

Friday, April 1, 2011

"Spring Break"


My next real break will probably occur in 2012. At least spring will get here sooner...