---------------------------
PONE-D-14-28071
Performance of Social Network Sensors During Hurricane Sandy
PLOS ONE
---------------------------
Supporting data
---------------------------

1. Disclaimer

The data provided here is intended solely for verification of the study
presented in the paper "Performance of Social Network Sensors During
Hurricane Sandy".

Due to Twitter and Topsy Analytics policies, tweets extracted from this
dataset cannot be publicly displayed as part of a commercial application or
website.

The datasets in the form provided in this collection are the property of the
National Information and Communications Technologies Australia (NICTA).

If you would like to collaborate and use the data for your research, please
contact Dr Manuel Cebrian: manuel.cebrian@nicta.com.au

---------------------------

2. Data content and structure

The collection contains two datasets stored as tables of a PostgreSQL
database:

a) table of tweets (combined_rawdata):
   Contains the following information (refer to the SQL script in Section 4b):
   tweet ID, message text, timestamp, user ID, count of followers,
   count of followees, relative sentiment, absolute sentiment,
   discrete sentiment, retweeted status, offset time (unused), latitude,
   longitude, PostGIS geometry object based on latitude and longitude

b) table of followees (combined_followees):
   Contains the following information (refer to the SQL script in Section 4c):
   user_id, followees list

Note - the list of followees is a concatenated string built from the two
separate sets described in the paper (hashtag and keyword sets), so the
strings often contain duplicated user IDs. When implementing the 'friendship
paradox' method, make sure that your random selection of a friend draws from
a cleaned-up list of unique followee IDs.

---------------------------

3. Repository content

Please refer to http://wiki.datadryad.org/Large_File_Transfer

Large files are available for download, split into 1 GB parts.
The tweets database consists of 5 separate pieces
(rawdata1,rawdata2,...,rawdata5) and the followees database of 9 separate
pieces (followees1,followees2,...,followees9).

Once downloaded, the files can be recombined with the 'cat' command (Unix,
Linux, MacOSX) or the 'copy' command on Windows:

    cat rawdata[1-5] > combined_rawdata.backup
    cat followees[1-9] > combined_followees.backup

---------------------------

4. Restoring the tables

You need to create a PostgreSQL database and restore the tables from the
backups:

    combined_rawdata.backup
    combined_followees.backup

We recommend using pgAdmin with the PostGIS extension enabled.
In pgAdmin you need to:

a) create a database:

    -- Database: sandy
    -- DROP DATABASE sandy;

    CREATE DATABASE sandy
      WITH OWNER = postgres
           ENCODING = 'UTF8'
           TABLESPACE = pg_default
           LC_COLLATE = 'C'
           LC_CTYPE = 'C'
           CONNECTION LIMIT = -1;

b) create tweets table:

    -- Table: combined_rawdata
    -- DROP TABLE combined_rawdata;

    CREATE TABLE combined_rawdata
    (
      tweet_id bigint NOT NULL,
      text text,
      created_at timestamp with time zone,
      user_id bigint,
      user_followers_count integer,
      user_friends_count integer,
      topsy_doc_sentiment_rel numeric,
      topsy_doc_sentiment_abs numeric,
      topsy_doc_sentiment integer,
      retweeted_status_id bigint,
      time_min timestamp with time zone,
      lat_final numeric,
      lng_final numeric,
      geom geometry(Point,4326),
      CONSTRAINT combined_rawdata_tweet_id_pkey PRIMARY KEY (tweet_id)
    )
    WITH (
      OIDS=FALSE
    );
    ALTER TABLE combined_rawdata
      OWNER TO postgres;

c) create followees table:

    -- Table: combined_followees
    -- DROP TABLE combined_followees;

    CREATE TABLE combined_followees
    (
      user_id bigint NOT NULL,
      followee_ids character varying,
      CONSTRAINT combined_followees_user_pkey PRIMARY KEY (user_id)
    )
    WITH (
      OIDS=FALSE
    );
    ALTER TABLE combined_followees
      OWNER TO postgres;

d) restore tables 'combined_followees' and 'combined_rawdata' from the backup
   files, setting the owner to 'postgres'.
e) create indexes on 'combined_rawdata':

    -- Index: combined_rawdata_geom_index
    -- DROP INDEX combined_rawdata_geom_index;

    CREATE INDEX combined_rawdata_geom_index
      ON combined_rawdata
      USING gist
      (geom);

    -- Index: combined_rawdata_user_id_idx
    -- DROP INDEX combined_rawdata_user_id_idx;

    CREATE INDEX combined_rawdata_user_id_idx
      ON combined_rawdata
      USING btree
      (user_id);

---------------------------
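
Note on Section 2 (duplicated followee IDs): the de-duplication step
recommended there could be sketched in shell as follows. This is only an
illustration, not the procedure used in the paper. It assumes that the
followee IDs inside 'followee_ids' are runs of digits separated by some
non-digit delimiter (the exact delimiter is not documented here), and that
GNU 'shuf' is available. The function names unique_followees and
random_followee are hypothetical.

```shell
# Print the unique numeric IDs found in a followee string, one per line.
# Assumption: every run of digits in the string is one user ID, whatever
# delimiter separates them.
unique_followees() {
    printf '%s\n' "$1" | grep -oE '[0-9]+' | sort -u
}

# Pick one followee uniformly at random from the de-duplicated list, so
# that IDs repeated in the raw string are not over-represented.
# Assumption: GNU coreutils 'shuf' is installed.
random_followee() {
    unique_followees "$1" | shuf -n 1
}
```

With the database restored as 'sandy', the string for a single user could be
fetched with, for example,
psql -At -d sandy -c "SELECT followee_ids FROM combined_followees WHERE user_id = <user_id>"
and passed as the argument to random_followee.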