Jurong East

Twitter Sentiment Analysis 🐦😀📈

An investigation of Twitter activity in the Jurong East area

The city is a complex system of elements and interactions. Twitter has more recently allowed people to share messages with the wider public. Some of these messages are geo-located, enabling us to see where they were shared from. This has the potential to help us understand an urban area better, which may in turn enable more informed planning of future requirements.

The data shown here is a capture of publicly available tweets from 2012-2015 in the Jurong East area. It attempts to show some of the key insights from this data. More specifically, it focuses on the use of 'emoji' (small graphics used to represent emotion, amongst other things). These are used as a simple, language-agnostic method of capturing how people are feeling in a particular place.

Base Data

There were over 360,000 tweets in this area over 2015.
This covers a reasonable amount of the Jurong East area and already gives some valuable insight into how heavily various areas are used. We can see that tweet activity correlates strongly with major MRT routes and transportation interchanges, as well as shopping hubs. We can also see that general activity is found in the residential areas, with less social activity in the industrial areas to the south west.

The messages of the tweets cover many languages and sentiments, making it hard to decipher their sentiment easily. However, one approach is to look at the emoji used, which represent a more universally understood set of icons for expressing emotion. Of the original data set, over 40,000 tweets used emoji with the facial expressions below:
😀😬😁😂😃😄😅😆😇😉😊🙂🙃☺️😋😌😍😘😗😙😚😜😝😛🤑🤓😎🤗😏😶😐😑😒🙄🤔😳😞😟😠😡😔😕🙁☹️😣😖😫😩😤😮😱😨😰😯😦😧😢😥😪😓😭😵😲🤐😷🤒🤕😴
These form a set whose emotional sentiment is clearer to interpret than that of, for example, 🤖. Usefully, we can see that the spatial distribution of these tweets represents a reasonable sample of the overall data.
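As a rough illustration of this filtering step, the sketch below keeps only tweets that contain at least one of the face emoji listed above. The tweet structure (a dictionary with a "text" field) and the function names are assumptions made for the example, not the format of the original data. The variation selector U+FE0F, which follows ☺ and ☹ in the list, is removed from the set so that it does not match unrelated emoji.

```python
# Face emoji copied from the list above. set() splits the string into
# individual code points, so the invisible variation selector (U+FE0F)
# that follows two of the emoji is removed explicitly.
FACE_EMOJI = set(
    "😀😬😁😂😃😄😅😆😇😉😊🙂🙃☺️😋😌😍😘😗😙😚😜😝😛🤑🤓😎🤗😏😶😐😑😒🙄🤔😳"
    "😞😟😠😡😔😕🙁☹️😣😖😫😩😤😮😱😨😰😯😦😧😢😥😪😓😭😵😲🤐😷🤒🤕😴"
) - {"\ufe0f"}


def has_face_emoji(text):
    """Return True if the text contains at least one emoji from FACE_EMOJI."""
    return any(ch in FACE_EMOJI for ch in text)


def filter_face_tweets(tweets):
    """Keep only tweets whose (hypothetical) "text" field has a face emoji."""
    return [t for t in tweets if has_face_emoji(t["text"])]
```

A tweet such as "great day 😀" would be kept, while one containing only 🤖 would be dropped.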

Data Aggregation

Due to the nodal distribution of tweets around major centres, it helps to simplify the data based on location. Here we can see clusters of points, with areas that indicate the number of aggregated tweets. Generating collections of non-duplicate nodes of 25m radius and 50m radius shows different levels of detail. It appears that 25m is a good scale for this area, as the details of the MRT and other areas remain visible, whereas 50m makes much of the residential area look uniform and still isn't able to combine the messages around the major hubs. As a result, 25m radius data aggregation has been selected.
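The aggregation method itself is not described here, so the following is only one possible sketch of radius-based aggregation, not the procedure actually used. It assumes tweets are given as (latitude, longitude) pairs and uses a simple greedy rule: each point joins the first existing node within the radius, otherwise it starts a new node. The function names and the greedy rule are illustrative assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def distance_m(a, b):
    """Approximate ground distance in metres between two (lat, lon) points.

    Uses the equirectangular approximation, which is adequate over the
    few kilometres covered by a single neighbourhood.
    """
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2)
    y = lat2 - lat1
    return EARTH_RADIUS_M * math.hypot(x, y)


def aggregate(points, radius_m=25.0):
    """Greedy aggregation (an assumed method): a point joins the first node
    whose centre lies within radius_m, otherwise it becomes a new node."""
    nodes = []  # each node: {"centre": (lat, lon), "count": int}
    for p in points:
        for node in nodes:
            if distance_m(node["centre"], p) <= radius_m:
                node["count"] += 1
                break
        else:
            # No existing node was close enough, so start a new one here.
            nodes.append({"centre": p, "count": 1})
    return nodes
```

Running the same points through `aggregate` with `radius_m=25.0` and `radius_m=50.0` gives the two levels of detail compared above. The greedy rule depends on input order and checks each point against every node, so a spatial index or a dedicated clustering library would be a better fit for a data set of this size.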

We can see a concentration of activity at major transport interchanges such as Jurong East and West stations, as well as more local food courts such as Taman Jurong Food Center. Roads with activity are typically major roads but not large roadways, and often have bypass-type roads near them. Examples are Corporation Drive compared to Corporation Road, or Jurong East Ave 1 on the east side of Jurong Town Hall Road compared to the west side. This seems to indicate that smaller-scale roads and clustering of amenities help drive up social engagement. This data also highlights the relatively under-represented parks such as the Lake District and Bukit Batok Nature Reserve. Even the Bird Park tourist attraction shows very little relative activity.

Time

Time is also an important factor. If we break the tweet data down into a number of discrete time bands, we can see some other features. It is possible to observe the typical activity times. Interestingly, these appear to be focused in the morning and just after midnight. The evening is less active than expected and the day is quite quiet, perhaps highlighting the 'commuter belt' nature of the area. It is also possible that the younger demographic of Twitter affects the activity times.

One of the striking features of this data is the amount of activity occurring during the night. Much of the evening activity is centred around eating and shopping, as is to be expected; however, this is considerably less than in the morning. Similarly, in keeping with a mostly residential area, there are relatively low volumes of afternoon activity centred on people's homes. However, there is a noticeably high amount of activity at night in both residential areas and malls/public areas, and this extends as late as 00:00-04:00.
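A minimal sketch of how tweets could be grouped into time bands is given below. Only the 00:00-04:00 band is taken from the text; the other band boundaries and names are assumptions chosen for illustration, and timestamps are assumed to already be in local Singapore time.

```python
from collections import Counter
from datetime import datetime

# (name, start hour inclusive, end hour exclusive).
# Only the 00:00-04:00 band comes from the text; the rest are assumed.
BANDS = [
    ("late night", 0, 4),
    ("early morning", 4, 7),
    ("morning", 7, 12),
    ("afternoon", 12, 18),
    ("evening", 18, 24),
]


def band_for(ts):
    """Return the name of the band that contains the timestamp's hour."""
    for name, start, end in BANDS:
        if start <= ts.hour < end:
            return name
    raise ValueError(f"no band covers hour {ts.hour}")


def count_by_band(timestamps):
    """Count how many timestamps fall into each band."""
    return Counter(band_for(ts) for ts in timestamps)
```

For example, `count_by_band([datetime(2015, 3, 1, 0, 30), datetime(2015, 3, 1, 8, 15)])` counts one tweet in "late night" and one in "morning".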

Happy Sad

Using the emoji, it is also possible to obtain a sense of the emotion or sentiment of a place. One approach is to code the emoji on a scale from -5 (sad) to 5 (happy). Below is the score given to each of these:
😀:5, 😬:2, 😁:5, 😂:2, 😃:5, 😄:5, 😅:3, 😆:3, 😇:2, 😉:1, 😊:4, 🙂:3, 🙃:1, ☺️:5, 😋:2, 😌:0, 😍:3, 😘:3, 😗:1, 😙:2, 😚:2, 😜:0, 😝:0, 😛:0, 🤑:0, 🤓:1, 😎:2, 🤗:2, 😏:0, 😶:0, 😐:0, 😑:0, 😒:-2, 🙄:-2, 🤔:-1, 😳:-2, 😞:-3, 😟:-3, 😠:-4, 😡:-5, 😔:-3, 😕:-3, 🙁:-4, ☹️:-5, 😣:-4, 😖:-5, 😫:-5, 😩:-5, 😤:-4, 😮:-2, 😱:-3, 😨:-3, 😰:-3, 😯:-1, 😦:-1, 😧:-1, 😢:-4, 😥:-4, 😪:-3, 😓:-1, 😭:-3, 😵:-3, 😲:0, 🤐:0, 😷:0, 🤒:-2, 🤕:-2, 😴:-1

The emoji used in tweets can then be collated and summed using this scale to find the overall score of a collection of messages. This currently counts all emoji in a message, which allows an extremely happy 😊😊😊 message to be counted as more expressive. However, it might also be worth counting just the overall sentiment of a message. From this we can see some interesting qualities, such as the average sentiment, which is 0.38 over all the tweets: most are somewhat positive, but the average is relatively neutral considering that a post with just a 😀 would score 5.0.
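The two scoring options mentioned here can be sketched as follows. The dictionary holds only a small subset of the scale above, and the function names are illustrative. How the 0.38 average was actually computed is not stated, so the per-message mean is shown only as one possible alternative.

```python
# A small subset of the scale above; the remaining entries would be
# added in the same way.
EMOJI_SCORE = {"😀": 5, "😊": 4, "😂": 2, "😐": 0, "😒": -2, "😢": -4, "😡": -5}


def message_score(text):
    """Sum the scores of every scored emoji in the message.

    Repeated emoji are counted each time, so "😊😊😊" scores 12.
    """
    return sum(EMOJI_SCORE.get(ch, 0) for ch in text)


def message_mean_score(text):
    """Alternative: the mean score of the emoji in the message, so that
    repetition does not amplify the result. Returns None when the
    message contains no scored emoji."""
    scores = [EMOJI_SCORE[ch] for ch in text if ch in EMOJI_SCORE]
    return sum(scores) / len(scores) if scores else None
```

With the summed score, "so happy 😊😊😊" gets 12, while the mean score gives 4.0, the value of a single 😊. A message mixing 😀 and 😡 scores 0 under both.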

This data is less clear and better analysis needs to be undertaken. Generally, it appears that 'happier' places tend to be more focused areas around major shopping areas and food establishments, whereas 'sadder' places are typically more dispersed and residential.

These are just some of the insights that can be extracted from social data. Other views could look at features such as differences between weekend and weekday activity, or use the message itself to find specifically what activities are being undertaken and where (sport, dining, shopping etc.).

Whilst it represents a self-selected set of people, the data is useful as it highlights more social activities, which are not typically captured by other data sets. More data can also be introduced: this represents just 2015, so other years can be added, and other sources such as Instagram would also be useful.

A wider study is being undertaken to look at the social activities around transport interchanges at the main MRT stations in Singapore. This comparative research aims to understand the relationship between MRT usage and neighbourhood planning, especially for 'last mile' journeys, and the social use of transit nodes, which this study has shown are acutely nodal and concentrated in Singapore.