DataSpark: Setting the stage for Big Data Analytics

Strata + Hadoop World is one of the world’s leading conferences on big data, featuring some of the most progressive leaders in the industry. It delves into areas that include Big Data in telecommunications and finance; smart cities and urban automation; IoT and intelligent real-time applications; data science and advanced analytics; chat, machine learning, and AI; security, governance and ethics; as well as the issues involved in becoming a data-centric company.

Other topics include design, visualisation, and VR, Hadoop use cases, Hadoop internals and development, production-ready Hadoop, Spark and beyond.

Strata + Hadoop World first came to Singapore in 2015 with DataSpark participating as an exhibitor. The sold-out conference also featured two speakers from DataSpark covering how the telco landscape could be invigorated by using data assets to create new applications, as well as the use of telco data to monitor traffic in Singapore.

Following the resounding response to the inaugural conference in Singapore, DataSpark participated again in this year’s Strata + Hadoop World 2016 as an exhibitor, with a booth in the Sponsor Pavilion.

As a thought leader in mobility intelligence, DataSpark also had two speakers featured at the conference.

Mobility as a vital sign of people and the economy

Ying Shao Wei, Chief Operating Officer, spoke on “mobility as a vital sign of people and the economy”. The audience learnt how telco-enabled insights could provide deep, refreshing and actionable perspectives on the health of urban infrastructure such as road and train systems; the economy, in terms of trade activities and major tourism events; as well as the general well-being of the populace.

These telco-enabled insights were gleaned from the software platforms and data science engines that DataSpark has built to make sense of the interconnected world of digital devices and more than two hundred million users across Singapore, Australia, Indonesia, and Thailand.

The company has successfully applied data science methodologies and techniques, such as data mining and machine learning, to make discoveries about the interactions between users and their increasing number of devices, from mobile phones and tablets to TV set-top boxes and broadband devices.

With its expertise and developments in Big Data and analytics, DataSpark is well-positioned to ride the wave of Big Data adoption that the industry and governments are looking to embrace.

The Singapore government, for example, has announced a fifty percent increase in budget to allocate S$30 billion to build infrastructure, including Changi Airport’s new Terminal 5 and improvements to public transport and housing. This is in line with the government’s increase in global infrastructure spending last year.

Ying shared how DataSpark’s software was helping government agencies and the travel and hospitality sector understand tourism trends in Southeast Asia.

Information was collated using anonymised and aggregated cellular data, which was used to analyse how different segments of international travellers in Singapore exhibited different mobility patterns. “Medical tourists” from the region were observed to generally spend a day at a medical centre, followed or preceded by shopping activities. British tourists tended to leave their hotels later in the morning than leisure visitors from China, having spent longer nights out. When Singaporeans travelled to Thailand on their long weekends (such as this year’s long weekend over National Day), they quickly converged on the Bangkok shopping malls in the mornings!

Ying also explained how DataSpark’s software provided hyperlocal insights about footfall activities at various centres of economic activity, such as retail malls, convention centres and hotels; as well as large-scale tourism events, including F1, the SEA Games, and other national events.

The footfall volume and dwelling time at these events, as well as the profile of the visitors, not only served as important indicators of the economic success of these places and events, but also enabled targeted economic policies and timely interventions by different economic actors. For example, granular and accurate spatio-temporal insights allowed for more personalised marketing: targeting the right place, the right time, and the right segment.
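To make the two indicators concrete, here is a minimal sketch of how footfall and dwelling time could be derived from anonymised sightings. It is an illustration only, not DataSpark's method: the record layout, the sample data and the function name footfall_and_dwell are all assumptions, and dwelling time is simplified to the gap between a visitor's first and last sighting at a venue.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical anonymised sightings: (hashed_visitor_id, venue, timestamp).
SIGHTINGS = [
    ("a1", "mall", datetime(2016, 1, 1, 10, 0)),
    ("a1", "mall", datetime(2016, 1, 1, 10, 45)),
    ("b2", "mall", datetime(2016, 1, 1, 11, 0)),
    ("b2", "mall", datetime(2016, 1, 1, 11, 30)),
    ("c3", "hotel", datetime(2016, 1, 1, 9, 0)),
]


def footfall_and_dwell(sightings):
    """Return {venue: (unique_visitors, mean_dwell_minutes)}.

    Dwell is approximated as last sighting minus first sighting per visitor.
    """
    first_last = defaultdict(dict)  # venue -> visitor -> (first, last)
    for visitor, venue, ts in sightings:
        first, last = first_last[venue].get(visitor, (ts, ts))
        first_last[venue][visitor] = (min(first, ts), max(last, ts))

    result = {}
    for venue, visitors in first_last.items():
        dwell = [
            (last - first).total_seconds() / 60
            for first, last in visitors.values()
        ]
        result[venue] = (len(visitors), sum(dwell) / len(dwell))
    return result


summary = footfall_and_dwell(SIGHTINGS)
```

Segmenting the same calculation by visitor profile or by hour of day would give the "right place, right time, right segment" view described above.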

The real-time insights from DataSpark’s software help the organisers and public authorities better understand how crowds build up and disperse and detect anomalies in the flow of people, enabling a better marshalling of ground resources to ensure public safety.
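One simple way to flag an unusual build-up is to compare each new crowd count with a trailing baseline. The sketch below assumes per-interval head counts for a single location and uses a plain z-score test; the talk did not describe the detection method actually used, so the function flag_anomalies, its window and its threshold are purely illustrative.

```python
from statistics import mean, stdev


def flag_anomalies(counts, window=4, threshold=3.0):
    """Return indices whose count deviates from the mean of the preceding
    `window` counts by more than `threshold` standard deviations."""
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # Skip perfectly flat history to avoid dividing by zero.
        if sigma > 0 and abs(counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical head counts per five-minute interval at one location.
crowd_counts = [100, 104, 98, 102, 101, 180, 103]
anomalies = flag_anomalies(crowd_counts)
```

With this sample data only the sudden jump to 180 is flagged; the interval after it is judged against a baseline that already includes the spike, which is one reason production systems tend to use more robust baselines.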



From telco data to spatial-temporal intelligence APIs: Architecting through microservices

The other speaker from DataSpark was Chandra Sekhar Saripaka, who spoke on how to go “from telco data to spatial-temporal intelligence APIs”, by “architecting through microservices”. Chandra, who is a Senior Data Engineer at DataSpark, delved into the production architecture at DataSpark and how it worked through terabytes of spatial-temporal telco data each day in PaaS mode. He also showcased how DataSpark operated in SaaS mode.

Chandra shared with fellow data scientists attending his talk how the creation of big data solutions demanded a well-thought-through system architecture. This was especially the case for solutions that processed data at terabyte scale and produced spatial-temporal real-time insights at speed.

The architecture would need to support the creation of a data pipeline that involved ingestion, processing, indexing, caching and retrieval of insights; at the end of which a collection of data metrics had to be propagated across all the system layers.
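The shape of such a pipeline can be sketched in a few lines. The example below is a toy, single-process illustration and not DataSpark's architecture: the stage functions, the 'cell_id,hour' record format and the metric names are assumptions, and the caching and retrieval stages are left out for brevity. It shows three stages chained in order, with a small set of metrics collected at every stage.

```python
import time


def ingest(raw_lines):
    """Parse 'cell_id,hour' records, skipping malformed lines."""
    records = []
    for line in raw_lines:
        parts = line.split(",")
        if len(parts) == 2 and parts[1].isdigit():
            records.append((parts[0], int(parts[1])))
    return records


def process(records):
    """Count records per (cell_id, hour)."""
    counts = {}
    for key in records:
        counts[key] = counts.get(key, 0) + 1
    return counts


def index(counts):
    """Group the aggregated counts by hour so retrieval is one lookup."""
    by_hour = {}
    for (cell, hour), n in counts.items():
        by_hour.setdefault(hour, {})[cell] = n
    return by_hour


def run_pipeline(raw_lines):
    """Run the stages in order, recording output size and time per stage."""
    metrics, data = {}, raw_lines
    for stage in (ingest, process, index):
        start = time.perf_counter()
        data = stage(data)
        metrics[stage.__name__] = {
            "items_out": len(data),
            "seconds": time.perf_counter() - start,
        }
    return data, metrics


indexed, stage_metrics = run_pipeline(["c1,9", "c1,9", "c2,9", "bad", "c1,10"])
```

Collecting the same metrics at every stage is what makes it possible to see where records are dropped or where time is spent as data moves through the layers.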

Chandra detailed how DataSpark built DevOps infrastructure into its architecture.

A Docker-based microservices platform running on Mesos provided a resilient service ecosystem for managing modular APIs and packaging infrastructure to support both cloud and on-premises data centres.

For faster and better ETL, Chandra discussed how DataSpark simplified ETL from any source to any sink by componentising Spark APIs alongside other ecosystem tools and relying on Spark’s distributed computing. This supported the ingestion and processing layer and allowed it to operate in both streaming and batch modes.
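The idea of componentised ETL can be illustrated without Spark itself. The sketch below is plain Python, not the Spark API, and every name in it (run_etl, parse_coordinates, tag_region, the record fields and the region cut-off) is hypothetical. It shows a source, a chain of transforms and a sink as interchangeable components, with the same code consuming a finite batch or a lazily produced stream.

```python
from typing import Callable, Iterable, Iterator, List

Record = dict


def run_etl(
    source: Iterable[Record],
    transforms: List[Callable[[Record], Record]],
    sink: Callable[[Record], None],
) -> int:
    """Pull records from the source one at a time, apply each transform in
    order, and hand the result to the sink. Returns the number written."""
    written = 0
    for record in source:
        for transform in transforms:
            record = transform(record)
        sink(record)
        written += 1
    return written


def parse_coordinates(record: Record) -> Record:
    """Convert string coordinates to floats."""
    return {**record, "lat": float(record["lat"]), "lon": float(record["lon"])}


def tag_region(record: Record) -> Record:
    """Attach a coarse region label (the cut-off is arbitrary)."""
    region = "north" if record["lat"] >= 1.35 else "south"
    return {**record, "region": region}


batch_source = [
    {"cell": "c1", "lat": "1.30", "lon": "103.80"},
    {"cell": "c2", "lat": "1.40", "lon": "103.90"},
]


def stream_source() -> Iterator[Record]:
    """Stand-in for a live feed: yields records one by one."""
    yield from batch_source


steps = [parse_coordinates, tag_region]
batch_out: List[Record] = []
stream_out: List[Record] = []
run_etl(batch_source, steps, batch_out.append)     # batch mode
run_etl(stream_source(), steps, stream_out.append)  # streaming mode
```

Because the transforms know nothing about where records come from or go to, swapping the source or sink does not require touching the processing logic.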

Chandra also described how features derived from the data could be exposed as APIs and used in dashboards without sacrificing the data artifacts. This was achieved by streamlining the indexing and caching process to translate the semi-processed data into temporal and spatial collections and caches.

The session concluded with how flexible insights could be generated through APIs across different dates, hours, and minutes over different locations. Chandra shared how the flexibility of APIs through Docker-based microservices could be achieved by querying on temporal and location indices and caches.
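As a rough illustration of querying on temporal and location indices, the sketch below keeps a time-sorted list of counts per location and answers range queries with binary search. The class SpatioTemporalIndex, its methods and the sample data are assumptions made for this example; the talk did not specify the index or cache technology used.

```python
from bisect import bisect_left
from datetime import datetime


class SpatioTemporalIndex:
    """Per-location, time-sorted counts supporting half-open range queries."""

    def __init__(self):
        self._times = {}   # location -> sorted list of timestamps
        self._counts = {}  # location -> counts aligned with _times

    def add(self, location, ts, count):
        times = self._times.setdefault(location, [])
        counts = self._counts.setdefault(location, [])
        pos = bisect_left(times, ts)  # keep both lists sorted by time
        times.insert(pos, ts)
        counts.insert(pos, count)

    def total(self, location, start, end):
        """Sum counts at `location` where start <= timestamp < end."""
        times = self._times.get(location, [])
        lo = bisect_left(times, start)
        hi = bisect_left(times, end)
        return sum(self._counts.get(location, [])[lo:hi])


idx = SpatioTemporalIndex()
idx.add("mall", datetime(2016, 1, 1, 9, 0), 10)
idx.add("mall", datetime(2016, 1, 1, 9, 30), 20)
idx.add("mall", datetime(2016, 1, 1, 10, 0), 5)
idx.add("hotel", datetime(2016, 1, 1, 9, 0), 7)

# The same call serves day-, hour- and minute-level questions.
by_day = idx.total("mall", datetime(2016, 1, 1), datetime(2016, 1, 2))
by_hour = idx.total("mall", datetime(2016, 1, 1, 9), datetime(2016, 1, 1, 10))
by_minute = idx.total(
    "mall", datetime(2016, 1, 1, 9, 30), datetime(2016, 1, 1, 9, 31)
)
```

The granularity of the answer is set entirely by the start and end arguments, which is the kind of flexibility across dates, hours and minutes that the session described.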



(left to right) Ms Ying Li, CTO; David Kurniawan, Product Manager; Andrew Tan, Data Science Consultant


Networking with other thought leaders

During the two days it was set up in the Sponsor Pavilion, DataSpark’s booth attracted much interest from attendees and speakers at the conference, including enquiries on career opportunities with the company.

Staff from DataSpark also took the opportunity to exchange ideas and experiences in Big Data solutioning with other thought leaders and practitioners in the field.

For more information on how you can exploit Big Data and analytics for your business or organisation, or for enquiries on career opportunities with the company, contact DataSpark today.