Big Data & Real Time Analytics at idealo.de!

Welcome to the next round, with two items on the agenda:

• Talk 1: Kai Wähner, “How to Apply Big Data Analytics and Machine Learning to Real Time Processing”

• Talk 2: Nico Ring (HPI), Lawrence Benson (HPI), Martin Gerlach (idealo), “Moving from DIY clusters to Spark for Data Processing”

We are very happy to have idealo.de as our host this time!

See you then.

Your Big Data Beers Team

Abstract Talk II:

The idealo price comparison platform processes large amounts of e-Commerce data provided by registered online shops. Over the past 15 years, the data volume has been constantly increasing, and so has the need for higher processing speed.
In order to consolidate the current heterogeneous architecture and improve speed and scalability, idealo and the Hasso Plattner Institute are evaluating a new approach using the stateful streaming capabilities of Spark in a joint project. Tasks that have to be performed on the data include normalization, deduplication, product matching and classification.
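
To make two of the tasks named above more concrete, here is a minimal sketch of normalization followed by stateful deduplication over a stream of offers. It is written in plain Python rather than Spark and is not the project's actual pipeline. The `Offer` record, its fields, and the rule "emit an offer only when its price changes" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Offer:
    # Hypothetical record layout; the real shop feeds are not described in the abstract.
    shop: str
    title: str
    price_cents: int


def normalize(offer: Offer) -> Offer:
    """Lowercase the text fields and collapse repeated whitespace."""
    return Offer(
        shop=offer.shop.strip().lower(),
        title=" ".join(offer.title.lower().split()),
        price_cents=offer.price_cents,
    )


def deduplicate(offers: Iterable[Offer]) -> Iterator[Offer]:
    """Yield an offer only when it is new or its price has changed.

    The dictionary is the "state": it remembers the last price seen per key,
    which is the role a stateful streaming operator would play.
    """
    last_price: Dict[Tuple[str, str], int] = {}
    for raw in offers:
        offer = normalize(raw)
        key = (offer.shop, offer.title)
        if last_price.get(key) != offer.price_cents:
            last_price[key] = offer.price_cents
            yield offer
```

In a streaming engine the per-key dictionary would be replaced by the engine's managed state, so that it survives restarts and can be partitioned across workers.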

Abstract Talk I:

“Big Data” has gained a lot of momentum recently. Vast amounts of operational data are collected and stored in Hadoop and other platforms, where historical analysis is then conducted. Business Intelligence tools and distributed statistical computing are used to find new patterns in this data and gain new insights and knowledge that can then be leveraged for promotions, up- and cross-sell campaigns, improved customer experience or fraud detection.

One of the key challenges in such environments is to quickly turn these newfound insights and patterns into action while processing operational business data in real time. This is necessary to keep customers happy, increase revenue, optimize margins or prevent fraud when it matters most. “Fast Data” provides a stream processing approach to automate decisions and initiate actions in real time, based on the statistical insights obtained from Big Data platforms.

This session uses real-world use cases and success stories to explain the concepts behind stream processing and its relation to Hadoop, Spark, and other big data platforms. The session discusses a flexible solution architecture that combines the speed of fast data decisioning with the intelligence obtained from big data analysis. We will zoom in on different implementation patterns, best practices and pitfalls for implementing a closed-loop system: capturing and storing big data, analyzing historical data to find insights, capturing these insights in statistical and mathematical models and algorithms, and deploying these models to a real-time processing system to turn the insights into action.

A live demonstration illustrates how a developer can leverage different technologies, frameworks and products to implement such a closed-loop approach, including big data analytics, machine learning, stream processing, model fitness tracking and human oversight. The audience will learn how to choose the right tool for the right job and how to combine them. The live demonstration is built on technologies and frameworks such as Apache Hadoop (HDFS, Hive, HBase, Flume, Zookeeper), Apache Spark (MLlib, SparkSQL, SparkR), Stream Processing (Apache Storm, TIBCO StreamBase), and statistical platforms such as the R-language-based TERR, PMML, H2O’s Sparkling Water and Spark’s MLlib.

Bio

Kai Wähner works as Technical Lead at TIBCO. Kai’s main areas of expertise are Integration, Big Data, Analytics, SOA, Microservices, BPM, Cloud Computing, Java EE and Enterprise Architecture Management. He speaks at international IT conferences such as JavaOne, ApacheCon and OOP, writes articles for professional journals, and shares his experiences with new technologies on his blog (www.kai-waehner.de/blog). Contact: kontakt@kai-waehner.de or Twitter: @KaiWaehner. Find more details and references (presentations, articles, blog posts) on his website: www.kai-waehner.de

Data Visualization!

Tableau Part I: The beautiful science of data visualization

Why visualizations, you may ask? The human perception system is visual and can process images, pictures, diagrams, etc. much more quickly than it can understand raw numbers. Yet tables of raw numbers were used heavily in the past and are still produced far too often today. They are hard to understand, and as a result many decisions are made based on opinions or best guesses. Oliver Linder will show how to analyze data visually in an easy and fast way… and working with data is even fun now!

Tableau Part II: How Tableau can support working with Big Data

The world is generating more and more data (unstructured and structured). But just collecting data isn’t enough: it needs to be analyzed. Oliver Linder demonstrates how Tableau can make sense of the data by connecting easily to Big Data platforms. What prerequisites are required? What can be done if the Big Data platform doesn’t perform well? We will discuss approaches to get the most out of using a modern visual analytics tool on top of Big Data.

We will have beer and Bionade (€1 each).

Happy to see you all!!

Directions:

Between the U-Bahn stations Amrumer Straße and Leopoldplatz you will find the large concrete letters of Beuth Hochschule (to the right of the black steam engine). To the right of them is the large Mensa (cafeteria). Facing the Mensa, the venue is the building 25 m to its left (Haus Bauwesen). Go inside and take the staircase on the left all the way to the top. Everything will be signposted with “Big Data Meetup”!

Machine Learning and Deep Learning

Back from summer vacation, we want to start a fresh series of new events!

Talk 1:

The History and Near Future of Deep Learning, Dave Kammeyer (30-45 min)

In the past few years, the field of Deep Learning has emerged from seemingly nowhere and taken over the fields of Speech Recognition, Image Recognition, Natural Language Processing and (soon) Machine Translation. In this talk, Dave will give an overview of the field, its history, and some of the near-term opportunities and challenges in Deep Learning.

Talk 2:

Infrastructure, Data and Machine Learning at lateral.io (30 min)
Benjamin Wilson & Stephen Enright-Ward

Lateral.io is a Berlin-based machine learning startup. We offer an API that enables content providers to serve recommendations based on the textual content and user behaviour, and that allows enterprises to navigate and rediscover their huge stores of documents. Our service is multi-domain and multi-lingual. I’ll talk a little about the architecture of our API ecosystem, about our machine learning and about our experience using popular big data tools.

Lightning Talk 3:

Propose your Lightning Talk! (15 min)

Furthermore, we will raffle off free tickets for:

A) the distributed-matters.org IT conference on 19th September in Berlin. Don’t miss the chance to attend, or use our discount code for 20% off, and talk to Kyle Kingsbury (“Call Me Maybe” project), Salvatore Sanfilippo (Redis) and many more…

B) the two-day Flink Forward conference on 12th/13th October at Kulturbrauerei, with speakers from Google, Huawei, Amadeus, Ericsson, Zalando, etc., and including two days of workshops.

Directions:

Between the U-Bahn stations Amrumer Straße and Leopoldplatz you will find the large concrete letters of Beuth Hochschule (to the right of the black steam engine). To the right of them is the large Mensa (cafeteria). Facing the Mensa, the venue is the building 25 m to its left (Haus Bauwesen). Go inside and take the staircase on the left all the way to the top. Everything will be signposted with “Big Data Meetup”!

On Real-Time Monitoring and Data Analytics

Dear Big Data Beers Members,

it is my pleasure to announce the next meetup!

This time it is all about real-time monitoring, unified data analytics and sensor data.

We are very much looking forward to meeting up with
Tobias and Nakul from Trademob,
Robert from dataArtisans and
Martin Scholl.

Our schedule:

“Apache Flink: Unified data analytics with a streaming engine”
Robert Metzger

“Real-Time Monitoring of Distributed Systems”
Tobias Kuhn, Nakul Selvaraj

“Cities, Us & Bigger Visual Data” (lightning talk)
Martin Scholl

It would be a pleasure to have you all here!

Crate.IO: Jodok Batlogg

Dear all, welcome to 2015 – The year of hoverboards and self-lacing shoes.
We are very much looking forward to our first meetup this year.

[Update 1]
Just letting you all know that Jodok is very much looking forward to the meetup and that Crate.IO will be sponsoring the drinks this time.

[Update 2]
We are sorry to announce that Dirk Bartels cannot make it this time. The talk is postponed.
Instead, we are calling on anyone interested in giving a Lightning Talk. You can use the contact button on this page.

Talk of the night:

Data Storage Layer on Docker: Crate.IO
Jodok Batlogg from Crate.IO

Postponed: Big Data at idealo: how we handle a billion external data entries daily in near real time in a 24/7 environment
Dirk Bartels from idealo

Lisa Green & Stephen Merity: CommonCrawl.org!

Dear Big Data Beers Members,

it is my pleasure to announce the next meetup!

Lisa Green and Stephen Merity will speak about CommonCrawl.org!

The first will be a visionary talk and the second a technical talk.

Speaker

Lisa Green www.linkedin.com/in/lisagreen

Title
Big Open Web Data

Abstract
The Web is the largest collection of data in human history and can provide an immensely rich corpus for scientific research, technological advancement, and innovative new businesses. It is crucial for our information-based society that the Web be openly accessible to anyone who desires to utilize it. The Common Crawl Foundation builds and maintains an open repository of web crawl data that can be accessed and analyzed by everyone. This presentation will discuss the relationship between open data and innovation, explain the mission and vision of Common Crawl, and demonstrate the value of an open repository of web crawl data through an overview of previous work.

Speaker
Stephen Merity http://smerity.com

Title
Experiments in web scale data

Abstract
The Common Crawl corpus contains petabytes of web crawl data and is a treasure trove of potential experiments. But the scale can be intimidating! To introduce you to the possibilities and to help you navigate such a vast collection, this presentation will take a detailed, technical look at how the data has been used by various experiments, and how you can use a variety of frameworks to handle the task of processing and analyzing such a dataset.

It would be a pleasure to have you all here!

Aerospike Talk and Flink Hackathon I

Dear all,

Despite the official school holidays, we have two events with two topics in the pipeline. We start on the last Wednesday of October (29th) with the following two topics:

1. Khosrow Afroozeh (Aerospike) will talk about Aerospike / Big Data

2. Then we have a small hackathon to get up and running with Apache Flink!

The abstract for the Aerospike talk and the detailed set-up / datasets for Flink will follow soon!

So stay tuned and we hope to see you all!

Your Big Data Beers Team
