Saturday, April 30, 2016

Lev Manovich - 100 billion rows per second: the culture industry in the early 21st century


After around 2013, we start seeing more discussions of social and political issues around the use of large-scale consumer and social media data and automatic algorithms. These discussions cover data and law, data and privacy, data and labour, etc. The events at the NYC-based Data & Society Institute offer many examples of such discussions, as did the Governing Algorithms conference at NYU in 2013 and the Digital Labor conference at the New School for Social Research in 2014. In terms of publications, the academic journal Big Data and Society, published since 2014, is of central significance.

However, I have not yet seen these discussions or publications cover the idea I am proposing here – which is to think of media analytics as the primary determinant of the new condition of the culture industry, marking a new stage in media history. The algorithmic analysis of "cultural data" and the customization of cultural products is at work not only in a few visible areas such as Google Search and Facebook news feeds that have already been discussed – it is also at work in all platforms and services where people share, purchase and interact with cultural goods and with each other. When Adorno and Horkheimer were writing Dialectic of Enlightenment, interpersonal interactions were not yet directly part of the culture industry. But in "software culture", they too have become "industrialized".

The companies that sell cultural goods and services online (for example, Amazon, Apple, Spotify, Netflix), organize and make searchable information and knowledge (Google), provide recommendations (Yelp, TripAdvisor), enable social communication and information sharing (Facebook, QQ, WhatsApp, Twitter, etc.) and media sharing (Instagram, Pinterest, YouTube, etc.) all rely on computational analysis of massive media data sets and data streams. This includes information about online behaviour (browsing pages, following links, sharing posts, "liking" and purchasing), traces of physical activity (posting on social media networks in a particular place at a particular time), records of interaction (online gameplay) and cultural "content" – songs, images, books, movies, messages, and posts. Similarly, human-computer interaction – for example, using a voice-user interface – also depends on computational analysis of countless hours of, in this case, voice commands.

For example, to make its search service possible, Google continuously analyses the full content and mark-up of billions of web pages. It looks at every page on the web it can reach (the text, layout, fonts used, images and so on), using over 200 signals in total. To be able to recommend music, the streaming services analyse the characteristics of millions of songs.

For example, Echonest, which powers Spotify, has used its algorithms to analyse 36,774,820 songs by 3,230,888 artists. Spam detection involves analysis of texts of numerous emails. Amazon analyses purchases of millions of people to recommend books. Contextual advertising systems such as AdSense analyse the content of web pages in order to automatically select relevant ads for display on those pages.
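
To make the idea of recommending books from purchase data concrete, here is a minimal sketch of one simple approach: counting which items are bought together. It is only an illustration under stated assumptions, not a description of how Amazon or any other company actually does it. The function names (co_purchase_counts, recommend) and the purchase histories are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_purchase_counts(baskets):
    """Count how often each pair of items appears in the same basket."""
    counts = {}
    for basket in baskets:
        # Use a set so a repeated item in one basket is counted once.
        for a, b in combinations(sorted(set(basket)), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts


def recommend(item, counts, k=3):
    """Return up to k items most often bought together with `item`."""
    return [other for other, _ in counts.get(item, Counter()).most_common(k)]


# Hypothetical purchase histories, one list per customer.
baskets = [
    ["Dialectic of Enlightenment", "Software Takes Command"],
    ["Dialectic of Enlightenment", "Software Takes Command",
     "The Language of New Media"],
    ["Software Takes Command", "The Language of New Media"],
    ["Dialectic of Enlightenment", "Minima Moralia"],
]

counts = co_purchase_counts(baskets)
print(recommend("Dialectic of Enlightenment", counts))
```

With four baskets the counting is trivial; the point of the passage above is that the same kind of analysis is run over the purchases of millions of people.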

Video game companies capture the gaming actions of millions of players to optimize game design. Facebook algorithms analyse all updates by all your friends to automatically select which ones to show in your feed. And they do so for every one of Facebook's 1.5 billion users. According to estimates, in 2014 Facebook was processing 600 terabytes of fresh data per day.
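
To give a sense of scale, the daily figure can be converted into a per-second rate. The short calculation below is a back-of-the-envelope sketch that assumes decimal terabytes (10^12 bytes) and an even spread of data across the day; neither assumption comes from the article.

```python
# Assumption: decimal units, 1 terabyte = 10**12 bytes.
TERABYTE = 10**12
GIGABYTE = 10**9
SECONDS_PER_DAY = 24 * 60 * 60

# The estimate quoted above: 600 terabytes of fresh data per day.
daily_bytes = 600 * TERABYTE

# Assumption: the data arrives evenly over the whole day.
gigabytes_per_second = daily_bytes / SECONDS_PER_DAY / GIGABYTE

print(f"{gigabytes_per_second:.2f} GB per second")
```

Under these assumptions the estimate works out to roughly 7 gigabytes of new data every second.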

The development of algorithms and software systems that make all this analysis possible is carried out by researchers in a number of academic fields including machine learning, data mining, computer vision, music information retrieval, computational linguistics, natural language processing and other areas of computer science. The newer term "data scientist" refers to professionals with advanced computer science degrees who know contemporary algorithms and methods for data analysis (described by the overlapping umbrella terms of "data mining", "machine learning" and "AI"), as well as classical statistics. Using current technologies, they can implement the gathering, analysis, reporting and storage of big data.

To speed up the progress of research, most top companies share many parts of their key code. For example, on 9 November 2015, Google open sourced TensorFlow, the data and media analysis system that powers many of its services. Companies also open sourced their software systems for organizing massive datasets, such as Cassandra and Hive (Facebook).

The practices involved in the massive analysis of content and interaction data across media and culture industries were established between approximately 1995 (early web search engines) and 2010 (when Facebook reached 500 million users). Today they are routinely carried out by every large media company on a daily basis – and increasingly in real-time.

This is the new "big data" stage in the development of modern technological media. It follows on from previous stages such as mass reproduction (1500-), broadcasting (1920-) and the Web (1993-)…
Read more:
http://www.eurozine.com/articles/2016-02-02-manovich-en.html

see also
Paul Mason - Global cybercrime has infected the very soul of capitalism with evil
Paul Mason - The end of capitalism has begun
George Monbiot - Growth: the destructive god that can never be appeased