Clubhouse & a Culture of Voyeurism

Clubhouse is the latest viral example of a "real-time" social network. Sure, over the last decade there has been a flurry of new social media networks. But when you think about it, each one is just a slight iteration on Facebook.

Facebook but with a word count.

Facebook but with a required image.

Facebook but with a required video.

Facebook but with disappearing content.

Facebook but with editing tools.

All of these networks have feeds and followers, and they are "content-centric." Clubhouse breaks from the familiar paradigm. It's not about freezing your mind into a server. It's about live co-presence. It's refreshing. It points to a bright future of real-time social media. We can imagine iterations that exist in other forms, like video, or through virtual reality.

However, based on my first impression of Clubhouse (which could change over time), I get the feeling that certain design decisions are limiting the value people can take from real-time audio. A real-time network should encourage participation and facilitate random collisions of like-minded people. But Clubhouse is a stage. Actually, it's a circuit of digital stages. And like all stages, there is a massive audience peeking in on a handful of presenters. It's contributing to a culture of voyeurism that thrives off the relationship between influencers and scrollers.

Is it possible to design a social network where active participation is the default mode of usage? Or will social hierarchies always emerge when humanity inserts itself into a digital network?

Parisian Cafes

Let's think about Twitter for a second. People have said it's like a Parisian cafe on the internet, but it's more like a bulletin board. You take your thoughts, you freeze them, then you pin them to the Internet, where they exist and accumulate likes until the end of time. Even direct messaging, which is "live" chat, is just a private thread of frozen words.

Clubhouse is one of the first networks that is truly live.

Sure, the sessions can be recorded, but it's all about the potential magic that can emerge when a group of people show up at the same time. It's not a visual place, but it's a digital place. You can drop into it, meet other randos, and jive. The cafe metaphor is much more appropriate for real-time social networks.

My problem with Clubhouse is that it's a shitty cafe.

I'm stoked that a cafe could exist on the Internet! Don't get me wrong. But this isn't the best we can do. Not by far. Clubhouse is like a strip of cafes, but each one has a main event that no one is allowed to talk during. Clubhouse is like Coachella, but the audience is on mute. Clubhouse is like a podcast that I can dial into, but don't.

Stages vs. Hallways

During my 2-3 hours on Clubhouse, I didn't participate in any of the discussions. I was passive. Maybe that's my own fault. No one was stopping me from raising my hand. Sometimes an admin would even encourage it. But I didn't, even though at times I had things to say. I would guess that most others feel the same. I get the sense that there is roughly a 1:100 ratio of participants to voyeurs. Maybe it's something people need to get used to. Or maybe it's just more comfortable to be a consumer than a contributor.

In our content-based social networks, the feed acts as a public stage. But Clubhouse was designed to be a literal digital stage. It's the perfect metaphor. The classic "feed" underwent a phase change to become the thing it was destined to be. But if we think about conferences, which are groups of like-minded people, the real value doesn't lie in the stage. It exists in the hallways, after one event and before the next. I think we could design an audio-based social network that's more like the hallway than the stage. I want to talk to the interesting people who aren't on the stage.

The value in real-time social media could be in participation, serendipity, and community. Voyeurism isn't a given; it's an emergent feature of a particular design strategy. We could do away with the stage and hand-raising dynamic altogether. We can take inspiration from other digital audio strategies that exist, but haven't been combined into a network with a social graph. If you built a social network on different design principles, it could result in a community with an entirely different ethos.

Spatial Audio

A new concept called Spatial Audio has been showing up in video-conferencing apps, but it hasn't yet made its way into a network. The best example I've found is Spatial Chat. It's great for virtual events, happy hours, and conferences. I'm helping put together a 5-day virtual conference with around 800 people in March using Spatial Chat.

The idea is that each participant exists as a bubble with the autonomy to move. You can drag your bubble around the browser window (a digital space). The novelty here is that you only hear and see the bubbles that are within your proximity. You can move around to engage with different people in the room. This allows for 50 people to be in a single space, all talking at the same time. It allows for non-rigid breakout rooms that aren't pre-defined. They form and collapse and re-form as they would at an actual cocktail party.
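To make the proximity idea concrete, here is a minimal sketch of how "you only hear the bubbles near you" could be computed. It is not how Spatial Chat actually works. The `Bubble` class, the `hearing_radius` of 200 units, and the linear volume falloff are all assumptions made up for illustration.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Bubble:
    """A participant's position in the 2D room (hypothetical model)."""
    name: str
    x: float
    y: float


def volume(listener: Bubble, speaker: Bubble, hearing_radius: float = 200.0) -> float:
    """Return a volume in [0, 1] that falls off linearly with distance.

    Assumption: anyone at or beyond hearing_radius is silent.
    """
    distance = math.hypot(speaker.x - listener.x, speaker.y - listener.y)
    if distance >= hearing_radius:
        return 0.0
    return 1.0 - distance / hearing_radius


def audible(listener: Bubble, room: list[Bubble], hearing_radius: float = 200.0) -> dict[str, float]:
    """Map each bubble the listener can hear to the volume they hear it at."""
    heard = {}
    for other in room:
        if other is listener:
            continue  # you don't hear yourself
        v = volume(listener, other, hearing_radius)
        if v > 0:
            heard[other.name] = v
    return heard
```

With three bubbles, say one at (0, 0), one at (100, 0), and one at (500, 500), the first bubble would hear the second at half volume and the third not at all. Dragging a bubble just changes its `x` and `y`, so the "breakout rooms" form and dissolve as a side effect of distance rather than being defined up front.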

The other neat thing is that you can have different rooms within a single event. These are like tabs within the event that people can move between. In our conference we have a lobby, a stage, and then a virtual booth. We are hosting technologies, and each one is getting its own tab / virtual booth. Imagine how this could be applied to a social network. An event could be generated around a happening (e.g., Tesla buys $1.5 billion in Bitcoin). There could be a series of tabs/rooms that each cover a unique line of discussion around it. This is almost like Icebreaker, a network that auto-generates 1-on-1s with a series of pre-defined prompts. The difference here is that you have the autonomy to move around within a prompt.

Audio nuggets

What would a social network be like that fused spatial audio and audio notes? It would be a pairing of the traditional content-based paradigm and the new real-time model that Clubhouse has introduced.

Maybe the initial activity is to meet people in a space and talk about an idea, but then each person can record their own voice note on a topic or issue. You hit "Record Take," and you turn into a "ghost." Your avatar becomes half visible, you are muted, and everyone else in the room is muted. This lets you record a 30-second audio note that is associated with the prompt & topic you are in.

An audio nugget can exist in 3 places:

1) It stays anchored in the room it was created in

When you move through a space, in addition to approaching people and talking to them, you can approach nuggets to hear what people from the past had to say. Imagine stumbling upon a 30-second audio nugget from 6 hours ago on how Naval thinks about Tesla buying Bitcoin.

2) It gets linked into your profile

A person's profile is just an audio stream of their takes on the series of issues they engage with. The content is a 30-second audio nugget, but you could listen to someone for 10 minutes straight, and hear 20-30 ideas that they've recorded in the past.

3) It gets shown in a traditional metric-based feed

This could become a new way to consume the news. Bite-sized audio nuggets. When a news article pops up, you get to understand it through the nuggets that are trending and getting the most engagement. Instead of long-form YouTube videos of podcasts, you make sense of a happening through short bursts from the best people on the Internet.
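The three places a nugget can live are really three views over the same recording. Here is a rough sketch of that idea, not a real design. The `Nugget` fields, the `NuggetStore` class, and using a `plays` count as the engagement metric are all hypothetical names chosen for illustration. Only the 30-second limit comes from the idea above.

```python
from __future__ import annotations

from dataclasses import dataclass

MAX_SECONDS = 30  # the 30-second limit on a recorded take


@dataclass
class Nugget:
    author: str
    room: str           # the room/tab the take was recorded in
    topic: str          # the prompt or happening it responds to
    seconds: int
    recorded_at: float  # hypothetical: seconds since epoch
    plays: int = 0      # hypothetical stand-in for "engagement"


class NuggetStore:
    """One list of nuggets, exposed through three different views."""

    def __init__(self) -> None:
        self._nuggets: list[Nugget] = []

    def record(self, nugget: Nugget) -> None:
        if nugget.seconds > MAX_SECONDS:
            raise ValueError(f"a nugget can be at most {MAX_SECONDS} seconds")
        self._nuggets.append(nugget)

    def in_room(self, room: str) -> list[Nugget]:
        """1) Nuggets anchored in the room they were created in."""
        return [n for n in self._nuggets if n.room == room]

    def profile(self, author: str) -> list[Nugget]:
        """2) One person's takes as a stream, newest first."""
        own = [n for n in self._nuggets if n.author == author]
        return sorted(own, key=lambda n: n.recorded_at, reverse=True)

    def feed(self, limit: int = 10) -> list[Nugget]:
        """3) A metric-based feed: most-played nuggets first."""
        return sorted(self._nuggets, key=lambda n: n.plays, reverse=True)[:limit]
```

The point of the sketch is that a nugget is recorded once and never copied. The room, the profile, and the feed are just different ways of filtering and sorting the same pile of audio.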


Clubhouse is a neat experiment, and I need to both 1) spend more time in it, and 2) be more active in it. There is big potential in real-time audio networks, but I think there are two missing elements. Spatial audio will allow for active participation within large groups. Audio notes will allow pieces of viral content to emerge from spontaneous conversations.



| Twitter - NFTs - michael@michaeldean.site - 2021 |