Preparing Your Data

This page helps you determine which text is valid for analysis and which text could skew your results. Our technology performs best on samples of written or spoken language, whether conversational, formal, or informal, from sources such as blog posts, survey responses, social media posts, transcribed calls, short text samples, and text messages.

Raw text works best: you do not need to tokenize, lemmatize, stem, remove stop words, or remove punctuation.

note

To analyze your text in the Receptiviti UI, put the sample data in a single column, because the UI cannot score multiple columns of text. The UI is also case sensitive, so make sure that all your data labels match.

Elements to Include and Exclude

Refer to the table below for details on what to include in, and what to exclude from, your text before using the Receptiviti API.

Element	Example(s)	Include?	Action	Comment
Text Encoding		utf-8	Encode your text strings in utf-8	More details here.
@Mentions	`@bigScaryPup`	Yes	Leave in your text if this is relevant to your use case. Exclude, if not.	For most use cases, @Mentions are data noise and not natural language and do not indicate underlying psychology or emotion. Currently, an @Mention adds 1 to the word count (wc).
Hashtags	`#lolnotlol`	Yes	Retain hashtags: we score them.	Hashtags are separated and parsed by the API. The individual components of a hashtag count towards word count. For example, `#thiswillbescored` is split into `this will be scored` and counts as 4 words. Currently, a hashtag adds the number of tokens it contains to wc and 1 to hashtags.
Emojis	`\xf0\x9f\x8c\xbb` 😂😡	Yes	Retain emojis in your text.	Emojis are visual representations of emotions, common objects and situations. They are powerful tools to uncover psychological and emotional meaning in language.
URLs	`http://receptiviti.com`	Yes	Leave in your text if this is relevant to your use case. Exclude, if not.	For most use cases, URLs are data noise and not natural language and do not indicate underlying psychology or emotion. However, if they are relevant to your use case, feel free to leave them in your text. Currently, a URL adds 1 to the wc and to the urls category.
Email headers	`From: [email protected]`	No	Remove all email headers, and only use email body as text. Remember, if your email body is in html, follow the instructions below to strip html tags from your text.	Email headers are data noise and not natural language for Receptiviti’s metrics. They do not indicate underlying psychology or emotion. They will count towards the total number of words and thereby skew scores.
Email metadata	`Mon, 24 Aug 2020 10:16:07 -0700 (PDT)`	No	Remove all email metadata, and only use email body as text. Remember, if your email body is in html, follow the instructions below to strip html tags from your text.	Email metadata are data noise and not natural language for Receptiviti’s metrics. They do not indicate underlying psychology or emotion. They will count towards the total number of words and thereby skew scores.
Email footers and confidentiality disclaimers	`Head office: 150 Bloor St. West, Suite 310, Toronto, Ontario`	No	Remove all email footers and legal disclaimers from your email. Remember to use only email body as text.	Email footers and confidentiality disclaimers are data noise and not natural language for Receptiviti's metrics. They do not indicate underlying psychology or emotion. They will count towards the total number of words and thereby skew scores.
HTML	`<!DOCTYPE html>`	No	Strip all HTML tags and only retain relevant content within the tags e.g., text within the `<p>` tags could be natural language and therefore valid for analysis.	HTML tags specify formatting, not naturally spoken or written language. The text within some HTML tags may be useful (depending on your application). Tools like BeautifulSoup can help you do this.
Code	`Print("Hello World")`	No	Remove all code snippets from your text.	Code snippets are not natural language and do not indicate underlying psychology or emotion. They will count towards the total number of words and thereby skew scores.
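
As a rough illustration of some of the exclusions in the table, the sketch below strips HTML tags, drops email headers, and encodes the result as UTF-8. It uses only the Python standard library rather than BeautifulSoup. The helper names (`strip_html`, `email_body`), the list of block-level tags, and the sample inputs are assumptions made for this example, and it only handles simple single-part emails. Footers, disclaimers, and code snippets vary too much to remove generically and are not covered here.

```python
from email import message_from_string
from html.parser import HTMLParser

# Tags treated as word boundaries (an assumption; extend as needed).
BLOCK_TAGS = {"p", "br", "div", "li", "tr", "td", "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def strip_html(markup):
    """Return the text content of an HTML string with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join("".join(parser.parts).split())


def email_body(raw_email):
    """Return the body of a simple single-part email, without its headers."""
    message = message_from_string(raw_email)
    if message.is_multipart():
        raise ValueError("multipart messages need per-part handling")
    return message.get_payload().strip()


raw = (
    "From: someone@example.com\n"
    "Date: Mon, 24 Aug 2020 10:16:07 -0700 (PDT)\n"
    "\n"
    "Thanks for the <b>quick</b> reply.\n"
)
cleaned = strip_html(email_body(raw))
payload = cleaned.encode("utf-8")  # UTF-8 bytes, ready to send
```

Emojis, hashtags, and punctuation pass through these helpers unchanged, which matches the guidance to keep them in the text.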

Optimizing Outcomes Using Aggregated Text Methods

Depending on the level of insight you aim to produce, it can be useful to take different approaches to structuring your text samples for analysis. For example, sentence-level insights work best when you want to identify emotions related to key topics of interest. Or, you can split conversational language into chronological segments to evaluate linguistic progression over time. Below are several examples of different useful text aggregation methods:

Analysis of the Individual

Use case: Understanding the character traits of a speaker.
Method: Aggregate all language data from a single speaker. You can use the Group By feature in the UI's charting and graphing tools to aggregate by speaker.
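
If you prepare data outside the UI, the same aggregation can be sketched in a few lines of Python. The function name `aggregate_by_speaker`, the `(speaker, text)` tuple layout, and the sample utterances are assumptions made for this example.

```python
from collections import defaultdict


def aggregate_by_speaker(utterances):
    """Join each speaker's utterances, in order, into one text sample."""
    grouped = defaultdict(list)
    for speaker, text in utterances:
        grouped[speaker].append(text.strip())
    return {speaker: " ".join(texts) for speaker, texts in grouped.items()}


utterances = [
    ("agent", "Thanks for calling, how can I help?"),
    ("customer", "My order never arrived."),
    ("agent", "I'm sorry to hear that."),
    ("customer", "It has been two weeks!"),
]
samples = aggregate_by_speaker(utterances)
```

Each value in `samples` is one combined text sample per speaker.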

Analysis of Answers to Key Questions

Use case: Exploring and comparing trends in responses across individual questions.
Method: Aggregate language data based on individual question (combine the responses that were given in answer to each question).

Analysis of Answers by Question Theme

Use case: Exploring and comparing trends in responses across multiple related questions.
Method: Aggregate language data based on question theme (combine the responses that were given in answer to each set of related questions).
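
One way to sketch this grouping is with a lookup from question to theme. The question IDs, theme names, and the function `aggregate_by_theme` below are hypothetical; substitute your own survey structure.

```python
from collections import defaultdict

# Hypothetical mapping from survey question IDs to themes.
QUESTION_THEMES = {"q1": "onboarding", "q2": "onboarding", "q3": "pricing"}


def aggregate_by_theme(responses, question_themes):
    """Combine responses whose questions share a theme into one sample."""
    grouped = defaultdict(list)
    for question_id, text in responses:
        theme = question_themes[question_id]
        grouped[theme].append(text.strip())
    return {theme: " ".join(texts) for theme, texts in grouped.items()}


responses = [
    ("q1", "Setup was quick."),
    ("q3", "The plan feels expensive."),
    ("q2", "The tutorial helped a lot."),
]
by_theme = aggregate_by_theme(responses, QUESTION_THEMES)
```

Mapping every question to itself reduces this to per-question aggregation.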

Analysis of Conversation Over Time (longitudinal)

Use case: Evaluating trends in conversation progression or long-term progression.
Method: Split the conversation’s language into chronological segments; segments can be split based on word count, time, a conversation outline, days, weeks, etc. Time segmentation can be done in the UI's Line Chart using the Segments slider.
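
Outside the UI, segmentation by word count could look like the minimal sketch below. It assumes whitespace-separated words, which may not match how the API computes wc, and the function name `split_into_segments` is hypothetical.

```python
def split_into_segments(text, n_segments):
    """Split text into n chronological segments of near-equal word count."""
    if n_segments < 1:
        raise ValueError("n_segments must be at least 1")
    words = text.split()
    base, extra = divmod(len(words), n_segments)
    segments = []
    start = 0
    for index in range(n_segments):
        # The first `extra` segments take one additional word each.
        size = base + (1 if index < extra else 0)
        segments.append(" ".join(words[start:start + size]))
        start += size
    return segments


segments = split_into_segments("one two three four five six seven", 3)
```

Each segment can then be submitted as its own sample, in order, to compare scores across the conversation.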

Analysis of Topics by Emotion

Use case: Identifying emotions related to key topics of interest.
Method: Parse language data into sentences and score them using pre-built or custom taxonomies and our emotions framework. See the Topics by Emotions or Emotions by Topic chart types to measure this.

note
The UI splits text samples by sentence by default.
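
If you need sentence-level samples outside the UI, a naive splitter can be sketched as follows. It is not the UI's own splitter: it simply breaks after `.`, `!`, or `?` followed by whitespace, so abbreviations such as "Dr." will be split incorrectly. The function name `split_sentences` is an assumption made for this example.

```python
import re

# Break after sentence-ending punctuation that is followed by whitespace.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Return the sentences in text, keeping their closing punctuation."""
    return [s for s in _SENTENCE_BOUNDARY.split(text.strip()) if s]


sentences = split_sentences(
    "I love the new app. The billing page is confusing! Can you help?"
)
```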

Analysis of Rapport (Language Style Matching)

Use case: Understanding the extent to which two or more speakers are attentively engaged.
Method: Analyze and compare each speaker pair’s language. See the Heatmap chart type to measure this.
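
To prepare pairwise comparisons yourself, you first need every unordered pair of speakers along with each speaker's aggregated language. The sketch below only builds those pairs; it does not compute a Language Style Matching score. The function name `speaker_pairs` and the sample data are hypothetical.

```python
from itertools import combinations


def speaker_pairs(samples):
    """Return (speaker_a, speaker_b, text_a, text_b) for each unordered pair."""
    return [
        (a, b, samples[a], samples[b])
        for a, b in combinations(sorted(samples), 2)
    ]


samples = {
    "carol": "Sounds good to me.",
    "alice": "Shall we start with the budget?",
    "bob": "Yes, let's do that first.",
}
pairs = speaker_pairs(samples)
```

With n speakers this yields n(n-1)/2 pairs, one per off-diagonal cell in a symmetric comparison grid.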
