The Football Prediction API allows developers to get predictions for upcoming football (soccer) matches, results for past matches, and performance monitoring for statistical models.

Hyperlocal Demographics, Vehicle Traffic, Economic, Market Signals, and More. Use this API to request IdealSpot hyperlocal geospatial market insight and geometry data.

## Detailed Description

Use this API as your **local economy microscope** by querying [IdealSpot](https://www.idealspot.com) hyperlocal market insight and geometry data. We offer the most precise, extensive, and frequently-updated local market data in the US. Our data is available across the entire US and can be queried at geographic scales ranging from the micro (Census block) through the macro (nation).

Better data and analysis lead to a better understanding of local market opportunities and risks. Integrate with your commercial real estate and marketing applications, machine learning workflows, and other investment analytics. Our goal is to offer the most complete snapshot of the geographically distributed consumer and retail economy. We start with the fundamentals of consumers and business establishments. To connect retailers with consumers, we provide mobility data like vehicle traffic and mobile device data. To describe consumer intent, we provide geolocated digital marketing data.

We believe that accurate capital allocation through reliable local market data is foundational to creating robust, healthy, and livable communities for all. We hope you and your clients find tremendous value in this service.

## Features

Query data and GeoJSON geometries at these scales for any US latitude and longitude:

* Rings (0.5 km+)
* Drive time (1-60 minutes)
* Bike time (3-60 minutes)
* Walk time (5-60 minutes)
* Public transit time (5-60 minutes)
* Administrative region (US, states, core-based statistical areas, counties, Census-designated places, Census tracts, zipcodes, Census block groups, opportunity zones)

| Data Feature | Description |
| ------- | ------------------------------|
| Demographics, Housing, Spending | *Updated quarterly*. Current and historical market data including population, spending, and housing. Vendor (PopStats) is relied upon by Walgreens, Ulta Beauty, Blackstone, and others. |
| Labor, Business Establishments, Economic Conditions | *Updated quarterly*. Traditional market data including workforce, business establishment counts, and economic conditions such as local GDP and unemployment rates. Vendor (PopStats) is relied upon by Walgreens, Ulta Beauty, Blackstone, and others. |
| Consumer Segmentation | *Updated annually*. Demographics grouped into narrative-oriented segments. |
| Vehicle Traffic | *Updated semi-annually*. Gold-standard vehicle traffic data from INRIX. Counts by day of week, time of day, and side of street. |
| Rings and Travel-Time Polygons | *Estimated in real time*. Rings alongside drive-time, walk-time, bike-time, and public transit-time polygons. Request as GeoJSON geometries for mapping or use with data queries. |
| Administrative Region Polygons | *Updated annually*. GeoJSON administrative regions from the US Census Bureau: block groups, tracts, counties, CBSAs, states, opportunity zones, USPS zipcodes. |
| Internet Search Volumes | 30-day rolling averages for geolocated advertising potential across 450 business categories from major search engines. |
| Social Media Interest | 30-day rolling averages for geolocated advertising potential across 450 business categories from major social networks. |
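To make the scale parameters above concrete, here is a request sketch. The endpoint path, query parameters, and auth header are illustrative assumptions, not documented on this page; see https://developer.idealspot.com/ for the actual contract.

```
# Hypothetical request shape: the path, parameter names, and auth header
# below are assumptions for illustration; consult developer.idealspot.com
# for the real endpoint definitions.
curl -G "https://api.idealspot.com/v1/insights/demographics" \
  -H "Authorization: Bearer $IDEALSPOT_API_KEY" \
  --data-urlencode "location=drivetime:15@30.2672,-97.7431" \
  --data-urlencode "format=geojson"
```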
### Coming Soon!

This API powers our local market research platform at [IdealSpot.com](https://www.idealspot.com). The functionality exposed so far is only a portion of our current capabilities. We will be exposing additional API features in time, so watch this space!

| Data Feature | Description |
| ------- | ------------------------------|
| Mobile Device Visit Counts, Points of Interest, Brands | Fresh point-of-interest data across 3000+ brands, millions of POIs, and historical foot traffic counts using mobile data for those points of interest |
| Origin/Destination Trips from Mobile Devices | Map origins and destinations of devices visiting an arbitrary catchment area |
| Competition Search | Search all major point-of-interest aggregators in one query |
| Environment/Climate | Expected weather patterns like temperature and precipitation |
| Filter Search API | Query data for all counties in a state, all tracts in an MSA, etc. |
| Road Segment Tiles | Plot road segments on maps in interactive applications |

## Developer Website

For more detail, see https://developer.idealspot.com/

Automatic language detection for any text. Supports over 150 languages.

Rev.ai provides quality speech-to-text recognition via a RESTful API. All public methods and objects are documented here for developer reference. For a real-time speech-to-text solution, use Rev.ai's [Streaming API](/docs/streaming).

# Base Endpoint

The base url for this version of the API is

> `https://api.rev.ai/speechtotext/v1`

All endpoints described in this documentation are relative to this base url.

# Quick Start

Follow the [getting started checklist](https://www.rev.ai/getting_started).

## Get your Access Token

You can generate your [access token](#section/Authentication/Access-Token) on the [settings page](https://www.rev.ai/access_token) of your account. This access token only needs to be generated once and never expires. You can re-generate your token, however this will invalidate the previous token.

## Submit a File

To submit an audio file for transcription to Rev.ai:

```
curl -X POST "https://api.rev.ai/speechtotext/v1/jobs" -H "Authorization: Bearer $REV_ACCESS_TOKEN" -H "Content-Type: application/json" -d "{\"media_url\":\"https://www.rev.ai/FTC_Sample_1.mp3\",\"metadata\":\"This is a sample submit jobs option\"}"
```

You'll receive a response like this:

~~~
{
  "id": "Umx5c6F7pH7r",
  "created_on": "2018-09-15T05:14:38.13",
  "name": "sample.mp3",
  "metadata": "This is a sample submit jobs option for multipart",
  "status": "in_progress"
}
~~~

The `id` (in this case `Umx5c6F7pH7r`) will allow you to retrieve your transcript.

## Get Your Transcript

Once a transcription job's `status` becomes `transcribed`, you can retrieve the transcript in JSON format by running:

```
curl -X GET "https://api.rev.ai/speechtotext/v1/jobs/{id}/transcript" -H "Authorization: Bearer $REV_ACCESS_TOKEN" -H "Accept: application/vnd.rev.transcript.v1.0+json"
```

Alternatively you can get the plain text version by running:

```
curl -X GET "https://api.rev.ai/speechtotext/v1/jobs/{id}/transcript" -H "Authorization: Bearer $REV_ACCESS_TOKEN" -H "Accept: text/plain"
```

You can poll for the `status` of your job by querying for the job periodically:

```
curl -X GET https://api.rev.ai/speechtotext/v1/jobs/{id} -H "Authorization: Bearer $REV_ACCESS_TOKEN"
```

**Note:** Polling is NOT recommended in a production server. Rather, use [webhooks](#section/Webhooks) to asynchronously receive notifications once the transcription job completes. If you have any further questions, contact us.

# Submitting Files

Two `POST` request formats can be used to submit a file: `application/json` or `multipart/form-data`.

## JSON

This is the preferred method of file submission. It uses the `media_url` property to provide a direct download URL to the Rev.ai server. This method supports the use of pre-signed URLs. Links to videos hosted on platforms like YouTube are not valid because they are not direct download links.

**Important note on pre-signed URLs:** Signed URLs usually have a configurable expiration time. To ensure the Rev.ai server can access the link, make sure the expiration time is set to 2 hours or more. In the event you plan on resending the same file, make sure to generate a new pre-signed URL.

## FormData

Used to send a local file to the Rev.ai server. This allows the customer to send the file directly from the host machine. Certain limits apply to this format; see the [Async API Limits section](#section/Async-API-Limits) for more details. A sketch of such a request follows.
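The shape below should work for a local-file submission; the part names (`media`, `options`) are assumptions based on common usage of multipart uploads, so verify them against the /jobs endpoint reference:

```
# Sketch of a multipart/form-data submission. The part names "media" and
# "options" are assumptions for illustration; check the /jobs endpoint
# reference for the exact contract.
curl -X POST "https://api.rev.ai/speechtotext/v1/jobs" \
  -H "Authorization: Bearer $REV_ACCESS_TOKEN" \
  -F "media=@./sample.mp3;type=audio/mp3" \
  -F "options={\"metadata\":\"local file submit example\"}"
```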
# Turnaround Time and Chunking

Often, especially for shorter files, your transcript will be ready in 5 minutes or less. It generally takes no longer than 15 minutes to return longer audio files. If you require a faster turnaround time, please contact us.

Chunking is the act of breaking audio files into smaller segments. Rev.ai uses this method to decrease the turnaround time of audio files greater than 3 minutes in length.

# Webhooks

If the optional `callback_url` is provided, the API will make an HTTP POST request to the `callback_url` with the following request body when the job either completes successfully or fails.

## Sample Webhook

**On Successful Transcription Job**

```
{
  "job": {
    "id": "Umx5c6F7pH7r",
    "status": "transcribed",
    "created_on": "2018-05-05T23:23:22.29Z",
    "callback_url": "https://www.example.com/callback",
    "duration_seconds": 356.24,
    "media_url": "https://www.rev.ai/FTC_Sample_1.mp3"
  }
}
```

**On Failed Transcription Job**

```
{
  "job": {
    "id": "Umx5c6F7pH7r",
    "status": "failed",
    "created_on": "2018-05-05T23:23:22.29Z",
    "callback_url": "https://www.example.com/callback",
    "failure": "download_failure",
    "failure_detail": "Failed to download media file. Please check your url and file type"
  }
}
```

**Important notes for using webhooks:** The API will make a POST request, not a GET request, to the `callback_url`. The request body is the job details. You can unsubscribe from a webhook by responding to the webhook request with a 200 response. If a webhook invocation does not receive a 200 response, Rev.ai will retry the `callback_url` every 30 minutes until either 24 hours have passed or we receive a 200 response. For initial webhook testing, you can try using a third-party webhook testing tool such as [https://webhook.site/](https://webhook.site/).

# Async API Limits

The following default limits apply per user, per endpoint, and are configurable by Rev.ai support. If you have any further questions, contact us.

- 10,000 transcription requests submitted every 10 minutes
- 500 transcriptions processed every 10 minutes
- Multipart/form-data requests to the /jobs endpoint have a concurrency limit of 10 and a file size limit of 2GB
- POST requests to the /jobs endpoint that use the media_url property do not have a concurrency limit or file restriction. They are only limited by the first two bullet points

# Error Handling

The API indicates failure with 4xx and 5xx HTTP status codes. 4xx status codes indicate an error due to the request provided (e.g., a required parameter was omitted). 5xx errors indicate an error with Rev.ai's servers. When a 4xx error occurs during invocation of a request, the API responds with a [problem details](https://tools.ietf.org/html/rfc7807) object as the HTTP response payload. The problem details information is represented as a JSON object with the following optional properties:

| Property | Description |
|------------|-----------------------------------------------|
| type | a URI representing the type for the error |
| title | a short human readable description of type |
| details | additional details of the error |
| status | HTTP status code of the error |

In addition to the properties listed above, the problem details object may list additional properties that help to troubleshoot the problem.
## Example Errors

```
// Bad Submit Job Request
{
  "parameter": {
    "media_url": [
      "The media_url field is required"
    ]
  },
  "type": "https://www.rev.ai/api/v1/errors/invalid-parameters",
  "title": "Your request parameters didn't validate",
  "status": 400
}

// Invalid Transcript State
{
  "allowed_values": [
    "transcribed"
  ],
  "current_value": "in_progress",
  "type": "https://rev.ai/api/v1/errors/invalid-job-state",
  "title": "Job is in invalid state",
  "detail": "Job is in invalid state to obtain the transcript",
  "status": 409
}
```

## Retrying Failed Requests

Some errors can be resolved simply by retrying the request. The following error codes are likely to be resolved with successive retries.

| Status Code | Error |
|---|:---|
| 429 | Too Many Requests |
| 502 | Bad Gateway |
| 503 | Service Unavailable |
| 504 | Gateway Timeout |

Note: With the exception of the 429 status code, it is recommended that the maximum number of retries be limited to 5 attempts per request. The number of retries can be higher for 429 errors, but if you notice consistent throttling, please contact us.
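As one way to apply the retry guidance above, here is a minimal shell sketch that retries a job-status request up to 5 times on the retryable codes from the table; the loop itself is illustrative, not part of the Rev.ai API.

```
# Minimal retry sketch (illustrative): retry a GET on the retryable
# status codes listed above, up to 5 attempts with growing back-off.
for attempt in 1 2 3 4 5; do
  code=$(curl -s -o response.json -w "%{http_code}" \
    "https://api.rev.ai/speechtotext/v1/jobs/{id}" \
    -H "Authorization: Bearer $REV_ACCESS_TOKEN")
  case "$code" in
    429|502|503|504) sleep $((attempt * 10)) ;;  # back off, then retry
    *) break ;;                                  # success or non-retryable
  esac
done
```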

Provided by [Salesforce](https://www.einstein-hub.com/) © Copyright 2000–2020 salesforce.com, inc. All rights reserved. Salesforce is a registered trademark of salesforce.com, inc., as are other names and marks. Other marks appearing herein may be trademarks of their respective owners. **Last updated:** Aug 17, 2020

SelectPdf HTML To PDF Online REST API is a professional solution that lets you create PDFs from web pages and raw HTML code in your applications. The API is easy to use, and the integration takes only a few lines of code.
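As a sketch of what that integration might look like (the endpoint path and parameter names below are assumptions for illustration; check the SelectPdf documentation for the exact contract):

```
# Hypothetical HTML-to-PDF conversion call; the path "/api2/convert/" and
# the "key"/"url" parameters are assumptions, not confirmed by this page.
curl -G "https://selectpdf.com/api2/convert/" \
  --data-urlencode "key=$SELECTPDF_API_KEY" \
  --data-urlencode "url=https://example.com" \
  -o page.pdf
```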

Semantria applies text and sentiment analysis to tweets, Facebook posts, surveys, reviews, or enterprise content.

We aim to provide the deepest understanding of people through psychology & AI.

Convert numbers to their Arabic text representation

Expect only running software, real reactions, and beautifully crafted APIs to serve your every desire to transcribe a piece of paper into digital form.

The current API version is v3.4.

The API methods listed below can be called directly from this page to test the output. You can set the api_key to pre-authenticate all requests on this page (this will work if your secret is blank).

API endpoint URL: `http://{apiName}.text2data.com/v3/{method}`

The API can be consumed directly or through our SDK. Our Excel Add-In and Google Sheets Add-on also use this API to process data.
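Following the endpoint pattern above, a request might look like the sketch below; the subdomain, method name, and JSON fields are hypothetical placeholders, since this page does not list them.

```
# Hypothetical call shape: the "api" subdomain, the "Analyze" method, and
# the JSON fields are placeholder assumptions; substitute a real
# {apiName}, {method}, and payload from the method list on this page.
curl -X POST "http://api.text2data.com/v3/Analyze" \
  -H "Content-Type: application/json" \
  -d '{"DocumentText": "Excellent service and friendly staff.", "PrivateKey": "<your api_key>"}'
```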

Tisane is a natural language processing library, providing:

* standard NLP functionality
* special functions for detection of problematic or abusive content
* low-level NLP like morphological analysis and tokenization of no-space languages (Chinese, Japanese, Thai)

Tisane has a monolithic architecture. All the functions are exposed using the same language models and the same analysis process, invoked using the [POST /parse](#561264c5-6dbe-4bde-aba3-7defe837989f) method. Other methods in the API are either wrappers based on the process, helper methods, or allow inspection of the language models. The current section of the documentation describes the two structures used in the parsing & transformation methods.

# Getting Started

This guide describes how to set up your Tisane account. The steps you need to complete are as follows:

* Step 1 – Create an Account
* Step 2 – Save Your API Key
* Step 3 – Integrate the API

## Step 1 – Create an Account

Navigate to [Sign up to Tisane API](https://tisane.ai/signup/). The free Community Plan allows up to 50,000 requests but comes with a limitation of 10 requests per minute.

## Step 2 – Save Your API Key

You will need the API key to make requests. Open your [Developer Profile](https://tisane.ai/developer/) to find your API keys.

## Step 3 – Integrate with the API

In summary, the POST /parse method has 3 attributes: *content*, *language*, and *settings*. All 3 attributes are mandatory. For example: `{"language": "en", "content": "hello", "settings": {}}`. Read on for more info on the [response](#response-reference) and the [settings](#settings-reference) specs. The method doc pages contain snippets of code for your favorite languages and platforms; a request sketch is shown below.
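A minimal request sketch, assuming the API is served at api.tisane.ai and authenticated with an `Ocp-Apim-Subscription-Key` header; verify both against the method doc pages.

```
# Minimal POST /parse sketch; the host and auth header are assumptions
# to be verified against the method documentation.
curl -X POST "https://api.tisane.ai/parse" \
  -H "Ocp-Apim-Subscription-Key: $TISANE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"language": "en", "content": "hello", "settings": {}}'
```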
# Response Reference

The response of the [POST /parse](#561264c5-6dbe-4bde-aba3-7defe837989f) method contains several sections displayed or hidden according to the [settings](#settings-reference) provided. The common attributes are:

* `text` (string) - the original input
* `reduced_output` (boolean) - if the input is too big, and verbose information like the lexical chunk was requested, the verbose information will not be generated, and this flag will be set to `true` and returned as part of the response
* `sentiment` (floating-point number) - a number in the range -1 to 1 indicating the document-level sentiment. Only shown when the `document_sentiment` [setting](#settings-reference) is set to `true`.
* `signal2noise` (floating-point number) - a signal-to-noise ranking of the text, in relation to the array of concepts specified in the `relevant` [setting](#settings-reference). Only shown when the `relevant` setting exists.

## Abusive or Problematic Content

The `abuse` section is an array of detected instances of content that may violate some terms of use. **NOTE**: the terms of use in online communities may vary, and so it is up to the administrators to determine whether the content is indeed abusive. For instance, it makes no sense to restrict sexual advances in a dating community, or to censor profanities when they are accepted in the bulk of the community. The section exists if instances of abuse are detected and the `abuse` [setting](#settings-reference) is either omitted or set to `true`.

Every instance contains the following attributes (a sample entry is sketched after the list of types below):

* `offset` (unsigned integer) - zero-based offset where the instance starts
* `length` (unsigned integer) - length of the content
* `sentence_index` (unsigned integer) - zero-based index of the sentence containing the instance
* `text` (string) - fragment of text containing the instance (only included if the `snippets` [setting](#settings-reference) is set to `true`)
* `tags` (array of strings) - when present, provides additional detail about the abuse. For instance, if the fragment is classified as an attempt to sell hard drugs, one of the tags will be *hard_drug*.
* `type` (string) - the type of the abuse
* `severity` (string) - how severe the abuse is. The levels of severity are `low`, `medium`, `high`, and `extreme`
* `explanation` (string) - when available, provides the rationale for the annotation; set the `explain` setting to `true` to enable.

The currently supported types are:

* `personal_attack` - an insult / attack on the addressee, e.g. an instance of cyberbullying. Please note that an attack on a post or a point, or just negative sentiment, is not the same as an insult. The line may be blurred at times. See [our Knowledge Base for more information](http://tisane.ai/knowledgebase/how-do-i-detect-personal-attacks/).
* `bigotry` - hate speech aimed at one of the [protected classes](https://en.wikipedia.org/wiki/Protected_group). The hate speech detected is not just racial slurs but, generally, hostile statements aimed at the group as a whole
* `profanity` - profane language, regardless of the intent
* `sexual_advances` - welcome or unwelcome attempts to gain some sort of sexual favor or gratification
* `criminal_activity` - attempts to sell or procure restricted items, criminal services, issuing death threats, and so on
* `external_contact` - attempts to establish contact or payment via external means of communication, e.g. phone, email, instant messaging (may violate the rules in certain communities, e.g. gig economy portals, e-commerce portals)
* `adult_only` - activities restricted for minors (e.g. consumption of alcohol)
* `mental_issues` - content indicative of suicidal thoughts or depression
* `allegation` - claimed knowledge or accusation of a misconduct (not necessarily a crime)
* `provocation` - content likely to provoke an individual or a group
* `disturbing` - graphic descriptions that may disturb readers
* `no_meaningful_content` - unparseable gibberish without apparent meaning
* `data_leak` - private data like passwords, ID numbers, etc.
* `spam` - (RESERVED) spam content
* `generic` - undefined
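A hypothetical `abuse` entry assembled from the attributes above (illustrative, not captured from a real response):

``` json
"abuse": [
  {
    "sentence_index": 0,
    "offset": 0,
    "length": 14,
    "type": "personal_attack",
    "severity": "medium"
  }
]
```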
## Sentiment Analysis

The `sentiment_expressions` section is an array of detected fragments indicating the attitude towards aspects or entities. The section exists if sentiment is detected and the `sentiment` [setting](#settings-reference) is either omitted or set to `true`.

Every instance contains the following attributes:

* `offset` (unsigned integer) - zero-based offset where the instance starts
* `length` (unsigned integer) - length of the content
* `sentence_index` (unsigned integer) - zero-based index of the sentence containing the instance
* `text` (string) - fragment of text containing the instance (only included if the `snippets` setting is set to `true`)
* `polarity` (string) - whether the attitude is `positive`, `negative`, or `mixed`. Additionally, there is a `default` sentiment used for cases when the entire snippet has been pre-classified. For instance, if a review is split into two portions, *What did you like?* and *What did you not like?*, and the reviewer replies briefly, e.g. *The quiet. The service*, the utterance itself has no sentiment value. When the calling application is aware of the intended sentiment, the *default* sentiment simply provides the targets / aspects, to which the sentiment will then be added externally.
* `targets` (array of strings) - when available, provides the set of aspects and/or entities which are the targets of the sentiment. For instance, when the utterance is *The breakfast was yummy but the staff is unfriendly*, the targets for the two sentiment expressions are `meal` and `staff`. Named entities may also be targets of the sentiment.
* `reasons` (array of strings) - when available, provides reasons for the sentiment. In the example utterance above (*The breakfast was yummy but the staff is unfriendly*), the `reasons` array for `staff` is `["unfriendly"]`, while the `reasons` array for `meal` is `["tasty"]`.
* `explanation` (string) - when available, provides the rationale for the sentiment; set the `explain` setting to `true` to enable.

Example:

``` json
"sentiment_expressions": [
  {
    "sentence_index": 0,
    "offset": 0,
    "length": 32,
    "polarity": "positive",
    "reasons": ["close"],
    "targets": ["location"]
  },
  {
    "sentence_index": 0,
    "offset": 38,
    "length": 29,
    "polarity": "negative",
    "reasons": ["disrespectful"],
    "targets": ["staff"]
  }
]
```

## Entities

The `entities_summary` section is an array of named entity objects detected in the text. The section exists if named entities are detected and the `entities` [setting](#settings-reference) is either omitted or set to `true`.

Every entity contains the following attributes:

* `name` (string) - the most complete name of the entity in the text of all the mentions
* `ref_lemma` (string) - when available, the dictionary form of the entity in the reference language (English) regardless of the input language
* `type` (string) - a string or an array of strings specifying the type of the entity, such as `person`, `organization`, `numeric`, `amount_of_money`, `place`. Certain entities, like countries, may have several types (because a country is both a `place` and an `organization`).
* `subtype` (string) - a string indicating the subtype of the entity
* `mentions` (array of objects) - a set of instances where the entity was mentioned in the text

Every mention contains the following attributes:

* `offset` (unsigned integer) - zero-based offset where the instance starts
* `length` (unsigned integer) - length of the content
* `sentence_index` (unsigned integer) - zero-based index of the sentence containing the instance
* `text` (string) - fragment of text containing the instance (only included if the `snippets` setting is set to `true`)

Example:

``` json
"entities_summary": [
  {
    "type": "person",
    "name": "John Smith",
    "ref_lemma": "John Smith",
    "mentions": [
      {
        "sentence_index": 0,
        "offset": 0,
        "length": 10
      }
    ]
  },
  {
    "type": ["organization", "place"],
    "name": "UK",
    "ref_lemma": "U.K.",
    "mentions": [
      {
        "sentence_index": 0,
        "offset": 40,
        "length": 2
      }
    ]
  }
]
```

### Entity Types and Subtypes

The currently supported entity types are:

* `person`, with optional subtypes: `fictional_character`, `important_person`, `spiritual_being`
* `organization` (note that a country is both an organization and a place)
* `place`
* `time_range`
* `date`
* `time`
* `hashtag`
* `email`
* `amount_of_money`
* `phone` - a phone number, either domestic or international, in a variety of formats
* `role` (a social role, e.g. a position in an organization)
* `software`
* `website` (URL), with an optional subtype: `tor` for Onion links; note that web services may also have the `software` type assigned
* `weight`
* `bank_account` - only IBAN format is supported; subtypes: `iban`
* `credit_card`, with optional subtypes: `visa`, `mastercard`, `american_express`, `diners_club`, `discovery`, `jcb`, `unionpay`
* `coordinates` (GPS coordinates)
* `credential`, with optional subtypes: `md5`, `sha-1`
* `crypto`, with optional subtypes: `bitcoin`, `ethereum`, `monero`, `monero_payment_id`, `litecoin`, `dash`
* `event`
* `file` - only Windows pathnames are supported; subtypes: `windows`, `facebook` (for images downloaded from Facebook)
* `flight_code`
* `identifier`
* `ip_address`, subtypes: `v4`, `v6`
* `mac_address`
* `numeric` (an unclassified numeric entity)
* `username`

## Topics

The `topics` section is an array of topics (subjects, domains, themes in other terms) detected in the text. The section exists if topics are detected and the `topics` [setting](#settings-reference) is either omitted or set to `true`. By default, a topic is a string. If the `topic_stats` [setting](#settings-reference) is set to `true`, then every entry in the array contains:

* `topic` (string) - the topic itself
* `coverage` (floating-point number) - a number between 0 and 1, indicating the ratio between the number of sentences where the topic is detected and the total number of sentences

## Long-Term Memory

The `memory` section contains optional context to pass to the `settings` in subsequent messages in the same conversation thread. See [Context and Long-Term Memory](#context-and-long-term-memory) for more details.

## Low-Level: Sentences, Phrases, and Words

Tisane allows obtaining more in-depth data, specifically:

* sentences and their corrected form, if a misspelling was detected
* lexical chunks and their grammatical and stylistic features
* parse trees and phrases

The `sentence_list` section is generated if the `words` or the `parses` [setting](#settings-reference) is set to `true`.
Every sentence structure in the list contains:

* `offset` (unsigned integer) - zero-based offset where the sentence starts
* `length` (unsigned integer) - length of the sentence
* `text` (string) - the sentence itself
* `corrected_text` (string) - if a misspelling was detected and the spellchecking is active, contains the automatically corrected text
* `words` (array of structures) - if the `words` [setting](#settings-reference) is set to `true`, generates extended information about every lexical chunk. (The term "word" is used for the sake of simplicity; however, it may not be linguistically correct to equate lexical chunks with words.)
* `parse_tree` (object) - if the `parses` [setting](#settings-reference) is set to `true`, generates information about the parse tree and the phrases detected in the sentence.
* `nbest_parses` (array of parse objects) - if the `parses` [setting](#settings-reference) is set to `true` and the `deterministic` [setting](#settings-reference) is set to `false`, generates information about the parse trees that were deemed close enough to the best one, but not the best.

### Words

Every lexical chunk ("word") structure in the `words` array contains:

* `type` (string) - the type of the element: `punctuation` for punctuation marks, `numeral` for numerals, or `word` for everything else
* `text` (string) - the text
* `offset` (unsigned integer) - zero-based offset where the element starts
* `length` (unsigned integer) - length of the element
* `corrected_text` (string) - if a misspelling is detected, the corrected form
* `lettercase` (string) - the original letter case: `upper`, `capitalized`, or `mixed`. If lowercase or no case, the attribute is omitted.
* `stopword` (boolean) - determines whether the word is a [stopword](https://en.wikipedia.org/wiki/Stop_words)
* `grammar` (array of strings or structures) - generates the list of grammar features associated with the `word`. If the `feature_standard` setting is defined as `native`, then every feature is an object containing a numeral (`index`) and a string (`value`). Otherwise, every feature is a plain string

#### Advanced

For lexical words only:

* `role` (string) - semantic role, like `agent` or `patient`. Note that in passive voice, the semantic roles are the reverse of the syntactic roles. E.g. in a sentence like *The car was driven by David*, *car* is the patient and *David* is the agent.
* `numeric_value` (floating-point number) - the numeric value, if the chunk has a value associated with it
* `family` (integer number) - the ID of the family associated with the disambiguated word-sense of the lexical chunk
* `definition` (string) - the definition of the family, if the `fetch_definitions` [setting](#settings-reference) is set to `true`
* `lexeme` (integer number) - the ID of the lexeme entry associated with the disambiguated word-sense of the lexical chunk
* `nondictionary_pattern` (integer number) - the ID of a non-dictionary pattern that matched, if the word was not in the language model but was classified by the non-dictionary heuristics
* `style` (array of strings or structures) - generates the list of style features associated with the `word`. Only if the `feature_standard` setting is set to `native` or `description`
* `semantics` (array of strings or structures) - generates the list of semantic features associated with the `word`. Only if the `feature_standard` setting is set to `native` or `description`
* `segmentation` (structure) - generates info about the selected segmentation, if there are several possibilities to segment the current lexical chunk and the `deterministic` setting is set to `false`. A segmentation is simply an array of `word` structures.
* `other_segmentations` (array of structures) - generates info about the segmentations deemed incorrect during the disambiguation process. Every entry has the same structure as the `segmentation` structure.
* `nbest_senses` (array of structures) - when the `deterministic` setting is set to `false`, generates a set of hypotheses that were deemed incorrect by the disambiguation process. Every hypothesis contains the following attributes: `grammar`, `style`, and `semantics`, identical in structure to their counterparts above; and `senses`, an array of word-senses associated with every hypothesis. Every sense has a `family`, which is an ID of the associated family; and, if the `fetch_definitions` setting is set to `true`, the `definition` and `ref_lemma` of that family.

For punctuation marks only:

* `id` (integer number) - the ID of the punctuation mark
* `behavior` (string) - the behavior code of the punctuation mark. Values: `sentenceTerminator`, `genericComma`, `bracketStart`, `bracketEnd`, `scopeDelimiter`, `hyphen`, `quoteStart`, `quoteEnd`, `listComma` (for East-Asian enumeration commas like *、*)

### Parse Trees and Phrases

Every parse tree, or more accurately, parse forest, is a collection of phrases, hierarchically linked to each other. At the top level of the parse, there is an array of root phrases under the `phrases` element and the numeric `id` associated with it. Every phrase may have children phrases.

Every phrase has the following attributes:

* `type` (string) - a [Penn treebank phrase tag](http://nliblog.com/wiki/knowledge-base-2/nlp-1-natural-language-processing/penn-treebank/penn-treebank-phrase-level-tags/) denoting the type of the phrase, e.g. *S*, *VP*, *NP*, etc.
* `family` (integer number) - an ID of the phrase family
* `offset` (unsigned integer) - a zero-based offset where the phrase starts
* `length` (unsigned integer) - the span of the phrase
* `role` (string) - the semantic role of the phrase, if any, analogous to that of the words
* `text` (string) - the phrase text, where the phrase members are delimited by the vertical bar character. Children phrases are enclosed in brackets. E.g., *driven|by|David* or *(The|car)|was|(driven|by|David)*.

Example:

``` json
"parse_tree": {
  "id": 4,
  "phrases": [
    {
      "type": "S",
      "family": 1451,
      "offset": 0,
      "length": 27,
      "text": "(The|car)|was|(driven|by|David)",
      "children": [
        {
          "type": "NP",
          "family": 1081,
          "offset": 0,
          "length": 7,
          "text": "The|car",
          "role": "patient"
        },
        {
          "type": "VP",
          "family": 1172,
          "offset": 12,
          "length": 15,
          "text": "driven|by|David",
          "role": "verb"
        }
      ]
    }
  ]
}
```

### Context-Aware Spelling Correction

Tisane supports automatic, context-aware spelling correction. Whether it's a misspelling or a purported obfuscation, Tisane attempts to deduce the intended meaning if the language model does not recognize the word. When or if it's found, Tisane adds the `corrected_text` attribute to the word (if the words / lexical chunks are returned) and the sentence (if the sentence text is generated). Sentence-level `corrected_text` is displayed if `words` or `parses` are set to *true*. Note that as Tisane works with large dictionaries, you may need to exclude more esoteric terms by using the `min_generic_frequency` setting. Note that **the invocation of spell-checking does not depend on whether the sentences and the words sections are generated in the output**. The spellchecking can be disabled by setting `disable_spellcheck` to `true`. Another option is to enable the spellchecking for lowercase words only, thus excluding potential proper nouns in languages that support capitalization; to avoid spell-checking capitalized and uppercase words, set `lowercase_spellcheck_only` to `true`.
# Settings Reference

The purpose of the settings structure is to:

* provide cues about the content being sent to improve the results
* customize the output and select sections to be shown
* define standards and formats in use
* define and calculate the signal-to-noise ranking

All settings are optional. To leave all settings at their defaults, simply provide an empty object (`{}`).

## Content Cues and Instructions

`format` (string) - the format of the content. Some policies will be applied depending on the format. Certain logic in the underlying language models may require the content to be of a certain format (e.g. logic applied to reviews may seek sentiment more aggressively). The default format is empty / undefined. The format values are:

* `review` - a review of a product or a service or any other review. Normally, the underlying language models will seek sentiment expressions more aggressively in reviews.
* `dialogue` - a comment or a post which is a part of a dialogue. An example of logic more specific to a dialogue is name-calling. A single word like "idiot" would not be a personal attack in any other format, but it is certainly a personal attack when part of a dialogue.
* `shortpost` - a microblogging post, e.g. a tweet.
* `longform` - a long post or an article.
* `proofread` - a post which was proofread. In proofread posts, the spellchecking is switched off.
* `alias` - a nickname in an online community.
* `search` - a search query. Search queries may not always be grammatically correct. Certain topics and items that we might otherwise let pass are tagged with the `search` format.

`disable_spellcheck` (boolean) - determines whether the automatic spellchecking is to be disabled. Default: `false`.

`lowercase_spellcheck_only` (boolean) - determines whether the automatic spellchecking is only to be applied to words in lowercase. Default: `false`

`min_generic_frequency` (int) - allows excluding more esoteric terms; the valid values are 0 through 10.

`subscope` (boolean) - enables sub-scope parsing, for scenarios like hashtag and URL parsing, and obfuscated content (e.g. *ihateyou*). Default: `false`.

`lang_detect_segmentation_regex` (string) - allows defining custom language detection fragment boundaries. For example, if multiple languages may be used in different sentences in the same text, you may want to define the regex as: `(([\r\n]|[.!?][ ]))`.

`domain_factors` (set of pairs made of strings and numbers) - provides session-scope cues for the domains of discourse. This is a powerful tool that allows tailoring the result based on the use case. The format is: the family ID of the domain as a key and the multiplication factor as a value (e.g. *"12345": 5.0*). For example, when processing text looking for criminal activity, we may want to weight domains relevant to drugs, firearms, and crime higher: `"domain_factors": {"31058": 5.0, "45220": 5.0, "14112": 5.0, "14509": 3.0, "28309": 5.0, "43220": 5.0, "34581": 5.0}`. The same device can be used to eliminate noise coming from domains we know are irrelevant by setting the factor to a value lower than 1.

`when` (date string, format YYYY-MM-DD) - indicates when the utterance was uttered. (TO BE IMPLEMENTED) The purpose is to prune word senses that were not available at a particular point in time. For example, the words *troll*, *mail*, and *post* had nothing to do with the Internet 300 years ago because there was no Internet, and so in a text written hundreds of years ago, we should ignore the word senses that emerged only recently.

A combined example of these cues is sketched below.
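A hypothetical `settings` object combining several of the content cues above (the values are illustrative only):

``` json
"settings": {
  "format": "dialogue",
  "subscope": true,
  "min_generic_frequency": 3,
  "domain_factors": {"31058": 5.0, "45220": 5.0}
}
```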
## Output Customization

`abuse` (boolean) - output instances of abusive content (default: `true`)

`sentiment` (boolean) - output sentiment-bearing snippets (default: `true`)

`document_sentiment` (boolean) - output document-level sentiment (default: `false`)

`entities` (boolean) - output entities (default: `true`)

`topics` (boolean) - output topics (default: `true`), with two more relevant settings:

* `topic_stats` (boolean) - include coverage statistics in the topic output (default: `false`). When set, the topic is an object containing the attributes `topic` (string) and `coverage` (floating-point number). The coverage indicates the share of sentences touching the topic among all the sentences.
* `optimize_topics` (boolean) - if `true`, the less specific topics are removed if they are parts of the more specific topics. For example, when the topic is `cryptocurrency`, the optimization removes `finance`.

`words` (boolean) - output the lexical chunks / words for every sentence (default: `false`). In languages without white spaces (Chinese, Japanese, Thai), the tokens are tokenized words. In languages with compounds (e.g. German, Dutch, Norwegian), the compounds are split.

`fetch_definitions` (boolean) - include definitions of the words in the output (default: `false`). Only relevant when the `words` setting is `true`

`parses` (boolean) - output parse forests of phrases

`deterministic` (boolean) - whether the n-best senses and n-best parses are to be output in addition to the detected sense. If `true`, only the detected sense will be output. Default: `true`

`snippets` (boolean) - include the text snippets in the abuse, sentiment, and entities sections (default: `false`)

`explain` (boolean) - if `true`, a reasoning for the abuse and sentiment snippets is provided when possible (see the `explanation` attribute)

## Standards and Formats

`feature_standard` (string) - determines the standard used to output the features (grammar, style, semantics) in the response object. The standards we support are:

* `ud`: [Universal Dependencies tags](https://universaldependencies.org/u/pos/) (default)
* `penn`: [Penn treebank tags](https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html)
* `native`: Tisane native feature codes
* `description`: Tisane native feature descriptions

Only the native Tisane standards (codes and descriptions) support style and semantic features.

`topic_standard` (string) - determines the standard used to output the topics in the response object.
The standards we support are:

* `iptc_code` - IPTC topic taxonomy code
* `iptc_description` - IPTC topic taxonomy description
* `iab_code` - IAB topic taxonomy code
* `iab_description` - IAB topic taxonomy description
* `native` - Tisane domain description, coming from the family description (default)

`sentiment_analysis_type` (string) - the type of the sentiment analysis strategy. The values are:

* `products_and_services` - most common sentiment analysis of products and services
* `entity` - sentiment analysis with entities as targets
* `creative_content_review` - reviews of creative content (RESERVED)
* `political_essay` - political essays (RESERVED)

## Context and Long-Term Memory

Human understanding of language is not a simple "sliding window" with scope limited to a sentence. Language is accompanied by gestures, visuals, and knowledge of the previous communication. Sometimes, code-words may be used to conceal the words' original meaning. When detecting abuse, the name of an ethnicity or a religious group may not be offensive on its own, but when superimposed over a picture of an ape or a pig, it is meant to offend. When translating from a language without gender distinctions in verbs (like English) to a language with distinctions (like Russian or Hebrew), there is no way to know from an utterance alone whether the speaker is female. When a scammer is collecting details piecemeal over a series of utterances, knowledge of previous utterances is needed to take action.

Tisane's Memory module allows pre-initializing the analysis, as well as reassigning meanings, and more. The module is made of three simple components that are flexible enough for a variety of tasks:

### Reassignments

Reassignments define the attributes to set based on other attributes. This allows you to:

* assign gender to 1st or 2nd person verbs, generating accurate translations
* overwrite the original meaning of a group of words, with all their inflected forms, to analyze code-words and secret language
* add an additional feature or a hypernym to a family, and more, within the scope of a call.

The `assign` section is an array of structures defining:

* `if` - conditions to match:
  * `regex` - a regular expression (RE2 syntax)
  * `family` - a family ID
  * `features` - a list of feature values. A feature is a structure with an `index` and a `value`. For example: `{"index":1, "value":"NOUN"}`.
  * `hypernym` - a family ID of a hypernym
* `then` - attributes to assign:
  * `family` - a family ID
  * `features` - a list of feature values. A feature is a structure with an `index` and a `value`. For example: `{"index":1, "value":"NOUN"}`.
  * `hypernym` - a family ID of a hypernym

Examples:

* the speaker is female: `"assign":[{"if":{"features":[{"index":9,"value":"1"}]},"then":{"features":[{"index":5,"value":"F"}]}}]`
* assume that a mention of a container refers to an illegal item: `"assign":[{"if":{"family":26888},"then":{"hypernym":123078}}]`

### Flags

An array of flag structures that add some context. A flag is a structure with an `index` and a `value`. For example: `{"index":36, "value":"WFH"}`. Aside from the flags returned in the `memory` section of the response, these flags can be set:

* `{"index":36, "value":"PEBD"}` (agents_of_bad_things) - the context is about a bad player or an agent responsible for bad things
* `{"index":36, "value":"BADANML"}` (bad_animal) - the context is an animal that symbolizes bad qualities (e.g. pig, ape, snake, etc.)
* `{"index":36, "value":"BULKMSG"}` (bulk_message) - the message was sent in bulk * `{"index":36, "value":"DETHR"}` (death_related) - the context is something related to death * `{"index":36, "value":"EARNMUCH"}` (make_money) - the context is related to making money * `{"index":36, "value":"IDEP"}` (my_departure) - the author of the text mentioned departing * `{"index":36, "value":"SECO"}` (sexually_conservative) - any attempt to exchange photos or anything that may be either sexual or non-sexual is to be deemed sexual * `{"index":36, "value":"TRPA"}` (trusted_party) - the author of the text claims to be a trusted party (e.g. a relative or a spouse) * `{"index":36, "value":"WSTE"}` (waste) - the context is about waste, organic or inorganic * `{"index":36, "value":"WOPR"}` (won_prize) - prize or money winning was mentioned or implied * `{"index":36, "value":"WFH"}` (work_from_home) - work from home was mentioned * `{"index":5, "value":"ORG"}` (organization) - an organization was mentioned * `{"index":5, "value":"ROLE"}` (role) - a role or a position was mentioned ### Antecedents The section contains structures to be used in coreference resolution. The attributes are: * `family` - the family ID of the antecedent * `features` - the list of features. Every feature is a structure with an `index` and a `value`. For example: `{"index":36, "value":"WFH"}`. ## Signal to Noise Ranking When we're studying a bunch of posts commenting on an issue or an article, we may want to prioritize the ones more relevant to the topic, and containing more reason and logic than emotion. This is what the signal to noise ranking is meant to achieve. The signal to noise ranking is made of two parts: 1. Determine the most relevant concepts. This part may be omitted, depending on the use case scenario (e.g. we want to track posts most relevant to a particular set of issues). 2. Rank the actual post in relevance to these concepts. To determine the most relevant concepts, we need to analyze the headline or the article itself. The headline is usually enough. We need two additional settings: * `keyword_features` (an object of strings with string values) - determines the features to look for in a word. When such a feature is found, the family ID is added to the set of potentially relevant family IDs. * `stop_hypernyms` (an array of integers) - if a potentially relevant family ID has a hypernym listed in this setting, it will not be considered. For example, we extracted a set of nouns from the headline, but we may not be interested in abstractions or feelings. E.g. from a headline like *Fear and Loathing in Las Vegas* we want *Las Vegas* only. Optional. If `keyword_features` is provided in the settings, the response will have a special attribute, `relevant`, containing a set of family IDs. At the second stage, when ranking the actual posts or comments for relevance, this array is to be supplied among the settings. The ranking is boosted when the domain, the hypernyms, or the families related to those in the `relevant` array are mentioned, when negative and positive sentiment is linked to aspects, and penalized when the negativity is not linked to aspects, or abuse of any kind is found. The latter consideration may be disabled, e.g. when we are looking for specific criminal content. When the `abuse_not_noise` parameter is specified and set to `true`, the abuse is not penalized by the ranking calculations. To sum it up, in order to calculate the signal to noise ranking: 1. 
1. Analyze the headline with `keyword_features` and, optionally, `stop_hypernyms` in the settings. Obtain the `relevant` attribute.
2. When analyzing the posts or the comments, specify the `relevant` attribute obtained in step 1.

A sketch of the two calls is shown below.
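In this sketch, the host, auth header, feature index, and family ID are placeholder assumptions; the two-step flow itself follows the steps above.

```
# Step 1: analyze the headline; the keyword_features value is a
# hypothetical placeholder, as real feature indices depend on the
# language model.
curl -X POST "https://api.tisane.ai/parse" \
  -H "Ocp-Apim-Subscription-Key: $TISANE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"language":"en","content":"Fear and Loathing in Las Vegas","settings":{"keyword_features":{"1":"NOUN"}}}'
# -> the response includes "relevant": [ ...family IDs... ]

# Step 2: rank a post against the concepts obtained in step 1
# (12345 is a placeholder family ID).
curl -X POST "https://api.tisane.ai/parse" \
  -H "Ocp-Apim-Subscription-Key: $TISANE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"language":"en","content":"Las Vegas was packed this weekend.","settings":{"relevant":[12345]}}'
```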

## Introduction

The VisibleThread API provides services for analyzing/searching documents and web pages. To use the service you need an API key. **Contact us at support@visiblethread.com to request an API key**.

The services are split into **Documents** and **Webscans**.

### Documents

Upload documents and dictionaries so you can:

- measure the readability of your document
- search a document for all terms from a dictionary
- retrieve all paragraphs from a document, or only matching paragraphs

### Webscans

Analyze web pages so you can:

- measure the readability of your web content
- identify & highlight content issues, e.g. long sentences, passive voice

The VisibleThread API allows you to programmatically submit webpage urls to be scanned, check on the results of a scan, and view a list of previous scans you have performed.

-------------

The VisibleThread API is an HTTP-based JSON API, accessible at https://api.visiblethread.com. Each request to the service requires your API key to be successful.

## Getting Started With Webscans

Steps:

1. Enter your API key above and hit **Explore**.
2. Run a new scan by submitting a **POST to /webscans** (title and some webUrls are required). The scan runs asynchronously in the background but returns immediately with a JSON response containing an "id" that represents your scan.
3. Check on the status of a scan by submitting **GET /webscans/{scanId}**; if the scan is still in progress, it will return an HTTP 503. If it is complete, it will return an HTTP 200 with the appropriate JSON outlining the urls scanned and the summary statistics for each url.
4. Retrieve all your previous scan results by submitting **GET /webscans**.
5. Retrieve detailed results for a url within a scan (readability, long sentence, and passive language instances) by submitting **GET /webscans/{scanId}/webUrls/{urlId}** (scanId and urlId are required).

## Getting Started With Document Scans

Steps:

1. Enter your API key above and hit **Explore**.
2. Run a new scan by submitting a **POST to /documents** (document required). The scan runs asynchronously in the background but returns immediately with a JSON response containing an "id" that represents your scan.
3. Check on the status of a scan by submitting **GET /documents/{scanId}**; if the scan is still in progress, it will return an HTTP 503. If it is complete, it will return an HTTP 200 with the appropriate JSON outlining the document readability results. It will contain a detailed analysis of each paragraph in the document.
4. Retrieve all your previous scan results by submitting **GET /documents**.

### Searching a document for keywords

The VisibleThread API allows you to upload a set of keywords, or a 'dictionary'. You can then perform a search of an already uploaded document using that dictionary.

Steps (assuming you have uploaded your document using the steps above):

1. Upload a csv file to use as a keyword dictionary by submitting a **POST to /dictionaries** (csv file required). This returns a JSON response with the dictionary id.
2. Search a document with the dictionary by submitting a **POST to /searches** (document id and dictionary id required).
3. Get the results of the search by submitting **GET /searches/{docId}/{dictionaryId}**. This will return a JSON response containing detailed results of searching the document using the dictionary.
4. To view the list of all searches you have performed, submit a **GET /searches**.

Below is a list of the available API endpoints, documentation & a form to try out each operation. A webscan sketch follows.
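A minimal sketch of the webscan flow described in the steps above; the endpoints come from those steps, while the auth header name and JSON body shape are assumptions for illustration.

```
# Kick off a scan (step 2). The "apiKey" header name and the exact JSON
# body shape are assumptions; "title" and "webUrls" are the fields named
# as required above.
curl -X POST "https://api.visiblethread.com/webscans" \
  -H "apiKey: $VT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"title": "Homepage scan", "webUrls": ["https://example.com"]}'

# Poll for completion (step 3): HTTP 503 while in progress, 200 when done.
curl "https://api.visiblethread.com/webscans/{scanId}" -H "apiKey: $VT_API_KEY"
```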

The Word Associations Network API allows developers to embed the ability to find associations for a word or phrase into their mobile apps or web services. Words are grouped by semantics, meaning, and psychological perception. The Word Associations Network API currently supports English, French, Spanish, German, Italian, Portuguese, and Russian vocabulary. Please [register and subscribe](https://api.wordassociations.net/subscriptions/) to one of the available tariff plans to get a valid API key.
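An association lookup might look like the sketch below; the path and parameter names are assumptions for illustration, so check the API reference after subscribing.

```
# Hypothetical association lookup; the "/associations/v1.0/json/search"
# path and the "apikey"/"text"/"lang" parameters are assumptions.
curl -G "https://api.wordassociations.net/associations/v1.0/json/search" \
  --data-urlencode "apikey=$WORDASSOC_API_KEY" \
  --data-urlencode "text=doctor" \
  --data-urlencode "lang=en"
```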

Wordnik is the world's biggest online English dictionary, by number of words.
