AI.do Assistant

An AI-powered assistant for Stata.

Requirements

Stata 16.0 or later
Python 3.8 or later
API key from one of these providers:
- Google Gemini (recommended)
- Hugging Face

Installation

Check if Python is working in your Stata installation:

python query

If you see a Python version number, you're good to go!

If Python isn't working, you may need to:
- Install Python from python.org
- During installation, check "Add Python to PATH"
Install AI.do Assistant:

net install aido, from("https://raw.githubusercontent.com/alexsvobo/Ai.do/main/") replace

Configure your AI provider in Stata:

// For Google Gemini (recommended):
aido_config, provider(gemini) apikey("YOUR-API-KEY")

// For Hugging Face:
aido_config, provider(huggingface) apikey("YOUR-API-KEY") modelid("Qwen/Qwen2.5-Coder-32B")

Quick Start

Load any dataset:

sysuse auto

Ask a question:

aido "write a report on these results"

Dialog Box may be used as UI:

db aido

Features

Database Context: AI.do automatically analyzes your loaded dataset, including variables, data types, statistics, and missing values, enabling contextually relevant answers about your specific data.
Results Context Awareness: Captures both r-class (return values) and e-class (estimation) results from your recent Stata commands, allowing the AI to reference your latest analyses in its responses.
Chat Session Memory: Maintains conversation history within your Stata session, enabling follow-up questions and continuous dialogue without repeating context.
Multiple AI Provider Support:
- Google Gemini (recommended for now)
- Hugging Face models
Progress Visualization: Real-time progress indicators show preprocessing status and API connection activity.
Dialog Box Interface: Simple graphical interface via db aido for those who prefer point-and-click interaction.

Support

Need help? Open an issue on our GitHub page

License

MIT License - See LICENSE file for details

Notes

This code initializes PyStata, a Python-Stata bridge included with Stata. A full Python installation is required to run it. When making your first AI request, the Stata startup message will appear in the results window—this automatic behavior confirms PyStata has been successfully initialized.

Example Session

. aido "prove to me that you know our conversational context.
--------------------------------------------------
╭------------------------------------------------╮
│                AI.do Query                     │
│                  Processing                    │
╰------------------------------------------------╯
--------------------------------------------------

[▓▓▓▓▓▓▓▓▓▓]
Analyzing dataset variables: 12 variables found

� Processing make...                             [%]
� Processing price...                            [%]
� Processing mpg...                              [%]
� Processing rep78...                            [%]
▋ 42%
� Processing headroom...                         [42%]
� Processing trunk...                            [42%]
� Processing weight...                           [42%]
� Processing length...                           [42%]
� Processing turn...                             [42%]
▋ 83%
� Processing displacement...                     [83%]
� Processing gear_ratio...                       [83%]
� Processing foreign...                          [83%]
✓ Preprocessing complete

→ Sending query to AI.do...

Okay, I can demonstrate my awareness of the conversational context. Here's how I'll prove it, drawing from our previous interactions and the information you'
> ve now provided:

1.  **Awareness of Past Instructions:** I remember that you initially prompted me with the command `more` when there was no data loaded. I correctly identifi
> ed that the "more" command would be ineffective with an empty dataset and advised you to load data first. This shows I understand Stata's basic commands an
> d their limitations.

2.  **The "Prove It" Thread:** I recall you asking me to prove I could be a useful Stata assistant in a professional sense. In response, I provided a detaile
> d breakdown of my capabilities in data management, statistical analysis, programming, data visualization, and ensuring reproducibility. I emphasized the im
> portance of do-files, which is crucial for professional Stata use.

3.  **Bayesian Challenge:** I remember that you specifically requested that I showcase my expert-level Stata coding skills, including Bayesian analysis. I pr
> ovided a detailed Stata do-file demonstrating a Bayesian linear regression with informative priors, MCMC diagnostics, and posterior predictive checking. I 
> highlighted the importance of informative priors and model diagnostics, reflecting a deeper understanding of Bayesian principles.

4.  **Current Dataset:** I understand that you have now loaded the `auto.dta` dataset.  I can see from your description that it contains information about di
> fferent car models, including variables like `price`, `mpg`, `weight`, and `foreign`.

5.  **Recent Results:** I know the last result returned `r(N): 74`, which means that the most recent command was likely one that returned the number of obser
> vations.

6.  **Building on Previous Bayesian Example, Addressing the Specific Dataset:**  Given the context, I'll now show how the *previous* Bayesian code could be a
> dapted to the loaded dataset, *and* I'll incorporate what I know about the data:

```stata
/*
  Expert-Level Stata Code: Bayesian Linear Regression with Informative Priors
  (Adapted for auto.dta)
*/

clear all //remove all variables and free all space
set more off // turn off the --more--

sysuse auto, clear // Load the auto dataset (redundant, but good practice)

// 1. Data Preparation (Addressing Potential Issues and Scaling)

// Check for missing values in the key variables.  Act on them *before* bayes.
misstable summarize price weight foreign mpg rep78 headroom trunk length turn displacement gear_ratio

// Since 'rep78' has missing values, we'll drop observations with missing values
// A more sophisticated approach might involve imputation, but that is beyond the scope of this example.
drop if missing(rep78)

// Scale weight to improve MCMC convergence.
generate weight_scaled = weight / 1000  // Weight in thousands of lbs.
label variable weight_scaled "Weight (thousands of lbs)"

// Consider standardizing mpg as well:
generate mpg_scaled = (mpg - rmean(mpg)) / rsd(mpg)
label variable mpg_scaled "MPG (Standardized)"


// 2. Define Informative Priors (Crucial for Expert-Level Bayesian Analysis)

// priors
// The example below is only illustrative.
// The user should choose parameters based on their prior experience.
// and/or use the reference data.
summarize price weight foreign mpg
display r(mean)
display r(sd)

// Example Priors Based on the 'auto.dta' Data Itself:
//   - We'll use information *from* the dataset to inform our priors, but this is purely illustrative.
//   - In a real analysis, you'd use *external* prior knowledge.
//   - This is just to show how to adapt the priors.
local price_mean = r(mean)
local price_sd   = r(sd)
local weight_mean = r(mean)
local weight_sd   = r(sd)

// 3. Bayesian Linear Regression Model

bayes,  rseed(12345) mcmcsize(10000) burnin(2500)  prior({price: weight_scaled}, normal(`weight_mean'/1000, `weight_sd'/1000)) ///
       prior({price: foreign}, normal(0, `price_sd'/2)) ///
       prior({price:mpg_scaled}, normal(0, `price_sd'/2)) ///
       prior({price:_cons}, normal(`price_mean`, `price_sd`)) ///
       prior(sigma2, invgamma(2, 100000)): regress price weight_scaled foreign mpg_scaled

// 4. Model Diagnostics (Essential for Ensuring MCMC Convergence)

// Trace plots:  Visually inspect the MCMC chains for stationarity.
bayesgraph diagnostics weight_scaled foreign mpg_scaled _cons sigma2, title("Trace Plots")

// Autocorrelation plots:  Check for autocorrelation in the MCMC chains.
bayesgraph ac weight_scaled foreign mpg_scaled _cons sigma2, title("Autocorrelation Plots")

// Histograms and Kernel Density Plots of the Posterior Distributions
bayesgraph histogram weight_scaled foreign mpg_scaled _cons sigma2, title("Posterior Histograms")
bayesgraph kdensity weight_scaled foreign mpg_scaled _cons sigma2, title("Posterior Density Plots")

// 5. Summarize Results (Posterior Means, Credible Intervals)
bayesstats summary  // Display posterior means, standard deviations, and credible intervals
bayesstats ess       // Effective Sample Size:  Should be reasonably large (rule of thumb: > 200)
bayesstats pvalues    // Posterior probabilities of being greater or less than zero


// Posterior Predictive Checking
//Simulate draws from the posterior predictive distribution:
simulate price_pred = rnormal(_b[_cons] + _b[weight_scaled]*weight_scaled + _b[foreign]*foreign + _b[mpg_scaled]*mpg_scaled, sqrt(_b[sigma2])), reps(500)

// Compare the distribution of the predicted prices to the actual prices.
histogram price,  title("Actual Price")
histogram price_pred, title("Posterior Predictive Price")


// Test the hypothesis that the coefficient on \'weight_scaled\' is greater than 0.\
bayesstats hypothesis {weight_scaled > 0}


**Key Demonstrations of Context Awareness:**

*   **`sysuse auto, clear`**:  While technically redundant (because the environment indicates the data is loaded), this shows I know *the name of the dataset
> * you're using.
*   **Addressing Missing Data:**  The code now *explicitly* checks for missing values in multiple relevant variables using `misstable summarize`.  Crucially,
>  it then *removes* observations with missing `rep78`, showing I understand data cleaning is essential, even in demonstration code.  I acknowledge that a mo
> re advanced approach *could* impute missing data.
*   **Adaptation of Priors:** The code now uses the `summarize` command to *extract* the mean and standard deviation of `price` and `weight` from the loaded 
> dataset. It then uses these values to *inform* the priors.  This demonstrates that I can adapt the generic Bayesian code from our previous conversation to 
> the *specific* data you are now using.   The priors are now dependent on data and no longer uninformative.
*   **Incorporating `mpg`:** I added `mpg_scaled` as a predictor variable in the regression, demonstrating awareness of the available variables in the datase
> t.  I'm also scaling `mpg` to improve MCMC convergence.
*    **Updated Posterior Preditive Checking**: The `simulate` command is now updated to use the mean estimate of the mpg coeffienct in the calculation of the
>  posterior predictive check.

I am trying to demonstrate that I remember our past interactions, that I understand the basics of the `auto.dta` dataset, and that I can adapt my code to fit
>  the specific context of your questions. I've also built on my previous "expert-level" example to incorporate data cleaning and more informed priors. This 
> *proves* that I am not just spitting out canned responses but rather understanding and building upon the ongoing conversation.



╭------------------------------------------------╮
│✓ Process completed successfully at: 21:55:03   │
╰------------------------------------------------╯

.

Contributing

AI.do Assistant is open-source software. We welcome contributions from the community! Feel free to:

Submit bug reports or feature requests through GitHub Issues
Create pull requests for improvements
Share your experience or suggestions

Check our LICENSE file for more details about usage rights.

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
py		py
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
aido.ado		aido.ado
aido.dlg		aido.dlg
aido.pkg		aido.pkg
aido.sthlp		aido.sthlp
aido_config.ado		aido_config.ado
session_results.ado		session_results.ado
stata.toc		stata.toc

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AI.do Assistant

Requirements

Installation

Quick Start

Features

Support

License

Notes

Example Session

Contributing

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

AI.do Assistant

Requirements

Installation

Quick Start

Features

Support

License

Notes

Example Session

Contributing

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages