# PolymathPlus Regression Specification

## What This Program Type Solves

Regression programs in PolymathPlus fit models to tabular data or compute numerical integrals from tabular data. This specification covers:

- Polynomial regression: `polyfit`, including linear regression when order is `1`
- Multi-linear regression: `mlinfit`
- Nonlinear regression: `nlinfit`
- Numerical integration from table data: `integrate`

Common rules that always apply:

- `#` starts a comment. Comments are ignored by validation.
- Blank lines are allowed and ignored.
- A regression program must contain exactly one data table block and exactly one regression command.
- The regression command must be one of: `polyfit`, `mlinfit`, `nlinfit`, or `integrate`.
- Variable names must start with a letter and then use only letters, digits, or underscores.
- Avoid built-in function names or conditional keywords as variable names. Examples to avoid include `sin`, `sqrt`, `log`, `if`, `then`, and `else`.
- Keep one consistent program style. Do not mix regression with DEQ, LEQ, NLE, NLP, or table-free program formats.

## User Rules For Writing A Valid Regression Program

### 1. Required structure

Every regression program must contain:

- One data table block enclosed by `[` and `]`
- One regression command
- Optional explicit equations outside the table
- Optional nonlinear-regression initial guesses, only for `nlinfit`

Additional structure rules:

- Exactly one table start line (`[`) and one table end line (`]`) are allowed.
- More than one data table is invalid.
- More than one regression command is invalid.
- Unknown lines outside the table are invalid.

Invalid multiple commands:

```polymathplus
[
x y
1 2
2 3
3 4
4 5
5 6
]
polyfit x y 2
integrate y(x) 1 5
```

Repair:

```polymathplus
[
x y
1 2
2 3
3 4
4 5
5 6
]
polyfit x y 2
```

### 2. Data table rules

Table format:

- First non-comment, non-blank line inside `[` ... `]` is the header.
- Header tokens are variable names separated by spaces.
- Header variable names must be unique.
- Each numeric row must have exactly the same number of columns as the header.
- Numeric table cells must be valid numeric literals, including decimal and scientific notation.
- Comments and blank lines inside the table are ignored.
- Table cells are separated by one or more spaces.

Valid table:

```polymathplus
[
x y
0.5 1.255
0.387 1.25
0.24 1.189
0.136 1.124
0.04 0.783
0.011 0.402
]
```

Invalid row with missing column:

```polymathplus
[
x y
1 2
2 3
4
4 5
5 6
]
polyfit x y 2
```

Repair:

```polymathplus
[
x y
1 2
2 3
3 4
4 5
5 6
]
polyfit x y 2
```

Invalid duplicate header variable:

```polymathplus
[
x x
1 2
2 3
3 4
4 5
5 6
]
polyfit x x 2
```

Repair:

```polymathplus
[
x y
1 2
2 3
3 4
4 5
5 6
]
polyfit x y 2
```

### 3. Explicit-equation rules

Explicit equations are optional helper definitions and are allowed in all regression subtypes.

Rules:

- Explicit variable names must be unique.
- Explicit expressions must be valid math expressions.
- Circular and self-referencing explicit dependencies are invalid.
- For `nlinfit`, explicit variables may depend only on table variables or other explicit variables.
- For `nlinfit`, explicit variables must not depend on fitted model-parameter variables.

Invalid circular explicit definitions:

```polymathplus
[
x y
1 2
2 3
3 4
4 5
5 6
]
a = b + 1
b = a + 1
polyfit x y 2
```

Repair:

```polymathplus
[
x y
1 2
2 3
3 4
4 5
5 6
]
a = x + 1
polyfit x y 2
```

## Command Rules

### 4. Polynomial and linear regression: `polyfit`

Syntax:

```txt
polyfit x y order
polyfit x y order origin
```

Rules:

- `x` and `y` must be table variables or explicit variables.
- Recommended: write `order` as an integer.
- Current precheck behavior: decimal-looking values such as `2.5` can be accepted and parsed as `2`; avoid this because it is ambiguous.
- `order = 1` is classified as linear regression.
- Higher orders are classified as polynomial regression.
- The polynomial order must not exceed the number of numeric data rows.
- Optional trailing `origin` is allowed.

Valid:

```polymathplus
[
Time BOD
1 0.6
2 0.7
4 1.5
6 1.9
8 2.1
10 2.6
]
polyfit Time BOD 2
```

Invalid order too high for the data (order `4` with only 2 rows):

```polymathplus
[
x y
1 2
2 3
]
polyfit x y 4
```

Repair:

```polymathplus
[
x y
1 2
2 3
3 4
4 5
5 6
]
polyfit x y 4
```

### 5. Multi-linear regression: `mlinfit`

Syntax:

```txt
mlinfit x1 x2 ... y
mlinfit x1 x2 ... y origin
```

Rules:

- List predictor variables first and the dependent variable last.
- Optional trailing `origin` is allowed.
- All listed variables must be table variables or explicit variables.
- The table must have enough rows for the selected model size.
- Recommended: use at least one predictor variable and one dependent variable.
- Current precheck behavior: command variables are counted as unique variables, so duplicated variable names are not counted twice. Some underspecified commands may pass precheck if the listed variables are known; avoid them because they do not describe a useful multi-linear model.
- Validator row-count rule: if `k` is the number of unique listed variables after `mlinfit`, including `y`, then:
  - without `origin`: rows must be at least `k + 1`
  - with `origin`: rows must be at least `k + 2`

Valid:

```polymathplus
[
y x1 x2
293 1.61 851
230 15.5 820
172 22 1058
91 45 1201
125 33 1357
125 40 1115
]
mlinfit x1 x2 y
```

Invalid too few rows (`k = 3`, so at least 4 rows are required):

```polymathplus
[
y x1 x2
293 1.61 851
230 15.5 820
172 22 1058
]
mlinfit x1 x2 y
```

Repair:

```polymathplus
[
y x1 x2
293 1.61 851
230 15.5 820
172 22 1058
91 45 1201
]
mlinfit x1 x2 y
```

### 6. Nonlinear regression: `nlinfit`

Syntax:

```polymathplus
nlinfit y = expression
m(a) = 2
m(b) = 1
```

Rules:

- The command must have the form `nlinfit dependentVariable = expression`.
- The dependent variable must be a table variable or explicit variable.
- The model expression must be valid math syntax.
- Recommended: the model expression should include at least one independent variable from table data or an explicit helper based on table data.
- Current precheck behavior: explicit variables are treated as available independent variables even if they are constants, so a model can pass precheck while being statistically unhelpful. For modeling quality, make the fitted expression depend on table data.
- Fitted model parameters are variables used in the model expression that are neither table variables nor explicit variables.
- At least one fitted model parameter must exist.
- Initial guesses use the form `m(parameter)=number`.
- Every fitted model parameter requires an initial guess.
- Initial-guess variables must match fitted model parameters exactly.
- Extra, missing, or misspelled initial guesses are invalid.
- For `nlinfit`, explicit variables must not depend on fitted model parameters.

Valid:

```polymathplus
[
x y
0.5 1.255
0.387 1.25
0.24 1.189
0.136 1.124
0.04 0.783
0.011 0.402
]
nlinfit y = a*x/(b + x)
m(a)=2
m(b)=1
```

Invalid missing initial guess:

```polymathplus
[
x y
0.5 1.255
0.387 1.25
0.24 1.189
0.136 1.124
0.04 0.783
0.011 0.402
]
nlinfit y = a*x/(b + x)
m(a)=2
```

Repair:

```polymathplus
[
x y
0.5 1.255
0.387 1.25
0.24 1.189
0.136 1.124
0.04 0.783
0.011 0.402
]
nlinfit y = a*x/(b + x)
m(a)=2
m(b)=1
```

Invalid extra initial guess:

```polymathplus
[
x y
0.5 1.255
0.387 1.25
0.24 1.189
0.136 1.124
0.04 0.783
0.011 0.402
]
nlinfit y = a*x/(b + x)
m(a)=2
m(b)=1
m(c)=1
```

Repair:

```polymathplus
[
x y
0.5 1.255
0.387 1.25
0.24 1.189
0.136 1.124
0.04 0.783
0.011 0.402
]
nlinfit y = a*x/(b + x)
m(a)=2
m(b)=1
```

Invalid because there is no fitted parameter:

```polymathplus
[
x y
0.5 1.255
0.387 1.25
0.24 1.189
0.136 1.124
0.04 0.783
0.011 0.402
]
nlinfit y = 2*x/(1 + x)
```

Repair:

```polymathplus
[
x y
0.5 1.255
0.387 1.25
0.24 1.189
0.136 1.124
0.04 0.783
0.011 0.402
]
nlinfit y = a*x/(b + x)
m(a)=2
m(b)=1
```

Invalid explicit helper that depends on a fitted model parameter:

```polymathplus
[
w y
0.5 1.255
0.387 1.25
0.24 1.189
0.136 1.124
0.04 0.783
0.011 0.402
]
scale = a + 1
nlinfit y = scale*w/(b+w)
m(a)=2
m(b)=1
```

Repair:

```polymathplus
[
w y
0.5 1.255
0.387 1.25
0.24 1.189
0.136 1.124
0.04 0.783
0.011 0.402
]
nlinfit y = a*w/(b+w)
m(a)=2
m(b)=1
```

Optional nonlinear-regression solver metadata:

```polymathplus
#@NLR_SOLUTION_METHOD_INDEX = 1
```

Known runtime values:

- `0`: MRQMIN
- `1`: L-M
- `2`: GaussNewton

Precheck note: precheck treats this directive as comment metadata and does not validate the method index.

### 7. Numerical integration: `integrate`

Syntax:

```txt
integrate y(x) x0 xf
integrate y(x) x0 xf sections
integrate y(x) x0 xf sections method
```

Rules:

- `y` and `x` must be table variables or explicit variables.
- `x0` and `xf` must be valid numeric values.
- The validator's numeric parser also accepts a decimal comma in the bounds, for example `0,03`.
- Optional `sections` must be an integer no greater than `2000`.
- Optional `method` must be one of: `akm`, `ccs`, or `lin`.
- Integration requires at least 5 numeric table rows.
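The `integrate` rules above can be condensed into a single command check. The following Python sketch is illustrative only: `check_integrate`, its arguments, and its error messages are hypothetical names, not taken from `solver_precheck.js`. It assumes the caller has already parsed the table into a set of known variable names and a count of numeric rows, and it assumes a decimal comma in a bound is read as a decimal point.

```python
import re

VALID_METHODS = {"akm", "ccs", "lin"}
MAX_SECTIONS = 2000
MIN_ROWS = 5
NAME = r"[A-Za-z][A-Za-z0-9_]*"                                    # letter, then letters/digits/_
INTEGRAND = re.compile(rf"({NAME})\(({NAME})\)")                   # matches y(x)
NUMBER = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")    # decimal or scientific


def is_bound(token):
    """Assumption: a decimal comma such as '0,03' is read as '0.03'."""
    return NUMBER.fullmatch(token.replace(",", ".", 1)) is not None


def check_integrate(command, known_vars, row_count):
    """Return error messages for one `integrate` line; an empty list means it passed."""
    tokens = command.split()
    if not (4 <= len(tokens) <= 6) or tokens[0] != "integrate":
        return ["expected: integrate y(x) x0 xf [sections] [method]"]
    errors = []
    match = INTEGRAND.fullmatch(tokens[1])
    if match is None:
        errors.append(f"malformed integrand: {tokens[1]}")
    else:
        # Both y and x must be table variables or explicit variables.
        for name in match.groups():
            if name not in known_vars:
                errors.append(f"unknown variable: {name}")
    for bound in tokens[2:4]:
        if not is_bound(bound):
            errors.append(f"invalid bound: {bound}")
    if len(tokens) >= 5:
        sections = tokens[4]
        if not re.fullmatch(r"[+-]?\d+", sections):
            errors.append(f"sections must be an integer: {sections}")
        elif int(sections) > MAX_SECTIONS:
            errors.append(f"sections must be <= {MAX_SECTIONS}: {sections}")
    if len(tokens) == 6 and tokens[5] not in VALID_METHODS:
        errors.append(f"unknown method: {tokens[5]}")
    if row_count < MIN_ROWS:
        errors.append(f"need at least {MIN_ROWS} numeric rows, found {row_count}")
    return errors
```

For example, `check_integrate("integrate y(x) 0.03 0.4 120 ccm", {"x", "y"}, 5)` returns `["unknown method: ccm"]`, which corresponds to the invalid-method case among the examples that follow. Returning a list of messages instead of stopping at the first failure lets one pass report every problem on the line.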
Valid:

```polymathplus
[
t C F
0 0 0
0.4 32900000 0.03294052
1 62200000 0.09521712
2 81200000 0.1435766
3 83100000 0.16450234
4 78500000 0.16179901
]
integrate F(t) 0 4 120
```

Invalid method:

```polymathplus
[
x y
0.03 0.1
0.1 0.2
0.2 0.5
0.3 0.8
0.4 1.1
]
integrate y(x) 0.03 0.4 120 ccm
```

Repair:

```polymathplus
[
x y
0.03 0.1
0.1 0.2
0.2 0.5
0.3 0.8
0.4 1.1
]
integrate y(x) 0.03 0.4 120 ccs
```

Invalid integration bound:

```polymathplus
[
x y
0.03 0.1
0.1 0.2
0.2 0.5
0.3 0.8
0.4 1.1
]
integrate y(x) start 0.4 120 ccs
```

Repair:

```polymathplus
[
x y
0.03 0.1
0.1 0.2
0.2 0.5
0.3 0.8
0.4 1.1
]
integrate y(x) 0.03 0.4 120 ccs
```

Invalid too many sections:

```polymathplus
[
x y
0.03 0.1
0.1 0.2
0.2 0.5
0.3 0.8
0.4 1.1
]
integrate y(x) 0.03 0.4 2001 ccs
```

Repair:

```polymathplus
[
x y
0.03 0.1
0.1 0.2
0.2 0.5
0.3 0.8
0.4 1.1
]
integrate y(x) 0.03 0.4 120 ccs
```

Invalid too few rows (4 rows, at least 5 required):

```polymathplus
[
x y
1 2
2 3
3 4
4 5
]
integrate y(x) 1 4
```

Repair:

```polymathplus
[
x y
1 2
2 3
3 4
4 5
5 6
]
integrate y(x) 1 5
```

## Practical Checklist Before Solve

- Exactly one table block is present and correctly enclosed by `[` and `]`.
- Header variable names are valid and unique.
- Every numeric row matches the header column count.
- Exactly one regression command is present.
- Command syntax matches the selected subtype.
- All required variables are known from the table or explicit equations, except fitted parameters in `nlinfit`.
- Explicit variables are unique, valid, and non-circular.
- For `nlinfit`, model parameter guesses are numeric and match fitted parameters exactly.
- For `integrate`, the table has at least 5 rows and the optional method and sections are valid.

## Verification Notes

This hardened repo-local copy was checked against regression behavior in:

```txt
C:\dev\js\solver_precheck\solver_precheck.js
C:\dev\js\solver_precheck\solver_precheckTests.js
```

The examples were selected or adapted from the test corpus and validator behavior.
Test-only comments were removed, and example variable names/comments were adjusted for documentation use.
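The structural items at the top of the Practical Checklist (one table block, valid and unique header names, consistent row widths, one regression command) lend themselves to a mechanical check. The Python sketch below is hypothetical: `check_structure` and its messages are illustrative and are not derived from `solver_precheck.js`. It assumes `[` and `]` sit on their own lines and that `#` may also start a comment partway through a line. It recognizes explicit equations and `m(...)=` guesses only by their leading pattern and does not check circularity, variable references, or command-specific rules.

```python
import re

NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
NUMBER = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")
COMMANDS = {"polyfit", "mlinfit", "nlinfit", "integrate"}
EXPLICIT = re.compile(r"[A-Za-z][A-Za-z0-9_]*\s*=")          # e.g. a = x + 1
GUESS = re.compile(r"m\(\s*[A-Za-z][A-Za-z0-9_]*\s*\)\s*=")   # e.g. m(a)=2


def check_structure(source):
    """Return structural error messages; an empty list means these checks passed."""
    errors = []
    header = None
    starts = ends = commands = 0
    in_table = False
    for raw in source.splitlines():
        # Assumption: '#' may start a comment mid-line; blank lines are skipped.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if line == "[":
            starts += 1
            in_table = True
        elif line == "]":
            ends += 1
            in_table = False
        elif in_table:
            cells = line.split()  # one or more spaces separate cells
            if header is None:
                header = cells
                errors += [f"invalid header name: {c}" for c in cells if not NAME.fullmatch(c)]
                dupes = sorted({c for c in cells if cells.count(c) > 1})
                if dupes:
                    errors.append(f"duplicate header variable: {', '.join(dupes)}")
            elif len(cells) != len(header):
                errors.append(f"row {line!r}: expected {len(header)} cells, found {len(cells)}")
            elif not all(NUMBER.fullmatch(c) for c in cells):
                errors.append(f"non-numeric row: {line!r}")
        elif line.split()[0] in COMMANDS:
            commands += 1
        elif not (EXPLICIT.match(line) or GUESS.match(line)):
            errors.append(f"unrecognized line: {line!r}")
    if starts != 1 or ends != 1:
        errors.append("expected exactly one [ ... ] table block")
    if commands != 1:
        errors.append(f"expected exactly one regression command, found {commands}")
    return errors
```

Run against the missing-column example in section 2, the sketch reports `row '4': expected 2 cells, found 1`; against the duplicate-header example it reports `duplicate header variable: x`.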