
  • I have dozens of projects in varying levels of completion and maybe like 2 finished projects. Here’s my list; steal to your liking, because I come up with ideas I want to see in the world and clearly I’m not a great medium for that:

    • Philotic - p2p network of Python servers based on a generalization of process forking. Every server runs the same file (global scope is initialization), but an id-based guard, such as an annotation, lets them do different things. I designed this as a lower layer for eventually splitting an LLM across multiple computers, something very obnoxious to do manually but relatively easy to code.
    • Servitor - I’ll actually probably keep working on this one. It’s a library that makes it easy to use LLMs as “semantic functions”: you describe a task in natural language and then execute it as if it were a normal function.
    • TAO - Type Annotated Objects, more or less CBOR with some personal improvements. Bytes 0x00-0x7f are tiny ints; otherwise the first nibble is a type and the second nibble is usually an embedded length prefix (or a signal for a longer length prefix). Being nibble-based and having a dedicated DEBUG type makes it a lot easier to read in hexdumps, and gives twice as many type prefixes to work with. I rearranged the types a bit to be saner than CBOR’s (which, e.g., has separate types for negative and positive integers), and also added streaming and varint support.
    • STM/Snow - Structured Text Markup (in-progress name; “Snow” is a bit too informal?), a text serialization format where every value is either (un)quoted text, a “tag” ({ followed by data or text: data pairs, then }), or “markup” ([ followed by plaintext interspersed with tags, then ]). Because tags mix positional and named attributes, its object model is a generalization of XML using JSON-like syntax, and I’ve found it very easy to write a parser for.
      • My “pie in the sky” dream is to completely overhaul HTML/CSS/JS for STM/simplified CSS/WASM, but that’s never going to happen 😞
    • Munchy - IDL-like language for representing file formats as executable schemas. The eventual goal was to make it powerful enough to parse textual formats like JSON, which tend to be more context-dependent. At some point I found a similar project that used YAML to define the schemas, but since it wasn’t a DSL it was more confusing IMO.
    • RetroArch file - A common file format for RetroArch that combines ROMs, patches, cheats, saves, etc. into one cohesive file. Never got far with this one.
    • Binary MIME aka contype. I even wrote an RFC apparently? Adorable.
    • LLM modification - A paper I wrote about a potential LLM modification that replaces the FF layers with a shared vector database to decouple memorization objectives from semantic and syntactic objectives, resulting in smaller foundation models. Predictably, no one cared, and probably no one should, but it might be an interesting experiment to implement.
      • A probably more useful modification, which I haven’t seen yet, would be KV caching with a prefix tree instead of a per-request cache. That would make semantic functions a lot faster, since the shared prompt could stay cached between requests and only the new data would have to be processed.
    • Preference vectors - Simple stochastic updating of “preference” and “feature” vectors to transparently associate preferences with content. This would allow people to essentially build their own “The Algorithms”, since the update operation can be designed to create a linear space, so you can e.g. request content close to “my preferences + my mood + heavy metal + randomness”, and share feature vectors on social media. I think when I tested it I made a weird modular space where d(0, 255) = 1, and it still worked. Stochastic updates work, even in a distributed context, because they’re a kind of “simulated annealing”.
    • Wika - Simplified and standardized WikiText parsing (WikiText is, surprisingly, not actually standardized; MediaWiki essentially defines it as “whatever our PHP parser can read”). The natural follow-up is a wiki written in anything other than PHP.
    • i2cec - ATtiny11 firmware for bridging the i2c and CEC lines of an HDMI cable so you can send remote control commands via your SMBus to an attached monitor (I accidentally got a TV instead of a normal computer monitor). Never got it to work quite right; the timing was very tight.
    • U413 - A unix terminal themed BBS forum with a looong history of makes and remakes and a community getting whittled down to a handful of people.
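A minimal Python sketch of Philotic’s “same file everywhere, id-based guard” idea, with no networking at all. make_guard, on_node, and the PHILOTIC_NODE_ID environment variable are hypothetical names made up for illustration, not Philotic’s actual API.

```python
import functools
import os


def make_guard(node_id):
    """Return an `on_node` decorator bound to this server's id (hypothetical API)."""
    def on_node(*ids):
        def decorate(fn):
            @functools.wraps(fn)
            def guarded(*args, **kwargs):
                # Every server defines every function; only matching ids execute it.
                if node_id in ids:
                    return fn(*args, **kwargs)
                return None
            return guarded
        return decorate
    return on_node


# Global scope doubles as initialization: every server runs this same file.
# Hypothetical: the node id comes from an environment variable, defaulting to "0".
on_node = make_guard(os.environ.get("PHILOTIC_NODE_ID", "0"))


@on_node("0")
def load_first_half():
    return "layers 0-15"  # illustrative only


@on_node("1")
def load_second_half():
    return "layers 16-31"  # illustrative only
```

Because the guard is checked at call time, every server can run the whole file and still only do its own share of the work.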
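A rough Python sketch of the “semantic function” idea: a decorator turns a function’s docstring and arguments into a prompt and hands it to whatever completion function you pass in. semantic and fake_llm are hypothetical stand-ins, not Servitor’s real API, and no actual LLM client is assumed.

```python
import functools
import inspect
from typing import Callable


def semantic(llm: Callable[[str], str]):
    """Hypothetical decorator: the docstring is the task, the body is never run.

    `llm` is any callable that takes a prompt string and returns model text.
    """
    def decorate(fn):
        sig = inspect.signature(fn)
        task = inspect.getdoc(fn) or fn.__name__

        @functools.wraps(fn)
        def call(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            inputs = "\n".join(f"{name}: {value!r}" for name, value in bound.arguments.items())
            prompt = f"Task: {task}\nInputs:\n{inputs}\nAnswer:"
            return llm(prompt).strip()
        return call
    return decorate


prompts = []


def fake_llm(prompt: str) -> str:
    """Stand-in model: records the prompt and answers with a trivial rule."""
    prompts.append(prompt)
    return " positive " if "love" in prompt else " negative "


@semantic(fake_llm)
def sentiment(text: str) -> str:
    """Classify the sentiment of the text as positive or negative."""
```

Swapping fake_llm for a real client is the only change needed to make sentiment("...") an actual model call.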
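A sketch of TAO-style nibble headers in Python. The concrete choices are assumptions, not TAO’s real type table: type nibble 0x8 for byte strings, lengths 0x0-0xB embedded directly, and length nibbles 0xC-0xF announcing a following 1/2/4/8-byte big-endian length (roughly how CBOR’s additional-info bits work).

```python
import struct

TYPE_BYTES = 0x8  # hypothetical type nibble for byte strings
# Hypothetical: length nibbles 0xC-0xF announce a 1/2/4/8-byte big-endian length.
LONG_LENGTHS = {0xC: ">B", 0xD: ">H", 0xE: ">I", 0xF: ">Q"}


def encode_header(type_nibble: int, length: int) -> bytes:
    if length <= 0xB:
        return bytes([(type_nibble << 4) | length])  # length embedded in low nibble
    for nibble, fmt in LONG_LENGTHS.items():  # smallest prefix that fits
        if length < 1 << (8 * struct.calcsize(fmt)):
            return bytes([(type_nibble << 4) | nibble]) + struct.pack(fmt, length)
    raise ValueError("length too large")


def encode(value) -> bytes:
    if isinstance(value, int) and 0 <= value <= 0x7F:
        return bytes([value])  # tiny int: the byte is the value
    if isinstance(value, bytes):
        return encode_header(TYPE_BYTES, len(value)) + value
    raise TypeError(f"not handled in this sketch: {value!r}")


def decode(data: bytes, pos: int = 0):
    """Decode one item at `pos`; return (value, next_pos)."""
    first = data[pos]
    if first <= 0x7F:
        return first, pos + 1
    type_nibble, length = first >> 4, first & 0xF
    pos += 1
    if length in LONG_LENGTHS:
        fmt = LONG_LENGTHS[length]
        (length,) = struct.unpack_from(fmt, data, pos)
        pos += struct.calcsize(fmt)
    if type_nibble == TYPE_BYTES:
        return data[pos:pos + length], pos + length
    raise ValueError(f"unknown type nibble {type_nibble:#x}")
```

The hexdump benefit shows up immediately: encode(b"abc") is 83 61 62 63, which reads as “type 8, length 3” at a glance.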
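A small recursive-descent parser for STM as described above, in Python. Several details are assumptions rather than the actual spec: whitespace separates unquoted words, quoted strings use "..." with backslash escapes, text inside markup is taken literally, and markup comes back as a Python list of strings and Tag objects. StmParser and Tag are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    positional: list = field(default_factory=list)
    named: dict = field(default_factory=dict)


class StmParser:
    """Recursive-descent sketch of the STM grammar (assumed details)."""

    def __init__(self, src: str):
        self.src, self.i = src, 0

    def peek(self) -> str:
        return self.src[self.i] if self.i < len(self.src) else ""

    def skip_ws(self):
        while self.peek().isspace():
            self.i += 1

    def data(self):
        """data := quoted | unquoted | tag | markup"""
        self.skip_ws()
        c = self.peek()
        if c == "{":
            return self.tag()
        if c == "[":
            return self.markup()
        if c == '"':
            return self.quoted()
        return self.unquoted()

    def quoted(self) -> str:
        self.i += 1  # skip opening quote
        out = []
        while (c := self.peek()) != '"':
            if c == "":
                raise SyntaxError("unterminated string")
            if c == "\\":  # backslash escapes the next character
                self.i += 1
                c = self.peek()
            out.append(c)
            self.i += 1
        self.i += 1  # skip closing quote
        return "".join(out)

    def unquoted(self) -> str:
        start = self.i
        while self.peek() and not self.peek().isspace() and self.peek() not in '{}[]":':
            self.i += 1
        if self.i == start:
            raise SyntaxError(f"unexpected {self.peek()!r} at offset {self.i}")
        return self.src[start:self.i]

    def tag(self) -> Tag:
        self.i += 1  # skip '{'
        result = Tag()
        while True:
            self.skip_ws()
            if self.peek() == "}":
                self.i += 1
                return result
            if self.peek() == "":
                raise SyntaxError("unterminated tag")
            item = self.data()
            self.skip_ws()
            if isinstance(item, str) and self.peek() == ":":
                self.i += 1  # `text: data` pair -> named attribute
                result.named[item] = self.data()
            else:
                result.positional.append(item)

    def markup(self) -> list:
        self.i += 1  # skip '['
        parts, text = [], []
        while (c := self.peek()) != "]":
            if c == "":
                raise SyntaxError("unterminated markup")
            if c == "{":  # a tag interrupts the running plaintext
                if text:
                    parts.append("".join(text))
                    text = []
                parts.append(self.tag())
            else:
                text.append(c)
                self.i += 1
        self.i += 1  # skip ']'
        if text:
            parts.append("".join(text))
        return parts
```

For example, StmParser('{p class: intro [Hello {b [world]}!]}').data() gives a tag with positional data "p" and a markup list, plus the named attribute class = "intro", which is the XML-like shape in JSON-like syntax.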
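Munchy’s own syntax isn’t shown here, so this is only a hypothetical Python analogue of an “executable schema”: each field is a small function that consumes bytes, and a later field can read an earlier one (here, a length prefix), which is the kind of context dependence that makes formats hard to describe declaratively. u8, u16be, blob, and parse are made-up names.

```python
import struct


def u8(name):
    """Hypothetical field: one unsigned byte."""
    def run(data, pos, out):
        out[name] = data[pos]
        return pos + 1
    return run


def u16be(name):
    """Hypothetical field: big-endian unsigned 16-bit integer."""
    def run(data, pos, out):
        (out[name],) = struct.unpack_from(">H", data, pos)
        return pos + 2
    return run


def blob(name, length_field):
    """Hypothetical field whose length comes from an earlier field."""
    def run(data, pos, out):
        n = out[length_field]
        if pos + n > len(data):
            raise ValueError(f"{name}: need {n} bytes at offset {pos}")
        out[name] = data[pos:pos + n]
        return pos + n
    return run


def parse(schema, data):
    """Execute the schema front to back; return (fields, bytes consumed)."""
    out, pos = {}, 0
    for step in schema:
        pos = step(data, pos, out)
    return out, pos


header = [u8("version"), u16be("name_len"), blob("name", "name_len")]
```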
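A sketch of prefix-tree KV caching in Python. A trie fits because the cached key/value state for a token is only valid for the exact prefix before it, and a path from the root encodes exactly that prefix. The per-token state objects are opaque placeholders (a real model would store per-layer key/value tensors), and PrefixKVCache is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class _Node:
    children: dict = field(default_factory=dict)
    kv: Any = None  # cached state for the token that leads to this node


class PrefixKVCache:
    """KV cache shared across requests, keyed by token prefixes (sketch)."""

    def __init__(self):
        self.root = _Node()

    def insert(self, tokens, kv_states):
        """Store one state per token along the path for `tokens`."""
        node = self.root
        for tok, kv in zip(tokens, kv_states):
            node = node.children.setdefault(tok, _Node())
            node.kv = kv

    def longest_prefix(self, tokens):
        """Return (n, states): how many leading tokens are cached, and their states."""
        node, states = self.root, []
        for tok in tokens:
            node = node.children.get(tok)
            if node is None:
                break
            states.append(node.kv)
        return len(states), states
```

With a fixed prompt shared by every call to a semantic function, longest_prefix covers the whole prompt, so only the tokens after it need a forward pass.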
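A Python sketch of stochastic preference/feature updates, assuming plain Euclidean vectors rather than the modular space mentioned above. The update rule (each dimension moves a random fraction, at most rate, of the gap) is a guess at the kind of rule described, not the one that was tested; update, combine, and nearest are hypothetical names.

```python
import math
import random


def update(pref, feat, rate=0.1, rng=random):
    """One stochastic step pulling a user's preference vector and an item's
    feature vector toward each other after the user engages with the item.

    Each dimension moves by a random fraction (at most `rate`) of the gap.
    """
    new_pref, new_feat = [], []
    for p, f in zip(pref, feat):
        step = rate * rng.random()
        new_pref.append(p + step * (f - p))
        new_feat.append(f + step * (p - f))
    return new_pref, new_feat


def combine(*vectors):
    """Linear query, e.g. my preferences + my mood + 'heavy metal' + noise."""
    return [sum(parts) for parts in zip(*vectors)]


def nearest(target, catalog):
    """Return the catalog key whose feature vector is closest to `target`."""
    return min(catalog, key=lambda name: math.dist(target, catalog[name]))
```

Because the space is linear, a query is just a sum of vectors, and anyone holding a shared feature vector can add it to their own preferences.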

    And finally, there’s my magnum opus, Espresso, my favorite project, which I keep coming back to time and time again and have been bikeshedding (er, refining) for many years. If anyone else takes it up I’d be ecstatic.

    • Influences: TypeScript, Python, Lua, Rust, C/++, Julia
    • Self-hosted prototype-based scripting language with its JIT written in itself (eventually)
    • Emphasis on simple rules which build up arbitrary complexity, a pathological avoidance of repetition, conciseness, and near Lispian customizability. SMOL.
    • ASCII lexing with Unicode support deferred to other stages (compare to Lua, which treats > 0x7e as an identifier; I also treat <= 0x20 as whitespace).
    • PDA tokenization (used to be FSA/regex but nested format-strings required more power).
    • LR(1) parsing with concurrent bytecode emission (à la Lua): an AST is built up only until it can be converted to bytecode. The most extreme case is extensive destructuring assignment (Rust, Python, [P2392]), which shares a syntax with object literals but can still be treated as LR(1) by tracking which syntax is “lvalue”-compatible and which is “rvalue”-compatible.
    • All types are defined with proto[...T] Super Sub(...args) { ... }, e.g. proto class Animal { ... } and proto Animal Monkey { ... }
      • The higher-order types include class, enum, union, interface, struct, etc. Compare to [P0707]
      • Note that these kinds are objects, not keywords. They define how to convert the body to a type constructor and prototype chain(s).
      • It took a few months to figure out that this is possible by maintaining separate prototype chains for classes and instances.
    • Statements implicitly end, ; is an optional operator to explicitly disambiguate their ending.
      • The after operator: x() after y() == (var t = x(); y(); t), which is surprisingly useful for conciseness.
      • “Everything is an expression” - loops are generators, eg list comprehension works like [...for(var x in 10) x] == list(range(10))
    • Operator overloads use the operator itself as the method name, eg proto class { +(rhs) { console.log("\{this} + \{rhs}"); } }
    • Type annotations define compiler semantics: Type objects have a delegate() method which defines how to represent the variable on the stack. Untyped variables use an inferred type or an implicit any, which wraps a dynamic object. This lets you create objects like int32 while still using the prototype semantics.
    • Recently I’ve been thinking the syntax itself could be partially dynamic, using static scoping and macros which hook into the compiler when they’re encountered, but I’ve tried similar things a few times and it tends to end in failure. This would need something like C#'s unsafe compilation distinction to avoid catastrophic vulnerabilities.
    • “Initialization is compilation” - When you run a file, executing the global scope (“initialization”) is treated as a stage of compilation, and the resulting module object is what is saved, not just its bytecode. Compare this to Python, which saves just the bytecode of the module.
    • Lifetime semantics (ala Rust and friends) are still a WIP.
    • Based on [P0709] and Rust’s try! semantics, exceptions are actually wrapped in a returned Result[T, E] type: try is an operator which unwraps the result and returns if it’s an error. Thus you get var value = try can_fail();. Using type object operator overloading, the Result type doesn’t need to be explicitly annotated because Result[T, E] == T | Result[T, E] == T | fail E.
      • fail keyword instead of throw/raise.
    • Really want proper coroutines using PyPy’s continulet/stacklet abstraction. Also maybe delimited continuations as the implementation for panics.
    • Structured process forking.
    • GC based on ideas from this LuaJIT 2.0 GC document.
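One possible reading of “separate prototype chains for classes and instances”, sketched in Python: a class object links to its superclass (so statics are inherited), while its instance prototype links to the superclass’s instance prototype (so methods are inherited), and instances only ever see the second chain. Obj, make_proto, and new are hypothetical names; this is not how Espresso is actually implemented.

```python
class Obj:
    """A bare prototype-based object: its own slots plus a link to a prototype."""

    def __init__(self, proto=None, **slots):
        self.proto = proto
        self.slots = dict(slots)

    def get(self, name):
        obj = self
        while obj is not None:  # walk the prototype chain
            if name in obj.slots:
                return obj.slots[name]
            obj = obj.proto
        raise AttributeError(name)


def make_proto(super_=None, statics=None, methods=None):
    """Rough analogue of `proto Super Sub { ... }`, building two parallel chains:
    class object -> super class object (statics), and
    instance prototype -> super's instance prototype (methods)."""
    parent_instances = super_.slots["instances"] if super_ is not None else None
    instances = Obj(parent_instances, **(methods or {}))
    return Obj(super_, instances=instances, **(statics or {}))


def new(cls, **fields):
    """Instances link to the class's instance prototype, not the class itself."""
    return Obj(cls.get("instances"), **fields)


Animal = make_proto(statics={"kingdom": "Animalia"},
                    methods={"speak": lambda self: f"{self.get('name')} makes a sound"})
Monkey = make_proto(Animal, methods={"climb": lambda self: f"{self.get('name')} climbs"})
bubbles = new(Monkey, name="Bubbles")
```

Monkey.get("kingdom") finds the inherited static, while bubbles.get("kingdom") raises AttributeError, because instances never walk the class chain.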
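Python has no way to make an operator return from its caller, so this sketch emulates the try operator (with T | fail E results) using an internal exception that a decorator converts back into a returned Fail value. Fail, unwrap, and fallible are hypothetical names; the point is only the control flow of unwrapping with early return.

```python
import functools
from dataclasses import dataclass
from typing import Generic, TypeVar

E = TypeVar("E")


@dataclass(frozen=True)
class Fail(Generic[E]):
    """The `fail E` half of `T | fail E`; any other value counts as success."""
    error: E


class _Propagate(Exception):
    """Internal signal carrying a Fail out of the current function."""

    def __init__(self, fail):
        super().__init__(fail)
        self.fail = fail


def unwrap(result):
    """Stand-in for the `try` operator: pass successes through, bail out on Fail."""
    if isinstance(result, Fail):
        raise _Propagate(result)
    return result


def fallible(fn):
    """Make unwrap() inside `fn` behave like an early `return` of the Fail."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except _Propagate as p:
            return p.fail
    return wrapper


def parse_int(text):
    try:  # Python's own try/except, used only to build the Fail value
        return int(text)
    except ValueError:
        return Fail(f"not an int: {text!r}")


@fallible
def add_strings(a, b):
    # Roughly: try parse_int(a) + try parse_int(b)
    return unwrap(parse_int(a)) + unwrap(parse_int(b))
```

Here add_strings("2", "3") returns 5, while add_strings("2", "x") returns Fail("not an int: 'x'") instead of raising.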

    I could go on for hours with all of the stuff I’ve thought of for this language. If you want to know more, README.md and ideas.md are usually the most authoritative, and specification.md is a very formal description of a subset of the stuff that is absolutely 100% decided (mostly syntax). I’ve written Turing-complete subsets of it before. First usable implementation to arrive by sometime in 2200 lmao. 🫠 I can also unpack the other projects if you want to know more.