Inferencer lets you run, host and deeply control the latest SOTA AI models (OSS, DeepSeek, Qwen, Kimi, GLM, MiniMax and more) from your own computer or phone. No data is sent to the cloud for processing, so your privacy is fully maintained. Advanced inferencing controls give you fine-grained control over model accuracy and outputs.

Models
Start in the Models section, where you can download the latest models directly from Hugging Face.

Server
Use the Server feature to connect to an Inferencer instance running on your Mac and run even larger models over the network.

Chats
Select the model to interact with from the top menu bar and write a prompt to begin. At any point you can switch between models and continue the chat to see what else they can uncover. You can also selectively delete past messages to keep the model focused and less scatterbrained.

Chat Controls
Adjust the inferencing parameters, including a system prompt to shape how the model responds, temperature to control the randomness of the response, and more.

Token Entropy and Inspection
Open the inspectors to peek into the inner workings of each word the model outputs and see its confidence levels and alternative choices.

Response Control
Utilise the response control feature to steer the output the model generates, for example skipping the preamble or directing the model to output structured HTML.

Settings
Includes parental controls, an automatic deletion policy and more.

Privacy
For maximum privacy, all AI processing happens offline and on your device by default.

Subscriptions
Basic (Free): Most features unlocked for free, including unlimited chats and connecting to your Mac.
Professional: Upgrade for more advanced token inspection and response control.

Terms & Support
Terms of Use: inferencer.com/terms
Privacy Policy: inferencer.com/privacy

Disclaimer
Inferenced models may not always be accurate or contextually appropriate. You are responsible for verifying information before making important decisions.
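
For readers curious what the token entropy inspector is measuring, here is a minimal illustrative sketch (not the app's actual implementation; the vocabulary and logit values are made up) of how per-token confidence and alternative choices can be derived from a model's raw scores:

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def token_entropy(probs):
    """Shannon entropy in bits: low means confident, high means uncertain."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def top_alternatives(vocab, probs, k=3):
    """The k most likely next tokens, i.e. the 'alternative choices'."""
    return sorted(zip(vocab, probs), key=lambda t: t[1], reverse=True)[:k]

# Hypothetical logits for a tiny 4-token vocabulary.
vocab = ["the", "a", "cat", "dog"]
probs = softmax([4.0, 2.0, 1.0, 0.5])
print(f"entropy: {token_entropy(probs):.2f} bits")
print(top_alternatives(vocab, probs))
```

A confident model concentrates probability on one token (entropy near zero), while an uncertain one spreads it across many, which is what the inspector visualises per generated token.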