
NAME

       DebGPT - General Purpose Terminal LLM Tool with Some Debian-Specific Design

              “AI” = “Artificial Idiot”

SYNOPSIS

       debgpt [arguments] [subcommand [subcommand-arguments]]

DESCRIPTION

       DebGPT is a lightweight terminal tool designed for everyday use with Large Language Models
       (LLMs), aiming to  explore  their  potential  in  aiding  Debian/Linux  development.   The
       possible use cases include code generation, documentation writing, code editing, and more,
       far beyond the capabilities of traditional software.

       To achieve that, DebGPT gathers relevant information  from  various  sources  like  files,
       directories,  and  URLs,  and  compiles  it  into  a prompt for the LLM.  It also supports
       Retrieval-Augmented Generation (RAG) using language embedding models.  DebGPT  supports  a
       range  of  LLM  service  providers,  both  commercial  and  self-hosted, including OpenAI,
       Anthropic, Google Gemini, Ollama, LlamaFile, vLLM, and ZMQ (DebGPT’s built-in backend  for
       self-containment).

QUICK START

       First, install DebGPT from PyPI, or from the Git repository:

              pip3 install debgpt
              pip3 install git+https://salsa.debian.org/deeplearning-team/debgpt.git

       The   bare   minimum   “configuration”   required   to   make   debgpt   work   is  export
       OPENAI_API_KEY="your-api-key".  If anything else is needed, use the  TUI-based  wizard  to
       (re-)configure:

              debgpt config

       Or   use   debgpt  genconfig  to  generate  a  configuration  template  and  place  it  at
       $HOME/.debgpt/config.toml.   Both  config  and  genconfig  will   inherit   any   existing
       configurations.

       Upon completion, you can start an interactive chat with the LLM:

              debgpt

       Enjoy the chat!
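
       For a quick one-shot sanity check without entering the interactive mode (the --ask|-A
       and --quit|-Q arguments are explained in the tutorial below):

               export OPENAI_API_KEY="your-api-key"
               debgpt -QA 'Say hello and confirm that the setup works.'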

FRONTENDS

       The  frontend  is  a  client  that  communicates  with  an  LLM  inference backend.  It is
       responsible  for  sending  user  input  to  the  backend  and  receiving  responses  while
       maintaining a history of interactions.

       Available frontend options (specified by the --frontend|-F argument) are: openai
       ⟨https://platform.openai.com/docs/overview⟩,  anthropic  ⟨https://console.anthropic.com/⟩,
       google   ⟨https://ai.google.dev/gemini-api/⟩,   xai   ⟨https://console.x.ai/⟩,   llamafile
       ⟨https://github.com/Mozilla-Ocho/llamafile⟩,  ollama   ⟨https://github.com/ollama/ollama⟩,
       vllm  ⟨https://docs.vllm.ai/en/latest/⟩,  zmq  (DebGPT built-in), dryrun (for debugging or
       copy-pasting information).
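
       For example, the same question can be sent through different providers, assuming the
       corresponding API keys or local servers are already configured:

               debgpt -F openai -A 'hello'
               debgpt -F ollama -A 'hello'
               debgpt -F dryrun -A 'hello'   # for debugging or copy-pasting the prompt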

       Note: For non-self-hosted backends, review third-party user agreements  and  refrain  from
       sending sensitive information.

TUTORIAL

       The  following  examples  are carefully ordered.  You can start from the first example and
       gradually move to the next one.

   1. Basic Usage: Chatting with LLM and CLI Behavior
       When no arguments are given, debgpt leads you into a general terminal chatting client with
       LLM backends.  Use debgpt -h to see detailed options.

              debgpt

       During the interactive chat mode, you may press / to see a list of available escape
       commands; these are not sent to the LLM as part of the prompt.  A sample session is
       sketched after the list.

       • /save <path.txt>: save the last LLM response to the specified file.

       • /reset: clear the context so you can start a new conversation without quitting.

       • /quit: quit the chatting mode.  You can press Ctrl-D to quit as well.
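
       Putting these together, a session might look roughly like the following sketch (the
       actual prompt symbols and rendering may differ):

               $ debgpt
               > write a limerick about Debian
               ... (LLM response) ...
               > /save limerick.txt
               > /quit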

       The first user prompt can be provided through the --ask|-A|-a argument:

              debgpt -A "Who are you? And what can LLM do?"

       With the --quit|-Q option, the program quits after receiving the first response from the
       LLM.  For instance, we can let it mimic fortune, with temperature 1.0
       (--temperature|-T 1.0) for higher randomness:

              debgpt -T 1.0 -QA 'Greet with me, and tell me a joke.'

       After each session, the chat history is saved under ~/.debgpt as a uniquely named JSON
       file.  The command debgpt replay replays the last session, in case you forgot what the
       LLM replied.
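
       For example, to replay the most recent session:

               debgpt replay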

       The program can write the last LLM response to a file through -o <file>, and read the
       question from stdin:

              debgpt -Qa 'write a hello world in rakudo for me' -o hello.raku
              debgpt -HQ stdin < question.txt | tee result.txt

       After getting familiar with the fundamental usage and CLI behavior, we can move on to
       the context readers, and eventually to the most important feature of this tool: the
       special MapReduce prompt reader.

   2. Context Readers for Additional Information
       A context reader is a function that reads the plain-text contents of a specified
       resource and wraps them into a part of the prompt for the LLM.  Context readers can be
       combined arbitrarily and specified multiple times through the unified --file|-f
       argument.

       It can read from a file, a directory, a URL, a Debian Policy section, a  Debian  Developer
       Reference section, a Debian BTS page, a Debian build status page (buildd), a Google search
       result, etc.

       For example, we can ask the LLM to explain the contents of a file, or mimic the
       licensecheck command:

              # read a plain text file and ask a question
              debgpt -Hf README.md -a 'very briefly teach me how to use this software.'
              debgpt -Hf debgpt/policy.py -A 'explain this file'  # --file|-f for small file
              debgpt -Hf debgpt/frontend.py -A 'Briefly tell me an SPDX identifier of this file.'

              # PDF file is supported as well
              debgpt -Hf my-resume.pdf -a 'Does this person have any foss-related experience?'

       It can also read from a directory or a URL:

              debgpt -Hf 'https://www.debian.org/vote/2022/vote_003' -A 'Please explain the differences among the above choices.'

       The unified reader --file|-f can also read from other sources with a special syntax:

       • -f bts:<bug_number> for Debian bug tracking system

         debgpt -Hf bts:src:pytorch -A 'Please summarize the above information. Make a table to organize it.'
         debgpt -Hf bts:1056388 -A 'Please summarize the above information.'

       • -f buildd:<package> for Debian buildd status

         debgpt -Hf buildd:glibc -A 'Please summarize the above information. Make a table to organize it.'

       • -f cmd:<command_line> for piping other commands’ stdout

         debgpt -Hf cmd:'apt list --upgradable' -A 'Briefly summarize the upgradable packages. You can categorize these packages.'
         debgpt -Hf cmd:'git diff --staged' -A 'Briefly describe the change as a git commit message.'

       • -f man:<man_page> and -f tldr:<tldr_page> for reading system manual pages

         debgpt -Hf man:debhelper-compat-upgrade-checklist -A "what's the change between compat 13 and compat 14?"
         debgpt -H -f tldr:curl -f cmd:'curl -h' -A "download https://localhost/bigfile.iso to /tmp/workspace, in silent mode"

       • -f  policy:<section>  and  -f  devref:<section>  for reading Debian Policy and Developer
         Reference

         debgpt -Hf policy:7.2 -A "what is the difference between Depends: and Pre-Depends: ?"
         debgpt -Hf devref:5.5 -A 'Please summarize the above information.'

         # when section is not specified, it will read the whole document. This may exceed the LLM context size limit.
         debgpt -Hf policy: -A 'what is the latest changes in this policy?'

         # more examples
         debgpt -Hf pytorch/debian/control -f policy:7.4 -A "Explain what Conflicts+Replaces means in pytorch/debian/control based on the provided policy document"
         debgpt -Hf pytorch/debian/rules -f policy:4.9.1 -A "Implement the support for the 'nocheck' tag based on the example provided in the policy document."

   3. Inplace Editing of a File
       The --inplace|-i argument is for in-place editing of a file.  It is a read-write
       counterpart of the read-only --file|-f: it reads the file the same way, but writes the
       LLM response back to the file.  This feature is intended for editing files with the LLM.

       When specified, the edits (in UNIX diff format) will be printed to the screen.  The
       --inplace|-i argument implies --quit|-Q and turns off markdown rendering.

       The following example asks the LLM to edit the pyproject.toml file, adding pygments to
       its dependencies.  This works correctly in practice.

              debgpt -Hi pyproject.toml -a 'edit this file, adding pygments to its dependencies.'

       When working in a Git repository, this can be automated further: append
       --inplace-git-add-commit to automatically add and commit the changes to the Git
       repository.  If you want to review the changes before committing, specify the
       --inplace-git-p-add-commit|-I argument instead.

              debgpt -Hi pyproject.toml -a 'edit this file, adding pygments to its dependencies.' --inplace-git-add-commit

       The commit resulting from the above example can be seen at
       ⟨https://salsa.debian.org/deeplearning-team/debgpt/-/commit/968d7ab31cb3541f6733eb34bdf6cf13b6552b7d⟩.
       Recent LLMs are strong enough to easily and correctly add type annotations and
       docstrings in DebGPT’s Python codebase; see
       ⟨https://salsa.debian.org/deeplearning-team/debgpt/-/commit/4735b38141eafd6aa9b0863fc73296aa41562aed⟩
       for an example.

   4. Vector Retriever for Most Relevant Information
       This feature is work in progress.  It leverages embedding models to retrieve the most
       relevant pieces of information, i.e., Retrieval-Augmented Generation (RAG).

   5. MapReduce for Any Length Context
       The “MapReduce” feature is the right choice when you want the LLM to read bulk
       documentation.

       Generally, LLMs have a limited context length.  If you want to ask a question regarding  a
       very long context, you can split the context into multiple parts, and extract the relevant
       information from each part.  Then, you can ask the LLM to answer the question based on the
       extracted information.

       The implementation is fairly simple: split the gathered text into chunks that satisfy a
       pre-defined maximum chunk size, ask the LLM to extract the relevant information from
       each chunk, and then repeatedly merge the extracted information through LLM
       summarization until only one chunk is left.  As a result, this functionality can be very
       quota-consuming when dealing with long texts.  Please keep an eye on your bill when you
       try this on a paid API service.

       This functionality is implemented as the --mapreduce|-x argument.  Specify the
       --ask|-A|-a argument to tell the LLM what question you want answered, so it can extract
       the right information; if --ask|-A|-a is missing, it will simply summarize the contents.

       The key difference between MapReduce and the Vector Retriever is that MapReduce makes
       the language model read all of the information you pass to it, while the vector
       retriever only feeds the model the few most relevant pieces of information stored in the
       database.

       Some usage examples of MapReduce are as follows:

       • Load a file and ask a question

         debgpt -Hx resume.pdf -A 'Does this person know AI? To what extent?'

       • Load a directory and ask a question

         debgpt -Hx . -a 'which file implemented mapreduce? how does it work?'
         debgpt -Hx . -a 'teach me how to use this software. Is there any hidden functionality that is not written in its readme?'
         debgpt -Hx ./debian -A 'how is this package built? how many binary packages will be produced?'

       • Load a URL and ask a question

         debgpt -Hx 'https://www.debian.org/doc/debian-policy/policy.txt' -A 'what is the purpose of the archive?'

       • Load the whole Debian Policy document (plain text) and ask a question

         debgpt -Hx policy:all -a "what is the latest changes in this policy?"
         debgpt -Hx policy:all -A 'what package should enter contrib instead of main or non-free?'

       • Load the whole Debian Developer Reference document (plain text) and ask a question

         debgpt -Hx devref:all -A 'How can I become a debian developer?'
         debgpt -Hx devref:all -a 'how does general resolution work?'

       • If you do not want to read policy: and devref: yourself, or forgot which of the two
         covers the question on your mind, for instance:

         debgpt -H -x policy:all -x devref:all -a 'which document (and which section) talk about Multi-Arch: ?'

       • Load the latest sbuild log file and ask a question

         debgpt -Hx sbuild: -A 'why does the build fail? do you have any suggestion?'

       • Google search: -x google: will use your prompt as the  search  query,  and  answer  your
         question after reading the search results

         debgpt -Hx google: -a 'how to start python programming?'

       • Google  search: -x google:<search_query> gives more control over the search query.  Here
         we let LLM answer the question provided by -a based on the  search  results  of  “debian
         packaging”.

         debgpt -Hx google:'debian packaging' -a 'how to learn debian packaging?'

       The -H argument skips printing the first prompt generated by debgpt, because it is
       typically very lengthy and only useful for debugging and development purposes.  To
       further tweak the mapreduce behavior, you may want to check the --mapreduce_chunksize
       <int> and --mapreduce_parallelism <int> arguments.
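
       For example (the chunk size and parallelism values below are arbitrary illustrations;
       check debgpt -h for the defaults):

               debgpt -Hx policy:all --mapreduce_chunksize 8192 --mapreduce_parallelism 4 -A 'what are the latest changes in this policy?'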

   6. Piping through Everywhere
       Being able to pipe the inputs and outputs among different programs is one of  the  reasons
       why I love the UNIX philosophy.

       The pipe mode is useful when you want to use debgpt in a shell script.  Try the
       following on the Makefile in the debgpt repository.  The in-place editing functionality
       introduced earlier is often more convenient than this.

              cat Makefile | debgpt -a 'delete the deprecated targets' pipe | tee tmp ; mv tmp Makefile; git diff

       The pipe mode can also be used for in-place editing from within vim:

              # In vim debgpt/task.py, use 'V' mode to select the task_backend function, then
              :'<,'>!debgpt -a 'add type annotations and comments to this function' pipe

       This looks interesting, right?  debgpt has a git wrapper that automatically generates a
       git commit message for the staged contents and commits it.  Just try debgpt git commit
       --amend to see how it works.  This is also covered in the subcommands section below.

   7. DebGPT Subcommands
       Git subcommand.

       Let the LLM automatically generate the git commit message and call git to commit it:

              debgpt git commit --amend

       If you do not want to amend an already committed message, just remove --amend from the
       command.
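
       That is, to create a fresh commit with an LLM-generated message:

               debgpt git commit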

   8. Prompt Engineering
       As you may have seen, the biggest variation in LLM usage lies in the context: which
       context readers you provide, and how you ask the question through --ask|-A|-a.  By
       adjusting how you provide that information and phrase the question, you can get
       significantly different results.  To make the LLM work well for you, you may need to
       learn some basic prompt engineering methods.
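
       For instance, a vague request and a specific request over the same file (the path below
       is only an illustration) can yield very different results:

               debgpt -Hf debian/control -A 'Review this file.'
               debgpt -Hf debian/control -A 'Review this debian/control: list concrete problems and suggest fixes, citing the relevant Debian Policy sections.'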

       The following are some references on this topic:

       1. OpenAI’s Guide https://platform.openai.com/docs/guides/prompt-engineering

       Advanced usage of LLMs, such as Chain-of-Thought (CoT)
       ⟨https://arxiv.org/pdf/2205.11916.pdf⟩, will not be covered in this document.  Please
       refer to external resources for more information.

       The usage of LLMs is limited only by our imagination.  I am glad to hear from you if you
       have more good ideas on how we can make LLMs useful for Debian development:
       https://salsa.debian.org/deeplearning-team/debgpt/-/issues

TROUBLESHOOTING

       • Context  overlength:  If  the result from context readers (such as feeding --file with a
         huge text file) is too long, you can switch to the  --mapreduce|-x  special  reader,  or
         switch to a model or service provider that supports longer context.
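
          For example, instead of --file, a huge build log (the file name below is only an
          illustration) can be passed through the MapReduce reader:

          debgpt -Hx huge-build.log -A 'why does the build fail? do you have any suggestion?'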

BACKEND

   Available Backend Implementations
       This tool provides one backend implementation: zmq.

       • zmq: only needed when you choose the zmq frontend, i.e., a self-hosted LLM inference
         server.

       If  you  plan  to  use  the  openai  or  dryrun  frontends,  there is no specific hardware
       requirement.  If you would like to self-host the  LLM  inference  backend  (ZMQ  backend),
       powerful hardware is required.

   LLM Selections
       The concrete hardware requirement depends on the LLM you would like to use.  A variety
       of open-access LLMs can be found at
       ⟨https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard⟩.  Generally, when
       doing prompt engineering only, the “instruction-tuned” and “RL-tuned” (RL: reinforcement
       learning) LLMs are recommended.

       The pretrained (raw) LLMs are not very useful in this case, as they have gone through
       neither instruction tuning nor reinforcement-learning tuning.  These pretrained LLMs are
       more likely to generate garbage, ignore your instructions, or simply repeat them.  We
       will only revisit pretrained LLMs if we start collecting data and fine-tuning (e.g.,
       with LoRA) a model in the far future.

       The following is a list of supported LLMs for self-hosting (this list will be updated
       when new state-of-the-art open-access LLMs become available):

       • Mistral7B (Mistral-7B-Instruct-v0.2) (default)
         This model requires roughly 15GB of disk space to download.

       • Mixtral8x7B (Mixtral-8x7B-Instruct-v0.1)
         This model is larger yet more powerful than the default LLM.  In exchange, it poses
         even higher hardware requirements and takes roughly 60~100GB of disk space (exact
         figure to be confirmed).

       Different LLMs will pose  different  hardware  requirements.   Please  see  the  “Hardware
       Requirements” subsection below.

   Hardware Requirements
       By default, we recommend doing LLM inference in fp16 precision.  If VRAM (such as CUDA
       memory) is limited, you may switch to even lower precisions such as 8bit and 4bit.  For
       pure CPU inference, only fp32 precision is supported for now.

       Note that multi-GPU inference is supported by the underlying transformers library.  If
       you have multiple GPUs, the memory requirement is roughly divided by the number of GPUs.

       Hardware requirements for the Mistral7B LLM:

       • Mistral7B + fp16 (cuda): 24GB+ VRAM preferred, but needs a 48GB GPU to run all the demos
         (some  of  them  have  a  context as long as 8k).  Example: Nvidia RTX A5000, Nvidia RTX
         4090.

       • Mistral7B + 8bit (cuda): 12GB+ VRAM at minimum, but 24GB+ preferred so you can  run  all
         demos.

       • Mistral7B  +  4bit  (cuda):  6GB+ VRAM at minimum but 12GB+ preferred so you can run all
         demos.  Example: Nvidia RTX 4070 (mobile) 8GB.

       • Mistral7B + fp32 (cpu): Requires 64GB+ of RAM, but a CPU is 100~400 times slower than  a
         GPU for this workload and thus not recommended.

       Hardware requirements for the Mixtral8x7B LLM:

       • Mixtral8x7B + fp16 (cuda): 90GB+ VRAM.

       • Mixtral8x7B + 8bit (cuda): 45GB+ VRAM.

       • Mixtral8x7B  +  4bit  (cuda): 23GB+ VRAM, but in order to make it work with long context
         such as 8k tokens, you still need 2x 48GB GPUs in 4bit precision.

       See https://huggingface.co/blog/mixtral for more.

   Usage of the ZMQ Backend
       If you want to run the default LLM with different precisions:

              debgpt backend --max_new_tokens=1024 --device cuda --precision fp16
              debgpt backend --max_new_tokens=1024 --device cuda --precision bf16
              debgpt backend --max_new_tokens=1024 --device cuda --precision 8bit
              debgpt backend --max_new_tokens=1024 --device cuda --precision 4bit

       The only supported precision on CPU is fp32 (for now).  If you want to fall  back  to  CPU
       computation (very slow):

              debgpt backend --max_new_tokens=1024 --device cpu --precision fp32

       If you want to run a different LLM, such as Mixtral8x7B instead of the default
       Mistral7B:

              debgpt backend --max_new_tokens=1024 --device cuda --precision 4bit --llm Mixtral8x7B

       The --max_new_tokens argument (the maximum length of each LLM reply) does not matter
       much; adjust it as you wish.
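
       Once the backend is running, chat through the zmq frontend in another terminal (a
       sketch; any host or port options your setup requires are omitted here):

               debgpt -F zmq -A 'hello'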

REFERENCES

       [1] Access large language models from the command-line
              https://github.com/simonw/llm

       [2] Turn your task descriptions into precise shell commands
              https://github.com/sderev/shellgenius

       [3] the AI-native open-source embedding database
              https://github.com/chroma-core/chroma

       [4] LangChain: Build context-aware reasoning applications
              https://python.langchain.com/docs/introduction/

       [5] Ollama: Embedding Models
              https://ollama.com/blog/embedding-models

       [6] OpenAI: Embedding Models
              https://platform.openai.com/docs/guides/embeddings

       [7] Moonshot - A simple and modular tool to evaluate and red-team any LLM application.
              https://github.com/aiverify-foundation/moonshot

LICENSE and ACKNOWLEDGEMENT

       DebGPT development is assisted by various open-access and commercial LLMs for code
       suggestion, code writing, code editing, and document writing, with human review and
       modification.

              Copyright (C) 2024 Mo Zhou <lumin@debian.org>

              This program is free software: you can redistribute it and/or modify
              it under the terms of the GNU Lesser General Public License as published by
              the Free Software Foundation, either version 3 of the License, or
              (at your option) any later version.

              This program is distributed in the hope that it will be useful,
              but WITHOUT ANY WARRANTY; without even the implied warranty of
              MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
              GNU Lesser General Public License for more details.

              You should have received a copy of the GNU Lesser General Public License
              along with this program.  If not, see <https://www.gnu.org/licenses/>.

AUTHORS

       Copyright (C) 2024 Mo Zhou ⟨lumin@debian.org⟩